Publishing in Facebook each post from this blog

Some time ago we published Extracting links from a webpage with Python as a first step towards publishing complete blog posts in Facebook. The idea was to prepare the text obtained from an RSS feed in order to publish it on a Facebook page (or elsewhere). Remember that Facebook does not allow (or I did not find the way) including HTML in a page's posts.
We had previously presented some related ideas in Publishing in Twitter when posting here, in that case for Twitter.

Now we are going to use the Facebook API through an unofficial package that implements it in Python, the Facebook Python SDK.

We can install it with

fernand0@aqui:~$ sudo pip install facebook-sdk

It will need `BeautifulSoup` and `requests` (and maybe some other modules). If they are not installed in our system, we will get the corresponding 'complaints'. We can install them as usual with pip (or our preferred system):
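For instance (package names as published on PyPI):

fernand0@aqui:~$ sudo pip install beautifulsoup4 requests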

We need some credentials in order to publish in Facebook. First we have to register our application in Facebook My Apps (button 'Add a New App'; there are plenty of tutorials if you need help). We will use the 'advanced setup' (registering web applications seems to be easier) and some identifiers will be provided, mainly the OAuth token (we can find them at My Apps, following the link for our app). We will store this token in ~/.rssFacebook, and it will be used later by our program.
The configuration file is similar to this one:

[Facebook]
oauth_access_token:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

The program is very simple and it can be downloaded from rssToPages.py (link to the version commented here; there have been some further evolutions).

The program starts by reading the configuration of the available blogs, and we need to choose one. If there is just one, no selection is needed:

config = ConfigParser.ConfigParser()
config.read([os.path.expanduser('~/.rssBlogs')])

print "Configured blogs:"

i = 1
for section in config.sections():
        print str(i), ')', section, config.get(section, "rssFeed")
        i = i + 1

if i > 2:  # more than one blog configured
        i = raw_input('Select one: ')
else:
        i = 1

print "You have chosen ", config.get("Blog"+str(i), "rssFeed")

The configuration file must contain a section for each blog; each one will have the RSS feed, the Twitter account and the name of the Facebook page. For this site it would have the following entries:

[Blog1]
rssFeed:https://makingfernand0.wordpress.com/feed/
twitterAc:makingFernand0
pageFB:

Notice that the Facebook page is empty: this blog does not have a Facebook page (yet?).
We could have a second blog:

[Blog2]
rssFeed:http://fernand0.github.io/feed.xml
twitterAc:mbpfernand0
pageFB:fernand0.github.io

This configuration file can have yet another field, linksToAvoid, that will be used for filtering out some links that won't be shown (I have another blog and in this way I can skip the categories' links).
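For example (the section and the pattern are made up for illustration; the value is matched with re.search against each link):

[Blog3]
rssFeed:http://example.com/feed/
twitterAc:
pageFB:
linksToAvoid:example.com/categories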

if (config.has_option("Blog"+str(i), "linksToAvoid")):
        linksToAvoid = config.get("Blog"+str(i), "linksToAvoid")
else:
        linksToAvoid = ""

We will now read the last post of the blog and extract its text and links, in a similar way as in Extracting links from a webpage with Python (not shown here in full).
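A minimal sketch of that step, assuming (as in the posts below) that the feed's first entry is the newest one; the names pageImage, theTitle, theLink and theSummary are the ones used by the code that follows:

import feedparser
from bs4 import BeautifulSoup

feed = feedparser.parse(config.get("Blog"+str(i), "rssFeed"))
soup = BeautifulSoup(feed.entries[0].summary)
links = soup("a")             # list of <a> elements, numbered below
pageImage = soup("img")       # list of <img> elements, used below
theTitle = feed.entries[0].title
theLink = feed.entries[0].link
theSummary = soup.get_text()  # plain text of the entry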

And now we skip the links we want to avoid:

                # Some debugging traces, they can be removed
                print linksToAvoid
                print re.escape(linksToAvoid)
                print str(link['href'])
                print re.search(linksToAvoid, link['href'])
                if ((linksToAvoid == "")
                        or (not re.search(linksToAvoid, link['href']))):
                        link.append(" ["+str(j)+"]")
                        linksTxt = linksTxt + "["+str(j)+"] " + link.contents[0] + "\n"
                        linksTxt = linksTxt + "    " + link['href'] + "\n"
                        j = j + 1

We then check whether the post contains an image. If not, we will not add one, but Facebook will (it will pick the first image it can find on our page).
We could configure a fallback image to be used when needed (when we have not included an image in our post and we do not like the one chosen by Facebook), or we can simply try to always include an image in our posts.

if len(pageImage) > 0:
        imageLink = (pageImage[0]["src"])
else:
        imageLink = ""

Now we will read the Facebook configuration and ask for the list of pages the user manages (remember that we have set the desired one in ~/.rssBlogs):

config.read([os.path.expanduser('~/.rssFacebook')])
oauth_access_token= config.get("Facebook", "oauth_access_token")

graph = facebook.GraphAPI(oauth_access_token)
pages = graph.get_connections("me", "accounts")

We could define more Facebook accounts but I have not tested this feature, so maybe it won’t work as expected (and, of course, there is no way to select one of them).

for i in range(len(pages['data'])):
        if (pages['data'][i]['name'] == pageFB):
                print "Writing in... ", pages['data'][i]['name']
                graph2 = facebook.GraphAPI(pages['data'][i]['access_token'])
                graph2.put_object(pages['data'][i]['id'],
                        "feed", message = theSummary, link=theLink,
                        picture = imageLink,
                        name=theTitle, caption='',
                        description=theSummary.encode('utf-8'))

statusTxt = "Publicado: "+theTitle+" "+theLink

This program has been tested during the last months and the solution seems to be working (you may want to check the latest version, which will have some bugs corrected).
The most cumbersome part was getting the credentials and registering the app (with a 'fake' production step; for me it is 'fake' because I'm the only user of the app).

This post was published originally (in Spanish) at: Publicar en Facebook las entradas de este sitio.

If you have doubts, comments, ideas… Please comment!

Extracting links from a webpage with Python

Some time ago we presented a small program that helped us to publish in Twitter: Publishing in Twitter when posting here.

Later I started having a look at the Facebook API and doing some tests. I discovered that Facebook does not allow publishing links with their anchor text: it transforms them into links that you can click on, but whose text is the link itself. I wanted to publish the whole text in Facebook (it will not easily show the whole entry, just a small excerpt and a link to click in order to see more).

The netiquette of some mailing lists has always caught my attention: they add numbers next to the anchor text of links, and at the end they write these numbers with the corresponding links. See, for example, this Support page.
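With this convention, a published entry would look something like this (made-up text, following the output format of the program below):

We wrote about this in a previous post [0].

Links :
[0] a previous post
    http://example.com/previous-post/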

I decided to follow this path in order to publish in my Facebook pages. In the following I will try to explain some parts of the program. The code is available at rssToLinks.py (version at this moment; there may be changes later).

There are several ways to extract links: regular expressions, some HTML parser (in our Blogómetro project we used this approach with the simple SGML parser). Looking for alternatives I found Beautiful Soup, a fast way to parse a web page, and I decided to give it a try.

In order to use it we need some modules. We will publish in Facebook using the RSS feed, so we will also need the 'feedparser' module.

import feedparser
from bs4 import BeautifulSoup
from bs4 import NavigableString
from bs4 import Tag

Now we can read the RSS feed:

feed = feedparser.parse(url)

for i in range(len(feed.entries)):

And now the magic of BeautifulSoup can start:

soup = BeautifulSoup(feed.entries[i].summary)
links = soup("a")

That is, we parse the RSS entry looking for links (the "a" tag). We will have the entry in the 'summary' part, and we are interested in the entry at position 'i'. The call returns the list of HTML elements with that tag.
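A minimal illustration of that call, with a made-up snippet of HTML:

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p>See <a href="http://example.com/">this page</a></p>')
print soup("a")
# [<a href="http://example.com/">this page</a>]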

Some entries include images, but we do not want them to appear in the text. For this we use 'isinstance' to check whether there is another HTML tag inside the link. We traverse the list of links together with a counter 'j' in order to associate the numbers and the links (in the original HTML, which we have not modified yet).

	j = 0
	linksTxt = ""
	for link in links:
		if not isinstance(link.contents[0], Tag):
			# We want to avoid embedded tags (mainly <img ... )
			link.append(" ["+str(j)+"]")
			linksTxt = linksTxt + "["+str(j)+"] " + link.contents[0] + "\n"
			linksTxt = linksTxt + "    " + link['href'] + "\n"
			j = j + 1

The content of the link (now we know that it is not an image nor another HTML tag) will be available at `link.contents[0]` (of course, there could be more content, but our links tend to be simple).

        linksTxt = linksTxt + "["+str(j)+"] " + link.contents[0] + "\n"

and the link itself is at `link['href']`.

                linksTxt = linksTxt + "    " + link['href'] + "\n"

Now we need the text of the HTML.

        print soup.get_text()

Sometimes this text can have line breaks, spaces, etc., and we could suppress them. We usually have very simple entries, so we are not going to pay attention to this problem.
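If we ever wanted to clean it up, something along these lines would collapse all the whitespace runs into single spaces:

text = " ".join(soup.get_text().split())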

Now, we can add at the end the links:

        if linksTxt != "":
                print
                print "Links :"
                print linksTxt

Publishing in Twitter when posting here

I don’t think RSS is dead. But we can see how many people are using social networking sites to get their information. For this reason I was publishing the entries of my blogs using services like IFTTT and dlvr.it. They are easy to use and they work pretty well. Nevertheless, one is always wondering whether we could prepare our own programs to manage these publications and learn something new on the way.

I started with Facebook publishing, but I'm presenting here a program for publishing in Twitter: we only need to publish the title and the link (and, maybe, some introductory text).

I found the twitter project as a starting point. It implements an important part of the work. We can install it using pip:

fernand0@here:~$ sudo pip install twitter

It needs `BeautifulSoup` and maybe some other modules. If they are not available in our system we will get the corresponding 'complaints'.

Now we can execute it.
This step is useful in order to go through the authentication steps in Twitter and get the OAuth token. Our program will not deal with this part, so it will be smaller and simpler.
Not so long ago it was possible to send tweets with just the username and password, but Twitter decided to start using a more sophisticated system based on OAuth.

fernand0@here:~$ twitter

The program launches a browser for authentication and for giving our app the adequate permissions. This generates the tokens and other information needed to interact with Twitter. They will be stored at `~/.twitter_oauth` (in a Unix-like system; I'd be happy to learn about other systems), and we will reuse them in our own application.

The program is quite simple and it can be downloaded from rssToTwitter.py (V.2014-12-07) (link to the commented version; the program has been updated to correct bugs and add features).

We will start reading the configuration:

config = ConfigParser.ConfigParser()

config.read([os.path.expanduser('~/.rssBlogs')])
rssFeed = config.get("Blog1", "rssFeed")
twitterAc = config.get("Blog1", "twitterAc")

This configuration file must contain a section for each blog (this program uses only the configuration of the first one). Each section will contain the RSS feed, the name of the Twitter account and the name of the Facebook page (it can be empty if it won't be used). For example, for this blog it would be:

[Blog1]
rssFeed:https://makingfernand0.wordpress.com/feed/
twitterAc:makingfernand0
pageFB:

It also needs the Twitter configuration:

config.read([os.path.expanduser('~/.rssTwitter')])
CONSUMER_KEY = config.get("appKeys", "CONSUMER_KEY")
CONSUMER_SECRET = config.get("appKeys", "CONSUMER_SECRET")
TOKEN_KEY = config.get(twitterAc, "TOKEN_KEY")
TOKEN_SECRET = config.get(twitterAc, "TOKEN_SECRET")

We can use the keys that were generated before (we can copy them from the app; in my system it is at `/usr/local/lib/python2.7/dist-packages/twitter/cmdline.py`), and the tokens are stored at `~/.twitter_oauth`.

The configuration file is as follows:

[appKeys]
CONSUMER_KEY:xxxxxxxxxxxxxxxxxxxxxx
CONSUMER_SECRET:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
[makingfernand0]
TOKEN_KEY:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TOKEN_SECRET:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Notice that you can configure as many Twitter accounts as needed. The name of each account section is the same as the twitterAc value used in the previous configuration file.
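For example, adding the account of the second blog shown before would just mean another section (the tokens are placeholders):

[mbpfernand0]
TOKEN_KEY:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TOKEN_SECRET:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx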

Now we can read the RSS feed in order to extract the required data:

feed = feedparser.parse(rssFeed)

i = 0 # It will publish the last added item

soup = BeautifulSoup(feed.entries[i].title)
theTitle = soup.get_text()
theLink = feed.entries[i].link


For this, we use `feedparser` to download the RSS feed and process it.

We are choosing the first entry (position 0), which will be the last one published. For Twitter we just need the title and the link.
We use BeautifulSoup to process the title, in order to strip any tags it may contain (CSS, HTML entities, ...).

And finally, we will build the tweet:


statusTxt = "Publicado: "+theTitle+" "+theLink

We can now proceed to the steps of identification, authentication and publishing:

t = Twitter(
	auth=OAuth(TOKEN_KEY, TOKEN_SECRET, CONSUMER_KEY, CONSUMER_SECRET))

t.statuses.update(status=statusTxt)

This entry was originally published (in Spanish) at: Publicar en Twitter las entradas de este sitio.

Who is in my network?

It can be useful to know which devices are connected to our home network: you can always assign a fixed IP to each device, but that is a process that can be painful (if you are not used to managing these things) and does not scale well when new devices appear (a frequent thing nowadays).

For this reason I enjoyed very much discovering Fing, a tool for discovering devices in our network (it can be installed on Android devices, iOS devices, and desktop computers). I wanted to have it on my laptop (nowadays this work would not be necessary, since they have released the tool for several operating systems) and I was looking for a solution.

The suggestions were twofold: nmap and arp should help with this, but I was not familiar with them. When I found the WiFinder project I decided to try to adapt it for my purposes. I forked the project and started to adapt it.

The result is a small program, macfinder.py (link to the commented version; there can be further evolutions in macfinder.py). It should have a better input/output system and I would like to add some features, but the main ideas are there.
First of all, the code related to the scanning:

import nmap # import the nmap.py module

...
nm = nmap.PortScanner() # creates an instance of nmap.PortScanner

Here is the actual instruction for scanning:

nm.scan(hosts='192.168.1.0/24', arguments='-n -sP -PE -T5')
# executes a ping scan

hosts_list = [(nm[x]['addresses']) for x in nm.all_hosts()]
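The result is a list of dictionaries with the addresses of each host, something like this (made-up values; nmap only reports the 'mac' key when it runs with enough privileges):

[{'ipv4': '192.168.1.1', 'mac': 'AA:BB:CC:DD:EE:FF'},
 {'ipv4': '192.168.1.30', 'mac': '11:22:33:44:55:66'}]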

From the obtained list we will keep the information, using as the index the MAC address (which is the part that will remain constant for each device) and adding the newly discovered devices:

if not ipList.has_key(addresses['mac']):
	ipList[addresses['mac']] = ("", addresses['ipv4'])

The data structure is a hash indexed by the MAC address; it contains the IP (which can change at any time) and a name that we will assign to each device (in a similar way as done in Fing).
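For example (made-up values), after naming some of the devices it could look like this:

ipList = {'AA:BB:CC:DD:EE:FF': ('router', '192.168.1.1'),
          '11:22:33:44:55:66': ('', '192.168.1.30')}  # not named yet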

We are using pickle for persistence.
Reading:

fIP = open(fileName)
ipList = pickle.load(fIP)

Writing:

fIP = open(fileName, "w")
pickle.dump(ipList, fIP)
fIP.close()

Finally, I have some doubts about Fing's inner workings: it does not need special privileges (or it should not need them, since its origin is a mobile app). But nmap needs to be run as root in order to obtain the MAC addresses (so the program must be executed with sudo and the user needs the adequate permissions).
Since it is dangerous to have a program running with root privileges, I decided to learn how to drop them when they are not needed anymore. I found Dropping Root Permissions In Python and I included the function drop_privileges:

user_name = os.getenv("SUDO_USER")
pwnam = pwd.getpwnam(user_name)

Here we are obtaining the user’s data.
With:

# Try setting the new uid/gid
os.setgid(pwnam.pw_gid)
os.setuid(pwnam.pw_uid)

We are assigning that user's privileges to the process and, as a consequence, dropping the root privileges.

This has to be done in the program when we do not need these high privileges anymore (that is, in our case, when we do not need nmap anymore).
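Putting the pieces together, a minimal version of the function could look like this (following the article linked above; it assumes the program was launched with sudo, so the SUDO_USER environment variable is set):

import os
import pwd

def drop_privileges():
        if os.getuid() != 0:
                return  # we are not root, nothing to drop
        user_name = os.getenv("SUDO_USER")
        pwnam = pwd.getpwnam(user_name)
        # Remove supplementary groups, then set the new gid/uid
        os.setgroups([])
        os.setgid(pwnam.pw_gid)
        os.setuid(pwnam.pw_uid)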

If you have ideas for improvement, comments, questions… Please comment!