Simple Python Twitter RSS feed parser

If you want to display your tweets on your own web page, the easiest way is to use the RSS feed linked from your Twitter profile page (for example http://twitter.com/teebesz). Of course, if you want the @mentions, #hashtags and links turned into HTML links, you need a little bit of code.

Here is the Python script I use for this site's Twitter display. You'll need the feedparser library installed (how have you been living without it anyway?).

import datetime
import feedparser
import re
    
def get_twitter(url, limit=3):
    """Takes a twitter rss feed and returns a list of dictionaries, one per
    tweet. Each dictionary contains two keys:
        - 'text': an HTML-ready string with the @mentions, #hashtags and
        links converted to anchor tags
        - 'posted': the posted date as a short formatted string"""

    twitter_entries = []
    for entry in feedparser.parse(url)['entries'][:limit]:

        # convert the given time format to datetime; only the first six
        # fields (year .. second) apply, field 6 is the weekday
        posted_datetime = datetime.datetime(*entry['updated_parsed'][:6])
        
        # format the date a bit
        if posted_datetime.year == datetime.datetime.now().year:
            posted = posted_datetime.strftime("%b %d")
        else:
            posted = posted_datetime.strftime("%b %d %y")
        
        # strip the "<username>: " that precedes all twitter feed entries
        text = re.sub(r'^\w+:\s', '', entry['title'])
        
        # parse links
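        text = re.sub(
            r"[^\"](http://(\w|\.|/|\?|=|%|&)+)",
            # x.group() starts with the character matched by [^\"], so keep
            # that character outside the anchor and link only the URL itself
            lambda x: "%s<a href='%s'>%s</a>"
                % (x.group()[0], x.group(1), x.group(1)),
            text)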
        
        # parse @tweeter
        text = re.sub(
            r'@(\w+)',
            lambda x: "<a href='http://twitter.com/%s'>%s</a>"\
                 % (x.group()[1:], x.group()),
            text)
        
        # parse #hashtag
        text = re.sub(
            r'#(\w+)',
            lambda x: "<a href='http://twitter.com/search?q=%%23%s'>%s</a>"\
                 % (x.group()[1:], x.group()),
            text)
        
        twitter_entries.append({
            'text': text,
            'posted': posted,
            })
        
    return twitter_entries
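
To show how the returned list might be used on a page, here is a minimal sketch. The helper name render_tweets and the sample data are assumptions for illustration, not part of the script above; the only thing taken from the script is the shape of each dictionary ('text' and 'posted').

```python
# Hypothetical helper, not part of the original script: renders the list
# returned by get_twitter() as an HTML list.
def render_tweets(entries):
    """Return an HTML <ul> with one <li> per tweet dictionary."""
    items = []
    for entry in entries:
        # 'text' already contains HTML anchors, so it is inserted as-is
        items.append("<li>%s - %s</li>" % (entry['posted'], entry['text']))
    return "<ul>\n%s\n</ul>" % "\n".join(items)


# Made-up sample data in the same shape get_twitter() returns
sample_entries = [
    {'text': "hello <a href='http://twitter.com/example'>@example</a>",
     'posted': "Jul 26"},
    {'text': "plain tweet", 'posted': "Jul 28"},
]

html = render_tweets(sample_entries)
print(html)
```

In real use, sample_entries would be replaced by the result of get_twitter() called with your own feed URL.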

4 comments

November 16, 2009 4:36 p.m. by alex

I'm trying to understand the link-parsing step in the code above, but something seems off: the code as copy/pasted gives an error in IDLE.

November 17, 2009 3:30 p.m. by Teebes

Alex,

What do you mean by 'an error in IDLE'? Can you post the error that you're getting? Feel free to shoot me a mail at teebes at teebes.com if you need help.

- Teebes

November 29, 2009 3:05 a.m. by alex

When I copy-pasted the code into the Python IDLE editor, it wouldn't execute. After some trial and error I replaced the pattern line with the following:

text = re.sub(r"\b(http://(\w|\.|/|\?|=|%|&)+)",

That seems to work (although I'm not a regex expert).

December 2, 2009 7:04 a.m. by Teebes

Alex,

You were right, that regex got a little messed up when I converted the Python code to HTML. I've corrected it in the code above; it should be:

r"[^\"](http://(\w|\.|/|\?|=|%|&)+)"

Thanks a lot for pointing that out!

- Teebes
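
For readers comparing the two patterns from this thread, the sketch below runs each one against a few made-up sample strings. The first_match helper, the sample texts and the third "lookbehind" pattern are assumptions added for illustration; only the first two patterns come from the thread.

```python
import re

# URL part shared by every pattern discussed in the thread
URL_BODY = r"(http://(\w|\.|/|\?|=|%|&)+)"

PATTERNS = {
    # the post's pattern: needs one non-quote character before the link
    'original': r"[^\"]" + URL_BODY,
    # alex's variant: a word boundary before the link
    'word_boundary': r"\b" + URL_BODY,
    # assumption, not from the thread: a negative lookbehind that rejects
    # a preceding double quote without consuming any character
    'lookbehind': r"(?<!\")" + URL_BODY,
}


def first_match(pattern, text):
    """Return (whole match, captured URL) for the first hit, or None."""
    m = re.search(pattern, text)
    return (m.group(0), m.group(1)) if m else None


samples = [
    "http://example.com/a is up",    # link at the very start
    "see http://example.com/a now",  # link in the middle
    'say "http://example.com/a"',    # link right after a double quote
]

for text in samples:
    for name in ('original', 'word_boundary', 'lookbehind'):
        print(name, repr(text), first_match(PATTERNS[name], text))
```

With these samples, the original pattern finds nothing when the link opens the text, and in the middle of a sentence its whole match includes the preceding space. The word-boundary variant also matches a link that directly follows a double quote, which the other two skip.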
