Fetching stuff with the URL Fetch API is simple (especially if one has faith that the source is there and will respond within GAE's time limits):
from google.appengine.api import urlfetch
from xml.dom import minidom

def parse(url):
    # Returns None implicitly on any non-200 response.
    r = urlfetch.fetch(url)
    if r.status_code == 200:
        return minidom.parseString(r.content)
So is accessing the resulting DOM with minidom. Here the source is an Atom feed:
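import time

dom = parse(URL)
for entry in dom.getElementsByTagName('entry'):
    try:
        published = entry.getElementsByTagName('published')[0].firstChild.data
        published = time.strftime('%a, %d %b', time.strptime(published, '%Y-%m-%dT%H:%M:%SZ'))
    except (IndexError, ValueError):
        # No <published> element, or a date in an unexpected format.
        # The tuple is required: "except IndexError, ValueError" would
        # catch only IndexError and bind it to the name ValueError.
        pass
    …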
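To try the DOM-walking part without App Engine, here is a minimal sketch that runs the same minidom and time calls against an inline feed. The sample feed and the helper names (SAMPLE_ATOM, text_of, format_published, summarize) are made up for illustration and are not part of the original code; the date format is assumed to be the same UTC form used above.

```python
import time
from xml.dom import minidom

# Hypothetical sample feed, invented for this example.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First post</title>
    <published>2009-03-14T15:09:26Z</published>
  </entry>
  <entry>
    <title>No date here</title>
  </entry>
  <entry>
    <title>Odd date</title>
    <published>yesterday</published>
  </entry>
</feed>"""


def text_of(entry, tag):
    """Return the text of the first <tag> element inside entry, or None."""
    nodes = entry.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return None
    return nodes[0].firstChild.data


def format_published(raw):
    """Reformat an Atom UTC timestamp as e.g. 'Sat, 14 Mar', or return None."""
    if raw is None:
        return None
    try:
        parsed = time.strptime(raw, '%Y-%m-%dT%H:%M:%SZ')
    except ValueError:
        # The date is not in the assumed format.
        return None
    return time.strftime('%a, %d %b', parsed)


def summarize(xml_text):
    """Return a list of (title, formatted date or None), one per <entry>."""
    dom = minidom.parseString(xml_text)
    return [(text_of(entry, 'title'),
             format_published(text_of(entry, 'published')))
            for entry in dom.getElementsByTagName('entry')]
```

Calling summarize(SAMPLE_ATOM) gives one tuple per entry, with None in place of the date when the published element is missing or unparseable. Checking for a missing node up front, rather than relying on IndexError, also covers an empty published element, where firstChild is None.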