2009-01-04

URL Fetch API, MiniDom (Google App Engine)

Fetching stuff with the URL Fetch API is simple (especially if one has faith that the source is there and it will deliver inside GAE time limits):

from google.appengine.api import urlfetch
from xml.dom import minidom

def parse(url):
  r = urlfetch.fetch(url)
  if r.status_code == 200:
    return minidom.parseString(r.content)

As is accessing the resulting DOM with MiniDom. Here the source is an Atom feed:

import time

dom = parse(URL)
for entry in dom.getElementsByTagName('entry'):
  try:
    published = entry.getElementsByTagName('published')[0].firstChild.data
    published = time.strftime('%a, %d %b', time.strptime(published, '%Y-%m-%dT%H:%M:%SZ'))
  except IndexError, ValueError:
    pass
  …