Wednesday, 14 August 2013

Parsing XML with Python ElementTree with incorrect tags

I am trying to use Python to parse an XML file to get the title, author,
URL, and summary out of the XML feed. The XML we are gathering the data
from looks like this:
<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom"
xmlns:grddl="http://www.w3.org/2003/g/data-view#"
grddl:transformation="2turtle_xslt-1.0.xsl">
<title>Our Site RSS</title>
<link href="http://www.oursite.com" />
<updated>2013-08-14T20:05:08-04:00</updated>
<id>urn:uuid:c60d7202-9a58-46a6-9fca-f804s879f5ebc</id>
<rights>
Original content available for non-commercial use under a Creative
Commons license (Attribution-NonCommercial-NoDerivs 3.0 Unported),
except where noted.
</rights>
<entry>
<title>Headline #1</title>
<author>
<name>John Smith</name>
</author>
<link rel="alternate"
href="http://www.oursite.com/our-slug/" />
<id>1234</id>
<updated>2013-08-13T23:45:43-04:00</updated>
<summary type="html">
Here is a summary of our story
</summary>
</entry>
<entry>
<title>Headline #2</title>
<author>
<name>John Smith</name>
</author>
<link rel="alternate"
href="http://www.oursite.com/our-slug-2/" />
<id>1235</id>
<updated>2013-08-13T23:45:43-04:00</updated>
<summary type="html">
Here is a summary of our second story
</summary>
</entry>
</feed>
My code is:
import xml.etree.ElementTree as ET
tree = ET.parse('data.xml')
root = tree.getroot()
for child in root:
    print(child.tag)
Instead of the tag being "entry", the tag is
"{http://www.w3.org/2005/Atom}entry" when Python prints child.tag. I
tried to use:
for entry in root.findall('entry'):
But that doesn't work, since the tag for entry includes the W3 namespace
URL that is declared on the root tag. Also, getting the grandchildren of
root shows their tag as "{http://www.w3.org/2005/Atom}author".
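The behaviour described above can be reproduced in a small sketch. It assumes a trimmed, hypothetical stand-in for the feed (the SAMPLE string and the ATOM constant are illustration names, not part of the real file) and shows that ElementTree folds the default namespace into every tag as "{uri}name", so an unqualified search finds nothing while the fully qualified name does match.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of the feed (assumption: titles only).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Our Site RSS</title>
  <entry><title>Headline #1</title></entry>
  <entry><title>Headline #2</title></entry>
</feed>"""

# The namespace URI in the "{uri}" form that ElementTree puts in front of tags.
ATOM = "{http://www.w3.org/2005/Atom}"

root = ET.fromstring(SAMPLE)

# Every child tag carries the namespace declared on the root element.
child_tags = [child.tag for child in root]

# An unqualified search matches nothing...
plain_matches = root.findall("entry")

# ...while the fully qualified name matches both entries.
qualified_matches = root.findall(ATOM + "entry")

print(child_tags)
print(len(plain_matches), len(qualified_matches))
```

So the empty result from root.findall('entry') appears to come from the search string rather than from the XML itself.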
I can't change the XML, but how can I modify it (setting the root just to
) and re-save it or alter my code so that root.findall('entry') works?
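One possible way to approach both options in the question is sketched below. It is not a definitive answer. The SAMPLE string, the "atom" prefix, and the helper names read_entries and strip_namespaces are all assumptions made for illustration. The first helper leaves the XML untouched and passes a prefix map to findall, find, and findtext. The second rewrites the tags of the parsed tree in memory so that root.findall('entry') works as originally written.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical copy of one entry from the feed (assumption).
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
<entry>
<title>Headline #1</title>
<author><name>John Smith</name></author>
<link rel="alternate" href="http://www.oursite.com/our-slug/" />
<summary type="html">
Here is a summary of our story
</summary>
</entry>
</feed>"""

# Prefix-to-URI map; the prefix "atom" is an arbitrary local choice.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def read_entries(root):
    """Return one dict per entry with title, author, url and summary."""
    entries = []
    for entry in root.findall("atom:entry", NS):
        link = entry.find("atom:link", NS)
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=NS).strip(),
            "author": entry.findtext("atom:author/atom:name", default="", namespaces=NS).strip(),
            "url": link.get("href") if link is not None else None,
            "summary": entry.findtext("atom:summary", default="", namespaces=NS).strip(),
        })
    return entries


def strip_namespaces(root):
    """Drop the "{uri}" part from every element tag, in place.

    Attribute names are left as they are; only element tags change.
    """
    for elem in root.iter():
        if isinstance(elem.tag, str) and elem.tag.startswith("{"):
            elem.tag = elem.tag.split("}", 1)[1]
    return root


# Option 1: keep the namespaces and qualify the search.
entries = read_entries(ET.fromstring(SAMPLE))
print(entries)

# Option 2: strip the namespaces, then search with plain names.
stripped = strip_namespaces(ET.fromstring(SAMPLE))
print(len(stripped.findall("entry")))
```

With the real file, ET.parse('data.xml').getroot() would take the place of ET.fromstring(SAMPLE). The first option keeps the tree faithful to the source document. The second is only safe when no two namespaces in the feed share a local tag name.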
