xml4h is an MIT licensed library for Python to make it easier to work with XML.
This library exists because Python is awesome, XML is everywhere, and combining the two should be a pleasure but often is not. With xml4h, it can be easy.
As of version 1.0 xml4h supports Python versions 2.7 and 3.5+.
xml4h is a simplification layer over existing Python XML processing libraries such as lxml, ElementTree and the minidom. It provides:
The xml4h abstraction layer also offers some other benefits, beyond a nice API and tool set:
Install xml4h with pip::
$ pip install xml4h
Or install the tarball manually with::
$ python setup.py install
With xml4h you can easily parse XML files and access their data.
Let's start with an example XML document::
$ cat tests/data/monty_python_films.xml
<MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
<Film year="1971">
<Title>And Now for Something Completely Different</Title>
<Description>
A collection of sketches from the first and second TV series of
Monty Python's Flying Circus purposely re-enacted and shot for film.
</Description>
</Film>
<Film year="1974">
<Title>Monty Python and the Holy Grail</Title>
<Description>
King Arthur and his knights embark on a low-budget search for
the Holy Grail, encountering humorous obstacles along the way.
Some of these turned into standalone sketches.
</Description>
</Film>
<Film year="1979">
<Title>Monty Python's Life of Brian</Title>
<Description>
Brian is born on the first Christmas, in the stable next to
Jesus'. He spends his life being mistaken for a messiah.
</Description>
</Film>
<... more Film elements here ...>
</MontyPythonFilms>
With xml4h you can parse the XML file and use "magical" element and attribute lookups to read data::
>>> import xml4h
>>> doc = xml4h.parse('tests/data/monty_python_films.xml')
>>> for film in doc.MontyPythonFilms.Film[:3]:
... print(film['year'] + ' : ' + film.Title.text)
1971 : And Now for Something Completely Different
1974 : Monty Python and the Holy Grail
1979 : Monty Python's Life of Brian
You can also use more explicit (non-magical) methods to traverse the DOM::
>>> for film in doc.child('MontyPythonFilms').children('Film')[:3]:
... print(film.attributes['year'] + ' : ' + film.children.first.text)
1971 : And Now for Something Completely Different
1974 : Monty Python and the Holy Grail
1979 : Monty Python's Life of Brian
The xml4h builder makes programmatic document creation simple, with a method-chaining feature that allows for expressive but sparse code that mirrors the document itself. Here is the code to build part of the above XML document::
>>> b = (xml4h.build('MontyPythonFilms')
... .attributes({'source': 'http://en.wikipedia.org/wiki/Monty_Python'})
... .element('Film')
... .attributes({'year': 1971})
... .element('Title')
... .text('And Now for Something Completely Different')
... .up()
... .elem('Description').t(
... "A collection of sketches from the first and second TV"
... " series of Monty Python's Flying Circus purposely"
... " re-enacted and shot for film."
... ).up()
... .up()
... )
>>> # A builder object can be re-used, and has short method aliases
>>> b = (b.e('Film')
... .attrs(year=1974)
... .e('Title').t('Monty Python and the Holy Grail').up()
... .e('Description').t(
... "King Arthur and his knights embark on a low-budget search"
... " for the Holy Grail, encountering humorous obstacles along"
... " the way. Some of these turned into standalone sketches."
... ).up()
... .up()
... )
Pretty-print your XML document with xml4h's writer implementation with methods to write content to a stream or get the content as text with flexible formatting options::
>>> print(b.xml_doc(indent=4, newline=True)) # doctest: +ELLIPSIS
<?xml version="1.0" encoding="utf-8"?>
<MontyPythonFilms source="http://en.wikipedia.org/wiki/Monty_Python">
<Film year="1971">
<Title>And Now for Something Completely Different</Title>
<Description>A collection of sketches from ...</Description>
</Film>
<Film year="1974">
<Title>Monty Python and the Holy Grail</Title>
<Description>King Arthur and his knights embark ...</Description>
</Film>
</MontyPythonFilms>
<BLANKLINE>
Python has three popular libraries for working with XML, none of which are particularly easy to use:
xml.dom.minidom <https://docs.python.org/3/library/xml.dom.minidom.html>
_xml.etree.ElementTree <http://docs.python.org/3/library/xml.etree.elementtree.html>
_lxml <http://lxml.de/>
_ is a fast, full-featured XML library with an APIGiven these three options it can be difficult to choose which library to use, especially if you're new to XML processing in Python and haven't already used (struggled with) any of them.
In the past your best bet would have been to go with lxml for the most flexibility, even though it might be overkill, because at least then you wouldn't have to rewrite your code if you later find you need XPath support or powerful DOM traversal methods.
This is where xml4h comes in. It provides an abstraction layer over the existing XML libraries, taking advantage of their power while offering an improved API and tool set.
Currently xml4h includes adapter implementations for three of the main XML processing Python libraries.
If you have lxml available (highly recommended) it will use that, otherwise it will fall back to use the (c)ElementTree then the minidom libraries.
1.0 ...
up()
method to accept and distinguish between a countxml()
and xml_doc()
methods to document builder to more easilywrite()
and write_doc()
methods no longer send output tosys.stdout
by default. The user must explicitly provide a target writerxmlns
attribute defining0.2.0 .....
0.1.0 .....