Ruby XML Parsing Benchmarks

April 23, 2008
I recently benchmarked Hpricot and REXML for XML parsing and found REXML much faster. (Update: Actually, turns out Hpricot is faster.)
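For anyone wanting to reproduce this kind of comparison, here is a minimal sketch using Ruby's standard Benchmark module. The sample document, the iteration count, and the helper names (`build_xml`, `rexml_count`) are assumptions for illustration and are not the benchmark code from the post. Only REXML is timed because it ships with Ruby; other parsers could be added as further `report` lines.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical sample data: a flat list of <item> elements.
def build_xml(item_count)
  items = (1..item_count).map { |i| "<item id=\"#{i}\"><name>Item #{i}</name></item>" }
  "<items>#{items.join}</items>"
end

# Parse the document with REXML and return how many <item> elements it holds.
def rexml_count(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').size
end

SAMPLE_XML = build_xml(50)
ITERATIONS = 50 # arbitrary; raise it for more stable timings

# Prints user, system, total and real times, one row per report.
Benchmark.bm(8) do |bm|
  bm.report('REXML') { ITERATIONS.times { rexml_count(SAMPLE_XML) } }
end
```

Timing the same work (parse plus query) for every parser matters here: comparing one library's parse step against another's parse-and-search step gives misleading results.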
My colleague and I recently benchmarked REXML and libxml and found that they have very similar behavior/performance for small XML docs, but when you get into the large documents, libxml was SIGNIFICANTLY faster. Sorry I don't have any numbers handy, but if you are processing large docs, libxml is likely going to be your friend.
Benjamin Smith - April 23, 2008 23:10
I was going to add libxml to the benchmark because I have heard it is very fast, but it kept crashing irb when I was playing around - not a good sign. Add to that the lack of good documentation (unless my google_fu was just lacking) and I think I'll stick to REXML, at least for now...
zackchandler - April 23, 2008 23:18
I worked on a project last year and we ran through the progressions: REXML to libxml to straight old regular expressions (the data we were extracting was fairly straightforward). The files were somewhat large (5MB-30+MB), and REXML was unbearably slow. libxml was much quicker, but troublesome in a multi-platform environment. If you're concerned about processing time, libxml is the way to go, really.
Zach Holman - April 23, 2008 23:47
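The regular-expression route mentioned above only holds up when the markup is very predictable. A rough sketch of the idea, assuming a hypothetical flat `<name>` element with no attributes, nesting, CDATA, or entities (none of this is from the project described), with a REXML version alongside for comparison:

```ruby
require 'rexml/document'

# Hypothetical sample document for illustration only.
SAMPLE_DOC = <<~DOC
  <items>
    <item><name>alpha</name></item>
    <item><name>beta</name></item>
  </items>
DOC

# Regex extraction: no tree is built, but it is only safe for simple,
# predictable markup.
def names_by_regex(xml)
  xml.scan(%r{<name>(.*?)</name>}m).flatten
end

# Parser-based extraction of the same data.
def names_by_rexml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//name').map(&:text)
end

p names_by_regex(SAMPLE_DOC)
p names_by_rexml(SAMPLE_DOC)
```

The regex version breaks as soon as `<name>` gains an attribute or the text contains escaped characters, so it is a trade of robustness for speed.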
Any rough guidelines for what a "large" document is?
Michael Buckbee - April 23, 2008 23:48
@Zach - Luckily this project required parsing many small files on a continual basis but none are large at all.

I wish someone would do a nice writeup on libxml...
zackchandler - April 24, 2008 00:26
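In lieu of a full writeup, here is a small sketch of parsing and querying with the libxml-ruby gem. It assumes the API of current libxml-ruby releases (`LibXML::XML::Parser.string`, `Document#find`, `Node#content`), which may differ from the 2008 versions discussed in this thread; the sample feed and the helper name `libxml_titles` are made up for illustration. The require is guarded so the script still runs where the gem is not installed.

```ruby
# libxml-ruby is a separate gem (gem install libxml-ruby). Try both require
# names and remember whether either one loaded.
LIBXML_AVAILABLE = %w[libxml-ruby libxml].any? do |name|
  begin
    require name
    true
  rescue LoadError
    false
  end
end

# Hypothetical sample document.
FEED_XML = '<feed><entry><title>First</title></entry>' \
           '<entry><title>Second</title></entry></feed>'

# Return the text of every <title> element, or nil when libxml-ruby is absent.
def libxml_titles(xml)
  return nil unless LIBXML_AVAILABLE

  doc = LibXML::XML::Parser.string(xml).parse
  doc.find('//title').map(&:content)
end

p libxml_titles(FEED_XML)
```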
2MB+ would be considered a large XML doc in my opinion. If you've got bloated markup, maybe 5MB is large.

Anything over a MB of XML or so and libxml takes a dominant lead.
Benjamin Smith - April 24, 2008 05:38
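To see where a parser starts to struggle on a given machine, one option is to time it against synthetic documents of increasing size. The sketch below does this for REXML only; the document shape, the sizes, and the helper names (`synthetic_xml`, `time_rexml_parse`) are assumptions, and the sizes are kept well under the 1MB mark mentioned above so it finishes quickly. Raise them to probe the range being discussed.

```ruby
require 'benchmark'
require 'rexml/document'

# Build a synthetic document of roughly target_bytes bytes (hypothetical shape).
def synthetic_xml(target_bytes)
  row = '<row><id>1</id><value>some text</value></row>'
  count = [target_bytes / row.bytesize, 1].max
  "<rows>#{row * count}</rows>"
end

# Return [actual size in bytes, seconds taken to parse it with REXML].
def time_rexml_parse(target_bytes)
  xml = synthetic_xml(target_bytes)
  seconds = Benchmark.realtime { REXML::Document.new(xml) }
  [xml.bytesize, seconds]
end

[10_000, 100_000, 500_000].each do |size|
  bytes, seconds = time_rexml_parse(size)
  puts format('%8d bytes  %.4f s', bytes, seconds)
end
```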
Anyone tried libxml2 + xmlTextReader
with up-to-date libxml2, libxslt Binaries?

An XML benchmark overview with large files is XStream: benchmarks.

Phil Sommer - April 24, 2008 09:43
I posted an update to the article with updated benchmarking code from a commenter. Turns out I wasn't Hpricoting to the best of my ability - and Hpricot IS faster!
zackchandler - April 24, 2008 17:47
Is there a Ruby binding for xmlstarlet?

http://bashcurescancer.com/processing-xml-on-the-command-line.html
Nino Varesco - May 29, 2008 08:31
Using the latest libxml-ruby 0.8.0:

             user     system      total        real
libxml   0.032000   0.000000   0.032000 (  0.031000)
Hpricot  0.640000   0.031000   0.671000 (  0.890000)
REXML    1.813000   0.047000   1.860000 (  2.031000)

More info about the new release at my blog:

http://cfis.savagexi.com/articles/2008/07/16/resurrecting-libxml-ruby

Thanks - Charlie
Charlie Savage - July 16, 2008 19:15
