RubyFlow : The Ruby Community Blog

 

Ruby XML Parsing Benchmarks

I recently benchmarked Hpricot and REXML for XML parsing and found REXML much faster. (Update: Actually, turns out Hpricot is faster.)

Comments

My colleague and I recently benchmarked REXML and libxml and found that they have very similar behavior/performance for small XML docs, but when you get into the large documents, libxml was SIGNIFICANTLY faster. Sorry I don't have any numbers handy, but if you are processing large docs, libxml is likely going to be your friend.
Benjamin Smith - April 23, 2008 23:10
I was going to add libxml to the benchmark because I have heard it is very fast, but it kept crashing irb when I was playing around - not a good sign. Add to that the lack of good documentation (unless my google_fu was just lacking) and I think I'll stick to REXML, at least for now...
zackchandler - April 23, 2008 23:18
I worked on a project last year and we ran through the progression: REXML to libxml to plain old regular expressions (the data we were extracting was fairly straightforward). The files were somewhat large (5MB-30+MB), and REXML was unbearably slow. libxml was much quicker, but troublesome in a multi-platform environment. If you're concerned about processing time, libxml is the way to go, really.
Zach Holman - April 23, 2008 23:47
Any rough guidelines for what a "large" document is?
Michael Buckbee - April 23, 2008 23:48
@Zach - Luckily this project required parsing many small files on a continual basis but none are large at all.

I wish someone would do a nice writeup on libxml...
zackchandler - April 24, 2008 00:26
2MB+ would be considered a large XML doc in my opinion. If you've got bloated markup, maybe 5MB is large.

Anything over a MB of XML or so and libxml takes a dominant lead.
Benjamin Smith - April 24, 2008 05:38
Has anyone tried libxml2 + xmlTextReader with up-to-date libxml2 and libxslt binaries?

An XML benchmark overview with large files is XStream: benchmarks.
Phil Sommer - April 24, 2008 09:43
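
The xmlTextReader mentioned above is libxml2's streaming reader, which walks a document without building the whole tree in memory. As a rough illustration of that streaming idea only, here is a minimal sketch that uses REXML's stream listener as a stand-in. It is not the libxml-ruby API, and the TagCounter class and count_tags helper are hypothetical names made up for this example.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener: counts opening tags by element name
# without building a document tree.
class TagCounter
  include REXML::StreamListener

  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  # Called by the stream parser for every opening tag.
  def tag_start(name, _attrs)
    @counts[name] += 1
  end
end

# Hypothetical helper: stream-parse a String or IO and return tag counts.
def count_tags(source)
  listener = TagCounter.new
  REXML::Document.parse_stream(source, listener)
  listener.counts
end

puts count_tags('<items><item/><item/><item/></items>').inspect
```

Passing an open File instead of a String should let the parser read the input incrementally, which is the point of streaming for multi-megabyte documents. Whether this is fast enough compared to libxml would need to be measured.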
I posted an update to the article with updated benchmarking code from a commenter. Turns out I wasn't Hpricoting to the best of my ability - and Hpricot IS faster!
zackchandler - April 24, 2008 17:47
Is there a Ruby binding for xmlstarlet?

http://bashcurescancer.com/processing-xml-on-the-command-line.html
Nino Varesco - May 29, 2008 08:31
Using the latest libxml-ruby 0.8.0:

              user     system      total        real
libxml    0.032000   0.000000   0.032000 (  0.031000)
Hpricot   0.640000   0.031000   0.671000 (  0.890000)
REXML     1.813000   0.047000   1.860000 (  2.031000)

More info about the new release at my blog:

http://cfis.savagexi.com/articles/2008/07/16/resurrecting-libxml-ruby

Thanks - Charlie
Charlie Savage - July 16, 2008 19:15
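
For anyone wanting to reproduce a table in the user/system/total/real format shown above, that layout is what Ruby's standard Benchmark.bm prints. The following is a minimal sketch, not the benchmark code used in the article: it times REXML only, on a small synthetic document, and the build_xml and count_items_rexml helpers are hypothetical names. Hpricot or libxml-ruby could presumably be added as further report lines if they are installed.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical helper: build a synthetic document with `count` <item> elements.
def build_xml(count)
  items = (1..count).map { |i| "<item id=\"#{i}\"><name>item #{i}</name></item>" }
  "<items>#{items.join}</items>"
end

# Hypothetical helper: parse with REXML and return the number of <item> elements.
def count_items_rexml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').size
end

xml = build_xml(500)

# The argument to bm is the label column width; each report adds one row.
Benchmark.bm(8) do |x|
  x.report('REXML') { 5.times { count_items_rexml(xml) } }
end
```

The document size and repetition count here are arbitrary. Based on the comments above, differences between parsers are most likely to show up once the input reaches a megabyte or more, so it is worth running with documents close to your real workload.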
