RubyFlow The Ruby and Rails community linklog

Ruby XML Parsing Benchmarks

I recently benchmarked Hpricot and REXML for XML parsing and found REXML much faster. (Update: Actually, turns out Hpricot is faster.)

Comments

My colleague and I recently benchmarked REXML and libxml and found that they have very similar behavior/performance for small XML docs, but when you get into the large documents, libxml was SIGNIFICANTLY faster. Sorry I don’t have any numbers handy, but if you are processing large docs, libxml is likely going to be your friend.

I was going to add libxml to the benchmark because I have heard it is very fast, but it kept crashing irb when I was playing around - not a good sign. Add to that the lack of good documentation (unless my google_fu was just lacking) and I think I’ll stick to REXML, at least for now…

I worked on a project last year and we ran through the progression: REXML, then libxml, then plain old regular expressions (the data we were extracting was fairly straightforward). The files were somewhat large (5MB to 30+MB), and REXML was unbearably slow. libxml was much quicker, but troublesome in a multi-platform environment. If you’re concerned about processing time, libxml is the way to go, really.
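For anyone who wants to try the REXML-versus-regex comparison on their own data, here is a minimal sketch. The document shape (`<items>`, `<item>`, `<name>`) and the helper names are made up for illustration; it only uses REXML and Benchmark as shipped with Ruby, and the regex route only works because the markup is flat and predictable.

```ruby
require 'benchmark'
require 'rexml/document'

# Build a synthetic document; the <items>/<item>/<name> shape is hypothetical.
def build_xml(count)
  body = (1..count).map { |i| "<item id=\"#{i}\"><name>item #{i}</name></item>" }.join
  "<items>#{body}</items>"
end

# Extract every <name> text with REXML (full DOM parse, then XPath).
def names_with_rexml(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//name').map(&:text)
end

# Extract the same values with a regular expression.
# This is not a general XML parser; it assumes <name> never nests
# and carries no attributes.
def names_with_regex(xml)
  xml.scan(%r{<name>(.*?)</name>}m).flatten
end

xml = build_xml(2_000)

rexml_time = Benchmark.realtime { names_with_rexml(xml) }
regex_time = Benchmark.realtime { names_with_regex(xml) }

puts format('REXML: %.4fs  regex: %.4fs', rexml_time, regex_time)
```

Swap `build_xml` for a read of a real file to see how the gap looks on your own documents.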

Any rough guidelines for what a “large” document is?

@Zach - Luckily this project required parsing many small files on a continual basis but none are large at all.

I wish someone would do a nice writeup on libxml…

2MB+ would be considered a large XML doc in my opinion. If you’ve got bloated markup, maybe 5MB is large.

Anything over a MB of XML or so and libxml takes a dominant lead.
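To see where "large" starts on your own machine, one option is to time the same parser on generated documents of increasing size. The sketch below does that for REXML only; the `<records>` markup, the sizes, and the helper names are assumptions chosen for illustration, not anything from the benchmarks discussed here.

```ruby
require 'benchmark'
require 'rexml/document'

# Generate roughly `bytes` of XML by repeating one fixed record.
# The <records>/<record> markup is invented for this sketch.
def xml_of_size(bytes)
  record = '<record><title>example</title><body>lorem ipsum dolor sit amet</body></record>'
  count = [bytes / record.bytesize, 1].max
  "<records>#{record * count}</records>"
end

# Time one full REXML parse of a document of about `bytes` bytes.
def time_rexml_parse(bytes)
  xml = xml_of_size(bytes)
  Benchmark.realtime { REXML::Document.new(xml) }
end

[10_000, 100_000, 1_000_000].each do |bytes|
  seconds = time_rexml_parse(bytes)
  puts format('%9d bytes: %.3fs', bytes, seconds)
end
```

To compare against libxml, the block passed to `Benchmark.realtime` could call `LibXML::XML::Parser.string(xml).parse` instead, assuming the libxml-ruby gem is installed and loads on your platform.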

Anyone tried libxml2 + xmlTextReader with up-to-date libxml2, libxslt Binaries?

For an XML benchmark overview with large files, see the XStream benchmarks page.

I posted an update to the article with updated benchmarking code from a commenter. Turns out I wasn’t Hpricoting to the best of my ability - and Hpricot IS faster!

Using the latest libxml-ruby 0.8.0:

                  user     system      total        real
    libxml    0.032000   0.000000   0.032000 (  0.031000)
    Hpricot   0.640000   0.031000   0.671000 (  0.890000)
    REXML     1.813000   0.047000   1.860000 (  2.031000)

More info about the new release at my blog:

http://cfis.savagexi.com/articles/2008/07/16/resurrecting-libxml-ruby

Thanks - Charlie

