My colleague and I recently benchmarked REXML and libxml and found that they have very similar behavior/performance for small XML docs, but when you get into the large documents, libxml was SIGNIFICANTLY faster. Sorry I don't have any numbers handy, but if you are processing large docs, libxml is likely going to be your friend.
Benjamin Smith - April 23, 2008 23:10
I was going to add libxml to the benchmark because I have heard it is very fast, but it kept crashing irb when I was playing around - not a good sign. Add to that the lack of good documentation (unless my google_fu was just lacking) and I think I'll stick to REXML, at least for now...
zackchandler - April 23, 2008 23:18
I worked on a project last year and we ran through the progressions: REXML to libxml to straight old regular expressions (the data we were extracting was fairly straightforward). The files were somewhat large (5MB-30+MB), and REXML was unbearably slow. libxml was much quicker, but troublesome in a multi-platform environment. If you're concerned about processing time, libxml is the way to go, really.
Zach Holman - April 23, 2008 23:47
Any rough guidelines for what a "large" document is?
Michael Buckbee - April 23, 2008 23:48
@Zach - Luckily this project only requires parsing many small files on a continual basis; none of them are large at all.
I wish someone would do a nice writeup on libxml...
zackchandler - April 24, 2008 00:26
2MB+ would be considered a large XML doc in my opinion. If you've got bloated markup, maybe 5MB is large.
Anything over a MB of XML or so and libxml takes a dominant lead.
Benjamin Smith - April 24, 2008 05:38
I posted an update to the article with updated benchmarking code from a commenter. Turns out I wasn't Hpricoting to the best of my ability - and Hpricot IS faster!
zackchandler - April 24, 2008 17:47
Is there a Ruby binding for xmlstarlet?
http://bashcurescancer.com/processing-xml-on-the-command-line.html
Nino Varesco - May 29, 2008 08:31
Using the latest libxml-ruby 0.8.0:
             user     system      total        real
libxml   0.032000   0.000000   0.032000 (  0.031000)
Hpricot  0.640000   0.031000   0.671000 (  0.890000)
REXML    1.813000   0.047000   1.860000 (  2.031000)
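
For anyone who wants to see how parse time grows with document size on their own machine, here is a minimal sketch that prints timings in the same user/system/total/real layout as above, using Ruby's standard Benchmark module. It is not the benchmark code from the article: build_xml, rexml_item_count, the synthetic <items> document, and the element counts are all made up for illustration. Only REXML is timed; other parsers could be added as extra report lines.

```ruby
require 'benchmark'
require 'rexml/document'

# Hypothetical helper (not from the article): builds a synthetic XML
# document containing `count` <item> elements under a single root.
def build_xml(count)
  items = (1..count).map { |i| "<item id=\"#{i}\"><name>item #{i}</name></item>" }
  "<items>#{items.join}</items>"
end

# Parses the string with REXML and counts the <item> elements via XPath.
def rexml_item_count(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//item').size
end

# Arbitrary element counts, chosen only to show the trend as documents grow.
SIZES = [100, 1_000, 5_000]

Benchmark.bm(12) do |bm|
  SIZES.each do |count|
    xml = build_xml(count) # build outside the timed block
    bm.report("REXML #{count}") { rexml_item_count(xml) }
  end
end
```

Raising the counts in SIZES until the document reaches a megabyte or more is a simple way to check the "large document" thresholds mentioned earlier in the thread.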