Fairy-Wing Wrapup: Nokogiri Performance

Wednesday, May 18, 2011

TL;DR

  • Nokogiri’s DOM parser was extremely way faster than either the SAX or Reader parsers, in this particular real-world example.
  • ActiveSupport Hash#from_xml, I am disappoint.
  • On JRuby, Nokogiri 1.5.0 is extremely way faster than Nokogiri 1.4.4, in this particular real-world example.

Artist’s Pre-enactment

(Shout-out to @jonathanpberger for the Artist’s Pre-enactment of Paul Dix wearing the fairy wings.)

Previously, on the Fairy Wing Throwdown …

So, you might remember that a few months back, @pauldix bet me that JSON parsing is an order of magnitude faster than XML parsing. (If you’re not in the loop, you can read the dramatization of the bet.)

TL;DR, Paul lost that bet, and so will be wearing my daughter’s dress-up fairy wings during his RailsConf 2011 talk on Redis on Thursday. Awesome!

You can view the winning benchmark here.

I want to go to there.

The bet revolved around a real-world use case (Paul and I both work at Benchmark Solutions, a stealth financial market data startup in NYC).

You can view the data structure at the Official Fairy-Wing Throwdown Repo™, https://github.com/flavorjones/fairy-wing-throwdown, but the summary is that it’s 54K when serialized as JSON, and consists (mostly) of an array of key-value stores (i.e., hashes).

Because I wanted to not just win, but to destroy Paul, I implemented the same parsing task using Nokogiri’s DOM parser, SAX parser, and Reader parser, expecting that code complexity and performance would correlate, somehow. In my mind, the graph looked like this:

Expected complexity and performance

But I was shocked and dismayed to see the real results:

Reality bites

What the WHAT?

Yes, that’s right. My payback for increasing the complexity of the code was a reduction in performance. The DOM parser was extremely way faster than either the Reader or SAX parsers.

Let me say that again: the DOM parser implementation was compellingly faster (1.3x) than the SAX parser implementation.

Why would that be? Good question, which I’ll deep-dive into in my next post. But suffice it to say, the SAX parser is bottlenecked on making lots of callbacks from C into Ruby-land.

ActiveSupport, I am disappoint.

Another big wow for me was how slow ActiveSupport’s Hash#from_xml method is. The benchmark shows that it’s about 40 times slower than the partial implementation using Nokogiri’s DOM parser.

Somebody should work on that! It wouldn’t be tough to hack an alternative implementation of Hash#from_xml on top of Nokogiri. If anybody’s looking for an interesting project, there it is.

You can be my @yokolet

Here’s a chart of how the DOM parser implementation works on various platforms:

DOM parser on various platforms

Holy cow! The pure-Java implementation on Nokogiri 1.5.0.beta.4 is 4 times faster than the FFI-to-C implementation on Nokogiri 1.4.4 (28s vs 117s). That’s crazytown!

Thanks to everyone who’s committed to the pure-Java code, notably @headius, @yokolet, @pmahoney and @serabe.

Chart Notes

The “expected performance” line chart is in imaginary units.

The “actual performance” line chart renders performance in number of records processed per second, so bigger is better. The Saikuro and Flog scores were normalized to their values for #transform_via_dom.

The “DOM parser on various platforms” bar chart renders total benchmark runtime, so smaller is better.

JSON vs XML: The Fairy-Wing Throwdown

Thursday, March 31, 2011

TL;DR

  1. Is XML parsing more than an order of magnitude (i.e., 10x) slower than JSON parsing in real world situations?
  2. Both @pauldix and @flavorjones think XML parsing is slower than JSON parsing.
  3. @pauldix says XML parsing is more than an order of magnitude slower than JSON parsing.
  4. @flavorjones says XML parsing is less than an order of magnitude slower than JSON parsing.
  5. The loser must wear @flavorjones’s daughter’s dress-up fairy wings on stage throughout @pauldix’s RailsConf 2011 presentation.
  6. Benchmarks must be performed by close of business Friday, April 1. (No, this is not an April Fool's joke.)

And man, I hope Confreaks is filming it.

The Fairy-Wing Throwdown

or

JSON v XML

or

How much slower, exactly, is XML in the real world?

(A One Act Drama)

Dramatis Personae

Act I, Scene I

Cast is gathered together, drinking beverages, nerding.

John: Hark! My love for Scala knows no bounds. Also, Ruby is Not Half Bad.

Mike: Rememberest thou when we first met? You had time and love for naught but Java and its dear-lov’d cousin, strong static typing.

Chorus: And don’t forget XML!

Paul: Ha ha! Java doth go nowhere without bountiful XML following it around like a little puppy.

Mike: Ha ha! And aided by Spring’s alchemy you wrote Java in XML!

Chorus: Ha ha!

John: A cold and drowsy humour to hear you mock XML so.

Chorus: Why dost thou wring thy hands?

John: Because Nokogiri hath been brought forth from his loins, and he hath intimate knowledge of XML.

Mike: Aye, I know it well, and thus my disaffection has measure and reason. In particular, namespaces are really quite broken.

Paul: Plus, it’s SO SLOW.

John: Gentle Dix, put thy rapier up.

Paul: I do protest I never injur’d thee! I would wager that XML has got to be an order of magnitude slower than JSON, at least!

(Pause.)

Mike: (to John) Forbear this outrage? For shame.

(John shrugs.)

Mike: Knowest I libxml2 so well, it is mos def dishonorably slow. But an order of magnitude? I will take that bet.

Paul: I am not affrighted, nor have I need for your money.

Mike: Then … let’s make it … interesting.

Exeunt omnes.

Tentative Conditions

  1. Benchmarks must be performed on Ruby 1.8.7 with any standard compiled extensions / gems.
  2. Objective is a specific data structure actually used by these Gentlemen at their place of business, Benchmark Solutions.
  3. Code must take a string (JSON or XML), and return an inflated Ruby data structure exactly matching the objective.
  4. Timing should encompass only in-memory operations (not IO).
  5. @jvshahid will be the arbiter of whether the implementations violate the spirit of “real worldiness”.
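
Conditions 3 and 4, plus the order-of-magnitude question itself, can be sketched as a small harness. This is only an illustration under assumptions: `time_parser` and `order_of_magnitude_slower?` are hypothetical helpers, not the code that settled the bet, and the iteration count is arbitrary. The input string is already in memory before timing starts, and each parser's output is checked against the objective before it is timed.

```ruby
require "benchmark"
require "json"

# Times a parser block over an in-memory string (no IO inside the timed region).
# Raises if the parser's output does not exactly match the objective structure.
def time_parser(input, objective, iterations: 100)
  result = yield(input)
  raise ArgumentError, "parser output does not match the objective" unless result == objective

  Benchmark.realtime { iterations.times { yield(input) } }
end

# The bet: is XML parsing more than 10x slower than JSON parsing?
def order_of_magnitude_slower?(xml_seconds, json_seconds)
  xml_seconds > 10 * json_seconds
end

# Hypothetical stand-in for the real objective data structure.
objective = [{ "symbol" => "AAA", "price" => "101.5" }]
json_string = JSON.generate(objective)

json_seconds = time_parser(json_string, objective) { |string| JSON.parse(string) }
```

An XML implementation would be timed the same way, with a block that turns the XML string into the same objective, and the two timings handed to `order_of_magnitude_slower?`.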