
Tizra gets faster

Non-technical summary: things are lots faster at Tizra sites and admin tools. There's certainly more to do, but we've got more tricks up our sleeves! Because the big current speed boost is related to one cause, and it took me a while to track down, the geek appendage to this post describes what we found and how we fixed it.

Geekly details

I spent a bunch of time last week looking at system performance. As we've been adding customers and usage, we were beginning to feel the pinch. Performance always varies, but the range of response times was getting wider as things slowed, which led me to think there might be some systemic issues whose fix would give us a quick improvement (and indeed there was some Linux tuning that helped a bit). But data access seemed to be the real issue, so I spent a bunch of time looking into Hibernate and our caching and querying, and then wound up spending a day or so basically watching all the queries go through Postgres. And you know what? Most of them seemed much slower than they should be, even allowing for how hairy they are.

Of course, the next step was to check the database indexes and how the query plans were using them. In hand testing, the plans looked good and the indexes were sensible. But when run by hand, the queries were also significantly faster than when Hibernate ran them! This was much easier to see now that we have a live load, which is inevitably different from a test setup. So why the difference? Postgres was ignoring our indexes, but only when Tizra publisher made the queries.

Turns out that there's an old bug in Postgres where it would ignore indexes on bigint fields in prepared statements unless there was an explicit data type cast. (That type confusion was an obscure result of skew between PostgreSQL and the SQL standard.) And that was the behavior I was seeing, even though we were using a much more recent vintage of all the software. This was terrible for us, because we have a multi-tenant publishing system for large document collections and we use bigints as primary object identifiers!
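
For the curious, here is a minimal sketch of what an explicit cast on a prepared-statement parameter looks like through plain JDBC. The class, the table, and the column names (BigintLookup, documents, id, title) are made up for illustration and are not our schema, and this is hand-written SQL rather than anything our query generator produces.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class BigintLookup {
    // Hypothetical table and columns. The explicit CAST tells the planner
    // that the parameter is a bigint, matching the type of the indexed column.
    static final String SQL_WITH_CAST =
        "SELECT title FROM documents WHERE id = CAST(? AS bigint)";

    // Looks up one row by its bigint identifier; returns null if no row matches.
    static String findTitle(Connection conn, long id) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL_WITH_CAST)) {
            ps.setLong(1, id); // bind as a long, not as a String
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

Whether the cast is still needed depends on the server and driver versions in play, so treat this as a picture of the idea rather than a recommendation.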

So why the old problem, if the bug is gone and we are not using Postgres 7? It turns out that we dynamically build those hairy queries in HQL (Hibernate Query Language) using the String trick, the old workaround of handing the bigint value over as a string. But nowadays, instead of making your indexes work, it breaks them! The differences are invisible in the SQL. It turned out that we were in a version "donut hole": our database was recent enough that the String trick worked the opposite way (preventing fast queries for our prepared statements), but the JDBC driver wasn't making the calls in the right way to make the old trick work. End result: we're now running the latest JDBC driver with compatibility options set while we update our hairy query generator. And now we can really start tuning our setup!
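
To give a flavor of what a driver compatibility option can look like, here is a small sketch that sets the PostgreSQL JDBC driver's `stringtype` connection parameter. This is an assumption for illustration: the post does not say which options we actually set, and the class and method names (CompatProps, compatProperties) are hypothetical.

```java
import java.util.Properties;

public class CompatProps {
    // Builds connection properties for the PostgreSQL JDBC driver.
    static Properties compatProperties(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        // Assumption: stringtype=unspecified is one option of this kind.
        // With the default (varchar), values bound via setString() are sent
        // as varchar; with "unspecified" they are sent untyped and the
        // server infers the type from context.
        props.setProperty("stringtype", "unspecified");
        return props;
    }
}
```

These properties would then be handed to `DriverManager.getConnection` along with a JDBC URL such as `jdbc:postgresql://localhost/example` (a placeholder). With untyped parameters, a value bound as a string and compared against a bigint column can be resolved as a bigint by the server.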

If the web had not provided the history of the old bug, I would have had a much worse time even knowing where to look to find our somewhat subtle configuration issue. So enjoy the speedup, I sure am!

Comments

Anonymous said…
This is pretty weird. Any chance you can post the exact version numbers of Postgres, Hibernate, and JDBC involved?
