Tizra gets faster

Non-technical summary: things are lots faster at Tizra sites and admin tools. There's certainly more to do, but we've got more tricks up our sleeves! Because the big current speed boost is related to one cause, and it took me a while to track down, the geek appendage to this post describes what we found and how we fixed it.

Geekly details

I spent a bunch of time last week looking at system performance. As we've been adding customers and usage, we were beginning to feel the pinch. Performance always varies, but the range of response times was getting wider as things slowed, which led me to think there might be some systemic issues whose fix would give us a quick improvement (and indeed some Linux tuning helped a bit). But data access seemed to be the real issue, so I dug into Hibernate and our caching and querying, and then wound up spending a day or so basically watching all the queries go through Postgres. And you know what? Most of them seemed much slower than they should be, even allowing for how hairy they are.

Of course, the next step was to check the database indexes and how the query plans were using them. In hand testing, the plans looked good and the indexes were sensible. But the queries also ran significantly faster by hand than when Hibernate ran them! This was much easier to see now that we have a live load, which is inevitably different from a test setup. So why the difference? Postgres was ignoring our indexes only when Tizra Publisher made the queries.

Turns out that there's an old bug in Postgres where it would ignore indexes on bigint fields in prepared statements unless there was an explicit data type cast. (That type confusion was an obscure result of skew between PostgreSQL and the SQL standard.) And that was the behavior I was seeing, even though we were using a much more recent vintage of all the software. This was terrible for us, because we have a multi-tenant publishing system for large document collections and we use bigints as primary object identifiers!
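To make "explicit data type cast" concrete, here is a minimal JDBC sketch. It is not Tizra's code: the `documents` table, its `id` and `title` columns, and the `BigintLookup` class are all hypothetical names chosen for illustration. The idea is simply that the parameter's type is spelled out in the SQL text and the value is bound as a `long`.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative only: all table, column, and class names are hypothetical.
final class BigintLookup {
    private BigintLookup() {}

    // Builds "<column> = CAST(? AS bigint)" so the parameter's type is
    // stated in the SQL itself rather than left for the planner to guess.
    static String idPredicate(String column) {
        return column + " = CAST(? AS bigint)";
    }

    // Looks up a title by its bigint id; returns null when no row matches.
    static String findTitle(Connection conn, long id) throws SQLException {
        String sql = "SELECT title FROM documents WHERE " + idPredicate("id");
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setLong(1, id); // bind as a typed 64-bit value
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

Whether a cast like this is still needed depends on the server and driver versions involved, so treat it as a way to rule the type question in or out rather than as a guaranteed fix.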

So, why the old problem if the bug is gone and we are not using Postgres 7? It turns out that we dynamically build those hairy queries in HQL (Hibernate Query Language), using the String trick. But nowadays, instead of making your indexes work, it breaks them! The differences are invisible in the SQL. We were in a version "donut hole": our database was recent enough that the String trick worked the opposite way (preventing fast queries for our prepared statements), but our JDBC driver wasn't making its calls in a way that let the old trick work. End result: we're now running the latest JDBC driver with compatibility options set while we update our hairy query generator. And now we can really start tuning our setup!
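The post names neither the compatibility options nor the details of the String trick, so the following sketch rests on two loudly stated assumptions. First, that the String trick means binding bigint identifiers as strings (`setString`) rather than as typed values (`setLong`). Second, that the pgJDBC `stringtype` connection parameter is the kind of option involved: with `stringtype=unspecified`, values bound via `setString` are sent untyped and the server infers their type. The `TypedBinding` class and its method names are hypothetical.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Properties;

// Illustrative only: not the configuration the post's author actually used.
final class TypedBinding {
    private TypedBinding() {}

    // Connection properties for the PostgreSQL JDBC driver. Setting
    // "stringtype" to "unspecified" is an ASSUMPTION about which
    // compatibility option could apply; the post does not say.
    static Properties compatibilityProps(String user, String password) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("stringtype", "unspecified");
        return props;
    }

    // The longer-term fix in a query generator: bind object ids as typed
    // longs instead of strings, so no server-side type guessing is needed.
    static void bindId(PreparedStatement ps, int index, long id) throws SQLException {
        ps.setLong(index, id);
    }
}
```

The properties object would be passed to `DriverManager.getConnection(url, props)`. A setting like this is a stopgap while string-bound parameters are still being generated; once ids are bound with `setLong`, it should no longer matter for those queries.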

If the web had not provided the history of the old bug, I would have had a much harder time even knowing where to look for our somewhat subtle configuration issue. So enjoy the speedup; I sure am!

Comments

Anonymous said…
This is pretty weird. Any chance you can post the exact version numbers of Postgres, Hibernate, and JDBC involved?
