Tizra gets faster

Non-technical summary: things are a lot faster on Tizra sites and in the admin tools. There's certainly more to do, but we've got more tricks up our sleeves! Because the current speed boost comes from a single cause that took me a while to track down, the geek appendage to this post describes what we found and how we fixed it.

Geekly details

I spent a bunch of time last week looking at system performance. As we've been adding customers and usage, we were beginning to feel the pinch. Performance always varies, but the range of response times was getting wider as things slowed, leading me to think there might be some systemic issues whose fixes would give us a quick improvement (and indeed there was some Linux tuning that helped a bit). But data access seemed to be the real issue, so I spent a bunch of time looking into Hibernate and our caching and querying, and then wound up spending a day or so basically watching all the queries go through Postgres. And you know what? Most of them seemed much slower than they should be, even allowing for how hairy they are.

Of course, the next step was to check the database indexes and how the query plans were using them. In hand testing, the plans looked good and the indexes were sensible. Yet the queries were also significantly faster when run by hand than when Hibernate ran them! This was much easier to see now that we have a live load, which is inevitably different from a test setup. So why the difference? Postgres was ignoring our indexes only when Tizra Publisher made the queries.

It turns out that there's an old bug in Postgres where it would ignore indexes on bigint fields in prepared statements unless there was an explicit data type cast. (That type confusion was an obscure result of skew between PostgreSQL and the SQL standard.) And that was the behavior I was seeing, even though we were using a much more recent vintage of all the software. This was terrible for us, because we have a multi-tenant publishing system for large document collections and we use bigints as primary object identifiers!
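
For readers who haven't met it, here is roughly what an explicit cast on a prepared-statement parameter looks like. This is only a minimal sketch, not our query generator: the class, method, table, and column names are all made up for illustration, and it does nothing more than build the SQL text.

```java
/**
 * Hypothetical helper that builds WHERE-clause fragments for bigint key columns.
 * All names here are illustrative assumptions, not taken from Tizra's code.
 */
public final class BigintPredicate {
    private BigintPredicate() {}

    /** Plain placeholder: the parameter's type is left for the driver and server to settle. */
    public static String plain(String column) {
        return column + " = ?";
    }

    /** Placeholder wrapped in an explicit cast, so the comparison reads as bigint = bigint. */
    public static String withCast(String column) {
        return column + " = CAST(? AS bigint)";
    }

    public static void main(String[] args) {
        // "page" and "doc_id" are made-up table and column names.
        System.out.println("SELECT * FROM page WHERE " + plain("doc_id"));
        System.out.println("SELECT * FROM page WHERE " + withCast("doc_id"));
    }
}
```

The two statements differ only in how the parameter's type is declared, which is why this sort of problem is so easy to miss when reading the SQL.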

So, why the old problem if the bug is gone and we are not using Postgres 7? It turns out that we dynamically build those hairy queries in HQL (Hibernate Query Language) using the String trick, an old workaround for that bug. But nowadays, instead of making your indexes work, it breaks them! The differences are invisible in the SQL. It turned out that we were in a version "donut hole." Our database was recent enough that the String trick worked the opposite way (preventing fast queries for our prepared statements), while our JDBC driver wasn't making its calls in a way that let the old trick work. End result: we're now running the latest JDBC driver with compatibility options set while we update our hairy query generator. And now we can really start tuning our setup!

If the web had not provided the history of the old bug, I would have had a much harder time even knowing where to look to find our somewhat subtle configuration issue. So enjoy the speedup. I sure am!

Comments

Anonymous said…
This is pretty weird. Any chance you can post the exact version numbers of Postgres, Hibernate, and JDBC involved?
