
The XML Paradox

I have been working on my tutorial for the O'Reilly Tools of Change conference. I'm presenting PDF as a cost-effective alternative to XML for creating revenue from the backlist. I have been a dedicated markup advocate since the days of SGML, and I helped simplify SGML down to XML, so I still find it odd to be talking about other kinds of solutions. But I think I learned something from my custom web site customers.

The XML Paradox is this: XML is a high-quality archival medium, so you would expect books and scholarly content to make the jump first. It just makes sense that everyone would use the high-value format for the longest-lived, highest-value content. Wrong! The economics of publishing have played out the opposite way. The more ephemeral the content, the faster production methods can change. Newspapers were doing full-text databases from very early on. In the scholarly markets, journals are now almost all electronic. Books, however, are only starting to move fitfully in the XML direction, and most are not digital at all.

So the least archivable content moves to the best archival format fastest. Serial content has no legacy that needs converting before a new channel becomes profitable, so the payoff from a production change can come quickly. A publisher with a rich backfile has items that can earn for 20 years or more, as long as costs can be controlled. Any change to the book production process therefore has to pay off immediately on new books, and for any large-scale change across a publisher's line to succeed, it must be very cheap for old books. That's where e-books stand: revenue unearned because there's no clear path to get it.

XML is great, and it enables the production of an optimized presentation for a new media format, but it's not cheap at all. Changing editorial production processes for new content is an expensive and tricky management challenge, and data-conversion costs for old content are very high.
Once the data is in hand, the development cost to create a new output format (print, web, handheld, or whatever) is not cheap either. Problems like typesetting, layout, and display all have to be solved anew for each output format. It takes work to optimize presentation, especially when starting from the level of abstraction that gives good XML its power. So page images (and especially PDF) get a big boost from the XML paradox: they capture much of the production value of the existing process, and they're the cheapest searchable format to produce from paper.

So here I am, a guy who courted his wife over conversations about markup, working with page images. We are managing them with very rich, fine-grained metadata to capture much of the commercial benefit of XML, but still, I'm enabling something I used to rail against. And it's not easy to make page images work over the web, let publishers control the presentation, and still be good to readers.

In this discussion I am leaving out the small number of crown-jewel properties that earn large amounts quickly in a new channel and thus merit technology investment. Projects like that are important, but they don't shift the business as a whole. And their emphasis on frequent updates makes them similar to serials in their need for continuous editorial management.

Coming soon: I used to think that page scanning projects were a waste of money in terms of long-term investment, and I hope to post soon about why I no longer believe that either.

