
The XML Paradox

I have been working on my tutorial for the O'Reilly Tools of Change conference. I'm presenting PDF as a cost-effective alternative to XML for creating revenue from the backlist. As a dedicated markup advocate from the days of SGML, and someone who helped simplify SGML down to XML, I still find it odd to be talking about other kinds of solutions, but I think I learned something from my custom web site customers...

The XML Paradox is this: XML is a high-quality archival medium, so it seems obvious that books and scholarly content would make the jump first. It just makes sense that everyone would use the high-value format for the longest-lived, highest-value content. Wrong! The economics of publishing have played out the opposite way. The more ephemeral the content, the faster production methods can change. So newspapers were doing full-text databases from very early on. In the scholarly markets, journals are now almost all electronic. Books, however, are only starting to move fitfully in the XML direction, and most are not digital at all.

So the least archivable stuff moves to the best archival format fastest. Serial content has no legacy that needs conversion to make a new channel profitable, so the payoff from a production change can come quickly. A publisher with a rich backfile has items that can earn for 20 years or more, as long as costs can be controlled. So any change to the book production process has to pay off immediately on new books, and for any large-scale change across a publisher's line to succeed, it must be very cheap for old books. That's where e-books stand: revenue unearned because there's no clear path to get it.

XML is great, and it enables the production of an optimized presentation for a new media format, but it's not cheap at all. Changing editorial production processes for new content is an expensive and tricky management challenge, and data-conversion costs for old content are very high.

Once the data is in hand, the development cost to create a new output format (print, web, handheld, or whatever) is not cheap either. Problems like typesetting, layout and display all have to be solved anew for each output format. It takes work to optimize presentation, especially starting from the level of abstraction that gives good XML its power. So page images (and especially PDF) get a big boost from the XML paradox: they capture a lot of the production value of the existing process, and they're the cheapest searchable format to produce from paper.

So here I am, a guy who courted his wife over conversations about markup, working with page images. We are managing them with very rich, fine-grained metadata to capture much of the commercial benefit of XML, but still, I'm enabling something I used to rail against. And it's not easy to make page images work over the web, let publishers control the presentation, and still be good to readers.

In this discussion I am leaving out the small number of crown-jewel properties that earn large amounts quickly in a new channel and thus merit technology investment. Projects like that are important, but they don't shift the business as a whole. And their emphasis on frequent updates makes them similar to serials in their need for continuous editorial management.

Coming soon: I used to think that page-scanning projects were a waste of money as a long-term investment, and I hope to post soon about why I no longer believe that either.


Popular posts from this blog

Optimizing eContent Sales: 5 Strategies for Monetizing Content

Targeted promotions and content bundle upsells are two ideas to test in optimizing sales. This is the third in a series of blog posts based on our webcast, "10 Factors to Consider when Developing your Digital Publishing Strategy." You can still watch it in its entirety here: So far we have talked about the importance of understanding your audience and whether or not to sell direct. Today, we investigate the various monetization strategies publishers utilize. The entire publishing industry has been experimenting with pricing and delivery models, from the Netflix-like subscription services offered by Oyster and Scribd, to bundling of ebooks with print editions, to chapter-at-a-time sales and more. Yet no single pricing model has emerged. What does that tell publishers? It means that you need the flexibility to experiment with your pricing strategy and adapt quickly to market fluctuations and demands. You will need a commerce and delivery...

Slater Invests in Tizra

This is a big one for us. Rhode Island's Slater Technology Fund is betting $500,000 that Tizra will "really open the floodgates for book-based content from thousands of publishers." Their investment caps a year in which we've gone from four people, two dogs and an idea to a company that someone besides us and our friends and families believe will set online publishing on its ear. We even have our very own Forbes article. Thanks to the folks at Slater for being great advisors as well as investors, and to the many friends and family members who preceded them!

Association of Research Libraries Goes Live with Tizra

After extensive internal testing, the Association of Research Libraries has begun offering recent issues of its flagship publication on a public test site hosted by Tizra. The organization announced recently that Research Library Issues is now available in full-text searchable form at… http://publications.arl.org We're thrilled about this, not only because it's a vote of confidence from a high-profile organization, but also because ARL's membership includes some of the most prestigious research institutions in the world (including the libraries of MIT and Indiana University, whose presses are already using Tizra). In addition to greater production efficiency and flexibility, ARL's use of Tizra stems from a desire to provide members with capabilities including… Better full-text search. More targeted references via social software and other links. Better compatibility with web-enabled mobile devices like the iPhone. We are proud to count ARL, and RLI reade...