The XML Paradox

I have been working on my tutorial for the O'Reilly Tools of Change conference. I'm presenting PDF as a cost-effective alternative to XML for generating revenue from the backlist. As a dedicated markup advocate from the days of SGML, and someone who helped simplify SGML down to XML, I still find it odd to be talking about other kinds of solutions, but I think I learned something from my custom web site customers...

The XML Paradox is this: XML is a high-quality archival medium, so it would seem obvious that books and scholarly content would make the jump first. It just makes sense that everyone would use the high-value format for the longest-lived, highest-value content. Wrong! The economics of publishing have played out the opposite way. The more ephemeral the content, the faster production methods can change. So newspapers were building full-text databases very early on. In the scholarly markets, journals are now almost all electronic. Books, however, are only starting to move fitfully in the XML direction, and most are not digital at all.

So the least archival content moves to the best archival format fastest. Serial content has no legacy that must be converted before a new channel becomes profitable, so a change in production methods can pay off quickly. A publisher with a rich backfile, by contrast, has titles that can earn for 20 years or more, as long as costs are controlled. So any change to the book production process has to pay off immediately on new books, and for any large-scale change across a publisher's line to succeed, it must be very cheap to apply to old books. That is where e-books stand: the revenue goes unearned because there is no clear path to it.

XML is great, and it enables an optimized presentation for each new media format, but it is not cheap at all. Changing editorial production processes for new content is an expensive and tricky management challenge, and data-conversion costs for old content are very high.
Once the data is in hand, the development cost of creating a new output format (print, web, handheld, or whatever) is not cheap either. Problems like typesetting, layout and display all have to be solved anew for each output format. It takes work to optimize presentation, especially starting from the level of abstraction that gives good XML its power. So page images (and especially PDF) get a big boost from the XML paradox: they capture much of the production value of the existing process, and they are the cheapest searchable format to produce from paper.

So here I am, a guy who courted his wife over conversations about markup, working with page images. We manage them with very rich, fine-grained metadata to capture much of the commercial benefit of XML, but I am still enabling something I used to rail against. Nor is it easy to make page images work over the web, let publishers control the presentation, and still serve readers well.

In this discussion I am leaving out the small number of crown-jewel properties that earn large amounts quickly in a new channel and thus merit technology investment. Projects like that are important, but they don't shift the business as a whole. Their emphasis on frequent updates also makes them resemble serials in their need for continuous editorial management.

Coming soon: I used to think that page-scanning projects were a poor long-term investment, and I hope to post soon about why I no longer believe that either.
