The XML Paradox

I have been working on my tutorial for the O'Reilly Tools of Change conference. I'm presenting PDF as a cost-effective way to create revenue from the backlist, as an alternative to XML. As a dedicated markup advocate from the days of SGML, and someone who helped simplify SGML down to XML, I still find it odd to be talking about other kinds of solutions, but I think I learned something from my custom web site customers...

The XML Paradox starts from the fact that XML is a high-quality archival medium, so obviously books and scholarly content would make the jump first. It just makes sense that everyone would use the high-value format for the longest-lived, highest-value content. Wrong! The economics of publishing have played out the opposite way. The more ephemeral the content, the faster production methods can change. So newspapers were doing full-text databases from very early on. In the scholarly markets, journals are now almost all electronic. Books, however, are only starting to move fitfully in the XML direction, and are mostly not digital at all.

So the least archivable stuff moves to the best archival format fastest. Serial content does not have a legacy that needs conversion to make a new channel profitable, so the payoff from a production change can be pretty fast. A publisher with a rich backfile, by contrast, has items that can earn for 20 years or more, as long as costs can be controlled. So any change to the book production process has to pay off immediately on new books. And for any large-scale change across a publisher's line to be successful, it must be very cheap for old books. And that's where e-books stand: revenue unearned because there's not a clear path to get it.

XML is great, and enables the production of an optimized presentation for a new media format, but it's not cheap at all. It's an expensive and tricky management challenge to change editorial production processes for new content, and data-conversion costs for old content are very high. Once the data is in hand, the development cost to create a new output format (print, web, handheld, or whatever) is not cheap either. Problems like typesetting, layout and display all have to be solved anew for each output format. It takes work to optimize presentation, especially from the level of abstraction that gives good XML its power.

So page images (and especially PDF) get a big boost from the XML paradox, because they capture a lot of the production value of the existing process and they're the cheapest searchable format to produce from paper.

So here I am, a guy who courted his wife over conversations about markup, working with page images. We are managing them with very rich metadata at a fine level, to capture much of the commercial benefit of XML, but still, I'm enabling something I used to rail against. And it's not easy to make page images work over the web, let publishers control the presentation, and still be good to readers.

In this discussion I am leaving out the small number of crown-jewel properties that earn large amounts quickly in a new channel, and thus merit technology investment. Projects like that are important, but don't shift the business as a whole. And their emphasis on frequent updates makes them similar to serials in the need for continuous editorial management.

Coming soon: I used to think that page scanning projects were a waste of money in terms of long-term investment, and I hope to post soon about why I no longer believe that either.
