
The XML Paradox

I have been working on my tutorial for the O'Reilly Tools of Change conference. I'm presenting PDF as a cost-effective option to create revenue from the backlist, as an alternative to XML. As a dedicated markup advocate from the days of SGML, and someone who helped simplify SGML down to XML, I still find it odd to be talking about other kinds of solutions, but I think I learned something from my custom web site customers...

The XML Paradox starts from the fact that XML is a high-quality archival medium, so obviously books and scholarly content would make the jump first. It just makes sense that everyone would use the high-value format for the longest-lived, highest-value content. Wrong! The economics of publishing have played out the opposite way.

The more ephemeral the content, the faster production methods can change. So newspapers were doing full-text databases from very early on. In the scholarly markets, journals are now almost all electronic. Books, however, are only starting to move fitfully in the XML direction, and are mostly not digital at all. So the least archivable stuff moves to the best archival format fastest, because serial content does not have a legacy that needs conversion to make a new channel profitable, and the payoff from a production change can be pretty fast.

A publisher with a rich backfile has items that can earn for 20 years or more, as long as costs can be controlled. So any change to the book production process has to pay off immediately on new books. And for any large-scale change across a publisher's line to be successful, it must be very cheap for old books. And that's where e-books stand: revenue unearned because there's not a clear path to get it.

XML is great, and enables the production of an optimized presentation for a new media format, but it's not cheap at all. It's an expensive and tricky management challenge to change editorial production processes for new content, and data-conversion costs for old content are very high.
Once the data is in hand, the development cost to create a new output format (print, web, handheld, or whatever) is not cheap either. Problems like typesetting, layout and display all have to be solved anew for each output format. It takes work to optimize presentation, especially from the level of abstraction that gives good XML its power.

So page images (and especially PDF) get a big boost from the XML paradox, because they capture a lot of the production value of the existing process and they're the cheapest searchable format to produce from paper.

So here I am, a guy who courted his wife over conversations about markup, working with page images. We are managing them with very rich metadata at a fine level, to capture much of the commercial benefit of XML, but still, I'm enabling something I used to rail against. And it's not easy to make page images work over the web, let publishers control the presentation, and still be good to readers.

In this discussion I am leaving out the small number of crown-jewel properties that earn large amounts quickly in a new channel, and thus merit technology investment. Projects like that are important, but don't shift the business as a whole. And their emphasis on frequent updates makes them similar to serials in the need for continuous editorial management.

Coming soon: I used to think that page scanning projects were a waste of money in terms of long-term investment, and I hope to post soon about why I no longer believe that either.
