The XML Paradox

I have been working on my tutorial for the O'Reilly Tools of Change conference. I'm presenting PDF, as an alternative to XML, as a cost-effective way to create revenue from the backlist. As a dedicated markup advocate from the days of SGML, and someone who helped simplify SGML down to XML, I still find it odd to be talking about other kinds of solutions, but I think I learned something from my custom web site customers...

The XML Paradox is this: XML is a high-quality archival medium, so obviously books and scholarly content would make the jump first. It just makes sense that everyone would use the high-value format for the longest-lived, highest-value content. Wrong!

The economics of publishing have played out the opposite way. The more ephemeral the content, the faster production methods can change. So newspapers were doing full-text databases from very early on. In the scholarly markets, journals are now almost all electronic. Books, however, are only starting to move fitfully in the XML direction, and are mostly not digital at all. So the least archivable stuff moves to the best archival format fastest. Serial content does not have a legacy that needs conversion to make a new channel profitable, so the payoff from a production change can be pretty fast.

A publisher with a rich backfile has items that can earn for 20 years or more, as long as costs can be controlled. So any change to the book production process has to pay off immediately on new books. And for any large-scale change across a publisher's line to be successful, it must be very cheap for old books. And that's where e-books stand: revenue unearned because there's not a clear path to get it.

XML is great, and enables the production of an optimized presentation for a new media format, but it's not cheap at all. Changing editorial production processes for new content is an expensive and tricky management challenge, and data-conversion costs for old content are very high. Once the data is in hand, the development cost to create a new output format (print, web, handheld, or whatever) is not cheap either. Problems like typesetting, layout and display all have to be solved anew for each output format. It takes work to optimize presentation, especially from the level of abstraction that gives good XML its power.

So page images (and especially PDF) get a big boost from the XML paradox, because they capture a lot of the production value of the existing process and they're the cheapest searchable format to produce from paper.

So here I am, a guy who courted his wife over conversations about markup, working with page images. We are managing them with very rich metadata at a fine level, to capture much of the commercial benefit of XML, but still, I'm enabling something I used to rail against. And it's not easy to make page images work over the web, let publishers control the presentation, and still be good to readers.

In this discussion I am leaving out the small number of crown-jewel properties that earn large amounts quickly in a new channel, and thus merit technology investment. Projects like that are important, but don't shift the business as a whole. And their emphasis on frequent updates makes them similar to serials in the need for continuous editorial management.

Coming soon: I used to think that page scanning projects were a waste of money in terms of long-term investment, and I hope to post soon about why I no longer believe that either.
