
The XML Paradox

I have been working on my tutorial for the O'Reilly Tools of Change conference. I'm presenting PDF as a cost-effective way to earn revenue from the backlist, as an alternative to XML. As a dedicated markup advocate from the days of SGML, and someone who helped simplify SGML down to XML, I still find it odd to be talking about other kinds of solutions, but I think I learned something from my custom web site customers...
The XML Paradox is this: XML is a high-quality archival medium, so obviously books and scholarly content would make the jump first. It just makes sense that everyone would use the high-value format for the longest-lived, highest-value content. Wrong! The economics of publishing have played out the opposite way. The more ephemeral the content, the faster production methods can change. So newspapers were doing full-text databases from very early on. In the scholarly markets, journals are now almost all electronic. Books, however, are only starting to move fitfully in the XML direction, and are mostly not digital at all.
So the least archivable stuff moves to the best archival format fastest. Serial content does not have a legacy that needs conversion to make a new channel profitable, so the payoff from a production change can be pretty fast. A publisher with a rich backfile has items that can earn for 20 years or more, as long as costs can be controlled. So any change to the book production process has to pay off immediately on new books. And for any large-scale change across a publisher's line to be successful, it must be very cheap for old books. And that's where e-books stand: revenue unearned because there's not a clear path to get it.
XML is great, and enables the production of an optimized presentation for a new media format, but it's not cheap at all. It's an expensive and tricky management challenge to change editorial production processes for new content, and data-conversion costs for old content are very high.
Once the data is in hand, the development cost to create a new output format (print, web, handheld, or whatever) is not cheap either. Problems like typesetting, layout and display all have to be solved anew for each output format. It takes work to optimize presentation, especially from the level of abstraction that gives good XML its power. So page images (and especially PDF) get a big boost from the XML paradox, because they capture a lot of the production value of the existing process and they're the cheapest searchable format to produce from paper.
So here I am, a guy who courted his wife over conversations about markup, working with page images. We are managing them with very rich metadata at a fine level, to capture much of the commercial benefit of XML, but still, I'm enabling something I used to rail against. And it's not easy to make page images work over the web, let publishers control the presentation, and still be good to readers.
In this discussion I am leaving out the small number of crown-jewel properties that earn large amounts quickly in a new channel, and thus merit technology investment. Projects like that are important, but don't shift the business as a whole. And their emphasis on frequent updates makes them similar to serials in the need for continuous editorial management.
Coming soon: I used to think that page scanning projects were a waste of money in terms of long-term investment, and I hope to post soon about why I no longer believe that either.

