Monday, December 10, 2007

Why is there still print?

The Newsweek article on Kindle quotes Jeff Bezos as saying "Books are the last bastion of analog." I take his point, but it seems an odd statement. Text, after all, is arguably the first real digital medium. What he means by "digital", of course, is "available to computers". Unlike music and video, which are now routinely released in computer-readable form, books are still released in a form you can't just download. Bezos aims to change this with the Kindle.

The interesting question is, why does print resist digitization so well? I've suggested that publishers like it because it provides copy protection, but why does it? The answer has to be economic, not technical. Technically, it's trivial to digitize a book. Just scan it in. Don't bother to try to convert the image back to text. If all that people want to do with the result is read it, the image should work fine.

There's an interesting subplot here. Optical character recognition (OCR) seems to do fairly well these days on well-printed books, judging by Google Books and Amazon's own "Search inside the book" feature. On the other hand, the fully general problem of reading anything a person can make out still appears to be hard, which is why sites use distorted-text CAPTCHAs to try to stop bots. This seems like the equivalent of an X-prize for freelance OCR hackers, and indeed the inevitable arms race appears to be well under way. Finally, bringing us full circle, one source of these CAPTCHAs is printed text that failed to scan correctly.

In any case, the difficulty doesn't seem to be digitizing text in a readable form. The problem is, what do you do with it once you've got it? It's technically trivial to scan a book, but it still takes some time and effort to flip through all the pages, at least without expensive specialized equipment. So if I've done this, I'd like to see some compensation -- assuming I don't mind violating copyright laws.

Can I put it on the web and sell it? Well, um, I've just brought it into digital form, thereby making it hugely easier to copy. In other words, I've just put myself in the position of the publisher whose print-based copy protection I've just broken. If copy-protection is out, there's always advertising. Except that's maybe not such a good idea given that I've just broken the law.

This same argument would seem to act as a counterbalance to all sorts of unauthorized copying, but obviously it doesn't apply as effectively to audio and video. This is probably because copying CDs and DVDs is much, much easier than scanning books, and also because books are simply a different medium. I'd expect that PhDs have already been earned on just such matters.

3 comments:

john said...

In this digital age, every publisher should digitize their publications, as the number of internet users is increasing rapidly. Through digitization, circulation can be increased, and this is the best solution for instant reach.

There is a website called www.pressmart.net that provides digitization services for all print publications, and most publishers are using the services of pressmart.net. Having these kinds of services would be an added advantage to publishers.

Anonymous said...

This comment has been removed by a blog administrator.

Anonymous said...

Hi David,
Nice post, and I’d love to read more similar posts. I agree that digitization is rapidly increasing and generating more revenue for publishers, and this is the best tool to compete with the rising broadcast media.
John - Thanks for the information. I saw the http://www.pressmart.net demo and I was much impressed by their services. Publishers would benefit from these kinds of services.
Looking forward to reading more from you, David.