Sunday, October 21, 2007

Personal datastores and Vixie's dystopia

Paul Vixie argues in his 2006 Commonwealth Club speech that OS vendors, in their quest to monetize their customer bases, will develop a system in which we effectively pay for access to our own data. At the root of this, I think, is the fact that managing our increasing piles of data and tracking ownership of it is not something most people want to take on. Think financial and health records, contacts, appointments, correspondence, home movies, commercial music, home-made mixes of commercial music, home movies with a mix of commercial music in the sound track ...

Software vendors are more than happy to supply the tools for managing this, and this is a very legitimate and useful thing for them to do. Service providers like banks are also happy to provide similar tools. Service providers like hospitals and clinics might want to, but at least in the US HIPAA will tend to prevent that.

The problem here is that the data tends to get tied to the tools and providers, leading to two kinds of lock-in:
  • Weak lock-in due to proprietary formats. The maker of a word smasher has no particular incentive to make its format widely available. At best it has an incentive to read everyone else's format, write a few least-common-denominator formats (e.g., PDF) -- and make sure that its format has a few extras that no one else has.
  • Strong lock-in due to restricted access. Unless I take steps to keep a copy, I can only get at my bank records by signing in at my bank. It may choose to limit my access to, say, only the previous year. My bank is also free to derive whatever secondary data it likes from my raw transaction record (e.g., whether or not I qualify for a given loan) without disclosing exactly how it arrived at that conclusion.
Neither of these inherently needs to keep people from accessing their own data. Proprietary formats are security through obscurity. Strong lock-in may work for content like music created by someone else (though I have my doubts), but shouldn't be an issue for content you create yourself. If the bank doesn't want to disclose its secret loan-qualification formula, that's its information, not mine, but I have the option to download my own transaction records every few months and store them where I choose.


The dystopia Vixie describes is pretty much the antithesis of the personal datastore. In that model, I control access to my information, probably paying someone to keep it safely backed up and to limit access to it. Software vendors sell tools for manipulating and searching that data. Service providers like banks and medical providers populate that data and may access parts of it with my permission.

Except, how does that really change the picture? Right now, if I want to save a document, I have it written to a file on my disk. In the personal datastore model, I have it written to a chunk of my datastore. In either case, a tool could refuse to write a version that anything but itself could read, and in either case, the antidote is to make sure buyers understand that this is happening and that there are alternatives.

It is important here to distinguish two key parts of the personal datastore model. The first is that the datastore is a network resource; it is safely maintained and visible from anywhere, but with fine-grained access control. The second is that it serves as a point of integration by keeping its data in open formats.
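
To make those two parts concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: the PersonalDatastore class, the party and category names, and the choice of JSON as the open format are assumptions, and an in-memory dictionary stands in for what would really be a network service. Access control is per party and per category of data, and whatever is stored comes back out as plain JSON that any tool can read.

```python
import json
from dataclasses import dataclass, field


@dataclass
class PersonalDatastore:
    """Toy in-memory stand-in for a networked personal datastore (hypothetical)."""
    owner: str
    # category name -> list of plain dict records
    records: dict = field(default_factory=dict)
    # party name -> {category name -> set of allowed actions ("read", "write")}
    grants: dict = field(default_factory=dict)

    def grant(self, party, category, actions):
        """Let `party` perform `actions` on one category of data."""
        self.grants.setdefault(party, {}).setdefault(category, set()).update(actions)

    def _allowed(self, party, category, action):
        # The owner can always do everything; others need an explicit grant.
        if party == self.owner:
            return True
        return action in self.grants.get(party, {}).get(category, set())

    def write(self, party, category, record):
        if not self._allowed(party, category, "write"):
            raise PermissionError(f"{party} may not write {category}")
        self.records.setdefault(category, []).append(record)

    def read(self, party, category):
        """Return the category as JSON text, an assumed open format."""
        if not self._allowed(party, category, "read"):
            raise PermissionError(f"{party} may not read {category}")
        return json.dumps(self.records.get(category, []))


store = PersonalDatastore(owner="me")
# The bank may populate my transaction records but not read them back.
store.grant("bank", "transactions", {"write"})
store.write("bank", "transactions", {"date": "2007-10-01", "amount": -42.50})
print(store.read("me", "transactions"))
```

In this sketch the bank can add records but cannot read them unless I also grant "read", and nothing about the stored form ties me to the bank's own tools.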

The two are largely orthogonal. I could, for example, carry a smartcard from health care provider to health care provider, and as long as the data on it is in a standard format I can keep my medical records all in one place as opposed to in separate silos. The smartcard is a point of integration but not a network resource.

On the other hand, I could keep my data on the net but in a format not even I can read directly. I wouldn't want to do this, but a vendor might want to do this, say by encrypting the data with a key only it knows. In that case I'd have a network resource that was considerably limited as a point of integration (unless I wanted to limit myself to other tools in that vendor's suite, of course ...)


In the end, I don't think that resistance to vendor lock-in will by itself be enough to make personal datastores happen, but it should at least make the environment more hospitable to them.

1 comment:

David Hull said...

Note to self: typo

Also. Cloud storage