Friday, September 28, 2007

This iPhone will self-destruct in five seconds

Two questions come to mind about Apple's recent iPhone update which, as Apple had warned, makes hacked iPhones inoperable:

Who's better off for this? Owners of hacked phones now have $500 paperweights. Granted, they were warned and I would think were in violation of some license or service agreement. There are reports that some owners of non-hacked phones have lost contact data and possibly the use of their phones. Apple comes off looking like The Man instead of The Rest of Us, thereby calling down the wrath of hackers everywhere, but what were they going to do? The one group clearly to gain is makers of whizzy phones that aren't locked to a single carrier and/or don't self-destruct if you try to unlock them.

Just how does the self-destruct feature work? Apple asserts that the hacked phones are now "permanently inoperable". Did the update fry some hard-to-replace chip? If not, just what claim is Apple making? Clearly the self-destruct update will have left affected phones unable to receive further updates the usual way. But is it impossible even in principle to re-load the OS, for example by copying the image from a working phone? I would expect it to be difficult -- dongle-based copy protection is a lot easier to pull off for something highly integrated like a phone -- but could not even Apple do it back at the factory? [My understanding is that they just re-flashed the firmware and that Apple could fix such a phone at the factory (but has no reason to). In some cases, such a phone might also be fixable without help from Apple.]

Wednesday, September 26, 2007

A use case for provenances

It's the not-too-distant future. I'm walking down the street, vaguely aware that there are all manner of webcams around me. As it happens, I'm touring my city with a friend from out of town. "Oooh!" my friend says, "Let's take your picture in front of that statue!"

I say OK and we do. What makes that picture different from the dozens or hundreds of thousands that various webcams took as we strolled?

When my friend took my picture, the camera got in touch with my personal datastore, using the appropriate PK mojo and one of my friend's keys. My personal datastore stored the picture for me and sent it (or a pointer to it) to my friend's datastore, bundled together with "On this date, <my friend> took this picture with permission from <me>" all signed with one of my keys. The random webcam pictures lack that permission.

Again, that doesn't mean that no one can take my picture and look at it. That's just a fact of going out in public, even today (though ubiquitous webcams do change the picture, so to speak). What it does mean is that if they share such a picture, my permission will be conspicuously absent. Publishing? Swim at your own risk. Publishing content without permission will probably be grounds for a civil suit, at least.

Of course, there's the issue of The Man getting access to those pictures, since The Man don't care about permission. But that's a separate issue.

As before, the point here is that while using encryption to try to prevent unauthorized copying has, at the least, a few hurdles to overcome, it may well be better to de-emphasize that and use signatures instead to leave a paper trail of authorized copying.

If you can copy music all you want, but commercial players won't play music without proof of purchase (or proof of permission, for non-commercial works), then yes, you can always get a bootleg player, but it ought to be much easier to control those than the music itself. Or, if you prefer, players can just report whether there's permission and let the listener's conscience be their guide.

In the case of pictures, where you can't control whether someone takes your picture, you can at least say convincingly whether they had your permission to do so. In this scenario, your browser (or whatever does what browsers now do) will be able to tell you the provenance (or lack of provenance) of a particular piece of content.

It won't cure all ills, but it seems useful.

Names and addresses

Consider the humble HTTP URL (there are other kinds, of course, but http:// and https:// have by far the lion's share of the action). Compared to the email address which, after a couple of stages of evolution, became a nice abstract way of denoting a mailbox, the HTTP URL might seem to leave a bit to be desired.

For most purposes, the literal meaning of an HTTP URL is "make a TCP connection to the server in question and send an HTTP GET request to it." It says not only where to look for a resource, but what protocol you have to use to get it. By contrast, a mailto: URL says neither. Find any server you like. It might speak SMTP, but it doesn't have to. The receiver has equal latitude in retrieving the message you send.

To be fair, a URL is a Resource Locator, so what else would you expect? Except for a couple of things.

First, people don't just use URLs for locating resources. For example, XML namespaces are generally HTTP URLs. There is no requirement that they point to a schema or even that they point to anything at all (again, to be fair, URLs in general aren't required to point to anything at all, either).

If a namespace happens to dereference to a schema, there is no requirement that it be the right schema for a given document. It's certainly a good idea to bump the namespace when you change the schema and to keep schemas backward compatible, but it's not required. Regardless of this, the namespace URL serves as a unique name to disambiguate my PurchaseOrder element from yours. In other words, it's acting as a name, whether or not it's any good as an address.

Second, there's already a facility, the URN (Uniform Resource Name), for naming things independent of their location. I'm not sure who uses it. There's also XDI, but again, I'm not sure what traction that's got. I'm not claiming that HTTP URLs are better or worse than anything else, but it's pretty clear that for most of us they're perfectly good identifiers and there's not a pressing need to use anything else.

So here's the thing. In the case of bang paths vs. modern email addresses, it was night-and-day clear that moving away from "where it is and how to get there" toward "unique identifier" was a Good Thing. But with URLs, the world seems perfectly happy and functional using a thing that says "where and how" both for its intended purpose and as a name. What gives?

For one thing, HTTP is a completely different beast from UUCP (the protocol behind bang paths). It has explicit support for redirects, proxies, caches and other things that decouple the URL from exactly how you get to the resource behind it. Essentially, the URL says "start here", namely at the server/port part (more properly the authority). What happens after that is fairly flexible.
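The "start here" reading is easy to see by pulling a URL apart. This is a minimal sketch using Python's standard urllib.parse; the example.com URL and the describe helper are made up for illustration, and the mailto: address is the one from the bang-path post.

```python
from urllib.parse import urlsplit

def describe(url: str) -> dict:
    """Split a URL into the 'how', the 'start here' and the rest."""
    parts = urlsplit(url)
    return {
        "how": parts.scheme,         # which protocol to speak
        "start_here": parts.netloc,  # the authority (server and optional port)
        "rest": parts.path,          # opaque to the client, interpreted by the server
    }

# An HTTP URL names both a protocol and a place to start.
http_parts = describe("http://www.example.com:8080/orders/42")

# A mailto: URL has no authority at all; it only names a mailbox.
mail_parts = describe("mailto:myfriend@prestigious.edu")

print(http_parts)
print(mail_parts)
```

Everything after the authority is handed to the server as-is, which is where redirects, proxies and caches get their room to maneuver.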

For another thing, a URL is generally pretty opaque. If it doesn't work, I don't go and consult a map to figure out what alternative might work for some part in the middle. I either give up or go hunting for a whole new URL for the same resource.

Finally, URLs are better and better hidden these days. If I see a web address on a billboard, it's probably just the domain name. I type the domain name into my browser and it fills in the http:// for me. Most of the time I won't even do that. I'll just chase a link somewhere. At that point I really don't care what URL the link uses to find its referent.

Somewhere among or around those three things is an explanation for why HTTP URLs work as well as they do in practice, while in theory they shouldn't. Or rather, why the theory that identifiers shouldn't talk about wheres and hows doesn't seem to hold in all cases.

Proof of purchase

This is another of those popped-into-my-head, maybe-I've-seen-it-somewhere, don't-remember-where, someone's-probably-thought-of-it things.

If I have some digital content in my store (or on my disk, for that matter), it might be good if my particular copy had a nonce in it and the seller included a digitally signed receipt to the effect that "On this date this person bought this copy of this content for this amount from this seller".

Legitimate copies (backups, versions converted to other formats, etc.) would carry either the original proof of purchase (for verbatim copies) or a link to the original or at least the previous link in the chain. In effect, everything can have a provenance.
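To make the receipt-plus-chain idea concrete, here is a rough sketch under some loud assumptions. A real scheme would use public-key signatures so that anyone could verify a receipt; the code below substitutes an HMAC with a shared secret purely because that is what Python's standard library offers. The field names, the seller key and the helper functions are all invented for illustration.

```python
import hashlib
import hmac
import json
import secrets

# Hypothetical seller secret. HMAC is a stand-in for a real digital signature.
SELLER_KEY = b"demo-seller-key"

def sign(record: dict, key: bytes) -> str:
    """Return a hex MAC over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def issue_receipt(buyer, content: bytes, amount, date, key=SELLER_KEY) -> dict:
    """Seller side: bind buyer, nonce-marked copy, price and date together."""
    nonce = secrets.token_hex(16)  # makes this particular copy unique
    record = {
        "buyer": buyer,
        "seller": "example-seller",
        "date": date,
        "amount": amount,
        "nonce": nonce,
        "content_sha256": hashlib.sha256(content + nonce.encode()).hexdigest(),
    }
    return {"record": record, "signature": sign(record, key)}

def derive_copy(parent: dict, new_content: bytes, note: str) -> dict:
    """A legitimate derived copy (backup, conversion) links back to its parent."""
    return {
        "note": note,
        "content_sha256": hashlib.sha256(new_content).hexdigest(),
        "parent_signature": parent["signature"],
    }

def verify_receipt(receipt: dict, key=SELLER_KEY) -> bool:
    """Recompute the MAC and compare it with the one on the receipt."""
    expected = sign(receipt["record"], key)
    return hmac.compare_digest(expected, receipt["signature"])

receipt = issue_receipt("me", b"song bytes", "0.99 USD", "2007-09-26")
mp3_copy = derive_copy(receipt, b"converted bytes", "converted to mp3")
print(verify_receipt(receipt))
```

Following parent_signature links from any derived copy leads back to the original receipt, which is all a provenance needs to be.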

This doesn't keep me from stripping out the bits and giving an unsigned, de-nonced version to anyone and everyone for free. Airtight copy protection is an inherently hard if not impossible problem. What it does do, though, is cover me by saying hey, at least I paid for my copy.

Whether it's illegal to possess a copy without accompanying receipt is a separate issue. But chances are pirating in general will continue to be illegal, and having a convincing means of proving non-piracy ought to be worth something.

Conversely, having such a scheme in place makes content without clear provenance inherently suspect. Whether The Man can find out you have such, and if so, prove that you have it on purpose, is another separate issue.

Tuesday, September 25, 2007

A strawman access control model

This is so simple I'm sure it's been invented already, but I can't be bothered at the moment to track down where. [Ed. Note: It's pretty close to, but not necessarily exactly the same as, row-level security and cell-level security in modern databases]
  • The world consists of a set of records/objects (I'll call them objects, risking confusion with other definitions).
  • Pick a query language by which you can select a subset of this universe.
  • With each party that can access the world, associate a number of queries, for example
    • A read query. Any query that party makes must return a subset of the results of its read query. If it contains anything else, the entire query is rejected with a simple "access denied" error.
    • An add query. Notionally, execute this query after tentatively making the addition(s). If all additions are in the result, the addition commits. Otherwise it fails with a simple "access denied" error.
    • A delete query. The records to be deleted must be a subset of the results of the delete query. Otherwise ... you know the drill.
As always, an implementation doesn't actually have to make the queries in question, as long as you can prove it acts exactly as if it did.

Minimal as this is, it may be enough to usefully control access to a personal datastore. For example, I could allow anyone who successfully authenticated as a travel agent add permission to, say, (type=calendar-entry and tags contains travel and tags doesn't contain public). I would grant random people read access to (tags contains public), delete access to nothing and maybe add access to (type=comment and tags contains public).

By default my travel agent's entries would be private unless I did something to make them public. I can blacklist people by reducing their add and read queries, and conversely I can allow privileged access by expanding their queries.
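Here is a minimal sketch of the strawman, assuming the "query language" is nothing more than Python predicates over dict-shaped objects. The Store and Party classes and the field names are invented for illustration. The source does not say what the travel agent may read or delete, so the sketch assumes nothing. Because the queries are per-object predicates, checking each added object directly behaves exactly as if the add query had been run after a tentative insert.

```python
from dataclasses import dataclass
from typing import Callable

Obj = dict                     # an object is just a dict of fields here
Query = Callable[[Obj], bool]  # a "query" is a predicate that selects objects

class AccessDenied(Exception):
    pass

@dataclass
class Party:
    read: Query
    add: Query
    delete: Query

class Store:
    def __init__(self):
        self.objects = []

    def select(self, party: Party, query: Query) -> list:
        """Results must be a subset of what the party's read query returns."""
        results = [o for o in self.objects if query(o)]
        if not all(party.read(o) for o in results):
            raise AccessDenied("access denied")
        return results

    def insert(self, party: Party, new_objects) -> None:
        """Every addition must show up in the party's add query."""
        new_objects = list(new_objects)
        if not all(party.add(o) for o in new_objects):
            raise AccessDenied("access denied")
        self.objects.extend(new_objects)

    def remove(self, party: Party, query: Query) -> None:
        """Everything to be deleted must be in the party's delete query."""
        doomed = [o for o in self.objects if query(o)]
        if not all(party.delete(o) for o in doomed):
            raise AccessDenied("access denied")
        self.objects = [o for o in self.objects if not query(o)]

nothing: Query = lambda o: False
is_public: Query = lambda o: "public" in o["tags"]

travel_agent = Party(
    read=nothing,  # assumption: the text only grants the agent add permission
    add=lambda o: (o["type"] == "calendar-entry"
                   and "travel" in o["tags"]
                   and "public" not in o["tags"]),
    delete=nothing,
)
random_person = Party(
    read=is_public,
    add=lambda o: o["type"] == "comment" and "public" in o["tags"],
    delete=nothing,
)

store = Store()
store.insert(travel_agent, [{"type": "calendar-entry", "tags": {"travel"}, "title": "Flight to Tokyo"}])
store.insert(random_person, [{"type": "comment", "tags": {"public"}, "title": "Nice blog"}])
visible = store.select(random_person, is_public)
print([o["title"] for o in visible])
```

A random person who asks for everything gets "access denied" rather than a silently filtered answer, since the result would include the private calendar entry.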

What I'm doing here is basically punting almost all of access control over to the query model of the datastore and to the authentication model that establishes the identity of a party and finds its associated metadata.

A major assumption is that in order to be useful, the query model has to be pretty powerful anyway. In other words, figuring out what objects to grant access to (and what additions/deletions to grant permission for) is no harder than getting useful information out of the datastore in the first place.

Access control models have query models hidden in them one way or another, so why not use what you've already got?

The (bright) future of VRM

Joe Andrieu reports having a "mega-watt flash bulb" experience at a VRM summit about user-centric architecture, and then lays out a detailed vision of what this means. It's a long post, but well worth the effort of (re-)reading.

I had been about to amplify my previous post on data stores by talking about how it might work in practice. I still will, but with the caveat that Joe has already spelled pretty much all of it out in greater depth.

Suppose I have my personal datastore securely set up at a "databank", that is, at a hosting service for personal datastores that meets my and/or the state's requirements for security, integrity and liability in case of mishap.

This datastore will be partitioned, possibly across several axes. E.g., there might be travel data, entertainment preferences, email, calendars, photos, video, bookmarks, browsing history, what-have-you. Everything in it can be tagged uniformly, so if I ask for "Japan", I can find email about Japan, photos of Japan, calendar entries from my trip there, web sites I've tagged "Japan", and so forth.

There is no single format for the data, but formats for a given type of data will tend to standardize via the usual market forces.

Different people will see different slices of the data. For example, I see the full details of my calendar, but a business associate might just see "on vacation" or "unavailable" or even "unknown" for part or all of it.
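As a rough illustration of "different slices", here is a small sketch. The entry layout, the summary_for field and the viewer names are all hypothetical; a real datastore would derive the viewer from authentication rather than a string.

```python
def view_entry(entry: dict, viewer: str) -> dict:
    """Return the slice of a calendar entry that `viewer` is allowed to see."""
    when = {"start": entry["start"], "end": entry["end"]}
    if viewer == entry["owner"]:
        return dict(entry)  # the owner sees full details
    if viewer in entry.get("summary_for", {}):
        # Known parties see only the summary chosen for them.
        return {**when, "summary": entry["summary_for"][viewer]}
    return {**when, "summary": "unknown"}  # everyone else learns nothing

trip = {
    "owner": "me",
    "start": "2007-10-01",
    "end": "2007-10-14",
    "title": "Trip to Japan",
    "summary_for": {"business-associate": "on vacation"},
}

associate_view = view_entry(trip, "business-associate")
stranger_view = view_entry(trip, "stranger")
print(associate_view)
print(stranger_view)
```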

The datastore may also re-format or otherwise re-present data depending on the accessing party. For example, I'll probably store my audio as flac, but I'd like my phone/mp3 player to see it as mp3.

Some parties will have write access to parts of it (or more likely, append access). For example, entertainment vendors will be able to make entries in the "entertainment preferences/history" area, while travel agents will be able to (among other things) write to my calendar when I book a trip. As Joe says, access control will be very fine-grained, at least by present standards.

This will tend to turn present patterns inside out. Right now, if I go to an entertainment or travel site, I authenticate (generally pretty weakly) to convince them I'm me. Then I can see their chunk of my personal data and do various things to modify it.

Under the personal datastore model, I might contact them, or they might contact me with a suggestion. In either case, they will authenticate with my personal datastore (strongly, I hope) and convince it that they're them. They would then see the appropriate chunk of my personal data and work with it appropriately.

If I change vendors, my personal data stays. If some other party needs access to the same data, they see the same data, where presently they would have to build or obtain their own copy.

This is the power of the "user as integration point" paradigm. There are benefits to be had by all:
  • I get better control over who sees what part of my data.
  • Others get a single, consistent view without having to constantly re-discover what someone else knows.
Naturally, there will be a counter-current of vendors wanting to keep their information about me away from their competitors, and there will always be some data they keep privately, but there are any number of plausible cases where cooperation ought to win, given the enabling technology.

If I keep it all in my wallet, where's my backup?

Michael O'Connor Clarke writes
I'm still not quite sure where I'm going with this, but I feel the need for some secure, personal repository that would hold all of my connections and "whuffie" together. I want to keep my whuffie in my wallet - but not in a Microsoft Passport/Hailstorm kind of way. Ack, no.
And here's the dilemma. On the one hand, if I carry all my connections with me physically, I had better have a backup somewhere. Maybe I have it on a machine at home, and maybe I keep backups of that on DVD or something in a safe deposit box at the bank?

That's not so good. It puts me in the data integrity/security business, which I'm almost certainly not as good at as the next guy. It's essentially equivalent to keeping my money in cash under my mattress. Some people do it, and they have their reasons, but most of us don't.

Another option is to let one of the major players keep track of everything for you. I agree: ack, no. A major player has too much of a vested interest in trying to steer me towards its products and making it more difficult to use those of other players (major and minor). To that end it would like very much to know every little thing about my buying and browsing habits, those of my friends and so forth.

How about my ISP? It's a neutral party, and I pretty much trust it not to lose my data. I don't, however, trust it completely not to let anyone else access my data, either by accident, through a disgruntled employee, or through malicious hacking. I can mitigate that at least a bit by encrypting the data I keep on the ISP. The tools are there, at least mostly, but not as robustly or seamlessly as I'd like.

If nothing else, the pipe between me and my ISP's servers is just too thin right now to, say, keep all my music, photos, videos and such online conveniently. Right now I could probably get away with things like passwords, profiles and other metadata, calendars, contacts and such but then I have to keep that sequestered from everything else. It's much simpler just to keep everything in one place.

If the problem is cash under the mattress, the solution is banks, at least in some form ("databanks" ... what a concept). These would probably be more like ISPs than anything else currently around, but with some significant differences:
  • Regulation: Banks are tightly regulated (though not as tightly as they used to be), with requirements for things like reserves and accounting standards. ISPs are market-regulated, which works fine for QoS matters like uptime and storage cost, but if they're going to store the keys to the kingdom, I don't think that will be enough.
  • Liability: If I'm going to trust someone else with my highly personal details, I want to know that someone's butt is on the line if they screw up and leak it. I'd also like to know I'd be compensated, but I'm more interested in prevention than redress. This could happen either through market forces or legal requirements, but either way it needs to happen.
  • Access: Ideally, I'd like to have a high-speed, secure pipe between me and my databank at all times, whether I'm at my laptop or on a plane without it (like that would happen). If I buy a new phone, I want to know that whatever brand I go with, it will Just Work reliably and securely with the databank. This is just part of the 4G vision.
  • Access control: I want to be able to grant other parties access to selected pockets (more on that later) but be sure that no one has access outside what I want to share. I want to do this simply and securely, without having to mount special file systems, futz around with keys all the time or whatever.
We can certainly get a lot done without databanks of this sort, but it seems to me that something like this is going to happen sooner or later, and a lot of cool stuff will be enabled when it does.

Monday, September 24, 2007

Some kinds of reputation

I've argued that reputation is an inherently subjective assessment of a set of assertions associated with a persona (I need a less-jawbreaking way to say that). That doesn't mean that there's no point in trying to model it, just that there's no single or objective model.

On the other hand, there are any number of real-world examples of reputations, some of which already have well-known models. Some that come to mind:

eBay: Michael O'Connor Clarke talks about this in his piece on personal reputation management. In eBay, your reputation is a summary of the positive and negative feedback you've received over the course of your transactions. There are various checks and balances that make the system work (e.g., if you bash someone needlessly, it will catch up to you).

Clarke states that "It's your reputation, but you don't even own it. If you ever chose to leave eBay, you can't take it with you."

I'd basically agree but I might put it differently. Your eBay reputation is a summary of the history of your eBay persona. If you can convince someone that, say, your Blogger persona and your eBay persona are both you, then you've convinced them that, say, the person who posted "I just bought a new camera on eBay" has a 98% positive rating on eBay. If you leave eBay, you lose a means of convincing someone that you made those particular transactions and got those particular feedback votes.

If you can carry those assertions with you, you can carry your reputation with you. For example, a subculture of eBay users could all agree to send appropriately signed records of their transactions and feedback to a trusted third party as they make them. This slice of your eBay reputation would then be portable (assuming eBay doesn't take this sort of thing amiss).

Blogrolls and social networking sites: Clarke mentions these as well. Blogrolls are just a less formal version of the same basic idea. Again, the problem in carrying over reputations from one service to the next is mostly a matter of linking up the personae.

I could imagine a sort of "meta-network" service where I can tell the meta-service that I've joined a new service, giving it my login, password etc. The meta-service can then look up who else it knows that has already joined that service and who is connected to me in other services I belong to and, subject to rules I specify, invite them on my behalf.

Similarly, it can accept invitations on my behalf from people it knows I've accepted invitations from in other services. It can monitor my various services and propagate changes across them. E.g., if I blogroll someone on Blogger, it could automatically invite the other person on LinkedIn, and if they also subscribe, accept the invitation on their behalf.

This seems somewhat different from OpenID, in that the service stores connections as well as identities, and from FOAF, which appears more concerned with marking up existing pages to make them more easily machine-readable. As always, though, I may be slow on the uptake here.

The various public-key webs of trust: This is probably what I was thinking about here, when I said "It occurs to me that there is a different, more Web 2.0-ish notion of reputation as a network of people's ratings, other people's ratings of their ratings, and so forth." Each key is a persona, and the trust scheme of the system you're using accumulates the various assertions known about a persona into an assessment of the persona -- its reputation, in other words.

Poker: This is a world where you literally buy and sell your reputation. Each bet you make says something about how you play. Do you bluff a lot? Are you likely to call someone else's bluff? Do you play every hand or do you fold all but the best? Every time you act, you provide information for the other players' answers to those questions, and that in turn affects how they will play against you and thus affects your winning chances.

Are there other poker-like situations on the web, outside gaming? I'm sure there must be, but they're not coming to mind at the moment.

Recommendation lists on e-commerce sites: These aren't so much reputations in the usual sense of trustworthiness, but they are assessments of assertions about a persona. The user ID is the persona and the assertions record the purchasing history and other actions of that user. The site then uses its own secret sauce to cook these up, often together with the histories of other users, into a list of recommendations.

Remember bang paths?

For those who don't remember, back in the mists of time if I wanted to send email to my friend down the street at the Prestigious University Research Lab, I would have to spell out exactly how it was supposed to get there.

We did this by giving a list of machines that the message was to pass through, separated by bangs ("!"). In those days much of the heavy lifting was done by VAXen and Sun workstations were popular for the desktop. The result might look something like

mycorpgw!foovax!bigvax!arewethereyetvax!prestigiousvax!sparcorama!myfriend

(Except probably with all names 8 characters or shorter)

I was never very good at this, even with help from the ASCII-art maps they posted to the newsgroups from time to time, so I usually just called or hopped on my bike and went over if I really wanted to be sure the message got through.

Then along came the email addresses we know and love today, and I could just say

myfriend@researchsparc.prestigious.edu

That was much better. I didn't care how the message got there, just which machine it was headed for. Except that sometimes my friend was logged into researchsparc and sometimes othersparc, and mail would miss from time to time. That didn't take long to fix (and the capability was there all along, I think). Soon everyone had a single mailbox, independent of where the messages were stored, what route they took to get there, what protocols the mail system spoke, or anything like that. Then I could just say

myfriend@prestigious.edu

and mail would get there. Even I could handle that.
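The difference shows up in what a program has to know about each form. This is a small sketch with two made-up helper functions, applied to the example addresses above.

```python
def parse_bang_path(path: str):
    """A bang path spells out every hop; the last element is the user."""
    *hops, user = path.split("!")
    return hops, user

def parse_address(address: str):
    """A modern address says only who and which domain, not how to get there."""
    user, _, domain = address.rpartition("@")
    return user, domain

hops, user = parse_bang_path(
    "mycorpgw!foovax!bigvax!arewethereyetvax!prestigiousvax!sparcorama!myfriend"
)
mailbox = parse_address("myfriend@prestigious.edu")

print(len(hops), "hops to reach", user)
print(mailbox)
```

With the bang path the sender carries the whole route; with the address the route is somebody else's problem.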

Score one for abstraction. Successful abstraction isn't drifting off into some ethereal realm of ideas about ideas. It's cutting out everything that's not essential to the task at hand and holding on for dear life to what is (in this case, whom I'm trying to reach).

Footnote: For several years during that era my address was dmh@tss.com, which was both a valid email address and a valid MS-DOS filename. A dessert topping and a floor wax. I always got a strange sort of pleasure out of that.

Sunday, September 23, 2007

Refining the persona/reputation model

Previously I asked whether it was worth breaking up the M:N relation between personae and attributes. My personal bias is to break these up whenever there's a natural name for the resulting link or (and this is closely correlated) whenever the link might naturally have relations outside the two linked entities. My further bias is that this is generally true.

I also asked whether opinions are different from other assertions.

I think these can both be answered by dropping attribute, action and opinion, adding
  • An assertion is a claim made by a persona
and reworking the relations as follows:
  • A person (1) assumes an identity (N)
  • An identity (1) is assumed at a time (N)
  • An identity (N) is assumed as a persona (1)
  • A persona (1) makes an assertion (N)
  • An assertion (N) is made at a time (1)
  • An assertion (1) is about a persona (0 .. N)
Instead of attributes, actions and opinions, we have various assertions, e.g.,
  • I assert (at time t) that the sky is blue (plain assertion, not about anyone)
  • I assert that I am handsome (one way of bestowing an attribute)
  • I assert that my friend is homely (another way)
  • My friend asserts that my opinion can't be trusted
  • My bank asserts that I made a deposit this morning (assertion of an action)
  • I assert (at time t+1) that the sky is not blue (changing my opinion)
This model is probably missing a key ingredient or two, particularly a means of granting access to assertions, but it seems enough to support reputations. E.g., if I assert that I have $1,000,000 in the bank but my bank asserts I have $0, that will influence my reputation. Reputations can be built either by tying assertions to the outside world (You're convinced that my bank is trustworthy) or by studying the relationships between them (X has a history of lying about his bank account).
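The persona and assertion part of the model is small enough to sketch directly. This leaves out persons and identities, uses plain strings for claims, and picks arbitrary integer times (t = 0) since only the sky assertions have times in the list above. The class and function names are mine, not part of any existing system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Persona:
    name: str

@dataclass(frozen=True)
class Assertion:
    made_by: Persona   # a persona (1) makes an assertion (N)
    time: int          # an assertion (N) is made at a time (1)
    claim: str
    about: tuple = ()  # an assertion (1) is about a persona (0 .. N)

def assertions_about(log, persona):
    """The raw material from which someone might derive a reputation."""
    return [a for a in log if persona in a.about]

me, friend, bank = Persona("me"), Persona("my friend"), Persona("my bank")

log = [
    Assertion(me, 0, "the sky is blue"),
    Assertion(me, 0, "I am handsome", about=(me,)),
    Assertion(me, 0, "my friend is homely", about=(friend,)),
    Assertion(friend, 0, "their opinion can't be trusted", about=(me,)),
    Assertion(bank, 0, "made a deposit this morning", about=(me,)),
    Assertion(me, 1, "the sky is not blue"),
]

about_me = assertions_about(log, me)
for a in about_me:
    print(a.made_by.name, "asserts:", a.claim)
```

Nothing here computes a reputation. Each observer would apply its own reckoning to the assertions it can see, which fits the idea that reputation is subjective.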

A real live field note

The other day I was walking past a field and saw two deer -- a doe and a fawn. This was in an area where the deer had no reason to fear people, so I was able to come fairly close without scaring them off. As I approached, I noticed the doe looking at me. I watched her as she took a few steps off toward the woods, then a few more, then a few more, before breaking into a slow trot.

I had been so absorbed in this that I completely lost track of the fawn. I looked back to the field. It was no longer there. When I finally spotted it in the trees, it was already well away. When it saw I'd seen it, it ran off.

At which point I realized I'd been played. The whole point was for the mother to catch my attention so the fawn could sneak off. To do this, the doe had not only to see me, but be sure that I saw her. The same scene has played out, in various forms, for millions of years. The "Who's watching me?" detector (and similar ones like "Do they see me?" and "Do they know I can see them?") has probably been around about as long as eyes have.

So what exactly does this have to do with the web? This sort of awareness and awareness of awareness is one of the key themes in Michael Chwe's book, Rational Ritual: Culture, Coordination, and Common Knowledge, which poses questions such as
  • How can a public declaration have political consequences even when it says something that everyone already knows?
  • Why were circular forms considered ideal for public festivals during the French revolution?
  • Why was the advertising during the Barbara Walters television interview of Monica Lewinsky dominated by Internet companies?
  • Why are close friendships important for collective action even though people typically "reach" many more people through casual acquaintances?
and "tries to answer these and other questions with a single argument, trying to find a common thread among a variety of cultural and social practices usually thought disparate."

The answers, Chwe persuasively argues, are founded in the notion of "common knowledge" -- things that we all know, and that we all know we all know.

Much of Chwe's work involves social networks, knowledge and identity, all viewed through the lens of economics. Good stuff.

("Chwe" is pronounced sort of like "Chet" without the "t")

Persona, reputation and VRM

One thing that should jump out from the previous post on modeling reputation is that there's nothing called "reputation" in the proposed model. This is not a complete accident. Reputation is a very subjective thing, both in that one's reputation will vary depending on whom one asks, and in that it can be reckoned in any number of different ways. What the model models is not reputation itself, but the elements from which one might derive a reputation.

There is, however, a very prominent piece called "persona". From a VRM standpoint, as I understand VRM, this is the piece we want to break out and re-use. In the stock example of movie preferences, I might like to share my "movie buff" persona amongst the IMDB and the various online vendors, so that whenever I notice something anywhere and comment on it, or rent a movie, everyone knows about it. Renting or reviewing a movie is an action on the part of my "movie buff" persona.

Under this model, I will have different reputations with different vendors, even if they see the exact same information about my persona, because they will draw different conclusions from the same facts.

It occurs to me that there is a different, more Web 2.0-ish notion of reputation as a network of people's ratings, other people's ratings of their ratings, and so forth. There is probably a useful analogue to page rank in web sites to be had here, for example.
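For the curious, here is one hedged guess at what a page-rank analogue might look like: a plain power iteration in which an endorsement counts for more when it comes from someone who is well endorsed themselves. The names, the damping factor of 0.85 and the iteration count are assumptions chosen for illustration, not anything a real service is known to use.

```python
def rank(ratings: dict, damping: float = 0.85, iterations: int = 50) -> dict:
    """ratings maps each rater to the list of people that rater endorses."""
    people = sorted(set(ratings) | {p for targets in ratings.values() for p in targets})
    n = len(people)
    score = {p: 1.0 / n for p in people}
    for _ in range(iterations):
        # Everyone gets a small baseline share regardless of endorsements.
        new = {p: (1.0 - damping) / n for p in people}
        for rater in people:
            # A rater who endorses no one spreads their weight over everyone.
            targets = ratings.get(rater) or people
            share = damping * score[rater] / len(targets)
            for target in targets:
                new[target] += share
        score = new
    return score

# Hypothetical endorsements: alice and bob both vouch for carol,
# and carol vouches for alice. Nobody vouches for bob.
scores = rank({"alice": ["carol"], "bob": ["carol"], "carol": ["alice"]})
for name in sorted(scores, key=scores.get, reverse=True):
    print(name, round(scores[name], 3))
```

Alice ends up well ahead of bob even though each has at most one endorsement, because hers comes from the best-regarded persona.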

That's cool stuff, but for my money it's one particular way of deriving a reputation from the public acts of a persona.

Information age: Not dead yet.

Joe has another interesting article, this one on the end of the information age.

I like the article, I like the argument, but I don't quite like the conclusion. As Churchill said, this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.

What has happened is that we have moved from information scarcity to information abundance. You could just as well argue that this marks the beginning of the real information age. In which case I think Joe is saying the same thing, except that instead of "real information age" we should call it something else.

For that matter, when did the "classic" information age start? Did it start when it became possible for someone to make a living dealing solely in information? That would be quite a while ago. Did it start when information management allowed geographically large entities to persist over time? Also quite a ways back.

Did it start when people on opposite sides of a continent could communicate with each other instantaneously or nearly so? That would be sometime in the 19th century. When the first modern computer was built? Mid-20th. The first PC? The first use of the term internet? Take your pick.

Ages are not mutually exclusive. We are still very much in the industrial age. New industrial products and processes are invented all the time. Large parts of the world remain largely unindustrialized -- even as they build out their information infrastructure.

We're just not focused on being in the industrial age. William Gibson asserts that if it's not new technology, it's not considered technology. In the present case, maybe it's industry itself that's not new?

The industrial age comprises a number of milestones such as the invention of the steam engine, the assembly line and other mass production techniques, numerically-controlled machine tools and so forth. It also comprises a number of trends, such as a decreasing work week and machines replacing human workers over time.

The information age is no different. The (over)saturation of human bandwidth by available information is one of many milestones and, as far as I can tell, a fairly early one.

Modeling reputation

Joe Andrieu has some interesting points about reputation on the web, among them that no one seems to have a concrete model for this crucial concept.

Going on that basis, and probably re-inventing a few wheels along the way, here are some thoughts:
  • You have a reputation with someone as something
  • A reputation is another person's interpretation of what they associate with you, including
    • Attributes ("Statutory")
    • Actions ("Case-based")
    • Your reputation with third parties and their reputations as judges.
  • Everyone will have a different view of those things, and even if two people somehow had the same view, they will almost certainly interpret it differently.
  • We try to control our reputations (at least)
    • through our actions
    • by controlling access to information about us
    • by influencing people's interpretation of the information (we think) we know
  • One way to control this access is to take on a persona. The most obvious way is via a pseudonym, but subtler forms are also common. For example, we tend to separate, say, our business correspondence from our personal correspondence.
  • This control is very incomplete.
A bit more on personae:
  • A given person will take on many personae
  • Some personae are taken on by many people, either sequentially (The Dread Pirate Roberts) or concurrently (Bourbaki)
  • The association between a person and persona is an independent entity. If nothing else, it has a life cycle (on this day, Peter Parker became Spiderman)
  • The knowledge (or insinuation) that two personae are the same person can make a drastic difference in certain cases (X is just a sock puppet for Y; Someone on a company's message boards is actually the CEO)
Translating this into old high ERM, here are some entities that appear to fall out:
  • A person is a flesh-and-blood person.
  • A persona is a public image of a person.
  • An identity is the association between a person and a persona.
  • An attribute is something the owner of a persona asserts to be true about it.
  • An action is something a persona is publicly seen to have done.
  • An opinion is something one persona asserts to be true about another persona (rendering an opinion is an action).
and some relations:
  • A person (1) assumes an identity (N)
  • An identity (1) is assumed at a time (N)
  • An identity (N) is as a persona (1)
  • A persona (M) carries attributes (N)
  • A persona (1) takes an action (N)
  • An action (1) occurs at a time (N)
  • A persona (1) renders an opinion (N)
  • An opinion (N) concerns a persona (1)
  • A persona (1) may change an opinion (N)
This is only a very basic model, of course, and there may well be other models that deal with reputation as well or better. Some questions:
  • Is it worth breaking up the M:N relation between personae and attributes?
  • Is rendering an opinion any different from making any other assertion? Probably not.
  • What's the best way to handle group reputations? In a case like Bourbaki, there is only one persona and the present model will handle it. In a case like "the legislature is incompetent", both the reputations of the legislators and of the legislature as a whole are involved.
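
The entities and relations listed above could be sketched roughly as follows. This is one possible rendering, not a definitive schema: the field names (`legal_name`, `description`, `concerns` and so on) and the helper `public_record` are assumptions made up for illustration. Treating an opinion as a kind of action follows the note that rendering an opinion is an action.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Person:
    """A flesh-and-blood person."""
    legal_name: str  # hypothetical field


@dataclass
class Persona:
    """A public image; attributes are what its owner asserts about it."""
    name: str
    attributes: set[str] = field(default_factory=set)


@dataclass
class Identity:
    """The association between a person and a persona, with its own life cycle."""
    person: Person
    persona: Persona
    assumed_at: list[datetime] = field(default_factory=list)


@dataclass
class Action:
    """Something a persona is publicly seen to have done."""
    persona: Persona
    description: str
    occurred_at: datetime


@dataclass
class Opinion(Action):
    """An assertion by one persona about another; rendering it is an action."""
    concerns: Optional[Persona] = None


def public_record(persona: Persona, actions: list[Action]) -> list[Action]:
    """The public acts of one persona: raw material for a reputation, not the reputation itself."""
    return [action for action in actions if action.persona is persona]
```

Note that nothing in the sketch is called "reputation". Each observer would compute their own from whatever `public_record` and the attributes show them.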

Friday, September 21, 2007

Now available in living Technorati

Here's my shiny new Technorati Profile [Nothing much ever came of this.  See here for the official death notice.].

They're right. This is absolutely fascinating.

Blogger just put up Blogger Play [Now, alas, defunct], a feed of all the photographs people are uploading to their blogs.

Flickr has had a similar feature for a while now, but not as a slide show. Watching the pictures flow by has a completely different feel. I won't even try to summarize or describe it here. Just chase the link (and prepare to lose a chunk of time).

Lies, damn lies and benchmarks

Computeractive magazine in the UK recently put up a broadband speed checker and asked readers to send in their results. Come to find out, typical speeds are a far cry from the "speeds of up to" numbers given by the providers.

More than half of respondents reported getting less than half the advertised speed. Naturally, users are not best pleased. Spurred by the reaction, Computeractive is calling for government regulation of broadband advertising.

On the one hand, truth-in-advertising laws can help provide clarity when sellers have little incentive to provide it themselves. On the other hand, I'm curious just what sort of measurement will end up being required. The whole problem with the current advertising is that it provides a single, easy-to-digest number instead of a more nuanced picture that allows for (among other things) congestion at any of several points.

I wouldn't be surprised to see some sort of standard "speed rating" hashed out, perhaps with separate flavors for DSL-like, cable-like or WiMax-like services. I wouldn't mind being a fly on that wall.

Thursday, September 20, 2007

The New Yorker looks at MySpace

In a New Yorker article entitled Social Studies, Michael Schulman reports, with some amusement and some bemusement, on Facebook in the Flesh, a seminar at N.Y.U. aimed at introducing students face-to-face who had previously only known each other via Facebook.

The subtext here is that kids these days are so used to connecting virtually that they need remedial help when confronted with flesh and blood. I'm a bit old to have caught the Facebook wave (though I am LinkedIn), but one is never too old to remember the skin-crawling awkwardness that many freshmen -- geekly or not -- experience in those first crucial days.

Combine that with a small, self-selected sample of 35 freshmen out of around 5,000 and it's hard to say what valid conclusions, if any, one might draw. It's still an interesting article, though.

Wednesday, September 19, 2007

Babel

We don't know exactly how it all got started, but each of us is born with the ability to learn languages. None of us is born with the knowledge of any particular language. We get that from having to interact with the people around us.

That works fine on a small scale, but what if you want to get the whole world talking? There are only a few basic possibilities:
  • One language: It's the only game in town. QED.
  • Lingua Franca: If there have to be multiple languages, at least have one that everyone speaks.
  • Multilingualism: If you want to understand someone, learn to speak their language or get them to speak yours.
  • Translate: If someone doesn't speak your language, get someone to translate what they're saying and likewise for you.
The world has never in recorded history been even close to monolingual, but it's not for lack of people making the suggestion, at some times more forcefully than at others.

Multilingualism is common in much of the world. Combined with Metcalfe's Law and/or its cousins, multilingualism tends to promote the emergence of at least a local lingua franca. It's not clear whether this discourages multilingualism and ultimately tends toward the one-language world. It might seem easier just to learn the lingua franca, but children have no trouble learning multiple languages and there is a strong tendency to maintain the local language when the lingua franca is not local.

Latin is an interesting case. At one time an educated person was expected to know Latin and publish in it. This did not, however, lead to the widespread revival of spoken Latin. One had, after all, to talk to the merchants in the square as well as one's learned colleagues.

Lingua francas (or linguae francae if you prefer) have so far been partial in any case, both in that different regions have different ones and that not everyone in a given region finds need to learn the lingua franca, or speak it as fluently as a native. It's probably too early to know exactly what effect the web will have. You'd want to bet on homogenization, but you wouldn't want to bet the farm.

And that leaves translation. It's expensive, awkward and not completely reliable, but it's still with us. If machines ever get as good as human translators it may even eliminate much of the need for the other approaches.

There are obvious analogies here to standards on the web. One significant difference is that enabling a service to understand a new format or writing a converter is much less expensive than teaching a person (even a child) a new language or hiring a translator.

Tuesday, September 18, 2007

What is this "web" of which I speak?

I've been throwing the term "web" around from the get-go, so what do I mean by it?

I think I came closest to a definition in the post on Deutsch's "Fallacies of Distributed Computing": the web is all resources accessible on the net. That's maybe a little more inclusive than the usual formulations, but I think it's a good target to shoot for.

Now all we have to do is figure out what a "resource" is and what "the net" is, and we're golden.

Happy birthday, Smiley

As far as anyone can determine, Scott Fahlman posted the first message with a :-) smiley 25 years ago today. There's a more complete account on Fahlman's site at CMU.

Cheers :-)!

A distinguishing feature of human networks

OK, so the real distinguishing feature is that humans are involved. Duh. But I had a particular feature in mind ...

In the example of the spread of contract bridge, I guessed that a critical event, then as now, would be a well-known authority publicly endorsing a new idea. Well-known authorities are by definition connected to a large portion of the general population, because large numbers of people read/watch/listen to their pronouncements. This is a classic "small world" feature, which keeps the overall diameter of the graph (the maximum number of "degrees of separation" between any two members) small.

In a network like the internet, there are also a relative few "hubs" that are connected to a large number of more peripheral nodes. These tend to have monstrous bandwidth available and will carry huge amounts of traffic, both in and out. When I publish this post, for example, it will almost certainly pass through one or more of a relatively small number of "backbone" servers along with a mass of completely unrelated information.

In the human case, the well-known authority produces human-sized traffic, just like everyone else. It's just that this information gets broadcast, verbatim, to a large number of people. Similarly, anyone can try to send a message to the authority, but only a human-sized portion of it will actually get through. In practice, the authority's input will be heavily biased toward a small number of trusted people (who thus may be influential without being well-known), with maybe occasional input from random people.

I'm not sure what all the consequences of this may be, though I've been grasping at them in a recent post or two. One way to look at it is that in a human network everyone's CPU and network are more or less the same size, while in a computer network they can vary by orders of magnitude. This in turn affects scalability.

Nor is it at all clear (to me, working off the top of my head here) that throwing more people at the problem would work, even if they'd sign up for it. Even if I decided to have ten or a hundred or a thousand people act together as a virtual hub, there's a limit to how fast they can talk to each other.

Hmm ... to what extent does something like Wikipedia act as a virtual hub?

The small world of contract bridge

In the spread of a new idea, which is more important: the speed at which information flows or the speed at which we absorb it?

I have no doubt that modern communication, including the net, has increased the speed of information, but let's not underestimate our forebears. From a recent New Yorker book review on contract bridge:
The modern version, contract bridge, was created in 1925 by the railroad heir and master yachtsman Harold Stirling Vanderbilt, who had been annoyed by what he felt were deficiencies in the previous version, auction bridge. Vanderbilt was a passenger on a ship that was travelling from Los Angeles to Havana by way of the Panama Canal, and on the evening of October 31st, while playing with three friends, he introduced several improvements that he’d been mulling over, including a method of scoring that required players to more accurately assess, during the bidding, the number of tricks they would take, a prediction known as a contract. Vanderbilt shared his ideas with a few other friends in Newport and New York, and his game spread across the country and around the world at almost unbelievable speed. “Half a year after Vanderbilt’s voyage,” McPherson writes, “a notice appeared in the Los Angeles Times announcing that a Chicago woman was suing her husband for divorce on the inexcusable grounds that he trumped her ace.”
I doubt that communication speed was the limiting factor here. More likely, it was the time required for a player to try out the new rules, mull them over, deem them good and decide to try to introduce them in their next game. By the small world principle, it doesn't take too many such hops to reach the entire community.

From a graph theory point of view, you have a set of players connected by lines of communication (we play with the so-and-so's at our Tuesday game; I correspond with such-and-such; we all read thus-and-such's column in the Times). Propagation through the graph is a matter of incubation time (playing, mulling it over) and transmission time (the epidemiological terms are deliberate here; I'm sure there is quite a bit of relevant research in that field).

In the present case, there is probably a classic small world graph, with a few well-connected nodes (major celebrities like Vanderbilt and the major columnists) and lots of less-connected nodes (the Tuesday night games of the world). It would be interesting to see when the new rules first appeared in print. It would also be interesting to see how long it took contract bridge to supplant auction. That's clearly not a matter of propagation speed.
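
As a rough illustration of the incubation-versus-transmission point, here is a minimal sketch that walks a toy graph breadth-first and charges every hop the same cost: transmission time plus the receiver's incubation time. The graph, the node names and the day counts are all invented for the example. I have no actual data on how contract bridge spread.

```python
from collections import deque


def adoption_times(graph, source, incubation_days, transmission_days):
    """Earliest day each player adopts the new rules, starting from `source` on day 0.

    Each hop costs transmission time plus the receiver's incubation time.
    """
    hop_cost = incubation_days + transmission_days
    times = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in times:
                times[neighbor] = times[node] + hop_cost
                queue.append(neighbor)
    return times


# Hypothetical small-world graph: one well-connected columnist, several Tuesday games.
EXAMPLE_GRAPH = {
    "vanderbilt": ["friend", "columnist"],
    "friend": ["vanderbilt", "tuesday_game_a"],
    "columnist": ["vanderbilt", "tuesday_game_a", "tuesday_game_b", "tuesday_game_c"],
    "tuesday_game_a": ["friend", "columnist"],
    "tuesday_game_b": ["columnist"],
    "tuesday_game_c": ["columnist"],
}

slow_mail = adoption_times(EXAMPLE_GRAPH, "vanderbilt", incubation_days=14, transmission_days=7)
instant = adoption_times(EXAMPLE_GRAPH, "vanderbilt", incubation_days=14, transmission_days=0)
```

With these made-up numbers everyone has adopted by day 42 when a message takes a week to arrive, and by day 28 when it arrives instantly. Even infinitely fast communication only trims a third off the total, because the mulling-over time dominates each hop.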

These days communication time is much more limited by the reader than by the medium. If a headline appears in a major news source, it may still take some amount of time before everyone gets around to reading it. Blogs and online news sources shorten the time from writing to availability to practically nothing, but they don't necessarily make me read it that much sooner.

Which has had more impact overall: "Old-school" electronic communication (think telegraph, radio, TV) or the internet? One could make a decent case for the old school.

Sunday, September 16, 2007

Classical music online

This article on classical music sales explores why classical music appears to be benefiting from the internet where everyone else seems to be struggling. It gives a few reasons, only one of which looks convincing for the long run:
  • Classical music is harder to pirate because it demands better sound quality and one generally buys it in bigger blocks. A high-fidelity recording of the Ring Cycle is a lot more bits than a 3-minute pop song in mp3 form. Classical listeners are also more interested in liner notes, biographies of the performers and such. Well, maybe, but people seem to have no trouble pirating DVDs.
  • Classical music requires more sophisticated cataloging. OK, so popular music players tend to assume all you care about is the performer and get confused when you have to talk about the composer as well, and they don't tend to recognize that a single piece can comprise several tracks. That's lame, but so what? This is not an overwhelming technical problem. It's a matter of picking a system and going with it. Libraries have been cataloging classical music by hand for decades or centuries depending on how you count, so that might be a good place to start. More to the point, several players have already had a go. This will not remain a differentiator for long.
  • Classical audiences are more suited to the "long tail". This I'll buy. Classical listeners are more interested in, say, finding several performances of the same piece, or finding obscure composers, as opposed to buying what everyone else is buying. This means sellers can put out tons and tons of selections, the more the better, knowing that while only ten people might buy a given selection, someone will likely buy any given piece and in aggregate volume will be good. I don't think this is unique to the classical world, but it's not what's shifting millions of units at the top of the charts.
The main point I want to make here is that the "defensive" technological points (that is, the first two), don't wash. Piracy can't be lower for classical music because it's harder. It has to be lower for other reasons.

You could read this two ways:
  1. Once classical listeners figure out they can pirate, the game is up.
  2. In some markets, there is less desire and/or incentive to pirate.
I personally doubt that classical listeners are less technically savvy than everyone else (and the article claims the opposite). That makes understanding (2) interesting and important.

I don't know what to call it, but I know I don't like it

You know the story. I've got an appointment tomorrow. My phone knows about it. My calendar program knows about it and can even tell the phone. The other person's corporate schedule knows about it. The other person's PDA knows about it. That's four copies of the same thing, variously synced up automatically, semi-automatically or manually (as when we both agree verbally on the time and place, then each enter the information into the appropriate system).

There ought to be one resource with one URN and (potentially) several URLs. Instead we just have the copies and the potential URLs.

I've heard several names for this, but none of them quite works for me. I'd be glad to hear more and better:
  • Silos: You know, those tall towers in the countryside, each full of, well, something, and each separate from the others. Except that a silo is generally full of silage. You take stuff like corn stubble that the combine leaves behind, dump it in a silo (maybe the familiar tower, maybe just a big trench or plastic bag), let it ferment and then feed it to the livestock in the winter. Not the flow of information we're talking about (or hmm ... maybe it is).
  • Balkanization: Refers, of course, to the Balkans, small states formed from the breakup of the Ottoman Empire, and later Yugoslavia. Each has its own government, language, culture etc. and they don't necessarily cooperate all that well. That matches up pretty well with a mess of operating systems, file formats and so forth, but my cell phone didn't suddenly declare independence from my laptop. I'd also prefer to stay out of recent geopolitical history out of deference to those actually involved.
  • Fragmentation: Again, this assumes there was a coherent whole to begin with.
  • Fiefdoms: This may be closest. Fiefdoms could arise for all kinds of reasons. Each had its own house rules and customs. Sometimes they would cooperate, sometimes they would guard their resources like, well, little fiefdoms. The defining feature of a fiefdom is allegiance to a higher rank of nobility, ultimately up to the monarch. My laptop and cell phone both owe fealty to me. My counterpart's cell phone is liege to my counterpart or to corporate, depending. If you buy that, the metaphor works fairly well.

Friday, September 14, 2007

"Ten Future Web Trends"

This article on Read/Write Web, which lays out ten likely future trends for the web, has been getting bookmarked a bit lately. It's a perfectly good article as it stands, but here are some comments on it anyway, by way of possibly staking out some sort of overall position, philosophy, weltanschauung or whatever. I'll try to keep the commentary shorter than the article, but I make no promises.

The Semantic Web:
The basic idea, from Tim Berners-Lee's Weaving the Web, is that "[m]achines become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers." There are any number of refinements, restatements and variations, and there is probably more than the usual danger of the term being applied to anything and everything, but that's the concept in a nutshell, straight from the horse's mouth (now there's an image).

This is really material for several posts (or books), but my quick take is that the web will indeed gradually become more machine-understandable. Presumably we'll know more precisely what that means when we see it.

I'm not sure whether that will happen more because data becomes more structured or because computers get better at extracting latent structure from not-deliberately-structured data. Either way, I don't believe we need anywhere near all data on the web to be machine-understood in order to benefit, and conversely, I'm not sure to what extent all of it ever will be machine-understandable. Is everything on the web human-understandable?

Artificial Intelligence: Well. What would that be? AI is whatever we don't understand how to do yet. Not so long ago a black box that you type a few words into and get back relevant documents would have been AI. Now it's a search engine. In the context of the web, AI will be things like non-trivial image processing (find me pictures of mountains regardless of whether someone tagged them "mountain") or automatic translation.
(Translation seems to be slowly getting better. The sentence above, round-tripped by way of Spanish with a popular translation engine, came back as "In the context of the fabric, the AI will be things like the process of image non-trivial (encuéntreme the mountain pictures without mattering if somebody marked with label "mountain to them") and the automatic translation". Believe it or not, this looks to be an improvement over, say, a year ago)
The article mentions cellular automata and neural networks, two incarnations of massively parallel computing. I tend to think the technology matters much less than understanding the problem.

It took quite a while to figure out that playing chess is (relatively) easy and walking is fiendishly difficult (particularly if you're supposed to see where you're walking). It also took a while to figure out that matching up raw words and looking at the human-imposed structure of document links works better than trying to "understand" documents in any deep sense. I call this theme "dumb is smarter" and one of these days I'll round up a good list of examples.

As the article points out, AI and the semantic web are related. One way to look at it: A machine that could "understand" the web as well as a human would be a de facto AI.

Virtual worlds: In the hardcore version, we all end up completely virtual beings, our every sensory input supplied electronically. Or perhaps we no longer physically exist at all. I'm not willing to rule this sort of thing, or at least the first version, out for the still-fairly-distant future, but in the near term there are some obstacles.

I've argued that our senses are (probably) dominated by sight and sound and that available bandwidth is more or less enough to saturate those by now. But it's pretty easy to fake out the eyes and ears. Faking out the vestibular sense or the kinesthetic senses may well require surgery. Even smell has proved difficult. So the really fully immersive virtual world is a ways away and bandwidth is not the problem.

In the meantime, as the article points out, lots of interesting stuff is going on, both in creating artificial online worlds and in making the physical world more accessible online. Speaking for myself, other than dipping my toes in MUD several years back I'm not virtualized to any significant degree, but Google Earth is one of my personal favorite timesinks.

Interestingly, William Gibson himself has done a reading in Second Life. Due to bandwidth limitations, it was a fairly private affair. Gibson's take:
"I think what struck me most about it was how normal it felt. I was expecting it to be memorably weird, and it wasn't," he says. "It was just another way of doing a reading."
I think this is an example of the limitations imposed by the human element of the web. We can imagine a lot of weird stuff, but we can only deal with so much weirdness day to day.

Gibson also argues that good old fashioned black-marks-on-a-white-background is a pretty good form of virtual reality, using the reader's imagination as a rendering engine. I tend to agree.

Mobile: I've already raved a bit about a more-mobile web experience. To me mobile computing is more about seamlessness than the iPhone or any particular device. Indeed, it's a lot about not caring which particular device(s) you happen to be using at a given time or where you're using them.

Attention Economy: "Paying attention" is not necessarily just a metaphor. The article references a good overview you may want to check out if the term is not familiar.

OK, we have to pay for all this somehow, and it's pretty clear the "you own information by owning a physical medium" model that worked so well for centuries is breaking down. But if no one pays people to create content, a lot less will be created (hmm ... I'm not getting paid to write this).

Because we humans can only process so much information, and there's so much information out there, our attention is relatively scarce and therefore likely to be worth something. Ultimately it's worth something at least in part because what we pay attention to will influence how we spend money on tangible goods or less-tangible services. So we should develop tools to make that explicit and to reduce the friction in the already-existing market for attention.

My take is that this will happen, and is happening, more due to market forces than to specific efforts. That doesn't mean that such efforts are useless, just that markets largely do what they're going to do. They make the waves, we ride them and build breakwaters here and there to mitigate their worst effects.

Web Sites as Web Services: The idea here is that information on web sites will become more easily accessible programmatically and generally more structured. This is one path to the Semantic Web. It's already happening and I have no doubt it will happen more. A good thing, too.

On the other hand, I wonder how far this will go how fast. Clearly there is a lot of information out there that would be quite a bit more useful with just a bit more structure. It would also be nice if everyone purveying the same kind of information used the same structure. Microformats are a good step in this direction.

My guess is that tooling will gradually have more and more useful stuff baked in, so that when you put up, say, a list of favorite books it will be likely to have whatever "book" microformatting is appropriate without your having to do much. For example, if you copy a book title from Amazon or wherever, it should automagically carry stuff like the ISBN and the appropriate tagging.

In other words, it will, and will have to, become easier and easier for non-specialists to add in metadata without realizing they're doing it. I see this happening by fits and starts, a piece at a time, and incompletely, but even this will add considerable value and drive the tooling to get better and better.

Online Video/Internet TV: I don't really have much to add to what the article says. It'll be interesting and fun to (literally) watch this play out. It'll be particularly interesting to see if subscription models can be made to work. If so, I doubt it will be because of some unbreakable protection scheme.

Rich Internet Apps: I occasionally wonder how much longer browsers will be recognizable as such. The features a browser provides -- tabs, searching, bookmarks and such -- are clearly applicable to any resource and sure enough, editors, filesystem explorers and such are looking more like browsers. OS's are getting into the act, too, allowing you to mount web resources as though they were local objects, perhaps doing some conversion or normalization along the way.

Browsers are also growing more and more toolbars, making them look more like desktops, and desktops are growing widgets that show information you used to get through a browser. Behind the scenes, toolkits will continue to go through the usual refactoring, making it easier to present resource X in context Y.

The upshot is that the range of UI options gets bigger and the UI presented for various resources gets better tuned to both the resource and your preferences. Good stuff, and it will continue to happen because it's cool and useful and people can get paid to make it happen.

International Web: Well, yeah!

Personalization: This is a thread through a couple of the trends above, including Attention Economy and Rich Internet Apps. It will also aid internationalization. The big question, of course, is privacy. But that's a thread in itself.

Thursday, September 13, 2007

What you look like to your computer

We're used to thinking of computers as mind-bogglingly fast, but it's useful to look at it the other way as well: from the hardware's point of view, people are mind-bogglingly slow.

A decent CPU can now execute huge numbers of instructions in the time it takes for my fingers to move from one key to the next. If you assign some human-scale unit to an instruction cycle, actual humans move at a geologically slow pace.
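
To put a rough number on that, here is a back-of-the-envelope sketch. Both inputs are assumptions picked for round numbers rather than measurements: a CPU retiring about a billion instructions per second, and a typist leaving about a fifth of a second between keys.

```python
INSTRUCTIONS_PER_SECOND = 1e9      # assumed: about one instruction per nanosecond
SECONDS_BETWEEN_KEYS = 0.2         # assumed: a reasonably brisk typist
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def instructions_between_keystrokes(ips=INSTRUCTIONS_PER_SECOND, gap=SECONDS_BETWEEN_KEYS):
    """How many instructions the CPU can execute between two keypresses."""
    return ips * gap


def scaled_gap_in_years(ips=INSTRUCTIONS_PER_SECOND, gap=SECONDS_BETWEEN_KEYS):
    """If each instruction took one human second, how many years pass between keypresses?"""
    return instructions_between_keystrokes(ips, gap) / SECONDS_PER_YEAR
```

Under those assumptions the machine gets through a couple of hundred million instructions per keystroke. Stretch each instruction to one second and it waits a bit over six years for my next letter.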

Storage and bandwidth have to keep up with the CPU (more or less), so it's the same story there. An email (or this post) is tiny compared to a terabyte disk. Audio and video are still computer-sized, but this will change. Human bandwidth is shifting, in our lifetimes, from completely overwhelming computer capacities to being dwarfed by them.

For my money this disparity is the hole in Searle's "Chinese room" argument. The scenario with a person in the room would take millions of years at the least to play out, if scaled to match any plausible AI.

Wednesday, September 12, 2007

Limits on human bandwidth

Along the lines of the "Rules of Thumb" posts:

I won't claim that the internet has changed nothing. It's at least changed how far and how fast news travels, and this has a number of subtle and unsubtle effects. But no matter how fast the network, storage and processors, as long as people are using the web there will be certain hard limits. Some that come to mind:

How much information can a person absorb?

If we're talking about raw sensory input, which appears to be dominated by sight and (to a lesser extent) sound, then my guess is that HD video comes pretty close to the limit. That's on the order of 20Mbit/s, or roughly 10GB/hour, 250GB/day or 100TB/year. I'm taking the MPEG compressed rate as opposed to the raw frame rate, as that more closely represents what the visual system is really processing (successful lossy video compression is finely tuned to the way the visual system works).
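For anyone who wants to check the rounding, here is a quick back-of-the-envelope sketch of those figures. The only input is the 20Mbit/s guess above; the function and constant names are made up for illustration, and decimal units (1GB = 10^9 bytes) are assumed.

```python
# Back-of-the-envelope check of the HD-video figures.
# Assumption: 20 Mbit/s compressed stream, decimal units (1 GB = 1e9 bytes).
MBIT_PER_SECOND = 20

HOUR = 3600          # seconds
DAY = 24 * HOUR
YEAR = 365 * DAY


def stream_bytes(mbit_per_s, seconds):
    """Bytes produced by a constant-rate stream over the given time."""
    return mbit_per_s * 1_000_000 / 8 * seconds


per_hour_gb = stream_bytes(MBIT_PER_SECOND, HOUR) / 1e9
per_day_gb = stream_bytes(MBIT_PER_SECOND, DAY) / 1e9
per_year_tb = stream_bytes(MBIT_PER_SECOND, YEAR) / 1e12

print(f"{per_hour_gb:.0f} GB/hour, {per_day_gb:.0f} GB/day, {per_year_tb:.1f} TB/year")
```

The exact values come out a little under the rounded ones in the text (about 9GB/hour, 216GB/day and 79TB/year), which is close enough for an order-of-magnitude argument.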

Given that disk capacity increases about 100-fold every decade, in ten years one could reasonably afford to buy enough storage to store a fairly immersive audio/video stream that would take a year, 24/7, to watch. Taking time out for things like, um, sleeping and eating, it would probably be more like two or three years.
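The ten-year claim can be sketched the same way. The 100-fold-per-decade growth rate is the figure from this post; the starting point of 1TB as an affordable disk today is my assumption, as are all the names in the code.

```python
import math

# Assumptions, all rough: 100x capacity growth per decade (from the post),
# 1 TB as an affordable disk today, 100 TB for a year of HD-quality stream.
GROWTH_PER_DECADE = 100
AFFORDABLE_TODAY_TB = 1.0
STREAM_TB_PER_YEAR = 100.0


def affordable_capacity_tb(years_from_now):
    """Capacity the same money buys after the given number of years."""
    return AFFORDABLE_TODAY_TB * GROWTH_PER_DECADE ** (years_from_now / 10)


def years_until_affordable(target_tb):
    """Years until the target capacity costs what 1 TB costs today."""
    return 10 * math.log(target_tb / AFFORDABLE_TODAY_TB, GROWTH_PER_DECADE)


print(affordable_capacity_tb(10))                  # capacity after a decade, in TB
print(years_until_affordable(STREAM_TB_PER_YEAR))  # years to reach a year of stream
```

Under those assumptions the same money buys about 100TB after ten years, which is where the "year of video" figure comes from. Change the starting capacity or the growth rate and the date moves accordingly.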

Conversely, if you wanted to record everything you saw and heard, you could do it for a reasonable -- and decreasing -- annual cost in the not-too-distant future. Anyone could do it, unobtrusively. "Be careful, his bowtie is really a camera".

If you want to boil that raw content to the more abstract images stored in the rest of the brain, there's a pretty well-established medium that covers that reasonably well, though not perfectly: words and pictures. It's trivial now to store all the words a person could reasonably process or produce in a lifetime, or even every mouse click or keypress, timestamped to, say, the nearest millisecond.
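To put a rough number on "trivial", here is a sketch of a lifetime keystroke log. Everything specific in it is an assumption for illustration: a 12-byte record (an 8-byte millisecond timestamp plus a 4-byte event code), 50,000 events a day and 80 years of logging.

```python
import struct

# Hypothetical record layout: 8-byte signed millisecond timestamp followed by
# a 4-byte unsigned event code, little-endian with no padding.
RECORD_FORMAT = "<qI"
RECORD_BYTES = struct.calcsize(RECORD_FORMAT)

# Assumptions: a heavy user generating 50,000 key/mouse events a day for 80 years.
EVENTS_PER_DAY = 50_000
YEARS = 80


def pack_event(timestamp_ms, event_code):
    """Serialize one timestamped key or mouse event."""
    return struct.pack(RECORD_FORMAT, timestamp_ms, event_code)


lifetime_events = EVENTS_PER_DAY * 365 * YEARS
lifetime_gb = lifetime_events * RECORD_BYTES / 1e9

print(f"{RECORD_BYTES} bytes/event, {lifetime_gb:.2f} GB for a lifetime")
```

With these numbers a lifetime of input comes to under 20GB uncompressed, a small fraction of a single terabyte disk.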

A picture on disk is probably worth more like hundreds of thousands of words, but storing tens of thousands of pictures is no big deal these days, either. That's a lot of pictures, if you want to take the time to look at them.

In short, when it comes to words and pictures, the limitation is not what the computer can handle but what the people using it can handle. Audio and video are rapidly approaching the same state.

How many people can a person keep in touch with?

With modern technology, I can now keep in touch with people all over the world, but I can't keep in touch with any more people than I ever could. Somehow the advent of the internet didn't add any new hours to the day. I don't have objective numbers handy on how many people people interact with, though I'm sure there are studies on the subject. At a guess, I'd expect the usual log-normal distribution, with a handful of people accounting for most of a typical person's interactions and maybe a few dozen accounting for almost all of it.

Balancing that is the small world property of social networks and many other structures, including the web itself. In such cases the degree of separation between any two individuals grows very slowly, if at all, as the network expands. In the movie version, there are no more than six degrees of separation between any two people. The actual number (neglecting any groups that really are totally isolated) is probably larger but not much larger. [See these later posts for a bit more on the topic]

How quickly can a group reach consensus?

Whether it's everyone deciding that magenta is the new chartreuse or a deliberative body deciding that the bylaws should be amended to allow for amendments to amendments, the game has probably not changed appreciably in recorded history.

In the mass-consensus case of global pop culture, the scope is bigger, but one of the "small world network" results is that both the average person's view of the social network and the overall structure of the network itself change little as the network grows. In other words, fashions in the malls of the world are driven by the same basic forces as those in, say, Louis XV's France or Julius Caesar's Rome, just on a bigger scale.

In the small-scale case of a deliberative group, the limiting factor is how quickly the members can get the others to understand and (ideally) accept their view of the world. Again, it seems to matter little whether the members are sitting around a campfire or exchanging messages electronically.

I do find it interesting that most of the distributed groups I've been involved with develop a mix of email, live conferencing and face-to-face meetings. Most of the routine stuff can be worked out via email, sometimes you have to pick up the phone and talk to a particular person, and every so often you should all meet. Most of those meetings can be by phone/IRC, but a few of them should have everyone sitting in the same room. It'll be a while yet before technology can completely replace this.

Friday, September 7, 2007

The Million Dollar Homepage

I'm sure everyone remembers this one -- it was only a couple of years ago and got tons of buzz -- but what an interesting tale. It's high on my (notional) list of neat hacks, social/business category.

First, Alex Tew is looking for ways to pay his tuition and decides to follow Mae West's "million men with a dollar" approach, selling off a 1000x1000 pixel image on milliondollarhomepage.com for $1/pixel, in 10x10 pixel blocks. With your block you also get a hover and a link to the site of your choice. Once you buy it, you can't change it, so choose carefully, grasshopper.

The 10x10 minimum was because a single pixel would be hard to click on and the resulting page would look "ghastly". But if you consider the net effect of hundreds of completely unrelated parties each trying to make a small block of pixels as attention-getting as possible ... well, "garish" doesn't begin to describe it. If you mashed up Liberace, Elton John and Bootsy Collins and ran the result through a blender you might be in the neighborhood. So I'll go with "ghastly" on this one.

As soon as word starts to get out on this "You've got to be joking ... no, it's serious ... why didn't I think of that?" idea, traffic goes through the roof. At one point the page is #127 on Alexa. Naturally, everyone wants to be part of the action.

Then it gets interesting. Someone buys up a largish group of adjacent blocks and rents out the sites behind them. Makes perfect sense. Copycats spring up, offering lower rates, whizzier features or both. Some even advertise on the page itself. A new term, "pixel advertising" is coined. Are we seeing the birth of a whole new business model here? Dare we say, a whole new economy?

Well, no. The original home page accomplishes its task admirably. All million pixels are snapped up quickly (I dithered over buying a block myself but couldn't think of anything worth $100 to put on it. Do I regret that? Not really.) The final 1000 pixels fetch $38,100 on eBay. Well done, Alex!

The copycats? Not so much. Even Tew's own offshoot, Pixelotto, is way undersubscribed at about 75,000 pixels sold and looks very unlikely to sell out before its self-imposed December 2007 deadline. Pixels1.com, which advertised on the M$HP itself, selling pixels for a penny with animation allowed, is sparser still. Another site, onecentads.com, is about half full. That's about $5000 in sales, less $900 for a 30x30 ad on M$HP. Its image is still there but doesn't appear to be clickable.
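The onecentads.com arithmetic, spelled out. This assumes the site uses the same 1000x1000 grid as the original (which is what the $5000 figure implies) and that "about half full" means exactly half; the names are invented for the sketch, and amounts are kept in whole cents to avoid floating-point noise.

```python
# Assumptions: a 1000x1000 grid like the original page, exactly half sold,
# one cent per pixel on the copycat, one dollar per pixel on the M$HP.
GRID_PIXELS = 1000 * 1000
COPYCAT_PRICE_CENTS = 1
MDHP_PRICE_CENTS = 100
FILL_FRACTION = 0.5

pixels_sold = int(GRID_PIXELS * FILL_FRACTION)
gross_cents = pixels_sold * COPYCAT_PRICE_CENTS

# The 30x30 ad the copycat bought on the original page.
ad_cost_cents = 30 * 30 * MDHP_PRICE_CENTS

net_cents = gross_cents - ad_cost_cents

print(f"gross ${gross_cents / 100:,.0f}, ad ${ad_cost_cents / 100:,.0f}, "
      f"net ${net_cents / 100:,.0f}")
```

That leaves roughly $4100 before hosting and any other costs, against the million dollars the original took in.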

None of this seems surprising. The original M$HP got all kinds of traffic because it was newsworthy. That traffic lasted as long as it remained newsworthy, which was approximately until the last pixel sold. Then, poof. Old news, no traffic, or at least not a lot of paying traffic. People still visit the page itself from time to time, but I doubt many click through to the advertisers. I expect even fewer click through to the copycats (at least one appears to have disappeared completely), much less their advertisers.

If you managed to buy in early you probably got a good jolt of traffic, but you were probably just expecting to own a piece of an internet time capsule. That was all the site ever promised, after all. If you bought in late, you were probably expecting tons of traffic, but you ended up owning a piece of an internet time capsule.

Caveat emptor.

Wednesday, September 5, 2007

RFID Guardian

OK, here's the neatest hack I've seen in a while:

Melanie Rieback, a graduate student [now assistant professor -- gefeliciteerd!] at the Vrije Universiteit Amsterdam, has built the RFID Guardian, a firewall for RFID tags. You tell it who you want to be able to read which of your tags, and it jams any requests you don't want.

To do this, it uses about 12K lines of carefully-crafted code, an RTOS and "a beast" of a processor in order to meet the hard real-time deadlines required to fake out an RFID reader.

Maybe this was the "something missing" from the previous post on California's RFID law?

(RF)ID privacy in California

It is now illegal in California to implant an identifying device under someone's skin without permission.

The law, introduced by Silicon Valley state senator Joe Simitian, seems reasonably well-drafted (keeping in mind that I'm not a lawyer) and is to be "liberally construed so as to protect privacy and bodily integrity."

This seems like a reasonable step in a good direction, but I can't help feeling something's missing, somewhere.

GPS, transportation and privacy

An advocacy group representing about 20% of New York's yellow cabs is calling a strike [This link has rotted away] for today and tomorrow over an upcoming requirement for cabs to carry a GPS and credit card payment system. The cabbies' beef, of course, is that this will allow The Man to know exactly where they are and have been. The Man, of course, argues that this will be better for customers and ultimately for the cabbies as well.

Long-haul truckers have been through the same conflict. As I understand it, GPS in some form is a fact of life, but there is definitely still resistance to increased monitoring.

I'm not going to take a position here on who's right. It's worth thinking over, though. There are some pretty similar, and significant, privacy issues in the 4G picture I painted, which various players are working hard to make happen.

What killed parallel computing?

When I was an undergrad, parallel computing was the Next Big Thing. By "parallel computing" I mean a large number of CPUs that either share memory or have relatively little local memory but pass (generally small) messages on a very fast local message bus. This is as opposed to distributed computing, where CPUs have lots of local memory and communicate in larger chunks over relatively slow networks.

So what happened? Multiple choice:
  • What do you mean "what killed it?" Supercomputers today are all massively parallel. Next time you do a Google search, thank a cluster.
  • Distributed computing killed it. If you want to really crunch a lot of data, get a bunch of people on the net to do it with their spare cycles, a la SETI@home and GIMPS.
  • Moore's law killed it. Most people don't need more than one or two processors because they're so darn fast. Sure you can use parallel techniques if you really need to, but most people don't need to.
Personally, I'd go with "all of the above" (but then, I wrote the quiz).

Another worthwhile question is "What's the difference between parallel and distributed anyway?" The definitions I gave above are more than a bit weaselly. What's "relatively small"? What's the difference between a few dozen computers running GIMPS on the web and a few-dozen-node Beowulf? At any given time, the Beowulf ought to be faster, due to higher bandwidth and lower latency, but today's virtual Beowulf ought to be as fast as a real one from N years ago.

A distinction I didn't mention above is that classic parallel algorithms have all the nodes running basically the same code, while nodes in distributed systems specialize (client-server being the most popular case). From that point of view the architecture of the code is more important than the hardware running it. And that's probably about right.
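Here is a toy sketch of that distinction, with threads standing in for nodes (an assumption made purely to keep the example small; real parallel and distributed systems look nothing like this in scale). Both functions compute the same sum of squares. In the first, every worker runs the same routine on its own slice of the data. In the second, a specialized "server" answers requests from a "client". All names are invented for illustration.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def partial_sum_of_squares(chunk):
    """The one routine every worker runs; only the data differs."""
    return sum(x * x for x in chunk)


def parallel_style(data, workers=4):
    """Classic parallel shape: same code everywhere, data split across nodes."""
    chunks = [data[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


def distributed_style(data):
    """Client-server shape: nodes specialize and exchange messages."""
    requests, replies = queue.Queue(), queue.Queue()

    def server():
        # Specialized role: does nothing but answer "square this" requests.
        while True:
            item = requests.get()
            if item is None:      # sentinel: the client is done
                break
            replies.put(item * item)

    worker = threading.Thread(target=server)
    worker.start()

    # Client role: sends requests, then gathers and combines the replies.
    for x in data:
        requests.put(x)
    requests.put(None)
    worker.join()

    total = 0
    while not replies.empty():
        total += replies.get()
    return total


numbers = list(range(10))
print(parallel_style(numbers), distributed_style(numbers))
```

Both versions run on exactly the same hardware here, which is rather the point: what separates them is how the code is organized, not what it runs on.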

A few more "Rules of Thumb" highlights

More tidbits from Rules of Thumb in Data Engineering:
  • In ten years RAM will cost what disk does today.
  • A (full-time) person can administer a million dollars' worth of disk storage (if I got the math right, that's about 3PB these days -- it was 30TB in 1999).
  • In 1999, a CPU could keep 40-50 disks busy (and for some applications it should be doing just that). The number is probably not changing very quickly.
  • At the time the article was written, two ratios appeared to be dropping rapidly. If the predictions held true (I haven't checked yet), the impact could be significant:
    • The CPU cost of network access vs. disk access, measured both per message and per byte.
    • The dollar cost per byte transferred of WAN vs. LAN
  • You should pretty much always cache a web page.