Sunday, September 16, 2007

I don't know what to call it, but I know I don't like it

You know the story. I've got an appointment tomorrow. My phone knows about it. My calendar program knows about it and can even tell the phone. The other person's corporate schedule knows about it. The other person's PDA knows about it. That's four copies of the same thing, variously synced up automatically, semi-automatically or manually (as when we both agree verbally on the time and place, then each enter the information into the appropriate system).

There ought to be one resource with one URN and (potentially) several URLs. Instead we just have the copies and the potential URLs.
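
To make that concrete, here's a minimal sketch in Python of "one name, many locations". The URN and URLs are made up for illustration:

```python
# One stable, location-independent name for the appointment itself,
# plus however many places currently hold a copy of it. All of the
# identifiers and URLs here are hypothetical.
appointment = {
    "urn": "urn:uuid:2f1c8a4e-1234-5678-9abc-def012345678",
    "urls": [
        "https://calendar.example.com/events/12345",
        "https://corp.example.org/schedule/2007-09-17/meeting",
        "local://phone/calendar/entry/42",
    ],
}

def same_event(a, b):
    """Two copies are the same appointment iff they share a URN."""
    return a["urn"] == b["urn"]
```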

I've heard several names for this, but none of them quite works for me. I'd be glad to hear more and better:
  • Silos: You know, those tall towers in the countryside, each full of, well, something, and each separate from the others. Except that a silo is generally full of silage. You take stuff like corn stubble that the combine leaves behind, dump it in a silo (maybe the familiar tower, maybe just a big trench or plastic bag), let it ferment and then feed it to the livestock in the winter. Not the flow of information we're talking about (or hmm ... maybe it is).
  • Balkanization: Refers, of course, to the Balkans, small states formed from the breakup of the Ottoman Empire, and later Yugoslavia. Each has its own government, language, culture etc. and they don't necessarily cooperate all that well. That matches up pretty well with a mess of operating systems, file formats and so forth, but my cell phone didn't suddenly declare independence from my laptop. I'd also prefer to stay out of recent geopolitical history out of deference to those actually involved.
  • Fragmentation: Again, this assumes there was a coherent whole to begin with.
  • Fiefdoms: This may be closest. Fiefdoms could arise for all kinds of reasons. Each had its own house rules and customs. Sometimes they would cooperate, sometimes they would guard their resources like, well, little fiefdoms. The defining feature of a fiefdom is allegiance to a higher rank of nobility, ultimately up to the monarch. My laptop and cell phone both owe fealty to me. My counterpart's cell phone is liege to my counterpart or to corporate, depending. If you buy that, the metaphor works fairly well.

Friday, September 14, 2007

"Ten Future Web Trends"

This article on Read/Write Web, which lays out ten likely future trends for the web, has been getting bookmarked a bit lately. It's a perfectly good article as it stands, but here are some comments on it anyway, by way of possibly staking out some sort of overall position, philosophy, weltanschauung or whatever. I'll try to keep the commentary shorter than the article, but I make no promises.

The Semantic Web:
The basic idea, from Tim Berners-Lee's Weaving the Web, is that "[m]achines become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers." There are any number of refinements, restatements and variations, and there is probably more than the usual danger of the term being applied to anything and everything, but that's the concept in a nutshell, straight from the horse's mouth (now there's an image).

This is really material for several posts (or books), but my quick take is that the web will indeed gradually become more machine-understandable. Presumably we'll know more precisely what that means when we see it.

I'm not sure whether that will happen more because data becomes more structured or because computers get better at extracting latent structure from not-deliberately-structured data. Either way, I don't believe we need anywhere near all data on the web to be machine-understood in order to benefit, and conversely, I'm not sure to what extent all of it ever will be machine-understandable. Is everything on the web human-understandable?

Artificial Intelligence: Well. What would that be? AI is whatever we don't understand how to do yet. Not so long ago a black box that you type a few words into and get back relevant documents would have been AI. Now it's a search engine. In the context of the web, AI will be things like non-trivial image processing (find me pictures of mountains regardless of whether someone tagged them "mountain") or automatic translation.
(Translation seems to be slowly getting better. The sentence above, round-tripped by way of Spanish with a popular translation engine, came back as "In the context of the fabric, the AI will be things like the process of image non-trivial (encuéntreme the mountain pictures without mattering if somebody marked with label "mountain to them") and the automatic translation". Believe it or not, this looks to be an improvement over, say, a year ago.)
The article mentions cellular automata and neural networks, two incarnations of massively parallel computing. I tend to think the technology matters much less than understanding the problem.

It took quite a while to figure out that playing chess is (relatively) easy and walking is fiendishly difficult (particularly if you're supposed to see where you're walking). It also took a while to figure out that matching up raw words and looking at the human-imposed structure of document links works better than trying to "understand" documents in any deep sense. I call this theme "dumb is smarter" and one of these days I'll round up a good list of examples.

As the article points out, AI and the Semantic Web are related. One way to look at it: A machine that could "understand" the web as well as a human would be a de facto AI.

Virtual worlds: In the hardcore version, we all end up completely virtual beings, our every sensory input supplied electronically. Or perhaps we no longer physically exist at all. I'm not willing to rule out this sort of thing, or at least the first version, for the still-fairly-distant future, but in the near term there are some obstacles.

I've argued that our senses are (probably) dominated by sight and sound, and that available bandwidth is more or less enough to saturate those by now; the eyes and ears are pretty easy to fake out. But faking out the vestibular sense or the kinesthetic senses may well require surgery. Even smell has proved difficult. So the really fully immersive virtual world is a ways away, and bandwidth is not the problem.

In the meantime, as the article points out, lots of interesting stuff is going on, both in creating artificial online worlds and in making the physical world more accessible online. Speaking for myself, other than dipping my toes in MUD several years back I'm not virtualized to any significant degree, but Google Earth is one of my personal favorite timesinks.

Interestingly, William Gibson himself has done a reading in Second Life. Due to bandwidth limitations, it was a fairly private affair. Gibson's take:
"I think what struck me most about it was how normal it felt. I was expecting it to be memorably weird, and it wasn't," he says. "It was just another way of doing a reading."
I think this is an example of the limitations imposed by the human element of the web. We can imagine a lot of weird stuff, but we can only deal with so much weirdness day to day.

Gibson also argues that good old-fashioned black-marks-on-a-white-background is a pretty good form of virtual reality, using the reader's imagination as a rendering engine. I tend to agree.

Mobile: I've already raved a bit about a more-mobile web experience. To me mobile computing is more about seamlessness than the iPhone or any particular device. Indeed, it's a lot about not caring which particular device(s) you happen to be using at a given time or where you're using them.

Attention Economy: "Paying attention" is not necessarily just a metaphor. The article references a good overview you may want to check out if the term is not familiar.

OK, we have to pay for all this somehow, and it's pretty clear the "you own information by owning a physical medium" model that worked so well for centuries is breaking down. But if no one pays people to create content, a lot less will be created (hmm ... I'm not getting paid to write this).

Because we humans can only process so much information, and there's so much information out there, our attention is relatively scarce and therefore likely to be worth something. Ultimately it's worth something at least in part because what we pay attention to will influence how we spend money on tangible goods or less-tangible services. So we should develop tools to make that explicit and to reduce the friction in the already-existing market for attention.

My take is that this will happen, and is happening, more due to market forces than to specific efforts. That doesn't mean that such efforts are useless, just that markets largely do what they're going to do. They make the waves, we ride them and build breakwaters here and there to mitigate their worst effects.

Web Sites as Web Services: The idea here is that information on web sites will become more easily accessible programmatically and generally more structured. This is one path to the Semantic Web. It's already happening and I have no doubt it will happen more. A good thing, too.

On the other hand, I wonder how far this will go how fast. Clearly there is a lot of information out there that would be quite a bit more useful with just a bit more structure. It would also be nice if everyone purveying the same kind of information used the same structure. Microformats are a good step in this direction.

My guess is that tooling will gradually have more and more useful stuff baked in, so that when you put up, say, a list of favorite books, it will be likely to have whatever "book" microformatting is appropriate without much effort on your part. For example, if you copy a book title from Amazon or wherever, it should automagically carry along stuff like the ISBN and the appropriate tagging.
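
Something like the following, say. This is only a sketch of what such tooling might emit; as far as I know there's no standard "book" microformat, so the class names and the ISBN are invented for illustration:

```python
def book_markup(title, author, isbn):
    """Render a book reference with machine-readable hooks baked in."""
    return (
        f'<span class="book">'
        f'<span class="title">{title}</span> by '
        f'<span class="author">{author}</span>'
        f'<span class="isbn" style="display:none">{isbn}</span>'
        f'</span>'
    )

# The ISBN here is a placeholder, not the book's real one.
print(book_markup("Weaving the Web", "Tim Berners-Lee", "0-123-45678-9"))
```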

In other words, it will, and will have to, become easier and easier for non-specialists to add in metadata without realizing they're doing it. I see this happening by fits and starts, a piece at a time, and incompletely, but even this will add considerable value and drive the tooling to get better and better.

Online Video/Internet TV: I don't really have much to add to what the article says. It'll be interesting and fun to (literally) watch this play out. It'll be particularly interesting to see if subscription models can be made to work. If so, I doubt it will be because of some unbreakable protection scheme.

Rich Internet Apps: I occasionally wonder how much longer browsers will be recognizable as such. The features a browser provides -- tabs, searching, bookmarks and such -- are clearly applicable to any resource, and sure enough, editors, filesystem explorers and such are looking more and more like browsers. OS's are getting into the act, too, allowing you to mount web resources as though they were local objects, perhaps doing some conversion or normalization along the way.

Browsers are also growing more and more toolbars, making them look more like desktops, and desktops are growing widgets that show information you used to get through a browser. Behind the scenes, toolkits will continue to go through the usual refactoring, making it easier to present resource X in context Y.

The upshot is that the range of UI options gets bigger and the UI presented for various resources gets better tuned to both the resource and your preferences. Good stuff, and it will continue to happen because it's cool and useful and people can get paid to make it happen.

International Web: Well, yeah!

Personalization: This is a thread through a couple of the trends above, including Attention Economy and Rich Internet Apps. It will also aid internationalization. The big question, of course, is privacy. But that's a thread in itself.

Thursday, September 13, 2007

What you look like to your computer

We're used to thinking of computers as mind-bogglingly fast, but it's useful to look at it the other way as well: from the hardware's point of view, people are mind-bogglingly slow.

A decent CPU can now execute huge numbers of instructions in the time it takes for my fingers to move from one key to the next. If you assign some human-scale unit to an instruction cycle, actual humans move at a geologically slow pace.
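
A back-of-the-envelope calculation, with assumed (but I think plausible) numbers for clock speed and typing speed:

```python
# How many instructions fit between two keystrokes? The figures are
# assumptions: a ~3 GHz CPU retiring roughly one instruction per
# cycle, and a typist doing 60 words per minute (~5 keystrokes/word).
cpu_instructions_per_sec = 3e9
keystrokes_per_sec = 60 * 5 / 60.0   # = 5

print(f"{cpu_instructions_per_sec / keystrokes_per_sec:.0e}")
# ~6e8: hundreds of millions of instructions per keystroke
```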

Storage and bandwidth have to keep up with the CPU (more or less), so it's the same story there. An email (or this post) is tiny compared to a terabyte disk. Audio and video are still computer-sized, but this will change. Human bandwidth is shifting, in our lifetimes, from completely overwhelming computer capacities to being dwarfed by them.

For my money this disparity is the hole in Searle's "Chinese room" argument. The scenario with a person in the room would take millions of years at the least to play out, if scaled to match any plausible AI.

Wednesday, September 12, 2007

Limits on human bandwidth

Along the lines of the "Rules of Thumb" posts:

I won't claim that the internet has changed nothing. It's at least changed how far and how fast news travels, and this has a number of subtle and unsubtle effects. But no matter how fast the network, storage and processors, as long as people are using the web there will be certain hard limits. Some that come to mind:

How much information can a person absorb?

If we're talking about raw sensory input, which appears to be dominated by sight and (to a lesser extent) sound, then my guess is that HD video comes pretty close to the limit. That's on the order of 20Mbit/s, or 10GB/hour, 250GB/day or 100TB/year. I'm taking the MPEG-compressed rate as opposed to the raw frame rate, as that more closely represents what the visual system is really processing (because successful lossy video compression is finely tuned to the way the visual system works).
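
For anyone who wants to check my arithmetic:

```python
# Rounding the numbers above, assuming ~20 Mbit/s compressed HD video.
mbit_per_sec = 20
bytes_per_hour = mbit_per_sec / 8 * 1e6 * 3600   # ~9.0e9  -> "10GB/hour"
bytes_per_day = bytes_per_hour * 24              # ~2.2e11 -> "250GB/day"
bytes_per_year = bytes_per_day * 365             # ~7.9e13 -> "100TB/year"
print(f"{bytes_per_hour:.1e} {bytes_per_day:.1e} {bytes_per_year:.1e}")
```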

Given that disk capacity increases about 100-fold every decade, in ten years one could reasonably afford to buy enough storage to store a fairly immersive audio/video stream that would take a year, 24/7, to watch. Taking time out for things like, um, sleeping and eating, it would probably be more like two or three years.

Conversely, if you wanted to record everything you saw and heard, you could do it for a reasonable -- and decreasing -- annual cost in the not-too-distant future. Anyone could do it, unobtrusively. "Be careful, his bowtie is really a camera".

If you want to boil that raw content to the more abstract images stored in the rest of the brain, there's a pretty well-established medium that covers that reasonably well, though not perfectly: words and pictures. It's trivial now to store all the words a person could reasonably process or produce in a lifetime, or even every mouse click or keypress, timestamped to, say, the nearest millisecond.
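
A rough version of that lifetime word budget, with loosely assumed numbers:

```python
# All of these figures are assumptions for the sake of a rough bound:
words_per_minute = 150        # conversational speech / brisk reading
waking_hours_per_day = 16
years = 80

lifetime_words = words_per_minute * 60 * waking_hours_per_day * 365 * years
lifetime_bytes = lifetime_words * 6   # ~6 bytes/word, spaces included

print(f"{lifetime_words:.1e} words, ~{lifetime_bytes / 1e9:.0f} GB")
# ~4.2e9 words, ~25 GB: a small corner of one modern disk
```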

A picture on disk is probably worth more like hundreds of thousands of words, but storing tens of thousands of pictures is no big deal these days, either. That's a lot of pictures, if you want to take the time to look at them.

In short, when it comes to words and pictures, the limitation is not what the computer can handle but what the people using it can handle. Audio and video are rapidly approaching the same state.

How many people can a person keep in touch with?

With modern technology, I can now keep in touch with people all over the world, but I can't keep in touch with any more people than I ever could. Somehow the advent of the internet didn't add any new hours to the day. I don't have objective numbers handy on how many people a person interacts with, though I'm sure there are studies on the subject. At a guess, I'd expect the usual log-normal distribution, with a handful of people accounting for most of a typical person's interactions and maybe a few dozen accounting for almost all of it.

Balancing that is the small world property of social networks and many other structures, including the web itself. In such cases the degree of separation between any two individuals grows very slowly, if at all, as the network expands. In the movie version, there are no more than six degrees of separation between any two people. The actual number (neglecting any groups that really are totally isolated) is probably larger but not much larger. [See these later posts for a bit more on the topic]
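
If you want to see the effect in a toy model, the networkx library makes it easy. The parameters here are arbitrary; the point is how slowly the average separation grows as the network gets bigger:

```python
import networkx as nx

for n in (1000, 10000):
    # Watts-Strogatz small-world model: a ring lattice (10 neighbors
    # each) with 10% of the edges rewired at random.
    g = nx.connected_watts_strogatz_graph(n, k=10, p=0.1)
    print(n, round(nx.average_shortest_path_length(g), 2))
# Ten times the people, only a slightly longer path between any two.
```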

How quickly can a group reach consensus?

Whether it's everyone deciding that magenta is the new chartreuse or a deliberative body deciding that the bylaws should be amended to allow for amendments to amendments, the game has probably not changed appreciably in recorded history.

In the mass-consensus case of global pop culture, the scope is bigger, but one of the "small world network" results is that both the average person's view of the social network and the overall structure of the network itself change little as the network grows. In other words, fashions in the malls of the world are driven by the same basic forces as those in, say, Louis XV's France or Julius Caesar's Rome, just on a bigger scale.

In the small-scale case of a deliberative group, the limiting factor is how quickly the members can get the others to understand and (ideally) accept their view of the world. Again, it seems to matter little whether the members are sitting around a campfire or exchanging messages electronically.

I do find it interesting that most of the distributed groups I've been involved with develop a mix of email, live conferencing and face-to-face meetings. Most of the routine stuff can be worked out via email, sometimes you have to pick up the phone and talk to a particular person, and every so often you should all meet. Most of those meetings can be by phone/IRC, but a few of them should have everyone sitting in the same room. It'll be a while yet before technology can completely replace this.

Friday, September 7, 2007

The Million Dollar Homepage

I'm sure everyone remembers this one -- it was only a couple of years ago and got tons of buzz -- but what an interesting tale. It's high on my (notional) list of neat hacks, social/business category.

First, Alex Tew is looking for ways to pay his tuition and decides to follow Mae West's "million men with a dollar" approach, selling off a 1000x1000 pixel image on milliondollarhomepage.com for $1/pixel, in 10x10 pixel blocks. With your block you also get a hover and a link to the site of your choice. Once you buy it, you can't change it, so choose carefully, grasshopper.

The 10x10 minimum was because a single pixel would be hard to click on and the resulting page would look "ghastly". But if you consider the net effect of hundreds of completely unrelated parties each trying to make a small block of pixels as attention-getting as possible ... well, "garish" doesn't begin to describe it. If you mashed up Liberace, Elton John and Bootsy Collins and ran the result through a blender you might be in the neighborhood. So I'll go with "ghastly" on this one.

As soon as word starts to get out on this "You've got to be joking ... no, it's serious ... why didn't I think of that?" idea, traffic goes through the roof. At one point the page is #127 on Alexa. Naturally, everyone wants to be part of the action.

Then it gets interesting. Someone buys up a largish group of adjacent blocks and rents out the sites behind them. Makes perfect sense. Copycats spring up, offering lower rates, whizzier features or both. Some even advertise on the page itself. A new term, "pixel advertising", is coined. Are we seeing the birth of a whole new business model here? Dare we say, a whole new economy?

Well, no. The original home page accomplishes its task admirably. All million pixels are snapped up quickly (I dithered over buying a block myself but couldn't think of anything worth $100 to put on it. Do I regret that? Not really.) The final 1000 pixels fetch $38,100 on eBay. Well done, Alex!

The copycats? Not so much. Even Tew's own offshoot, Pixelotto, is way undersubscribed at about 75,000 pixels sold and looks very unlikely to sell out before its self-imposed December 2007 deadline. Pixels1.com, which advertised on the M$HP itself, selling pixels for a penny with animation allowed, is sparser still. Another site, onecentads.com, is about half full. That's about $5000, less $900 for a 30x30 ad on M$HP. Its image is still there but doesn't appear to be clickable.

None of this seems surprising. The original M$HP got all kinds of traffic because it was newsworthy. That traffic lasted as long as it remained newsworthy, which was approximately until the last pixel sold. Then, poof. Old news, no traffic, or at least not a lot of paying traffic. People still visit the page itself from time to time, but I doubt many click through to the advertisers. I expect even fewer click through to the copycats (at least one appears to have disappeared completely), much less their advertisers.

If you managed to buy in early you probably got a good jolt of traffic, but you were probably just expecting to own a piece of an internet time capsule. That was all the site ever promised, after all. If you bought in late, you were probably expecting tons of traffic, but you ended up owning a piece of an internet time capsule.

Caveat emptor.

Wednesday, September 5, 2007

RFID Guardian

OK, here's the neatest hack I've seen in a while:

Melanie Rieback, a graduate student [now an assistant professor -- congratulations!] at the Vrije Universiteit Amsterdam, has built the RFID Guardian, a firewall for RFID tags. You tell it who you want to be able to read which of your tags, and it jams any requests you don't want.

To do this, it uses about 12K lines of carefully-crafted code, an RTOS and "a beast" of a processor in order to meet the hard real-time deadlines required to fake out an RFID reader.
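
The policy side amounts to an access-control list for the airwaves. This is not the Guardian's actual code, just a sketch of the idea with made-up reader and tag names; the hard part is the real-time jamming, not the lookup:

```python
# Owner-specified policy: which readers may query which of my tags.
ALLOW = {
    ("library-reader", "library-card"),
    ("transit-gate", "transit-pass"),
}

def should_jam(reader_id, tag_id):
    """Jam any query the owner's policy doesn't explicitly permit."""
    return (reader_id, tag_id) not in ALLOW

print(should_jam("transit-gate", "transit-pass"))   # False: let it through
print(should_jam("shop-doorway", "transit-pass"))   # True: jam it
```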

Maybe this was the "something missing" from the previous post on California's RFID law?

(RF)ID privacy in California

It is now illegal in California to implant an identifying device under someone's skin without permission.

The law, introduced by Silicon Valley state senator Joe Simitian, seems reasonably well-drafted (keeping in mind that I'm not a lawyer) and is to be "liberally construed so as to protect privacy and bodily integrity."

This seems like a reasonable step in a good direction, but I can't help feeling something's missing, somewhere.

GPS, transportation and privacy

An advocacy group representing about 20% of New York's yellow cabs is calling a strike [This link has rotted away] for today and tomorrow over an upcoming requirement for cabs to carry a GPS and credit card payment system. The cabbies' beef, of course, is that this will allow The Man to know exactly where they are and have been. The Man, of course, argues that this will be better for customers and ultimately for the cabbies as well.

Long-haul truckers have been through the same conflict. As I understand it, GPS in some form is a fact of life, but there is definitely still resistance to increased monitoring.

I'm not going to take a position here on who's right. It's worth thinking over, though. There are some pretty similar, and significant, privacy issues in the 4G picture I painted, which various players are working hard to make happen.

What killed parallel computing?

When I was an undergrad, parallel computing was the Next Big Thing. By "parallel computing" I mean a large number of CPUs that either share memory or have relatively little local memory and pass (generally small) messages on a very fast local message bus. This is as opposed to distributed computing, where CPUs have lots of local memory and communicate in larger chunks over relatively slow networks.

So what happened? Multiple choice:
  • What do you mean "what killed it?" Supercomputers today are all massively parallel. Next time you do a Google search, thank a cluster.
  • Distributed computing killed it. If you want to really crunch a lot of data, get a bunch of people on the net to do it with their spare cycles, a la SETI@home and GIMPS.
  • Moore's law killed it. Most people don't need more than one or two processors because they're so darn fast. Sure you can use parallel techniques if you really need to, but most people don't need to.
Personally, I'd go with "all of the above" (but then, I wrote the quiz).

Another worthwhile question is "What's the difference between parallel and distributed anyway?" The definitions I gave above are more than a bit weaselly. What's "relatively small"? What's the difference between a few dozen computers running GIMPS on the web and a few-dozen-node Beowulf? At any given time, the Beowulf ought to be faster, due to higher bandwidth and lower latency, but today's virtual Beowulf ought to be as fast as a real one from N years ago.

A distinction I didn't mention above is that classic parallel algorithms have all the nodes running basically the same code, while nodes in distributed systems specialize (client-server being the most popular case). From that point of view the architecture of the code is more important than the hardware running it. And that's probably about right.
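
A toy illustration of the "same code on every node" style, using Python's multiprocessing as a stand-in for a parallel machine:

```python
from multiprocessing import Pool

def node_program(chunk):
    """Every worker runs this same function on its own slice of data."""
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::8] for i in range(8)]   # one slice per worker
    with Pool(8) as pool:
        partials = pool.map(node_program, chunks)
    print(sum(partials))   # every node ran identical code
```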

A few more "Rules of Thumb" highlights

More tidbits from Rules of Thumb in Data Engineering:
  • In ten years RAM will cost what disk does today.
  • A (full-time) person can administer a million dollars' worth of disk storage (if I got the math right, that's about 3PB these days -- it was 30TB in 1999; see the quick check after this list)
  • In 1999, a CPU could keep 40-50 disks busy (and for some applications it should be doing just that). The number is probably not changing very quickly.
  • At the time the article was written, two ratios appeared to be dropping rapidly. If the predictions held true (I haven't checked yet), the impact could be significant:
    • The CPU cost of network access vs. disk access, measured both per message and per byte.
    • The dollar cost per byte transferred of WAN vs. LAN
  • You should pretty much always cache a web page.
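
As a sanity check on the storage-administration figure (the disk prices are my assumptions, not the article's):

```python
budget = 1_000_000          # dollars of disk per (full-time) administrator

price_2007 = 0.30           # assumed ~$/GB today
price_1999 = 30.00          # assumed ~$/GB in 1999

print(budget / price_2007 / 1e6, "PB")   # ~3.3 PB
print(budget / price_1999 / 1e3, "TB")   # ~33 TB
```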