Sunday, September 28, 2008

More on Chinese smart phones

I previously wondered whether the lack of a full-sized keyboard on smart phones would be less of a problem in China than in the western world. So I asked a friend who analyzes the cell phone market and has recently been to China on just such business. He told me that a smart phone with a touch screen is a pretty good match for Chinese. There are two input methods, one more based on picking from a menu and the other on actually drawing the characters. Both seem to work well.

Saturday, September 27, 2008

Virtualizing buzzword compliance

The words of the day are "virtualization" and "cloud computing". The idea behind them is that actual computing -- CPUs crunching through instructions, databases being updated on permanent storage, nuts and bolts computing -- can happen in data centers instead of on people's desktops.

It's "virtual" because the chunk of computing resources you're using is not necessarily tied to any particular physical machine, much less the particular machine on your desktop, on your lap or in your handset. It's "cloud computing" because you neither know nor care where the actual computing is taking place. It's happening somewhere out in "the cloud".

I first heard someone talk about "the cloud" many years ago, back when bang paths still walked the earth. At the time I was accessing the internet, sort of, by dialing into a friend's system at his lab. That box was connected to more-connected boxes at the university, which in turn were connected to similar boxes at other sites, and so forth. In other words, to connect box A to box B, you had to know at least something about how to get from one to the other.

Or, you could get a newfangled connection into "the cloud". If box A and box B both had such a connection, you could go directly from one to the other. Any intermediate connections happened "in the cloud". In other words, it was exactly the kind of connection everyone has today.

I first heard about virtualization even longer ago, though not under that name, in a high school class on business computing. And by business computing I mean COBOL, RPG, radix-sorting of Hollerith cards and similar specimens from adjoining strata. At one point we had a real live computing professional come in and explain to us how a real business computer worked.

He explained that it could be a problem to have, say, everyone trying to print on the same printer at the same time (and by printer I mean a beer-fridge-sized box chewing through wide-format green-and-white-striped fanfold paper). But, he went on, with a little magic in the operating system you could say "disk drive, you're a printer". The print job would sit in a queue on the disk drive until a printer was free and then it would print. If you had five paper-spewing beer fridges, you could divvy the work up among them without caring which one was going to actually print any particular job. Print spooling, in other words.

That was an eye-opener. The task of printing was separated from exactly where or when the printing took place.
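
For the curious, here's a rough sketch of the same trick in modern dress: a queue standing in for the disk, a handful of worker threads standing in for the beer fridges. All the names, counts and timings are made up.

```python
# Toy print spooler: jobs pile up in a queue (the "disk drive pretending
# to be a printer") and any free printer takes the next one.
from queue import Queue
import threading, time

spool = Queue()
PRINTERS = 5                         # five paper-spewing beer fridges

def printer(printer_id):
    while True:
        job = spool.get()
        if job is None:              # shutdown signal
            break
        time.sleep(0.1)              # pretend to chew through fanfold paper
        print(f"printer {printer_id} finished {job}")
        spool.task_done()

threads = [threading.Thread(target=printer, args=(i,)) for i in range(PRINTERS)]
for t in threads:
    t.start()

for n in range(20):                  # everyone "prints" at once;
    spool.put(f"job-{n}")            # nobody cares which printer does the work

spool.join()                         # wait for every queued job to finish
for _ in threads:
    spool.put(None)                  # then tell each printer thread to stop
for t in threads:
    t.join()
```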

At the same time, in a different class, I could dial into the "time sharing system" at the university and hook up to what looked like my own personal computer (and by hook up, I mean dial up at 110 baud by mashing a phone receiver into a foam receptacle). It ran whatever program I told it to, it remembered my files from last time, and so forth. Except it wasn't actually my own computer. I was really sharing one room-sized computer with everyone else and the computer's operating system was multitasking amongst us (here's a bit more from those days).

That seemed like a really neat trick, and it still does.

Now, you can read this sort of "nothing new under the sun" spiel one of two ways. One is "All this 'cloud computing' and 'virtualization' stuff was done years ago. There's nothing to it." The other is "The basic ideas behind 'cloud computing' and 'virtualization' are so old and useful we forget they're even there until someone drums up a marketing campaign to remind us."

I tilt toward the latter.

Thursday, September 25, 2008

Will web 2.0 "change the shape of scientific debate"?

The subtitle of this piece in The Economist set my spidey-sense tingling: "Web 2.0 tools are beginning to change the shape of scientific debate."

"Which web 2.0 tools are they talking about," I wondered, "and how are those tools changing scientific debate?" I also picked up notes of "it's all different now." (I'm a technologist by training and practice and a moderate technophile by inclination, but I'm also convinced that technology use reflects human nature at least as much as it changes it.)

Come to find out, they mean that people are -- in some cases -- bypassing journal publication as a means for publicizing results, and there is now a meta-blog, Research Blogging, aggregating research blogs. Cool stuff, and well worth a browse, but is it all that big a change?

As I said, I'm a technologist, not a scientist. The scientists I've known, I've known on a personal, non-professional basis. I won't claim to be an expert on the workings of the research community, but as far as I can tell there have always been several different levels of scientific information exchange, ranging from beer-soaked discussions at conferences through personal correspondence to publication in peer-reviewed journals.

My experience as a former member of the ACM is that there are also various flavors of publication: some are lightly reviewed and aimed at speedy publication of interesting ideas (e.g., SIGPLAN Notices) while some have heavyweight formal review processes and are aimed at publishing major developments in definitive form (e.g., Journal of the ACM). Now, one can argue whether computer science is a proper science, but as I understand it computer science research has much the same form as other areas of academic research.

From this point of view, a site like Research Blogging probably fits in between personal correspondence and publication in an unreviewed journal. More a filling in of a niche than a major shift in the landscape.

It also seems good to bear in mind two general trends: First, the overall volume and speed of communication has been steadily increasing for centuries, as has the number of people one can potentially communicate with easily. Second, the scientific community has been growing and becoming more specialized.

In Marin Mersenne's day, one could argue that the (Western) scientific community numbered in the dozens or hundreds, and Mersenne personally corresponded with a significant portion of them. Over time, communication has improved, the community has grown and the overall volume of scientific writing has increased dramatically. From this point of view, the adoption of email, newsgroups and now blogs is just part of a natural progression. For that matter, so is the development of the peer-reviewed journal.

The shape of scientific debate has more to do with the process of research and collaboration, I think, than with the particular means of communicating findings and theories. In that sense, I wouldn't expect the shape of debate to change much at all in the face of web 2.0 or web x.0 in general.

The article does point out an amusing example of the cobbler's children going barefoot, though:
[T]he internet was created for and by scientists, yet they have been slow to embrace its more useful features ... 35% of researchers surveyed say they use blogs. This figure may seem underwhelming, but it was almost nought just a few years ago.
(On the other hand, there were a lot fewer blogs of any kind a few years ago ... now how would one properly account for that statistically?)

[Not long after this was posted, ArXiv came along -- D.H. Dec 2018]

Monday, September 22, 2008

Another data point on immersion

A while ago I estimated the bandwidth required for 3-D Imax at 13GB/s, uncompressed. Today I got the latest newsletter from the California Academy of Sciences bragging about the new and improved Morrison Planetarium. They say this bad boy can blast 300 million pixels per second onto its hemispherical screen. At 32 bits per pixel, that's about 1.2GB/s, or about a tenth of my IMAX estimate (I think I used 32 bits per pixel to ensure a conservative estimate of the bandwidth required. 24 ought to be good enough).

Take out a factor of two since 3-D requires two images and assume that the frame rate is 24 frames/s in both cases. The remaining difference is down to spatial resolution. IMAX is about 10K by 7K, or 70 megapixels, so the Morrison is more like 14 megapixels. In the earlier article I guessed that the 13GB/s could probably be compressed down to 1GB/s, partly because the two 3-D images would be largely redundant. Planetarium-level 2-D video would also compress, but not quite as much. Bottom line, you're still looking at gigabits per second to be able to handle this stuff.
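
If you want to check the arithmetic yourself, here's the back-of-the-envelope version. The IMAX figures are the rough guesses from the earlier post, not published specs.

```python
# Back-of-the-envelope bandwidth check; all inputs are rough estimates.
pixels_per_second = 300e6        # Morrison Planetarium's claimed pixel rate
bits_per_pixel = 32              # deliberately conservative; 24 would do
frames_per_second = 24

morrison_gb_s = pixels_per_second * bits_per_pixel / 8 / 1e9
print(f"Morrison: {morrison_gb_s:.1f} GB/s uncompressed")            # ~1.2 GB/s

imax_pixels = 10_000 * 7_000     # rough IMAX frame size, per eye
imax_gb_s = imax_pixels * 2 * frames_per_second * bits_per_pixel / 8 / 1e9
print(f"3-D IMAX: {imax_gb_s:.1f} GB/s uncompressed")                # ~13 GB/s

# Pixels per frame for the planetarium: somewhere in the 12-14 megapixel
# range, depending on which of the rough numbers above you start from.
print(f"Morrison frame: {pixels_per_second / frames_per_second / 1e6:.1f} Mpixel")
```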

The Academy also claims that the planetarium would hold 2,402,494,700 M&Ms, but I'm skeptical that the volume of either an M&M or the planetarium, much less both, is known to a part in 10 million.

Signing webmail

While looking up something on S/MIME, I ran across one of those little "hmm ... interesting point" points. How do you digitally sign webmail?

The problem isn't that mail sent through a web interface is somehow harder to sign. The question is who you trust to do the signing. The whole point of webmail, and of one version of "the cloud" in general, is that you can access it from anywhere on any device. The whole point of digital signatures is convincing everyone that only you could have done the signing. Two great tastes that don't go so great together.

If I only send mail from a particular box, I can just leave my private keys in a file on that box (password protected, of course) and use a mail client running locally. I tell the mail client my password, it fetches the keys and signs my message. This approach is as secure as my box and the firewall it sits behind.
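
For the record, the local case really is that simple. Here's a minimal sketch using Python's third-party cryptography package; the file name and passphrase are placeholders, it assumes an RSA key, and real S/MIME adds certificates and CMS wrapping on top of this.

```python
# Minimal local-signing sketch. The key file never leaves my machine.
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def sign_message(key_path: str, passphrase: bytes, message: bytes) -> bytes:
    # The password-protected private key sits in a file on my own box.
    with open(key_path, "rb") as f:
        private_key = serialization.load_pem_private_key(f.read(), password=passphrase)
    # Only someone holding the key (me, on this box) can produce this signature.
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA256())

# signature = sign_message("my_key.pem", b"my passphrase", b"Dear Bob, ...")
```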

If I'm in a cybercafe or using one of those airport internet kiosks that I've never, ever seen anyone use, I don't want to copy my secrets onto a local drive (assuming I even can) or otherwise hand them over to the browser directly. You don't know where that thing's been.

The next candidate would be the webmail server. Hand it my keys, once, very securely. Then I log in to my webmail and send a message from wherever. The server signs it for me without revealing my secrets to the browser.

There are a couple of problems with that. One is that I'll need to be careful logging into the server. If the browser is actually a password-stealing trojan horse, then whoever is running it can grab my login and forge whatever they like under my identity. Another is that if the webmail server is ever compromised, the attacker doesn't just get my key, but a whole treasure trove of keys to go cracking for weak passwords. That's a problem for the people running the server, but it's also a problem for me since the server is a nice, big, visible target.

There are ways to make logins trojan-horse-resistant, for example smart cards that generate a fresh hunk of random digits every so often, but these are not widely deployed. The bottom line is that any server with your keys on it is only as strong as its authentication system, and most of those aren't that strong.
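
One way such a token can work is HOTP (RFC 4226): a secret shared with the server, run through HMAC with a moving counter, so a stolen password alone isn't enough. A bare-bones sketch, not a hardened implementation:

```python
import hmac, hashlib, struct

def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    # HMAC-SHA1 over the 8-byte big-endian counter, then "dynamic truncation".
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# The token and the server both track the counter (or the current time
# slice), so the displayed digits go stale almost immediately.
# print(hotp(b"shared-secret", counter=42))
```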

What would be better? The basic trust issues are clear enough, but the kind of mental ju-jitsu needed to think through all the various counter-measures and counter-counter-measures is hairy in the extreme. True black belts are relatively rare, and I'm not one of them. But from what I can tell an airtight solution would probably involve some kind of smart card that you could plug into the device you're actually using (the pocket-thing would be a candidate, but then, it could probably send signed mail directly). The server would then need a conduit through which it could send data up to the smart card and get signed data back.

And even then it would do well to check that what you signed was actually what it sent you.

Friday, September 19, 2008

Is it too early for a greatest hits collection?

Of course it is. But that never stopped anyone.

About a year ago I started tracking traffic to this site, such as it is. For the most part, it looks like a few people stop by on an average day (thanks!), more if I've been posting more lately, fewer if I haven't. Gizmodo I ain't.

Most pages get read when they're first posted, if then, but a few have fared somewhat better. Here are the top several:
  • Two things I didn't know about Blu-Ray got more hits than anything else on the site, all on one day when it briefly appeared in the "From the Blogs" section of the CNN piece it links to. Since then, nada.
  • On the other hand, Digital vs. interactive vs. internet vs. broadcast got some hits when it was first posted, I think because it showed up on Bloglines, but continues to show up in people's searches from time to time.
  • Information age: Not dead yet has been a slow but steady earner. It seems to pop up consistently in searches like "When did the information age start?", a question it asks but doesn't definitively answer (indeed, one of the main points is that you can't definitively answer that one).
  • Wikipedia's angle on anonymous IP addresses has consistently shown up in searches with "Wikipedia", "anonymous" and "IP" in them. Interestingly, the top search is anonymous IP, with no "Wikipedia" in it. Apparently the post managed to acquire a good page rank at some point, though I couldn't find it in the quick searches I just tried, and indeed the hits stopped a couple of months ago.
  • I'm very glad to see that people are still interested in Peter Deutsch's list of fallacies of distributed computing, and glad if my post on it helps people find and appreciate it. The post still shows up on the first page of hits for deutsch fallacies. While verifying that, I ran across this interesting piece on the history of the list. I had no clue as to the history of the list until just now. I just liked its contents.
  • The very first post here, E-Tickets and copy protection, still turns up in a motley assortment of searches, including copying a dongle, copies of e tickets and beowulf copy protected cannot copy. I get the feeling that at least some of these visitors are going to come away disappointed. Quel dommage.
  • Megacommunities is another one with stronger interest at first and blips afterward. I'm afraid it doesn't offer much more on the subject than a few rough numbers and some musings, but maybe it'll help get the ball rolling.
  • Hacking the ENIAC turns up in searches for the people involved, and I'm very glad if I can help direct a little more credit towards that team.
  • My fairly skeptical take on hyperreality, Is hyperreality the new reality? has turned up in searches, though it seems to have dropped off of late. While checking that, though, I noticed that someone is actually using the title I originally had in mind for this blog ("morphisms"), though under a different spelling (and with somewhat different theme and content).

The elephant in the web

The nice thing about cliches is that everybody's heard them. So if I say "the blind men and the elephant" you know what's coming ...

From an "immobile web" point of view like mine (and I suspect I'm not quite in a dying breed yet), the web is relatively text-heavy. In particular, with my full-sized keyboard and mad touch-typing skills I can crank out post after post of, well, whatever it is I crank out. With my full-sized screen, I can take in big web pages with multiple panes, whether a news portal or a bunch of javadocs. Twitter? What's that? My web is big and slow, and I like it that way.

From a "mobile web" point of view, I would expect the web to be or become audio and video heavy. Audio is easy -- you can listen to music on a portable device as easily as you can at home and there's already plenty of stuff to listen to on the web: podcasts, live streams, tunes, what-have-you. Video on a small screen is not exactly fully immersive, but it's fine for stuff like YouTube (which gives you small-screen video on your big screen anyway). Video still wins despite the small screen mainly because it doesn't require typing. When it comes to writing text, Twitter seems like a more natural fit than full-blown blogging.

But hang on here. Is this really a blind-men-and-elephant situation? I've only got two contestants here and their perceptions of the web actually have quite a bit of overlap. It might be harder (though certainly not impossible) to blog from a mobile device, but everyone can read a blog. Similarly, most major portals have mobile-friendly editions with scaled-down layouts. Back in big-and-slow land, I can watch YouTube at my desk and listen to podcasts and tunes. I can even Twitter if I like [ahem ... that's "tweet", but in my defense I'm not sure if "tweet" had escaped beyond the twitterati yet when I wrote that].

As far as I can tell, the biggest pain point on the mobile web has got to be typing. Since I do a lot of that, my view is bound to be biased. For most people, it may not matter so much. And, to be clear, I'm not against the mobile web and may yet end up immersed in it. That pocket-thing certainly has a lot of potential.

Thursday, September 18, 2008

More on leapfrogging

This article in the Economist fits in nicely with the theme of developing areas leapfrogging the wired web that many of us know and going directly to wireless. The main point is right there in the first sentence, just like they teach in journalism school: "In future, most new internet users will be in developing countries and will use mobile phones".

I have to admit the idea takes some getting used to, even though it makes perfect sense and I've been hearing buzz about the "mobile web" for about a decade now. For me, the web is a sit-at-a-desk, type-on-a-big-keyboard, look-at-a-big-screen kind of thing, not a walk-and-talk, type-on-tiny-chiclets, squint-at-a-tiny-screen kind of thing. Being a curmudgeon, I use a desktop system (or occasionally a full-sized laptop) for browsing and I use my phone as a phone and an alarm clock.

That's not to say I think the whole mobile web thing is silly. It just hasn't been my thing so far. I did get to play with a friend's everything-but-the-phone-subscription almost-iPhone iPod and it seemed pretty slick, but I didn't try to type on it. But then, maybe most people aren't bothered by a small keyboard.

It certainly doesn't seem to be a problem for much of the world. One bit that leapt out at me from the article:
Jim Lee, a manager at Nokia’s Beijing office, says he was surprised to find that university students in remote regions of China were buying Nokia Nseries smart-phones, costing several months of their disposable income. Such handsets are status symbols, but there are also pragmatic reasons to buy them. With up to eight students in each dorm room, phones are often the only practical way for students to access the web for their studies. And smart-phones are expensive, but operators often provide great deals on data tariffs to attract new customers.
When a market the size of China gets interested in something, it bears watching. I also wonder how you enter text with a smart phone in Chinese and whether that makes the small-keyboard problem more of a problem, or less, or just different.

Tuesday, September 16, 2008

Six-foot man eating chicken

The link was sitting there quietly, off to the side of a site I was checking for the price of something normally expensive (it doesn't matter what). The link looked like any other, so I chased it to get another data point. "I've been researching this topic," the site said, "and I've found that" -- again the actual name doesn't matter; I'll call it moosedung2diamonds.com here, or md2d for short -- "has the most reasonable prices." The number quoted was about a fifth of anyone else's, well into the "too good to be true" category.

Now the fun starts. I chased the link to md2d.com. It was a reasonably well laid out page complete with an avuncular-looking picture of the founder and a mini-video of a spokesperson inviting me to take advantage of the opportunity offered. All I had to do was send in about $50, at which point I would be able to download an eBook explaining how to (let's say) produce diamonds from moose dung at a very reasonable cost.

Riiiiiight.

Out of curiosity I googled "moosedung2diamonds scam". In the past this approach has generally been good for a batch of strongly-worded warnings that yes, md2d is a scam. What I actually got was a raft of pages asking "Is md2d a scam? Find out the shocking answer here ..."

You can probably guess that the narrative in these was consistent: "I thought that md2d must be a scam, but I tried it and it works!" Some of them even warned readers to purchase only the genuine md2d article. There were, after all, a lot of imposters out there.

Some of these sites were sponsored links for the search. Some were on domain names like moosedung2diamondsscam.com. A few -- many extra chutzpah points here -- were apparently spammed onto otherwise legitimate scam-reporting sites.

Let's run that through in slo-mo: You run across a too-good-to-be-true proposition. But who knows ... just maybe? You google to see if it might be a scam. You skip past the sites that seem to have the scam name in them to something that looks independent. You chase the link there and find out that, no, it's not a scam at all. You gotta hand it to 'em.

Most scams (maybe all?) count on our assuming that something that's normally true is always true. In this case, we depend on the "many eyes" effect of the web -- if enough people see something, the truth will come out in the end. Importantly, this effect also assumes that the eyes involved are independent of each other (see this post and the first part of this post for a bit more on that).

But what the nice folks at md2d have done, and I'm sure they're not the first or last to do this, is create a little backwater in the web, a sort of Potemkin village where the "many independent eyes" assumption doesn't apply because it's basically a closed system. The domain name is unusual enough that no one is going to mention it by accident. They buy up a bunch of domain names (cheap) and sponsor some links (probably also cheap, and if not it's because they're getting lots of traffic for the scam). They create a mess of links all pointing back at the main site and probably to each other. This boosts the page rank of the scam sites, and again, there's nothing to wash that out.
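
To see why the closed cluster works, here's a toy PageRank-style calculation: plain power iteration over a made-up link graph with invented page names. The shill pages all point at each other and at the main site; the lone complaint page has almost nothing pointing at it, so it sinks.

```python
# Toy PageRank by power iteration on an invented link graph.
pages = ["md2d", "shill1", "shill2", "shill3", "complaint", "other"]
links = {
    "md2d":      ["shill1", "shill2", "shill3"],
    "shill1":    ["md2d", "shill2", "shill3"],
    "shill2":    ["md2d", "shill1", "shill3"],
    "shill3":    ["md2d", "shill1", "shill2"],
    "complaint": ["md2d"],
    "other":     ["complaint", "md2d"],       # a little outside attention
}

d = 0.85                                      # the usual damping factor
rank = {p: 1.0 / len(pages) for p in pages}
for _ in range(50):                           # power iteration
    new = {p: (1 - d) / len(pages) for p in pages}
    for p, outs in links.items():
        for q in outs:
            new[q] += d * rank[p] / len(outs)
    rank = new

for p in sorted(rank, key=rank.get, reverse=True):
    print(f"{p:10s} {rank[p]:.3f}")           # md2d and the shills float, complaint sinks
```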

The only thing to worry about, besides the cops, is the possibility of someone posting a legitimate scam complaint that gets enough outside attention to move up the page rank. Evidently, this is harder than you might think. I did find two legitimate scam complaints for the actual site, including one on the prominent site that had been spammed with the fake non-complaint (um, I hope that's clear enough), but they were both nearly lost in the noise. Your best bet in such a case is not to use raw google, but to go to a scam-reporting site you trust and search there.

The Economist on crowds

The Economist has a couple of interesting recent articles about crowds. This should hardly be a surprise, given that economics is all about the aggregate behavior of large numbers of people.

The first article deals with crowdsourcing, its benefits and limitations. It mentions several more interesting crowdsourcing examples I forgot (even though I read the article before I wrote that post), including the 1714 longitude prize and Google's initiative to have volunteers in India carry around GPS units to help map India's roads.

It also wonders whether crowdsourcing and business are a good match. There are plenty of examples of volunteers joining in a project for fun and the chance to be part of something important, but this seems inherently hard to monetize. People don't seem so keen on volunteering their efforts so someone else can make money. The counterexample would be the various prize competitions, including the longitude prize and the various X prizes, but in those cases the participants are in it to be part of something important and to make themselves money.

As an aside, if you're interested in the general topic of crowdsourcing before the web, I'd recommend Longitude, by Dava Sobel, and Simon Winchester's The Meaning of Everything, about the Oxford English Dictionary.

The second article doesn't explicitly mention the wisdom of crowds, but cites a study finding that "when individual drivers each try to choose the quickest route it can cause delays for others and even increase hold-ups in the entire road network." Interestingly enough, closing down sections of road can actually make things run faster, not too surprising if you consider that taking a little-used side street generally involves merging back onto the main road at some point. That will tend to gum up the main road, but if the side street saves you enough time you won't particularly care.
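
The effect the study describes is essentially Braess's paradox. Here's a worked toy version with the standard textbook numbers (all invented, of course): two roads from S to E, each with one leg whose travel time grows with traffic and one fixed leg, plus an optional "free" shortcut.

```python
# Toy Braess's paradox: adding a zero-cost shortcut makes the selfish
# equilibrium worse for everyone. Numbers are the usual textbook ones.
N = 4000                      # drivers

def congested(t):             # travel time on a leg that scales with load
    return t / 100.0

# Without the shortcut: drivers split evenly between the two routes.
per_route = N / 2
time_without = congested(per_route) + 45          # 20 + 45 = 65 minutes

# With a zero-cost shortcut joining the two congestion-sensitive legs,
# the selfish equilibrium is for everyone to use both of them.
time_with = congested(N) + 0 + congested(N)       # 40 + 0 + 40 = 80 minutes

print(f"Everyone's trip without the shortcut: {time_without:.0f} min")
print(f"Everyone's trip with the shortcut:    {time_with:.0f} min")
```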

This is yet another example of a "local optimum," a fundamental problem with "dumb is smarter" approaches. Deliberately ignoring the big, complex picture and concentrating on small, local features can lead to solutions that look good on the small scale but bad on the large scale. This is another of those lessons we get to learn as many times as we like. Genetic algorithms provide another fairly recent example. The "no free lunch" theorem is also worth keeping in mind.

Crowd wisdom vs. polling

It being election season in the US, I couldn't help noticing that the classic Oscar(TM) party example of crowd wisdom looks an awful lot like polling. For that matter, so does the "ask the game-show audience" example. What's the difference?

As far as I can see, the main possible differences are in the type of question being asked and interpretation of the results. Otherwise, the processes look largely identical. In polling, you're generally asking people's opinions (do you like so-and-so, do you favor such-and-such) in order to estimate a result you don't know yet. In wisdom-of-crowds scenarios, you might be trying to answer a more objective question (is the answer a, b, c or d).

But the more I think about it, the less sure I am that there's any substantive difference at all. Asking a dozen people at an Oscar party their picks is a poll. The only difference is that you're not polling the actual voters. In the game show, the contestant is asking the audience for the exact same reason that pollsters poll voters: to get an estimate of a result not yet known.

Crowdsourcing patents - is wisdom involved?

Before I start, let me say that I don't even want to get near the hornets' nest of a debate over software patents. At least not today. If the intent of Peer-to-Patent is to fix the patent system, then I'd rather not be drawn on how or whether it's broken or how or whether it can be fixed. Backing away from the hornets, though, I did find this piece on Peer-to-Patent interesting.

Peer-to-Patent is designed to crowdsource the task of finding prior art for pending patents. As I understand it (and once again, I'm not a lawyer), an invention must be original to be patented. If it was publicly known before you invented it, then it's not original, even if you didn't happen to know about it. You're expected to do your homework. On the other hand, if someone already invented it but kept it a secret, you can still patent your independent discovery.

This makes searching for prior art a crucial and time-consuming part of the patent application process. The US patent office is heavily backlogged. Enter Peer-to-Patent, which publicizes patent applications and asks the world at large to turn up relevant examples of prior art.

Again leaving aside the policy issues here, a couple of things jump out. One is that crowdsourcing does not necessarily involve the "wisdom of crowds". Wisdom has to do with estimation and judgment. In a typical wisdom of crowds example, the aggregate result of the crowd's responses is more accurate than any particular individual's result. A hundred people can pick Oscar(TM) winners better than any particular person in the group can. As far as I can tell, this is more a matter of the central limit theorem (not to be confused with the law of large numbers) than anything else.
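
For a numeric guess, a quick simulation shows the effect: each individual is noisy, but the average of many independent guesses hugs the true value, with the error shrinking roughly as one over the square root of the crowd size. All numbers here are made up.

```python
# Crowd-averaging simulation: the mean of many noisy, independent guesses
# is much closer to the truth than a typical individual guess.
import random, statistics

TRUE_VALUE = 100.0
NOISE = 20.0                      # each individual is off by ~20 on average

random.seed(1)
for crowd_size in (1, 10, 100, 1000):
    errors = []
    for _ in range(500):                          # repeat the experiment
        crowd = [random.gauss(TRUE_VALUE, NOISE) for _ in range(crowd_size)]
        errors.append(abs(statistics.mean(crowd) - TRUE_VALUE))
    print(f"crowd of {crowd_size:4d}: typical error {statistics.mean(errors):5.2f}")
```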

A case like prior art is more a matter of resources. If the aim is to find things known to "persons skilled in the art", then a crowd of people skilled in the art ought to turn up more than a patent examiner because there are more of them. In the typical "wisdom of crowds" scenario, the crowd members need not be particularly skilled in the task at hand. In Peer-to-Patent, they had better be. Having hordes of people send in piles of junk to sort through would be worse than useless.

A crowdsourced task needn't be difficult. In fact, it should be relatively easy for the participants. As a corollary, you might have to select your participants, while the point of "wisdom of crowds" is that you don't make any particular effort to find skillful participants. So
  • Harnessing the "wisdom of crowds" means aggregating the results of a large number of random people to get a better answer than any particular person might give. Everyone gets the same task.
  • Crowdsourcing means divvying up a large task amongst a large number of qualified participants. They get different instances of a particular kind of task, though they may pick them from a common set.
The second thing that jumps out at me is that all the prominent examples I can think of that get results from a crowd involve crowdsourcing, not the wisdom of crowds:
  • The Oxford English Dictionary relied on volunteers to find citations of words in print.
  • The Galaxy Zoo project asks volunteers to classify galaxies by shape (mainly clockwise spiral, anticlockwise spiral, elliptical and "other") -- not a hard task for humans, but one that the participants are assumed to be skillful at. In fact, you have to take a couple of little tests before getting real live galaxies to classify.
  • Projects like GIMPS and SETI@home split large tasks across masses of mostly-idle processors.
  • Wikipedia asks everyone to do a bit of editing. The aim is not to average everyone's opinions on a topic but to turn up and aggregate facts from independent sources.
  • And now Peer-to-Patent asks everyone to turn up prior art on their choice of patent applications.
The less judgment a task involves, the easier pure crowdsourcing is. GIMPS produces pretty much no controversy over the answers produced. Wikipedia, somewhat more so ...

Thursday, September 11, 2008

Now what happened to my bookmarks?

[If you came here trying to recover lost bookmarks for Firefox, mozillazine.org has a knowledge base article on the topic. For Chrome, try this Google search. For IE, try this Google search. For Safari, try this one. For Opera, try this one. The Opera search also turned up this PC Today article for Firefox, IE and Opera. For Ma.gnolia (FaceBook and maybe others?), try this. In any case, please feel free to have a look around since you're here.]

About a year ago I wondered what had happened to my bookmarks. It wasn't that they'd disappeared, but they'd certainly become less prominent in my web.life. I had also just started using deli.cio.us to track bookmarks out in the cloud instead of on my local box.

I noted that most of what I'd been using bookmarks for -- remembering frequently-visited sites and navigating the web at large -- had been subsumed by my browser's history feature and toolbars, and by Google search. My actual bookmarks (and deli.cio.us) were mainly for remembering memorable sites that I might not revisit enough to keep fresh in the browser's memory.

Since then I've updated my browser, at which point the deli.cio.us plugin I'd been using stopped working. Rather than track that down, I thought I'd see if I missed it first. I didn't. In fact, months later I still haven't fixed it. My actual browser bookmarks list has slowly grown but I'm not sure when I actually used it last. At least the links are there if I ever want them (unless they've rotted away in the mean time).

My browser also grew a "smart bookmarks" feature with the update, which automagically collects a reasonable facsimile of what I actually visit frequently. I'm sure there's some annoying technical reason that it misses a couple of frequently-visited sites, but I can't be bothered to track it down. I think I see the smart bookmarks as a sort of freebie. If it helps, great, otherwise no big deal.

Maybe this is just my slowly attaining my career goal of "curmudgeon", but I'm finding myself more and more indifferent to Web 2.0 in general. Some of the AJAX stuff is nice, but some of it I can just as well do without (like this). There's also a larger drawback to the whole approach: Anything that encourages widespread customization also encourages widespread quirks, glitches (like this), bugs and maddeningly not-quite-identical behavior of functionally identical pieces across sites (or even within a site).

Tagging seemed fun and useful, but I hardly ever use it except to revisit a tag for this blog for use in a new post (as I did with annoyances just now). I think I find more value in the exercise of figuring out which tags to put on an entry. Of course if that helps you, dear reader, I'm more than glad to help.

Social networking is its own little world. I've certainly written about it a fair bit, but again I don't make frequent use of it. My LinkedIn account isn't completely dormant, but the joint is not exactly jumpin' either. What else was Web 2.0 supposed to be? Microformats? I played with them at one point, I think.

This is not to suggest that Web 2.0 is mere hype -- obviously lots of people use the stuff and like it -- just to say that nothing has really changed my position that while Web 1.0 was a grand slam home run, Web 2.0 is more a bunch of singles. And probably a fair number of foul balls. They won't strike you out, but they don't do much else for you either.

However, thinking all this over, I did notice one more place that bookmarks for interesting sites ended up, and it's in the cloud, no less. A great many interesting sites that I run across that would previously have sat quietly in my browser's bookmarks find their way into this blog where everyone can find them. Of course that means I can, too, and from time to time I do. Is blogging considered Web 2.0 per se? I forget, but if it is I'd consider it one of the more successful examples. Maybe a ground-rule double?

Wednesday, September 10, 2008

Leapfrogging the net in the world at large

I started out to write about O3b's initiative to radically extend the reach of the internet by a combination of satellites, WiMax and 3G cell service. I'll get to that [I don't think I ever did, but O3b looks to still be in business, so I guess there's still time --D.H. June 2015], but as so often happens I ran across something else while running that down. It's probably worth its own post, but it's in line with the theme of leapfrogging so I'll go ahead and mention it here:

There is now approximately one cell phone subscription for every two people on the planet -- pretty mind-boggling considering that about a quarter of the world's population is under 15. Now, I know for a fact that some people have more than one cell phone subscription and some portion of the under-15 set has cell phones, too, but still ...

Not only are cell phones competing vigorously against land lines head to head, to the point that many people don't even bother to set up a land line when moving into a new place, but they have a significant advantage in areas where wires are expensive to build out or simply haven't been. This includes large rural areas, mountainous areas, archipelagos (which, come to think of it, are just mountainous areas with a higher waterline) and much of the developing world.

This isn't just a cell phone thing. It's a general wires vs. wireless thing which should apply equally well to internet service. Which brings us back to O3b.

O3b is short for "Other 3 billion," though as far as I can tell there are more like 1.5 billion internet users (a lot, but fewer than have cell phones), which left me to wonder about the other other billion and a half. It turns out they mean the 3 billion for whom fiber is not likely to be an option anytime soon.

If you buy the premise that internet access is a good thing, and if you don't you're probably not reading this, then the bad news right now is that large swathes of the world lack the infrastructure to offer copper-based service to everyone, much less fiber. The good news is that whoever wants to build out the internet has a blank slate to work with and won't have to wrangle with telecoms and cable operators. Thus the notion of leapfrogging over the wired stage directly to wireless.

When I first read that someone was going to use satellites to cover previously uncovered areas, my thought was "why?" Putting birds up in orbit is expensive. Building radio towers on the ground is much cheaper. There have been attempts to bring the internet to rural customers via satellite, but as I understand it the results are cumbersome. Typically the satellite broadcasts your incoming data and everyone else's down and you use something low-bandwidth (maybe even dialup?) for the uplink.

But this is not what O3b is going for. Their tagline says it all: "3G / WiMAX Wireless Backhaul and IP Trunking." That's clear enough, right? If not, perhaps this Financial Times article linked from the O3b site will help. It certainly helped me. [you'll have to register unless their server arbitrarily decides you've downloaded fewer than four FT articles in the last 30 days]

The problem these days is not getting billions of people wireless connections. Everyone has a cell phone, or soon will, it seems. The problem is, what is that cell connection connected to? Right now, in much of the world, the answer is "not much", or at least not much beyond the carrier's cell phone network.

This is where the satellites come in. The satellites connect the cell towers to the internet backbone. Since the messy work of gathering up the incoming data and dispersing the outgoing data has been done on the ground, the satellites just have to beam high-bandwidth, much more uniform traffic around to each other and to and from the backbone and cell networks. This seems like a much better match, to the limited extent that I can tell, and at least some major players seem interested as well (notably, Google).

If done right, this could be quite a Good Thing. Lots of people get cheap, fast internet access, the local carriers make some money, O3b makes some money and all those One Laptop Per Child laptops get something to connect to.

Friday, September 5, 2008

Weather modeling at the NHC

This is another of those "not really about the web but I did find it there" posts.

In my post on the National Hurricane Center I said it was fascinating to get to know the personalities of the various computer models. Well, fascinating for a geek, at least. If, like me, you like that sort of thing, the NHC's explanation of what models they use, and how, and why, is a veritable feast. Some of the high points:
  • There are a dozen basic models, plus several more ensemble models combining them.
  • The forecasters don't rely on any one model in making a forecast. There's no "model of the day". Instead, they consider the results of all of them and make a judgment call.
  • The aggregate results of the models are generally more accurate than any particular model. Another "wisdom of crowds" effect, if you will.
  • The NHC continually reviews its forecasts to see whether the models and the official forecasts are "skillful." A forecast is skillful if it's more accurate than the statistical models, which just look at what past hurricanes have done and don't even try to take current weather conditions like wind and sea surface temperature into account.
  • By that measure there are skillful models, but it's a difficult bar to clear. Not quite "dumb is smarter", but dumb is smarter than you might think.
  • In particular, it's only now becoming possible to predict the intensity of a storm skillfully. Predicting a storm's track skillfully is less of a problem.
  • Finally, implicit in the whole report is the understanding that in good science and engineering, it's vital to know what you don't know.
Weather forecasting is a notoriously hard problem. It was one of the drivers behind the discovery of chaos theory. The NHC's technical model summary provides an excellent window into the process of using computer modeling on hard problems in the real world. It also gives some insight about how and whether to use "dumb" statistical models and how and when to try to be "smart". Good stuff!
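
Here's a small sketch of those two ideas -- the ensemble beating its members, and "skill" as beating a dumb statistical baseline -- using invented numbers, emphatically not NHC data.

```python
# Toy ensemble-vs-baseline comparison with made-up verification numbers.
import statistics

observed = [95, 110, 130, 120, 100]              # e.g., storm intensity, knots
forecasts = {
    "model_A":  [90, 120, 125, 135, 95],
    "model_B":  [105, 100, 140, 110, 110],
    "model_C":  [85, 115, 120, 130, 90],
    "baseline": [100, 100, 100, 100, 100],       # climatology-style statistical guess
}

def mean_abs_error(pred):
    return statistics.mean(abs(p - o) for p, o in zip(pred, observed))

# The "ensemble" here is just the case-by-case average of the dynamical models.
members = ["model_A", "model_B", "model_C"]
ensemble = [statistics.mean(forecasts[m][i] for m in members)
            for i in range(len(observed))]

for name in members + ["baseline"]:
    print(f"{name:9s} MAE = {mean_abs_error(forecasts[name]):5.1f}")
print(f"{'ensemble':9s} MAE = {mean_abs_error(ensemble):5.1f}")

# A forecast is "skillful" if it beats the statistical baseline.
skill = 1 - mean_abs_error(ensemble) / mean_abs_error(forecasts["baseline"])
print(f"ensemble skill vs. baseline: {skill:.0%}")
```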

Monday, September 1, 2008

The National Hurricane Center

The National Hurricane Center doesn't exactly present a cutting-edge appearance on the web. The graphical tropical weather outlook on their main page is nice, but the FIXED-SPACE ALL CAPS discussions and advisories are so old-school it's not even funny. If you listen closely, you can hear the teletypes banging on endless rolls of yellow paper. More Web 0.1 than Web 2.0. Nonetheless, it's one of my favorite sites and an outstanding example of what the web is supposed to be, because the NHC is all about making information available to the world at large.

If you watch coverage of a major tropical storm in the US, the meteorologist is basically giving you a digested version of this site. It's well worth reading in the original, the same way Chaucer is but easier. Once you get to know a few terms like "dropsonde" (an instrument-laden capsule dropped from an airplane) and "initial motion" (the current speed and direction of the center), there is a wealth of useful information to be had.

They tell you what they know. They tell you how they know it. They talk about how tropical storms work. They tell you what their computer models are telling them and how well they agree with each other. They tell you what they don't know, why the official prediction is what it is and how much confidence they have in it.

They do this 24/7 (I'm sure there's been a lot of coffee consumed in Miami lately), with just a dash of personality showing through and absolutely no pretense or fluff. Having spent most of my life well away from hurricanes I've only recently discovered the NHC, but I'm very grateful that they're there. I'm sure I'm not alone in that gratitude.

The original World-Wide Web originated at CERN as a means of disseminating scientific information for the good of all. The NHC site doesn't just look like an early web site. It's very much in the original spirit of the web, in this case not just making information available to the world, but potentially saving lives.

Side note: I would bet that the old-school look is an integral part of the NHC's life-saving mission. Though largely obsolete, the old 5-bit Baudot code of Teletype fame is not completely gone. Sticking to that may LOOK FUNNY, but it ensures that the message can be sent unmodified to as many people as possible, using technology like RTTY via amateur radio if need be. The web is robust, but a category 5 storm is a whole other deal.

Assuming the definitive dispatch is 5-bit friendly, trying to prettify the contents for the web site would be a waste of valuable time. Going the other way and trying to reduce a web-friendly definitive version to pre-web form risks garbling when it matters most.
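
For a rough sense of what "5-bit friendly" means in practice: ITA2/Baudot has no lowercase and only a small set of figures and punctuation. The check below is only an approximation; the exact punctuation list varies between Baudot variants.

```python
# Rough "5-bit friendly" check: uppercase letters, digits, space, newlines
# and an approximate ITA2 figures/punctuation set.
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
FIGURES = "0123456789-?:().,'/\"; "              # approximate; varies by variant
BAUDOT_FRIENDLY = set(LETTERS + FIGURES + "\r\n")

def is_baudot_friendly(text: str) -> bool:
    return all(ch in BAUDOT_FRIENDLY for ch in text)

print(is_baudot_friendly("HURRICANE CENTER FORECAST DISCUSSION"))      # True
print(is_baudot_friendly("Looks nicer, but lowercase won't survive"))  # False
```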


Another side note: One of the more fascinating aspects of the NHC's discussions is getting to know the personalities of the computer models. Each has its own strengths and weaknesses, its own tendencies and track records. The NHC forecasters know them intimately and are happy to share that knowledge.