Field notes on the Web: 2010

Monday, December 27, 2010

Bandwidth is the new coverage

Something caught my ear on the radio today: an ad from a major cell provider twitting the competition for making all sorts of claims about speed (and then claiming to be the fastest themselves ... so there!)

Not so long ago the battle was over coverage, but evidently coverage is now about as good as it's going to get. The internet connectivity of a phone is now a bigger selling point than the mere fact that you can call people with it. If you're outside the US, or just more into smart phones than I am, this may well be old news. I tend to use my phone predominately as a phone and an alarm clock, probably because I almost always have access to WiFi and a decent-sized screen and keyboard [Hmm ... this one might be worth a followup -- D.H. Dec 2015].

So far I've found a smartphone most useful as an easily portable GPS, an application for which I'd gladly trade speed for coverage. A GPS function without a usable map is not much use if you're lost out in the boonies. But maybe that's just me.

What also caught my ear about the ad was that carriers appear to be claiming to be "X times faster." Faster than what, I'm not really sure, but it sounded reminiscent of internet providers claiming to be "X times faster" than dialup. Even with that nice low bar set, providers had trouble providing quite what they promised. Perhaps cell providers will do better in that regard?

Friday, December 17, 2010

Another fine timesink

This one's from Google Labs, as it turns out, though the radio piece I heard had it coming from Harvard. According to the site itself, it's the product of "a team spanning the Cultural Observatory, Harvard, Encyclopaedia Britannica, the American Heritage Dictionary, and Google*."

So what is it? It's culturomics. What's that, you may ask?

There's a bit of hype, mostly in the press, about opening up a "new field", but the web site is simply a tool to mine the Google Books corpus for trends in word frequency.

OK, maybe that doesn't sound overly exciting, but try it. I'll be here when you get back. [Looks like there's not much at that link anymore. I think this eventually was supplanted by or morphed into ngrams, which is still alive and well -- D.H. Dec 2015]

Time permitting, I'll probably have a bit more to say about the word-mining itself on the other blog.

* As usual, I don't know any more about the Google end of it than you do [and I still don't -- D.H. Dec 2015] and if I did I would recuse myself. It does fit pretty squarely with Google's mission "to organize the world’s information and make it universally accessible and useful," though it's not clear how useful the site might be to the non-specialist.

Tuesday, December 14, 2010

It's a slow news day -- why not pick on Babelfish some more?

It's not necessarily a fair test of a translation engine's usefulness to take a short phrase and round-trip it from one language to another and back, but it sure is fun. The good folks at Sporcle must have come to the same conclusion, judging by the "Babelfish Videogame" quiz. Try it yourself. See if you can figure out what "The Developmental Soccer which is Occupational" and "Spatial Aggression Person" started life as.

I got 7 out of 20, and that was still good enough for the 57th percentile, so garbled were the results.

And now for extra amusement, here's the above translated into, oh, say, Portuguese and back:

It' s not necessarily one has just tested of a translation engine' utility of s to take a short phrase and round-trip of a back language to another one and, but it is sure amusement. The good peoples in Sporcle must have come to the same conclusion, julg for " Babelfish Videogame" questionnaire. It tries it you yourselves. He sees if you can appear for you are that " The developing soccer that is Occupational" e " Space aggression Person" started life as.

I começ 7 of 20, and that age 57th still good percentile sufficiently, truncated thus was the results.

E now for the extra amusement, here' of translated s above, oh in for example Portuguese and has broken back:

Sunday, December 12, 2010

@papabear this is @babybear. What's your 20?

I was listening to a piece on All Tech Considered about the hackathon Random Hacks of Kindness and was duly impressed, not only by the presenter's brave effort to rescue the original meaning of "hack" from the dustbin, but of course by the whole idea of hacking together apps to make it easier to save lives and otherwise make the world a better place. Well done, all.

One of the hacks was an app that would use Twitter traffic during disasters to help pinpoint where aid workers were needed most or could generally do the most good. Again, very cool stuff. Then it hit me: Twitter is the new citizen's band. Think about it. A populist medium allowing people to converse with strangers and broadcast to (a portion of) the world at large. Users of the medium go by handles. Traffic is subdivided into channels. And, what led me to the conclusion in the first place, the traffic itself is a fascinating combination of pure drivel and vital information (with a fair bit in between).

Not that CB itself has gone away. Not only is the technology still in use, evidently so is most of the slang I recall from the 70s or so.

Friday, December 3, 2010

How many ways can a person send a message these days?

Well, how many ways can a person send a message these days?

Talk to a person
Join a group conversation
Write a letter to a person
Write a letter to the editor
Call a person on the phone
Call into a broadcast show
Participate in a conference call
SMS (text) on cell phone
Email
IM a person
Join an IRC or IM chat room
Update your status on your IM service
Send a message on a social networking site
Update your status on your social networking site
Tweet
Write a blog post
Comment on a blog post
Post a comment on a forum
Send a message in your virtual world or MMORPG
Produce a podcast
Participate in a webcast

Um, yikes, that's a lot, but it's not hard to come up with more. Put a message in a bottle. Update a bug report. Post a sign. Spray-paint your house. Wear a message T-Shirt. Put a bumper sticker on your car ...

I'm limiting (yes, limiting) myself here to things a typical individual might be expected to do, so writing a magazine article or holding a nationally televised press conference wouldn't count. Even so, there's a seemingly limitless supply of ways to send a message.

When faced with such abundance, a plausible explanation is combinatorial explosion -- a reasonably small number of factors which can be combined in a large number of ways. What are some possible factors?

Cardinality: How many senders and how many receivers are there? The choices for each are one or more than one (denoted N). This is why I distinguished calling a person from joining a conference call or phoning into a show.
Symmetry: Are there specific roles, e.g., sender and receiver, or is everyone on an equal footing?
Potential recipients: Who could possibly receive the message, or perhaps better, what group do you need to belong to in order to be able to receive a message?
Potential senders: For the purposes of this exercise, this is generally "Anyone" (though in some cases it's narrower).
Access control: Who controls who can send or receive?
Persistence: Can the message be expected to be permanently available to the recipient? This would be within the messaging system. Once a message is received, the recipient can generally keep a private copy and/or resend it.
Latency: How much time typically passes between sending and receiving? Latency can be higher or lower. It can also be arbitrary, as in the case of forums and email.
Bandwidth: The maximum rate at which information can be transferred.
Message size: How large can a single message be?
Anonymity: Does the sender typically know who is receiving? Does the receiver typically know who's sending? If so, can one easily make oneself anonymous?

So does that help sort anything out? If it does, we should see a fairly wide variety of combinations, keeping in mind that some particular combinations may not make sense. For example, 1:N cardinality implies asymmetry, and persistence generally implies arbitrary latency. Let's see what we've got in the table below.

Mechanism	Cardinality	Symmetric?	Potential receivers	Access controller(s)	Persistent?	Bandwidth	Message size	Latency	Anonymity
Talk to a person	1:1	Yes	Anyone	Participants	No	High	Arbitrary	Negligible	None
Group conversation	N:N	Yes	Anyone	Participants	No	High	Arbitrary	Negligible	None
Letter to person	1:1	No	Anyone	Participants	Yes	Low	Small	Days	Possible for sender
Letter to editor	1:N	No	Subscribers	Editor	Yes	Low	Small	Days	Possible for sender
Call person	1:1	Yes, except at start	Anyone	Participants	No	Medium	Arbitrary	Generally negligible, enough to be annoying in some cases	Possible for caller or receiver
Call show	1:N	No	Listeners/viewers	Producers	Typically	Medium	Generally limited	Generally negligible	Possible for caller; audience is anonymous
Conference call	N:N	Depends. In some cases only some participants can talk.	Depends. Passcode may be required	Moderator	Depends	Medium	Arbitrary	Generally negligible	Possible
SMS	1:1	No	Service subscribers	Service provider	Yes	Low	Small	Arbitrary	To the extent phone numbers can be anonymous
Email	1:1 or 1:N	No	Anyone with email (doesn't matter who provides the email service)	No one, except that various providers may try to screen out spammers	Yes	Low for typed text, can be high for large attached files	Varies, but generally at least megabytes	Arbitrary	Possible
IM	1:1	Yes, except at start	Service participants	Service provider	Optionally	Low (again excepting file attachments)	Low (again excepting file attachments)	It's "instant" messaging, right?	Possible
Chat room	N:N	Yes, except at start	Service participants	Service provider (for access to service), moderator, in some cases (for access to room)	Optionally	As with IM	As with IM	As with IM	Possible
IM Status update	1:N	No	Service participants	Service provider	May or may not be archived; persists until changed	Low	Small	Arbitrary	None
Social network message	1:1 or 1:N	No	Service participants	Participants, service provider	Yes	Low	Generally small	Arbitrary	None
Social network status update	1:N	No	Service participants	Service provider	As with IM status	Low	Small	Arbitrary	Default state for readers
Tweet	1:N	No	Anyone with a web connection or cell phone	Service provider (Twitter)	Yes	Low	Small	Arbitrary	Possible for sender, default state for readers
Blog post	1:N	No	Anyone with a web connection	Blog author	Yes	Low	Smallish	Arbitrary	Possible for author, default state for readers
Blog comment	1:N	No	Anyone with a web connection	Blog author	Yes	Low	Smallish	Arbitrary	Possible for author of comment
Forum comment	1:N	No	Anyone with a web connection	Moderator	Yes	Low	Smallish	Arbitrary	Possible for author
Virtual world/MMORPG message	1:N	No	Service participants	Service provider	No	Low	Small	Generally negligible	Players go by pseudonyms
Podcast	1:N	No	Anyone with a web connection	Creator	Yes	Medium	Largish	Arbitrary	Possible for creator, default state for audience
Webcast	1:N or N:N	Depends, as with conference call	Anyone with a web connection	Moderator	Possibly	Low	Low	Generally negligible	Possible

One could argue over particular entries, some of the terms could be better defined, and I could add other factors, for example privacy, but it's pretty clear that combinatorial explosion is exactly what's going on. There are a zillion different ways of sending messages because there are a zillion possible combinations of features one might like.
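
As a rough illustration of how quickly the factors multiply, here is a minimal sketch in Python. The option lists are my own simplification of the table's columns (the names and values are assumptions, not a definitive taxonomy), and the only rule applied is the one noted earlier, that 1:N cardinality implies asymmetry.

```python
from itertools import product

# Simplified option lists, loosely based on the table's columns (assumed values).
FACTORS = {
    "cardinality": ["1:1", "1:N", "N:N"],
    "symmetric": [True, False],
    "persistent": [True, False],
    "bandwidth": ["low", "medium", "high"],
    "message_size": ["small", "arbitrary"],
    "latency": ["negligible", "days", "arbitrary"],
    "anonymity": ["none", "possible"],
}


def all_combinations():
    """Yield every combination of factor values as a dict."""
    names = list(FACTORS)
    for values in product(*FACTORS.values()):
        yield dict(zip(names, values))


def makes_sense(combo):
    """Reject one-to-many messaging that claims to be symmetric."""
    return not (combo["cardinality"] == "1:N" and combo["symmetric"])


raw = sum(1 for _ in all_combinations())
plausible = sum(1 for combo in all_combinations() if makes_sense(combo))
print(raw, plausible)  # 432 360
```

Seven coarse factors already give 432 raw combinations, 360 of them after the one filter, which is comfortably more than the twenty-odd mechanisms in the table.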

Thursday, November 18, 2010

This post typed using my fingers

I can't blame companies for doing it, but I don't have to like it: Do I really need to know that the mail you sent me was sent using your Blackberry, iPhone or Droid? Do I really need to know what tool you used to clip something?

Of course not. Looking at it a little less peevishly, I suppose I should be grateful when it doesn't matter which tool was used to create or send something, and the company behind one or another part of the process has to explicitly announce its presence. I've gritted my teeth on plenty of occasions when I wasn't told which product had produced a document -- and I already knew because I couldn't read the thing.

In fact, given the sheer number of pieces in the whole stack end-to-end, one could even argue that it's surprising how few little mini-credits one runs across. Imagine if there was a "routed by a Cisco router using Ethernet" credit on every network packet (OK, you could figure out from the MAC address whose router it was, but the MAC address has to be there in any case). Or something in every HTTP request that said what browser produced it. Wait, that's there, too, though again not so visible.
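
Both of those less visible credits are easy to look at. The sketch below, in Python, pulls the vendor prefix (the first three octets) out of a MAC address and shows the User-Agent string that Python's own urllib would announce by default. The function names and the sample address are made up for illustration, and mapping a prefix to an actual vendor would need a lookup table that is not included here.

```python
from urllib.request import build_opener


def vendor_prefix(mac):
    """Return the first three octets of a MAC address, which identify the vendor."""
    octets = mac.replace("-", ":").lower().split(":")
    if len(octets) != 6:
        raise ValueError(f"expected six octets, got {len(octets)}")
    return ":".join(octets[:3])


def default_user_agent():
    """Return the User-Agent header urllib sends unless told otherwise."""
    headers = {name.lower(): value for name, value in build_opener().addheaders}
    return headers["user-agent"]


print(vendor_prefix("02-00-5E-10-20-30"))  # 02:00:5e
print(default_user_agent())                # varies with the Python version
```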

Well, I still find it all a bit annoying. So there.

Tuesday, November 16, 2010

Maybe I just don't understand this whole "privacy" thing

Today I had to get the full account number of a bank account. It was probably on some old paper statement at home, but I wasn't at home and besides, didn't I "go green with online statements" years ago? Everything's on the web these days, right?

Except when it's not. Many sites in a similar situation will provide a way of getting your full account number directly. Not this one. Most will at least provide PDFs of the statements they would have mailed, but again, no. Fine. I call them up and give them a bunch of identifiers (not quite as bad as this time, just the usual rigmarole). May I have my full account number now? Well, no, they don't give that out over the phone.

But no worries. What they can do is fax a recent statement, with account activity and all manner of other fun stuff, to any random fax number I choose. So that's all right, then.

For bonus points, they do read out a disclaimer warning that information sent to a public fax machine might be seen by anyone and everyone.

Noted.

Friday, October 29, 2010

A small point I neglected to draw out in the previous post

"Things that make the web useful" (searchability, among other things) and "things that make the web engrossing" (cross-linking, among other things) are two distinct categories, though with at least some overlap. "Things that make the web popular" is yet a third, comprising most of the other two.

This pattern can't be unique to the web.

Wednesday, October 27, 2010

Falling into the web

I don't even remember exactly how it happened, except that I happened to follow a link from Wikipedia, but I managed to end up entangled in TV Tropes. Apparently I'm not alone.

If you haven't visited it already, be warned: This is one of the more potent timesinks out there. Thinly masquerading as a collection of motifs and plot devices from television, it's really a compendium of archetypes from all sorts of fiction, clearly and wittily explicated and extensively cross-linked. The piece on William Shatner alone is worth the price of admission, as is the Evil Overlord List. Think Joseph Campbell meets Wikipedia meets Remote Control.

Somewhere around the Space Whale Aesop, the obvious occurred to me: It's the extensive cross-linking, that is, the webbiness of the site, that makes it so addictive (that, and there dependably being something worth reading at the other end of the link). A little bit later (Fridge Logic?) it occurred to me that really webby sites like TV Tropes are relatively rare. Yes, most blogs include links, but mostly external links. Even Wikipedia isn't as densely linked as TV Tropes (or at least it doesn't feel like it).

In fact, I find a large part of my web experience consists either of directly visiting a favorite site, or doing a search and then following a small number of links to what I'm looking for. Most of the time, I'm using the web to find some particular piece of information, not to browse at random. Nothing wrong with browsing at random -- it's just not my main mode.

As essential as links are to the web, they may not be its most essential feature. If links went away tomorrow and individual sites were flattened into giant, unwieldy documents, it would still be possible to find useful information via your favorite search engine. If search engines went away instead, no one would be able to find much of anything. Furthermore, if search engines had never existed, sites would be much less richly linked than they are now, because authors would have been less able to find good links. Searchability supports links at least as much as the other way round.

In short, search engines may well be more important than links, except when it comes to a particular digressive mode of chasing links to see where they go [But then, I would say that after a few months at Google, wouldn't I? -- D.H. Dec 2015].

Monday, October 18, 2010

NEW TECHNOLOGY LOOKS ODDLY FAMILIAR STOP

My phone is not particularly well suited to texting, but for various reasons I've found myself doing more of it lately. Even beyond the basic problem of typing on a chiclet keyboard with fingers that did some of their first typing on an Underwood manual, there are a couple of challenges.

For one, I'm used to writing complete sentences, so I find myself compulsively and pointlessly going back and fixing spelling mistakes, checking punctuation and so forth. Mind, I don't have anything against the usual abbreviations and casual spellings. I doubt it's a sign that the language has gone to pot or that Kids These Days don't learn anything. More likely it's a sign that full and careful spelling is just not worth the effort if you can get your message across more quickly without it.

The upshot is that I text much, much more slowly than I write. I'd guess at least four times as slowly and very likely closer to eight or ten [re-reading in 2015, I note that I'm able to text much faster now, with a smartphone and keyboard app, and the character limit is much less visible. I think my texting is somewhat less terse now, but the overall point of texting technology influencing texting style still stands, I think -- D.H.]. An order of magnitude in quantity generally means a change in quality and this is no different. Working at such a slow speed, I find every word counts, as typing another is just too much bother.

Side note: Once I was at a conference where computer graphics legend Jim Blinn presented his first ray-traced picture. Ray-tracing is a technique that carefully follows rays of light through every pixel of the picture, as opposed to the classic "polygon pushing" technique, which Blinn helped pioneer and which is still in wide use today because of its speed. Polygon pushing determines which surfaces are visible and draws them (more or less directly), saving a bunch of time. Blinn claimed that one of the nice aspects of ray-tracing was that since it was so slow, around eight hours per frame in that case, as I recall, you had plenty of time to think about what was going to be in the image.

Just so, slowing down to text gives much more time to think about a short message. I'm sure the situation is different for experienced texters, but even then another factor comes into play: SMS's draconianly (and more or less artificially) short message length. If you're tweeting, it doesn't matter if you're sitting at your desk typing full steam ahead, or picking out words while squinting at a cell phone, or rattling away with thumbs of lightning. 140 bytes is 140 bytes.
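
The arithmetic behind that limit is worth a quick look. As far as I know, a single SMS carries a 140-byte payload, and how many characters fit depends on the encoding: the default 7-bit alphabet is what gives the familiar 160, while Twitter's limit of 140 was counted in characters. A minimal sketch in Python, with names of my own choosing:

```python
SMS_PAYLOAD_OCTETS = 140  # payload of a single, unconcatenated message


def max_characters(bits_per_character, payload_octets=SMS_PAYLOAD_OCTETS):
    """Return how many whole characters fit in the payload at a given width."""
    return (payload_octets * 8) // bits_per_character


for name, bits in [("GSM 7-bit", 7), ("8-bit data", 8), ("UCS-2", 16)]:
    print(name, max_characters(bits))
# GSM 7-bit 160
# 8-bit data 140
# UCS-2 70
```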

Way back in the early days of electronic communication networks, people sending messages faced a similar problem. I'm not aware of any particular length restriction on telegraph messages, but for decades telegraph messages had to be transmitted, by hand, in Morse code. As a result every word was expensive -- and punctuation was conveyed in words, notably STOP for a period. To cope with this, customers developed a concise "telegraphic" style in order to make every word count.

Technology doesn't just enable. It also constrains, and the effects of such constraint can be just as interesting.

Sunday, October 17, 2010

Defending your reputation (for a small fee)

Some time back, when I had a somewhat different vision of this blog, I ruminated about how one might model reputation. Whether or not the model is any good, taking some time to think about what reputation might be was a useful exercise. Re-reading the posts in that thread, one of the more useful observations was:

We try to control our reputations (at least)

through our actions

by controlling access to information about us

by influencing people's interpretation of the information (we think) we know

Of those, we have the most control over the first, though perhaps more effort is devoted to the third. The second has its own special quirk: It's possible for information to disappear from the web, if all permanent copies can be removed, but the safe assumption is that information only accumulates.

Nonetheless, there are companies in the business of helping people control access to information, and thereby their reputations. A fool's errand? Probably not. There are several things that reputation protection services can and do provide:

Monitoring what you look like on the web. If someone posts something slanderous about you, you may not find out until it's too late, unless you're constantly monitoring the web -- or have someone doing it for you.
There are various online lists and databases that you can sometimes have your personal details purged from, but who has the time?
You can't erase information from the web, particularly if it's a rumor that's already spread far and wide, but you can respond and try to counter it. In this sense, protecting a reputation is just old-fashioned PR.
If you choose to, say, put all your purchases and reading selections up for your friends to peruse, you might want to use a different identity to mention that you're reading World Domination in Six Easy Evil Steps or to purchase that 1.21 gigawatt laser. But if you don't already know that, a service may not be of much help.

What's less clear to me is how much any of this is worth to private individuals. If your ex has just posted those embarrassing videos of you from the last Christmas party, it's not going to help much to learn about it in a report from your reputation service. It would seem it's the PR function that's most useful in such cases, but unless you're directly in the public eye you probably don't have call for that. If you do need it, you're not going to get it online for a small monthly fee.

I'd liken it to search engine optimization. If you're doing serious business online, you definitely want it, along with real marketing expertise. If you're blogging in a web.backwater, probably not so much.

Or so I hope.

Tuesday, October 12, 2010

"The computer knows"

The other day someone asked me whether it was supposed to be cold out that week. I didn't know offhand. "That's OK," they said, "I'll check the computer. The computer knows."

It occurred to me that if someone were trying to convince a skeptical public back in the 80s that this whole "personal computer" thing was really going places, and that person were allowed just one ten-second glimpse into the faraway world of 2010 to show the audience, they would probably give their eyeteeth for that particular glimpse. Ditto for a budding AI researcher.

Except ... the viewer from thirty years ago would naturally take "the computer knows" at face value. Computers in the 21st century would be so fast and so smart that the personal computer in the kitchen could predict the weather.

Today, by contrast, we don't generally assume that computers "know" much of anything, but we do assume that they can easily direct us to someone who does, in this case the people at a weather service. Granted, said forecasters are making use of computers that, in terms of computing power, could swallow an 80s-era supercomputer whole without a hiccup. Nonetheless, we don't assume that our own computers could do any such thing, or even that a supercomputer is so omnipotent as to make weather forecasters redundant.

That's the difference between having a PC and being on the web. The primary function of most computing devices -- personal computers, phones, netbooks, routers, etc. -- is communication. That's not to say that computers aren't essential in producing and cataloging data, but data is only useful if you can get to it.

Saturday, October 2, 2010

What did I mean, "web before the web"?

I badly mis-titled my previous post.

The point I was trying to make was that the ability to sit down at a computer and do many of the things we now associate with the web, and the idea that there was good money to be made in providing that ability, both predate the Web As We Know It. Fair enough, but calling that "the web before the web" is just wrong. There was very little webby about it.

What makes the web the web? The ability to link from one site to another, that is, the good old http:// link we all know and love. In the 80s you could connect to a remote site. With some applications (for example HyperCard, though it wasn't the first) you could chase links between and within documents on the same computer. UUCP and Usenet also predate the web, allowing email and news to flow between systems (including some BBSs). And, of course, the internet itself was around, so some people at least could connect to more than one system without signing off and dialing in again.

Nonetheless, the essential feature of the web, the idea that you could seamlessly follow a link in one online document to an online document hosted by a different system, had not yet arrived. Without that, no web.
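
The distinction is simple enough to sketch in code. Given the address of a document and a link found in it, the Python function below says whether following the link stays on the same system or lands on a different one. The function name and the example.com addresses are illustrative only, and it only really makes sense for ordinary http:// links.

```python
from urllib.parse import urljoin, urlparse


def link_kind(document_url, href):
    """Classify a link by whether its target is hosted on the same system."""
    target = urljoin(document_url, href)  # resolve relative links first
    same_host = urlparse(target).hostname == urlparse(document_url).hostname
    return "same system" if same_host else "different system"


page = "http://example.com/notes/index.html"
print(link_kind(page, "archive.html"))                # same system
print(link_kind(page, "http://example.org/elsewhere"))  # different system
```

By this measure, HyperCard-style links would always come out as "same system"; it's the second kind that makes the web the web.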

Memory lane and the web before the web

Unpacking some boxes of books, I ran across The MC6809 Cookbook. The '09 was a very nicely-designed Motorola CPU with a clean and well-regarded instruction set. In the event, the Motorola family, including the 680x0 family of 16-bit processors, ended up playing Betamax, with Intel's 8080 and 80x86 family playing the role of VHS.

Actually, that's not fair to Motorola, given that the 68K architecture is still in production and use. It's not necessarily fair to Intel either, as one can certainly argue that the x86 architecture, for all its quirks, actually makes the right trade-offs. Being a software guy, I'm not going to dive much deeper than that. I'm probably already in over my head.

The book is a typical technical book of the time (1981), talking about pinouts, voltage levels and evaluation boards along with the basics of two's complement and the details of the instruction set. It includes a description of the language VTL (Very Tiny Language), whose runtime fits in 768 bytes -- considerably less than this post -- complete with code listings. The one for Conway's game of life "takes at least 2K of memory to operate satisfactorily," so be sure you've got that RAM upgrade installed.

Towards the beginning of the book, during the obligatory drumming-up of how great the processor is, is the boast that the '09 "was recently incorporated into what will more than likely become the small computer system of the decade ..."

Any guesses?

"... the Radio Shack TRS-80 Videotex."

No, that's not the classic (Z80-based) TRS-80 that I first learned to hack on. It's not (exactly) the TRS-80 Color Computer (the "CoCo"), though that did use the '09. It's basically a dedicated box for dialing in to servers run by news sources and such, and it fell quietly off the face of the earth (Videotex did well in France, but they had their own box).

So why make such a fuss -- and the major players at the time did make a fuss -- over such a thing? Well, while seeing how many Google hits I could get for TRS-80 Videotex (about 8000), I ran across this page on trs-80.org, which in turn quotes an article in TRS-80 Microcomputer News. The author of the quoted article describes the rush of using his CoCo to dial in and get late-breaking sports, news and all manner of interesting information, and even send "electronic mail" to other Compu$erve users.

I remember spending inordinate amounts of time in the early 80s on a local BBS (Hi, Keith!) chatting, emailing and playing games, nearly a decade before TimBL put up the first web server. Clearly there was something to the whole concept.

So, right idea, nearly the right time, but not quite. It's one thing to say "this whole online thing could get big," quite another to work out how it will happen, and another thing entirely to place a winning bet on a particular product. As Warren Buffett said in the 2009 Berkshire Hathaway shareholders' letter (before he said "come and shop at all our businesses"):

In the past, it required no brilliance for people to foresee the fabulous growth that awaited such industries as autos (in 1910), aircraft (in 1930) and television sets (in 1950). But the future then also included competitive dynamics that would decimate almost all of the companies entering those industries. Even the survivors tended to come away bleeding.

Friday, September 24, 2010

Blockbuster, RIP

Back during the Madness, a neighbor happened to mention a new service that would let you rent DVDs online. This was around the same time as eToys and WebVan, back when one could look at a preposterous business plan and think "Well, maybe I'm missing something." Nonetheless it seemed a bit unlikely that people would want to wait for DVDs to arrive in the mail when they could just hop over to the local Blockbuster. I didn't give the idea much of a chance.

About a decade later, Netflix is still going strong and Blockbuster has just filed for bankruptcy, sending its stock from about $0.06 to around $0.04. That's a typical "oh look, you can too fall through the floor" dot-bomb performance and, sadly, not too much of a surprise. I literally don't remember the last time I set foot in a Blockbuster or heard someone say "Oh, I'll rent it at Blockbuster". For that matter, I'm still not sure when I last bought a DVD. The only reason even to rent a DVD is that it's not available online cheaply enough. My Netflix subscription, however, is still going, albeit at the minimum rate [and it's still alive and well ... the new "smart TV" in the bedroom has Netflix built in with a button for it on the remote control, and Netflix seems to be doing pretty well following the HBO playbook in moving from supplying movies to producing original content --D.H. Dec 2015].

The winner here, of course, is online video (provided you include video on demand). The loser is physical video (tape and DVD, but with movie theaters in a separate category). Netflix would likely be in the same boat as Blockbuster had it stuck to mailing DVDs, and conversely Blockbuster might have survived had Netflix not beaten it to the punch online.

So there you have it: Convergence and the web winning decisively over the old bricks-and-mortar model. It really did happen. Just years later and only in an industry that's essentially been selling bits all along.

Monday, September 6, 2010

The cry of the squeamish ossifrage

I think I got rid of the old Scientific American issue years ago, but I still remember reading about the RSA public key cipher in Martin Gardner's Mathematical Games in 1977 (August, to be precise). Thirty-three years later, RSA is still in use, providing a secure means of encrypting and signing digital data (unless someone has figured out a way to crack it and is sitting very, very tightly on the secret).

In particular, it can be used to verify that only someone in possession of a particular secret key, generally a several-hundred-digit number, could have produced a particular block of bytes. If you visited a site whose URL started with "https://", for example your bank, your browser most likely used RSA in the process of satisfying itself that it really was talking to the right server.
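
To make the idea concrete, here is a toy version of RSA signing in Python. It is a sketch only: the primes are the tiny textbook pair 61 and 53 rather than numbers hundreds of digits long, there is no padding scheme, and the function names are mine, so none of this should be mistaken for what a browser or a real library actually does.

```python
import hashlib

# Toy parameters; real keys use primes hundreds of digits long.
P, Q = 61, 53
N = P * Q                  # public modulus: 3233
PHI = (P - 1) * (Q - 1)    # 3120
E = 17                     # public exponent
D = pow(E, -1, PHI)        # secret exponent (modular inverse, Python 3.8+)


def digest(message):
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message):
    """Only the holder of the secret exponent D can compute this."""
    return pow(digest(message), D, N)


def verify(message, signature):
    """Anyone who knows the public pair (E, N) can check a signature."""
    return pow(signature, E, N) == digest(message)


signature = sign(b"pay the bearer one squeamish ossifrage")
print(verify(b"pay the bearer one squeamish ossifrage", signature))  # True
```

Checking uses only the public numbers E and N, while producing a valid signature needs D. Change the signature by even one and it no longer verifies.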

So why is authentication such a mess? Why does resetting a password require anything from coming up with the name of a cat to providing a working email address to providing several pieces of information and then getting a phone call? Why do some sites want the three-digit code on the back of your card and some not, and how is adding three more digits that you end up handing out to all and sundry helping the situation? Why hasn't OpenID or some other knight in shining armor been able to rescue us? Why do we still use passwords for anything besides locally decrypting the key to a real authentication system? How do you even know I wrote this?

I don't really know, but if I didn't have some guesses I probably wouldn't be writing this, now would I?

First, what would a really seamless authentication system look like?

It would allow for multiple identities. Maybe I just haven't caught on to the whole every-waking-moment-of-your-life-available-online thing, but I would rather keep my work identity separate from my blogging identity separate from my personal email separate from my bank accounts. Not to mention my identity as an international man of mystery.
It would allow the same identity to work multiple places. This is not the same as giving N different sites the same username and password. Your username doesn't belong to you, whereas a real identity does. Anybody can choose your favorite username if they happen to get there first. It's also not the same as letting your browser keep track of a bunch of username-password pairs and putting a master password on all of them.
It would minimize the number of tokens needed for an identity, and each token would be there for a clear reason. If the token is a password, fine, but it should be a password, not a password and two or three "security questions."
It would use current best practices. It's risky to use anything too new when it comes to security technology, and unless you're No Such Agency or the like it's madness to try to create your own, but there are plenty of well-established road-tested security techniques available.
It should be portable, both physically (like the "pocket-thing") and across sites. Ideally, registering with a new site means registering the token(s) for the appropriate identity.
It should be as completely under the identified individual's control as possible.

What actually happens? Something along these lines, I think:

Suppose I have some sort of digital certificate that I can use to identify myself. Properly used, this could satisfy the requirements above, perhaps together with some sort of physical token, like a smartcard. Any really secure authentication system, including a smartcard, is going to have some such certificate in it somewhere.

Since it costs money to have a major certificate authority (CA) vouch for a certificate (by signing it), certificates used by individuals in practice tend to be "self-signed", or signed by members of a "web of trust" instead. That's fine for some purposes, but not for doing business with a bank. If it's not good enough for the banks, it's probably not good enough for your utility company either.

In theory, you could establish your identity with a bank and then get them to sign a certificate to that effect, which your utility company might choose to trust, but that basically puts your bank into the CA business, not one they're necessarily keen to get into. In practice, each company would rather control the process, typically asking for an account number off a paper statement to get the ball rolling. Each entity has its own customer ID system for the account number, and usernames are potluck, so you end up with (at least) one semi-identity for each company you do business with.

In the wild-and-woolly world of pure web sites, where you don't already have a customer id when you sign up, there doesn't seem to be any strong push to move beyond the usual username-password system. Everyone's used to it. Switching would mean re-doing the login screen, at the least, with new and less-familiar technology, then convincing your users to go along with it. If it ain't broke don't fix it.

Since an authentication scheme is only as strong as its reset mechanism, there are basically two schemes in wide use:

An identity is a working email address
An identity is a couple of "security questions" and answers

If I had to choose, I'd take the former, but it's not much of a choice.

Thursday, September 2, 2010

Online customer service, only without the service

I don't generally like to criticize customer service reps. It's a thankless job. However, this particular one might have been a little more careful with those boilerplate macro keys. It would be helpful, also, if SomeCompany's system would allow a password reset* given:

Account number
Username, being the service provider's home-grown email address for the customer
Customer's personal email address
Customer's full name and home address
Last four digits of customer's SSN
Customer's home phone number
IP address associated with the account (from which the system was already able to find the username)

It's sort of a division of labor anti-pattern. A human and a computer working together end up more obtuse than either alone. Offering the customer the service the customer can't log into and the chat support that didn't help is a nice parting touch.

What follows is an anonymized and lightly edited transcript of an actual customer chat sent by one of my "army of stringers, researchers, fact-checkers and miscellaneous hangers-on."

Problem: Trying to sign in; need password

Hello Customer, Thank you for contacting SomeCompany Live Chat Support. My name is Service Rep. Please give me one moment to review your information.  I'm ready to assist you today. How are you doing by the way?

Fine, thanks .

Nice to know that you are doing good.

I was trying to log in to your service 

As what I have understood, you would like to have your password for you to sign in right? 

Yes. I thought I'd already set up an account and your website found a user name from looking at my IP address, but I can't reset the password . Also, I'd rather choose my own user name rather than use the assigned one (wemadethisup@somecompany.com), if possible.

Oh, I see.   I understand that it is very important for you to know the password of your here.  I also know that you would like to have your email address personalized and change it.  There is no need to worry since as your service representative today, I want you to know I am more than willing to help you today with your issue. I can assure you that we can have a positive resolution since we will be working on this together.

Here’s what I can do, Since your password is not allowing you to log in, and since we do not store our customers’ passwords, I can give you a randomly system generated password would that be okay?

That would be fine, thanks 

Alright. Please allow me to pull up your account information so that we can resolve it in the most efficient way possible. I will be verifying security information to protect your account privacy. May I please have the account number, account holder's full name, home address, and the last 4 digits of your SSN?

<Customer gives the information>

Thank you.  May you also verify your phone number and the Email address that you are using?

<Customer gives phone number and personal email address>

Thank you.  I am referring to the SomeCompany email address that you would like to reset the password.

I don't use SomeCompany for email, so that address is only useful to me as the login ID. I'm not set up to check that account and I would rather not have to be. But I think you mean wemadethisup@somecompany.com.

Thank you. While waiting, I will share with you a feature of SomeCompany that you can truly benefit. Are you aware of the customer self-help on SomeCompany.com? SomeCompany.com has an extensive series of Frequently Asked Questions (FAQs) that cover all of our products. Customers do not have to sign in to access the FAQs. Quick steps to do it...Open a web browser window and go to http://www.somecompany.com/. On the home page, the navigation menus are on the left side of the window and click on Customers then Help and Support. 

I have already pulled up your account.

I don't think the FAQ will help. Please just reset the password. 

Okay.    Now, for the password, since we do not keep it for security reasons, I can reset it and provide you with a randomly generated one. Do not worry about changing it because you would be able to change it to your preferred password once you are able to log in. Would that be okay with you?

Yes, please.

Sure, now for me to push through the process and reset your password, may you please give me your security pin?

I don't remember setting a security PIN.

A security pin is like a password to your account. This will be sent via a postal mail to you, a few weeks after your service is started. It is a 4 digit number.   May you please try to check your postal mails?

OK. I might have the mail somewhere. I have no idea where.  [time passes] Sorry, I can't seem to find anything.

Since you have not provided the Security pin, in order to push through with this process, I would have to call you right now on your phone number to authenticate. Would that be okay with you?

Unfortunately, no. My kids are sleeping.   Perhaps I should try again during the day?

Yes, you may always contact us.   We are available 24/7, Customer.

So there's nothing else you can do?

Customer I really know how important it is to have your password. I would like to apologize however, we need to call you to authenticate so that we can reset your password.

OK. I'll try again during daytime hours.

Thank you so much for your time, Customer.

[time passes]

 Customer, here’s what we have done on this chat today, I have assisted you with your SomeCompany inquiry on resetting your password however we need to call you to authenticate.  Customer, it has been my pleasure serving you today and I truly appreciate your understanding and cooperation. Do you have other concerns for me today? I will be glad to assist you further.

No, that will be all, thank you.

We strive to exceed your expectations and hope that you will take a moment to complete the 3 question survey that will follow our interaction, your feedback will help us to continue improving how we serve you.   Do you want to use our service? Go to http://www.somecompany.com. Thank you for choosing SomeCompany as your service provider and have a great day! SomeCompany appreciates your business and values you as a customer. Our goal is to provide you with excellent service. If you need further assistance, you can chat with one of our Customer Support Specialists 24 hour a day, 7 days a week at http://www.SomeCompanySupport.com

* Actually, SomeCompany is probably right to want better authentication. It's quite possible that someone, say, found their neighbor's bill, with the account number, and leeched onto their non-secured WiFi or used other chicanery so as to connect from the right IP address and thence obtain the user name. It's conceivable that such a person also somehow happened to know the customer's personal email address and last four digits of the SSN.

Calling the phone number of record (which the customer was challenged to give and the service rep is able to verify) would raise the bar significantly. Likewise, assuming the snail mail with the PIN didn't also have the account number, the would-be thief would have had to steal two separate pieces of mail, typically delivered on different days.

The annoyance here is that the stronger authentication is strong on its own. That is, "Tell me the PIN we mailed you" is about as secure as "Tell me the PIN we mailed you and several pieces of not-too-hard-to-find information." and "So you want a password reset? Let me call you at the phone number listed on the account." is about as secure as "Tell me several pieces of not-too-hard-to-find information and I'll call you on the phone number listed on the account." Unfortunately, Service Reps are generally required to go through the whole account verification cha-cha-cha before doing anything meaningful.

One wonders, though, why this bundle of not-too-hard-to-find information is good enough to let the customer access the account information, but not good enough to let the customer use the service itself.

Wednesday, September 1, 2010

A belated Happy Birthday

Yikes, this is a bit casual even for the new, even-more-casual Field Notes [Heh ... I think the current record is now 27 Aug to 14 Dec 2015, which would include a Field Notes birthday -- D.H. Dec 2015]. For months I'd realized that post 500 and the third anniversary of the first Note would come close together, but I got so caught up in spinning up the new blog after post 500 that I forgot all about the date, even though I posted just one day afterwards.

In the new spirit of apathy, I won't hold forth as I did in years past, but I would at least like to note the occasion, if only a bit after the fact.

Tuesday, August 24, 2010

Ajai Chowdhry on IT in Africa

Real journalism is not an easy gig. Your job is to report news, which means things nobody knows about yet. Old news is no news, so you need to figure out what's going on as quickly as you can. Extra time spent researching or editing is time that the story's not getting published. Then you get to go on to something else you don't know about, though at least it will usually be in the same general vein.

So I feel a little bit bad as a casual blogger about picking on the CNN headline writer who characterized Indian IT entrepreneur Ajai Chowdhry's comments on African infrastructure as "Why broadband not roads will transform Africa." But only a little bit.

From what I read, Chowdhry isn't saying that broadband is more important than roads. The main assertions I get are that Africa's problems and solutions are the same as India's; India has had an advantage in being a single state instead of 53; because of their similarities and decades of close relations, doing business in Africa is not difficult for an Indian company; Africa represents a huge business opportunity; African manufacturing has a huge native market to supply; African unity will only help Africa's economy and stature in the world; and, yes, broadband and the web could and should play a major role in addressing African poverty.

Chowdhry mentions roads in one passage at the end:

But the one area where Africa can make a big difference is by not just looking at putting up roads -- it should look at putting up internet broadband-type infrastructure.

In other words, both are important, and perhaps broadband is being overlooked.

My point here is not to bash on CNN. As I said, putting this all together is harder than it looks. Rather, it's that Chowdhry's broad and well-developed view of the situation, from the standpoint of a key player in IT, provides a good perspective of how the web fits into the overall picture of economic development.

Given that Chowdhry is in the business of IT and clearly and openly hopes to gain from helping develop African IT, it's particularly notable how broad a view he presents.

Wednesday, August 18, 2010

Well now I've done it

Field Notes now has a baby sibling. Its name is Intermittent Conjecture and it has something I haven't seen in years: a post list that fits on one page. In fact, as I write this it contains only an introductory post. Considering that I skipped that formality with Field Notes, I suppose that's another first.

As I say there and said here, the plan now is to relax for a while and post whenever the mood strikes. If it's about the web, it will end up here. Otherwise it will end up there. Unless you are singularly obsessed with or repelled by talk of the web, you will probably not see a lot of difference between the two, other than the range of topics.

If there were regularly scheduled programming, this is where we would return to it.

Sunday, August 15, 2010

Wikipedia 1.0: journey vs. destination

While browsing through the Wikipedia policy pages (it was either that or just tattoo "Geek" on my forehead and be done with it) I ran across something I remembered running across a while ago, more or less shrugging at and moving on, namely an offline edition of Wikipedia. There seem to be two approaches:

The "German model": Distribute a snapshot of Wikipedia on CD. Why, I'm not sure. Perhaps to reach that select audience of people who have heard of Wikipedia but don't have an internet connection to access it*?
The "Wikipedia 1.0" model: Select the best, most polished articles and publish them, whether on paper, CD/DVD, read-only web site, or whatever.

The Wikipedia 1.0 project was proposed in 2003. At this writing, several versions have been released and 0.8 will be out Real Soon Now. That's not to say that 1.0 will be two versions from that. The beauty of the x.y version numbering scheme is that you don't have to go from 0.9 to 1.0. You can release 0.91, 0.95 ..., you can release 0.10, 0.11 ..., you can release 0.9a, 0.9b ... [But it looks like we'll go into 2016 still on version 0.8 ... my guess is that 1.0 isn't going to happen -- D.H. Dec 2015]

For my money, it's not particularly important whether 1.0 ever comes out. Plenty of good has come out of attempting the exercise at all, in particular as a spur toward improving the quality of core articles and encouraging the development of Wikipedia's quality and importance ratings. These exhibit a nice division of labor: People rate articles and computers aggregate the best-rated ones.

The main reason not to just leave it at that and integrate the ratings more directly into the UI is that vandalism still has to be filtered by hand and, despite the lack of imagination exhibited by most vandals, always will be. But most likely even that could be handled without an explicit release mechanism, by means of "flagged revisions," which allow editors to flag particular revisions as being free of vandalism and otherwise up to snuff. Apparently the mechanism has been in place for a while but the community is still figuring out how best to use it.

What's the proverbial "simplest thing that could possibly work" here? Perhaps just allowing anyone -- or anyone with an account -- to tag a revision however they like, and allow readers to filter what revisions they see. E.g., only show me revisions that the quality rating committee has rated "good" or better and my friend Jimbo has rated "funny". The proposal for "sighted revisions" looks pretty close to this, though less flexible.
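A rough sketch of what such a filter might look like follows. Everything in it is hypothetical: the quality scale, the tagger names and the Revision structure are invented for illustration and are not how Wikipedia actually stores ratings or flags.

```python
from dataclasses import dataclass, field

# Hypothetical quality scale, lowest to highest.
QUALITY_ORDER = ["stub", "start", "good", "featured"]


@dataclass
class Revision:
    rev_id: int
    # tagger name -> set of tags that tagger applied to this revision
    tags: dict[str, set[str]] = field(default_factory=dict)


def rated_at_least(rev: Revision, tagger: str, minimum: str) -> bool:
    """True if `tagger` gave this revision a quality tag of `minimum` or better."""
    floor = QUALITY_ORDER.index(minimum)
    return any(
        tag in QUALITY_ORDER and QUALITY_ORDER.index(tag) >= floor
        for tag in rev.tags.get(tagger, set())
    )


def tagged(rev: Revision, tagger: str, tag: str) -> bool:
    """True if `tagger` applied exactly this tag to the revision."""
    return tag in rev.tags.get(tagger, set())


revisions = [
    Revision(1, {"quality-committee": {"good"}, "jimbo": {"funny"}}),
    Revision(2, {"quality-committee": {"featured"}}),
    Revision(3, {"quality-committee": {"start"}, "jimbo": {"funny"}}),
]

# "Rated good or better by the committee, and rated funny by my friend Jimbo."
visible = [
    r.rev_id
    for r in revisions
    if rated_at_least(r, "quality-committee", "good") and tagged(r, "jimbo", "funny")
]
print(visible)  # [1]
```

The tagging side stays dumb (anyone can attach anything), and all the policy lives in the reader's filter.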

* That's a bit glib, as there are communities with access to computers but with limited or no bandwidth, but given it was the German edition at 3 Euros per CD, I doubt this was the intended audience. Nonetheless, 40,000 people opted to buy it.

Wednesday, July 28, 2010

Could I just type a date, please?

For some reason, I've been running into sites with broken date entry fields.

It doesn't seem like a hard problem. In the States, at least, the standard format for dates is "mm/dd/yyyy", so a decent date entry field would behave something like this:

If I type in, say, "7/27/2010", the date entered will be July 27, 2010
Um, that's about it. The left and right arrow keys move amongst the numbers, maybe the up and down arrows increase/decrease the field where the cursor is.

The problem is, an annoying number of sites use something that does even more for you: If you type in enough to fill in a field, it moves you to the next field -- which is actually fine -- but it doesn't remember that it did that. "Fill up the field" and "/" both move you to the next field. If you get to the end, it wraps around to the beginning.

So if I type 7/27/2010

7 goes in the month field. Good.
/ moves me to the day field. Good.
27 fills in the day field and moves me to the year field. Fair enough.
/ wraps me around to the month field. Wha?
20 fills in the month field again, except 20 isn't a valid month. Urgh.
10 fills in the day field again. Foo.
I end up with "20/10/" and no year. Bleh.

Of course, all I have to do is remember not to use / after a day that's the 10th or later, or after a month that's October or later. Right.

Or this widget could just not use / to move fields if it just moved because of a full field.
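Here is a small simulation of the behavior described above, plus the one-flag fix. It is only a model of the widget as I understand it from the outside: the field widths, the "typing into a field you've just moved to replaces its contents" rule, and the function name are all assumptions.

```python
WIDTHS = [2, 2, 4]  # month, day, year


def type_date(keys: str, forgiving: bool) -> str:
    """Feed keystrokes to a three-field date widget and return its contents.

    With forgiving=False, "/" always moves to the next field (the bug).
    With forgiving=True, a "/" typed right after an automatic move is ignored.
    """
    fields = ["", "", ""]
    current = 0
    fresh = True           # the next digit replaces whatever is in the field
    auto_advanced = False  # the last move was caused by a full field

    def advance() -> None:
        nonlocal current, fresh
        current = (current + 1) % len(fields)  # wraps from year back to month
        fresh = True

    for key in keys:
        if key == "/":
            if forgiving and auto_advanced:
                auto_advanced = False  # the widget already moved; swallow the "/"
                continue
            advance()
            auto_advanced = False
        elif key.isdigit():
            if fresh:
                fields[current] = ""
                fresh = False
            fields[current] += key
            auto_advanced = False
            if len(fields[current]) == WIDTHS[current]:
                advance()
                auto_advanced = True
    return "/".join(fields)


print(type_date("7/27/2010", forgiving=False))  # 20/10/
print(type_date("7/27/2010", forgiving=True))   # 7/27/2010
```

The fix is one boolean of memory: did the widget just move on its own? If so, the next "/" is the user agreeing with it, not asking for another move.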

OpenID and incentives

I mentioned OpenID in the previous post, leading me to wonder "Hey, whatever happened to OpenID?"

Well, it's still there, but perhaps not in the form its creators had in mind.

In OpenID, there are three main roles

You
The site you actually want to use (the relying party)
The site that you'll log in to to convince the relying party that you're you (the provider)

I don't know about you, but my intuition was that it would be easier and more popular to be a relying party than a provider. That's more or less the situation with, say, SSL certificates. If you're a certifying authority (CA), you're vouching for each certificate that you sign and you have to pay great attention to keeping your keys safe and other such matters. If you're just using a certificate (like when you log in to your bank or some other site using https), all you have to do is decide what CAs to trust, and in practice your browser makes that decision and does all the checking behind the scenes.

In OpenID, if you're a provider all you really need to do is accept requests for your users to log in -- which you have to do anyway -- and tell whoever asked you to do that, "yep, that's them." Unless being an OpenID provider is your main gig, you really don't have to take any more care than you otherwise would. If no one trusts you, it's not the end of the world (but see below).

If you're a relying party, you have to decide whether to trust the provider. In particular, you have to trust that the provider will check identities at least as carefully as you would. If the provider is a bank (not that that seems likely) or is trying to make money solely off of providing OpenID, that's a pretty good bet. Otherwise, your mileage may vary.

...

After working through that, I realized that there's a much simpler reason that -- unlike the certificate case -- parties tend to prefer providing to relying: If you rely on me, then your users will have to log in to my site, maybe see some of my advertising, be reminded that they use my service, and so on -- in order to use yours*. Sure, their account with you is still an account, but their account with me becomes the "real" account that the others are just sort of attached to. Which role would you rather play?

I like this analysis better. Providers do have an incentive to provide good service to relying parties, but unless users really care, no one has much incentive to be a relying party. Now that browsers are good at remembering passwords, having a single sign-on is less of an issue and people probably don't care so much. With a modern browser, you could give each account its own password if you like (and that's the more secure option) without having to keep all of them in your head.

* Providers can allow "checkid_immediate", where you don't have to log in to the provider, but that's not a popular option. Not only would providers likely prefer that users go through their login as often as possible, relying parties would probably prefer to know that the user actually logged in somewhere before letting them in.

Letters of introduction

As part of my never-ending quest to catch up on things I ought to know already (right up there on the list with reading all the great works that I've never gotten around to reading), I did a little brushing up on single sign-on schemes.

The basic idea of single sign-on is simple and good: I don't want to have to type a username/password all the time, and I don't want to have to keep track of which password I used for which account. The second item is particularly bad, since even though there are ways to let people have passwords without storing the passwords themselves, not everyone seems to know how to do it (the techniques are only a few decades old after all). So that means either keeping track of lots of passwords, or using a few and running the risk that one insecure site gets compromised.
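As a reminder of what "having passwords without storing the passwords" can look like, here is a minimal sketch using a salted, deliberately slow hash from Python's standard library. The iteration count and function names are assumptions for illustration, not a recommendation for any particular site.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for the hardware at hand


def make_record(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived_key). The password itself is never stored."""
    salt = os.urandom(16)  # random per-user salt defeats precomputed tables
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password: str, salt: bytes, key: bytes) -> bool:
    """Re-derive the key from the offered password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)


salt, key = make_record("correct horse")
print(check_password("correct horse", salt, key))  # True
print(check_password("wrong horse", salt, key))    # False
```

A site that does this can confirm that you know the password, but someone who steals its database gets only salts and derived keys, which are expensive to reverse.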

(Logging in by sending a password to a server is kind of bogus anyway. Stronger schemes use passwords only to locally unlock the secrets of some sort of strong crypto which then handles the real authentication. There are also smart cards and similar which use strong crypto to generate a secret with a limited lifetime, tying a login, assuming the crypto works, to a physical object and a point in time.)
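To make the "secret with a limited lifetime" idea concrete, here is a sketch of time-based one-time passwords as standardized in RFC 6238 (built on RFC 4226). This is a stand-in chosen because it is public and simple; the smart cards and tokens I have in mind may well use something different.

```python
import hashlib
import hmac
import struct


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """Counter-based one-time password (RFC 4226)."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation: low nibble picks 4 bytes
    number = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(number % 10 ** digits).zfill(digits)


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Time-based variant (RFC 6238): the counter is the current 30-second window."""
    return hotp(secret, unix_time // step, digits)


secret = b"12345678901234567890"  # the test secret from the RFCs, not a real key
print(hotp(secret, 0))            # 755224
print(totp(secret, 59, digits=8)) # 94287082
```

The token and the server share the secret; a code is only good for the time window in which it was generated, so an intercepted code goes stale quickly.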

So, how does a single sign-on service work? Basically such a service depends on a way for some other system to vouch for you to the system that you're trying to log in to:

You log in once with a particular server
That server gives you (or your browser) a token (or cookie, or ticket, or whatever you like). That token effectively says "so-and-so was able to present me with such-and-such credentials -- generally a username and password but possibly something more substantial -- at such and such time"
You (or your browser) show that token to the system you're trying to log in to.
That token has enough crypto mojo in it to allow the system you're logging into to verify that it really came from some party it trusts, that it hasn't been tampered with, etc.
Assuming that the identification in the token matches a known account, you can now be logged in.
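The steps above can be sketched in a few lines. To keep it short, this version assumes the two servers share a secret key and uses an HMAC as the "crypto mojo"; real single sign-on systems generally use public-key signatures and richer token formats, and all the names below are made up for illustration.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional

SHARED_KEY = b"demo-key-known-to-both-servers"  # assumption: pre-shared secret


def issue_token(username: str, now: Optional[float] = None) -> str:
    """Login server: vouch that `username` presented credentials at this time."""
    issued_at = int(time.time() if now is None else now)
    claims = {"user": username, "issued_at": issued_at}
    payload = base64.urlsafe_b64encode(json.dumps(claims).encode("utf-8"))
    tag = hmac.new(SHARED_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode("ascii") + "." + tag


def verify_token(token: str, max_age: int = 300, now: Optional[float] = None) -> Optional[str]:
    """Relying server: return the username if the token is genuine and fresh."""
    payload, _, tag = token.rpartition(".")
    expected = hmac.new(SHARED_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag.encode("utf-8"), expected.encode("utf-8")):
        return None  # not from a party we trust, or tampered with
    claims = json.loads(base64.urlsafe_b64decode(payload))
    current = time.time() if now is None else now
    if current - claims["issued_at"] > max_age:
        return None  # too old; make the user log in again
    return claims["user"]


token = issue_token("alice", now=1000)
print(verify_token(token, now=1100))        # alice
print(verify_token(token, now=2000))        # None (expired)
print(verify_token("x" + token, now=1100))  # None (tampered)
```

The relying server never sees a password; it only checks that the token came from someone it trusts and hasn't gone stale.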

There are variations. For example, in OpenID, you give the server you're trying to log in to a URL representing your identity. That server chases the link and finds out who's vouching for you. It then checks with that party. Typically that party will ask you to log in to it and make sure that you trust the server that asked it to do that. The two servers then securely agree, effectively, that yes you did log in.

The young Charles Darwin carried with him on the voyage of the Beagle letters of introduction, as was common practice in those days. By presenting the appropriate letter, he was able to obtain lodging, safe passage and other such benefits. The letters were signed by people the benefactors knew and trusted and let them know that Darwin was also someone to be trusted. The signature was a crucial part of ensuring that the message really came from whom it said it did; people paid more attention to signatures then than we typically do now.

Is there a connection between this fading and quaintly antiquated practice and modern digital technology? Well, they don't call it a digital signature for nothing ...

Saturday, July 24, 2010

See, now that didn't take long

One other conclusion from my 500-post mullings: This site needs a re-design. I've narrowed it down to

Expert graphic design help courtesy of the Geocities-izer.

Wednesday, July 21, 2010

My, how time files

Somewhat over a year ago I posted post number 300 and said I probably wouldn't make much of a production until I hit a more significant milestone.

This is post 500 and, having mulled this whole "blog" thing over, I've come to a few small conclusions:

After 500 posts and nearly three years I'm pretty well convinced I can write a blog.
The last couple of months have seen a mad scramble at month-end to make my self-imposed and arbitrary ten-post-a-month quota
I still enjoy writing Field Notes, but there are also other things I'd like to write and my time is limited.
Somewhat to my surprise, I still enjoy reading this blog, despite feeling that I must have beaten most of the major themes to death by now.

So ... I'm not sure exactly what to do next, but one decision did jump out: Drop the ten-post-a-month thing. It's good to have a goad to encourage putting something down, but at this point in the game just producing posts doesn't seem like a very meaningful end in itself.

That doesn't necessarily mean I'll be posting less. I might, I might not. The next post might be tomorrow (not unlikely) or in a year (much less likely), but there will definitely be a next post, and one after that ... Will there be a post 1000? I have no idea.

One thing I probably will do is go back and re-read the whole blog from start back to real time and do a bit of gardening along the way. That will almost certainly produce a few "where are they now?" followups -- maybe a year or two is a long time on the web after all.

As always, I thank anyone reading this and in particular anyone following the blog regularly for your time and attention, and hope that you enjoy it at least as much as I.

[Hmm ... re-reading the blog ... now there's an idea. Five years later I still hadn't done it, but I'm doing it now, slowly, in reverse chronological order, like you see it on the web. Which is why my heart sank a little when I realized that I've only made it through 148 posts and still have 499 to go (fewer until I get to wherever I ended up when I tried to read the blog through from the other end).

As to the pace of posting, well, that did drop off a bit, didn't it? Posts from 23 Aug 2007 to 21 Jul 2010: 500, or about one every two days. Posts from 22 Jul 2010 to 14 Dec 2015: 147, or about one every 13 days. So basically once every two days vs. once every two weeks. Ah, well. I still enjoy it. I just don't do it as much.

I still have no idea whether there will be a post 1,000 --D.H. Dec 2015]

The contours of Twitter

Strange Maps is a fascinating blog of, well, unusual maps, generally accompanied by interesting analysis of what they might tell us about ourselves. The example that led me to the site was a map of Twitter traffic in London.

I'm a bit surprised that more than one commenter is skeptical of the data on the grounds that financial centers like the City show less traffic than areas like Soho. Wikipedia describes Soho as "predominantly a fashionable district of upmarket restaurants and media offices" (which sounds about right) while the City I recall (from a decade or so back) rolled up the sidewalks around dusk as the white-collar crowd headed out to go homeward or pubward to relax.

Hmm ... is one more likely to tweet from the offices of some bank or trading house with the boss nearby, or while relaxing at an "upscale restaurant" afterwards -- or for that matter working at a "media office" during the day? Likewise for the case of Wall Street vs. New York's SoHo and La Defense vs. Levallois in Paris, particularly as Levallois is (Wikipedia again) "one of the most densely populated municipalities in Europe".

I'm more curious what the map would look like normalized for population, that is, tweets per person in a given area as opposed to raw tweets. Are there more tweets in central London than in the surrounding suburbs because there are more people? Also interesting would be a breakdown of both raw and normalized volume by time of day. The raw volume would at least to some degree track the flow of people in and out of the city, while the normalized volume would be affected both by that and by people's daily habits.

[Happily, Strange Maps is still in business, and still keepin' it strange --D.H. Dec 2015]

Wednesday, June 30, 2010

Maybe I should promote myself

No, I'm not talking about self-promotion. I'm thinking I'll get myself a fancy title.

For what, writing a blog? Well, according to a snarky column on title inflation in the Economist, "Southwest Airlines has a chief Twitter officer. Coca-Cola and Marriott have chief blogging officers," and "Everybody you come across seems to be a chief or president of some variety."

In such an environment it should come as no surprise that there is no shortage of online job title generators (they're slightly amusing and easy to script, a surefire formula). Picking one more or less at random, I came up with "Senior Communications Analyst", so say hello to Field Notes' newly-minted Chief Senior Communications Analyst for Blogging and New Media.

My corner office awaits.

More random graph theory on the web

Picking up an earlier thread, I went looking for papers on non-uniform random graphs, that is to say, connected networks where some members have many more connections than others, but that's about all we know. I turned up an interesting one by Fan Chung and Linyuan Lu describing the characteristics of graphs more like ones you'll find in the context of the web.

In particular, social networks and others commonly encountered appear to follow a power law, meaning that, for some particular number b, the number of people with n connections is roughly proportional to 1/n^b. In social networks, b tends to be between 2 and 3, so with b = 2, for example, there might be around 1000 members with just one connection, 250 with two, about 110 with three, and so on down to about 16 with exactly eight.
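
For a feel for the numbers, here's a small Python sketch. It assumes the usual statement of a power law, namely that the count of members with n connections is proportional to 1/n^b. The function name and the starting figure of 1000 members are mine, purely for illustration.

```python
def power_law_counts(base_count, b, max_degree):
    """Expected members at each degree n, assuming count(n) = base_count / n**b."""
    return {n: round(base_count / n ** b) for n in range(1, max_degree + 1)}

# Illustrative values: 1000 members with one connection, exponent b = 2.
counts = power_law_counts(base_count=1000, b=2, max_degree=8)
for n, count in counts.items():
    print(f"{n} connection(s): about {count} members")
```

Raising b toward 3 makes the counts fall off faster, leaving even fewer well-connected members.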

Chung and Lu show that under such conditions, there will very likely be a core of closely connected people, that two people chosen at random will very likely be close to each other, but there will also be some significantly less-connected members. Connecting two people outside the core will generally take considerably more steps than connecting most pairs of people.

So: six degrees of separation (or whatever it really is) in most cases, but considerably more in a few cases.

The vague terms like "very likely," "some" and "significantly" have precise mathematical definitions. The details are in the paper if you're interested.

On a recent plane flight, I looked at the route map to see if the airline's network behaved like a social networking graph. Airline networks are specifically designed to connect destination cities well but as cheaply as possible, keeping in mind that cost depends on a number of complex factors. In particular, you want to minimize the number of stops needed to get from point A to point B.

Did the airline's network follow a power law? It did, but only up to a point. Most cities had only a single connection. Fewer had two and fewer still had three. And then there were the two hubs, each with dozens of connections. There was nothing in between. You could get from almost anywhere in the network to almost anywhere else by flying to a hub and then on to the final destination -- not a big surprise.

The interesting thing is that airline networks specifically don't behave like social networks, precisely because social networks tend to leave some members out in the cold and in air travel, that just won't do. It may be worthwhile in many cases to emulate some naturally-arising structure, but it's not always the best choice. Sometimes you actually have to plan.
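
Here's a rough Python sketch of why a planned hub-and-spoke layout leaves nobody out in the cold. The network is a toy I made up (two linked hubs with four spoke cities each), not the airline's actual route map, and the worst case is found with a plain breadth-first search.

```python
from collections import deque

def hops(graph, start):
    """Breadth-first search: number of flights from start to every reachable city."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        city = queue.popleft()
        for neighbor in graph[city]:
            if neighbor not in dist:
                dist[neighbor] = dist[city] + 1
                queue.append(neighbor)
    return dist

# Toy hub-and-spoke network: two hubs linked to each other, four spokes each.
graph = {"HUB1": {"HUB2"}, "HUB2": {"HUB1"}}
for hub, prefix in (("HUB1", "A"), ("HUB2", "B")):
    for i in range(4):
        city = f"{prefix}{i}"
        graph[city] = {hub}
        graph[hub].add(city)

# Longest shortest route between any two cities in the network.
worst = max(max(hops(graph, city).values()) for city in graph)
print(f"worst case: {worst} flights")
```

In this toy network every city is reachable from every other, and the worst case stays small no matter how many spokes you hang off each hub.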

Wiki without the pedia

While tagging my previous post, I noticed that I had tags for both "Wikipedia" and "wiki". There are four articles (now five, of course) tagged "wiki," three of which are more or less to do with Wikipedia. The other is from the Baker's Dozen series, speculating about what role the wiki approach may play in the next generation of search engines.

What really stands out to me about wikis is that there's Wikipedia and then there's everything else.

Everybody's heard of Wikipedia by now and quite a few people have tried their hand at editing it. As a result, there is a well-known tool for editing Wikipedia (MediaWiki) along with a well-established culture and etiquette. There is also enough of a critical mass that, for the most part, articles tend to improve over time.

And then there's everything else. Don't get me wrong. There are some good wikis out there. But there are also an awful lot of half-baked ones. These tend to crop up when a small software shop or similar organization decides that it needs a wiki to, say, document its software architecture and development process. Well, why not? Wikipedia is pretty successful, and software shops are always looking for lightweight, dare I say "agile" ways of tracking what's going on.

In practice, there are several pitfalls:

Wikipedia has a lot of eyes. According to Wikipedia, Wal-Mart has about 2 million employees, while Wikipedia has close to 13 million registered users. Granted, Wikipedia claims only about 90,000 "active contributors", but that's still about the same headcount as Microsoft. Chances are, your company isn't that big.*
It used to be every computer science undergrad wanted to invent and implement a programming language. Somewhere around the turn of the century that ambition seems to have shifted to writing a wiki engine (which typically has at least a toy programming language in it somewhere). So many to choose from and, even though approximately one of the choices has a huge userbase and all that goes with it, the odds are that whoever set up your wiki chose something "better" than MediaWiki.
Wikis were designed for quickly throwing together webs of loosely structured text, and not for any of several other things they sometimes get used for. A wiki page generally doesn't know what role it has in a bigger picture. A wiki is not a bug tracker. It is not a release planning system. It doesn't know that feature X was promised to FooCorp for release 2.1 whose schedule has just slipped. No one told it any of that. Ah, but that's where the toy programming language comes in ...
Many shops are content to limit wikis to the smaller role of gathering together bits of wisdom that people tend to email each other as the occasion demands. "Why did you design it this way?" "Well ..." The problem is that this conversation tends to happen when, for any of myriad reasons, the design wasn't documented close to the code, so someone is now asking the author. Ideally, the original designer goes and documents the code and replies with a link to the new doc. Alternatively, if the conversation is taking place on an archived list, the answer will be in the archives for future generations. In either case, it's not clear that updating a wiki and replying with a link to that would be an improvement.
Wikis need gardening to combat various forms of rot. Typically there's even less time for this, particularly in a small shop, than there is for updating the wiki in the first place.

Wiki writing is not magically easier than any other kind of writing. Maintaining a wiki takes time and dedication. Wikipedia has a lot of dedicated contributors, including many who specialize in gardening and other less glamorous jobs. If your organization is not specifically in the business of producing wiki pages, chances are the wiki will reflect that.

* On the other hand, chances are your wiki is not going to be as big as Wikipedia. Nonetheless, (I claim) there are economies of scale that happen when the user base gets larger. In a large community people can specialize, for example in maintenance tasks.

[Wikipedia continues to dominate the world of Wiki, even neglecting its sister projects. The one notable exception I can think of is TV Tropes. I doubt it has anywhere near the readership of Wikipedia, but it's still the rare example of a publicly-edited non-Wikipedia wiki with a significant readership -- D.H. Dec 2015]

Wikipedia moved my food dish (slightly)

Wikipedia has recently undergone a facelift. Just as a casual user I've noticed approximately two things:

The buttons and stuff are shinier.
The search field is now up top instead of over to the side.

I was somewhat annoyed by that second item for a bit, but I'm already used to it now, and I can see the UX value in putting such a vital, high-volume element in a more prominent place.

What else did they do? The new features link mentions a couple of new editing widgets, which I may explore next time I edit a page, a new version of the logo (part of the general new shininess) and "improved search suggestions". They've also made it clearer whether you're reading or editing a page, but I've never had a lot of trouble with that distinction.

Of these, the improved search suggestions are the real winner. Search suggestions rock, and I'd say that even if I didn't work for Google.

The internet ate my brain. I think.

The Economist has a quick review of Nicholas Carr's The Shallows: What the internet is doing to our brains. The gist is that the constant context-switching involved in web surfing is "already damaging the long-term memory consolidation that is the basis for true intelligence."

That "already" -- the reviewer's term and not necessarily Carr's -- is a telling bit of boilerplate, adding a bit of urgency in suggesting that this is the beginning of a long-term trend that will surely rot our brains completely before we know it.

Knee-jerk skepticism:

Just how much do we actually know about "the long-term memory consolidation that is the basis for true intelligence", or "true intelligence" for that matter?
Suppose we can show that, when web surfing, our brains behave in some sort of inattentive, scattered mode. Does that mean that we've lost the ability to think in any other mode, or just that that's how we think when we're surfing?
If the web is rotting our brains by changing our patterns of thought, is there a corresponding change in, say, the rate of technical innovation (by some reasonably objective measure)?
More subjectively, has there been a change in the culture? My understanding is that contemporary culture is vapid, cheap and degraded and that things were much better in our parents' day. If so, that represents exactly zero change from fifty, or a hundred, or a thousand years ago.
Returning to that "already" above, assuming that there is some sort of measurable effect, is it the beginning of a trend, the end of some sort of adjustment period, a temporary blip or what?

Of course, this is just an off-the-cuff reaction to someone's review of a book that I've not read a word of -- which possibly serves to support Carr's original point.

Many years ago I was sitting in the living room of a house I lived in when a roommate from Europe wandered in and, seeing I was flipping through channels, asked if anything was on. This was back when European TV typically had way fewer channels than a US cable setup. Without thinking, I started going through the channels in order, at a steady beat of one every couple of seconds, narrating as I went: "That's baseball ... that's just bad videos and commercials ... I've seen that episode already ... that guy's just obnoxious ... that's just Gilligan's Island ..." The roommate's jaw steadily dropped. "How can you know all that just from a second or two?"

A fair question, but the sad truth is that once you've been through a selection of several dozen channels a few times too many, it becomes all too familiar. Sometimes just the channel number is enough, sometimes it's easy to recognize a face or a setting.

To me, the disquieting bit was not that my brain could pick up cues from previous experience that quickly. That particular circuitry has obvious survival value and has doubtless been in our wiring in some form or another for a good long time. The disquieting bit was that I had the information in my head to retrieve in the first place. I'd obviously spent enough time planted in front of the TV to recognize Bob Denver on sight. With all due respect to the late Mr. Denver, that's not necessarily a happy realization.

Was TV changing the way I processed information, or had my TV watching skewed the information I had on hand to process? Maybe both?

And thus has a quick throwaway post morphed into a not-quite-so-quick rumination on the nature of memory. Am I supposed to have the attention span to still be writing and revising this, or was I supposed to have quit after the first 140 characters?

Friday, June 25, 2010

Webbys: who are these people?

Looking at the 2010 list of Webby award winners, I see three basic categories:

Old-media names that I've heard of (The New Yorker, The Economist, Roger Ebert, Amy Poehler, Zach Galifianakis, HBO, Sesame Street ...)
New-media names I've never heard of (wonga.com, Record Tripping, Love Letters to the Future, BITTER LAWYER, Mubi, Nawls ...) [and none of these come to mind re-reading in 2015]
New-media names I've heard of but don't really use (Twitter, Metacritic, Pandora at the moment)
Vint Cerf (file under "demigods")

So what does this mean? Probably some combination of

Old media are alive and well on the web
One purpose of awards like the Webbys is to bring worthy unknowns to the world's attention
I'm a stick-in-the-mud who doesn't even tweet (so why am I writing about the web, again?)

Football on the web

(and by "football" I mean FIFA, not NFL)

Other sources drive plenty of web traffic, but if you want lots of people on a site all at once, a sporting event is the way to go. The latest case in point, of course, is the World Cup, which, according to ESPN, was producing over 12 million hits per minute, or about half again the traffic for the 2008 US presidential election (the previous record holder).

Likewise, Twitter has been seeing upwards of 3000 tweets per second, comparable to the Lakers-Celtics NBA final. Normal traffic is more like 700 tweets per second.

Lest talk of presidential elections and NBA finals give too much of a US-centric impression, ESPN cites a measurement of "total mentions in social media" for the month leading up to the Cup. Far and away the top entry, ahead of hosts South Africa and well ahead of the US, are England.

But then, England have a fair bit to talk about.

Wednesday, June 23, 2010

Vint Cerf's webby

A while ago I extolled TCP as a "gold standard" and now, it appears, it's finally getting the credit it deserves. Even Tim Berners-Lee has gotten into the act, making many of the same points, albeit with a bit more polish, in introducing internet pioneer, Webby recipient and Google fellow Vint Cerf. Cerf's response, a Webby-standard five-word acceptance speech, would be a cliche coming from most of us. Considering the source, it's startlingly audacious: "You ain't seen nothing yet."

So, whatever could make an engineering masterpiece like TCP and decades of other career highlights look like "nothing", you can be assured Cerf is hard at work on it.

Thursday, June 10, 2010

Putting the "world-wide" in "world-wide web"

Here's a lovely piece of concept art: In 2006, video blogger Ze Frank challenged his viewers to construct an "earth sandwich". How do you make an earth sandwich? Just put two slices of bread at exactly opposite points on the earth (bow bow bow).

Within about a month, two contestants had placed half-baguettes in New Zealand and the Spanish countryside, accomplishing the feat. The two had (I assume) met online through Ze Frank's show, had doubtless exchanged email and, of course, produced their own online videos of the whole adventure.

In other words, it could only happen on the web, right? Well ... yes and no.

Certainly it could only have happened the way it happened on the web. That's almost tautological. But is there any part of the concept that couldn't have been done without the web? Certainly people in New Zealand and Spain have been able to communicate and share a common idea for much longer than the web has been around. Neither is the earth sandwich the first piece of concept art on a global scale. David Barr's Four Corners project, completed between 1976 and 1985 without benefit of the web or GPS, comes to mind.

On the other hand, I expect the web makes it much, much more likely that such things will happen, by providing a cheap and easy way to broadcast an idea to a global audience. It also provides a cheaper and faster way for participants to communicate with each other by providing both the time-shifting of mail (I don't have to read while you're writing) and the speed of the telephone (we don't have to wait for a message to be physically transported around the world). Time-shifting is particularly useful when the participants are twelve time zones apart.

My inner engineer questions the accuracy of both the earth sandwich and Four Corners projects, since the earth isn't perfectly round. It's definitely a problem for the Four Corners. Whether it's a problem for an earth sandwich would depend on the fine points of the GPS coordinate system, though at least the largest source of non-roundness, the equatorial bulge, shouldn't be a problem for points the same distance from the equator. On the earth sandwich page, Doc Searls doesn't even pretend to accuracy -- Cambridge is opposite the Indian Ocean, nowhere near Singapore.
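
For the curious, finding the opposite point from a latitude/longitude pair is simple arithmetic: flip the sign of the latitude and shift the longitude by 180 degrees. Here's a minimal Python sketch. The function name is mine, the coordinates are only a rough stand-in for central Spain, and it takes the coordinates at face value without modeling the fine points of the coordinate system.

```python
def antipode(lat, lon):
    """Return the point opposite (lat, lon), both in degrees.

    Latitude changes sign; longitude shifts by 180 degrees, kept within
    the range (-180, 180].
    """
    opposite_lon = lon + 180.0 if lon <= 0 else lon - 180.0
    return -lat, opposite_lon

# Rough coordinates for central Spain (assumed for illustration).
print(antipode(40.4, -3.7))
```

The result for that input has a southern latitude and a far-eastern longitude, which is New Zealand's neighborhood, consistent with where the second half-baguette ended up.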

Wednesday, June 9, 2010

i.e.

Consider two prefixes: i and e, both in lowercase, e. e. cummings-style. Once they were emblematic of all things new and shiny and dot-com-y. Where are they now?

e- still has its webby connotations, quite possibly because e-mail is still prevalent. We still have eBay, eHarmony, esurance, Epinions, eFileCabinet and others, though perhaps not as many as one might expect.

i-, on the other hand, was blatantly hijacked by Apple. It used to mean "internet-" or something, but through some masterstroke of Steve Jobs's patented legerdemain, it now means "cool, shiny and Apple-y". In fact, according to Wikipedia, the name "iPod" was already trademarked, for internet kiosks, when freelance copywriter Vinnie Chieco decided the prototype reminded him of 2001: A Space Odyssey, particularly the phrase "Open the pod bay doors, HAL" and proposed the name. How the initial i got attached is not clear, at least not to me.

While Jobs didn't come up with the name himself, he must have made the final call on going with it. The sleight-of-hand was being able to market something with no direct internet connectivity with such a name (the much webbier iTunes Store didn't come along for another couple of years).

Two other affixes from the era still seem to have life in them. The notion of calling the customized view of FooCorp "myFooCorp" lives on here and there, not to mention mySpace.

And, of course, .com has more or less become punctuation.

Finally, there's camelCase. When I was starting out, there were still widely-used programming languages with ridiculously short limits on names. Classic FORTRAN officially limited names to six characters (though some compilers allowed a few more) and BASIC dialects varied but could be even worse. Single-case, conventionally ALL CAPS, was still prevalent as well.

[You got around these restrictions by dropping any letter you could -- "parameters" became PARMS, "first name index" might be FSTNMIDX. Well-organized FORTRAN code typically built variable names up from abbreviated parts and had block comments in key places explaining what all the abbreviations meant.

Early versions of FORTRAN also had the convention that the first letter of the name indicated whether a variable was integer or floating point, so you'd get names like IRANK, since plain RANK would be floating point. While that led to a lot of names starting with I, I doubt that's where the dot-com-era i- prefix comes from. --D.H. October 2015]
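
The first-letter rule is easy to state in code: undeclared names starting with I through N default to integer, and everything else to floating point (REAL). A minimal Python sketch, with a function name of my own choosing and a few sample names:

```python
def implicit_type(name):
    """Default type of an undeclared FORTRAN variable, by its first letter."""
    return "INTEGER" if name[0].upper() in "IJKLMN" else "REAL"

for name in ("RANK", "IRANK", "NCOUNT", "X"):
    print(name, implicit_type(name))
```

Hence IRANK for an integer rank, and the long tradition of I, J and K as loop counters.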

Two popular languages were less restrictive: C and Pascal. C coding style called for all-lowercase names except for constants, with underscores serving as spaces: my_variable_name. Pascal, on the other hand, didn't allow underscores in names (or maybe they were just considered uncool?). Instead, Pascal code used capitals to break up long names: MyVariableName.

I really don't know how mixed case came to be the dominant style, but it has. I still remember a TA (who would later spend some years working for Apple) complaining that my C-style names_with_underscores hurt his eyes and why didn't I do things TheRightWay. Fast forward a few years and if you want to look web.hip you have to go camelCase. Spaces are so old economy.

The astute reader may notice the subtle distinction between camelCase (starting with lowercase) and PascalCase (starting with uppercase). Both are used in actual code. For example, Java conventions call for names of classes to start with a capital and most other names to start with lowercase. I suspect that dot-commers chose lowercase (for the most part) because it just looked less conventional.
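
The three styles are mechanical enough to convert between. Here's a small Python sketch going from C-style underscores to the other two. The helper names are mine, and it only handles plain lowercase names with underscores.

```python
def split_words(snake_name):
    """Split a C-style name like my_variable_name into its words."""
    return [word for word in snake_name.split("_") if word]

def pascal_case(snake_name):
    """MyVariableName: every word capitalized, no separators."""
    return "".join(word.capitalize() for word in split_words(snake_name))

def camel_case(snake_name):
    """myVariableName: like PascalCase, but with a lowercase first letter."""
    pascal = pascal_case(snake_name)
    return pascal[:1].lower() + pascal[1:]

print(pascal_case("my_variable_name"))
print(camel_case("my_variable_name"))
```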

Whatever the reasons, it seems to have caught on, more so, in fact, than any of the particular prefixes.

How much dot-com-y goodness will fit in one name? What's the equivalent of a tall double half-caf soy vanilla latte? My guess is it would be somewhere around "myENet.com", but I may have missed a step.

[A quick search reveals that "tall double half-caf soy vanilla latte" is small beans. The real bidding starts at "Venti, sugar-free, non-fat, vanilla soy, double shot, decaffinated, no foam, extra hot, Peppermint White Chocolate Peppermint Mocha with light whip, upside-down, 1 pump of peppermint, 1 and 3/8 pumps vanilla,180 degrees, heavy whip-cream, 3 ice cubes, 1/4 teaspoon Nutmeg sprinkled on top, with green sprinkles, lightly cinnamon dusted on, stirred, with no lid, double cupped, and a straw"]