Field notes on the Web: 2011

Saturday, December 31, 2011

Rumours and tweets of rumours

Someone at the Guardian (aided by academics at several universities) put in a bunch of overtime analyzing the Twitter traffic from last summer's riots in England. In all, they traced seven rumors, five that turned out to be false, one that turned out to be true, and one they classify as "unsubstantiated." They then put together a nice interactive graphic of the results, including a graph of the volume of traffic over time and a sort of cloud diagram color coded to show support for, opposition to, questioning of and commentary on the rumor in question, with size indicating the "influence" of the tweet, based on number of followers the originator of the tweet had.

The results are fascinating. You should probably have a look at them yourself (here's the link again) before going on.

There is a fairly widespread notion that the web corrects itself. People may put up misinformation, whether deliberately or in good faith, but eventually the real story will come out and supplant it. The lead-in to the Guardian interactive graphic says so in as many words: "... Twitter is adept at correcting misinformation ..."

I don't see a lot of support for this in the data presented.

In the self-correcting model, you would expect to see an initial wave of green for a false rumor, coming with the original misinformation, steadily replaced by red, with possibly some yellow (questioning) and gray (commentary) in between. Following is what actually happened for the five rumors determined to be definitely false.

Rioters attack London Zoo and release animals: Initially, green traffic grows. After a while, red traffic comes in denying the rumor. Hours later, there is influential red traffic, but the green traffic is still about as influential. Traffic then dwindles, with the last bits being green, still supporting the rumor hours after it has been disputed.
Rioters cook their own food in McDonalds: This one was picked up early by the website of the Daily Mail, which stated that there had been reports of this happening. In any case, the green traffic surges moderately twice, before peaking at high volume several hours later. There is no red traffic to speak of.
London Eye set on fire: This one actually does follow the predicted pattern. The initial green is quickly joined by yellow and red. The proportion of red steadily grows, and as traffic dies down it is almost entirely red.
Rioters attack a children's hospital in Birmingham: In this case one source of denials was someone actually working at the hospital. Again, a strong surge of green is gradually taken over by red, but not completely. As traffic dies down, the rumor is still being circulated as true. Late in the game, it resurges again, though again there is a countersurge of denial.
Army deployed in Bank [I believe this refers to the area in London near the Bank of England and the Bank tube station]: Traffic starts out yellow, as a question over a photo (which was actually a photo of tanks in Egypt). Red traffic begins to grow, but so does green, and yellow continues to dominate. Eventually everything dies down. The last bits of traffic are yellow.

In summary: One of the five cases follows the "good information drives out bad" model. One other more or less follows it. Two are an inconclusive mix of support and denial. One consists almost entirely of support for a false rumor.

This was in one of the world's most connected cities, with widespread access to the internet, cell phones, land lines, television, newspapers, live webcams and whatever else. Only in the case where the rumor was trivial to refute (for example via this webcam) did Twitter appear to self-correct.

One would be hard-pressed, I think, to distinguish between the actual true rumor (Miss Selfridge set on fire -- that's the name of a store, not a person) and the false rumor about McDonalds based solely on the volume and influence of tweets confirming and denying. Likewise, the unsubstantiated rumor (Police 'beat 16-year-old girl') follows its own pattern, mostly surges of green, but interspersed with yellow.

This may seem like a lot of argumentation just to say "Take your tweets with a grain of salt", but pretty much everything tastes better with data.

Monday, November 28, 2011

Voices from the dashboard

All my life I've taken road trips, partly by natural inclination, partly by necessity. It's a largely timeless experience. Sure, the roads have improved (see the Grapevine Grade section of this page for a good example), the speed limits are higher, cars are faster and safer and there's not a lot of "local flavor" in most stopping points unless you actively seek it out, but for the most part road trips have been road trips since well before Kerouac.

One thing that has changed is the soundtrack, and not just because tastes in music have changed. When I was a kid, any audio not provided by the car and its occupants came from the radio, and if you were on a long haul, it was the AM radio. Keeping FM tuned in was and remains too much of a hassle. An AM station, especially one of the "clear channel" stations (not to be confused with the media conglomerate) licensed to broadcast at high power, could be good for hours -- enough for a whole sports fixture, several runs through the news or all the whacked-out talk radio conspiracy theories you could eat.

The key feature here, particularly on a solo trip through, say, the desert southwest US, was the lack of choice. You'd be doing well to have your pick of baseball, UFO speculation and the company of your own thoughts, and a hundred miles or so out of Albuquerque on a dark night, with the game a blowout, the UFO speculation starts sounding interesting and plausible.

By the time I was doing my own solo long hauls, cassette tape was an option, but a library of a few dozen albums can be limiting after a while -- and suppose you want to know what's going on in the world, or just let someone else handle the programming for a while? The in-dash CD (briefly supplemented by a multi-disc changer in the trunk) increased one's options, but the same basic constraints applied. Only with the advent of satellite radio was there little reason to tune in to local stations at all.

And now there's the web. As long as you've got a smartphone, bars, a bit of cable and an aux input, you can listen to pretty much anything. Stream your favorite home station. Stream your favorite internet station. Play your podcasts. Dial up Pandora. AM won't be completely disappearing anytime soon -- technologies written off as obsolete seldom do -- but the proportion of people who know or care must be steadily dwindling. Likewise I'd rather not try to predict whether or when web audio will supplant satellite radio, but if I had to place long-term bets, I'd bet on the web.

It's hard to argue that having a huge palette of choices isn't progress of some sort, but there's something to be said for being drawn out of one's comfort zone because there's only one game in town.

Wednesday, November 23, 2011

In which a theorist discovers something unsettling, exhilarating or both

There seems to be a natural human compulsion to keep checking the soup to see if it's boiling, to check the weather, to check the latest sports scores and stock prices, to check for messages, and so on and so forth. One of the less savory properties of the web is that it provides the means to indulge this compulsion to the nth degree.

I personally try to steer clear of this, which is the main reason I'm not on Facebook or Twitter (and not particularly active on Google+), but I'm certainly not immune. Are there any comments on Field Notes? Has anyone read the latest brilliant post (there are at least three ways to check, each giving its own opinion)? Anything new on the few sites I do follow?

Since I'm not on Facebook, I don't play Facebook games, but evidently a lot of people do. Zynga's Farmville, for example, has over 80 million subscribers, still a small minority of the gazillion on Facebook, but a big number in most normal contexts. This has irked traditional computer game creators, sucked up untold hours of human life, and intrigued computer gaming analyst/critic Ian Bogost.

Bogost noted that games like Farmville involve relatively little actual gameplay. Rather, it's the social aspect that seems to dominate. This is nothing new in gaming, but again the natural "I need to check what's going on" factor of the web in general and Facebook in particular acts to intensify this. Bogost coined the term "Cow Clicker" to describe games like Farmville where the action seems to consist mainly in, for example, clicking on depictions of animals when various timers run out.

Unable to leave it at that, Bogost took the next logical step and created a Facebook game called Cow Clicker designed to distill the social gaming experience to its purest elements. It goes like this:

You have a picture of a cow on your page.
You click on it.
It does nearly nothing -- I think maybe it moos or otherwise makes a sound?
You can't click again for six hours.

Yep. That's my story and I'm sticking to it.

If you don't want to wait six hours, you could spend "mooney" -- Cow Clicker's own virtual currency -- to get the right to click sooner. You could earn mooney by clicking on your cow, by having your friends click on feed stories about you clicking on your cow, or by paying a small amount of actual money.

People played this. Not 80 million, but somewhere around 50,000, not too bad for a joke of a game with no marketing behind it.

Clearly the actual cow clicking is a MacGuffin. No one cares much about it. What people care about is whether their friends are also playing and clicking on their feed stories, thereby generating not just more mooney, but, crucially, another thing to check in on.

Bogost had mixed feelings about this. Among other things, he found himself, despite his intentions, checking in on whether people were playing the game and what they wanted from it.

Naturally, people wanted upgrades. They wanted their choice in cows. Cowthulhu was a popular request. Eventually Bogost put up an "app store" with a selection of cows, and (I gather) added another feature or two. If you were really hardcore, you could pay $100 (or the equivalent in mooney from whatever source) for Bling Cow. Why on earth would anyone do this? Well, your friends would all know that you had splashed out for the Bling, and wouldn't they be envious? Again, people actually did this.

Eventually, Bogost was unable to shake the feeling he'd created a monster, and so he brought about the Cowpocalypse. At a preset time -- which players would hasten by actually playing the game but could defer by, yep, paying mooney -- the cattle would all be "raptured", leaving only the empty spaces on which they had once stood. And so the Cowpocalypse eventually came to pass.

At this point, it may not come as a shock that people kept playing. To recap: people were now paying (small amounts of) money for the privilege of clicking on an empty space and letting their friends know about it.

You couldn't ask for a better illustration that when economists talk about "rational consumers", they only mean people that behave as though there's some sort of "utility function", be it ever so screwy, that they're bent on maximizing. "Rational" in the usual sense has got nothing to do with it.

If people were actually rational in the usual sense, Cow Clicker would never have happened, but of course they aren't. We are, at a very basic level, social animals. We want to know what other people are doing. What in particular they're actually doing is often much less important to us than whom they're doing it with and the fact that we know this. If the entirety of Facebook were pushing a button from time to time saying "I'm here", selecting people to notify of that and having the system let the people you're notifying know whom else you're notifying, it would not be outlandish to think people would still use it.

The cynic would say that that really is the essence of Facebook and "social networking" in general, but I wouldn't go quite that far. I said above that what people are doing is often much less important than knowing it and knowing who knows, but that doesn't mean it's always less important. Content can matter -- of course -- but it's worth noting that it doesn't always.

Monday, November 7, 2011

Yay! Yet another way to spam!

While buying something online today, I was presented with a popup asking me if I wanted to chat live with a representative about what looked like a loyalty program. I went ahead and clicked, even though my spidey-sense told me not to.

PhineasTaylor is typing ...

Hello there! Thank you for taking a moment to chat with me about the wonderful opportunity of joining buyeverythingthroughus.com. With buyeverythingthroughus.com, etc., etc.

OK, a little boilerplate to get things going. Hang on though, there's more.

PhineasTaylor is typing ...

Buyeverythingthroughus.com will improve your life in every possible way. It will make you rich and famous. It will cure dandruff and halitosis. Children will love you. Adults will want to be you. Your friends will adore you. Your enemies will envy you and then slink away in shame and fear, etc., etc.

Right ... anything else?

PhineasTaylor is typing ...

This P.T. person sure types a lot.

Buyeverythingthroughus.com will cure hunger. It will bring about world peace and universal prosperity. Yankees and Red Sox fans will embrace each other with love in their eyes [well, maybe it didn't go quite that far].

Since this is ostensibly a person typing, it's coming across slowly enough that there's plenty of time to go googling and find out that buyeverythingthroughus.com is about what you'd expect it is.

Knowing all that, what do you say to this exciting opportunity?

I said "No, thank you" and dismissed the chat window. I couldn't help wondering, though, whether whoever coded this up had the chutzpah to submit a paper on an exciting new "intelligent agent".

Banking on web security

People do care about web security. There are highly competent full-time professionals in the field. There are conferences on the subject on a regular basis. You'll see them in the press -- Experts Meet to Fix Security on the Web.

And yet, in large part because the problems to be solved are hard and involve significant non-technical factors, there is no shortage of things that could stand to be fixed.

Authentication is a mess. For the most part, we have passwords and security questions. I've griped about this before, multiple times, and I'm sure I'll gripe about it again.
Identity is a mess. Everyone has scads and scads of identities -- logins here, there and everywhere. They can easily get confused ("That wasn't me, that was some other David Hull!"). There's no good way to say two random identities are or aren't the same. I've griped and speculated about this before, too, and I expect I'll have more to say on that, too.
Anonymity is problematic. Everything you do on the web leaves traces, but unless you're paying extremely close attention you generally don't know exactly what kind, or whether they can be tied to your identity (whatever that is).
Network infrastructure is scary. Https with certificates is widely deployed, and most people probably at least know that some sites are "secured" and some aren't, but many fewer understand (or should need to understand) details like signatures, secure hashes and certificate authorities, or what can fail and what's less likely to. Did I mention DNS?
PCs are scary. Viruses, rootkits, system crashes ... some platforms are better designed than others, but nothing's perfect.
The cloud has its own problems. Who owns what you put there? Who's liable if data is lost or compromised? Who can see what? Who can see who sees what?
Spam is a perennial problem, not helped by any of the above.

I could go on, but if it's so bad -- and it is -- how does it work at all? People continue to be able to use credit cards both online and in person, people continue to email and text each other all sorts of sensitive information, people continue to turn to the web for all sorts of vital information. Clearly Bad Things can happen to a person on the web, but just as clearly it's not bad enough often enough to put people off the web entirely. Far from it.

My guess is that banks have a lot to do with it, at least in the US. In particular:

Banks handle liability. If someone steals your credit or debit card, whether physically or online, you can tell your bank and generally they will make sure you don't have to pay for things you didn't buy. That's oversimplified, and there are certainly cases where that simple process has turned into a nightmare, but it's still a vital part of getting people to do business confidently online.
Bank cards provide a de facto stable identity. If you're buying something from my web site, I do care who you are (well, I would, and stores in general do seem to care what their customers are up to), but I certainly also care that your payment is going to go through. To some extent I'm talking to you, but I'm also talking to your bank account.

On the first point, you're not responsible for keeping your bank accounts absolutely safe. You're responsible for taking reasonable precautions, so that if someone does get hold of your account number and misuses it, they're clearly at fault (the usual "I'm not a lawyer" disclaimer applies here). Putting the rest of the burden on the banks and legal system is a large part of what keeps the wheels turning.

On the second point, if I shop at store A and store B, it's important that my bank knows that those purchases both come out of my account, and I know that I'm the same person in both cases (at least on a good day). It's less important that store A and store B know I'm the same person. There may even be cases where I'd rather they didn't know.

In short, security and identity matter when money is at stake, in which case your accounts serve as your identity and you have legal protections that predate the web.

Security and identity also matter where reputation is at stake, that is in the social realm, be it email, social networks, Twitter or whatever. The landscape is different there, but it's worth noting that most accounts and identities, including your bank accounts, don't play into that much. If someone compromises my account at widgetco.com, they might be able to have a truckload of widgets sent to my address at my expense, but they won't be able to say embarrassing things about me on this blog. Likewise if they compromise my bank account, though that would of course be bad for other reasons.

If you buy that, then you should make sure to use strong unique passwords and unique security questions for your bank accounts, your email accounts and your major social accounts, and use better security than that when it's available. How much to worry about other accounts depends on how closely they're tied to the accounts that matter. For example, if your city's online parking ticket paying site doesn't remember credit card numbers or your nefarious history of overparking, you probably don't care as much about security there.

Friday, October 14, 2011

Dennis Ritchie, 1941 - 2011

I have no intention of turning this blog into an obituaries column, and no desire to see "celebrity deaths come in threes" spill over into the tech world, but having noted the passing of Steve Jobs I feel obliged to note the passing of Dennis Ritchie as well.

You may or may not have heard of him before. It took the major news outlets a while to pick up the story, and even then it wasn't front page. For hours the main public source was colleague Rob Pike's Google+ page. That's not too surprising. CEO of major corporations and eminent computer scientist are two completely different gigs. Nonetheless, Ritchie had as profound an effect on the Web As We Know It as anyone else, even though his groundbreaking work predates the web by a good measure.

It's fair to say that the web as we know it would not exist if not for Unix. The first web server ran on NeXTSTEP, which traces its roots to Unix [and, in fact, NeXT was run by the late Steve Jobs -- tech is a small world at times -- D.H. Nov 2018]. A huge number of present-day web servers, large and small, run on Linux/GNU which, even though the Linux kernel was developed from scratch and GNU stands for "GNU's Not Unix", provides an environment that's firmly in the Unix lineage. The HTTP protocol the web runs on has its roots in the older internet protocols and belongs to a school of development in which Unix played a major role.

Ritchie was one of the original developers of Unix.

The Unix operating system, the Linux kernel, many of the GNU tools and countless other useful things (and at least one lame hack) are written in the C language, which is also one of the bases for C++, C#, Objective C and Java, among others. All in all, C and its descendants account for a large chunk of the software that makes the web run, and for years, before the ANSI C standard, the de facto standard for the language was a book universally called "K&R" after its authors, Brian Kernighan and Dennis Ritchie. That flavor of the language is still called "K&R C".

Ritchie continued to do significant work throughout his life and won various high honors, including the Association for Computing Machinery's top honor, the Turing award, and the US National Medal of Technology. He was head of the Lucent Technologies System Software Research Department when he retired in 2007. He may not have been a cultural icon, but in the world of software geekery he cast a long shadow.

RIP

Thursday, October 6, 2011

So ... what version are we on?

Trying to do a bit of tidying up, I tagged a previously-untagged recent post "Web 2.0". I did this because the post was a followup to an older post that was specifically about Web 2.0, but it felt funny. Web 2.0 is starting to sound like "Information Superhighway" and "Cyberspace". A quick check of the Google search timeline for the term suggests that usage peaked around 2007 and has been declining steadily since. Always on the cutting edge, Field Notes uses the tag most heavily in 2008.

Google's timeline isn't foolproof. Anything given a date before the late 90s is probably an article that mentioned the date (and Web 2.0) and gave no stronger indication of when the page is from. On the other hand, the more recent portion is probably more representative, since there's more metadata around these days. Also, the numbers are larger, which is often good for washing out errors.

But anyway, are we still in Web 2.0? Are we up to 3.0? Does it really matter (spoiler: probably not)?

I've argued before that while Web 1.0 was a game-changing event, Web 2.0 is more a collection of incremental improvements. Enough incremental improvements can produce significant changes as well, but not in such a way that you can draw a clear bright line between "then" and "now". The Linux kernel famously spent about 15 years on version 2.x, only just recently moving up to 3.0, and Linus says very clearly that 3.0 is essentially just another release with a shiny new number. From a technical standpoint I'd say we've been on Web 2.x for a while and will continue to be for a while, unless we decide to start calling it 3.x instead.

Because, of course, "Web 2.0" is not a technical term. Never mind who uses it to what ends in what context. The ".0" gives the game away to begin with. A real version 2.0, if it ever exists, is very soon supplanted by 2.0.1, or 2.1, or 2.0b or whatever as the inevitable patches get pushed out, which is why I was careful to say "2.x" above. "2.0" as popularly used doesn't designate a particular version. It's supposed to indicate a dramatic change from crufty old 1.0 (or 1.x if you prefer). In the real world of incremental changes, that trope will only get you so far.

Hmm ... in real life versioning usually goes more like

0.1, 0.2 ... 0.13 ... 0.42 ... 0.613 as we sneak in "just one more" minor tweak before officially turning the thing loose
1.0 First official release. Everyone collapses in a heap. The bug reports start coming in.
1.1 Yeah, that oughta fix it.
1.1.1, 1.1.2 ... 1.1.73 ... the third number emphasizing these are just "small patches" to our mostly-perfect product -- bug fixes, cosmetic changes, behind-the-scenes total rewrites, major new features important customers were demanding, that sort of thing.
2.0.1 OK, now we've got some snazzy new stuff. Anything coming up for a while is just going to be a "minor update". Everyone collapses in a heap. Bug reports keep coming in.
2.0.2, 2.0.3 ... yeah, we've seen this movie before
5.0, because our latest version is so much better than anything you've ever seen, including our own previous versions (Actually, version 3.x ended in tears, 5.x is largely a rewrite by a different team and no one knows what happened to 4.x -- maybe that's why one of the co-founders was sleeping under their desk and living on pizza for a couple of months?).
5.0.1, 5.0.2 ... you know the drill
Artichoke. Yep. Artichoke. Version numbers are so two-thousand-and-late [already well out of date when I wrote that ... how meta ... -- D.H Dec 2018]. We're going with vegetables now. Already having long meetings on whether it's Brussels Sprout or Broccoli next.
Artichoke 1.1, Artichoke 1.2 ...

Wednesday, October 5, 2011

Steve Jobs, 1955-2011

Well, we all knew it was coming, but you could still feel the earth shift. None of us in the tech business has remained untouched by Jobs' work, and by extension, Jobs himself. There was never, nor will there ever be, anyone quite like him.

RIP

Crowdsourcing the sky

Astronomy has been likened to watching a baseball game through a soda straw. For example, the Hubble Deep Field, assembled from 342 images taken over the course of ten days, covers about one 24-millionth of the sky, or about the size of a tennis ball seen a hundred yards away. It's quite possible to survey large portions of the sky, but there are trade-offs involved since you can only collect so much light so fast. To cover a large area and still pick up faint objects, you need some combination of a big telescope and a lot of time. The bigger the telescope (technically, there's more to it than sheer size) the faster you can cover a given area down to a given magnitude (how astronomers measure faintness).

The Large Synoptic Survey Telescope (LSST) is designed to cover the entire sky visible from its location every three days, using a 3.2 gigapixel camera and three very large mirrors. In doing this, it will produce stupefying amounts of data -- somewhere around 100 petabytes, or 100,000 terabytes, over the course of its survey. So imagine 100,000 terabyte disk drives, or over 2 million two-sided Blu-ray disks. Mind, the thing hasn't been built yet, but two of its three mirrors have been cast, which is a reasonable indication people are serious. Even if it's never finished, there are other sky surveys in progress, for example the Palomar Transient Factory.

Got a snazzy 100 gigabit ethernet connection? Great! You can transfer the whole dataset in a season -- start at the spring equinox and you'll be done by the summer solstice. The rest of us would have to wait a little longer. My not-particularly-impressive "broadband" connection gets more like 10 megabits, order-of-magnitude, so that'd be more like 2500 years, assuming I don't upgrade in the meantime and leaving aside the small question of where I'd put it all.
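For anyone who wants to check the arithmetic, here is a minimal back-of-envelope sketch. It assumes decimal units (a petabyte is 10**15 bytes), a link running flat out with no protocol overhead, and the round figures used above; the function and variable names are just mine for illustration.

```python
# Back-of-envelope transfer times for a 100-petabyte dataset.
# Assumptions: decimal units (1 PB = 10**15 bytes), no protocol overhead,
# and a link that runs at its nominal rate the whole time.

DATASET_BYTES = 100 * 10**15
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY


def transfer_seconds(size_bytes: float, link_bits_per_second: float) -> float:
    """Seconds needed to move size_bytes over a link of the given speed."""
    return size_bytes * 8 / link_bits_per_second


# 100 gigabit ethernet versus a roughly 10 megabit home connection
fast_days = transfer_seconds(DATASET_BYTES, 100e9) / SECONDS_PER_DAY
slow_years = transfer_seconds(DATASET_BYTES, 10e6) / SECONDS_PER_YEAR

print(f"100 Gbps: about {fast_days:.0f} days")
print(f"10 Mbps: about {slow_years:.0f} years")
```

The fast link comes out at a bit over ninety days, which is where the "season" comes from, and the slow one lands in the neighborhood of 2500 years.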

Nonetheless, the LSST's mammoth dataset is well within reach of crowdsourcing, even as we know it today:

Galaxy Zoo claims that 250,000 people have participated in the project. Many of them are deadbeats like me who haven't logged in for ages, but suppose there are even 10,000 active participants.
The LSST is intended to produce its data over ten years, for an average of around 2-3Gbps. Still fairly mind-bending -- about a thousand channels worth of HD video, but ...
Divide that by our hypothetical 10,000 crowdsourcers and you get 200-300Kbps, not too much at all these days. Each crowdsourcer could download a 3GB chunk of data in under an hour in the middle of the night or spread it out through the day without noticeably hurting performance.
Assuming you kept all the data, you'd need a new terabyte disk about once a year (100 petabytes split 10,000 ways is 10 terabytes each over the ten years), so that's not prohibitive either.
The hard part is probably uploading a steady stream of 2-3Gbps (bittorrent wouldn't help here, since each recipient gets a unique chunk of data). As far as I can tell the bandwidth is there, but at that volume I'm guessing the cost would be significant.
In reality, there would probably be various reasons not to ship out all the raw data in real time, but instead send a selection or a condensed version.
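The division in the list above can be sketched in a few lines. This is only a rough check under the same round assumptions: 100 petabytes in decimal units, spread evenly over ten years, and a hypothetical crowd of 10,000 participants each taking an equal, unique share. The names are mine.

```python
# Rough per-participant load if the survey data were split evenly.
# Assumptions: 100 PB total (decimal units), produced evenly over 10 years,
# shared equally among a hypothetical 10,000 active participants.

DATASET_BYTES = 100 * 10**15
SURVEY_YEARS = 10
PARTICIPANTS = 10_000
SECONDS_PER_DAY = 86_400
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

# Average rate at which the whole dataset is produced, in bits per second
total_bps = DATASET_BYTES * 8 / (SURVEY_YEARS * SECONDS_PER_YEAR)

# Each participant's share of that stream
per_person_bps = total_bps / PARTICIPANTS

# The same share expressed as bytes to fetch per day
per_person_bytes_per_day = per_person_bps / 8 * SECONDS_PER_DAY

print(f"whole survey: {total_bps / 1e9:.1f} Gbps")
print(f"per person:   {per_person_bps / 1e3:.0f} Kbps")
print(f"per person:   {per_person_bytes_per_day / 1e9:.1f} GB per day")
```

The daily figure is where the 3GB chunk comes from: one chunk a day per participant, fetched whenever the connection is otherwise idle.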

Bottom line, it's at least technically possible with today's technology, to say nothing of that available when the LSST actually goes online, to distribute all the raw data to a waiting crowd of amateur astronomers.

Wikipedia references a 2007 press release saying Google has signed up to help. As usual I don't know anything beyond that, but it does seem like a googley thing to do.

Monday, September 26, 2011

Real science, hot off the web

A while ago I commented on an Economist article claiming that Web 2.0 tools "were beginning to change the shape of the scientific debate." My contention was that the web wasn't so much changing the debate as changing the means of publication. In particular, there had always been a trade-off between speed of publication and thoroughness of review, and the web was becoming a publishing mechanism of choice on the lightly-reviewed end of that continuum.

More recently, looking for something I no longer recall, I ran across Cornell's arxiv.org (I assume the x is meant to represent a Greek χ), a repository for "Open access to 703,281 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics." The number 703,281 was current when I scraped it. It's probably higher by now.

That's a lot of articles, but does anyone use it for anything important? Well, one recent entry is Superluminal neutrinos in long baseline experiments and SN1987a (Cacciapaglia, Deandrea, Panizzi et al., yes, those neutrinos). Indeed, there seems to be a lot of activity in the experimental high-energy physics section overall, which makes sense. It's useful to have experimental results available quickly, bearing in mind that there can be quite a bit of calibration, number-crunching and checking before an experimental paper is ready for public consumption (months, in the case of the neutrino paper).

Submissions to arxiv.org "must conform to Cornell University academic standards". It's not immediately clear to me what process is in place to ensure this, but from a little browsing it's clear that these are serious academic papers. It also seems reasonable to assume that most of the papers have not been through the full process of peer review required for publishing in a major journal. Indeed, the published version is almost certainly not going to appear on such a site, if only for reasons of copyright.

It seems like a good niche to fill. If you have a significant result that you're comfortable sharing with the world and staking your reputation on, there should be some way to make it immediately available, with the implied tradeoff between speed and thoroughly careful vetting. Publishing under the aegis of a major university gives everyone some assurance that you're at least doing real research. I notice from random sampling that the reference sections generally don't cite arxiv.org, giving some indication of the preliminary nature of the publications.

With that in mind, arxiv.org looks like a great resource not only for working academics but for the curious layperson as well.

Wednesday, September 14, 2011

Urbandictionary vindaloo

Sometimes it's good to remember that www stands for World-Wide Web. One fun example is Samosapedia, a burgeoning collection of slang from India and thereabouts, built on the same basic user contribution and rating scheme as its older cousin Urbandictionary, but twelve time zones away and, from my brief survey, without so much outright gamy material (somewhat off-color material, on the other hand ...).

Slang is an interesting window into a culture or cultures. The slang here is every bit as lively as anyone else's, and thanks to the web we can all get a glimpse. A few more-or-less random examples

Cup ice cream -- literally, what you'd expect, but the local flavor is in the details
Item number -- a bit of Bollywood
Do the needful -- um ... just how would you say that in, say, American English?

My favorite, on a par with the mighty might could, is cannot able to (listed under cannot able to be and, commentary notwithstanding, no more "warped" than many an "acceptable" construct).

Forbes on passwords

Wandering through the web (but not necessarily figuring it out as I went along) I ran across a slide show in Forbes on the subject of passwords, with what seems to me mostly reasonable advice. Some highlights, mostly common-sense stuff that bears repeating:

Change important passwords frequently and don't reuse them
Use different passwords for different purposes.

Important passwords (e.g., for bank accounts) should be unique.
Less important accounts can share passwords, but be aware that if one account is compromised you should consider all of them compromised.

Don't choose a password that's ever appeared elsewhere. This rules out memorable phrases like "We the people of the United States of America".
Passwords should contain nothing personally associated with you (basically a version of the previous item).
Password managers may be useful. The advantage is you can use random gibberish and the manager will remember it for you. The disadvantage is that if the master password is ever cracked, you're completely hosed.
Use HTTPS when logging in. HTTPS encrypts all connections and uses digital certificates to ensure that you're really talking to whom you think you are (just exactly how secure this system is is a whole other can of worms, but for now let's assume it's basically OK). You can tell if you are because web sites with it start with "https://" instead of "http://" and browsers now indicate whether you have a secure connection.
Don't type your password into anyone else's machine.
Assume that a public WiFi access point is just that, public (the actual slide says to avoid it entirely). If you're not using an encrypted connection of some sort (HTTPS, SSH, a VPN or such) assume that anyone can see your network traffic, including passwords you type when you log in. Also assume that any random person can see anything that's publicly shared on your computer (another fine can-o-worms).
Don't depend on passwords generated by web sites or random software. Even if everything's on the up-and-up, it's very easy to get password generation wrong, typically by using a weak random number generator (see this post for more on generating passwords).
Archive your important passwords in case of catastrophe, for example by writing them down on a piece of paper and storing it in a safe deposit box that can be opened in an emergency.
In general, if you're going to record a password somewhere, do it on a physical medium separate from your computer (see disadvantage of password managers, above).

There are also a few items that don't seem actively harmful, but probably don't help greatly either

When replacing letters with numbers and such, use non-obvious numbers, e.g., r7place instead of r3place. This will add a few bits of entropy, which is good, but not really good enough on its own. If your base word is in a dictionary of 500,000 words and you replace up to three characters with one of 15 replacements, you have about 30 bits of entropy, which is not that much.
Add a number to the end of sentence-based passwords "for extra uniqueness". Adding a number adds about three bits of entropy. Meh.
Scramble a password when writing it down. This will make it harder, but not impossible, for someone who finds your written password to figure out the actual password, but it will also make it harder for you to come up with the actual password at two in the morning when you discover you don't quite remember how you scrambled it and the Very Important Site locks out accounts with more than three login failures. Of course, you could write down how you scrambled it ...
Deliberately misspelling words can make passwords more secure. Yes, but not very much more secure.
Use a sentence with lots of words, and include punctuation. In theory this can work, but in practice people come up with much-less-than-random-words, particularly if the sentence actually makes sense. Also, surprisingly many systems get indigestion if you try to use a long password.
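To make the entropy estimates above a little more concrete, here is a rough sketch of the arithmetic in Python. The function name and the simplified model (exactly three substitutions, ignoring which positions get replaced) are assumptions for illustration, not anything from the Forbes slides.

```python
import math

def substitution_scheme_bits(dictionary_size: int,
                             replacements: int,
                             choices_per_replacement: int) -> float:
    """Bits of entropy for: pick one dictionary word uniformly at random,
    then make `replacements` substitutions, each chosen uniformly from
    `choices_per_replacement` options (simplified model)."""
    combinations = dictionary_size * choices_per_replacement ** replacements
    return math.log2(combinations)

# The figures quoted above: 500,000 words, three replacements, 15 options each.
word_bits = substitution_scheme_bits(500_000, 3, 15)

# Appending a single decimal digit adds log2(10) bits.
digit_bits = math.log2(10)

print(f"word with substitutions: about {word_bits:.1f} bits")
print(f"one appended digit: about {digit_bits:.1f} bits")
```

Under these assumptions the word scheme comes out a little over 30 bits and the trailing digit a little over three, in line with the rough figures in the list.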

Tuesday, August 30, 2011

Considerate software

I first heard the motto "Considerate software remembers" a job or two ago from interaction designer Carl Seglem, who credited it to Alan Cooper of About Face fame. The phrase has stuck in my head ever since, so the other day I went searching for it and found this extract on codinghorror.com.

There's a lot to like about the very idea of considerate software. If I'm using a piece of software, I want it to do something for me. I'm going to be devoting a great deal of attention to it, asking it to do this or that and expecting responses to those requests. Ideally, someone or something I'm working with that closely will treat me considerately, just as I should make every effort to treat a person I'm working with considerately.

More subtly, the metaphor of considerate software cuts the designers and implementors of the software completely out of the picture. This is surely deliberate and completely appropriate. Once software is deployed, the designers and implementors are out of the picture. I can't come and ask them how to deal with some puzzling or frustrating bit of behavior (and lucky for them, sometimes). As far as I'm concerned it's the software that's being helpful or annoying.

There are clearly limits on how considerate software could possibly be. If I decide to type in a long treatise on considerate software into the "shipping address" field of some form, I wouldn't expect the app to respond "Why yes, that's very interesting. I personally find Cooper's work exemplary. Shall we continue this conversation over coffee?" However, it doesn't seem too much to expect a politely phrased, helpful response pointing out that "I first heard the motto ..." followed by several paragraphs does not look like a valid street address.

I don't need to go into detail here about how far short much software falls in this regard. I'm sure you've got your own examples. Neither do I want to go into how and why software comes to be inconsiderate, though that's an interesting topic in itself. Instead, I'd like to go into what qualities make software considerate or inconsiderate.

The list I referred to above hits a lot of interesting points, but it feels more like a list of this and that than a thorough taxonomy. In particular, the headings, while snappy, don't always seem to match up well with what they head.

Some of the points fall under "Considerate software remembers":

"Considerate software takes an interest" is really just saying it shouldn't ask for the same information over and over. That is, it should remember what you've already told it.
"Considerate software is perceptive" says that software should remember what we do. It also says that it should adapt its behavior based on what it knows. More on that shortly.
"Considerate software takes responsibility." says that software should remember where it is and be able to restore its state as closely as possible to where it had been before something derailed it.

Other points assert that software should know the kinds of things that we know and it can reasonably be expected to know:

"Considerate software uses common sense." Common sense is not some magical filter that separates sensible behavior from senseless. It's largely a body of knowledge, whether learned or instinctive. To keep from, say, sending a check for $0, it needs to know that checks should only be sent for positive amounts.
"Considerate software anticipates needs." To anticipate needs, a piece of software needs to know what those needs are.
"Considerate software knows when to bend the rules." says that it should know how (and when) to do more than just the narrow definition of its task.
"Considerate software is forthcoming." says primarily that software should actually tell us useful information that it knows, but to do that it may need to know information outside a narrow view of what it should be doing.

A third set has more to do with knowing when and when not to offer information

"Considerate software keeps you informed/is forthcoming." Not only should it know useful things we didn't specifically ask it to know, it should let us know that and modify its behavior accordingly. But ...
"Considerate software doesn't burden you with its personal problems/is self-confident/doesn't ask a lot of questions." It should limit itself to interactions useful to us, present information in ways that are easy for us to absorb and ask for information in ways that are easy for us to present.

A couple seem more about letting us exercise our judgment instead of trying to exercise it for us

"Considerate software is deferential." Software should not prohibit things that might be useful. Instead it should make sure we know the consequences of a choice and then let us make it. It occurs to me that the "undo" feature is particularly helpful here.
"Considerate software is conscientious." The principle here seems to be that software should know that some things are dangerous and not simply assume that we mean to do them.

Taking a stab at boiling this all down:

Considerate software knows as much as reasonably possible about its domain.
Considerate software remembers what's happened, what we've told it and what it's told us.
Considerate software modifies its behavior where appropriate based on the above.
Considerate software gives us ways to access what it knows (including the state of the world as it used to be).
Considerate software actively tells us important things we might not already know.
Considerate software communicates efficiently -- taking into account how human minds work.

These principles seem fairly universal, but it's worth noting that one of the first extensions to the original web protocols, and one that enabled major improvements in the experience of using the web, was the cookie -- a way of letting a web site remember things that have happened before and, ideally, act accordingly.
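For the curious, here is a minimal sketch of what "remembering" with a cookie looks like, using Python's standard http.cookies module. The cookie names (visits, theme) and the helper functions are made up for illustration. A real site would also set attributes such as expiry and path.

```python
from http.cookies import SimpleCookie
from typing import Optional

def set_cookie_header(name: str, value: str) -> str:
    """Build the header a server sends to ask the browser to remember a value."""
    cookie = SimpleCookie()
    cookie[name] = value
    return cookie.output()

def read_cookie(cookie_header_value: str, name: str) -> Optional[str]:
    """Pull one remembered value out of the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header_value)
    morsel = cookie.get(name)
    return morsel.value if morsel is not None else None

# Server to browser: please remember this.
print(set_cookie_header("visits", "3"))

# Browser to server, on a later request: here is what you asked me to remember.
print(read_cookie("visits=3; theme=dark", "theme"))
```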

Saturday, August 27, 2011

Building a better password

[I've updated this post slightly to reflect the back-of-the-envelope calculation in this post suggesting that 100 bits of entropy is probably more reasonable than my original statement that 48 bits was "not bad". Under the assumptions in that post, a 48-bit password would take on the order of microseconds to crack --D.H. Feb 2020]

I've recently complained about the irritating nature of the password strength checkers that have been popping up everywhere, so I feel obliged at least to try to analyze the problem and offer solutions. This is leaving aside the question of whether password authentication is a useful approach at all.

Fundamentally the real measure of password strength is how many passwords you'd expect to have to guess in order to get the right one. A more formal version of this is the notion of bits of entropy. If you had a list of all possible passwords in your scheme, I could identify any particular one so long as I could get answers to a series of yes/no questions, for example: "Is it in the first half of the list or the last?", "Is it in the first half of that half or the last?" and so forth. The number of such questions I need is the number of bits of entropy. Twenty questions means twenty bits, etc.
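The yes/no game can be sketched in a few lines of Python. This is only an illustration: the function name and the stand-in list of "passwords" are invented, and the halving works out to a whole number of questions only when the list size is a power of two.

```python
from typing import Sequence

def questions_needed(candidates: Sequence[str], secret: str) -> int:
    """Count the 'first half or last half?' questions needed to pin down
    `secret` within the ordered list `candidates`."""
    secret_index = candidates.index(secret)
    low, high = 0, len(candidates)  # the secret lies in candidates[low:high]
    questions = 0
    while high - low > 1:
        middle = (low + high) // 2
        questions += 1              # "Is it in the first half?"
        if secret_index < middle:
            high = middle
        else:
            low = middle
    return questions

# A stand-in list of 2**20 possible passwords: twenty questions, twenty bits.
all_passwords = [str(n) for n in range(2 ** 20)]
print(questions_needed(all_passwords, "777777"))
```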

If I know that your password is either "0" or "1", you have exactly one bit of entropy. If I know it's an uppercase letter, lowercase letter, digit, "$" or "%", there are 64 possibilities, so you have 6 bits of entropy. If I know it's two such characters, you have 12 bits, and if it's seventeen such characters you have 102 bits, which is not too bad. Someone trying to guess your password would have to guess about two thousand billion billion billion passwords, on average, before stumbling on yours. That may seem like a lot, but keep in mind that the current network of Bitcoin miners can try on the order of a hundred thousand billion billion hashes -- roughly the same problem as guessing a password -- every second.
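The same arithmetic as a small Python sketch. The helper name is hypothetical, and the calculation assumes every character is chosen independently and uniformly at random.

```python
import math
import string

# Uppercase, lowercase, digits, "$" and "%": 26 + 26 + 10 + 2 = 64 symbols.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "$%"

def random_password_bits(alphabet_size: int, length: int) -> float:
    """Entropy in bits of `length` characters, each picked independently
    and uniformly from an alphabet of `alphabet_size` symbols."""
    return length * math.log2(alphabet_size)

bits = random_password_bits(len(ALPHABET), 17)

# On average an attacker has to try half of all possible passwords.
average_guesses = 2 ** 102 // 2

print(f"{len(ALPHABET)} symbols, 17 characters: {bits:.0f} bits")
print(f"average guesses: about {average_guesses:.1e}")
```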

[Don't assume that guessing a password requires typing it in to the same text box you have to use. If someone steals the right data from your service provider, they can throw as much computing power as they've got at guessing the passwords. Quite possibly they'll be happy enough just to try a few thousand weak passwords for each account, since that will crack depressingly many, but attacks like running through the OED with simple substitutions of numbers for letters are absolutely feasible as well, even on fairly ordinary hardware.]

This is assuming that you picked all seventeen characters at random. If I knew instead that your password was either "F1%ldN0t3$" or "sasssafras" (maybe I'd watched you read your password off a piece of paper with only those two words on it but couldn't quite see which you were typing), then you have only a single bit of entropy, even though both passwords are ten characters long and one has plenty of non-letters.

More realistically, if I knew you'd picked an uncommon English word and maybe changed some of the letters to numbers, you'd have somewhere around two dozen bits of entropy. That's not nothing, but keeping in mind that each added bit doubles the number of passwords a cracker has to try, it's a few hundred thousand billion billion times weaker than the 102-bit scheme above.

The fundamental flaw of password strength checkers is that they can only look at the password you gave them. They have no idea what other possible passwords you might have chosen. The assumption is that if you're forced to jump through enough hoops you'll be forced to expand your parameters, but in fact it's possible to generate passwords in a secure manner using only letters, or to generate them insecurely in a way that will still satisfy any strength checker out there. Which is why I half-grimace, half-laugh when I see the "password strength indicator" jump from "poor" to "great" as soon as I type a number.

Now, it's perfectly possible to generate completely random 17-character passwords. The problem is that something like "qcrQf1x2" or "u%js%hPQ" is a pain to try to memorize, so most people will fall back to picking a "hard" word and maybe altering it a bit. However, as xkcd points out, it's possible to do a lot better by using random short words.

For example, here's a kind of clunky way of producing a random, memorable password:

BIG HONKING DISCLAIMER: This is just for demo purposes. The second site I mention uses http, not https, so in theory anyone could be looking in on your session. Even with https, the sites might be logging all your traffic and recording the results you come up with. I personally seriously doubt they would, and it's hard to imagine they would be able to connect the dots and figure out what you were using the generated password for, but if you really want to be on solid ground, get the source, look it over, run it locally and use something like /dev/urandom or D&D dice to generate the random input (23d20 will give you close to 100 bits ... not that I would have any idea at all what "23d20" means). There are also smartphone apps that do more or less the same thing, I believe.

[I last checked that the recipe below worked on 28 Feb 2020]

With that out of the way:

Go to this site and copy the random string you see there (e.g., 60990FFC250C). If for some reason you don't like what you see, just reload.
Go to this site.
Type some short number and a space into the Challenge box and paste the random string from the first step in after it (e.g. 123 60990FFC250C)
Type anything at all into the Secret box (e.g., "secret"). This doesn't have to be hard to guess. The real entropy is coming from the random string (alternatively, put any number you like, a space, and anything else into the "challenge" box and paste the random bytes into the "secret" box).
Press the Compute with SHA-1 button. Again, the cryptographic details of how strong SHA-1 is don't matter here. You're just converting a random number to short words. A simple table lookup would do just as well.

In the Response box you will see six short words followed by some hexadecimal gibberish (in this case, WOVE COOT SLEW WIT SIGH I (FE2D 5F7B 22CD BC39)). Each of those words represents just over 10 bits of entropy. We'll need ten words in all, so repeat the procedure but this time just take the first four words (I got FIRE CUFF GALA MINK from A4B455FEFFE7BFAD).
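If you'd rather run something locally, as the disclaimer suggests, the same idea can be sketched with Python's secrets module, which draws on the operating system's random source. This is not the recipe above, just a table lookup in the same spirit. The word list here is a tiny made-up demo built from the sample words in this post; a real list would need to be much larger (2048 words gives 11 bits per word).

```python
import math
import secrets
from typing import Sequence

# Hypothetical demo list, far too small for real use.
DEMO_WORDS = ["WOVE", "COOT", "SLEW", "WIT", "SIGH",
              "FIRE", "CUFF", "GALA", "MINK"]

def make_passphrase(words: Sequence[str], count: int) -> str:
    """Pick `count` words independently and uniformly at random."""
    return " ".join(secrets.choice(words) for _ in range(count))

def passphrase_bits(list_size: int, count: int) -> float:
    """Entropy in bits of `count` independent picks from `list_size` words."""
    return count * math.log2(list_size)

print(make_passphrase(DEMO_WORDS, 10))
print(f"demo list: {passphrase_bits(len(DEMO_WORDS), 10):.1f} bits (too weak)")
print(f"2048-word list: {passphrase_bits(2048, 10):.1f} bits")
```

The point of passing the word list in as a parameter is that the strength comes entirely from the size of the list and the number of picks, not from anything clever in the code.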

You can play around with this formula to get words that are easier to memorize, or type, or are just more to your liking. If you reorder your words or try typing in several different things instead of 123 or secret and then picking what you like, you're decreasing your entropy, depending on what criteria you're using to filter out passphrases you don't like and whether your attacker knows what kinds of phrases you like. If you just try a few different secrets until you see something that seems memorable, that should be fine. If you do something like sort the words (and your attacker knows only to try sorted lists of words), you've lost almost 22 bits of entropy, which wouldn't be good.
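The "almost 22 bits" figure for sorting is just the number of possible orderings of ten words, expressed in bits. A quick check in Python, assuming all ten words are distinct (the function name is made up for illustration):

```python
import math

def bits_lost_by_sorting(word_count: int) -> float:
    """Sorting collapses every ordering of the words into a single one,
    so the loss is log2(word_count!) bits, assuming distinct words."""
    return math.log2(math.factorial(word_count))

print(f"{bits_lost_by_sorting(10):.2f} bits lost by sorting ten words")
```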

Once you've selected your words add a random punctuation character, number, capital letters or whatever makes your site's password strength checker happy. Voila! Your password is now Wovecootslewwitsighifirecuffgalamink5? or whatever. This isn't great to have to type, but it's pretty secure as passwords go, and probably better than trying to remember something like C;cTbfThoO4ePFTt or 67EE386A205C4563DB8908A6C4.

If your site's password checker imposes an 8-character limit (and, incredibly enough, some do), cry.

Oh right ... I write a blog, don't I?

A couple of housekeeping items, before I attempt to get back to real blogging:

No, I haven't fallen off the face of the Earth, been trapped under a large object or wandered off to Nepal to contemplate the mysteries of the universe. Just busy, and decided to devote what little blogging bandwidth I've had lately to contemplating the nature of awareness on the other blog. Hmm ... maybe Nepal wasn't so far off.
A couple of logins ago, AdSense advised me that I appeared to have a "popular blog" and I should consider advertising on it. I'm always glad to know that people are reading Field Notes, but I suspect that AdSense and I have somewhat different notions of "popular". As much as I would like to bump my employer's revenue stream up by another 0.0000000000000001% or so, I have no plans to do that at the moment or any time soon. I'm not against running ads per se, but I don't see the point of cluttering up the layout for what I doubt would be any significant gain. If you ever do start seeing ads here, it will be because there has been a dramatic surge in demand for occasionally-posted web.musings, in which case why not?
Prompted by a couple of recent comments, including a couple of completely appropriate ones, I've settled on a definition of spam comments: If it's completely independent of the post it's supposedly commenting on, it's spam and will be summarily removed. Mentioning your favorite business as part of a thoughtful response to a post on customer service is just fine. Mentioning your website, commercial or otherwise, with nothing more than a generic "Hey, great blog!" comment is spam.
Mind, I reserve the right to delete any comment for any reason or no reason (hey, it's my blog). But as a practical matter I'd only expect to do so in cases of spam or incivility, should it occur. As part of recusing myself from matters Google (and yet still trying to write about the web), I would also remove any speculation about what Google might be up to, be it public information or not, accurate or otherwise. I don't expect that to be a problem, but thought I'd mention it.

And ... we're back!

Friday, July 29, 2011

Worst ... user experience ... ever (how to turn on your wireless radio)

[Doing a little more searching after I wrote this, I eventually learned that it's not just a few oddball brands that have this problem, and that some models really do have physical switches. If you're banging your head trying to get your wireless turned on, this list might help. It told me where to look again for the small, black-on-black slider switch on the laptop I was dealing with.

As always, I make no warranty that this will help you. I particularly don't vouch for the spammers halfway down offering to crack passwords for you. --DH]

OK, this isn't really much to do with the web, except that you can't really talk about the web unless you can actually connect to it, and it's really just a bunch of griping, but ...

Who in the Windows world decided it was a good idea to make laptops keep their wireless radios turned off until you find the right magical incantation to turn them on? Did Steve Jobs sneak into Steve Ballmer's house at night and put an iPod loaded with subliminal messages under his pillow? "Turn the wireless radios off ... trust me ... people will love it!" Did someone decide that having wireless connectivity was too simple and useful? No? What, then?

I'm trying to imagine a portable device in this day and age that you don't want to be able to connect to the nearest hotspot. Smartphones do it. Tablets do it. Netbooks do it. Even set-top TV boxes and video games do it. One of the first things you do with most new gadgets is locate the nearest hotspot, connect up to it and say "Ah ... that's better." At least if you're me, anyway.

Is this supposed to save the battery? I can see that, but why have a separate control? There's already a "disable" option for the wireless if you want to go offline (or wired, or into "airplane mode"). That should turn the radio off, no? Conversely, if I enable the wireless I want the radio on. Duh. Do I really have to spell that out? Evidently.

OK, fine. You need to turn the radio on before I can use the wireless. How do you do it?

Typically you futz around the network area of the control panel until you stumble on a help message that says to flip a switch on the front or side of the laptop. I have never seen such a switch. Why would there be such a switch? How many other such switches are there on a modern laptop? Typically, there's a power button and ... um ... yeah, that's about it. [As mentioned up top, I have now seen such a switch. I am no more impressed than before.]

I've seen other attempts at handy buttons for some novel function, but always in the keyboard area, and never for very many product cycles. A switch is another moving part and an added design and manufacturing expense in a cutthroat business. It only makes sense if it's for something that people really want to be able to do in one quick step. Who, exactly, is asking for the ability to instantly make their mobile, web-enabled computer nearly useless?

So there's no physical switch readily apparent. That leaves the software equivalent. The previous time I had to jump through this hoop I was able to find some forum somewhere that said what to run to do the trick. This time -- as you can probably guess -- not so much.

Oh, there's a function key that will pop up a grayed-out-looking but otherwise pretty little box with an icon denoting the wireless radio, x-ed out with a nice red x (Dedicated function key? Who are all these people asking for a shortcut to do something I've never, ever wanted to do nor known anyone who admitted to wanting to do?). Clicking on the box does nothing. Pressing the function key again in hopes that it's a toggle that I somehow just turned off does nothing.

There's an icon in the tray at the bottom, bearing a similar x-ed out icon, that you can right-click on. It will tell you that you need to turn your radio on.

There's a setup application supplied by the hardware manufacturer (this is one area where closed architectures like the Mac win). It offers to set up the wireless hardware for you.

But first you have to turn the radio on. Of course.

Search the forums. Someone suggests uninstalling the drivers and rebooting. Well naturally. If I want to turn on the lights in my house the first thing I do is uninstall the wiring (never mind rebooting). Try that. Nope. Flip a couple of checkboxes buried deep in the bowels of the "Device manager" menus. Nope, sorry. Maybe the drivers -- that the manufacturer shipped with -- are out of date? You could try updating them.

If you had an internet connection.

Maybe drop-kicking the thing off the roof of a tall building will do it? Seems worth a try ...

Wednesday, July 27, 2011

Shave and a haircut: two bitcoins

Someone the other day was mentioning Bitcoin, which calls itself the first decentralized digital currency. Regular readers of this blog, a select group to be sure, will probably not be surprised that this sent my not-so-disruptive-technology sensors into high gear. So what's a decentralized digital currency?

Virtual worlds often have virtual currencies, which citizens can earn by doing various things in the virtual world and which they can exchange within the world. In at least some cases these virtual currencies have leaked into the real world, or been tied to real money to begin with, not always with happy results. One can view Bitcoin as abstracting that process and removing it from the confines of a closed, proprietary virtual world.

Bitcoin uses a modest ensemble of established crypto techniques to create a public audit trail certifying that a particular person has generated a Bitcoin, or that one person has exchanged some possibly fractional amount of Bitcoin with another (and by "person" I really mean "whatever has control of a given private key"). Generating bitcoins and certifying transactions requires a non-trivial amount of computation, much as generating money in a virtual world requires a non-trivial amount of whatever one does to earn money in that world.

There are various safeguards to ensure that each unit of Bitcoin has exactly one owner and everyone has a consistent view of who owns what. That view can change over time. In other words, Bitcoin meets some basic requirements for a currency: It is transferable, limited in supply and difficult to duplicate or forge. So far, so good.

It occurs to me that there is actually already a very widely-used decentralized digital currency, namely money.

While it is still possible to exchange cash for goods and services, an awful lot of commerce gets done without it. Instead, various banks and other entities simply increment and decrement balances in various accounts. If I pay you, my balance goes down, yours goes up and one way or another our banks and various intermediaries get to take a cut. This is certainly digital, and it's certainly currency. It's also decentralized, in that there are many banks, particularly once we move into the international arena, and not even the various central banks have complete control of what happens.

However, it's not as radically decentralized as Bitcoin aims to be. Bitcoin aims to take out all intermediaries. If I pay you in Bitcoin, everyone in the system will be informed, reasonably soon, that I now own that much less Bitcoin and you own that much more. All participants are on an essentially equal footing. There are no banks, clearinghouses or other such entities at all.

More precisely, everyone learns that whoever controls my private key has that much less and whoever controls yours has that much more. Whether anyone knows who controls what keys is a separate matter. Bitcoin uses pseudonymity -- known names tied to possibly unknown entities -- to recapture some of the anonymity of cash transactions.

The Bitcoin documentation is very careful to make the classic economic distinction between value in use and value in exchange. The computational work done in producing Bitcoin and validating transactions is not inherently useful. It basically consists of guessing numbers until the right one comes up (technically, one that contains a given bit string and hashes to a particular value). The value, if any, comes of people being willing to use Bitcoins in exchange, that is, as currency. This is no different from printed pieces of paper or numbers in databases or, for that matter, materials like gold whose prices -- that is, their exchange rate with paper currencies -- are largely decoupled from their practical uses.
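For a feel of what "guessing numbers until the right one comes up" means, here is a toy sketch in Python. It is emphatically not Bitcoin's actual block format or mining code. It assumes a simplified rule (keep trying nonces until the SHA-256 hash of the data plus the nonce falls below a target), and the function names and the difficulty parameter are invented for the example.

```python
import hashlib
from itertools import count

def hash_as_int(data: bytes, nonce: int) -> int:
    """SHA-256 of the data followed by the nonce, read as a 256-bit integer."""
    digest = hashlib.sha256(data + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big")

def find_nonce(data: bytes, difficulty_bits: int) -> int:
    """Guess nonces 0, 1, 2, ... until the hash falls below the target.
    Each extra difficulty bit doubles the expected number of guesses."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if hash_as_int(data, nonce) < target:
            return nonce

# Twelve bits of difficulty: a few thousand guesses on average.
nonce = find_nonce(b"toy transaction data", 12)
print(f"found nonce {nonce}")

# Checking someone else's answer takes a single hash, not thousands.
print(hash_as_int(b"toy transaction data", nonce) < (1 << (256 - 12)))
```

The asymmetry is the useful part: finding the number is expensive, but anyone can verify it cheaply.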

So this looks well thought through and doesn't seem wildly implausible. Why was my spidey-sense tingling?

In trying to make sense of this I went back and reviewed the concept of currency. Except there doesn't seem to be a nice, crisp, near-universally accepted concept of what makes currency work. Scarcity is required, in the sense that the supply of currency must be bounded, albeit typically large. Gold and other precious metals are hard to produce. Coins are limited by fiat -- the king's mint will only put his face on so many coins, and woe betide the counterfeiter -- making it less important what the coin is made of. Notes carry this one step further. Clearly it doesn't matter much how much the paper and ink is worth, only that it's difficult to duplicate the note itself.

Numbers in databases are completely abstract, and they seem to work fine. So why not Bitcoin?

At the end of the day, currency has to be exchangeable for something useful, for example, food. This can only happen if the person accepting currency in exchange can be confident that they in turn will be able to exchange it for something useful to them. Bitcoin works hard to ensure that it will behave essentially like physical cash and carefully-regulated changes in bank balances, but that still doesn't make it a currency.

And that's the crux of it. Will people trust that Bitcoin will remain exchangeable? What is the mechanism for maintaining confidence? Typically, this confidence is based on confidence in a government, but other systems work as well. Failed states may continue to circulate currency well after the government has collapsed. Some countries are perfectly happy to use another country's currency. Local communities have been known to create their own currencies which rely on the communal bond among members. All of these and more can work, so why not Bitcoin?

Well, maybe it can.

The best measure I can think of for the viability of a new currency is how it converts to and from existing ones, and there are Bitcoin currency exchanges which do just that. From what I can tell, the jury is still out, if only because Bitcoin hasn't been around that long yet. Bitcoin is currently trading around $14, but it's been as high as twice that in the past couple of months and much, much lower not long before that [and on 28 November 2011, around $2.75, less than 10% of the all-time high ... given that the earth shook slightly when the Swiss Franc dropped from around $1.27 to around $1.16 and that Sterling's fall from 2.80DM to around 2.55 helped bring down a government, this sort of volatility does not look good ... my source for the price, mtgox.com, is now offering options and margin trading on the bitcoin, just in case anyone wants an even bigger adrenaline rush -- D.H.]. On the one hand, a non-zero value is encouraging, but on the other, that sort of volatility doesn't inspire confidence.

Personally, I don't see much reason to use Bitcoin in any significant way. Money has worked fine so far, and if the US dollar should collapse, I'm not exactly convinced that Bitcoin would become a safe haven.

Tuesday, July 12, 2011

Messin' with the buttons again

It took a couple of tries, because the template I use is a fairly old one, but I've made a couple of tweaks to the buttons at the bottom of posts:

The old hand-crafted Digg widget is gone. So far as I'm aware, no one has ever Dugg this blog.
The old email button is gone.
In its place is an all-singing all-dancing set of share buttons, comprising (as I write this):

email
share to Blogger
share to Twitter
share to Facebook
share to Google Buzz
+1 -- a quick way to say "I like this", should you ever be so inclined

and of course, if you just want to read the post and be done with it, you still can do that, too.

Wednesday, July 6, 2011

Wikipedia tics

I'll say it again: Wikipedia is great. I use it all the time. It does its job astoundingly well, particularly given that when it was first getting started any sensible person could have told you it couldn't possibly work. Anyone can edit it? Anyone can write anything about anything? And people are going to depend on it for information on a daily basis? Riiiight.

But it does work, thanks to countless hours of effort from dedicated Wikipedians hammering out workable policies, nurturing the culture behind those policies and putting those policies into practice by editing a stupefying number of articles. It is this endless stream of repairs and improvements that keeps Wikipedia from devolving into chaos. It's a wonderful thing, but wonderful is not the same as absolutely perfect (for starters, one is achievable and the other isn't). Anyone who's read Wikipedia more than casually will inevitably have a few pet peeves. Here are some of mine (and yes, I do try to fix them when I come across them, time permitting):

Link drift: Article A includes a link to article B. Article B gets merged into article C and the link is changed to point to article C -- not the section, but the whole article.
More link drift: Article A includes a link to article B. Someone creates an article on a different meaning of B. The article for B becomes a disambiguation page, and the article on A continues to point to it.
Digression: Article A has some connection to topic B, which people Need to Know More About. Instead of just providing a short summary and linking to the article on B, an enthusiastic editor gives the complete story of B, in nearly but not exactly the same form as in the original article (or, the digressive section moves to its own article, but the section later regrows).
I'm really into this: An article is stuffed with unsourced Things You Didn't Know about the topic, often to the point of downright creepiness.
Some say ... yes, but some other people say ... yes, but ... : People feel strongly about topic A. Generations of editors qualify each other's statements until the article reads like a pingpong match. Usually an effort is made to collect the clashing statements into one section, but that doesn't always keep them from escaping into the article at large.
Actually, everybody gets this wrong: An editor makes a great point of declaring some piece of common knowledge incorrect without bothering to check if this is really the case.
This is a very important distinction: Instead of saying something on the order of "not to be confused with [link]" or such, an editor feels that it's worth including a sentence or two on either side of some valid but not earthshaking distinction, emphasizing how crucial it is (see previous item if the distinction in question is invalid).
Take it to the discussion page, please: A discussion that ought to be lightly summarized is hashed out in excruciating detail before our eyes.
Oh look, I can write a textbook/conference paper, too!: Editors seem to make a special effort to pepper their writing with the mannerisms of their professors or other authorities. Math articles seem particularly prone to this ("clearly ... it turns out that ...").
My home town/band is the awesomest: Material on a place or group reads like your cousin showing you around on a visit. I actually don't mind this, so long as it's not too overboard, even though it generally runs somewhat afoul of Wikipedia's notability policy, because how else does one find out about the Anytown Moose-waxing festival or the real meaning of "incandescent oak" in that one song (don't go searching for those -- I made them up).
This article reads like it was written by dozens of different people over the course of several years: Well, yeah. The real magic of Wikipedia is that relatively few articles read like that, particularly if they really have had a chance for dozens of different people to work on them over the course of several years.
[One other tic occurred to me not long after I hit "Publish": Gratuitous wikification. To "wikify", in wiki parlance, is to make an ordinary term into a link to the article for that term. It's one of the things that makes wikis wikis, but sometimes people seem to go randomly overboard, occasionally with fairly odd results.]

Wikipedia's strength is in its transparency. For the most part, you can see every draft of every article if you want to, every mistake, every correction, every paragraph in need of tightening, every statement in need of a reference, every quibble, every pointless edit war -- in short, everything that a normal publication, encyclopedic or otherwise, goes to great lengths to hide. The downside is that flaws like the ones listed above are also there for all to see.

The upside is that we get Wikipedia.

Friday, July 1, 2011

This password madness has got to stop

It's well known that people like to choose bad passwords, and for years other people have suggested rules for making passwords more secure. I'm not really sure why it should be happening now in particular, but it seems that every site that has a password must now jump on the bandwagon and have a password policy enforcer.

And of course, they're all a little different.

Fortunately there are plenty of possibilities. Here's a do-it-yourself guide in case you think your site needs one. First, pick any two of:

The password must contain at least one number
The password must contain at least one lowercase letter
The password must contain at least one uppercase letter
The password must contain at least one special character

Next flip a coin to pick one of the remaining two to disallow.

Now pick a minimum length. Back in the day, when computers were much slower than they are now and it wasn't fairly easy to get a gazillion computers to cooperate (with or without the owners' consent), the recommended minimum length was eight characters. Today it should probably be more like 12 or 14. So make sure the minimum length is at least six.

Now set a maximum length of 8. Why a maximum, given that all other things equal, longer passwords are stronger, and the whole point of the exercise is to encourage strong passwords? Don't know. Probably whoever put the database together remembered the old eight-character rule and decided that should be the maximum. But 8 is a magic number for passwords and everyone else does it.
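To put rough numbers on "longer passwords are stronger," here's a back-of-the-envelope sketch. It assumes passwords are drawn uniformly at random from the 94 printable non-space ASCII characters, which real people's passwords certainly are not, so treat the figures as upper bounds. The function name is just mine for illustration.

```python
PRINTABLE_ASCII = 94  # printable ASCII characters, excluding space (assumed alphabet)


def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters."""
    return alphabet_size ** length


if __name__ == "__main__":
    for length in (8, 12, 14):
        count = keyspace(PRINTABLE_ASCII, length)
        print(f"{length} characters: {count:.2e} possibilities")
    # Each extra character multiplies the search space by the alphabet size,
    # so going from 8 to 12 characters multiplies it by 94**4.
    print(keyspace(PRINTABLE_ASCII, 12) // keyspace(PRINTABLE_ASCII, 8))
```

Capping the length at 8 throws away every one of those extra factors of 94.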

Finally, add an arbitrary hidden restriction. For example, if the password has to have a number, make sure it can't be the first character (yes, I ran into that one). If it has to have a special character, quietly disallow '$' and '!'. Something like that, just to reduce the strength and make people work a little harder.

Voila. You now have a password policy. If I did that math right, there are three dozen basic policies, times however many arbitrary rules there are, so there are easily hundreds of possibilities. Chances are fair that your poor user will never have encountered your exact policy before and never will again.
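For anyone who wants to check that math, here's a minimal sketch that enumerates the basic policies from the recipe above. The rule labels are mine, and the range of minimum lengths (6, 7 or 8) is my reading of "at least six" combined with the maximum of 8; a different reading would give a different count.

```python
from itertools import combinations

# The four character-class rules from the recipe (labels are illustrative).
RULES = ["digit", "lowercase", "uppercase", "special"]
MAX_LENGTH = 8  # the "magic number" maximum


def basic_policies():
    """Enumerate (required, disallowed, min_length) combinations.

    Assumption: the minimum length is 6, 7 or 8, i.e. at least six
    and no larger than the maximum of 8.
    """
    policies = []
    for required in combinations(RULES, 2):        # pick any two to require
        remaining = [r for r in RULES if r not in required]
        for disallowed in remaining:               # coin flip between the other two
            for min_length in range(6, MAX_LENGTH + 1):
                policies.append((required, disallowed, min_length))
    return policies


if __name__ == "__main__":
    # 6 pairs of required rules * 2 coin-flip outcomes * 3 minimum lengths
    print(len(basic_policies()))
```

Six ways to pick the required pair, two outcomes for the coin flip and three minimum lengths do come to three dozen, before any arbitrary hidden restrictions are multiplied in.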

Chances are also fair that once they jump through all your hoops (bonus points if this is all happening on a smartphone or tablet), your poor user will have never come up with that particular password on the spot before. That's good, since sharing passwords can be dangerous. The only drawback is that your poor user is liable to forget this ad-hoc password within five minutes of logging in.

So urge them to write it down "some place safe."

Then have them pick three or four secret questions and answers for when they have to reset the password next time they log on. But that's a different rant.

If you feel you need further security advice, you can always consult a real expert.

Friday, June 10, 2011

Now available in sleek mobile templates

Blogger offered me a new feature to turn on, and it looked cool, so I turned it on.

If you're accessing this on a mobile device, you should now see the blog in a much less cluttered format, better suited to the small screen. I haven't tried it from a mobile device yet myself, but the preview looked good. Hope you like it.

Monday, May 30, 2011

Whither Tuvalu?

The sea has been rising and will pretty clearly continue to rise. This is not cause for immediate concern to citizens of Utah or Kyrgyzstan, but it's of great concern to citizens of countries such as Maldives (highest elevation 2.3 meters) or Kiribati (a few meters). Or Tuvalu (4.5 meters).

Bear in mind that an island nation does not have to be completely inundated to become uninhabitable. As the sea rises, the water inland becomes brackish and plants stop growing. Storms become more destructive. Even normal tides can become problematic and, leaving that aside, the amount of land, say, two meters above the sea will typically be dramatically less than the amount one meter above the sea. It's a serious concern.

The Economist considers the worst case of an island nation becoming completely uninhabitable. International law is unclear on this, there not being a lot of precedent, but the article speculates that, while the residents of the nation may be displaced and the nation itself may no longer meet the criteria of having a clear territory or a permanent population, it might still remain a legal entity. This matters because under this scenario the nation would still retain assets.

The most obvious asset is the territorial claim under the law of the sea (mainly territorial waters of 12 nautical miles and an economic zone of 200 nautical miles), but in the case of Tuvalu there is also the .tv domain (Maldives and Kiribati have their own domains of course, but haven't been able to exploit them economically the way Tuvalu has). I've written before about how this didn't pan out to be quite the bonanza it was originally hoped to be, but according to Wikipedia it does bring in $4 million a year, or about $400 per year per capita under a contract expiring around 2012.

It's not clear what price the domain might fetch in the next round of negotiations, and in any case it would be small compensation for losing one's homeland, but amid all the sadness it's remarkable that perhaps some day the proceeds from a piece of virtual real estate will help sustain a virtual nation.

Monday, May 23, 2011

Sometimes they do listen

My correspondent who complained of a bank's website putting "Complete Transfer" on a button that did not, in fact, complete the transfer is happy to report that the button now says "Continue Transfer".

Yay!

Tuesday, May 10, 2011

Kids, don't try this at home. Really. Don't.

Whenever I grab a spare moment from not being a lawyer and not being a security expert, I try to find time to not be a research chemist. Fortunately for all of us, not only have more capable souls taken up that profession, but some of them have seen fit to blog about it.

Along with some interesting commentary on the pharma business and such, Derek Lowe's In the Pipeline includes two fine collections of hair-raising tales, under the headings of Things I Won't Work With and -- less extensively and not quite so entertainingly -- Things I'm Glad I Don't Do. Some of it's a bit technical, but Lowe does a good job explaining things in a way someone with only basic knowledge of chemistry can understand.

And who am I to complain anyway? I try to write in such a way that a non-compugeek reader can substitute "peanut butter" for terms like "sliding window protocol" and still get the gist, but I can't promise success in that regard. At the very least, the non-chemist can substitute "exploding, highly-toxic and malodorous peanut butter" for most of the chemical terms and get the general drift.

Which, one must admit, does give the chemist a bit of a leg up. I doubt I'll ever get to grace a post here with turns of phrase like

... the resulting compounds range from the merely explosive ... to the very explosive indeed
Fragrance expert Luca Turin has described isonitriles as "the Godzilla of scent", and that's accurate, if you also try to imagine Godzilla's gym socks.
... water ice (explosion, natch), chlorine ("violent explosion", so he added it more slowly the second time), red phosphorus (not good) ...
A colleague of mine made some in graduate school, and came down the hall to us looking rather pale.
It reeks to a degree that makes people suspect evil supernatural forces.
... it’ll start roaring reactions with things like bricks and asbestos tile.
It is, of course, extremely toxic, but that's the least of the problem.
Read the paper and be glad that this wasn’t your PhD project.

On the other hand, none other than Gordon Moore (of Moore's law fame) got his start in the sciences blowing things up back in the days when a child's chemistry set had Real Chemicals in it. In their own way, wild-eyed-crazy chemistry experiments are just as much a part of the web's DNA as ~~cryptographically secure pseudo-random number generators~~ peanut butter.

(When I was typing the first sentence, I missed the 'e' in "being." Blogger's spell checker flagged it. Yep.)

Thursday, May 5, 2011

Really? I never mentioned Snopes?

Well that obviously needs fixed.

On the off chance that you haven't heard of it, snopes.com, more formally the Urban Legends Reference Pages, is the first place to go whenever someone forwards you a forward of a forward of ... a forward of an email containing some compelling factoid or tale.

All things considered, the signal/noise ratio of the web is surprisingly high. Some sites, like Wikipedia, improve that ratio by (in aggregate) adding useful information. Snopes does this as well, but also helps filter out the noise. Given that it's a two-person operation (Barbara and David Mikkelson, who met during the days of alt.folklore.urban), one could make a strong case that Snopes accounts for more signal/noise improvement per person than any other site, if "signal/noise improvement per person" weren't such a geekily silly measure that I'm not sure even I can use it with a straight face.

Crucially, Snopes does not set out specifically to debunk legends, though it may seem that way since only a small minority end up confirmed as true. Rather, it sets out simply to document the known facts, track down how the various legends and rumors have circulated and if possible where they may have started, calling police departments and local officials to actually ask if something happened, and generally doing the journalistic legwork that too often gets bypassed in pursuit of a good story.

The Mikkelsons manage to do all this in evenhanded good faith and with a well-pitched sense of humor. Think of it as MythBusters for the web, albeit without Jamie's epic mustache.

Postscript: It occurs to me that studying the proliferation of urban legends ought to be a potent vaccine against taking the notion of "the wisdom of crowds" too far.

Wednesday, May 4, 2011

What's the Garden City Telegram saying today?

Ran across this while browsing through the news: Newseum.org keeps images and readable PDFs of the front pages of hundreds of newspapers from dates it considers historically significant. For copyright reasons, only selected front pages are available, but when they are the selection is impressive.

Navigation leaves a bit to be desired. Links from a particular front page to next and previous from the same paper would be nice, for example. Nonetheless, it's a cool idea.

"When I use a word, it means just what I choose it to mean"

Another one in over the transom:

I have accounts at two different banks. It's usually not a problem, but sometimes I need to transfer money from bank A to bank B. I was happy to find out that bank B can do free transfers from other banks, after you've set things up. It takes three days or so to clear, but that's generally no big deal.

All you have to do, once it's set up, is go on their web site and select "Transfer from outside the bank". You select an account that you've already set up (bank A in my case). You fill in the amount. You select how you want the transfer done. Actually there's just one choice: three-day free transfer. At the bottom there's a button labeled "Complete Transfer".

So you click that button. Another page comes up confirming the details you've just put in. Great. You're done.

Not so fast.

Three days later, there's no money transferred. Contact customer service. No, we don't see any record of it yet. Please contact the bank where you initiated the transfer.

You're the bank where I initiated the transfer.

Sorry, don't see any sign of it.

Then you go back and try again. You fill in the form. You click on "Complete Transfer". You see the same page with the details you just put in ...

... and you think to scroll down to the bottom, which is cut off by your browser window. At the bottom is a button that says "Send Transfer".

Didn't I just complete the transfer? Guess not. Click "Send Transfer". The window goes gray and a spinny thing spins for a while. Then you get a different window and a confirmation number. Three days later the money is there.

Would it have killed anyone to have labelled the first button "Continue" instead of "Complete Transfer"?

Indeed.

[There's a happy ending, as announced in this post: The bank changed the text. Yay!]

Tuesday, April 26, 2011

"Thank you for your business"

The book is The Thank You Economy, by Gary Vaynerchuk. The thesis is that, thanks to social media, business is returning to its mom-and-pop roots, in that personal customer service is once again becoming important. I ran across the book listening to an interview with Vaynerchuk on NPR.

I'm of two minds about this:

Mind 1: Hmm ... it's all different now, is it? Is business, in fact, paying more attention to individual customers? Did it really stop? How would you measure this?

Anecdotal evidence: Today I took my car to the shop expecting a hefty amount of deferred maintenance because, well, it had been a while. Instead, they explained what it really needed, did that, offered to fix a couple of minor problems that had been bugging me for years, which I had them go ahead and do, and sent me on my way for a modest sum. These were the same folks who last year quickly and efficiently diagnosed and fixed a problem that the dealer I called had had no clue about, which is why I came back in the first place.

Are they on Facebook? No. Can I follow them on Twitter? No. Do they provide no-nonsense service at a reasonable price? Absolutely. Do they have all the business they can handle? Judging by the parking lot and the steady stream of customers, I'm guessing so. Are they run essentially the same way they would have been 50 years ago? Quite likely.

Mind 2: Well, I've got to be a fan of someone who titles the first chapter of his book "How Everything Has Changed, Except Human Nature", and anyone pushing for good old-fashioned customer service is OK in my book. Rather than focus on what historical trends might or might not have been, another take is that the modern web offers tools that let good businesspeople serve their customers better, even if those customers are across the country or the world. In that case, he's got a point, and probably a lot of useful experience and tips to share.

Mind, Vaynerchuk's own site makes the less modest claim that the "Thank you economy" is "the most important shift in culture businesses have seen," but then, he's got a book to sell.
