Tuesday, August 30, 2011

Considerate software

I first heard the motto "Considerate software remembers" a job or two ago from interaction designer  Carl Seglem, who credited it to Alan Cooper of About Face fame.  The phrase has stuck in my head ever since, so the other day I went searching for it and found this extract on codinghorror.com.

There's a lot to like about the very idea of considerate software.  If I'm using a piece of software, I want it to do something for me.  I'm going to be devoting a great deal of attention to it, asking it to do this or that and expecting responses to those requests.  Ideally, someone or something I'm working with that closely will treat me considerately, just as I should make every effort to treat a person I'm working with considerately.

More subtly, the metaphor of considerate software cuts the designers and implementors of the software completely out of the picture.  This is surely deliberate and completely appropriate.  Once software is deployed, the designers and implementors are out of the picture.  I can't come and ask them how to deal with some puzzling or frustrating bit of behavior (and lucky for them, sometimes).  As far as I'm concerned it's the software that's being helpful or annoying.

There are clearly limits on how considerate software could possibly be.  If I decide to type in a long treatise on considerate software into the "shipping address" field of some form, I wouldn't expect the app to respond "Why yes, that's very interesting.  I personally find Cooper's work exemplary.  Shall we continue this conversation over coffee?"  However, it doesn't seem too much to expect a politely phrased, helpful response pointing out that "I first heard the motto ..." followed by several paragraphs does not look like a valid street address.

I don't need to go into detail here about how far short much software falls in this regard.  I'm sure you've got your own examples.  Neither do I want to go into how and why software comes to be inconsiderate, though that's an interesting topic in itself.  Instead, I'd like to go into what qualities make software considerate or inconsiderate.

The list I referred to above hits a lot of interesting points, but it feels more like a list of this and that than a thorough taxonomy.  In particular, the headings, while snappy, don't always seem to match up well with what they head.

Some of the points fall under "Considerate software remembers":
  • "Considerate software takes an interest" is really just saying it shouldn't ask for the same information over and over.  That is, it should remember what you've already told it.
  • "Considerate software is perceptive" says that software should remember what we do.  It also says that it should adapt its behavior based on what it knows.  More on that shortly.
  • "Considerate software takes responsibility." says that software should remember where it is and be able to restore its state as closely as possible to where it had been before something derailed it.
Other points assert that software should know the kinds of things that we know and it can reasonably be expected to know:
  • "Considerate software uses common sense."  Common sense is not some magical filter that separates sensible behavior from senseless.  It's largely a body of knowledge, whether learned or instinctive.  To keep from, say, sending a check for $0, it needs to know that checks should only be sent for positive amounts.
  • "Considerate software anticipates needs."  To anticipate needs, a piece of software needs to know what those needs are.
  • "Considerate software knows when to bend the rules." is saying that it should know how (and when) to do more than just the narrow definition of its task.
  • "Considerate software is forthcoming." says primarily that software should actually tell us useful information that it knows, but to do that it may need to know information outside a narrow view of what it should be doing.
A third set has more to do with knowing when and when not to offer information:
  • "Considerate software keeps you informed/is forthcoming." Not only should it know useful things we didn't specifically ask it to know, it should let us know that and modify its behavior accordingly.  But ...
  • "Considerate software doesn't burden you with its personal problems/is self-confident/doesn't ask a lot of questions." It should limit itself to interactions useful to us, present information in ways that are easy for us to absorb and ask for information in ways that are easy for us to present.
A couple seem more about letting us exercise our judgment instead of trying to exercise it for us:
  • "Considerate software is deferential."  Software should not prohibit things that might be useful.  Instead it should make sure we know the consequences of a choice and then let us make it.  It occurs to me that the "undo" feature is particularly helpful here.
  • "Considerate software is conscientious." The principle here seems to be that software should know that some things are dangerous and not simply assume that we mean to do them.
Taking a stab at boiling this all down:
  • Considerate software knows as much as reasonably possible about its domain.
  • Considerate software remembers what's happened, what we've told it and what it's told us.
  • Considerate software modifies its behavior where appropriate based on the above.
  • Considerate software gives us ways to access what it knows (including the state of the world as it used to be).
  • Considerate software actively tells us important things we might not already know.
  • Considerate software communicates efficiently -- taking into account how human minds work.
These principles seem fairly universal, but it's worth noting that one of the first extensions to the original web protocols, and one that enabled major improvements in the experience of using the web, was the cookie -- a way of letting a web site remember things that have happened before and, ideally, act accordingly.

Saturday, August 27, 2011

Building a better password

[I've updated this post slightly to reflect the back-of-the-envelope calculation in this post suggesting that 100 bits of entropy is probably more reasonable than my original statement that 48 bits was "not bad".  Under the assumptions in that post, a 48-bit password would take on the order of microseconds to crack --D.H. Feb 2020]

I've recently complained about the irritating nature of the password strength checkers that have been popping up everywhere, so I feel obliged at least to try to analyze the problem and offer solutions.  This is leaving aside the question of whether password authentication is a useful approach at all.

Fundamentally, the real measure of password strength is how many passwords you'd expect to have to guess in order to get the right one.  A more formal version of this is the notion of bits of entropy.  If you had a list of all possible passwords in your scheme, I could identify any particular one so long as I could get answers to a series of yes/no questions, for example: "Is it in the first half of the list or the last?", "Is it in the first half of that half or the last?" and so forth.  The number of such questions I need is the number of bits of entropy.  Twenty questions means twenty bits, etc.
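
As a rough illustration of the yes/no-question idea, here's a minimal Python sketch.  The function name questions_needed and the numbered list of "passwords" are made up for the demo.  It just keeps halving the list until one candidate is left and counts the questions along the way.

```python
import math

def questions_needed(candidates, secret):
    """Count the yes/no questions needed to pin down `secret` by halving."""
    pool = sorted(candidates)
    questions = 0
    while len(pool) > 1:
        mid = len(pool) // 2
        first_half, second_half = pool[:mid], pool[mid:]
        # One question: "Is it in the first half of the list?"
        questions += 1
        pool = first_half if secret in first_half else second_half
    return questions

# A hypothetical scheme with 2**20 possible passwords, numbered 0 .. 2**20 - 1.
scheme = range(2**20)
print(questions_needed(scheme, 123456))  # twenty questions
print(math.log2(len(scheme)))            # 20.0 bits of entropy
```

For a list whose size is a power of two, the count comes out the same as the base-2 logarithm of the list size, which is the usual shortcut.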

If I know that your password is either "0" or "1", you have exactly one bit of entropy.  If I know it's an uppercase letter, lowercase letter, digit, "$" or "%", there are 64 possibilities, so you have 6 bits of entropy.  If I know it's two such characters, you have 12 bits, and if it's seventeen such characters you have 102 bits, which is not too bad.  Someone trying to guess your password would have to guess about two thousand billion billion billion passwords, on average, before stumbling on yours.  That may seem like a lot, but keep in mind that the current network of Bitcoin miners can try on the order of a hundred billion billion hashes -- roughly the same problem as guessing a password -- every second.
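
The arithmetic above is easy to check.  This is only a sketch, and it assumes every character is drawn uniformly at random from the 64-character alphabet; the function name random_password_bits is mine.

```python
import math
import string

def random_password_bits(alphabet_size, length):
    """Entropy in bits of `length` characters drawn uniformly at random."""
    return length * math.log2(alphabet_size)

# The 64-character alphabet from the text: letters, digits, "$" and "%".
ALPHABET = string.ascii_letters + string.digits + "$%"

for length in (1, 2, 17):
    bits = random_password_bits(len(ALPHABET), length)
    # On average an attacker finds the password after trying half the space.
    average_guesses = len(ALPHABET) ** length // 2
    print(f"{length:2d} chars: {bits:5.1f} bits, ~{average_guesses:.3g} guesses on average")
```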

[Don't assume that guessing a password requires typing it in to the same text box you have to use.  If someone steals the right data from your service provider, they can throw as much computing power as they've got at guessing the passwords.  Quite possibly they'll be happy enough just to try a few thousand weak passwords for each account, since that will crack depressingly many, but attacks like running through the OED with simple substitutions of letters for numbers are absolutely feasible as well, even on fairly ordinary hardware.]

This is assuming that you picked seventeen characters at random.  If I knew instead that your password was either "F1%ldN0t3$" or "sasssafras" (maybe I'd watched you read your password off a piece of paper with only those two words on it but couldn't quite see which you were typing), then you have only a single bit of entropy, even though both passwords are ten characters long and one has plenty of non-letters.

More realistically, if I knew you'd picked an uncommon English word and maybe changed some of the letters to numbers, you'd have somewhere around two dozen bits of entropy.  That's not nothing, but keeping in mind that each added bit doubles the number of passwords a cracker has to try, it's on the order of a hundred thousand billion billion times weaker than the 102-bit scheme above.

The fundamental flaw of password strength checkers is that they can only look at the password you gave them.  They have no idea what other possible passwords you might have chosen.  The assumption is that if you're forced to jump through enough hoops you'll be forced to expand your parameters, but in fact it's possible to generate passwords in a secure manner using only letters, or to generate them insecurely in a way that will still satisfy any strength checker out there.  Which is why I half-grimace, half-laugh when I see the "password strength indicator" jump from "poor" to "great" as soon as I type a number.
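
To make the complaint concrete, here's a deliberately naive checker of the kind I have in mind.  It's entirely made up (naive_strength is not any real site's code), but it shows how a rule that only counts character classes rewards a decorated dictionary word and punishes a long string of lowercase letters, whatever process produced them.

```python
import string

def naive_strength(password):
    """A made-up checker that only counts character classes and length."""
    classes = [
        any(c in string.ascii_lowercase for c in password),
        any(c in string.ascii_uppercase for c in password),
        any(c in string.digits for c in password),
        any(c in string.punctuation for c in password),
    ]
    if len(password) < 8:
        return "poor"
    return ["poor", "poor", "fair", "good", "great"][sum(classes)]

# A common word with predictable decoration sails through ...
print(naive_strength("Password1!"))
# ... while a long run of lowercase letters does not, even if it came
# from ten words picked at random.
print(naive_strength("wovecootslewwitsighifirecuffgalamink"))
```

The checker sees only the one string it was handed, so it has no way to tell a random choice from a predictable one.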

Now, it's perfectly possible to generate completely random 17-character passwords.  The problem is that something like "qcrQf1x2" or "u%js%hPQ" is a pain to try to memorize, so most people will fall back to picking a "hard" word and maybe altering it a bit.  However, as xkcd points out, it's possible to do a lot better by using random short words.

For example, here's a kind of clunky way of producing a random, memorable password:

BIG HONKING DISCLAIMER: This is just for demo purposes.  The second site I mention uses http, not https, so in theory anyone could be looking in on your session.  Even with https, the sites might be logging all your traffic and recording the results you come up with.  I personally seriously doubt they would, and it's hard to imagine they would be able to connect the dots and figure out what you were using the generated password for, but if you really want to be on solid ground, get the source, look it over, run it locally and use something like /dev/urandom or D&D dice to generate the random input (23d20 will give you close to 100 bits ... not that I would have any idea at all what "23d20" means).  There are also smartphone apps that do more or less the same thing, I believe.

[I last checked that the recipe below worked on 28 Feb 2020]

With that out of the way:
  • Go to this site and copy the random string you see there (e.g., 60990FFC250C).  If for some reason you don't like what you see, just reload.
  • Go to this site.
  • Type some short number and a space into the Challenge box and paste the random string from the first step in after it (e.g. 123 60990FFC250C)
  • Type anything at all into the Secret box (e.g., "secret").  This doesn't have to be hard to guess.  The real entropy is coming from the random string (alternatively, put any number you like, a space, and anything else into the "challenge" box and paste the random bytes into the "secret" box).
  • Press the Compute with SHA-1 button.  Again, the cryptographic details of how strong SHA-1 is don't matter here.  You're just converting a random number to short words.  A simple table lookup would do just as well.
In the Response box you will see six short words followed by some hexadecimal gibberish (in this case, WOVE COOT SLEW WIT SIGH I (FE2D 5F7B 22CD BC39)).  Each of those words represents just over 10 bits of entropy.  We'll need ten words in all, so repeat the procedure but this time just take the first four words (I got FIRE CUFF GALA MINK from A4B455FEFFE7BFAD).
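
If you'd rather run something locally, the "simple table lookup" version might look roughly like this.  It's a sketch under assumptions, not what either site actually does: words_from_bytes is my own name, the eight-word table is a toy (a real table would need a couple of thousand words to get ten-plus bits per word), and the random bytes come from Python's secrets module.

```python
import secrets

def words_from_bytes(data, wordlist):
    """Map random bytes to words by slicing the bits into table indexes."""
    size = len(wordlist)
    bits_per_word = size.bit_length() - 1
    if size < 2 or 2 ** bits_per_word != size:
        raise ValueError("wordlist length must be a power of two")
    number = int.from_bytes(data, "big")
    total_bits = len(data) * 8
    words = []
    # Consume bits_per_word bits at a time, most significant first;
    # leftover bits that don't fill a whole word are dropped.
    for shift in range(total_bits - bits_per_word, -1, -bits_per_word):
        index = (number >> shift) & (size - 1)
        words.append(wordlist[index])
    return words

# Toy 8-word table (3 bits per word), purely for demonstration.
TOY_WORDS = ["WOVE", "COOT", "SLEW", "WIT", "SIGH", "FIRE", "CUFF", "GALA"]
print(words_from_bytes(secrets.token_bytes(3), TOY_WORDS))  # 24 bits -> 8 words
```

With a table of 2**k words, each word carries exactly k bits, so the entropy of the phrase is just k times the number of words, provided the bytes really are random.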

You can play around with this formula to get words that are easier to memorize, or type, or are just more to your liking.  If you reorder your words or try typing in several different things instead of 123 or secret and then picking what you like, you're decreasing your entropy, depending on what criteria you're using to filter out passphrases you don't like and whether your attacker knows what kinds of phrases you like.  If you just try a few different secrets until you see something that seems memorable, that should be fine.  If you do something like sort the words (and your attacker knows only to try sorted lists of words), you've lost almost 22 bits of entropy, which wouldn't be good.
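
The "almost 22 bits" figure for sorting comes from counting orderings: ten distinct words can be arranged in 10! ways, and sorting collapses all of them into one.  A quick check, assuming the ten words are all different (the function name is mine):

```python
import math

def bits_lost_by_sorting(word_count):
    """Bits given up by fixing the order of `word_count` distinct words."""
    return math.log2(math.factorial(word_count))

print(round(bits_lost_by_sorting(10), 2))
```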

Once you've selected your words, add a random punctuation character, number, capital letter or whatever makes your site's password strength checker happy.  Voila!  Your password is now Wovecootslewwitsighifirecuffgalamink5? or whatever.  This isn't great to have to type, but it's pretty secure as passwords go, and probably better than trying to remember something like C;cTbfThoO4ePFTt or 67EE386A205C4563DB8908A6C4.

If your site's password checker imposes an 8-character limit (and, incredibly enough, some do), cry.

Oh right ... I write a blog, don't I?

A couple of housekeeping items, before I attempt to get back to real blogging:
  • No, I haven't fallen off the face of the Earth, been trapped under a large object or wandered off to Nepal to contemplate the mysteries of the universe.  Just busy, and decided to devote what little blogging bandwidth I've had lately to contemplating the nature of awareness on the other blog.  Hmm ... maybe Nepal wasn't so far off.
  • A couple of logins ago, AdSense advised me that I appeared to have a "popular blog" and I should consider advertising on it.  I'm always glad to know that people are reading Field Notes, but I suspect that AdSense and I have somewhat different notions of "popular".  As much as I would like to bump my employer's revenue stream up by another 0.0000000000000001% or so, I have no plans to do that at the moment or any time soon.  I'm not against running ads per se, but I don't see the point of cluttering up the layout for what I doubt would be any significant gain.  If you ever do start seeing ads here, it will be because there has been a dramatic surge in demand for occasionally-posted web.musings, in which case why not?
  • Prompted by a couple of recent comments, including a couple of completely appropriate ones,  I've settled on a definition of spam comments:  If it's completely independent of the post it's supposedly commenting on, it's spam and will be summarily removed. Mentioning your favorite business as part of a thoughtful response to a post on customer service is just fine.  Mentioning your website, commercial or otherwise, with nothing more than a generic "Hey, great blog!" comment is spam.
  • Mind, I reserve the right to delete any comment for any reason or no reason (hey, it's my blog).  But as a practical matter I'd only expect to do so in cases of spam or incivility, should it occur.  As part of recusing myself from matters Google (and yet still trying to write about the web), I would also remove any speculation about what Google might be up to, be it public information or not, accurate or otherwise.  I don't expect that to be a problem, but thought I'd mention it.
And ... we're back!