Thursday, January 12, 2017

Identity redux

Today I spent an embarrassing amount of time trying to figure out why I couldn't use SSH with my new GitHub account, before figuring out that I needed to log in as and not  Evidently I'm not the first person to stub a toe on this, but it got me thinking about one of the earliest topics on this blog: identity.

A natural way to think about identities and logging in is that your username is your identity and there are various ways of authenticating that identity, for example
  • a password
  • a password and a second factor such as a magic number sent via SMS or generated by a smartcard
  • a public/private key pair
  • (in some SSH contexts) hostname or IP address and public/private key pair
Others are possible, of course.  What the GitHub experience made clear to me is that the "username" part is secondary, at least as far as SSH is concerned.  The important part of authenticating SSH is the key.

As far as I can tell, GitHub is taking the public key offered during the SSH handshake and looking it up to get the account, and thus the account name.  That's probably also why when you try to upload a key you've already uploaded (e.g., to check that you haven't taken complete leave of your senses while trying to figure out why you can't log in), the error message is "already in use".  It doesn't say by whom, even when it's you.  The rule is one account per key (but potentially multiple keys per account).
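
To make that guess concrete, here is a minimal sketch of a key-first lookup. It is not GitHub's actual implementation; the class name, the fingerprint strings and the account names are all made up for illustration. The only things taken from the behavior described above are the rule (one account per key, several keys per account) and the error message that doesn't name the owner.

```python
class KeyAlreadyInUse(Exception):
    """Raised when a public key is already registered to some account."""


class KeyRegistry:
    """Hypothetical registry: each key maps to exactly one account."""

    def __init__(self):
        self._account_by_key = {}  # key fingerprint -> account name

    def add_key(self, account, fingerprint):
        if fingerprint in self._account_by_key:
            # Deliberately does not say which account holds the key.
            raise KeyAlreadyInUse("key is already in use")
        self._account_by_key[fingerprint] = account

    def account_for(self, fingerprint):
        """Look up the account from the key alone; no username needed."""
        return self._account_by_key.get(fingerprint)

    def keys_for(self, account):
        """An account may have any number of keys."""
        return sorted(k for k, a in self._account_by_key.items() if a == account)


registry = KeyRegistry()
registry.add_key("alice", "SHA256:aaaa")  # made-up fingerprints
registry.add_key("alice", "SHA256:bbbb")
```

Note that the lookup never takes a username as input. The key alone determines the account, which is why the name given on the SSH command line can be the same for everyone.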

This suggests a different approach to identity.  As far as the web is concerned a key, and in general an authentication method, is an identity.  This is more or less the case with Bitcoin wallets, and to some extent for PGP and other email privacy schemes, but even then for the most part we talk about using keys to establish an identity.

Let's run through some data modeling to see how this all fits:
  • People, identities and resources to be accessed (such as accounts) are three different things.
  • A person can have multiple identities
  • Multiple people can use the same identity, though that's often not a good thing
  • A resource can be accessed by multiple identities
  • In general, though not in the case of GitHub accounts, an identity can access multiple resources
There are two reasons I find this key-is-identity model attractive.  One is that your web server doesn't see you, it sees the credentials you present.  It really only sees the key, or at least only ought to look at it when verifying identity. Yes, it may also know things like which IP address someone is connecting from, but even though that information can sometimes be a useful hint that something's not right, it's ephemeral, not part of the identity. 

The other, maybe just the first with a different emphasis, is that it loosens the connection between resources and people.  It might be nice to think that Gavin Belson logged in to your server with username and the proper password, but it's better to think that someone logged in with those credentials.  You know that logged in.  You don't know that that was Gavin Belson (I'm looking at you, Gilfoyle).  The identity that matters here is, not Gavin Belson.

Except that is associated with a password, which can change without changing what we mean by the username (or, one would hope, it's associated with a password and a second factor, such as a phone or smartcard).  Are we really going to say that if Gavin changes his password, we're dealing with a different identity?

Let's try "yes".  The whole point of changing your password is that anyone who knew your old password won't know your new one.  We presume that, at least at first, only Gavin knows the new password.  From the point of view of the system Gavin is logging in to, (, old password) is indeed a different identity from (, new password) because there are potentially different sets of people who could be each.

What if Gavin uses his phone as a second factor?  There are a number of ways to do that, so suppose that the server sends him a text with a magic number when he tries to log in and expects to get that number back as part of the login process.  That provides a reasonable assurance that whoever's logging in has both Gavin's password and his phone (assuming the text isn't intercepted).  If Gavin does have his phone, it also informs him that someone, hopefully it's actually him, is trying to log in.

Suppose Gavin switches phone numbers but keeps his password the same.  Should we consider that a new identity?  I think the same logic still applies.  If Gavin's password has already been compromised and he changes his phone number, then someone might manage to grab the old phone number, and so forth.  In any case, the set of people with access to the old phone number is potentially different from the set of people with access to the new one, so different identity.

If you're carefully tracking who did what to a resource, you need to track the authentication.  A different means of authentication, even for the same user name, means potentially different people.
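
One way to picture this is an audit log keyed on the username together with the credential in force at the time, rather than on the username alone. This is only a sketch under that assumption; `Identity`, `credential_id`, the user name and the logged actions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    username: str
    credential_id: str  # hypothetical label for one specific password or phone


audit_log = []  # list of (Identity, action) pairs


def record(identity, action):
    audit_log.append((identity, action))


def actions_by(identity):
    return [action for who, action in audit_log if who == identity]


# Same username before and after a password change: two identities.
before = Identity("example_user", "password-v1")
after = Identity("example_user", "password-v2")

record(before, "read report")
record(after, "deleted report")
```

Because the two values compare unequal, anything done under the old password stays attributable to whoever could have known the old password, and likewise for the new one.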

One logical conclusion of this is that username is not identity.  So what is it?

It's a name.  Seems plausible, at least.

Names are yet another concept, distinct from person, identity or resource (personae are yet again distinct, but this is getting complicated enough as it is).  For example, sure looks like an identity, but when you send email, you're really just sending it to an address which is connected to an inbox (which is a resource).

There may be more than one address connected to a given inbox.  Likewise, the name I use to log in to access that inbox may or may not be an email address (for example, my ISP provides me with an email address I never use, but if I want to see mail for that address I log in with a username that's different from the email address).  The same goes for however you logged in to send me mail.  If we're using secure mail then, regardless of everything just mentioned, you'll encrypt the mail with your recipient's public key and sign it with your private key.  The keys are the real identity, because we trust them.

I'm comfortable with this, too.  I've come to think that one of the most fertile sources of bugs is confusing names with identities (see this post for a bit more on names).  Names are convenient, but ideally they're only used to look up what you're really interested in.  I personally prefer systems in which renaming things is cheap, if only because I generally come to hate the names I come up with to start with, no matter how much careful thought I gave them.

The way you generally do this is to assign a unique id -- typically a hundred bits or so of random gibberish -- to each resource that can be named, and then maintain a map from name to id.  When you access a resource, say an account, by name, you look up the name in that map, stash the id somewhere and use the id to access the object.  If the name-to-id map changes, you still have the id and you can still find the same object.  The system can maintain as many names for a given object as it sees fit, but each name corresponds to only one object (at least at any given time).
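
The scheme just described fits in a few lines. This is a minimal in-memory sketch, not any particular system's design; the `Directory` class and its method names are invented for the example, and 13 random bytes (104 bits) stands in for "a hundred bits or so".

```python
import secrets


class Directory:
    """Hypothetical name service: names map to ids, ids map to objects."""

    def __init__(self):
        self._id_by_name = {}
        self._object_by_id = {}

    def create(self, name, obj):
        if name in self._id_by_name:
            raise ValueError(f"name already taken: {name}")
        object_id = secrets.token_hex(13)  # 104 random bits
        self._id_by_name[name] = object_id
        self._object_by_id[object_id] = obj
        return object_id

    def resolve(self, name):
        """Name -> id.  Callers stash the id and use it from then on."""
        return self._id_by_name[name]

    def get(self, object_id):
        return self._object_by_id[object_id]

    def add_alias(self, existing_name, alias):
        """Several names may point at one object."""
        if alias in self._id_by_name:
            raise ValueError(f"name already taken: {alias}")
        self._id_by_name[alias] = self._id_by_name[existing_name]

    def rename(self, old_name, new_name):
        """Cheap rename: only the name-to-id map changes."""
        if new_name in self._id_by_name:
            raise ValueError(f"name already taken: {new_name}")
        self._id_by_name[new_name] = self._id_by_name.pop(old_name)


directory = Directory()
account_id = directory.create("first-draft-name", {"owner": "someone"})
directory.rename("first-draft-name", "better-name")
```

After the rename, `account_id` still finds the same object, and the old name simply no longer resolves. Nothing that held on to the id had to be updated.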

Summing up
  • Web servers don't see people directly.  They see the credentials that people (and other servers, for that matter) present.
  • The credentials are distinct from who (or what) uses them
  • The names we use to refer to resources are distinct from the credentials that can be used to access them.
I think public key systems line up well with the key-is-identity model because the public key is the single identifying item.  In a password scheme, whether you consider (username, password) or just username to be the identity, you are giving two pieces of information, both of which are durable, but which must generally be kept separate because one of them is meant to be secret.  The password isn't truly private.  The host you're logging into has to know it [technically, it only has to store a secure hash of the password together with a bit of randomly-generated "salt", but from a security point of view that's only a bit better since it still has to see the password during the login in order to do the hash and comparison, and password files can be stolen and attacked brute-force offline --D.H. Feb 2017].
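
For the bracketed point about salted hashes, here is a small sketch using Python's standard `hashlib.pbkdf2_hmac`. The function names, the iteration count and the sample passwords are my own choices for illustration, not a recommendation for any particular system.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor, chosen arbitrarily


def make_record(password):
    """What the host stores: a random salt and the derived hash."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def check(password, stored_record):
    """The host still has to see the plaintext password to do this."""
    salt, expected = stored_record
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(candidate, expected)


stored = make_record("correct horse")
```

The stored record never contains the password itself, but `check` takes the plaintext as an argument, which is exactly the weakness noted above: the host sees the secret on every login, and a stolen record can be guessed against offline.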

In a public key scheme, there is still a public part and a private part, but you present only the public part.  The private key remains truly private.  It's generally stored encrypted, guarded by a passphrase that you only use locally.  If you change the passphrase, no one else needs to know.  Authenticating means exchanging ephemeral information, much of it randomly generated, that will never be used again.  All of this makes it much easier to keep the private key secure for long periods of time, so the public key can serve as a durable identity.  Since it's the only durable thing that other parties see, it's sufficient to serve as an identity by itself.

There's a long way to go yet, but it seems likely that the world will gradually shift to key-as-identity, or something at least as strong.