Sunday, August 27, 2017

Give us your money or we'll pirate your shows a few days early

Recently HBO suffered a data breach by parties who then tried to extort money using the threat of publicly releasing, among other things, Game of Thrones episodes and internal emails.  HBO, despite having initially offered a much lower sum than demanded, reportedly in a bid to buy time, ultimately did not pay the extortionists.  The Grauniad* has what looks like a pretty good summary on this one.

One thing that jumps out of this is that HBO was not particularly concerned with having episodes of its flagship show leak early.  There are probably several reasons for this.  HBO subscribers aren't paying per view, but monthly for the service as a whole.  If someone were able to repeatedly steal HBO productions and escape prosecution, that would likely be a problem.  Leaks of a few select episodes probably not.

Even if you can somehow make repeatedly stealing HBO content work, you're basically competing with HBO and the cable channels at distributing HBO content.  That's not a game I'd personally want to get into. Your mileage may vary, but bear in mind that people are already pirating HBO content after it airs.  It's not clear how taking the extra risk to steal from HBO directly is providing that much of a competitive advantage.

More broadly, this all pushes back against the idea that, to make money selling content in a world where content can easily be copied, you need to provide something "live", like breaking news, live sporting or musical events, interactive games and such.

Of course, people can and do make money this way, but clearly that's not the only way.  HBO and many other content providers have done well with more traditional productions.  People seem happy to pay a modest monthly fee in order to see comedy, drama, documentaries and whatever other genres.

In principle there's a free-rider problem here in that people can get the same content, albeit generally illegally, without paying.  In practice, the problem appears tolerable.  HBO's refusal to pay a ransom to prevent GOT episodes from leaking underscores this.  People are apparently content to pay for the brand rather than the ability to access any particular bits at any particular time.

*I tend to use the Private Eye names for the major British newspapers, particularly the Grauniad and Torygraph, because, well, sorta funny, but also fairly apt.  The Telegraph is well known for its Tory leanings and the Guardian, however well it's built its brand as an international news outlet, is still prone to the sort of typo that led to the nickname in the first place.  But on the other hand, if your instructions as editor are to "carry on as heretofore," I suppose that has to include the tyops.  Sorry, typos.

Friday, August 25, 2017

On to the next milestone

It looks like I ended up adding a few more posts to the original five (four real posts plus the birthday post).  Counting this one, that'll make ten in all (but only eight real posts).

That seems like enough for now.  I'll probably come back later and edit for typos and stylistic blunders, and maybe add some missing links, but I make no promise as to what will appear here for the next while.  As usual, I might post again tomorrow, or not for months.  I probably will post again at some point, but if not, the 600+ existing posts aren't going anywhere.

It does seem like someone (or someone's web crawler, at least) has been reading, and that's cool.  If you've read and enjoyed, so much the better.


Bitcoin: Yeah, dunno about Venezuela either

Not long after posting the last post on Bitcoin I saw some headlines about Venezuelans using bitcoin instead of the Bolivar fuerte (fuerte meaning "strong") since the Bolivar itself is currently in hyperinflation.  This makes some sense, in that if the inflation rate for a currency is around 700-800%, Bitcoin's fluctuations against the dollar seem like less of a problem.  Bitcoin wallets and exchanges also provide a way to store value independent of the nation's banking system.

On the other hand, Bitcoin is not the only solution to this problem.  Besides buying and selling a reserve currency on the black market, people have historically come up with all kinds of solutions to currency shocks, including IOUs, home-grown alternative currencies, commodities and good old-fashioned barter.

I'm not saying any of these is a good solution.  In a situation such as this one there may not be any good solutions.  The point is that Bitcoin is not the only game in town.

There are also practical issues.  If the problem with reserve currencies is that trading in them is illegal, then the only legal way to buy Bitcoin is with Bolivares at the official rate, which exposes you to the same hyperinflation you're trying to get away from.  If you're willing to trade on the black market, it's not clear why you need Bitcoin.  And for that matter, the Venezuelan government can always make trading Bitcoin itself illegal.

If, somehow, Venezuela switched entirely to Bitcoin, that would currently mean around 800 billion dollars worth of Bolivares chasing around 70 billion dollars worth of Bitcoin, but that seems like a big if.  For that kind of money, one could build one's own cryptocurrency.

But I'm not an economist.  All of the above seems plausible to me, but I've been wrong before.  So, once again ... ¯\_(ツ)_/¯.

(And once again, I don't have any position in Bitcoin one way or the other)

I'm on a party line

There have been headlines lately about new FCC regulations allowing internet service providers to sell information about what sites you visit.  From the summary I read in The Verge, which looks well put together and overall plausible, the situation is a bit more complicated than that, but certainly ISPs have access to quite a bit of information about what sites a particular IP address under their management connects to, and they have to have access to that information in order to provide good service.

I'm not going to offer an opinion here on whether this is good, bad, indifferent or some combination.  Instead I wanted to take a look at privacy in general.

If you live in a house with separate rooms with doors that close and may even lock, it's easy to think of having a room of one's own as the natural state of things, but that's not universally the case.  There are plenty of examples of people sharing space, whether in a one-room house or a portable structure such as a tent, yurt or tipi.  Or think of an un-air-conditioned apartment block in summer.  If everyone's window is open onto the same courtyard, privacy is going to be a bit limited.  Enhanced privacy isn't the most obvious benefit of air conditioning, but it would certainly appear to be one.

Even if doors and windows can close, living in a small community, particularly one that has to be fairly self-sufficient, means getting to know more than one might care to about one's neighbors, and having them know details about one's own life.  Arguably this is actually the normal state of things.  Urbanization is a fairly recent phenomenon in human history.

Again, not saying any of this is good or bad, just that privacy is not necessarily something that we once had, but lost once technology came along.

For that matter, and back at the title, in the earlier days of telephony, many customers had a party line arrangement, meaning that a number of households shared the same physical phone line.  This meant that if someone else was making a call and you picked up your phone, you would hear them talking, at which point you might hang up and try again later, or perhaps ask them if they would be done soon ... or just listen in for a while.

Even placing a call meant, at least in some cases, calling an operator and telling them whom you wanted to call, so they could patch the call through -- literally using a patch cord.  That process was eventually automated, but the phone company still needed to keep records, at least of long-distance calls, in order to bill for them.  Those records could be subpoenaed in the course of criminal investigations and in any case were available to at least some company employees.

People seemed largely OK with all this, perhaps because the convenience of the telephone outweighed the lack of privacy, perhaps because people figured out ways of minimizing the intrusion (some interesting game theory/economics there), and probably for other reasons.

We're also social animals.  To some extent we want to share things about ourselves and have others share with us.  It's not clear to me whether social media have amplified this kind of behavior so much as reflected it.

What seems different about modern technological privacy is that the people with access to one's private information are strangers with their own incentives and plans.  In a small, tight-knit community information flows both ways.  "Everybody knows everything about everybody."  With a 20th-century phone company or a 21st-century ISP this isn't the case, and generally the entity in question is in business to make money.

One can argue that such businesses have a strong incentive to respect their customers' privacy on the grounds that failing to respect it would be bad for business, but that doesn't always seem particularly comforting.  On the other hand, the basic issues are clearly older than the internet, so at least we've had some time to work them out.  I could have added 19th-century telegraph companies or maybe even 18th-century messenger services to the paragraph above.

I think the problem decreases as you go back in time, since communicating via commercial services run by strangers becomes less pervasive, but the telephone was a pretty integral part of 20th-century life, particularly in the second half.  It's not clear to me how much more integral the net is.  I'm sure it is to some extent, but not how much.

I honestly don't know what to conclude from all that, but I did at least want to offer the perspective that, as in other cases, the internet doesn't necessarily change everything.  Some things, almost certainly, but the real fun lies in figuring out exactly what.

Thursday, August 24, 2017

Unplugging ... or not

Years ago a friend told me of a mutual friend who had taken a hiking trip out in the mountains somewhere.  "Yeah, they decided to take a cell phone," my friend said.

My immediate reaction was "What's the point? I thought the whole point of going out in the boonies was to get away from phones and such."  My friend explained that the phone was for emergencies, and 911 did work where they were (there was apparently a tower nearby).  I don't think they actually ended up using the cell phone.

As I write this I'm up in the mountains, though still more or less in civilization (different mountains, as it happens).  There is intermittent cell reception ... and wifi throughout the place.  The wifi is also a bit spotty, but not because of reception.  I have a nice clear connection to the wifi, but so does everyone else, and there are a lot of technophiles around.  Nonetheless it seems to be enough to get messages through to the outside world.  And to blog.

There are still significant parts of the world, even the more or less industrialized world, without internet or cell access, but that territory is shrinking.  Cell phone carriers would prefer to concentrate their resources where people are (I've heard that "we don't cover the cows" or something like that has been a motto, but Google doesn't seem to back me up on that).  This means that most people will be near coverage, but there's also a knock-on effect.

People get used to coverage, so they really notice when it's not there.  If your fun adventure in the backcountry is marred by not being able to call home in the evening, you may well report "poor coverage by my carrier" to your friends.  No one wants to be that carrier, so there's an incentive to build out coverage even in less profitable areas, an incentive that wasn't there in the early days.  There are still plenty of places where you wouldn't reasonably expect to see coverage, but I wouldn't be surprised if this effect has brought coverage to places that wouldn't have it based on a purely local economic analysis.

Having a mobile phone has long since gone from something that can be handy to something that has influenced our habits thoroughly enough to change our expectations.  For many of us, unplugging by traveling out of reach of the web is no longer an easy option.  If you visit your relatives' cabin at the lake, you probably still have bars.  That mountain retreat has connectivity because enough customers wanted it.  The only real way to unplug is to ... well ... not use the web for a while.  Which, come to think of it, shouldn't really have to be a special occasion.

While writing this, I looked up cell coverage in Alaska.  It looks like, not surprisingly, most of the physical area of the state is uncovered, but almost all of the population is covered.  It would be interesting to know more about the swaths in the interior that are covered.  I'd guess transportation is involved, just as interstates in the lower 48 tend to have towers at fairly regular intervals, even in unpopulated areas.

Wednesday, August 23, 2017

Bitcoin: Yeah, I dunno

I've been pretty skeptical about Bitcoin in the past, particularly about Bitcoin as a currency.  My thoughts on the currency part haven't changed meaningfully, but my hunch on Bitcoin as a speculative vehicle -- that it was in a bubble that ought to burst any time now -- is, well,
  • kinda confirmed by the way the price has been acting.  It seems to be doing what it was doing around 2013-2014, but with much bigger numbers
  • kinda disconfirmed by the way it's not bursting, and didn't completely burst last time
But then, such is the way of bubbles.  You can know for sure you were in a bubble, but not so much that you are in one.  So ... ¯\_(ツ)_/¯

I should point out that I have no money in Bitcoin and no speculative position one way or another.  Just watchin' the show.

Tags and finding things

Putting together these recent posts, and posts on the other web, I notice I'm much more casual about tagging.  I can't bring myself to stop altogether.  A post without tags seems somehow incomplete.  But every time I add a tag I find myself asking "Why am I doing this?"

For years and years it's been possible to add "" to a search and find whatever you want on this blog (or likewise any other), whether I've tagged it or not.  The difference, if any, is more a matter of curation.

Donald Knuth, in putting together The Art of Computer Programming, made a great effort to put together a complete index, partly out of frustration with the textbooks he'd had to read as an undergrad.  To him, this wasn't just a matter of searching for all occurrences of a given term (which was possible since the text of TAOCP was in digital form), or dumping out a concordance of terms by page.  Context mattered.  The index entry for C. A. R. Hoare might include pages mentioning quicksort, even if Hoare's name doesn't appear on those pages, for example.

I think tags on a blog fill a similar purpose.  If you click on the link for a tag, you'd expect to see posts on that particular topic, regardless of the exact words.  The link for annoyances on this blog includes several annoying things, whether or not I happened to include the word annoy or its forms in the posts.  Machines are getting better at this sort of inference, but they're not great yet.
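
To make the difference between searching and tagging concrete, here's a minimal sketch in Python.  The posts, tags and function names are all made up for illustration (none of this is how Blogger or any search engine actually works); the point is only that a text search matches words while a tag lookup matches whatever a human decided the post was about.

```python
# Hypothetical posts, invented purely for illustration.
POSTS = [
    {"title": "Password rules",
     "text": "Eight characters, one digit, one symbol. Why?",
     "tags": {"annoyances", "security"}},
    {"title": "Popup ads",
     "text": "These really annoy me.",
     "tags": {"annoyances"}},
    {"title": "Eclipse traffic",
     "text": "The drive home took hours.",
     "tags": {"eclipse"}},
]


def search(posts, word):
    """Plain text search: titles of posts whose text contains the word."""
    word = word.lower()
    return [p["title"] for p in posts if word in p["text"].lower()]


def tagged(posts, tag):
    """Curated lookup: titles of posts a human filed under the tag."""
    return [p["title"] for p in posts if tag in p["tags"]]
```

Here `search(POSTS, "annoy")` finds only "Popup ads", while `tagged(POSTS, "annoyances")` also turns up "Password rules", which never uses the word.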

I think that's a good theory, anyway, and I think human curation is still useful.  On the other hand, I don't really have time to post on this blog, much less read through it and fix up tags.  I've done some re-reading, but I've only really been through a couple hundred posts, and then only fixing typos and adding the occasional note or update.  So what you get here is hit or miss.  Not so much a careful taxonomy as a record of whatever I happened to be thinking at the time.

If I had time, I would probably trim the set of tags down significantly, particularly getting rid of tags that are completely redundant with search results, and probably consolidating a few similar tags down to one canonical choice.  But not today, and not any time soon.  If the tags as they stand make for more interesting browsing, great.

(By the way, I'm not particularly proud that annoyances is currently the most populated tag on this blog)

Now can we stop the password madness?

I've ranted about this plenty of times, and now it seems like the world has come around to my point of view.

Um yeah, right.

I think pretty much anyone who's had to deal with restrictions like "This password must be eight characters long, contain at least one number, one uppercase letter, one lowercase letter, one special character and the characters Pa$$w0rd in order" has recoiled in disgust.  So maybe it wasn't my vast influence.

In any case, headlines are now circulating that the person who promulgated those rules (one Bill Burr of NIST) has said "Sorry, it was all a horrible mistake."  So the person responsible has fessed up and the annoying rules should be history in, oh, let's say ten or twenty years.

As usual, I think the real story is a bit more nuanced, as they say, but it looks like the Naked Security blog at Sophos has already done a better piece on it than I will.  Basically, the advice in the original guidelines in 2003 wasn't bad at the time and it's not Bill Burr's fault that people cargo-culted it into the annoying mess we see today.

Now if we can just get rid of "security questions" ...

The Great American Eclipse (and a bit about the web).

Most of this post is probably better suited to the other blog, but Field Notes has been a bit quiet for the past, um, years, so why not?

You may be aware that there was recently a total eclipse of the sun in the United States.   If you've never seen a total eclipse, I highly recommend it.   If you happen to be near the path of totality when one occurs in your area, don't be lulled into thinking that seeing a 95% eclipse or whatever is 95% as good as seeing a total eclipse.  The difference is, literally, night and day.  During totality, the sun is not so much covered as replaced by a black hole surrounded by the corona, about as bright as the full moon.  And then, before you know it, there's an impossibly bright spot at the very edge and you have to look away.  Seconds later the light is already a thousand times brighter and it feels like day again -- a weirdly dim, clearly-lit day, to be sure, but definitely day.  Miss totality and that last part, the strange light, is about all you get.  So if you get the chance ...

There are two main concerns in getting a good view of an eclipse: traffic and weather.  Some people are able to bypass traffic by booking a train, or flying into the zone, or even being in the air during totality, but most of us will end up taking to the road.  If enough people decide to do this, things can get hairy.  A rule of thumb in traffic engineering is that one lane of highway can handle about 2000 cars an hour.  If your main route into the zone is a four-lane highway, that is, two lanes each way, that's about 4000 cars an hour.  If it's an hour before totality and you have 5000 cars between you and the zone, your chances are not looking good.
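
The back-of-the-envelope arithmetic above can be sketched in a few lines of Python.  This is a rough sketch, not a traffic model: the function names are mine, it assumes steady flow at the 2000-cars-per-lane-per-hour rule of thumb, and it ignores the time it takes you to actually drive the distance.

```python
LANE_CAPACITY_PER_HOUR = 2000  # rule of thumb quoted in the text


def hours_to_clear(cars_ahead, lanes_inbound,
                   lane_capacity=LANE_CAPACITY_PER_HOUR):
    """Hours for the cars ahead of you to get through, at steady flow."""
    return cars_ahead / (lanes_inbound * lane_capacity)


def makes_it(cars_ahead, lanes_inbound, hours_left):
    """True if the queue ahead can clear before totality (assumption:
    your own driving time is ignored)."""
    return hours_to_clear(cars_ahead, lanes_inbound) <= hours_left
```

With 5000 cars ahead and two inbound lanes, `hours_to_clear(5000, 2)` comes to 1.25 hours, so `makes_it(5000, 2, 1.0)` is false, which is the "not looking good" case.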

It would be nice to have some idea of what to expect, something like "If you live here, you should try to go here (assuming good weather).  Leave at this time and expect the trip to take this long."  But the problem is, nobody really had a good idea how many people would be trying to go where, when.  I'm not a total umbraphile, but I'm sure I paid a lot more attention to this eclipse than most of the population.  My personal attitude of "Yeah, gotta try to see that" was probably not typical.  Typical attitudes were probably more like  "Sounds kinda cool ... but I've got work on Monday.  Maybe I should take a look on my lunch break."  So not everyone is going to hop on the freeway or book a hotel months in advance.

Furthermore, a certain number of people who were thinking about it will hear reports of possible gridlock and think better of it, or try to find an alternate route, or whatever.  In this internet-connected age people will be telling each other where they are and how conditions are, and watching real-time traffic, or at least trying to.

This sort of uncertainty makes projections a bit difficult, and there's not really any relevant historical data.  The last time an eclipse went all the way across the continental US, in 1918, there was no interstate system, much less an internet (though telegraphs were very much in use).  Even in 1979, the last time a total eclipse was visible in the contiguous US, the picture was still considerably different from today, if only because there were only 70% as many people in the country.

None of this stopped people from trying to project.  One such effort calculated the "drivesheds", analogous to watersheds, to show which locations on the centerline of the eclipse were closest, by road, to the highest number of people.  The top three were Santee, SC, where I-95 meets the centerline, Idaho Falls, ID, where I-15 meets the centerline, and Sabetha, KS, where US 75 meets the centerline.

The first sounds pretty likely.  I-95 gets a lot of traffic to begin with, and it runs from the Canadian border in Maine through Boston, New York, Philadelphia and Washington, DC on its way to Jacksonville and Miami.  The second seems plausible.  It's the closest point for Phoenix, Salt Lake City and, perhaps surprisingly, San Diego and from the looks of it most of LA.

The third ... well, it seems to assume that anyone in Texas wanting to see the eclipse is going to head up I-35 toward the Kansas City metro, a chunk of which is directly in the path, then veer off onto two-lane local highways to get to the (technically) closest point on the centerline.

I don't think traffic works that way.

In the event, traffic was bad in a few places in the days leading up to the eclipse, but not too bad.  It looks like people trickled in from here and there over a period of days, and relatively few people headed into the zone of totality on the day, or even at all.  There are 25 million people in the driveshed for Salem, Oregon.  Officials in Oregon planned on one million.  It's not clear that there were even that many.

Coming home, on the other hand, was a different matter.  There was no particular schedule for getting to the zone of totality, but the show ended at a very precise time, and suddenly there was no reason to stick around.  Hours-long delays were common across the country.  This seems obvious in hindsight, but in the run-up to the event the main concern was "will I be able to make it in time?" Most people were probably not really concerned about getting home at any particular time.

As far as I can tell, Sabetha, Kansas saw the same thing in miniature [Toward the bottom of the page it mentions that Sabetha's Sixth Street Park had a crowd of ... about 200 people --D.H.].

So much for traffic.  The eternal concern for eclipse chasers everywhere, whether on the I-95 corridor or Mongolia, is the weather.  Personally, I've seen two total eclipses (this one and the Great European Eclipse of 1999), and in both cases sheer dumb luck brought a break in the clouds in time to see the corona and the diamond ring.

I followed weather forecasts closely in the days leading up to the eclipse.  Unlike 18 years ago, there were a number easily available online.  Weather forecasting has advanced significantly.  Live radar is available on any number of sites, as are recent satellite images.  If there's a hole in the cloud cover, it doesn't seem like it should be too hard to find.

If you've got an internet connection.

I gave up trying to get online and our party ended up just picking a spot.  My guess is that if we'd had to make an emergency call it would have had priority and it would have been connected.  Data, not so much.  It's almost as though there were more people in town than usual and they were all trying to get on the web at the same time.

We thought about trying to move toward what looked like a clearer spot, but in the time it took to try to figure it out the clouds shifted and a beautiful blue gap opened up with about 30 minutes to go.  Then closed again as the encroaching moon shut the light down bit by bit.  Then the sky abruptly darkened more.  The eclipsed sun was up there somewhere.  The clouds above were dark, but the horizon was dusk (or dawn, if you prefer) in all directions.  It looked like some of the low clouds were moving away from where the sun must have been, but it was hard to tell.

And then the clouds shifted and there was the silvery ring of the corona a minute or more into totality.  A little washed out, like the full moon through high clouds, but there.  Then, after what seemed like no time at all, an incredibly bright pinpoint of light widening into a slim maybe-crescent.  Time to look away.

A mile down the road, people saw nothing but clouds.  On the way back, through what had been overcast and rain, the sun was out.  Would live radar really have helped?  Who knows.

The two eclipses I've seen happened to be part of the same Saros series, a set of eclipses at 18-year intervals with nearly identical geometry.  The latitude was similar.  The duration was similar.  As it happens, the weather was similar.  In theory, technology would have played a much larger role in this one than the last one, but in practice, not really.

Lists and limitations

There are several things that Wikipedia does that you wouldn't necessarily guess just from a description like "online, world-editable encyclopedia."  One of my favorites is that it tends to accumulate lists.  All kinds of lists.  List of bridges.  List of Eastenders characters.  List of enzymes.  List of Russian poets.  List of screw drives.  List of numbers (always room for one more).  List of presidents of Brazil.  List of fictional primates.  List of fires.  And, of course, List of lists of lists.

Honestly, part of the attraction is the sheer fun of seeing just what sorts of things people have seen, but there's a more serious point, too.   Our brains are subject to all sorts of biases that lead us to remember things selectively.  We tend to remember things by their impact.  We remember more recent things better than less recent things (though we also tend to remember the beginning of a list of things better than the middle).  We remember things that we encounter more than similar things we don't.  We tend to remember things that have stressful emotional associations, and so forth.  In fact, there's a great list of these biases on Wikipedia.

One antidote to our natural biases is to lay out all the information in one place, for example, in a list.

To take a random example, what's the largest city in the US, meaning the one with the most area?  I've spent some time in LA, and I'd think that it, or one of its suburbs, must be pretty large.  Let's check the List of US Cities by Area.  And the winner is ... Sitka, Alaska, at 7434 sq km (2870 sq mi), population 8881.  Next on the list are Juneau, Wrangell and Anchorage, all in Alaska.

None of these is what we'd think of as a "big city", however large the city limits might be.  In fact, all of these are consolidated city/counties, and it's not surprising that counties in Alaska would be on the large side.  Likewise for Anaconda and Butte, both in Montana, a little farther down the list.  The first physically large city with a large population is number five: Jacksonville, Florida, area 1935 sq km (747 sq mi), population 821,784 (though Anchorage is over 200 thousand).  The first with a population over a million is Houston, Texas, area 1553 sq km (600 sq mi), population 2,099,451.

The next large cities with over a million are Phoenix and ... oh, there's LA at number 12.  Interestingly, New York, New York, which I tend to think of as the prototypical "lots of people packed into not much space", is number 24 on the list, between Kansas City, Missouri (Kansas City, Kansas is considerably smaller in both area and population), and Augusta, Georgia.

If all this matches up with your preconceptions, congratulations, your preconceptions are better than mine.

Or suppose there's been a major flood in your area (and, of course, I hope there hasn't).  Seems like it's happening more and more around the world?  Is it? Well, one way to find out would be to look at the List of Floods.  And, indeed, it looks like there have been many more in this century than before.  In fact, in the 1990s, the list shifts from decade-by-decade to year-by-year.

But that's not really right.  We've just gotten better at reporting them, and more recently-reported events might be easier to link to on Wikipedia.  So there are limits.  In this case, there's a clear sampling bias (I wouldn't quite call it recency bias, since that has to do with an individual's memory).

Maybe I'll just go back to browsing the List of images on the cover of Sgt. Pepper's Lonely Hearts Club Band.  Interesting bunch, that lot.

Happy 10th, Field Notes

On August 23, 2007 I published the very first post on this blog.  As I've said before, the original aim was to join the blogging community and, frankly, improve my job prospects.  I would be hooked into the network of tech bloggers, doors would open and life would be good.

As it turned out, this didn't happen, but doors ended up opening, life is good and I'm grateful.

The blog, for its part, has evolved on its own.  My original aim was to be fairly technical, but that soon fell by the wayside.  Not that I never get technical, just that I'm not particularly writing for fellow geeks.  Rather, I'm writing for that hypothetical "intelligent layperson", someone who's not deeply versed in the field but knows a thing or two and is interested to find out a bit more.  If that's you, and you've been able to find out a bit more, I'm glad to have helped.

I used to have a self-imposed quota of ten posts a month.  That was good in that it prompted me to post, but eventually it felt like I was just doing it to do it.  This being the tenth anniversary, and with the quota in mind, it would make sense to put together a sudden flurry of ten posts to mark the occasion.  So I've put together five, counting this one.


Thursday, January 12, 2017

Identity redux

Today I spent an embarrassing amount of time trying to figure out why I couldn't use SSH with my new GitHub account, before figuring out that I needed to log in as and not  Evidently I'm not the first person to stub a toe on this, but it got me thinking about one of the earliest topics on this blog: identity.

A natural way to think about identities and logging in is that your username is your identity and there are various ways of authenticating that identity, for example
  • a password
  • a password and a second factor such as a magic number sent via SMS or generated by a smartcard
  • a public/private key pair
  • (in some SSH contexts) hostname or IP address and public/private key pair
Others are possible, of course.  What the GitHub experience made clear to me is that the "username" part is secondary, at least as far as SSH is concerned.  The important part of authenticating SSH is the key.

As far as I can tell, GitHub is taking the public key offered during the SSH handshake and looking it up to get the account, and thus the account name.  That's probably also why when you try to upload a key you've already uploaded (e.g., to check that you haven't taken complete leave of your senses while trying to figure out why you can't log in), the error message is "already in use".  It doesn't say by whom, even when it's you.  The rule is one account per key (but potentially multiple keys per account).
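
Here's a toy sketch, in Python, of the behavior I'm guessing at.  To be clear, this is my reconstruction from the outside, not GitHub's actual implementation; the class and method names are invented, and a real SSH server would also verify that the client holds the matching private key, which this skips entirely.

```python
class KeyInUseError(Exception):
    """Raised when a public key is already registered to some account."""


class KeyRegistry:
    """Hypothetical lookup table: one account per key, many keys per account."""

    def __init__(self):
        self._account_by_key = {}

    def add_key(self, account, public_key):
        # The error deliberately doesn't say which account holds the key,
        # even when it's the account asking.
        if public_key in self._account_by_key:
            raise KeyInUseError("already in use")
        self._account_by_key[public_key] = account

    def authenticate(self, offered_username, public_key):
        # The username offered at login plays no part in the lookup;
        # the key alone determines the account (None if unknown).
        return self._account_by_key.get(public_key)
```

In this sketch, two different keys can both map to the same account, but registering the same key a second time raises `KeyInUseError`, no matter who tries.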

This suggests a different approach to identity.  As far as the web is concerned a key, and in general an authentication method, is an identity.  This is more or less the case with Bitcoin wallets, and to some extent for PGP and other email privacy schemes, but even then for the most part we talk about using keys to establish an identity.

Let's run through some data modeling to see how this all fits:
  • People, identities and resources to be accessed (such as accounts) are three different things.
  • A person can have multiple identities
  • Multiple people can use the same identity, though that's often not a good thing
  • A resource can be accessed by multiple identities
  • In general, though not in the case of GitHub accounts, an identity can access multiple resources
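The relationships in the list above can be sketched as a pair of many-to-many maps.  This is only one way to model it, and the names (AccessModel, give, grant, people_who_could_reach) are invented for the example.

```python
from collections import defaultdict

# Illustrative model: people, identities and resources are kept separate,
# linked by two many-to-many relations.
class AccessModel:
    def __init__(self):
        self.identities_of = defaultdict(set)  # person -> identities they hold
        self.resources_of = defaultdict(set)   # identity -> resources it can access

    def give(self, person, identity):
        self.identities_of[person].add(identity)

    def grant(self, identity, resource):
        self.resources_of[identity].add(resource)

    def people_who_could_reach(self, resource):
        # Anyone holding at least one identity that can access the resource.
        return {
            person
            for person, identities in self.identities_of.items()
            if any(resource in self.resources_of.get(i, ()) for i in identities)
        }
```

Note that the question the system can actually answer is "which identities can reach this resource".  Getting from there to people takes information the server usually doesn't have.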
There are two reasons I find this key-is-identity model attractive.  One is that your web server doesn't see you, it sees the credentials you present.  It really only sees the key, or at least only ought to look at it when verifying identity. Yes, it may also know things like which IP address someone is connecting from, but even though that information can sometimes be a useful hint that something's not right, it's ephemeral, not part of the identity. 

The other, maybe just the first with a different emphasis, is that it loosens the connection between resources and people.  It might be nice to think that Gavin Belson logged in to your server with his username and the proper password, but it's better to think that someone logged in with those credentials.  You know that the username logged in.  You don't know that that was Gavin Belson (I'm looking at you, Gilfoyle).  The identity that matters here is the username, not Gavin Belson.

Except that the username is associated with a password, which can change without changing what we mean by the username (or, one would hope, it's associated with a password and a second factor, such as a phone or smartcard).  Are we really going to say that if Gavin changes his password, we're dealing with a different identity?

Let's try "yes".  The whole point of changing your password is that anyone who knew your old password won't know your new one.  We presume that, at least at first, only Gavin knows the new password.  From the point of view of the system Gavin is logging in to, (username, old password) is indeed a different identity from (username, new password) because there are potentially different sets of people who could be each.

What if Gavin uses his phone as a second factor?  There are a number of ways to do that, so suppose that the server sends him a text with a magic number when he tries to log in and expects to get that number back as part of the login process.  That provides a reasonable assurance that whoever's logging in has both Gavin's password and his phone (assuming the text isn't intercepted).  If Gavin does have his phone, it also informs him that someone, hopefully it's actually him, is trying to log in.

Suppose Gavin switches phone numbers but keeps his password the same.  Should we consider that a new identity?  I think the same logic still applies.  If Gavin's password has already been compromised and he changes his phone number, then someone might manage to grab the old phone number, and so forth.  In any case, the set of people with access to the old phone number is potentially different from the set of people with access to the new one, so different identity.

If you're carefully tracking who did what to a resource, you need to track the authentication.  A different means of identification, even for the same user name, means potentially different people.
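Here is a small sketch of what tracking the authentication might look like.  The Identity type, the credential_id field and the audit log are all hypothetical.  The only point is that the log is keyed on the username together with whichever credential was in force, not on the username alone.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Identity:
    username: str
    # Assumed to change whenever the password or second factor changes,
    # e.g. a counter or a random tag.  Never the secret itself.
    credential_id: str

audit_log = []  # list of (identity, action) pairs

def record(identity, action):
    audit_log.append((identity, action))

def actions_by(identity):
    # Same username with a different credential counts as a different identity.
    return [action for who, action in audit_log if who == identity]
```

With this, Identity("gavin", "pw-1") and Identity("gavin", "pw-2") compare as different, so anything done before a password change stays attributed to the old credential.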

One logical conclusion of this is that username is not identity.  So what is it?

It's a name.  Seems plausible, at least.

Names are yet another concept, distinct from person, identity or resource (personae are yet again distinct, but this is getting complicated enough as it is).  For example, an email address sure looks like an identity, but when you send email to one, you're really just sending it to an address which is connected to an inbox (which is a resource).

There may be more than one address connected to a given inbox.  Likewise, the name I use to log in to access that inbox may or may not be an email address (for example, my ISP provides me with an email address I never use, but if I want to see mail for that address I log in with a username that's different from the email address).  Likewise for however you logged in to send me mail.  If we're using secure mail then, regardless of everything just mentioned, you'll encrypt the mail to your recipient's public key and sign it using your private key.  The keys are the real identity, because we trust them.

I'm comfortable with this, too.  I've come to think that one of the most fertile sources of bugs is confusing names with identities (see this post for a bit more on names).  Names are convenient, but ideally they're only used to look up what you're really interested in.  I personally prefer systems in which renaming things is cheap, if only because I generally come to hate the names I come up with to start with, no matter how much careful thought I gave them.

The way you generally do this is to assign a unique id -- typically a hundred bits or so of random gibberish -- to each resource that can be named, and then maintain a map from name to id.  When you access a resource, say an account, by name, you look up the name in that map, stash the id somewhere and use the id to access the object.  If the name-to-id map changes, you still have the id and you can still find the same object.  The system can maintain as many names for a given object as it sees fit, but each name corresponds to only one object (at least at any given time).
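A minimal sketch of that scheme, with made-up names (Namespace, create, lookup and so on) and 128 random bits standing in for "a hundred bits or so":

```python
import secrets

class Namespace:
    def __init__(self):
        self._id_by_name = {}    # name -> id; each name maps to one object
        self._object_by_id = {}  # id -> object; ids never change

    def create(self, name, obj):
        object_id = secrets.token_hex(16)  # 16 random bytes = 128 bits
        self._object_by_id[object_id] = obj
        self._id_by_name[name] = object_id
        return object_id

    def lookup(self, name):
        # Names are only used to find the id.
        return self._id_by_name[name]

    def get(self, object_id):
        return self._object_by_id[object_id]

    def rename(self, old_name, new_name):
        # Cheap: only the name map changes; the object and its id stay put.
        self._id_by_name[new_name] = self._id_by_name.pop(old_name)

    def alias(self, existing_name, extra_name):
        # Several names may point at the same object.
        self._id_by_name[extra_name] = self._id_by_name[existing_name]
```

Anyone who looked up the id before a rename can keep using it afterwards.  Only code that holds on to the old name breaks.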

Summing up
  • Web servers don't see people directly.  They see the credentials that people (and other servers, for that matter) present.
  • The credentials are distinct from who (or what) uses them.
  • The names we use to refer to resources are distinct from the credentials that can be used to access them.
I think public key systems line up well with the key-is-identity model because the public key is the single identifying item.  In a password scheme, whether you consider (username, password) or just username to be the identity, you are giving two pieces of information, both of which are durable, but which must generally be kept separate because one of them is meant to be secret.  The password isn't truly private.  The host you're logging into has to know it [technically, it only has to store a secure hash of the password together with a bit of randomly-generated "salt", but from a security point of view that's only a bit better since it still has to see the password during the login in order to do the hash and comparison, and password files can be stolen and attacked brute-force offline --D.H. Feb 2017].
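For what it's worth, the salted-hash storage mentioned in that bracketed note might look something like the sketch below.  It uses PBKDF2 from Python's standard library as one reasonable choice of hash.  The function names and the iteration count are my own assumptions, not a recommendation for any particular system.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune for your hardware

def make_record(password):
    # Store a random salt and a slow hash of the password, never the password.
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest

def check(password, record):
    # The host still has to see the password here in order to hash it.
    salt, digest = record
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The salt means two people with the same password get different stored records, and the iteration count slows down offline guessing.  Neither changes the fact that the password itself crosses the wire to the host at login.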

In a public key scheme, there is still a public part and a private part, but you present only the public part.  The private key remains truly private.  It's generally stored encrypted, guarded by a passphrase that you only use locally.  If you change the passphrase, no one else needs to know.  Authenticating means exchanging ephemeral information, much of it randomly generated, that will never be used again.  All of this makes it much easier to keep the private key secure for long periods of time, so the public key can serve as a durable identity.  Since it's the only durable thing that other parties see, it's sufficient to serve as an identity by itself.

There's a long way to go yet, but it seems likely that the world will gradually shift to key-as-identity, or something at least as strong.