Thursday, December 13, 2018

Common passwords are bad ... by definition

It's that time of the year again, time for the annual lists of worst passwords.  Top of at least one list: 123456, followed by password.  It just goes to show how people never change.  Silly people!

Except ...

A good password has a very high chance of being unique, because a good password is selected randomly from a very large space of possible passwords.  If you pick your password at random from a trillion possibilities*, then the odds that a particular person who did the same also picked your password are one in a trillion, the odds that one of a million other such people picked your password are about one in a million, as are the odds that any particular two people picked the same password.  If a million people used the same scheme as you did, there's a good chance that some pair of them accidentally share a password, but almost certainly almost all of those passwords are unique.

If you count up the most popular passwords in this idealized scenario of everyone picking a random password out of a trillion possibilities, you'll get a fairly tedious list:
  • 1: some string of random gibberish, shared by two people
  • 2 - 999,999: Other strings of random gibberish, 999,998 in all
Now suppose that seven people didn't get the memo.  Four of them choose 123456 and three of them choose password.  The list now looks like
  • 1: 123456,  shared by four people
  • 2: password,  shared by three people
  • 3: some string of random gibberish, shared by two people
  • 4-999,994:  Other strings of random gibberish, 999,991 in all
Those seven people are pretty likely to have their passwords hacked, but overall password hygiene is still quite good -- 99.9993% of people picked a good password.  It's certainly better than if 499,999 people picked 123456 and 499,998 picked password, two happened to pick the same strong password and the other person picked a different strong password, even though the resulting rankings are the same as above.

Likewise, if you see a list of 20 worst passwords taken from 5 million leaked passwords, that could mean anything from a few hundred people having picked bad passwords to everyone having done so.  It would be more interesting to report how many people picked popular passwords as opposed to unique ones, but that doesn't seem to make its way into the "wow, everyone's still picking bad passwords" stories.

From what I was able to dig up, that portion is probably around 10%.  Not great, but not horrible, and probably less than it was ten years ago.  But as long as some people are picking bad passwords, the lists will stay around and the headlines will be the same, regardless of whether most people are doing a better job.

(I would have provided a link for that 10%, but the site I found it on had a bunch of broken links and didn't seem to have a nice tabular summary of bad passwords vs other passwords from year to year, so I didn't bother)

*A password space of a trillion possibilities is actually pretty small.  Cracking passwords is roughly the same problem as the hash-based proof-of-work that cyrptocurrencies use.  Bitcoin is currently doing around 100 million trillion hashes per second, or a trillion trillion hashes every two or three hours.  The Bitcoin network isn't trying to break your password, but it'll do for estimating purposes.  If you have around 100 bits of entropy, for example if you choose a random sequence of fifteen capital and lowercase letters, digits and 30 special characters, it would take a password-cracking network comparable to the Bitcoin network around 400 years to guess your password.  That's probably good enough.  By that time, password cracking will probably have advanced far beyond where we are and, who knows, maybe we'll have stopped using passwords by then.

Saturday, December 8, 2018

Software cities

In the previous post I stumbled on the idea that software projects are like cities.  The more I thought about it, I said, the more I liked the idea.  Now that I've had some more time to think about it, I like the idea even more, so I'd like to try to draw the analogy out a little bit further, ideally not past the breaking point.

What first drew me to the concept was realizing that software projects, like cities, are neither completely planned nor completely unplanned.  Leaving aside the question of what level of planning is best -- which surely varies -- neither of the extremes is likely to actually happen in real life.

If you try to plan every last detail, inevitably you run across something you didn't anticipate and you'll have to adjust.  Maybe it turns out that the place you wanted to put the city park is prone to flooding, or maybe you discover that the new release of some platform your depending doesn't actually support what you thought it did, or at least not as well as you need it to.

Even if you could plan out every last detail of a city, once people start living in it, they're going to make changes and deviate from your assumptions.  No one actually uses that beautiful new footbridge, or if they do, they cut across a field to get to it and create a "social trail" thereby bypassing the carefully designed walkways.  People start using an obscure feature of one of the protocols to support a use case the designers never thought of.  Cities develop and evolve over time, with or without oversight, and in software there's always a version 2.0 ... and 2.1, and 2.2, and 2.2b (see this post for the whole story).

On the other hand, even if you try to avoid planning and let everything "just grow", planning happens anyway.  If nothing else, we codify patterns that seem to work -- even if they arose organically with no explicit planning -- as customs and traditions.

In a distant time in the Valley, I used to hear the phrase "paving the cow paths" quite a bit.  It puzzled me at first.  Why pave a perfectly good cow path?  Cattle are probably going to have a better time on dirt, and that pavement probably isn't going to hold up too well if you're marching cattle on it all the time ...  Eventually I came to understand that it wasn't about the cows.  It was about taking something that people had been doing already and upgrading the infrastructure for it.  Plenty of modern-day highways (or at least significant sections of them) started out as smaller roads which in turn used to be dirt roads for animals, foot traffic and various animal-drawn vehicles.

Upgrading a road is a conscious act requiring coordination across communities all along the roadway.  Once it's done, it has a significant impact on communities on the road, which expect to benefit from increased trade and decreased effort of travel, but also communities off the road, which may lose out, or may alter their habits now that the best way to get to some important place is by way of the main road and not the old route.  This sort of thing happens both inside and outside cities, but for the sake of the analogy think of ordinary streets turning into arterials or bypasses and ring roads diverting traffic around areas people used to have to cross through.

One analogue of this is in software is standards.  Successful standards tend to arise when people get together to codify existing practice, with the aim of improving support for things people were doing before the standard, just in a variety of similar but still needlessly different ways.  Basically pick a route and make it as smooth and accessible as possible.  This is a conscious act requiring coordination across communities, and once it's done it has a significant impact on the communities involved, and on communities not directly involved.

This kind of thing isn't always easy.  A business district thrives and grows, and more and more people want to get to it.  Traffic becomes intolerable and the city decides to develop a new thoroughfare to carry traffic more efficiently (thereby, if it all works, accelerating growth in the business district and increasing traffic congestion ...).  Unfortunately, there's no clear space for building this new thoroughfare.  An ugly political fight ensues over whose houses should get condemned to make way and eventually the new road is built, cutting through existing communities and forever changing the lives of those nearby.

One analog of this in software is the rewrite.  A rewrite almost never supports exactly the same features as the system being rewritten.  The reasons for this are probably material for a separate post,  but the upshot is that some people's favorite features are probably going to break with the rewrite, and/or be replaced by something different that the developers believe will solve the same problem in a way compatible with the new system.  Even if the developers are right about this, which they often are, there's still going to be significant disruption (albeit nowhere near the magnitude of having one's house condemned).


Behind all this, and tying the two worlds of city development and software develop together, is culture.  Cities have culture, and so do major software projects.  Each has its own unique culture, but, whether because the same challenges recur over and over again, leading to similar solutions, or because some people are drawn to large communities while others prefer smaller, the cultures of different cities tend to have a fair bit in common, perhaps more in common with each other than with life outside them.  Likewise with major software projects.

Cities require a certain level of infrastructure -- power plants, coordinated traffic lights, parking garages, public transport, etc. -- that smaller communities can mostly do without.  Likewise, a major software project requires some sort of code repository with version control, some form of code review to control what gets into that repository, a bug tracking system and so forth.  This infrastructure comes at a price, but also with significant benefits.  In a large project as in a large city, you don't have to do everything yourself, and at a certain point you can't do everything yourself.  That means people can specialize, and to some extent have to specialize.  This both requires a certain kind of culture and tends to foster that same sort of culture.


It's worth noting that even large software projects are pretty small by the standards of actual cities.  Somewhere around 15,000 people have contributed to the git repository for the Linux kernel.  There appear to be a comparable (but probably smaller) number of Apache committers.  As with anything else, some of these are more active in the community than others.  On the corporate side, large software companies have tens of thousands of engineers, all sharing more or less the same culture.

Nonetheless, major software projects somehow seem to have more of the character of large cities than one might think based on population.  I'm not sure why that might be, or even if it's really true once you start to look more closely, but it's interesting that the question makes sense at all.