Showing posts with label security. Show all posts

Thursday, December 13, 2018

Common passwords are bad ... by definition

It's that time of the year again, time for the annual lists of worst passwords.  Top of at least one list: 123456, followed by password.  It just goes to show how people never change.  Silly people!

Except ...

A good password has a very high chance of being unique, because a good password is selected randomly from a very large space of possible passwords.  If you pick your password at random from a trillion possibilities*, then the odds that a particular person who did the same also picked your password are one in a trillion, the odds that at least one of a million other such people picked your password are about one in a million, and the odds that any particular two people picked the same password are, again, one in a trillion.  If a million people used the same scheme as you did, there's a good chance that some pair of them accidentally share a password, but almost certainly almost all of those passwords are unique.
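This is the classic birthday problem.  A quick sketch of the arithmetic, using the post's hypothetical trillion-password space and million users (these numbers are illustrative, not measurements):

```python
import math

SPACE = 10 ** 12   # possible passwords in the hypothetical scheme
USERS = 10 ** 6    # people choosing independently and uniformly at random

# Chance that at least one of a million other people picked your password.
p_match_any = 1 - (1 - 1 / SPACE) ** USERS           # ~ one in a million

# Expected number of colliding pairs among all users (birthday problem).
expected_pairs = USERS * (USERS - 1) / 2 / SPACE     # ~ 0.5

# Chance that at least one pair collides somewhere (Poisson approximation).
p_any_collision = 1 - math.exp(-expected_pairs)      # a bit under 40%

print(f"{p_match_any:.2e} {expected_pairs:.2f} {p_any_collision:.0%}")
```

That last figure is how "a good chance that some pair of them share a password" coexists with "almost all of those passwords are unique": a bit under a 40% chance of a single collision among a million otherwise unique passwords.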

If you count up the most popular passwords in this idealized scenario of everyone picking a random password out of a trillion possibilities, you'll get a fairly tedious list:
  • 1: some string of random gibberish, shared by two people
  • 2-999,999: Other strings of random gibberish, 999,998 in all
Now suppose that seven people didn't get the memo.  Four of them choose 123456 and three of them choose password.  The list now looks like
  • 1: 123456,  shared by four people
  • 2: password,  shared by three people
  • 3: some string of random gibberish, shared by two people
  • 4-999,994:  Other strings of random gibberish, 999,991 in all
Those seven people are pretty likely to have their passwords hacked, but overall password hygiene is still quite good -- 99.9993% of people picked a good password.  It's certainly better than if 499,999 people picked 123456 and 499,998 picked password, two happened to pick the same strong password and the other person picked a different strong password, even though the resulting rankings are the same as above.

Likewise, if you see a list of 20 worst passwords taken from 5 million leaked passwords, that could mean anything from a few hundred people having picked bad passwords to everyone having done so.  It would be more interesting to report how many people picked popular passwords as opposed to unique ones, but that doesn't seem to make its way into the "wow, everyone's still picking bad passwords" stories.

From what I was able to dig up, that proportion is probably around 10%.  Not great, but not horrible, and probably less than it was ten years ago.  But as long as some people are picking bad passwords, the lists will stay around and the headlines will be the same, regardless of whether most people are doing a better job.

(I would have provided a link for that 10%, but the site I found it on had a bunch of broken links and didn't seem to have a nice tabular summary of bad passwords vs other passwords from year to year, so I didn't bother.)

*A password space of a trillion possibilities is actually pretty small.  Cracking passwords is roughly the same problem as the hash-based proof-of-work that cryptocurrencies use.  Bitcoin is currently doing around 100 million trillion hashes per second, or a trillion trillion hashes every two or three hours.  The Bitcoin network isn't trying to break your password, but it'll do for estimating purposes.  If you have around 100 bits of entropy, for example if you choose a random sequence of fifteen characters drawn from the capital and lowercase letters, the digits and 30 special characters, it would take a password-cracking network comparable to the Bitcoin network around 400 years to guess your password.  That's probably good enough.  By that time, password cracking will probably have advanced far beyond where we are and, who knows, maybe we'll have stopped using passwords by then.
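As a sanity check on the footnote's numbers: fifteen characters from that 92-symbol alphabet is actually a shade under 100 bits, which brings the exhaustive-search time down from the 400 years of a full 100 bits to decades (a sixteenth character puts it well past the 400-year mark).  The rate below is the post's rough Bitcoin-network estimate, not a measured attack:

```python
import math

ALPHABET = 26 + 26 + 10 + 30   # upper, lower, digits, 30 specials = 92
LENGTH = 15
HASHES_PER_SEC = 1e20          # rough Bitcoin-network rate, per the post
SECONDS_PER_YEAR = 3.15e7

bits = LENGTH * math.log2(ALPHABET)
years_to_exhaust = 2 ** bits / HASHES_PER_SEC / SECONDS_PER_YEAR

print(f"{bits:.1f} bits, ~{years_to_exhaust:.0f} years to try every password")
```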

Tuesday, December 30, 2014

That CAPTCHA moved!

While recovering a password for a site -- that is, my real password was whatever information the recovery page needed -- I noticed a new wrinkle on CAPTCHA: Moving CAPTCHA.  Instead of the usual smeared-out or obscured letters, it showed three plainly readable letters, somewhat tilted, on a clearly contrasting background, but wiggling slightly back and forth.

Seems like an interesting step in the whole OCR arms race, except ...

The problem for an attacker to solve here isn't recognizing a moving character, which might or might not be harder than recognizing a still one.  It's grabbing a frame of the animation to examine.  If you can do that at all, then recognizing one particular arrangement of the letters is no harder than recognizing any other CAPTCHA.  Easier, in fact, since you have nice, legible letters, and you can re-run the OCR on each frame and go with the consensus.
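The frame-consensus attack is a few lines of code once you can grab frames.  The OCR results here are invented stand-ins for whatever recognition engine an attacker would actually run:

```python
from collections import Counter

# Hypothetical per-frame OCR output for a wiggling three-letter CAPTCHA.
# Most frames read correctly; a few are garbled by the motion.
ocr_guesses = ["KQX", "KQX", "K0X", "KQX", "XQX", "KQX", "KQK", "KQX"]

# Majority vote per character position across all frames.
consensus = "".join(
    Counter(guess[i] for guess in ocr_guesses).most_common(1)[0][0]
    for i in range(len(ocr_guesses[0]))
)
print(consensus)
```

A handful of misread frames doesn't matter; the occasional glitch gets outvoted.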

Again, I haven't looked at this in detail, but there would seem to be two main ways of putting the moving image up in the first place: A .gif or other animated image format, which is no problem to decode into its images, or some sort of JavaScript animation.  That might be harder to grab, but not because of the animation.  You can just as well use JavaScript to put up a still image, and in either case the answer is to render the JavaScript and then grab the pixels.

In other words, it seems unlikely that the moving image adds any real difficulty for an attacker.  It does look harder, intuitively, to the human eye, but the attacker isn't using a human eye -- that's the whole point of the exercise to begin with.

Tuesday, September 30, 2014

Heartbleed, Shellshock and Raymond's Linus's Law

You have probably heard by now that bash, one of the basic tools in the Linux/GNU toolkit, has had a glaring vulnerability for the last, oh, twenty-plus years, now deemed Shellshock.  You've probably also heard of the Heartbleed vulnerability in OpenSSL.  Apart from making international press and raising serious questions about computer security, these two bugs have a number of features in common:
  • They're implementation bugs.  Bash, as defined in its documentation, does not allow the sort of behavior that Shellshock allows, and likewise for SSL (the protocol) and OpenSSL (an implementation of SSL).  In both cases, the implementations were doing things they shouldn't have.
  • They're basic implementation bugs.  In Shellshock, text which should be ignored or discarded is instead interpreted as a command.  In Heartbleed, a reply message is sized according to a length claimed by the requester rather than the length of the actual data, so it can leak adjacent memory.
  • No one noticed them for a long time.  In the case of Shellshock, a very long time.  Or at least, no one seems to have visibly exploited them.
It's that last item I want to focus on.  In his famous essay The Cathedral and the Bazaar, extolling the virtues of open source development, Eric Raymond claimed that "given enough eyeballs, all bugs are shallow," or in other words, if you had enough people looking at the source code to a system, any serious issues would be flushed out and fixed quickly.  He called this principle Linus's Law, in honor of Linux creator Linus Torvalds (Linus didn't come up with it.  Linus did put forth his own Linus's Law, but it doesn't seem to have garnered much attention).

In any case, despite bash and OpenSSL being two of the most widely used tools in the software world, these basic and serious bugs don't seem to have been flushed out quickly at all.  Now, it is possible that multiple people noticed the problems, shrugged and went on with their lives, or that some entity or another discovered the bugs and exploited them very quietly, but that's not how Raymond's Linus's law is supposed to work.

I think there are two reasons for this.

First, as many have pointed out, there's no convincing evidence that more eyeballs really do mean more bugs found.  Rather, it seems that you quickly hit diminishing returns.  Four people may or may not find about twice as many bugs as two people, but forty people probably won't find twice as many bugs as twenty.  Forty people may not even find twice as many bugs as two.

Exactly why this might be is a good research topic, but I'd guess that a lot of it is because some bugs are easy to find, some aren't, and once you've found the easy bugs throwing more eyeballs at the problem (now there's an image) won't necessarily help find the hard bugs.

One of the sobering implications of Shellshock and Heartbleed is that even simple bugs can be hard to find, but that's not news to anyone who's done much coding.

I think there's a second reason, though, more subtle than the first but worth noting:  There probably aren't really that many eyeballs on the source code to begin with.

In theory, millions of people could have found either of these two bugs.  If you've installed Linux, you have the bash and OpenSSL source code, or if you didn't copy it, you can easily get at it.  Odds are you didn't, though, unless you were actively developing one of those packages.  Why would you?  I use Linux systems all the time.  I don't want to study the source code.  I just want it to work.  I have looked at various parts of the Linux/GNU source, but generally just to see how it worked, not with a particular eye toward finding bugs.  Maybe that makes me a bad net.citizen, but if so, I'm pretty sure I'm in good company.

OK, but there have still been hundreds of contributors to each of those projects.  Surely one of them would have seen the problem code and fixed it?  Not necessarily.  A tool like bash consists of a large number of modules (more or less), and the whole point of breaking things down into modules is that you can work on one without caring (much) about (many of) the others.  Someone who worked on job control in bash would not necessarily have even looked at the environment variable parsing, which is where the problem actually was.

In other words, there might only have been a handful of people who even had the opportunity to find Shellshock or Heartbleed in the source code, and they didn't happen to spot the problems, probably because they were trying to get something else done at the time.


There's another kind of eyeball, though: testers.  Even if only a few people were looking closely at the source, lots of people actually use bash, OpenSSL and other open-source tools.

Fair enough, but again, their attention is not necessarily focused where the bugs are.  Most people logging into a Linux box and using bash are not going to be defining functions in environment variables.  Most script writers aren't either (though git, headed by Linus himself, seems to like to).  It's a moderately tricky thing to do.  Likewise, almost no one using OpenSSL is even going to be in a position to look at heartbeat packets.  Most of us don't even know if we're using OpenSSL or not, though if you've visited an https:// URL, you probably have.

In short, Raymond's implicit assumption that bug-finding is a matter of many independent trials, in the statistical sense, evenly distributed over the space of all possible bugs, looks to be wrong on both counts: "many" and "independent".

[The current Wikipedia article on Linus's law cites Robert Glass's Facts and Fallacies about Software Engineering, which made similar observations in 2003, over a decade before this was posted.  It also no longer seems to mention any version of Linus's law due to Linus himself.  That was removed in this edit  --D.H. Oct 2018]

Saturday, June 21, 2014

Return of the cow clicker

I've previously written about Cow Clicker, a Facebook game in which players would click on an image of a cow, and later just the space where a cow had once been, thereby sending a message to all their friends that they had done so.  While not a runaway hit, Cow Clicker did manage to attract some 50,000 users, some portion of whom paid real money for the privilege of clicking more often, or on a fancier cow (Bling Cow could be yours for only $100).

The idea behind Cow Clicker was to reduce social gaming to its barest elements, partly as parody and partly as a study of social gaming behavior.  Fast forward a few years, and someone has done the same thing for mobile phone apps.  The Yo app will send a message to any of your contacts saying, simply "yo".  Unlike Cow Clicker, Yo has attracted hundreds of thousands of users so far, who have already sent millions of yos.

This popularity has had two not-too-shocking consequences.  On the one hand, it has attracted $1 million in funding.  On the other hand, it has been hacked.

Actually, the hack doesn't seem so much a hack as a matter of the app leaking confidential information and someone noticing it.  Three college students using the app were able to get the personal phone number of the founder, text him and get a call back.

What does it all mean?  Anyone who thinks it means the end of civilization as we know it is forgetting that civilization as we know it produced the tulip mania, phone booth packing, pet rocks and any number of other major and minor follies.  Nor can it possibly be surprising that an app, however trivial, that could gather hundreds of thousands of users in short order might attract investment money.  Whether or not you believe that the attention economy is anything new or different, getting people's attention is potentially worth money ... "This Yo brought to you by Spümcø".

Somewhat more concerning, though still not surprising, is that even a simple app like Yo would leak confidential information.  Security in applications of all kinds is still something you have to build in, or at least you can't assume that your app is secure just because you haven't done anything to make it insecure.  To some extent this is a hard problem.  Any useful app will involve some form of communication, and any communication exposes information, even if it's only who's communicating with whom (which can reveal much more than you might think).

It's been a couple of years since Cow Clicker's heyday.  Most likely the ruckus about Yo will die down and in another few years another minimal app will take its place.  Sic transit gloria mundi.

Tuesday, March 19, 2013

Password reductio ad absurdum

I was just now logging into a site I hadn't logged into in several months, one for which I wanted to be sure I had a unique password.  Naturally, I'd forgotten the password.  So I clicked on the forgotten password link and chose the email option.  There was also a security questions option.  I should remember to make up some random lies for that, since I'm not going to use it and would prefer no one else did either.

Before too long, an email arrived with a clearly randomly-generated sequence of twelve upper- and lowercase letters.  That's about 68 bits of entropy.  If you could guess a trillion passwords a second [which, scarily enough, is not at all far-fetched], it would still take about 12 years to guess all the possibilities.  I'm not a great fan of passwords in general, except when used locally to unlock something that's actually secure, but that's a pretty reasonable password generation scheme.
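The arithmetic behind those figures (the trillion-guesses-per-second rate is the hypothetical attacker mentioned above, not a benchmark):

```python
import math

space = 52 ** 12                 # twelve upper- and lowercase letters
bits = math.log2(space)          # ~ 68.4 bits
GUESSES_PER_SEC = 1e12
SECONDS_PER_YEAR = 3.15e7

years_to_try_all = space / GUESSES_PER_SEC / SECONDS_PER_YEAR
print(f"{bits:.1f} bits, ~{years_to_try_all:.0f} years to try them all")
```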

So I log in with my new password.  Before I actually get in to the site, I'm told I need to change my password.

Because it's too weak.

Because it doesn't have a letter and a number.

But I'm free to make up any seven-character or longer sequence that does contain a number and a letter, which does at least filter out all but two of SplashData's top 25 list of weak passwords (all but trustno1 and password1).  Let's just say it's 92% effective at improving password security and leave it at that.

Wednesday, November 14, 2012

Getting Smart with email

It appears that two participants in a prominent scandal -- if you're reading this now, you know which one, and if you're reading this later, it won't really matter -- tried to cover their email tracks by not actually sending email at all.  Instead, they shared an email account and would write messages but save them to the Drafts folder for the other to read.

I'm a bit unclear on how this helps significantly, particularly since it doesn't seem to have worked all that well in this case.

The act of sending email itself is reasonably secure.  If you and your recipient are both using one of the major providers (the same provider, that is), then sending email just means copying some bits, if that.  Nothing need go out over the public internet.  Likewise, reading that email just means logging in and viewing it.  You are using HTTPS, aren't you? Probably, even if you don't know it, but it's worth checking your email settings just in case.

If you're up to no good and storing email on an unencrypted local drive, you deserve to lose.

So it really comes down to how many passwords you would have to crack to get at the messages.  Consider two scenarios:

  1. Alice and Bob have separate accounts with MyEmail.com, which supports two-factor authentication.  That means that it's not enough just to know the password.  When you log in, you give not just the password, but a magic number from a text message sent to your phone, or from some other kind of device that produces single-use magic numbers.
  2. Alice and Bob share a TOP SEEKRIT "drop box" account with just a password.
In scenario 2, if I can crack that one password, I can see the whole correspondence, so long as I think to check the Drafts folder.  Alice and Bob basically have a password plus a bit of security through obscurity, otherwise known as "no additional security".

In scenario 1, I have two passwords to try to guess, which means two chances at success instead of one.  So far, so good. I crack one of the passwords and log in.  The login screen then says "enter the magic number we just sent to your phone".  Oops.  Not only do I not have the magic number to log in with, Alice (or Bob, as the case may be) now knows that someone is trying to log in.
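The "magic number" in schemes like this is typically a time-based one-time password (TOTP, standardized in RFC 6238): a short code derived from a shared secret and the current time, valid for about thirty seconds.  A minimal sketch, checked against the RFC's published test vector:

```python
import hashlib, hmac, struct, time

def totp(secret: bytes, timestamp=None, step=30, digits=6) -> str:
    """Time-based one-time password per RFC 6238 (HMAC-SHA1 variant)."""
    if timestamp is None:
        timestamp = int(time.time())
    counter = struct.pack(">Q", timestamp // step)
    digest = hmac.new(secret, counter, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                       # dynamic truncation
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238's published test secret and timestamp:
print(totp(b"12345678901234567890", timestamp=59))   # -> 287082
```

Since the code changes every thirty seconds and never travels alongside the password, cracking the password alone gets the attacker nothing, which is the point of scenario 1.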

I suppose it would be possible for Alice to set her phone to forward magic number messages to Bob (or vice versa, but not both!) and use two-factor authentication that way, assuming no one will ask why Bob is getting strange texts with random numbers in them for no apparent reason.  I'd then have to crack the shared password and steal a phone, more or less what I'd have to do in scenario 1, except instead of having a choice of passwords to crack, I have a choice of phones to steal.

Note that some two-factor authentication schemes use a cryptocard or something similar as a second factor.  That would make sharing the account physically impossible, unless Alice and Bob are in the same room, in which case the Cone of Silence is probably the better option.

All bets are off if The Man is able to force MyEmail.com to give up access to the account, but that applies equally well in either scenario.

Thursday, June 28, 2012

Yet another wacky security scheme

Passwords are easy to get wrong.  Trying to make people come up with "stronger" passwords just makes it worse.  Security questions just provide another avenue of attack, probably an easier one.  So, ladies and gentlemen, may I introduce to you: The security word.

"What is it?", you may later regret asking.

You give the site a "security word".  Later, they will ask you not for the word, but a few randomly selected letters, for example the second, fifth and eighth, and next time it might be the first, fifth and sixth (note to self -- lopado­temacho­selacho­galeo­kranio­leipsano­drim­hypo­trimmato­silphio­parao­melito­katakechy­meno­kichl­epi­kossypho­phatto­perister­alektryon­opte­kephallio­kigklo­peleio­lagoio­siraio­baphe­tragano­pterygon may not be the best choice for this exercise).

If you picked, say, security, and the system asks for the second, fifth and eighth letters, you would give 'e', 'r' and 'y'.  If someone's looking over your shoulder, how much information do they have?  Let's fire up the old UNIX shell:

$ grep '^.e..r..y.*' /usr/share/dict/words | wc -l
   84

What this means is that there are 84 words in the dictionary on my system that have 'e', 'r' and 'y' in those positions, or about six bits of entropy.  Most of them are words like ventrohysteropexy and dextrogyratory that people are unlikely to pick.  The person who helped me set up the account in question recommended something "easy to remember".  Odds are it's "security".

If not, all an attacker has to do is guess the letters that the site asks for next time.  There's a good chance that at least one will be one the attacker has already seen.  There won't be a lot of choices for the unknown letters.  Without looking at the list, I'd bet that 'q' isn't on it and 'e', 't' and a few others cover most of the possibilities.  Even without having looked over your shoulder, an attacker would know just from the security word being English that certain letters are better to try in certain positions.
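The narrowing step is easy to express in code.  The word list here is a tiny stand-in; an attacker would use the full system dictionary, as the grep command above does:

```python
# A tiny stand-in word list; a real attacker would use a full dictionary.
words = ["security", "severity", "serenity", "password",
         "ventrohysteropexy", "dextrogyratory"]

# The site revealed: 2nd letter 'e', 5th 'r', 8th 'y' (0-indexed: 1, 4, 7).
revealed = {1: "e", 4: "r", 7: "y"}

candidates = [w for w in words
              if len(w) >= 8 and all(w[i] == c for i, c in revealed.items())]
print(candidates)
```

Each round of revealed letters shrinks the candidate list further, and the common words go to the top of the guess list.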

So basically you have another hoop to jump through that adds minimal actual security, but tries to create the illusion of strong security, while really just making the system harder to use.  Huzzah.

Friday, February 24, 2012

Is it OK to tweet "fire" in a crowded theater?

Evidently not.

Or at least, it's not a good idea to tweet in jest that you'll blow an airport sky-high if it remains closed for snow, so preventing you from visiting your girlfriend.  Paul Chambers of Doncaster, England found this out the hard way, paying a fine of £1000, gaining a criminal record and losing his job in the bargain.  His appeal will be heard before the high court of the UK and his defense has had at least one high-profile fundraiser, but it's all a bit sobering, to say the least.

This lack of humo(u)r on the part of airport security is not new, by the way, nor limited to the UK.  I remember as a kid -- so, ahem, well before 9/11 -- noticing a sign at the airport we were flying out of saying it was a federal crime even to joke about hijacking, bombs and such, and promptly blanching and making a mental note not to make any smart comments to the nice folks by the metal detector.

With that in mind, the remarkable aspect of the case isn't so much that it involves Twitter, though it is one of the first such cases, but that the authorities chose to prosecute for this particular remark at all.  I don't know how often such cases are prosecuted, but I'd guess it's not too often.  They certainly don't seem to make the press much.  I doubt the story would have been less remarkable had Mr. Chambers been brought in for making the same remark in person at the ticket counter.

In any case, caveat tweetor.

[Paul Chambers' conviction was eventually quashed, two and a half years later on the third appeal, the case having attracted considerable attention and celebrity involvement.  It's not clear if his job was reinstated, but according to Wikipedia he and his girlfriend did eventually marry.]

Monday, November 7, 2011

Banking on web security

People do care about web security.  There are highly competent full-time professionals in the field.  There are conferences on the subject on a regular basis.  You'll see them in the press -- Experts Meet to Fix Security on the Web.

And yet, in large part because the problems to be solved are hard and involve significant non-technical factors, there is no shortage of things that could stand to be fixed.
  • Authentication is a mess.  For the most part, we have passwords and security questions.  I've griped about this before, multiple times, and I'm sure I'll gripe about it again.
  • Identity is a mess.  Everyone has scads and scads of identities -- logins here, there and everywhere. They can easily get confused ("That wasn't me, that was some other David Hull!").  There's no good way to say two random identities are or aren't the same.  I've griped and speculated about this before, too, and I expect I'll have more to say on that, too.
  • Anonymity is problematic.  Everything you do on the web leaves traces, but unless you're paying extremely close attention you generally don't know exactly what kind, or whether they can be tied to your identity (whatever that is).
  • Network infrastructure is scary.  Https with certificates is widely deployed, and most people probably at least know that some sites are "secured" and some aren't, but many fewer understand (or should need to understand) details like signatures, secure hashes and certificate authorities, or what can fail and what's less likely to.  Did I mention DNS?
  • PCs are scary.  Viruses, rootkits, system crashes ... some platforms are better designed than others, but nothing's perfect.
  • The cloud has its own problems.  Who owns what you put there?  Who's liable if data is lost or compromised?  Who can see what?  Who can see who sees what?
  • Spam is a perennial problem, not helped by any of the above.
I could go on, but if it's so bad -- and it is -- how does it work at all?  People continue to be able to use credit cards both online and in person, people continue to email and text each other all sorts of sensitive information, people continue to turn to the web for all sorts of vital information.  Clearly Bad Things can happen to a person on the web, but just as clearly it's not bad enough often enough to put people off the web entirely.  Far from it.

My guess is that banks have a lot to do with it, at least in the US.  In particular
  • Banks handle liability.  If someone steals your credit or debit card, whether physically or online, you can tell your bank and generally they will make sure you don't have to pay for things you didn't buy.  That's oversimplified, and there are certainly cases where that simple process has turned into a nightmare, but it's still a vital part of getting people to do business confidently online.
  • Bank cards provide a de facto stable identity.  If you're buying something from my web site, I do care who you are (well, I would, and stores in general do seem to care what their customers are up to), but I certainly also care that your payment is going to go through.  To some extent I'm talking to you, but I'm also talking to your bank account.
On the first point, you're not responsible for keeping your bank accounts absolutely safe.  You're responsible for taking reasonable precautions, so that if someone does get hold of your account number and misuses it, they're clearly at fault (the usual "I'm not a lawyer" disclaimer applies here).  Putting the rest of the burden on the banks and legal system is a large part of what keeps the wheels turning.

On the second point, if I shop at store A and store B, it's important that my bank knows that those purchases both come out of my account, and I know that I'm the same person in both cases (at least on a good day).  It's less important that store A and store B know I'm the same person.  There may even be cases where I'd rather they didn't know.

In short, security and identity matter when money is at stake, in which case your accounts serve as your identity and you have legal protections that predate the web.

Security and identity also matter where reputation is at stake, that is in the social realm, be it email, social networks, Twitter or whatever.  The landscape is different there, but it's worth noting that most accounts and identities, including your bank accounts, don't play into that much.  If someone compromises my account at widgetco.com, they might be able to have a truckload of widgets sent to my address at my expense, but they won't be able to say embarrassing things about me on this blog.  Likewise if they compromise my bank account, though that would of course be bad for other reasons.


If you buy that, then you should make sure to use strong unique passwords and unique security questions for your bank accounts, your email accounts and your major social accounts, and use better security than that when it's available.  How much to worry about other accounts depends on how closely they're tied to the accounts that matter.  For example, if your city's online parking ticket paying site doesn't remember credit card numbers or your nefarious history of overparking, you probably don't care as much about security there.

Wednesday, September 14, 2011

Forbes on passwords

Wandering through the web (but not necessarily figuring it out as I went along) I ran across a slide show in Forbes on the subject of passwords, with what seems to me mostly reasonable advice.  Some highlights, mostly common-sense stuff that bears repeating:
  • Change important passwords frequently and don't reuse them
  • Use different passwords for different purposes.
    • Important passwords (e.g., for bank accounts) should be unique. 
    • Less important accounts can share passwords, but be aware that if one account is compromised you should consider all of them compromised.
  • Don't choose a password that's ever appeared elsewhere.  This rules out memorable phrases like "We the people of the United States of America".
  • Passwords should contain nothing personally associated with you (basically a version of the previous item).
  • Password managers may be useful.  The advantage is you can use random gibberish and the manager will remember it for you.  The disadvantage is that if the master password is ever cracked, you're completely hosed.
  • Use HTTPS when logging in.  HTTPS encrypts all connections and uses digital certificates to ensure that you're really talking to whom you think you are (just exactly how secure this system is is a whole other can of worms, but for now let's assume it's basically OK).  You can tell because secure sites' addresses start with "https://" instead of "http://", and browsers now indicate whether you have a secure connection.
  • Don't type your password into anyone else's machine.
  • Assume that a public WiFi access point is just that, public (the actual slide says to avoid it entirely).  If you're not using an encrypted connection of some sort (HTTPS, SSH, a VPN or such) assume that anyone can see your network traffic, including passwords you type when you log in.  Also assume that any random person can see anything that's publicly shared on your computer (another fine can-o-worms).
  • Don't depend on passwords generated by web sites or random software.  Even if everything's on the up-and-up, it's very easy to get password generation wrong, typically by using a weak random number generator (see this post for more on generating passwords).
  • Archive your important passwords in case of catastrophe, for example by writing them down on a piece of paper and storing it in a safe deposit box that can be opened in an emergency.
  • In general, if you're going to record a password somewhere, do it on a physical medium separate from your computer (see disadvantage of password managers, above).
There are also a few items that don't seem actively harmful, but probably don't help greatly either
  • When replacing letters with numbers and such, use non-obvious numbers, e.g., r7place instead of r3place.  This will add a few bits of entropy, which is good, but not really good enough on its own.  If your base word is in a dictionary of 500,000 words and you replace up to three characters with one of 15 replacements, you have about 30 bits of entropy, which is not that much.
  • Add a number to the end of sentence-based passwords "for extra uniqueness".  Adding a number adds about three bits of entropy.  Meh.
  • Scramble a password when writing it down.  This will make it harder, but not impossible, for someone who finds your written password to figure out the actual password, but it will also make it harder for you to come up with the actual password at two in the morning when you discover you don't quite remember how you scrambled it and the Very Important Site locks out accounts with more than three login failures.  Of course, you could write down how you scrambled it ...
  • Deliberately misspelling words can make passwords more secure.  Yes, but not very much more secure.
  • Use a sentence with lots of words, and include punctuation.  In theory this can work, but in practice people come up with much-less-than-random-words, particularly if the sentence actually makes sense.  Also, surprisingly many systems get indigestion if you try to use a long password.
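The back-of-the-envelope figures in the bullets above are easy to reproduce. This is just a sketch of the arithmetic; the 500,000-word dictionary, the pool of 15 replacements, and the single trailing digit are the text's assumed round numbers, not measured values:

```python
import math

# Rough entropy estimates for the tweaks discussed above.
# All figures are back-of-the-envelope, matching the reasoning in the text.

def bits(choices: float) -> float:
    """Entropy in bits of a uniformly random choice among `choices` options."""
    return math.log2(choices)

dictionary_word = bits(500_000)  # base word drawn from a 500,000-word list
digit_suffix    = bits(10)       # one digit tacked on the end: 0-9
# Up to three substitutions, each picking one of ~15 replacements
# (a loose upper bound; real substitution habits are far more predictable).
substitutions   = 3 * bits(15)

print(f"dictionary word:  ~{dictionary_word:.1f} bits")
print(f"digit suffix:     ~{digit_suffix:.1f} bits")
print(f"substituted word: ~{dictionary_word + substitutions:.1f} bits")
```

Which lands right where the text does: a trailing digit is about three bits, and even the generous substitution bound only gets a dictionary word to about 30 bits.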

Saturday, August 27, 2011

Building a better password

[I've updated this post slightly to reflect the back-of-the-envelope calculation in this post suggesting that 100 bits of entropy is probably more reasonable than my original statement that 48 bits was "not bad".  Under the assumptions in that post, a 48-bit password would take on the order of microseconds to crack --D.H. Feb 2020]

I've recently complained about the irritating nature of the password strength checkers that have been popping up everywhere, so I feel obliged at least to try to analyze the problem and offer solutions.  This is leaving aside the question of whether password authentication is a useful approach at all.

Fundamentally the real measure of password strength is how many passwords you'd expect to have to guess in order to get the right one.  A more formal version of this is the notion of bits of entropy.  If you had a list of all possible passwords in your scheme, I could identify any particular one so long as I could get answers to a series of yes/no questions, for example:  "Is it in the first half of the list or the last?",   "Is it in the first half of that half or the last?" and so forth.  The number of such questions I need is the number of bits of entropy.  Twenty questions means twenty bits, etc.

If I know that your password is either "0" or "1", you have exactly one bit of entropy.  If I know it's an uppercase letter, lowercase letter, digit, "$" or "%", there are 64 possibilities, so you have 6 bits of entropy.  If I know it's two such characters, you have 12 bits, and if it's seventeen such characters you have 102 bits, which is not too bad.  Someone trying to guess your password would have to guess about two thousand billion billion billion passwords, on average, before stumbling on yours.  That may seem like a lot, but keep in mind that the current network of Bitcoin miners can try on the order of a hundred thousand billion billion hashes -- roughly the same problem as guessing a password -- every second.
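The arithmetic above is easy to check. Here's a sketch using the round numbers from the text: a 64-character alphabet, 17 characters, and a guessing rate on the order of 10^23 per second (the assumed Bitcoin-network ballpark):

```python
import math

# Entropy of a password built from a 64-character alphabet
# (26 lowercase + 26 uppercase + 10 digits + "$" and "%").
alphabet = 64
length = 17
entropy_bits = length * math.log2(alphabet)  # 17 * 6 = 102 bits

# Expected guesses before hitting the right password: half the space.
expected_guesses = 2 ** entropy_bits / 2

# Guessing rate comparable to the figure in the text -- an assumed ballpark.
rate = 1e23
seconds = expected_guesses / rate
years = seconds / (365.25 * 24 * 3600)
print(f"{entropy_bits:.0f} bits, ~{years:.1f} years at {rate:.0e} guesses/sec")
```

Note that at that (admittedly extreme) rate, even 102 bits falls in under a year, which is why the later update to this blog suggests aiming for around 100 bits rather than treating 48 as adequate.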

[Don't assume that guessing a password requires typing it in to the same text box you have to use.  If someone steals the right data from your service provider, they can throw as much computing power as they've got at guessing the passwords.  Quite possibly they'll be happy enough just to try a few thousand weak passwords for each account, since that will crack depressingly many, but attacks like running through the OED with simple substitutions of letters for numbers are absolutely feasible as well, even on fairly ordinary hardware.]

This is assuming that you picked eight characters at random.  If I knew instead that your password was either "F1%ldN0t3$" or "sasssafras" (maybe I'd watched you read your password off a piece of paper with only those two words on it but couldn't quite see which you were typing), then you have only a single bit of entropy, even though both passwords are not just eight but ten characters long and one has plenty of non-letters.

More realistically, if I knew you'd picked an uncommon English word and maybe changed some of the letters to numbers, you'd have somewhere around two dozen bits of entropy.  That's not nothing, but keeping in mind that each added bit doubles the number of passwords a cracker has to try, it's nearly a billion billion billion times weaker than the 102-bit scheme above.

The fundamental flaw of password strength checkers is that they can only look at the password you gave them.  They have no idea what other possible passwords you might have chosen.  The assumption is that if you're forced to jump through enough hoops you'll be forced to expand your parameters, but in fact it's possible to generate passwords in a secure manner using only letters, or to generate them insecurely in a way that will still satisfy any strength checker out there.  Which is why I half-grimace, half-laugh when I see the "password strength indicator" jump from "poor" to "great" as soon as I type a number.

Now, it's perfectly possible to generate completely random 17-character passwords.  The problem is that something like "qcrQf1x2" or "u%js%hPQ" is a pain to try to memorize, so most people will fall back to picking a "hard" word and maybe altering it a bit.  However, as xkcd points out, it's possible to do a lot better by using random short words.

For example, here's a kind of clunky way of producing a random, memorable password:

BIG HONKING DISCLAIMER: This is just for demo purposes.  The second site I mention uses http, not https, so in theory anyone could be looking in on your session.  Even with https, the sites might be logging all your traffic and recording the results you come up with.  I personally seriously doubt they would, and it's hard to imagine they would be able to connect the dots and figure out what you were using the generated password for, but if you really want to be on solid ground, get the source, look it over, run it locally and use something like /dev/urandom or D&D dice to generate the random input (23d20 will give you close to 100 bits ... not that I would have any idea at all what "23d20" means).  There are also smartphone apps that do more or less the same thing, I believe.

[I last checked that the recipe below worked on 28 Feb 2020]

With that out of the way:
  • Go to this site and copy the random string you see there (e.g., 60990FFC250C).  If for some reason you don't like what you see, just reload.
  • Go to this site.
  • Type some short number and a space into the Challenge box and paste the random string from the first step in after it (e.g. 123 60990FFC250C)
  • Type anything at all into the Secret box (e.g., "secret").  This doesn't have to be hard to guess.  The real entropy is coming from the random string (alternatively, put any number you like, a space, and anything else into the "challenge" box and paste the random bytes into the "secret" box).
  • Press the Compute with SHA-1 button.  Again, the cryptographic details of how strong SHA-1 is don't matter here.  You're just converting a random number to short words.  A simple table lookup would do just as well.
In the Response box you will see six short words followed by some hexadecimal gibberish (in this case, WOVE COOT SLEW WIT SIGH I (FE2D 5F7B 22CD BC39)).  Each of those words represents just over 10 bits of entropy.  We'll need ten words in all, so repeat the procedure but this time just take the first four words (I got FIRE CUFF GALA MINK from A4B455FEFFE7BFAD).
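If you'd rather skip the web sites entirely, the same idea works locally. Here's a minimal sketch using Python's `secrets` module (which draws from the OS's cryptographic random source). The ten-word list below is only a placeholder for demo purposes; a real run needs a proper list, e.g. a 2048-word diceware-style list, which gives 11 bits per word:

```python
import secrets

# Placeholder word list -- a real one has thousands of entries.
# Entropy per word is log2(len(wordlist)): 2048 words = 11 bits each.
WORDLIST = ["wove", "coot", "slew", "wit", "sigh",
            "fire", "cuff", "gala", "mink", "fern"]

def passphrase(n_words: int, wordlist=WORDLIST) -> str:
    """Pick n_words uniformly at random, with replacement, and join them."""
    return " ".join(secrets.choice(wordlist) for _ in range(n_words))

print(passphrase(10))
```

With a 2048-word list, ten words give about 110 bits, comfortably past the 100-bit target, and you never have to trust anyone else's random number generator.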

You can play around with this formula to get words that are easier to memorize, or type, or are just more to your liking.  If you reorder your words or try typing in several different things instead of 123 or secret and then picking what you like, you're decreasing your entropy, depending on what criteria you're using to filter out passphrases you don't like and whether your attacker knows what kinds of phrases you like.  If you just try a few different secrets until you see something that seems memorable, that should be fine.  If you do something like sort the words (and your attacker knows only to try sorted lists of words), you've lost almost 22 bits of entropy, which wouldn't be good.

Once you've selected your words add a random punctuation character, number, capital letters or whatever makes your site's password strength checker happy.  Voila!  Your password is now Wovecootslewwitsighifirecuffgalamink5? or whatever.  This isn't great to have to type, but it's pretty secure as passwords go, and probably better than trying to remember something like C;cTbfThoO4ePFTt or 67EE386A205C4563DB8908A6C4.

If your site's password checker imposes an 8-character limit (and, incredibly enough, some do), cry.

Friday, July 1, 2011

This password madness has got to stop

It's well known that people like to choose bad passwords, and for years other people have suggested rules for making passwords more secure.  I'm not really sure why it should be happening now in particular, but it seems that every site that has a password must now jump on the bandwagon and have a password policy enforcer.

And of course, they're all a little different.

Fortunately there are plenty of possibilities.  Here's a do-it-yourself guide in case you think your site needs one.  First, pick any two of
  • The password must contain at least one number
  • The password must contain at least one lowercase letter
  • The password must contain at least one uppercase letter
  • The password must contain at least one special character
Next flip a coin to pick one of the remaining two to disallow.

Now pick a minimum length.  Back in the day, when computers were much slower than they are now and it wasn't fairly easy to get a gazillion computers to cooperate (with or without the owners' consent), the recommended minimum length was eight characters.  Today it should probably be more like 12 or 14.  So make sure the minimum length is at least six.

Now set a maximum length of 8.  Why a maximum, given that all other things equal, longer passwords are stronger, and the whole point of the exercise is to encourage strong passwords?  Don't know.  Probably whoever put the database together remembered the old eight-character rule and decided that should be the maximum.  But 8 is a magic number for passwords and everyone else does it.

Finally, add an arbitrary hidden restriction.  For example, if the password has to have a number, make sure it can't be the first character (yes, I ran into that one).  If it has to be a special character, quietly disallow '$' and '!'.  Something like that, just to reduce the strength and make people work a little harder.

Voila.  You now have a password policy.  If I did that math right, there are three dozen basic policies, times however many arbitrary rules there are, so there are easily hundreds of possibilities.  Chances are fair that your poor user will never have encountered your exact policy before and never will again.


Chances are also fair that once they jump through all your hoops (bonus points if this is all happening on a smartphone or tablet), your poor user will have come up with a password they've never used anywhere before.  That's good, since reusing passwords can be dangerous.  The only drawback is that the poor user is liable to forget this ad-hoc password within five minutes of logging in.

So urge them to write it down "some place safe."

Then have them pick three or four secret questions and answers for when they have to reset the password next time they log on.  But that's a different rant.


If you feel you need further security advice, you can always consult a real expert.

Thursday, January 27, 2011

OK, this is a bit unsettling ...

File under unintended consequences.  It all makes sense, and yet, it doesn't seem quite right.

Mike Cardwell blogs:
When you visit my website, I can automatically and silently determine if you're logged into Facebook, Twitter, GMail and Digg.
and sure enough, the page will say "Yes, you are logged in" or "No, you are not logged in" at the appropriate places.  Eerie.  What's going on here?

As Cardwell explains, whenever you send an HTTP request to a server, you get back a response code.  That response code might say things like "Your request was OK, here's the data you asked for," or "Sorry, I don't have what you're looking for," or "Goodness, I seem to be having some sort of problem here." or any of a number of other things.  So far, so good.

Modern browsers can keep track of whether you're logged in to particular sites, so you don't have to keep logging in.  Fair enough.  If you're logged in and you ask for something on a site, you'll get it (assuming you have the proper permissions, etc.).  If not, you'll typically get an error.

HTML allows you to reference other web sites within your document -- that's pretty much what makes the web webby -- and modern browsers allow you to behave one way or another depending on what happens when you try to fetch something (it doesn't even have to be based on a status code -- pretty much any reliably observable difference in behavior will do).


Put it all together, and
  • any web site
  • can use a reference to another site
  • to tell if you're logged in to that site
In Chrome, at least, if you open an incognito window to visit Cardwell's site, it can no longer tell whether you're logged in, because incognito windows don't share any state with other browser windows.  But that's kind of throwing out the baby with the bathwater.  You can also turn off JavaScript support (or only selectively turn it on), but that has its own problems.

To really solve the problem you have to be able to control what state is shared between, for example, different tabs or windows.  Doing that simply and non-intrusively is easier said than done.

On the other hand, as a couple of commenters point out, such tricks have been around for a while.  Whether anyone's exploiting them in a significant way is another matter.  Before a site can find out if you're logged in, it has to get you to visit it, not that there aren't plenty of sneaky ways to do that, and then it just knows whether you're logged in or not to sites it knows how to check for (each site requires its own custom-tailored check).  And then, if all you log into is, say, GMail and Twitter, then all your adversary can find out -- from this particular trick, at least -- is that (yawn) you use GMail and Twitter.

Worth losing sleep over?  Probably not.  Worth keeping in mind?  Definitely.

Cardwell's site looks to have a lot of other fun and useful information on it as well ... and if you stop by for a visit, your browser will most likely tell his server I sent you.

Monday, September 6, 2010

The cry of the squeamish ossifrage

I think I got rid of the old Scientific American issue years ago, but I still remember reading about the RSA public key cipher in Martin Gardner's Mathematical Games in 1977 (August, to be precise).  Thirty-three years later, RSA is still in use, providing a secure means of encrypting and signing digital data (unless someone has figured out a way to crack it and is sitting very, very tightly on the secret).

In particular, it can be used to verify that only someone in possession of a particular secret key, generally a several-hundred digit number, could have produced a particular block of bytes.  If you visited a site whose URL started with "https://", for example your bank, your browser most likely used RSA in the process of satisfying itself that it really was talking to the right server.

So why is authentication such a mess?  Why does resetting a password require anything from coming up with the name of a cat to providing a working email address to providing several pieces of information and then getting a phone call?  Why do some sites want the three-digit code on the back of your card and some not, and how is adding three more digits that you end up handing out to all and sundry helping the situation?  Why hasn't OpenID or some other knight in shining armor been able to rescue us?  Why do we still use passwords for anything besides locally decrypting the key to a real authentication system?  How do you even know I wrote this?

I don't really know, but if I didn't have some guesses I probably wouldn't be writing this, now would I?

First, what would a really seamless authentication system look like?
  • It would allow for multiple identities.  Maybe I just haven't caught on to the whole every-waking-moment-of-your-life-available-online thing, but I would rather keep my work identity separate from my blogging identity separate from my personal email separate from my bank accounts.  Not to mention my identity as an international man of mystery.
  • It would allow the same identity to work multiple places.  This is not the same as giving N different sites the same username and password.  Your username doesn't belong to you, whereas a real identity does.  Anybody can choose your favorite username if they happen to get there first.  It's also not the same as letting your browser keep track of a bunch of username-password pairs and putting a master password on all of them.
  • It would minimize the number of tokens needed for an identity, and each token would be there for a clear reason.  If the token is a password, fine, but it should be a password, not a password and two or three "security questions."
  • It would use current best practices.  It's risky to use anything too new when it comes to security technology, and unless you're No Such Agency or the like it's madness to try to create your own, but there are plenty of well-established road-tested security techniques available.
  • It should be portable, both physically (like the "pocket-thing") and across sites.  Ideally, registering with a new site means registering the token(s) for the appropriate identity.
  • It should be as completely under the identified individual's control as possible.
What actually happens?  Something along these lines, I think:

Suppose I have some sort of digital certificate that I can use to identify myself.  Properly used, this could satisfy the requirements above, perhaps together with some sort of physical token, like a smartcard.  Any really secure authentication system, including a smartcard, is going to have some such certificate in it somewhere.

Since it costs money to have a major certificate authority (CA) vouch for a certificate (by signing it), certificates used by individuals in practice tend to be "self-signed", or signed by members of a "web of trust" instead.   That's fine for some purposes, but not for doing business with a bank.  If it's not good enough for the banks, it's probably not good enough for your utility company either.

In theory, you could establish your identity with a bank and then get them to sign a certificate to that effect, which your utility company might choose to trust, but that basically puts your bank into the CA business, not one they're necessarily keen to get into.  In practice, each company would rather control the process, typically asking for an account number off a paper statement to get the ball rolling.  Each entity has its own customer ID system for the account number, and usernames are potluck, so you end up with (at least) one semi-identity for each company you do business with.

In the wild-and-woolly world of pure web sites, where you don't already have a customer id when you sign up, there doesn't seem to be any strong push to move beyond the usual username-password system. Everyone's used to it.   Switching would mean re-doing the login screen, at the least, with new and less-familiar technology, then convincing your users to go along with it.  If it ain't broke don't fix it.

Since an authentication scheme is only as strong as its reset mechanism, there are basically two schemes in wide use:
  • An identity is a working email address
  • An identity is a couple of "security questions" and answers
If I had to choose, I'd take the former, but it's not much of a choice.

Wednesday, July 28, 2010

OpenID and incentives

I mentioned OpenID in the previous post, leading me to wonder "Hey, whatever happened to OpenID?"

Well, it's still there, but perhaps not in the form its creators had in mind.

In OpenID, there are three main roles
  • You
  • The site you actually want to use (the relying party)
  • The site that you'll log in to to convince the relying party that you're you (the provider)
I don't know about you, but my intuition was that it would be easier and more popular to be a relying party than a provider. That's more or less the situation with, say, SSL certificates. If you're a certifying authority (CA), you're vouching for each certificate that you sign and you have to pay great attention to keeping your keys safe and other such matters. If you're just using a certificate (like when you log in to your bank or some other site using https), all you have to do is decide what CAs to trust, and in practice your browser makes that decision and does all the checking behind the scenes.

In OpenID, if you're a provider all you really need to do is accept requests for your users to log in -- which you have to do anyway -- and tell whoever asked you to do that, "yep, that's them." Unless being an OpenID provider is your main gig, you really don't have to take any more care than you otherwise would. If no one trusts you, it's not the end of the world (but see below).

If you're a relying party, you have to decide whether to trust the provider. In particular, you have to trust that the provider will check identities at least as carefully as you would. If the provider is a bank (not that that seems likely) or is trying to make money solely off of providing OpenID, that's a pretty good bet. Otherwise, your mileage may vary.

...

After working through that, I realized that there's a much simpler reason that -- unlike the certificate case -- parties tend to prefer providing to relying: If you rely on me, then your users will have to log in to my site, maybe see some of my advertising, be reminded that they use my service, and so on -- in order to use yours*. Sure, their account with you is still an account, but their account with me becomes the "real" account that the others are just sort of attached to. Which role would you rather play?

I like this analysis better. Providers do have an incentive to provide good service to relying parties, but unless users really care, no one has much incentive to be a relying party. Now that browsers are good at remembering passwords, having a single sign-on is less of an issue and people probably don't care so much. With a modern browser, you could give each account its own password if you like (and that's the more secure option) without having to keep all of them in your head.



* Providers can allow "checkid_immediate", where you don't have to log in to the provider, but that's not a popular option. Not only would providers likely prefer that users go through their login as often as possible, relying parties would probably prefer to know that the user actually logged in somewhere before letting them in.

Friday, April 23, 2010

Security and the appearance thereof

I'm traveling at the moment, which means I'm currently blogging at you courtesy of a hotel WiFi system. The WiFi is the usual hotel setup: you connect to an unsecured network and the system then intercepts your first web page request and replaces it with a login screen. At first blush, this may give the impression that the system is secure, except that the username/password are put together from the name of the hotel, its street number and the word "internet". They are the same for all guests.

An insecure network is not necessarily a security problem. Rather than expect the network itself, wired or wireless, to be secure, it's better to use some sort of end-to-end scheme which will be essentially equally secure whether or not the network is. The problem comes when a component that you think is secure actually isn't.

Is that the case with a hotel WiFi? It depends. If you take the login page as an indication of security, you've got a security problem waiting to happen. If you take the totally insecure user name and password as an indication of insecurity, then no problem. Unfortunately, it's perfectly reasonable for the non-technical user -- that is to say, almost any user -- to associate passwords with security. That's what they're supposed to be there for, after all.

Thursday, March 4, 2010

What's my mother's first dog's maiden name?

I can remember my login, but I can't remember my password. "No problem," says the site, "Just tell me that secret you told me when you set up the account. What was your first dog's name?" So I type in ... oh wait a minute, I can't tell you that ... and the site sets me up with a new password. Pretty slick, yes?

Well, not quite. Since lots and lots of sites are doing this, I've got two main choices:
  • Use the same small set of questions and answers everywhere.
  • Use different questions and/or answers.
Using the same everywhere means not having to remember as much. If I make up a bunch of answers and/or use different ones everywhere, then I have to remember what I made up. Basically, I'm up against the exact same problem as with passwords themselves, except now there are two weak spots for attackers to exploit: My actual password, and the questions guarding my password.

Of the two the password is probably a bit more secure, assuming I haven't used one of the 500 worst passwords of all time (unfortunately, some people seem to confuse scatology with security). City of birth? There's a good chance it's one of the top 100. Mother's maiden name? Not exactly classified information. First dog's name? There are a lot of Maxes and Buddies out there.

Now I'd say most sites don't let you reset a password directly. Typically they'll email some gibberish to the address you registered with and you use that to log in and reset the password for real. But in that case, why bother with the rigmarole? Whatever real security there is comes from putting email in the loop.

All in all, it's a classic example of a more complex system looking more secure than a simple one, but actually being less secure.

Wednesday, February 3, 2010

Chrome's security model

Over the past few months I've been migrating away from Firefox and toward Chrome because I've grown bored of trying to figure out which tab is eating my CPU. I frequently keep a dozen or two tabs open because why not? It's not like a multi-gigahertz CPU and a dedicated graphics chip should have any trouble keeping a dozen or even a hundred web pages up to date, especially if I'm only looking at one of them.

Bill Gates or someone once said that if cars had progressed like computers they would run near light speed and get a zillion miles per gallon. An interesting statement coming from someone on the software side; to factor in software and complete the analogy you'd have the supercar dragging an asteroid behind it and its drive wheels wrapped in several alternating layers of duct tape and gauze.

But I digress.

I mean, I'm all for writing to a nice abstract garbage-collected virtual machine in a type-safe more-or-less high-level language with lots of support for encapsulation and other OO goodness, and I accept that in the real world that means accepting a performance hit. But does making programmability available to the web world at large really have to mean an all-too-typical script can suck the rest of the world into its vortex?

Sorry, digressing again.

Of course, in a couple of years the hardware will be faster, leaving the world temporarily in search of a way to squander the newly-minted extra cycles. But only temporarily ...

OK, OK, what was I going to say about Chrome and security?

Chrome, like other browsers, will remember passwords for you, a very handy feature. Unlike other browsers, it does not support a "master password" that you would have to type in before using or viewing these saved passwords. Google is quite adamant on this point. Has been for years.

Google's position is that they do encrypt the passwords as they're saved on disk. If you're using Chrome and someone steals your laptop, they're not going to be able to view your passwords unless they can log in as you. If you use your screen lock feature, that means any time you step away from your computer, your password file is protected just like everything else on your account.

Their further assertion is that adding a master password feature to the browser would only provide the appearance of further security. The saved passwords on disk are no more or less protected than before. Conversely, if you give your browser the master password and don't lock your screen, someone could then grab your laptop and log into any account of yours they liked.

On the other side, pretty much anyone who switches over to Chrome will notice that not only is there no master password, but the saved passwords panel in the options actually makes it easier to view saved passwords. This certainly looks like a gaping security hole at first blush. In particular, there's no indication that any encryption is going on, anywhere. Purely as a point of user interaction, having to type a password gives the impression, correct or not, that something secure is happening behind the scenes.

After digging through all this, a couple of finer points came out:
  • On Windows, Chrome uses Windows' built-in encryption which is based on the currently logged-in user's credentials. Why reinvent the wheel? This is the security technology you're already trusting.
  • On Linux, and as far as I can tell on Mac OS as well, the encryption is stubbed out. There really isn't any encryption going on at all.
So, don't trust Chrome to keep passwords safe on Linux or Mac OS unless you're encrypting your disks wholesale. If not, anyone who steals your laptop can just mount the disk and read through ~/.config/google-chrome/Default/Web Data.
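To make the point concrete, here's a sketch of what "just mount the disk and read through it" amounts to. The path comes from the post; the `logins` table and its columns follow Chrome's old on-disk schema, and both should be treated as assumptions here. Current versions of Chrome keep logins in a different file and encrypt the password values:

```python
import sqlite3

# Old Chrome profile database on Linux, per the post -- an assumption,
# valid (if at all) only for the era this post describes.
DB_PATH = "~/.config/google-chrome/Default/Web Data"

def dump_logins(path: str):
    """Return (url, username, password) rows from an old-style Chrome DB.

    Table and column names are assumed from Chrome's historical schema.
    """
    conn = sqlite3.connect(path)
    try:
        return list(conn.execute(
            "SELECT origin_url, username_value, password_value FROM logins"))
    finally:
        conn.close()
```

The point isn't that this particular snippet works against today's Chrome; it's that with no encryption in the picture, "reading your saved passwords" is a three-line SQL query for anyone holding the disk.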

On Windows, your Chrome passwords are as safe as your account. If you don't have a password on your Windows account, you effectively don't have encrypted passwords. If your company knows the password for your account, they also know any passwords Chrome has saved. If you exit Chrome and hand your laptop over to your roommate's friend from out of town, you've handed them your saved passwords as well (they just have to restart Chrome).

From a strictly technical, by-the-book security standpoint, Google is right. But I'm still with the hordes of other users on this one. If you put locks on your house doors, you might still want to have a locked drawer on your desk, or a safe embedded in the concrete floor of the garage. Passwords to bank accounts and such are sensitive enough that it makes sense to raise the bar for them, if only slightly.

Yes, someone could still install a keylogger and yes, exiting Chrome or otherwise making it forget the master password is not much different from locking the screen and yes, the plaintext passwords will find themselves in RAM for at least small windows of time and yes, you probably should have a separate guest account for out-of-town friends of roommates. Be that as it may, Google can try to educate the world in the finer points of security models and attack surfaces, or it can give people what they want and pick up more market share from Firefox.

Frankly, I'm surprised they've held out this long.

Wednesday, April 1, 2009

Not much more about Conficker.c

Some of this I already suspected and some of it I'd have learned sooner if I'd been paying closer attention:

All Hell has not broken loose. This is not particularly surprising. All Hell has a history of not breaking loose on cue. One likely reason is that the people behind this appear to be in it to make money, and the successful parasite does not kill its host. There's even a plausible guess as to the business model: charge to rent out the infected machines as a distributed password-cracking compute server, sort of like SETI@home but up to no good and under remote control.

For example, if you know the last four digits of someone's Social Security number, there are no more than 100,000 possibilities for the other digits. If you have 100,000 computers at your beck and call, it will take very little of any particular computer's time to try all of the combinations. Of course, there are problems with the approach, particularly if trying a number involves contacting, say, some bank's server, which might find it suspicious that the customer has forgotten her SSN and has resorted to trying all possible combinations in quick succession. But you get the idea.
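The partitioning itself is trivially easy, which is rather the point. A sketch, using the text's hypothetical numbers (a 100,000-candidate space; the worker counts are illustrative):

```python
def candidates_for_worker(worker_id: int, n_workers: int, space: int = 100_000):
    """Yield this worker's share of the 00000-99999 search space.

    Striding by n_workers splits the space evenly with no coordination:
    worker 0 tries 0, n, 2n, ...; worker 1 tries 1, n+1, 2n+1, ...; etc.
    """
    return range(worker_id, space, n_workers)

# With 1,000 workers, each bot tries only 100 candidates.
share = candidates_for_worker(7, 1_000)
print(len(share))
```

Each infected machine does a sliver of work, nobody needs to talk to anybody else, and the whole space gets covered, which is exactly why a rented botnet makes a workable password-cracking service.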

What about the notion that if your computer is infected, thieves will be able to track your every keystroke and steal your secrets? Well, one can't rule anything out, but that kind of behavior doesn't fit well with the "distributed password cracking" scenario. If I'm leeching off your PC's processing power, the last thing I want to do is draw attention to myself.

I previously said there were "many, many" Conficker infections. What's "many"? The actual figure is thought to be in the millions or low tens of millions, which is large enough, but consider that there are somewhere in the high hundreds of millions of computers in use.

Monday, March 16, 2009

Curses, foiled again!

Seen on an automated ticket site, right next to the captchas:
You do not have permission to access this website if you are using an automated program.
Oh no! Now I'll have to modify my automated ticket-poaching bot to scrape the page I'm hitting and look for text that says I can't use the site. And it gets worse. It's not safe to just look for those particular words. They might decide to change the message to something really scary, like "TOP SEEKRIT NO BOTS ALLOWED WE MEAN IT!" No, I'll need to put together something that can parse English text and figure out whether or not I'm permitted to poach tickets there. That's even harder than cracking captchas.

Damn you, evil ticket guardians!