Tuesday, December 30, 2014

That CAPTCHA moved!

While recovering a password for a site -- that is, my real password was whatever information the recovery page needed -- I noticed a new wrinkle on CAPTCHA: Moving CAPTCHA.  Instead of the usual smeared-out or obscured letters, three plainly readable letters, somewhat tilted, on a clearly contrasting background, but wiggling slightly back and forth.

Seems like an interesting step in the whole OCR arms race, except ...

The problem for an attacker to solve here isn't recognizing a moving character, which might or might not be harder than recognizing a still one.  It's grabbing a frame of the animation to examine.  If you can do that at all, then recognizing one particular arrangement of the letters is no harder than recognizing any other CAPTCHA.  Easier, in fact, since you have nice, legible letters, and you can re-run the OCR on each frame and go with the consensus.
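The "go with the consensus" step can be sketched in a few lines. This is a minimal illustration, assuming some OCR step has already produced one guess per frame. The function name `consensus_text` and the sample guesses are made up for the example and do not come from any real CAPTCHA or OCR library.

```python
from collections import Counter

def consensus_text(frame_guesses):
    """Combine per-frame OCR guesses by majority vote at each character position.

    frame_guesses: list of strings, one guess per animation frame (hypothetical
    output of whatever OCR step is in use). Only guesses of the most common
    length are counted.
    """
    if not frame_guesses:
        return ""
    # Keep only guesses whose length matches the most common length.
    common_len = Counter(len(g) for g in frame_guesses).most_common(1)[0][0]
    usable = [g for g in frame_guesses if len(g) == common_len]
    # Vote position by position.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*usable)
    )

# Made-up guesses for a three-letter CAPTCHA, with at most one misread per frame.
guesses = ["XKQ", "XKO", "XRQ", "YKQ", "XKQ"]
print(consensus_text(guesses))  # prints XKQ
```

Each frame here gets at most one letter wrong, but no two frames agree on the same mistake, so the vote recovers all three letters.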

Again, I haven't looked at this in detail, but there would seem to be two main ways of putting the moving image up in the first place: A .gif or other animated image format, which is no problem to decode into its images, or some sort of JavaScript animation.  That might be harder to grab, but not because of the animation.  You can just as well use JavaScript to put up a still image, and in either case the answer is to render the JavaScript and then grab the pixels.

In other words, it seems unlikely that the moving image adds any real difficulty for an attacker.  It does look harder, intuitively, to the human eye, but the attacker isn't using a human eye -- that's the whole point of the exercise to begin with.

Tuesday, September 30, 2014

Heartbleed, Shellshock and Raymond's Linus's Law

You have probably heard by now that bash, one of the basic tools in the Linux/GNU toolkit, has had a glaring vulnerability for the last, oh, twenty-plus years, now dubbed Shellshock.  You've probably also heard of the Heartbleed vulnerability in OpenSSL.  Apart from making international press and raising serious questions about computer security, these two bugs have a number of features in common:
  • They're implementation bugs.  Bash, as defined in its documentation, does not allow the sort of behavior that Shellshock allows, and likewise for SSL (the protocol) and OpenSSL (an implementation of SSL).  In both cases, the implementations were doing things they shouldn't have.
  • They're basic implementation bugs.  In Shellshock, text which should be ignored or discarded is instead interpreted as a command.  In Heartbleed, a reply message which is supposed to have a given length instead has another.
  • No one noticed them for a long time.  In the case of Shellshock, a very long time.  Or at least, no one seems to have visibly exploited them.
It's that last item I want to focus on.  In his famous essay The Cathedral and the Bazaar, extolling the virtues of open source development, Eric Raymond claimed that "given enough eyeballs, all bugs are shallow," or in other words, if you had enough people looking at the source code to a system, any serious issues would be flushed out and fixed quickly.  He called this principle Linus's Law, in honor of Linux creator Linus Torvalds (Linus didn't come up with it.  Linus did put forth his own Linus's Law, but it doesn't seem to have garnered much attention).

In any case, despite bash and OpenSSL being two of the most widely used tools in the software world, these basic and serious bugs don't seem to have been flushed out quickly at all.  Now, it is possible that multiple people noticed the problems, shrugged and went on with their lives, or that some entity or another discovered the bugs and exploited them very quietly, but that's not how Raymond's Linus's Law is supposed to work.

I think there are two reasons for this.

First, as many have pointed out, there's no convincing evidence that more eyeballs really do mean more bugs found.  Rather, it seems that you quickly hit diminishing returns.  Four people may or may not find about twice as many bugs as two people, but forty people probably won't find twice as many bugs as twenty.  Forty people may not even find twice as many bugs as two.

Exactly why this might be is a good research topic, but I'd guess that a lot of it is because some bugs are easy to find, some aren't, and once you've found the easy bugs throwing more eyeballs at the problem (now there's an image) won't necessarily help find the hard bugs.
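A toy model makes the easy-bug/hard-bug guess concrete. It assumes each reviewer independently spots a given bug with some fixed probability, which is a generous simplification, and the mix of ten easy and ten hard bugs is invented purely for illustration.

```python
def expected_bugs_found(find_probs, reviewers):
    """Expected number of bugs found if each reviewer independently spots
    bug i with probability find_probs[i]."""
    return sum(1 - (1 - p) ** reviewers for p in find_probs)

# Hypothetical mix: 10 easy bugs (50% chance per reviewer)
# and 10 hard ones (1% chance per reviewer).
bugs = [0.5] * 10 + [0.01] * 10

for n in (2, 4, 20, 40):
    print(n, round(expected_bugs_found(bugs, n), 2))
```

With these made-up numbers the easy bugs are nearly all found by the first handful of reviewers, and forty reviewers find fewer than twice as many bugs as two.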

One of the sobering implications of Shellshock and Heartbleed is that even simple bugs can be hard to find, but that's not news to anyone who's done much coding.

I think there's a second reason, though, more subtle than the first but worth noting:  There probably aren't really that many eyeballs on the source code to begin with.

In theory, millions of people could have found either of these two bugs.  If you've installed Linux, you have the bash and OpenSSL source code, or if you didn't copy it, you can easily get at it.  Odds are you didn't, though, unless you were actively developing one of those packages.  Why would you?  I use Linux systems all the time.  I don't want to study the source code.  I just want it to work.  I have looked at various parts of the Linux/GNU source, but generally just to see how it worked, not with a particular eye toward finding bugs.  Maybe that makes me a bad net.citizen, but if so, I'm pretty sure I'm in good company.

OK, but there have still been hundreds of contributors to each of those projects.  Surely one of them would have seen the problem code and fixed it?  Not necessarily.  A tool like bash consists of a large number of modules (more or less), and the whole point of breaking things down into modules is that you can work on one without caring (much) about (many of) the others.  Someone who worked on job control in bash would not necessarily have even looked at the environment variable parsing, which is where the problem actually was.

In other words, there might only have been a handful of people who even had the opportunity to find Shellshock or Heartbleed in the source code, and they didn't happen to spot the problems, probably because they were trying to get something else done at the time.


There's another kind of eyeball, though: testers.  Even if only a few people were looking closely at the source, lots of people actually use bash, OpenSSL and other open-source tools.

Fair enough, but again, their attention is not necessarily focused where the bugs are.  Most people logging into a Linux box and using bash are not going to be defining functions in environment variables.  Most script writers aren't either (though git, headed by Linus himself, seems to like to).  It's a moderately tricky thing to do.  Likewise, almost no one using OpenSSL is even going to be in a position to look at heartbeat packets.  Most of us don't even know if we're using OpenSSL or not, though if you've visited an https:// URL, you probably have.

In short, Raymond's implicit assumption that bug-finding is a matter of many independent trials, in the statistical sense, evenly distributed over the space of all possible bugs, looks to be wrong on both counts: "many" and "independent".

[The current Wikipedia article on Linus's law cites Robert Glass's Facts and Fallacies about Software Engineering, which made similar observations in 2003, over a decade before this was posted.  It also no longer seems to mention any version of Linus's law due to Linus himself.  That was removed in this edit  --D.H. Oct 2018]

Wednesday, July 16, 2014

Protocol basics -- heartbeats, pings and acks

For no particular reason, I thought I'd start an occasional series on the basics of computer protocols such as those, like TCP and HTTP, that the web is built on.  Also for no particular reason, the basic principle that came to mind first is the idea of heartbeats.

But first, what's a protocol?

The word itself derives from Greek protos (first) and kolla (glue), so that ought to be clear enough.

No?

The trail is something like: prōtokollon really refers to the first draft of an official agreement (the first one glued into a binding), and thence more generally to an official set of rules and procedures, and thence finally to the computing meaning: A set of rules for exchanging messages between computers (often called hosts).

One of the most basic problems in computer protocols is determining whether the other party is there or not.  How hard can that be, right?

Unlike the physical world, you can't just look.  All you have is some means of sending messages, typically a relay of several steps mixing wired and wireless transmission, high-volume and low-volume connections, and so forth.  I'll go into deeper detail in some later post, but the point is that all you can do is send a message, and any particular message might or might not arrive at its destination in any particular amount of time.

One simple way to tell if the other party is there is just to ask.  Send a message saying "If you get this, please send it back to me."  You send that message, the other host sends back a reply and voila, you know they're there.

This is a perfectly good approach.  The first message is generally called a ping, probably taken from SONAR terminology, and the reply packet is generally called an ack (or ACK), short for "acknowledgement".  (There's also such a thing as a nack (or NAK), short for "negative acknowledgement", which means "yes, I got that, but I couldn't understand it," or "yes, I got that, but you're sending me messages too fast, so please stop for a bit".  I'll admit to occasionally having said "NAK" in response to an explanation that went over my head.)

But what if you don't get your ack?  Is your connection bad?  Has the other host crashed?  Did it receive your ping but fail to reply?  Did it reply, but the return connection was bad?  How long should you wait before you decide that the ack isn't coming?

To help get around problems like this, you can send a series of pings and listen for a series of acks.  To help tell what's going on, you can number them so you can match the acks to the pings.  If the connection is flaky, you might miss an ack from time to time, but overall if the other host is there and you have at least some sort of connection, you'll get at least some acks back.

You might even have the other host tell you how many pings it's heard.  That will give you some idea of whether any problems are on the outbound connection, inbound connection, or both.  For example, if the return connection is bad but the outbound connection is fine, you'll hear something like "Ack for ping 1, I've heard 1 ping", "Ack for ping 3, I've heard 3 pings" ...  If you hear "Ack for ping 3, I've heard 2 pings", you know that it missed ping 2.  Most bad connections will affect both directions, but that doesn't have to always be the case -- the other host's network layer is part of the incoming connection, and it's possible that it's able to send messages but sometimes has trouble hearing them.
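The bookkeeping in that example can be sketched as follows. This is only an illustration: the `(ping_number, pings_heard)` tuple format and the function name `diagnose` are invented here and do not belong to any real protocol.

```python
def diagnose(acks):
    """Given acks as (ping_number, pings_heard_so_far) in arrival order,
    estimate how many pings were lost outbound and how many acks were lost
    on the way back.

    Assumes pings are numbered from 1, acks arrive in order, and the other
    host did not restart in between.
    """
    lost_outbound = 0
    lost_return = 0
    prev_ping, prev_heard = 0, 0
    for ping, heard in acks:
        sent_in_gap = ping - prev_ping       # pings sent since the last ack we saw
        heard_in_gap = heard - prev_heard    # how many of those the other host heard
        lost_outbound += sent_in_gap - heard_in_gap
        lost_return += heard_in_gap - 1      # heard, but the ack never reached us
        prev_ping, prev_heard = ping, heard
    return lost_outbound, lost_return

print(diagnose([(1, 1), (3, 3)]))  # (0, 1): the ack for ping 2 was lost coming back
print(diagnose([(1, 1), (3, 2)]))  # (1, 0): ping 2 never arrived
```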

If the other host crashes and restarts, you might hear something like "Ack for ping 1, I've heard 1 ping", "Ack for ping 2, I've heard 2 pings", and then eventually, once the other host is up again, "Ack for ping 50, I've heard 1 ping".  This may or may not be useful information.  It's a basic principle of networking that during that eerie silence, there's no way to know whether the other host is crashing and restarting, the network is down, the other host is running slowly, there's a bug in whatever's handling the pings, the network is up but messages are being delayed, or whatever.

By the point you hear back that the rebooted host has only heard one ping, you may not greatly care.  You can't begin to figure out what's going on until you get a message from the other host, and even then what you can deduce depends on the exact messages, that is, on the protocol.  On the other hand, you can decide that if you haven't heard replies for N pings in a row, something is wrong.  That's often a good bet, but you have to be prepared for the possibility that things are just slow and the other host was there all along.

In some kinds of network, messages are always sent to everyone who could be listening.  In most such cases, the networking layer will filter out messages that aren't addressed to a particular system, but it's also possible to mark them "broadcast", meaning that everyone should listen.  In such setups, a broadcast ping is a good way to find out who's on the network.  This process is called discovery, and since not all networks have broadcasting built in, there are discovery protocols for networks that don't.

If you're having an actual conversation with another host, say, sending requests and getting replies, you're automatically pinging and acking.  However, you may reach a point where you don't have anything to say at the moment, but you want the other host to know you're still there.  In that case, you could send a ping, either as a do-nothing request or as a special kind of message.  It doesn't much matter which, so long as you and the other host agree on the protocol.  Such a message is generally called a keep-alive, since it's meant to keep the hosts from killing the connection (which basically means forgetting about it) on the assumption the other has gone away.

In some cases, only one host cares if the other is there.  For example, imagine a weather station where the main host is listening for data coming from a bunch of sensors -- thermometer, anemometer, hygrometer, manometer, and so forth.  It's fine for the sensors to blindly send out their information no matter what, but the main host would like to be able to report if a sensor is faulty.  Or in an even simpler example, you just want to know if another host is there at all, without needing it to send you any particular information.

In such cases, you shouldn't have to ping (and you might not even be able to, for example if the sensors have transmitters but no receivers), but you want the things you're monitoring to send acks regularly as though you had.  You can then decide that if you miss N messages, you'll report a problem.  Since they're not actually acknowledging anything, such a message is generally called a heartbeat rather than an ack.
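A minimal sketch of the "report a problem after N missed messages" rule might look like this. The class name, the interval and the threshold are all assumptions made for illustration, and the caller supplies times as plain numbers so the example doesn't depend on a real clock.

```python
class HeartbeatMonitor:
    """Flags a sender as suspect once `max_missed` heartbeat intervals have
    passed with no message from it."""

    def __init__(self, interval, max_missed):
        self.interval = interval      # expected seconds between heartbeats
        self.max_missed = max_missed  # the N in "miss N messages"
        self.last_seen = {}

    def beat(self, sender, now):
        """Record that a heartbeat from `sender` arrived at time `now`."""
        self.last_seen[sender] = now

    def suspects(self, now):
        """Return senders that have been silent for at least max_missed intervals."""
        limit = self.interval * self.max_missed
        return sorted(s for s, t in self.last_seen.items() if now - t >= limit)

monitor = HeartbeatMonitor(interval=10, max_missed=3)
monitor.beat("thermometer", now=0)
monitor.beat("anemometer", now=0)
for t in (10, 20, 30):
    monitor.beat("thermometer", now=t)   # the anemometer has gone quiet

print(monitor.suspects(now=35))  # ['anemometer']
```

The method is called `suspects` rather than something like `dead` on purpose: a silent sensor may be faulty, or its messages may just be delayed.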

In fact, any series of regular messages meant to determine if a host is present or not can be called a heartbeat.  The heartbeats in the famous Heartbleed bug, for example, were a series of pings and acks.  The bug was that a badly constructed ping would cause the ack to contain information that shouldn't have been there.
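To illustrate the general shape of that kind of bug, here is a toy model. It is emphatically not OpenSSL's code or the real heartbeat message format. The names `buggy_ack` and `checked_ack` and the fake "memory" contents are invented for the sketch.

```python
# Toy model only: not OpenSSL code and not the real heartbeat wire format.
MEMORY_AFTER_PAYLOAD = b"SECRET-KEY"  # stand-in for whatever happens to sit nearby

def buggy_ack(payload: bytes, claimed_len: int) -> bytes:
    """Echo back `claimed_len` bytes, trusting the sender's length field."""
    memory = payload + MEMORY_AFTER_PAYLOAD
    return memory[:claimed_len]

def checked_ack(payload: bytes, claimed_len: int) -> bytes:
    """Refuse to answer when the length field disagrees with the payload."""
    if claimed_len != len(payload):
        raise ValueError("length field does not match payload")
    return payload

print(buggy_ack(b"hi", 2))   # b'hi'            (well-formed ping)
print(buggy_ack(b"hi", 12))  # b'hiSECRET-KEY'  (ping that lies about its length)
```

In this toy version the fix is a single comparison before replying.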


This post has turned out longer than I expected.  I had expected to write a couple of paragraphs about heartbeats, but to get there I ended up delving a bit deeper.  As is often the case, there's more to even the simple pieces than might meet the eye.  I would like to make one last point, though.  Heartbeats, pings, acks and indeed most of the basics of computer protocols, have been around much longer than computers.  It would be interesting to hunt down early examples, but one that springs to mind is a team on an isolated, dangerous mission agreeing to send out regular radio messages.  If some number don't arrive, send in the rescue squad (or just assume the mission has, sadly, failed).

The basic idea of "make a noise if you're still here" is, of course, considerably older than radio.


Saturday, June 21, 2014

The disappearing (and reappearing) profile photo

Recently I noticed that my profile photo was broken (I've since fixed it).  "That's odd", I thought, "I uploaded it quite a while ago.  Maybe there's some glitch in Blogger's servers."  I kept checking, figuring it should come back before too long, but it didn't.  So I went to my Blogger profile to see what had happened to the image, and discovered that the URL I had given was broken.

I don't remember why I'd given a URL instead of just uploading an image.  Maybe I didn't have a copy of the image handy.  Maybe I just thought it was "webbier" to give a URL, but never mind.  Easily fixed.  I hunted up another copy of the image and uploaded it ... and we're back!

What's interesting, though, was that the URL pointed at Technorati, whose probably-no-more-tangled-than-usual history I've touched on before.  So I checked.  Technorati is still a thing, albeit clearly not one I personally pay much attention to.  Evidently they've redone their infrastructure a bit, or perhaps just cleaned out inactive accounts, causing the link to finally rot after however many years it's been since I first put it on my profile.

Links rot.  That's just part of the web.  In fact, it's a key architectural decision behind the web (as opposed to, say, Xanadu).  It would be interesting, though, to study which links rot, and when, and why.

In the case of my profile photo, a link to an obscure corner of Technorati, linked to a completely inactive account associated with a little-read blog, remained stable for years until, one day, it disappeared.  This is probably not too uncommon, but nonetheless I'd expect link rot to become less common over time.

In the old days, people would put up web sites on their personal computers, or on the workstation in their lab, and so forth.  They would get tired of the hassle of hosting the site, or graduate, or whatever, and the site would go away.  That's largely been replaced by web hosting services, but even then sites go away all the time as people get tired of paying for them and maintaining them.

However, a larger and larger portion of content is now being hosted by companies like Facebook, YouTube, Google, Twitter and so forth, or by major media outlets, which at least implicitly promise to maintain the content in perpetuity.  "Perpetuity" is rather better defined in theory than in practice, but I have a high degree of confidence that links to this blog will still work ten years from now, probably twenty and quite possibly fifty.

Will someone living a thousand years from now be able to read Field Notes?  I have no idea.  The odds of Google (or any of the other present-day giants) still being around in a thousand years are fairly small, but the likelihood of it costing peanuts to maintain everything that's ever been published on Blogger is pretty good, so who knows?

What does seem likely is that the bulk of "user-created content" will remain accessible as long as there is a web more or less like the present one for it to be part of.  If that's right, then the main sources of link rot will probably be companies folding and taking their sites down, or content owners deciding to take older content down or hide it behind paywalls or similar actions.  In other words, links are probably less likely to rot due to inattention or Life Happening to the particular person who created them in the first place, and more likely to happen due to explicit decisions by corporate entities.

Return of the cow clicker

I've previously written about Cow Clicker, a Facebook game in which players would click on an image of a cow, and later just the space where a cow had once been, thereby sending a message to all their friends that they had done so.  While not a runaway hit, Cow Clicker did manage to attract some 50,000 users, some portion of whom paid real money for the privilege of clicking more often, or on a fancier cow (Bling Cow could be yours for only $100).

The idea behind Cow Clicker was to reduce social gaming to its barest elements, partly as parody and partly as a study of social gaming behavior.  Fast forward a few years, and someone has done the same thing for mobile phone apps.  The Yo app will send a message to any of your contacts saying, simply "yo".  Unlike Cow Clicker, Yo has attracted hundreds of thousands of users so far, who have already sent millions of yos.

This popularity has had two not-too-shocking consequences.  On the one hand, it has attracted $1 million in funding.  On the other hand, it has been hacked.

Actually, the hack doesn't seem so much a hack as a matter of the app leaking confidential information and someone noticing it.  Three college students using the app were able to get the personal phone number of the founder, text him and get a call back.

What does it all mean?  Anyone who thinks it means the end of civilization as we know it is forgetting that civilization as we know it produced the tulip mania, phone booth packing, pet rocks and any number of other major and minor follies.  Nor can it possibly be surprising that an app, however trivial, that could gather hundreds of thousands of users in short order might attract investment money.  Whether or not you believe that the attention economy is anything new or different, getting people's attention is potentially worth money ... "This Yo brought to you by Spümcø".

Somewhat more concerning, though still not surprising, is that even a simple app like Yo would leak confidential information.  Security in applications of all kinds is still something you have to build in, or at least you can't assume that your app is secure just because you haven't done anything to make it insecure.  To some extent this is a hard problem.  Any useful app will involve some form of communication, and any communication exposes information, even if it's only who's communicating with whom (which can reveal much more than you might think).

It's been a couple of years since Cow Clicker's heyday.  Most likely the ruckus about Yo will die down and in another few years another minimal app will take its place.  Sic transit gloria mundi.

Thursday, June 19, 2014

The internet of loos

Auntie Beeb reports that the loos at Heathrow Terminal 2 are being fitted with sensors to detect how many people are using particular toilets, and when.

Feel free to snicker or chortle right about now.

OK, so what does this mean?  The overly harsh take would be "Yeah, that's about all this whole 'internet of things' thing is going to amount to."  A more optimistic take would be "Heathrow is one of the world's busiest airports.  If they see a benefit to this, there must be something to it."  While I've seen any number of "there must be something to it" endorsements fail to pan out -- too much of this is a good sign of an impending bubble -- I tend to lean toward the second opinion.

Yes, I'm not thrilled with the term "Internet of Things", but I think that this is more because what we're seeing is a gradual trend of (some) ordinary things being put on the internet, and not a brand new phase or some sort of new internet.  Lots of things have been on the internet, some for longer than others.  Weather sensors.  Webcams.  Taxi cabs.  Temperature and voltage sensors for computers in datacenters.  As time goes on, the portion of internet data generated via human intervention will probably decrease, and the amount generated by various ... things ... will probably increase.

This isn't the hardcore IoT vision, though.  All the examples I gave are things that naturally actively generate data.  Even taxi cabs have always needed to communicate their location and status.  Fitting them with GPS and putting them on the net just makes that process more accurate and efficient.

The full IoT vision involves tagging everything with some sort of net-friendly identifying device, say an RFID, which can then be scanned.  If every book on your bookcase, every fork in your silverware drawer, every pair of pants in your closet and so on is tagged, then you just need to wave a scanner around in order to upload an exact inventory.

Perhaps more realistically, if newly manufactured objects carry RFIDs -- and some do -- then gradually people will come to have more and more net-visible things around them.  What we choose to do with that data is another matter, as are a number of privacy concerns (what's to keep someone from walking by your house with a scanner and seeing what's in it?).

In that sense, the Heathrow loos are more like weather sensors and taxi cabs and less in line with the "tag ALL the things" concept.  Interesting though they may be, they don't say much one way or the other about how the larger IoT vision will play out.

Sunday, June 1, 2014

And the winner is ... text. Huh.

There are a gazillion ways we can send messages to each other these days: email, chat, your favorite social medium, send a postcard, make a phone call, walk over and say hi, etc., etc.  Some of these were the stuff of science fiction when I was a kid.  In particular, I think it's finally time to say that videophones are commonplace.  Most smartphones can handle it, and the bandwidth is there in many places, though certainly not everywhere.  Even so, millions of people have the ability to make a video call should they so choose.  Probably more like hundreds of millions.  And many do.

And yet ... if you have a video-capable smartphone and you want to send someone a quick message, or you're a celebrity and you want to let your fans know when your next appearance is, or you're a bank and you want to send your client a security code for logging in, or you're a wireless carrier and you want to send your customer a balance update, or even in some cases a spammer who wants to tell someone they may already have won a fabulous prize, or for any other number of reasons, what medium do you choose?  You send a text message.

This is really not all that new an idea.  In the 1800s, for example, people would send telegrams and cables, or -- in densely populated areas, at least -- dash off short notes for messengers to carry.  The diction is even strikingly similar to the modern equivalent, and it's even more striking given that there is massively more bandwidth available these days.  Clearly the problem is not that you have to crowd everything into a 160-character SMS message.  There are any number of ways around that.  Nor are you paying by the word, as in the old days.   With all the ways that one could send a message, right up to a high-bandwidth video connection, people are choosing to text.

What parameters might determine this?

Text has about the lowest bandwidth of anything that's in regular use for communication.  If you've ever heard anyone try ... to ... repeat ... what ... they ... were ... texting ... as ... they ... typed ... it ... in, you were probably gritting your teeth.  Even if you can text as fast as you can talk, with liberal use of abbreviations like OMG and U, it's still much more mental effort than just, y'know, talking.

As a side-effect of the low bandwidth, text is notoriously bad for conveying inflection and other nuances.  Emoticons only help so much.  Was that smiley sarcastic?  Is that frowny because of what they're telling me, or because they're telling it to me?  I texted them five minutes ago and they haven't replied.  Are they busy or do they hate me?  And so forth.

Text is so-so for latency and reliability.  Messages get dropped from time to time, or hung up in the ether for minutes or hours with no indication of whether they've been delivered or not.  Even under ideal conditions, you have to wait for the other party to type in their entire message before you get to see any of it.

Where text wins, I think, is setup time, which is as minimal as can be.



There are two main types of protocol: packet-switched and circuit-switched.  In a packet-switched protocol, the sender constructs self-contained packets and sends them to the receiver.  Since each packet is self-contained, individual packets may get lost or misdirected, and there is no guarantee that just because one arrived, any other will as well.  The prototypical packet-switched system is the mail, and to this day internet protocol documents speak of "envelopes" and "addresses".

In a circuit-switched protocol, the two parties first establish a connection (as we tend to call it these days), and then communicate over it.  Once the connection is established, messages flow over it in either direction (though in some cases they must take turns), until the connection is closed, either deliberately by the participants or by some sort of external disruption.  In general, you have some indication that this has happened, and if you do have a connection established, it's quick and easy to say "did you hear that?" or whatever if there's any doubt.

The prototypical circuit-switched protocol is the telephone.  When you place a call, you are establishing a connection.  Originally, the operator would use a patch board to set up an actual electrical circuit.  Thence the name.

Connections take a while to set up.  When you call someone, you put in their number, their phone rings, they stop what they're doing and answer it, and generally say "hello" or something to make sure you know the connection is established.  And then you talk.  A video call works much the same way, and for the same reason.  It's establishing a connection.

There are currently two widely-established packet-switched media: email and text.  I say "media" here because I'm talking about how things look to the people using them, as opposed to network protocols like TCP, UDP, ICMP and so forth, and I'm leaving aside services like Snapchat, which go beyond text, because it's early days yet.

Of email and text, text is much lighter weight.  Email more or less requires a subject line, and if a simple email evolves into a conversation, each piece of the conversation generally contains everything previous.  It's possible to have a rapid-fire email conversation, but it's a bit awkward.  It's also considerably more likely that the recipient of your email isn't going to look at it for an indefinite amount of time.  For better or worse, if you're carrying your phone, you're likely to know immediately if someone has texted you.

Put all that together, and text wins, easily, on setup time.  If you already have a window open for your recipient (a sort of mini-connection, but without the overhead of setting up both ends), you just type.  And that's it.  Even if you don't, it's generally easy to pick a recipient from your contacts.  And then you just type.  And that's it.

Because the setup is so easy, a text can easily turn into a conversation.  If the conversation gets involved, you can always text "call me" or whatever and get the benefits of a real, higher-bandwidth connection but, crucially, this is opt-in.  You only pay that price if it turns out to be worth it.
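The tradeoff between setup time and bandwidth can be put in back-of-the-envelope form: total time is a fixed setup cost plus message size divided by rate. The numbers below are illustrative guesses, not measurements, and the function names are invented for the sketch.

```python
def total_time(setup_s, chars_per_s, message_chars):
    """Seconds to get a message across: fixed setup plus transmission."""
    return setup_s + message_chars / chars_per_s

def break_even_chars(setup_a, rate_a, setup_b, rate_b):
    """Message size at which medium A (low setup, slow) and medium B
    (high setup, fast) take the same total time."""
    return (setup_b - setup_a) / (1 / rate_a - 1 / rate_b)

# Illustrative guesses only.
TEXT = dict(setup_s=2, chars_per_s=5)    # open the thread and type
CALL = dict(setup_s=20, chars_per_s=15)  # dial, ring, answer, "hello", then talk

for size in (20, 500):
    print(size,
          round(total_time(message_chars=size, **TEXT), 1),
          round(total_time(message_chars=size, **CALL), 1))

print(round(break_even_chars(2, 5, 20, 15)))
```

With these guesses a 20-character message is far quicker by text, a 500-character exchange is quicker by voice, and the crossover sits at roughly 135 characters' worth. Different guesses would move the crossover, but as long as text has the lower setup cost and the lower rate, there is always some message size below which it wins.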


It's now been almost twenty years since Kurt Dahl predicted that in the year 2020 -- then still comfortably far in the future -- there would be no need for kids to learn to read (See the Field Notes take on it here).   Instead, "text" became a verb, one used most by the very kids who would have seemed not to need it.  As always, it's easy, and pointless, to criticize in hindsight, though it might have been a clue that the prediction itself was conveyed via text.  Certainly there are many reasons why text should still be around, and texting is probably not a particularly big one.  Nonetheless, it's interesting that a medium that would seem to have so little going for it would win out, and that this could be due not so much to the virtues of text itself, as to the economics of communication protocols.

Friday, May 16, 2014

Is the Internet of Things still a thing?

Traveling in the Valley, I drove past a billboard from a company boasting of its role in helping build the Internet of Things.  That made me pause for a second.  I hadn't really heard the term in a while, and this isn't one of those cases where the Valley is ahead of the tech trend.  Not so long ago, I seem to recall, the Internet of Things was getting quite a bit of hype in the world at large.

What is this IoT, by the way?  It's the idea that all the things in your life, or at least way more of them than now, are connected to the Net and in some cases happily talking to each other.  So, say, when your toaster pops up a slice of toast you can get a text on your phone that you can't read because you're driving to work and forgot you'd even put anything in the toaster to begin with.  Or if all your clothes have RFIDs sewn in, you can easily track what's in your closet and what's in the wash (or what you're wearing, but you may already know that).

OK, that's a bit glib.  There are some interesting applications.  I'm pretty sure.

There are a couple of kinds of hype terms, I think.  Some are just pure hype.  You'll hear them for a while, then it will turn out that there wasn't any there there, and they quietly go away.  There was a lot of this flying around in the dot com days, of course.

Some hype terms, however, have an actual useful idea behind them.  The internet and the Web, for example.  That doesn't necessarily mean that the particular hype term will survive -- remember the Information Superhighway?  We call it the internet now, but the concept behind it hasn't gone away and will continue to develop.

Some of these kinds of terms will fade in and out as the underlying concept goes through cycles of hype, backlash, rehabilitation and possibly hype again.  AI is one.  E-commerce would be another.

I suspect that the IoT is one of these.  We can expect surges in hype, followed by periods of "meh", and maybe a name change or two, but over time more and more things with computing power or computer friendly id tags in them will get connected to the world at large -- thermostats, TVs, cars, security systems, stoplights, dishwashers, consumer goods ... maybe even toasters.  Possibly things that don't have significant computing power will get enough to get on the net, too.  Maybe roads and bridges get large numbers of sensors that can communicate conditions back to some control center.

So, even if the billboard is a bit jarring to someone not immersed in the Valley's particular media bath, the company behind it is probably engaged in something significant, and maybe even useful.

Web sites ... Y U NO UNDERSTAND DATA?

I'm a web site.  I would like your phone number.  Please type it in.  No, not your real phone number.  Let's say it's +1 800 555 1212, or as we used to say in the States, 1-800-555-1212, or even (800) 555-1212.  But not KLondike5-1212.  No one says that anymore.

Right.  Let's go ahead:

+

Whoa, wait a minute.  What's this 'plus sign' thing?  I am a simple American web site.  Your "country codes" frighten and confuse me.  Just give me the area code, prefix and number.  Go ahead, please.

(

What?  A parenthesis?  Dude.  It's a phone number.  What's with all the special characters?  Try again.

800

Cool. Now we're getting somewhere.

5

Stop right there!  Anyone can see there are spaces between the parts of a phone number.  Why didn't you type a space?

[space]555[space]1212

See?  That wasn't so hard, was it?  Notice how I split the number up into little boxes, and jumped from box to box when you hit space?  Wasn't that just slick?  This is what you can do with modern technology.

Phone number widgets, I think, are the web site UX equivalent of silly password "strengthening" rules.  No two are alike, and almost all of them get in the way for no good reason.  Social Security number widgets are pretty dicey, too, but you don't run into those so much (even in the States).

Credit card numbers, on the other hand, those are generally pretty easy to put in.  I can't imagine why.
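
For what it's worth, taking a phone number however the user cares to type it doesn't seem to be a hard problem.  Here is a minimal sketch in Python, assuming US-style ten-digit numbers only.  The function name normalize_us_phone and the choice of output format are mine, purely for illustration, and a real site would want proper international handling:

```python
import re


def normalize_us_phone(raw: str) -> str:
    """Return a US number as +1XXXXXXXXXX, or raise ValueError.

    Hypothetical helper for illustration: US numbers only.
    """
    # Spaces, dashes, dots, parentheses and '+' are all just punctuation
    # as far as we're concerned, so throw away everything but the digits.
    digits = re.sub(r"[^0-9]", "", raw)
    # Accept an optional leading country code of 1.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return "+1" + digits


# All three spellings from above come out the same.
for attempt in ("+1 800 555 1212", "1-800-555-1212", "(800) 555-1212"):
    print(attempt, "->", normalize_us_phone(attempt))
```

The point of the design is that the user types one string into one box, and the site does the tidying up afterward instead of policing each keystroke.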

Thursday, April 24, 2014

Print ... still not dead

Way back in the late sixties and early seventies, a bunch of people in Northern California put out The Whole Earth Catalog.  Several of them, actually.  There were one or two lying around the house when I was growing up, and I would often browse through them.  I can't say I remember any particular content, but I do remember the vibrantly busy layout and, of course, the iconic photos of the Earth on the cover, including William Anders' famous shot of the Earth from lunar orbit on this edition.

Print catalogs have a long and influential history.  The Sears Catalog, for example, had a huge influence on the rural United States in the early 20th century, offering as it did everything from pins and nails to tools to toys and games, clothing, fishing and hunting equipment, bicycles, automobiles and even a house to put it all in.  As I understand it, the arrival of the latest Sears Catalog in the mail was a noteworthy event in many communities.

In these days of the web, of course, there's little need for a mail order catalog.  A good commercial web site is more up to date, a good deal easier to search and not so bad to idly browse.  Some will not only show you detailed pictures of the goods, but let you customize and see the results.  Why kill trees to send something static that will be obsolete by the time it arrives?

And yet ...

Kevin Kelly, one of the original editors of The Whole Earth Catalog, has been running or co-editing the site Cool Tools since its origins as a mailing list in 2000.  It's now settled into a blogish form, but last year Kelly decided to collect the best bits from the site and publish them, as a book.  In print.  In 472 pages of print, to be exact.

There's at least one webby twist, though:  Each item has a QR code which you can scan with your smartphone to get a link to the seller's site.  That makes perfect sense, really.  While the contents of the sites may change, the sites themselves will be much more stable (particularly if the book does a good job of driving business to them).

It's an interesting hybrid.  A physical book that you can leaf through will provide a nice overview -- nicer than scrolling through screen after screen, unless your screen is pretty big -- and you still have the links.  Granted, the links are a bit more cumbersome to chase, but if you're mostly browsing and only occasionally visiting the linked sites, that's probably not too bad.

Even if it just ends up being an interesting conversation piece, Cool Tools is only the latest in a line of blogs and other web sites spinning off books.  Randall Munroe of xkcd fame, for example, is publishing his What If series in book form.  Just to emphasize how not-real-time an enterprise book publishing is, even with today's technology, the book won't actually be available until September.

It's one thing if publishers are still putting out genre fiction paperbacks or coffee table photo books.  The paperback as a tradition will probably be around for a while yet, and you don't have to buy a fancy reader to enjoy it.  The coffee table book is the canonical example of something that print can still deliver better.

But a catalog and a web comic would seem to be two of the least print-friendly formats that could feasibly be printed.  And yet they are.  I have no idea why this should be, but I don't mind.

Saturday, January 4, 2014

All your IRQ are belong to us

I did some of my first real professional programming on an early IBM PC running MS-DOS.  Back then "DOS" stood for "Disk Operating System", as opposed to "Denial Of Service", and in the literal sense of "something that will operate a disk drive", that was accurate.  In the other respects that "Operating System" implied, even at the time -- things like multitasking, so more than one program could run at once, or memory protection, so that a running program couldn't read or (worse) scribble on memory that didn't belong to it -- well, "Denial Of Service" might have been just as good a description.

Under MS-DOS's BIOS (Basic Input-Output System), applications talked to the system, and the hardware talked to the system, through "Interrupt Requests" or IRQs.  These were basically entries in a table of the form "When this happens, run the code at this address".  The entries in the table were called "vectors", and any particular IRQ had the address of a particular chunk of code, called an interrupt handler or interrupt routine.  For example, the IRQ for key clicks would be vectored to the code for dealing with a key click event.
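
To make the table idea concrete, here is a toy model in Python.  It is a sketch, not how the real thing was laid out: the real table held memory addresses, while here plain Python functions stand in for the code at those addresses.  The names vectors, raise_interrupt and the two handlers are made up for illustration, and the numbers 8 and 9 are just there to have something to index by.

```python
from typing import Callable, Dict, List

Handler = Callable[[int], None]

# Stand-in for "what the machine did", so we can inspect it afterward.
log: List[str] = []


def timer_handler(data: int) -> None:
    log.append("tick")


def keyboard_handler(data: int) -> None:
    log.append(f"key event {data:#04x}")


# Illustrative interrupt numbers (assumed for this sketch).
TIMER = 8
KEYBOARD = 9

# The vector table: "when this happens, run this code".
vectors: Dict[int, Handler] = {
    TIMER: timer_handler,
    KEYBOARD: keyboard_handler,
}


def raise_interrupt(number: int, data: int = 0) -> None:
    """Look up the vector for this interrupt and run whatever it points at."""
    vectors[number](data)


raise_interrupt(TIMER)
raise_interrupt(KEYBOARD, 0x1E)
print(log)
```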

Dealing with a key click event is not quite as simple as it sounds.  You had to do several things:
  • "Debounce" the key click -- I forget whether the PC did this in hardware or software, but when a human presses a key on a keyboard, the corresponding circuit doesn't just close, at least not on those early keyboards.  It would go through a period of milliseconds in which the circuit would bounce back and forth between open and closed.  Even to an early PC a millisecond is a fair bit of time, and you wouldn't want to interpret that bouncing as someone typing really fast.
  • Keep track of which shift keys were pressed at the time.  You would do this by keeping a few bits around like "left shift key is up/down", "control key is up/down", etc.  The caps lock key acts differently from the other keys, of course.  Miss an event and you could get CAPITAL LETTERS when you wanted lowercase, or worse, control characters which could cause all kinds of fun.
  • Buffer up key presses in a block of memory so that if the user typed several keys while the main program was thinking, they would still be there to read when it got done thinking.  Actual applications would read characters from a buffer, as 'H', 'e', 'l', 'l', 'o', rather than catching a series of events like left-shift-key-down, h-key-down, h-key-up, left-shift-key-up, e-key-down ... directly from the keyboard IRQ.
  • Check for magic key sequences like "print screen" or the famous "ctrl-alt-del"
and this is not to mention things like actually displaying the typed character somewhere, or changing the state of a document being edited.  That was all done by the application code.
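
The shift-tracking and buffering parts of that list can be sketched in a few lines of Python.  This is a simplified illustration, not a reconstruction of the actual BIOS code: the class KeyboardState, the key names and the buffer size are all assumptions of mine, only letter keys are handled, and debouncing and magic key sequences are left out entirely.

```python
from collections import deque
from typing import Deque, Optional, Set

SHIFT_KEYS = {"left-shift", "right-shift"}
MODIFIERS = SHIFT_KEYS | {"ctrl", "alt"}


class KeyboardState:
    def __init__(self, capacity: int = 16) -> None:
        self.held: Set[str] = set()        # modifier keys currently down
        self.caps_lock = False             # toggles, unlike the other keys
        self.buffer: Deque[str] = deque()  # characters waiting to be read
        self.capacity = capacity

    def on_event(self, key: str, down: bool) -> None:
        """Handle one raw key-down or key-up event."""
        if key in MODIFIERS:
            if down:
                self.held.add(key)
            else:
                self.held.discard(key)
            return
        if key == "caps-lock":
            if down:
                self.caps_lock = not self.caps_lock
            return
        if not down:
            return  # releasing an ordinary key produces no character
        if len(self.buffer) >= self.capacity:
            return  # buffer full: this sketch just drops the keystroke
        shifted = bool(self.held & SHIFT_KEYS)
        # Shift and caps lock cancel each other out for letters.
        upper = shifted != self.caps_lock
        self.buffer.append(key.upper() if upper else key)

    def read_char(self) -> Optional[str]:
        """What an application would call: next character, or None."""
        return self.buffer.popleft() if self.buffer else None


kb = KeyboardState()
events = [("left-shift", True), ("h", True), ("h", False), ("left-shift", False)]
events += [(k, d) for k in "ello" for d in (True, False)]
for key, down in events:
    kb.on_event(key, down)
typed = "".join(iter(kb.read_char, None))
print(typed)
```

Drop the left-shift release from the event list and everything after it comes out in capitals, which is the "miss an event" problem in miniature.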

Keep in mind that the keyboard IRQ was just one IRQ.  There were IRQs for the system's internal timer, for communicating with the disk drive and the modem and printer ports, for applications to talk to the BIOS, and so forth, so imagine the discussion here multiplied by a dozen or so important IRQs.

I mentioned that most applications would be fine with just reading characters from the system's buffer, but some, for example many games, really were interested in the raw events.  There were also utilities you could buy that would allow you to do things like scroll back to text that had scrolled off the screen, or display a clock or check the spelling of what you'd just typed, if you hit a magic sequence of keys.  Because the DOS code sitting on top of the BIOS didn't directly support such things, such programs would "hook" the BIOS's IRQs by changing the IRQ to vector to their code.  Since DOS didn't do memory protection, anyone could Just Do That, and many did.

There are a couple of hazards to this approach.  For one, you didn't necessarily want to completely take over handling of the keyboard.  Many utilities just wanted to hook one magic key sequence to trigger what they did and pass the rest through untouched.  The usual approach to this was to "chain" -- the last thing that a newly-installed interrupt handler would do was to call the handler that had been there before it.  That means you don't care what happens down the line and you don't have to try to replicate what everything else was doing, but it leads to the second hazard.
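
Hooking and chaining look roughly like this in a Python toy model.  Again, this is a sketch under my own assumptions: the dictionary vectors stands in for the real table, keys are represented as strings, and the clock utility and its ctrl-alt-T sequence are hypothetical.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], None]

log: List[str] = []


def original_handler(key: str) -> None:
    # Stand-in for the system's own keyboard handling.
    log.append(f"buffered {key}")


vectors: Dict[str, Handler] = {"keyboard": original_handler}


def hook_clock_utility() -> None:
    """Install a hypothetical pop-up clock on ctrl-alt-T."""
    previous = vectors["keyboard"]  # remember whoever was there before us

    def handler(key: str) -> None:
        if key == "ctrl-alt-T":
            log.append("clock popped up")  # our magic sequence: handle and eat it
            return
        previous(key)  # anything else: chain to the old handler

    # Nothing stops us from overwriting the vector, so we Just Do That.
    vectors["keyboard"] = handler


hook_clock_utility()
vectors["keyboard"]("a")
vectors["keyboard"]("ctrl-alt-T")
print(log)
```

Note that the new handler has no idea what previous actually does.  That is the convenience, and also the setup for the trouble that follows.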

Suppose I've written a nifty utility that pops up a calendar whenever the user presses ctrl-alt-C, and you've written a nifty utility that pops up a calculator whenever the user presses ctrl-alt-C.  Several things can happen if both of our utilities are installed:
  • Maybe mine was installed last, so that the IRQ is vectored to my handler.  You'll see a calendar when you hit ctrl-alt-C, and you may or may not see anything else
    • Most likely my handler will "eat" the magic keypress by only popping up the calendar,
    • but it might choose to go ahead and chain to whatever handler was there before.  In that case, your handler could also get called, depending on whether any other handlers were installed between ours, and what they do.
  • And likewise, of course, the other way around.
In other words, we have what is technically called "a mess" (or several other things you might imagine).  If your handler is installed last, it owns the world -- or at least the IRQ it handles.  If not, well, all kinds of things could happen, but a likely one is customers calling up saying "I installed your lousy utility and it doesn't work!"
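
The same toy model shows why install order matters.  In this sketch (function names make_popup and press_magic_key are mine, and handlers are plain Python closures rather than real interrupt routines), two utilities claim the same key sequence, and what the user sees depends on who was installed last and on whether handlers chain after handling the key:

```python
from typing import Callable, List

Handler = Callable[[str], None]

MAGIC = "ctrl-alt-C"


def make_popup(name: str, previous: Handler, log: List[str], chain_after: bool) -> Handler:
    """Build a handler that pops up `name` on the magic key."""

    def handler(key: str) -> None:
        if key == MAGIC:
            log.append(f"{name} pops up")
            if not chain_after:
                return  # "eat" the keypress
        previous(key)  # chain to whoever was installed before

    return handler


def press_magic_key(install_order: List[str], chain_after: bool = False) -> List[str]:
    """Install utilities in order, press the magic key once, report what happened."""
    log: List[str] = []

    def base(key: str) -> None:
        log.append(f"buffered {key}")

    current: Handler = base
    for name in install_order:
        current = make_popup(name, current, log, chain_after)
    current(MAGIC)  # the vector points at the last one installed
    return log


print(press_magic_key(["calculator", "calendar"]))
print(press_magic_key(["calendar", "calculator"]))
print(press_magic_key(["calculator", "calendar"], chain_after=True))
```

In the first two runs only the last-installed utility ever sees the key.  In the third, both pop up and the key still lands in the buffer, which is a different sort of mess.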

The inevitable consequence: Every utility you bought would implore you to please, pretty please make sure that it's the last one in the AUTOEXEC.BAT script called at startup.  Or, more conveniently but also worse if you're trying to rein in this chaos, its handy installation script would edit AUTOEXEC.BAT to make sure it was the last one to run -- until the next utility with such an install script came along, or until you hand-edited AUTOEXEC.BAT to try to fix some conflict by moving some other utility to the bottom.

Ah, those wacky, wild and carefree days of the PC revolution.  Good thing this sort of thing doesn't happen any more in our modern, wonderful web.world.


Now where was I before this trip down memory lane?  Ah, right.  Cleaning up someone's system after a couple of shiny-looking downloaded "utilities" reset the default browser, hijacked the search bar to point at a different search engine and left droppings in the startup folder offering to re-install something almost but not quite deleted.  Oh ... and fixing a driver that wasn't up to date.

Ah, progress.