Monday, March 29, 2010

Needles, haystacks and French cooking

If I had to describe this blog in one word, it would be unfilmable, so I'm not surprised Field Notes wasn't the first blog to be made into a movie. Nor am I particularly surprised that Julie Powell's The Julie/Julia Project would be the lucky one, not because I knew and loved it before it was adapted -- like much of the world, I'd never heard of it -- but because why not? It's engagingly written and it has a good hook: Julie Powell sets out to cook all 524 recipes in Mastering the Art of French Cooking in 365 days (one might call that a gimmick, but one doesn't stick with a mere gimmick for a year solid, come rain or shine). Interleave the story of Powell writing the blog and getting published with Child's story of getting her own book published, throw in a more than capable ensemble cast, and you've got yourself a nice little movie.

Still, why that particular blog? Loads of other blogs have made it to print and plenty depict more dramatic events and colorful characters. Some, if memory serves, were even created with the express purpose of producing a screenplay. One explanation leaps out: A good dose of sheer dumb luck.

That's not a criticism. It's just the lay of the land. There are a huge number of blogs out there. There are a lot of good blogs out there, many, many more than could be put into print by a major publisher, much less filmed. Connecting any particular one up with a major studio is asking to find a particular needle in a very large haystack full of other needles. Just go with me on the image, OK? Getting some blog filmed is easier: It's asking to pick any needle out of a haystack full of ... well, something like that.



Yes, I did watch (a good portion of) Julie and Julia. What can I say, given my previous selections? I was told there were car chases involved.

Thursday, March 25, 2010

Anonymity at the source

One of the longer and more intricate threads on this blog has been anonymity. One of my early aha! moments in tracing that thread was realizing that being anonymous comes of having other people who could be you. Perhaps more precisely, saying that an anonymous person did something means saying that there are a number of people who could have done it. The more the better. Later, I learned that this concept has been formalized under the name of anonymity set.

I also learned that, since anonymity can have value, there are economic considerations involved.  If someone is going to willingly belong to an anonymity set which could be associated with some nefarious deed, they'll want to be compensated for that risk. Following this through, it would appear to be fairly difficult to get a robust and secure anonymous network set up.

These two insights gave me some confidence in understanding the broad outlines of anonymity in general, and some assurance that anonymity on the web has much in common with anonymity in the world at large. However, there's at least one case in the real world that doesn't seem to carry over well to the web: radio.

A radio receiver is passive. Absent fairly high-powered eavesdropping equipment, no one knows what station I'm listening to at any given time. Web browsing is active. If I'm listening to web radio, the server knows exactly what content it's sending where. If I want to listen anonymously, I have to associate myself with a set of other people who also might be listening to my particular station and then obscure my connection with the server so that any of us might be the one accessing that particular content.

With radio, even if I am the only person in the world listening to radio, no one need know I'm listening and I could still be listening to anything. The only way to know that people aren't listening to some particular content is to crack down on possession of receivers and to go after the broadcasters, who may be out of jurisdiction. Both of these are done, but not entirely successfully.

The only analog I can think of on the net would be to stream everything to everyone and leave it to the clients to filter out exactly what was of interest. This pushes anonymity back to the source. Instead of having a number of people who could be listening to a particular channel, the necessary confusion comes from not knowing which channel a particular listener might be listening to. As with radio, even if there were only one client in the world, there would be no way to tell just what that client was accessing. Keeping the source of the content anonymous is still an exercise in the familiar sort of anonymity, but that much is the same in the real world.
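To make the contrast concrete, here is a toy sketch in Python. Everything in it (the channel names, the request logs, the function names) is made up for illustration and is not any real streaming protocol. It only shows what the server's log can and cannot record in each model, at the cost of sending every channel in the broadcast case.

```python
# Toy model only: the channels, logs and function names are illustrative
# assumptions, not part of any real streaming protocol.
CHANNELS = {"news": "headlines...", "jazz": "saxophone...", "talk": "opinions..."}


def point_to_point(request_log, client, channel):
    """The server sends only what was asked for, so it learns the choice."""
    request_log.append((client, channel))
    return CHANNELS[channel]


def broadcast(request_log, client):
    """The server sends every channel; the log records no choice at all."""
    request_log.append((client, "*"))
    return dict(CHANNELS)


def listen(stream, wanted):
    """Client-side filter: the choice never leaves the client."""
    return stream[wanted]


p2p_log, bcast_log = [], []
point_to_point(p2p_log, "alice", "jazz")
heard = listen(broadcast(bcast_log, "alice"), "jazz")

print(p2p_log)    # the server knows alice picked jazz
print(bcast_log)  # the server knows only that alice connected
```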

Since this comes at the price of bandwidth, it will generally not be an attractive option. This is one place where the distinction between true broadcast (radio) and point-to-point (the net) really matters. The closest real-world example of radio-like anonymity I can think of -- and I may well be missing something -- would be Usenet news [If you're reading this now, there's a good chance you've never used or heard of Usenet.  Technically, Usenet still exists as of this writing.  Nonetheless, I've changed the verbs in the next paragraph to past tense -- D.H. Sep 2018].

Depending on the settings, the news server at your particular site might well have been grabbing the entire hierarchy, from sci.math to alt.you.really.dont.want.to.know. Anyone who trusted (or controlled) the news server could then read whatever they wanted. If your news server was owned by your academic department, you were probably on solid ground, but if you were using a web interface to access some public server, you were in the same spot as with anything else on the web.

I should say here that, as always, technical considerations are not the only ones that matter. In practice, the web seems to be a fairly open place with reasonable assurances of privacy. Likewise, in practice there is never a foolproof guarantee of anonymity, on the web or off.

Wednesday, March 24, 2010

New tag: Division of labor

I wouldn't normally devote a post to announcing a new tag; they come up all the time in the natural course of things. However, I've recently found another common thread running through several posts: The division of labor between humans and computers.

One of the lessons of early AI work was that there's not a lot of overlap between what humans naturally do well and what computers naturally do well. I say "naturally" because much of the work in the ensuing decades has been in enabling machines to do things that don't map naturally to their capabilities.

For example, it's not hard to program a computer to calculate a million decimal digits of pi. It takes some cleverness to produce, say, a billion digits reasonably quickly, but the basic problem is not that hard. On the other hand, it's quite hard to get a machine to recognize faces or walk without tripping over, things which are easy for us.

One crucial aspect of engineering is making the best use of the resources you have. If your resource is a computer, try to put it on problems that involve crunching large amounts of data, not, say, perception, judgment, natural language processing or recognizing objects in the natural world. Machines can deal with those problems, too, to various degrees, but not nearly as easily as we can.

I've called this theme dumb is smarter. Division of labor is complementary. It has to do with putting humans in the loop so that the machines only have to do what they're good at.

Another division of labor

Knowledge Generation Bureau (yep, KGB) has been around for a while now, but their latest TV campaign has been visible enough that even I caught it. When you've got the Baldwin brothers working for you, all things are possible.

There are a few key differences between KGB and the various web-based search engines:
  1. KGB is phone-friendly. You text them a question -- any question. They send an answer in concise English, not a list of links.
  2. KGB is only semi-automated. Typically, there will be an actual person involved at some point in producing your answer.
  3. KGB charges $0.99 per question.
Item 1 clearly distinguishes KGB from search engines, regardless of how the answers are found. A list of links just isn't a great fit for a phone, even a recent-vintage smart phone. The promise of answering any question also separates them from a text-based interface to a search engine. Item 3 separates KGB from the various ad-supported answers pages (and from vanilla search engines). Given that, and the longevity of all three flavors so far, I doubt that any is going to supplant the others. Sometimes a quick list of links is the right thing -- particularly if you don't know exactly what you're looking for -- sometimes text Q&A is good enough to pay for, and sometimes you're just curious and might rather wait for a horde of strangers to ponder your question.

What originally prompted me to post, however, was item 2. Clearly, KGB must have a well-developed database to help answer routine questions quickly [or ... perhaps they use one of the major search engines? -- D.H. Dec 2015], and I'd assume their business model depends heavily on being able to answer most questions with little or no human involvement. But to be able to deliver coherent answers to arbitrary plain English questions quickly and accurately, there has to be a human in the loop at some point. The trick is to tilt the balance as heavily as possible toward the machine. Another effort along those lines is Pandora, which relies on humans for perception and (some) judgment, but leaves the heavy lifting to the machines.

I'm not really inclined to shell out 13 bucks to try out the baker's dozen on KGB, but the results might be interesting.

[As of May 2015, KGB seems to still be going strong.  Their home page seems to give answers for free, with ads, but their phone service still charges --D.H.]

Tuesday, March 23, 2010

What's TimBL up to these days?

Sir Timothy John Berners-Lee, pretty much universally known as TimBL, is one of the few people who can actually lay claim to having invented a major new technology. By defining HTTP, URLs and HTML, and by implementing the first web server and browser and the first web page, he created the foundations of The Web As We Know It. Since then he has spent a great deal of time directing the W3C, pondering the semantic web and web architecture in general, and otherwise moving the web.ball forward.

That was about the extent of my knowledge until I ran across a press release stating that British PM Gordon Brown had appointed him to lead the new Institute of Web Science. From the press release it looks like this will be at least to some extent an extension of the semantic web work, but will also investigate "other emerging web and internet technologies", provide technical guidance to HM government and work on opening government data (MP expenses, anyone?) to public use.

So, good stuff, we'll see what comes of it.

Brown also announced that all Britons would have "high-speed broadband" by 2020. Not just broadband, but high-speed. Whatever that means.

[That was a couple of governments ago, and I see no mention of any Institute of Web Science on TimBL's personal page.  Broadband in the UK seems to be faring rather better, but I'm not really that familiar with the topic. -- D.H., May 2015]

Personal note: I had the pleasure of (sort of) meeting Sir Tim a few years back when I was sitting on standards committees -- a saga in itself. I was very new to the game and had volunteered to scribe (committee-speak for "take notes"). The W3C Technical Architecture Group had stopped by to look in on what our group was doing, so whenever a member of the TAG whose name I couldn't remember made a comment, I notated it as having been said by "TAG". After a while someone gently pointed out that TAG's name for much of what I'd been scribing was Tim Berners-Lee. Fortunately I was too busy typing to sink into the floor, and I have since been assured that TimBL himself was unlikely to have been offended. At least I hope not.


Monday, March 22, 2010

Counterfeit negotiations

You may not have heard of ACTA. I don't believe I had until I heard a radio piece on it a few days ago. Certainly it hasn't been on my radar screen and I get the distinct impression that as far as the parties involved are concerned, the fewer radar screens it's on, the better.

ACTA stands for Anti-Counterfeiting Trade Agreement. While "counterfeiting" might suggest coins and bills, ACTA appears to be aimed at counterfeiting of goods (for example, generic drugs) and of content such as music and movies. I say "appears" because since its inception in 2008, ACTA has been negotiated in secret, albeit with occasional substantive leaks.

While I can see the common thread here, it seems a bit of a stretch to lump piracy, where no one is really pretending that an unauthorized copy is anything other than that, in with counterfeiting, which tries to pass off something illegitimate as legitimate. Add to that the overall lack of transparency as to what's being negotiated or even exactly who is doing it on whose behalf, and it's very easy to see why the EFF and others have been strongly opposed to the whole business from the outset.

Whatever one's opinions on intellectual property and the role of the internet in distributing it, and no matter whether ACTA is in fact an attempt to make major changes to IP policy or just an alignment of practices among the entities involved, I can see no good reason to hold such a negotiation in secret. At the very least, doing so gives the net.world one more reason to believe the RIAA and company are either acting in bad faith or stunningly clueless. That helps no one.

[ACTA was signed by several countries in 2011, but only ratified by Japan, meaning that no one else is actually legally committed to following it, and leaving it a bit moot whether Japan is following it in its relations with itself. -- D.H. May 2015]

Wednesday, March 17, 2010

Common knowledge in social networks

Something not particularly profound, but still somehow interesting (to me, at least): I'm not a member of FacePage or MyBook or whatever, but an important part of linking with someone is sharing with them whom you're linked with besides them. From that point on, they also know whom you become linked with -- or unlinked from -- and when. Depending on the situation, that sort of information might be very interesting in and of itself.

Nothing wrong with that, given that it's a two-way street and everyone understands and accepts the rules from the outset, but it might conceivably influence one's actions. You know my connections and the changes to them, but I know yours, and I know you know mine, and you know I know yours, and so forth. What I'm fumbling at here is that this looks like a case of common knowledge, knowledge which is significant not only for its own sake, but because everyone knows everyone knows it.

Saturday, March 13, 2010

More cheesy movie goodness

Since I've admitted to watching Ghostbusters II recently, I suppose it will do no harm to admit to having watched Hackers as well. Hey, I missed it the first time around. Skipped it, actually, on the grounds that I would have a hard time accepting its depiction of computers and networks or the inevitable Markov chain of random technical terms.

I was right about the cargo cult computing but, being perhaps older and wiser, I was much more able to bear it. It's a fun movie to look at, with plenty of whizzy graphics. Come to find out much of it was done with motion-controlled models, as the CGI of the time would have looked too artificial. In movie logic, dancing equations and morphing false-colored talking virus heads on 90s-era hardware are the most natural thing in the world, because they provide atmosphere.

The movie does a surprisingly good job of capturing the gray hat hacker ethos, however much it fails to convey the paint-drying dullness of most forms of hacking to (non-hackish) spectators. I've always found it unfortunate that "hacker" has come generally to mean not hacker programmer but black hat hacker or even script kiddie (fortunately, we have "geek" these days covering roughly the same territory that "hacker" used to). On that front Hackers may have done more good than harm. They also manage to mention the Dragon Book, so props for that.

I didn't pay much attention to the plot because, well, it didn't seem like that kind of movie, but I did react to two of the film's more famous howlers. One was the RISC/CISC confusion in the Pentium scene (if you want to know more detail than that, you should probably just watch the film). The other was when the main characters gush over a 28.8bps modem.

Ah yes, this is a blog about the web, isn't it? I'm getting there ...

As IMDB duly points out, they meant 28.8kbps, not 28.8bps. But that's just a typo and I heard it as 28.8kbps at the time. What caught me was that, more than the cassette tapes, minifloppy disks, haircuts and rollerblades, it's the bandwidth that sends the whole thing horribly off the rails. Never mind the rest of the implausibilities. They're supposed to be doing all this over 14.4 or slower? Oy. Guessing passwords, sure, if they're weak and easily guessed, but if they ever have to download more than a few hundred K, they're doomed.
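For a rough sense of scale, here is the back-of-the-envelope arithmetic as a small Python sketch. The 300 KB file size is just an assumed stand-in for "a few hundred K", and the calculation uses the raw line rate with no allowance for protocol overhead or line noise, so real transfers would be slower.

```python
def download_seconds(size_bytes, line_rate_bps):
    """Ideal transfer time: total bits divided by the raw line rate."""
    return size_bytes * 8 / line_rate_bps


# Assumed example size: 300 KB, standing in for "a few hundred K".
size = 300 * 1024

for rate in (14_400, 28_800):
    print(f"{rate} bps: about {download_seconds(size, rate):.0f} s")
# 14400 bps: about 171 s
# 28800 bps: about 85 s
```

So even in the best case, that one smallish file ties up the line for the better part of three minutes at 14.4.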

It's the bandwidth that makes the modern web what it is. Thanks to YouTube and company, the video of the villain's face on the hero's laptop screen seems perfectly plausible until you remember what size pipe it's all supposed to be going through. In fact, much of the cyberstuff, except for the phreaking (the technology has moved on) and the social engineering (ageless and evergreen), is probably more plausible now than it was then, thanks to the web.

Monday, March 8, 2010

Clip 10-cent coupons with this simple $300 device

I saw a piece on the local news recently about smart phone apps that will help you with your shopping. Use your phone's camera to scan a bar code and it will tell you if it knows of a nearby store with a better deal (its database of stores is limited). It will also send you an image of a coupon, if there is one available, to scan at the cash register.

Ten years from now it will all sound laughably primitive (what, you had to have your phone show the scanner a bar code image, all to transmit a dozen or so bytes?). Right now it's pretty slick, but when I heard about it, some dim light went on in the back of my head: It's not the image on the screen, or the printed image on a paper coupon that matters. It's the magic number in the bar code ... oh, right ... the very first post to this blog was about just that notion.
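As a small illustration of the number mattering rather than the picture, here is a Python sketch that validates a 12-digit UPC-A code from its digits alone. Treating the bar codes in question as UPC-A is my assumption (coupons may well use other symbologies), and the function names are made up for the example. The check-digit rule itself is the standard one: triple the sum of the odd-position digits, add the even-position digits, and pick the digit that brings the total up to a multiple of ten.

```python
def upc_a_check_digit(first_eleven):
    """Compute the 12th (check) digit from the first 11 digits of a UPC-A code."""
    digits = [int(c) for c in first_eleven]
    if len(digits) != 11:
        raise ValueError("expected exactly 11 digits")
    # Positions 1, 3, 5, ... (indices 0, 2, 4, ...) are weighted by 3.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return (10 - total % 10) % 10


def is_valid_upc_a(code):
    """True if the string is 12 digits and the last one is the right check digit."""
    return (
        len(code) == 12
        and code.isdigit()
        and upc_a_check_digit(code[:11]) == int(code[11])
    )


print(is_valid_upc_a("036000291452"))  # True
print(is_valid_upc_a("036000291453"))  # False: wrong check digit
```

Nothing here cares whether the digits came from a printed label, a phone screen or a keyboard.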

It's probably also worth noting that the groundwork for this was laid quite a while ago, before the internet and before music or movies became digital, namely the introduction of bar codes in supermarkets. A pioneering piece of modern digitization, though of course digital communication itself is far, far older.

Thursday, March 4, 2010

What's my mother's first dog's maiden name?

I can remember my login, but I can't remember my password. "No problem," says the site, "Just tell me that secret you told me when you set up the account. What was your first dog's name?" So I type in ... oh wait a minute, I can't tell you that ... and the site sets me up with a new password. Pretty slick, yes?

Well, not quite. Since lots and lots of sites are doing this, I've got two main choices:
  • Use the same small set of questions and answers everywhere.
  • Use different questions and/or answers.
Using the same everywhere means not having to remember as much. If I make up a bunch of answers and/or use different ones everywhere, then I have to remember what I made up. Basically, I'm up against the exact same problem as with passwords themselves, except now there are two weak spots for attackers to exploit: My actual password, and the questions guarding my password.

Of the two the password is probably a bit more secure, assuming I haven't used one of the 500 worst passwords of all time (unfortunately, some people seem to confuse scatology with security). City of birth? There's a good chance it's one of the top 100. Mother's maiden name? Not exactly classified information. First dog's name? There are a lot of Maxes and Buddies out there.
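To put rough numbers on that comparison, here is a small Python sketch that converts "how many possibilities must an attacker try" into bits. The specific sizes are assumptions for illustration: an answer drawn evenly from 100 likely cities, a password taken from a list of 500 common ones, and a password of 8 characters chosen at random from letters and digits. Real answers and passwords are not evenly distributed, so treat these as ballpark figures.

```python
import math


def bits(possibilities):
    """Guessing difficulty in bits, assuming every possibility is equally likely."""
    return math.log2(possibilities)


# Assumed, illustrative sizes of the space an attacker has to search.
city_of_birth = bits(100)        # one of the top 100 cities
worst_password = bits(500)       # one of the 500 worst passwords
random_password = bits(62 ** 8)  # 8 random characters from a-z, A-Z, 0-9

print(f"city of birth:   {city_of_birth:.1f} bits")    # 6.6 bits
print(f"worst password:  {worst_password:.1f} bits")   # 9.0 bits
print(f"random password: {random_password:.1f} bits")  # 47.6 bits
```

On these assumptions the secret question is weaker than even a bad password, and an attacker only needs to get past whichever of the two is easier.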

Now I'd say most sites don't let you reset a password directly. Typically they'll email some gibberish to the address you registered with and you use that to log in and reset the password for real. But in that case, why bother with the rigmarole? Whatever real security there is comes from putting email in the loop.

All in all, it's a classic example of a more complex system looking more secure than a simple one, but actually being less secure.