Tuesday, September 29, 2009

AI and Cloud Computing -- do they complete each other?

The problem with useful things in software, and perhaps useful things in general, is that they tend to become magnets for anything and everything remotely related. And then some. The old saw about AI is that it's everything we don't know how to do yet with computers. Larry Ellison's now-famous (instantly famous, really, rank having its privileges) rant fills in the rest of the picture:
The interesting thing about cloud computing is that we've redefined cloud computing to include everything that we already do. I can't think of anything that isn't cloud computing with all of these announcements.
In short: If it can't be done, it's AI. If it can, it's cloud computing.

RockMelt déjà vu

Writing for The New York Times, Miguel Helft leads a fairly skeptical article on the Marc Andreessen venture RockMelt with "It has been 15 years since Marc Andreessen developed the Netscape Internet browser that introduced millions of people to the Internet" (for a more nuanced picture, more or less consonant with that shorthand, see the Wikipedia article on Mosaic). Helft goes on to opine that "Mr. Andreessen appears to want a rematch" in the browser wars.

Given the current glut of browsers, and given that Google itself has made only a small dent in the browser market with Chrome [a rather larger dent now --D.H. May 2015], which is shipping code and not half bad by the way, it's only natural to wonder what Andreessen and company expect to accomplish. I could be wrong, as I certainly have been before, but I would expect to see one of two things:

Déjà vu I: The RockMelt team sets out to Do Browsers Right This Time. Browsers have become de facto operating systems, complete with the ability of one rogue script to grind the whole thing to a halt, so it's plausible that a redesign from a clean sheet could do better. Every time I've seen this trick tried, little things like release schedules and compatibility with the messy outside world intervene. This is a particular stumbling block for companies in the placeholder-home-page stage, where the world is still young, clean and pretty.

Not that, say, Opera or Chrome or <your favorite browser that I'm forgetting> haven't had some measure of success, just that it's not so clear what our new protagonists are going to come up with that the dozens before them have missed.

Déjà vu II: From what I can glean from the article, RockMelt is not trying to be a general-purpose browser. Andreessen is also on the board of Facebook, and RockMelt is explicitly aimed at supporting social networking. This has a number of advantages, particularly the relative lack of competition and the chance to build on a successful existing brand.

But do I really want to use a different browser for socializing than for checking the weather? I'm probably not the right person to ask, since my social networking and my web use hardly intersect, but my personal answer would be "no". My guess is that people will either shrug and continue to use their existing browser for everything, or the new browser will offer more and more plug-ins and apps so it can act just like a regular browser. Which brings us back to item I.

Either way, I can't shake the feeling I've seen this movie before. Didn't AOL use to have its own browser or some such?

[In the end, they were bought by Yahoo! in 2013, evidently not such a bad outcome for them --D.H. May 2015]

Saturday, September 26, 2009

Wikipedia, voices and objectivity

In some sort of ideal world, we get our information purely from objective sources, apply cool judgment and act accordingly. In this world the ideal news article or reference text doesn't appear to have been written by anyone. It merely transmits facts, and only facts, to the reader directly and transparently.

This is a caricature, of course, but it's fairly close to what my high school journalism teacher taught, and it's woven deeply into Wikipedia's fabric under the label of Neutral Point of View (NPOV). On the other hand, Wikipedia is almost by definition a work in progress, constantly updated by a near-anarchy of mostly pseudonymous if not anonymous editors. No one can stop you from saying that hard-boiled eggs must only be cracked on the big end, and no one can stop me from correcting your heinous misconception. I mean, from expressing my personal opinion on the matter.

But it all works remarkably well, for several reasons:
  • Wikipedia is inclusive by nature. An encyclopedia aims to be all-inclusive to begin with. An online encyclopedia, without the limitations of physical ink and paper, doesn't have to worry about running out of space. More important, though, is the huge number of contributors. All the paper in the world is useless without someone to write on it. And revise. And re-revise. And so on. This is not to say that Wikipedia includes everything willy-nilly. There are definite policies for what can and cannot be included, but they're aimed towards notability and not someone's idea of correctness.
  • Guidelines like NPOV really do matter because they're supported by a strong culture. The community has long since reached a critical mass of active members who take Wikipedia policy seriously and act to reinforce it and to repair breaches, even if that means tediously reverting an endless stream of "MY MATH TEECHUR SUX DOOD" and worse vandalism.
  • It's generally easy to tell when someone is injecting opinion. It's even easier to tell when two (or more) people are trying to inject conflicting opinions. The occasional jumble of "Some authorities [who?] insist that ... however so-and-so[17] has stated that ..." doesn't necessarily make for smooth or pleasant reading, but it does tend to make clear who's grinding which ax.
  • Similarly, it's easy to spot a backwater article that hasn't seen a lot of editing. This is not necessarily a bad thing. Obscure math articles, for example, tend to read like someone's first draft of a textbook, full of "Let x ..." and "it then clearly follows that ..." The prose may be a bit chewy, but whoever wrote it almost certainly cared enough to get the details right. Articles on obscure bands generally read like liner notes and tend to slightly hype that band's achievements and their home-town music scene. That's fine. Take it with a grain of salt and enjoy the tidbits you wouldn't have heard otherwise.
  • Likewise, it's easy to tell when an article has had a good going-over. Articles on "controversial" topics may or may not have had their "on the other hand ... on the other other hand ..." back-and-forth smoothed out, but they do tend to accumulate copious footnotes. Just as one could argue that forums exist to generate FAQ lists, one could argue that such articles exist to gather references to primary sources.
Whenever I find myself too far out on my "web changes nothing" limb, it helps to consider Wikipedia and realize that there's really nothing quite like it. But it's also important, I think, to realize that Wikipedia works so well not because it works perfectly -- it clearly doesn't -- but because it's robust in the face of its imperfections. This is a property of good distributed systems in general, the distributed system in this case comprising not just the author/editors, but the reader taking Wikipedia's nature into account.


P.S.: While fetching up the link for NPOV above, I first tried "npov", figuring it would redirect to the right place, WP:NPOV, since I can never remember the right prefix for the special pages. Oddly enough, if you don't capitalize it the right way, npov redirects to Journalism. Not sure I buy that, but it's an interesting angle.

Thursday, September 24, 2009

Lost in a web of stars

The day job is less busy now. In the inevitable letdown period, my attention has wandered skyward, to the Astronomy Picture of the Day. Along with the Galaxy Zoo and other random sites, the APOD largely satisfies my desire to learn a little astronomy without, um, actually going out and looking at the sky. Besides, I learn more this way, or at least I learn things that just looking up at the sky gives little hint of. That's why they have all those telescopes and high-tech instruments, after all.

For example, while you'll often see pretty posters of the Orion nebulae or the Trifid nebula, it's another thing entirely to see them in context and realize that, were our eyes sufficiently sensitive (and our surroundings sufficiently dark), even a clear, dark sky would be cloudy. And of course, Van Gogh's Sterrennacht (The Starry Night) springs to mind.

In the night sky in most places, we see very little beyond local stars. In major cities it can be hard to see even that much. This limited view reveals very little about the universe at large. Recent theories hold that the universe was born out of some sort of quantum foam and still reflects that structure. What are they talking about? If you zoom out far enough, it starts to make sense.

How can astronomers develop theories of how stars and galaxies form when the timescales involved are much, much longer than anyone's lifetime? They look at lots and lots and lots of stars and galaxies. On a clear, dark night the unaided eye can pick out a few thousand stars. Galaxies have stars by the billion, and there are plenty of galaxies. The Hubble Deep Field, for example, covers about two millionths of the night sky and comprises about 3,000 galaxies. Even the Galaxy Zoo's original million are only a small sample of what's out there.

From common experience, stars (except our sun, of course) are little pinpricks of light. Science tells us that's because they're mind-bogglingly far away. Only objects in the solar system are close enough to appear as anything more than points [well ... you have the Sun, the Moon, the Andromeda galaxy, the Magellanic clouds and the occasional comet ... but let's just agree that you'll see a lot more with a telescope, especially a big one or one in space --D.H. Dec 2015]. But with a good telescope, you can not only tell stars from mere points of light and see that they're round, you can see one that isn't and pick out individual stars in a galaxy far, far away (well, actually a pretty close one by galactic standards).

With special equipment astronomers can see colors the eye can't, as in this lovely image of the Andromeda galaxy in ultraviolet (make sure your cursor isn't over the picture), or pick out otherwise hidden features and reveal the complexity of the processes at work in a nebula, or even show us what's right in front of our faces.

This is a really small sample of the APOD archive. Wander through it yourself and you'll find all kinds of wonders and not a few oddities. But beyond the pretty pictures, the real value lies in the descriptions, written by professional astronomers. It's one thing to read in a science article about this or that theory or process, quite another to see a principle illustrated by a real live picture from a real live observatory accompanied by a clear, concise paragraph rich in links to further pictures and other resources.

This is the kind of thing the web was made for. Certainly it's long been possible to subscribe to an astronomy magazine or go to the local library and get information of a similar quality, but the web enhances the experience considerably.

Friday, September 18, 2009

The Google Books settlement

After three years of negotiations and legal wrangling, Google, the US Authors Guild and the Association of American Publishers have reached a settlement regarding the rights to the millions of books Google has been digitizing. Google's announcement of the same is downright giddy in tone. They're "delighted". Sergey is calling it a "win-win".

Part of this bubbliness is just Google. It's always taken a fairly upbeat view of the world and seen itself as not just a company, but a Force For Good -- or at least Not A Force For Evil. However, any time there's a long-fought-over settlement with lawyers involved, one would expect all parties to be slightly disgruntled. That this isn't the case sets my spidey sense tingling.

Clearly Google came out of this well. The only other plausible explanation is that it came out badly and is scrambling to save face, but that doesn't wash with anything else I've heard. So I have to give a nod to the EFF and others who have expressed misgivings, and hope that the nice folks at Google use their awesome power for good.

If nothing else, this will be a great chance to get some data on the famous "long tail".

Sunday, September 13, 2009

Global sneakernet? Not likely.

Previously on Field Notes: A pigeon is found to outperform a broadband network. This turns out not to be an isolated or implausible incident. Further, it looks like Moore's law is on the side of the pigeon; the amount of information a pigeon could reasonably carry is increasing much faster than broadband speeds.

How far can this go? Will the skies soon be clogged with data-laden birds (and the streets below with their by-products)? Will startups soon be offering a higher-tech equivalent, perhaps autonomous mini-drones carrying terabytes or more at a time?

I'm guessing not, and for one reason that should be pretty clear: latency. Deutsch's fallacies strike again. The physically-moving-medium approach has always had its niche, namely when volume is more important than latency. This is one reason the mails still run -- another being that sometimes you just need to send a physical object. Most of the time, though, latency wins. If you're sending the message "attack at dawn", it hardly matters whether you could get the entire contents of your video library halfway around the world by noon the next day.

Just where the cutover occurs depends on a variety of factors. For an interesting case, see the earlier note on kinescopes at the bottom of this post.
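For concreteness, here's a toy breakeven calculation in Java. The 45KB/s uplink figure is borrowed from the pigeon post below; everything else is made up for illustration:

    // The wire wins on any payload it can finish sending before the
    // courier arrives; above that size, volume beats latency.
    public class Cutover {
        public static void main(String[] args) {
            double wireBytesPerSec = 45 * 1024;   // ~45KB/s uplink
            double courierSeconds = 2 * 60 * 60;  // a two-hour pigeon flight
            double breakevenBytes = wireBytesPerSec * courierSeconds;
            System.out.printf("Wire wins below ~%.0fMB; courier wins above.%n",
                    breakevenBytes / (1024 * 1024));  // ~316MB
        }
    }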

Thursday, September 10, 2009

Classic software engineering and the mighty pigeon

A diligent member of my army of stringers, researchers, fact-checkers and miscellaneous hangers-on forwarded me a BBC article on a contest between South Africa's broadband carrier and a carrier pigeon. The pigeon carried a 4GB memory stick 60 miles in 2 hours. In the same amount of time, the broadband connection had transmitted 2% of the data. Pigeon 1, broadband nil.

Is this really news? First, how fast was the broadband connection? 2% of 4GB is 80MB. 80MB/7200s = 11KB/s. OK, that's pretty slow. For comparison, I just ran a speed test against a server about 60 miles away. The download speed was about 14Mb/s, or about 2MB/s. That would bring me my 4GB in about 40 minutes. Pigeon 1, broadband 1.

But wait. I can only download as fast as the other end can upload. If the other end has my broadband connection, it can upload at about 360Kb/s or about 45KB/s. You read that right. My upload speed would appear to be about a fortieth of my download speed. That's about four times the speed of the South African connection, meaning I couldn't even get 10% of the data transmitted by wire before the pigeon reached its destination. Pigeon 2, broadband 1.
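If you'd rather let the computer keep score, here's the same back-of-envelope arithmetic as a small Java program, using the speeds quoted above (a sketch, not a benchmark):

    // A pigeon carrying a memory stick is, in effect, a link with very
    // high bandwidth and very high (two-hour) latency.
    public class PigeonMath {
        static double hours(double bytes, double bytesPerSec) {
            return bytes / bytesPerSec / 3600;
        }
        public static void main(String[] args) {
            double payload = 4e9;  // the 4GB memory stick
            System.out.printf("South Africa, 11KB/s: %.0f hours%n",
                    hours(payload, 11e3));  // ~101 hours -- pigeon wins
            System.out.printf("My downlink, 2MB/s: %.1f hours%n",
                    hours(payload, 2e6));   // ~0.6 hours -- wire wins
            System.out.printf("My uplink, 45KB/s: %.0f hours%n",
                    hours(payload, 45e3));  // ~25 hours -- pigeon wins
        }
    }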

Hmm ... my ability to send large quantities of data -- movies, for example -- to the world at large is severely limited, but my ability to pull said data down quickly from whoever can make it available isn't. And my internet connection is provided by a Cable TV/old-school media company ... but I digress.

When I got the link about the South African pigeon, I immediately thought of Jon Bentley's classic Programming Pearls and More Programming Pearls. If you're interested in software engineering you could do worse than to stop reading this right now, go hunt down copies of these books and inhale them. The code samples are variously given in C, C++ and a procedural pseudocode reminiscent of Old High Algol, but just as Chaucer is worth the trouble of reading in the original, so too Bentley.

If you don't want to hunt down paper copies, check out the web site [The web site doesn't carry the entire book, but it's still well worth visiting. The book itself has been extensively updated in the recent second edition. There's even a little Java here and there, but only a little. The Labs is still the Labs, after all].

I assume you're back now and along the way noticed problem 11 in column 1 of Pearls:
11. In the early 1980's Lockheed engineers transmitted daily a dozen drawings from a Computer Aided Design (CAD) system in their Sunnyvale, California, plant to a test station in Santa Cruz. Although the facilities were just 25 miles apart, an automobile courier service took over an hour (due to traffic jams and mountain roads) [Ah, Highway 17. Good times, good times.] and cost a hundred dollars per day. Propose alternative data transmission schemes and estimate their cost.
The solution Lockheed came up with?
The computers at the two facilities were linked by microwave, but printing the drawings at the test base would have required a printer that was very expensive at the time. The team therefore drew the pictures at the main plant, photographed them, and sent 35mm film to the test station by carrier pigeon, where it was enlarged and printed photographically. The pigeon's 45-minute flight took half the time of the car, and cost only a few dollars per day. During the 16 months of the project the pigeons transmitted several hundred rolls of film, and only two were lost (hawks inhabit the area; no classified data was carried). Because of the low price of modern printers, a current solution to the problem would probably use the microwave link.
Pigeon 3, broadband 1.

For several obvious reasons I doubt that pigeons are going to be the optimum solution for most high-volume data transmission problems, but it certainly gives one pause to note that a generation after the Lockheed story, in the face of Moore's law and all that, pigeon power is still a plausible solution. At least compared to what passes for broadband here in the States.

Why should this be? Moore's law cuts both ways. In fact, it currently favors the pigeon. The South African test was done with a 4GB memory stick. Sticks of 16GB are now available. Leaving aside the question of whether a pigeon (or two) could carry more than one stick, even a single pigeon with a single 16GB stick could beat my download speed over a 60 mile course.
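Checking that claim the same crude way, and assuming the bigger stick doesn't slow the bird down:

    // Effective bandwidth of one pigeon with one 16GB stick over a
    // two-hour flight, versus the 14Mb/s downlink measured above.
    public class BiggerStick {
        public static void main(String[] args) {
            double pigeonBytesPerSec = 16e9 / (2 * 3600);  // ~2.2MB/s
            double downlinkBytesPerSec = 14e6 / 8;         // ~1.75MB/s
            System.out.println(pigeonBytesPerSec > downlinkBytesPerSec
                    ? "Pigeon wins" : "Wire wins");  // Pigeon wins
        }
    }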

Memory sticks are getting bigger much faster than broadband is getting faster, if only because switching to a larger stick is much, much easier than switching broadband technologies.

This is all reminding me of my early posts on Jim Gray.

Tools of choice

In real life I'm a software developer. That doesn't figure in much here, probably because as far as the web is concerned I'm an ordinary user, not a developer. However, one place I use the web is at work. No, not to browse fascinating articles from the blogosphere, unless the article happens to answer a particularly vexing question I'm dealing with. My web use at work basically boils down to
  • gmail
  • a web-based bug tracking system
  • searches now and then for answers to vexing questions
  • researching and downloading open source software
This last bullet item has significantly changed the way software developers work, at least in the Java corner of the world where I dwell. Your mileage may vary, but for a large and still-growing set of typical problems, downloading a package and using it is likely to be a better option than rolling your own. You can't beat the price, and the time between "that looks interesting" and actually using the package is generally measured in minutes. There's no obligation and if the package is not quite right, the source code is right there.

Some examples from the toolkit I use at work:
  • Java itself and its libraries are now essentially open source.
  • The Eclipse IDE. Now, I realize that IDE wars are to our time what editor wars were to the previous generation (um, that would be my generation, I guess), but Eclipse is the one I happen to use for a variety of reasons. One caveat: Eclipse is not just an IDE. It's really a whole platform. It slices. It dices. It has distros like Linux has distros. If you're not careful you can end up with a bloated mess. If you pick and choose, though, you can end up with a very nice, usable, though still memory-hungry tool.
  • Subversion for version control. Again, other worthy choices are available.
  • JUnit. The value here is not so much the code as the mere fact of putting something out there as a framework for writing unit tests. That said, I've had no complaints about the code.
  • Apache Ant for builds. I actually don't use Ant directly these days, but I rely on it behind the scenes. Having seen one too many Makefiles that ate Chicago, I have no plans to go back to make.
  • Apache in general for a variety of useful libraries, including networking (Mina) and general utilities (Commons).
  • A new favorite for taming Swing: MiG Layout. If you've ever considered fleeing to a tall mountain in Nepal rather than hassle with another mysterious problem with GridBagLayout and its little friends, check MiG out (there's a small sketch just after this list). Your life will become better.
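Since a snippet makes the case better than I can, here's a minimal sketch of the MiG flavor, assuming the MigLayout jar is on the classpath (the form itself is invented for illustration):

    import javax.swing.*;
    import net.miginfocom.swing.MigLayout;

    // A two-column form that would take a page of GridBagConstraints:
    // "wrap 2" starts a new row after every second component, and the
    // column spec right-aligns the labels and lets the fields grow.
    public class MigDemo {
        public static void main(String[] args) {
            JPanel form = new JPanel(new MigLayout("wrap 2", "[right][grow,fill]"));
            form.add(new JLabel("Name:"));
            form.add(new JTextField(20));
            form.add(new JLabel("Email:"));
            form.add(new JTextField(20));

            JFrame frame = new JFrame("MiG demo");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(form);
            frame.pack();
            frame.setVisible(true);
        }
    }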
Naturally this is just a particular, idiosyncratic view of what's out there. If you run Linux (as I do at home), there's the whole GNU/Linux/git/gcc/autoconf/gmake/... toolchain. If you like Perl, or Ruby, or Python, each is its own little universe. Any way you slice it, the amount of stuff out there is impressive.

Back at the blog, is this a real live example of disruptive technology? If so, what is the disruptor? Is it the concept of open source? Is the enabling technology the internet, the web, or some combination of both? How much does it matter that much of the internet and web as we know it rests on open source/free software? Why am I carefully saying "open source" here and not "free"? How many threes are there in a dozen?

All interesting questions except perhaps the last, but not ones I'm going to tackle just now.

[I still use Java, Eclipse and JUnit.  I'd now recommend git over Subversion for version control.  For various reasons, I don't have much occasion to use the rest of the list these days. --D.H. Dec 2015]

Tuesday, September 8, 2009

What changed?

Overheard at a dinner party:

Guest 1: My browser has this feature that remembers what sites I've been visiting, then shows me which ones have changed since I last visited them.

Guest 2: Sounds great.

Guest 1: Yeah, except that most of the time all it means is there are new ads.

Guest 2: Oh.

Guest 1: At least it works with your site, since you don't carry any ads.

(No, the site wasn't Field Notes)

Two similar questions with significantly different answers:
  • Have the bytes representing this page changed?
  • Has this page changed in any meaningful way?
Computers have a pretty good handle on the first but not the second. The bytes can change without the page really changing, not just because of ads but because of trivial reformatting, or an upgrade in some unseen component, or for any of a number of other reasons. Now that everything's all AJAX-y and a page may really just be a script for fetching and displaying the real content, the page can change without the bytes changing.
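The first question, at least, is cheap to answer. A minimal sketch in Java (the class and method names are mine): hash the bytes and compare fingerprints between visits. A new ad, a regenerated timestamp or a reshuffled attribute changes the hash just as surely as a real edit does, which is exactly the dinner-party complaint above.

    import java.io.InputStream;
    import java.net.URL;
    import java.security.MessageDigest;

    // Answers only "have the bytes changed?" -- not "has anything the
    // reader cares about changed?"
    public class PageFingerprint {
        public static String fingerprint(String url) throws Exception {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            try (InputStream in = new URL(url).openStream()) {
                byte[] buf = new byte[8192];
                for (int n; (n = in.read(buf)) != -1; ) {
                    sha.update(buf, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }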

The questions above are central to web caching, a whole big ball of wax which (to mix metaphors) I'd rather not wade into at the moment. Fortunately, if your job is just to make sure bytes get distributed in an efficient manner you can forget about the second question and concentrate on the first. And a good thing, that, otherwise there would be no Web As We Know It.
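For what it's worth, HTTP bakes the byte-level question right into the protocol. A sketch of a conditional GET (names mine again): hand back the validator the server gave you last time, and a 304 response means "same bytes, don't bother".

    import java.net.HttpURLConnection;
    import java.net.URL;

    // Ask the server "has this changed since I last saw ETag X?" --
    // still a byte-level question, but at least the bytes stay home.
    public class ConditionalGet {
        public static boolean changed(String url, String lastETag) throws Exception {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(url).openConnection();
            if (lastETag != null) {
                conn.setRequestProperty("If-None-Match", lastETag);
            }
            return conn.getResponseCode() != HttpURLConnection.HTTP_NOT_MODIFIED;
        }
    }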

If the second question really is important to you, I have one word, son: metadata.

Monday, September 7, 2009

Angling for a spot on the exchange

In the old days, when people traded stocks and commodities face to face and recorded the resulting transactions on parchment with a quill pen, it mattered very much whether you were on the exchange floor or just working through some intermediary. If you were on the floor, you could get your order placed faster. A faster order meant a better price and up-to-date information about market movements. As they say in the biz, old news is no news.

Open outcry exchanges are on the way out now, replaced by shiny new technology. Anyone anywhere can get in the game, set up a trading algorithm, put it on a server and sit back as the cash flows in.

So long, of course, as your server is in the same building as the exchange's servers. Otherwise, your order is going to get there (milliseconds) too late and you'll be edged out by someone with lower latency, who will then be glad to turn around and sell you what you wanted at a slim but persistent profit.
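The physics alone makes the point. A toy floor-on-latency calculation (the distances are illustrative, not anyone's actual setup; light in fiber covers roughly 200 kilometers per millisecond):

    // Distance sets a hard floor under round-trip time, no matter how
    // clever your trading algorithm is.
    public class LatencyFloor {
        public static void main(String[] args) {
            double kmPerMs = 200.0;  // light in fiber, roughly
            for (double km : new double[] {10, 1000}) {
                System.out.printf("%4.0f km away: >= %.2f ms round trip%n",
                        km, 2 * km / kmPerMs);  // 0.10 ms and 10.00 ms
            }
        }
    }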

Score one more for Peter Deutsch.