Thursday, April 30, 2009

Postgraduate physics and the web

At the end of my previous post I implied that while computers and the web can be useful in learning basic physics, they're not essential. After all, the great physicists of history did very well without them. This is all true as far as it goes, but it leaves out an important part of the picture. While classical physics is largely accessible to any careful observer with a good grounding in math, modern physics is pretty much impossible without heavy machinery.

You may or may not need billions of dollars worth of equipment to produce the results you're after, but the odds are very good you'll need a computer to sort them out. And since your collaborators might well be at other facilities far away, you'll probably want a good email connection. In fact, these days, there's all kinds of information out there, scattered all over the place. Maybe one of the major facilities should come up with a way of pulling it all together and making it accessible to the world.

Oh wait. They did.

Undergraduate physics on the web

Last month, give or take, I ran across a page of links to applets designed to demonstrate various principles of physics. The demo has long (probably always) been a staple of physics classes. I vividly remember a couple. In one, interfering laser beams were progressively filtered down to the point where the detector could register individual photons landing left or right.

But the one I really remember involved the lecturer, an iron ball and a chain. The ball hung from the chain, which was anchored far up at the ceiling, centered left-to-right. At rest, the ball hung at maybe waist height. The lecturer carefully brought the ball toward the left, until it was at head height, and placed it directly against his nose. He then let go, noting that it was very important to hold completely still and not to give the ball an extra push. The ball swung slowly to the right, then back to the left. Conservation of energy being what it is, and some small amount having been dissipated by friction, the ball returned just short of its original position, to the relief of all concerned.
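For the curious, the demo is easy to mimic numerically. What follows is a minimal sketch, not a model of the actual lecture hall: the chain length, the damping constant and the release angle are all made-up values, and friction is lumped into a single term proportional to the ball's speed.

```python
import math

def return_angle(theta0, length=10.0, damping=0.02, g=9.81, dt=0.001):
    """Release a pendulum from rest at angle theta0 (radians) and return
    the angle at which it next comes to rest on the release side.

    length (m), damping (1/s) and dt (s) are illustrative assumptions.
    """
    theta, omega = theta0, 0.0
    while True:
        prev_omega = omega
        # Semi-implicit Euler: update the angular velocity first,
        # then use the new velocity to update the angle.
        omega += (-(g / length) * math.sin(theta) - damping * omega) * dt
        theta += omega * dt
        # The ball first swings away (omega < 0), then comes back
        # (omega > 0). The return peak is where omega stops being positive.
        if prev_omega > 0 and omega <= 0:
            return theta

release = 0.5  # radians, roughly "nose height" for a long chain (assumed)
print("released at", release, "returned to", return_angle(release))
```

With any positive damping the returned angle comes out a bit smaller than the release angle, which is the lecturer's nose being spared. Setting damping to zero brings the ball back to very nearly where it started, which is why the "no extra push" instruction matters.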

I doubt there's anything quite so memorable on the page I mentioned, though to be fair I've only chased a small portion of the links. As is typical with such collections, some of the links are broken, some of the apps don't work and many are a bit on the clunky side. Nonetheless, it's really great to be able to google around a bit and see a virtual demonstration with moving parts that illustrates a given principle much more effectively than mere text. This is the kind of thing that simply wasn't available to the general public a generation ago, or available at all two generations ago, and so one of the many small ways in which the web makes life just that bit better.

On the other hand, the actual physical world has always been available to all, and careful observation has always been repaid in understanding. It worked for Newton and Galileo, so there must be something to it ...

Tuesday, April 28, 2009

So ... what does it mean to "own" "content"?

Coming back to a simple question I tried to address a while back, only to find I had only the foggiest idea what it even meant, I think I've now got a reasonable theory: It's a mess out there, folks.

The open source and free software communities have arrived at the rough consensus that trying to "own" information is counterproductive. There are some doctrinal differences. Open source leans more toward utility, meaning that in many cases it's more useful to let everyone have unlimited access. Free software holds that there is something ineffably un-ownable about information and that trying to act otherwise invariably sets one against the grain of the universe. Ironically enough, both camps take great care to retain some control, in order to make sure that no one ends up asserting ownership of something released as free.

In the commercial world things are less clear-cut. Leaving aside any upcoming revisions in its "site governance," Facebook says it owns anything you put up on it, even if you take it down. Facebook users are less comfortable with that than Facebook is. Record labels, movie studios and other entertainment providers try their best to control how much and when you access their products. Newspapers would just like someone to buy ads. Online, print, they don't much care at the moment.

Artists, for their part, are trying all kinds of experiments in the area of how to get paid. And good for them. Experimentation is part of the job, after all. Some argue for strict copyright enforcement, some argue for widespread free distribution. Many would be happy just to have the problem of people trying to get at what they produce.

So like I said, it's a mess. But what caught me slightly off guard, despite my occasional "not-so-disruptive technology" screeds, was realizing that it's always been a mess. Who owns your likeness? Or rather, who controls who can see what pictures of you when? Well, it depends. If you write a good old-fashioned letter to your print newspaper (if you still have one), do you own it? The paper generally claims you don't.

Who decides which songs a particular advertiser can use in a particular ad? The short answer: lawyers. How many Tops does it take to call a band the Four Tops? Ask the judge. Who decides how much your local venue pays to BMI and ASCAP? It's somewhat complex. Just how much James Brown can you sample for your old school rap? Most of it, apparently, but don't even try it with Frank Zappa.

In short, the question is an ill-defined and contentious one without clear bright lines, one which is answered piecemeal one test case at a time. And this is not news.

Wednesday, April 22, 2009

"Buying" internet movies

Maybe I'm missing something but ...

For a price near what you'd pay for a DVD, Amazon will give you the right to watch a given movie over the internet, subject to some fine print. Or you could just get the DVD for a similar price.

If I have a DVD, I can watch it pretty much anywhere -- on any TV with a DVD player attached, with a portable player, in a car with a player, on a laptop, at a hotel ... wherever. If I have the internet equivalent, I'm restricted to internet enabled devices, and within that, to whatever devices Amazon chooses to allow.

Frankly, I wouldn't know what devices those would be without looking. I know it'll work with my Roku box, and I've never really had cause to try anywhere else. I know Netflix will let me watch their movies on a laptop (at least if it's using IE/Windows), but I haven't been traveling much lately, so that's not really on my radar either.

The point being that, if I'm just watching a particular movie through a subscription service, or renting it short-term to watch it once, I don't care that watching the movie is tied to the box I ordered it on. I'm going to sit down, order the movie, pop some popcorn and watch it. On the other hand, if I'm "buying" it, that is, buying the right to watch it whenever I want, I'd also like the right to watch it wherever I want.

Internet video delivery is not there yet. I'm sure it will get there, and I'm sure it's good enough for some people right now. It's not there for me yet, late adopter that I am.

[What's a "DVD"?  Recently a friend wanted to watch a short clip on a DVD, and had a choice of ... a set-top box in the living room that happened to have a DVD player attached.  When your main devices are a phone and a tablet, and there's an app on those for the major providers, it's not so hard to watch internet video everywhere, and a pain to watch a DVD anywhere.  Pretty much the opposite of what I describe above, only six or seven years ago.

I personally prefer not to buy internet movies.  The price tends to be three or four times the rental price, and I generally don't watch movies three or four times.  Maybe the movie won't be available three years from now when I feel like re-watching an old favorite, but maybe it'll be available cheaper on the original site or elsewhere.  I find this surprising, honestly, since I often tend toward a "get ALL the bits and keep them" mentality, but evidently not in this case.  --D.H. Jan 2016]

Sunday, April 19, 2009

Terms of wiki art

"Link rot" is the tendency for URLs to become invalid as the sites they point to go dead or move elsewhere (and any forwarding left behind goes dead). It's an annoying but necessary consequence of a very basic principle of the web: links don't have to point at anything, even though they generally should*. It's probably less of a problem than it used to be as more material comes to live on sites hosted by large, durable entities. Blogger.com, now a Google property, for example. As the man said, cool URIs don't change.

Wikipedia and similar wikis add a particular twist: Links within the wiki generally don't go dead; they go weird. Some ways this can happen:
  • The original link points to an article on, say, crickets. Per usual custom, the actual link reads [[Cricket|crickets]]. That is, it appears as "crickets" but actually points to the article entitled Cricket. This is originally about the insect, but soon someone adds an article on the game. The link now points at either the disambiguation page for the various possibilities of Cricket or at the article for the game, depending on how the process proceeds.
  • The original link points to a specialized article, say on cricket songs. This is later deemed not to be worth its own article and gets folded into Cricket (insect). Helpful bots redirect the link in the article, but the link is now considerably less useful, particularly if it was originally something like [[Cricket song|song]] and later edits rearrange the sentence the link appears in. You start with something like "The sound of the instrument has been compared to the [[Cricket song|song]] of crickets." and end up with something like "The sound of the instrument has been compared to insect [[Cricket (insect)|song]]," with the actual material on cricket song somewhere on the page.
  • In the previous case, the section on cricket song may later be removed, possibly completely or possibly to, say, a general page on insect sounds. The [[Cricket (insect)|song]] link now points to an article on the cricket, with at best a link in the general direction of the original material on its song, said link being in some random spot on what is now a very thorough and complete article on the cricket, its diet and habits, its appearance, its significance in human culture, etc. etc.
  • Or ... the first two cases can combine to leave a link that appears as "song", points to Cricket and lands you — huh?? — at an article on an inscrutable pastime of the Commonwealth.
I'm 90% sure the Wikipedia community has a term of art for this, but the obvious choices of "wikirot" and "wiki rot" don't seem to turn up anything. "Wiki gardening" is the practice of tending a wiki in order to counter such rot and generally improve the organization of the wiki.
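The drift described above can be sketched with a toy model. This is only an illustration under stated assumptions: the two "snapshots" of the wiki are invented, the "redirect:" prefix is a made-up convention for this sketch rather than real wiki markup, and the regular expression handles only the plain [[Target]] and piped [[Target|label]] forms.

```python
import re

# Matches [[Target]] and [[Target|label]] (the "piped" form).
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]")

def parse_links(wikitext):
    """Return (target, label) pairs; a bare [[Target]] is its own label."""
    return [(m.group(1), m.group(2) or m.group(1))
            for m in WIKILINK.finditer(wikitext)]

def lands_on(target, wiki, max_hops=5):
    """Follow redirects in a toy wiki and describe where a link ends up."""
    for _ in range(max_hops):
        page = wiki.get(target)
        if page is None:
            return "nothing (dangling link)"
        if not page.startswith("redirect:"):
            return page
        target = page[len("redirect:"):]
    return "a redirect loop"

# The same wikitext, read against two snapshots of a hypothetical wiki.
TEXT = "compared to the [[Cricket song|song]] of [[Cricket|crickets]]"

WIKI_THEN = {
    "Cricket": "article on the insect",
    "Cricket song": "article on cricket song",
}
WIKI_NOW = {
    "Cricket": "article on the game",
    "Cricket (insect)": "article on the insect",
    "Cricket song": "redirect:Cricket (insect)",
}

for target, label in parse_links(TEXT):
    then, now = lands_on(target, WIKI_THEN), lands_on(target, WIKI_NOW)
    print(f'"{label}" pointed at the {then}, now points at the {now}')
```

Nothing in the wikitext changed between the two snapshots, and neither link is dead. The text stayed put while the titles moved underneath it.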

While I'm at it, is there a term for the practice of "wikifying" (making links for) marginally relevant terms while leaving really relevant ones "unwikified"?

* For a little more on dangling links as a principle of web architecture, see this post and this one. Appropriately enough, the relevant snippets are buried in the middle of them.

Paying for movies

Continuing the theme of "What does it mean to own content?" ... let's compare some ways of legally paying for a movie, short of actually going to a theater and, um, watching a movie.

If you have an ordinary TV setup you can pay for a movie by:
  • Watching it for "free" over the air or on basic cable/satellite service, but dealing with commercial interruptions
  • Watching it on a "premium" movie channel and paying for it and whatever else you watch on that channel through a subscription fee
  • Watching it on pay per view which, despite the name, generally means paying for the right to view it during a 24-hour period (at least on cable)
  • Buying a DVD and watching it as much as you want
With a DVR and the movie channels' on-demand services, time is not really a factor, except in that different channels offer different movies at different times and pay-per-view is generally more up to date.

If you have an internet connection and suitable equipment, you can pay for a movie by
  • Watching it for "free" via a service like Hulu, but dealing with commercial interruptions
  • Watching it through a service like Netflix and paying for it and whatever else you watch that way through a subscription fee
  • "renting" it from Amazon (or anyone else who does this) for a 24-hour period
  • "buying" it from Amazon (or anyone else who does this) and watching it as much as you want (subject to a little fine print)
Suspiciously similar, no?

Saturday, April 18, 2009

What is "content"?

Lots to write about, but not a lot of time to write about it.

A few weeks back, I started a post entitled "What does it mean to own content?" but as so often happens, it veered off in a different direction. As you might guess by the size of the "Intellectual Property" tag, I see questions of content and ownership as central to understanding the Web As We Know It. That is, if I thought I understood such questions, I'd feel a lot more like I understood the WAWKI.

So I'm going to try again, but this time, for everyone's sake, I'm going to try to take smaller bites. Thus the question in the title.

First, I'm not crazy about "content" as a label for all the various things we put up on the web or otherwise try to get to an audience, but for the moment it'll have to do. It's not the worst choice. If "content-free" is a harsh assessment, then content must be worth something. It fits more or less with the metaphor that the web is a web of conduits, that sites have things in them (or on them, at least) and so forth.

But what is it? I'm tempted to drop back to Potter Stewart's "I know it when I see it," but that's hardly an answer, or at least, not being a Supreme Court justice I don't personally feel I can get away with it. Another approach would be to borrow Russell Ackoff's data/information/knowledge/understanding/wisdom hierarchy and say that content is any of the five -- or more likely, anything north of "data".

Since we're dealing at least partly in intellectual property, what does it look like from that angle? Intellectual property (another term I claim I'm not crazy about) is "property that results from original creative thought." Content might or might not be someone's property, so take that out and you're left with content as what "results from original creative thought." Or broadening just a bit, something that someone created.

This is probably close to what I'm feeling for. It emphasizes that someone found reason to make the "content" available and, come to think of it, it's pretty much a restatement of what I said above: "all the various things we put up on the web or otherwise try to get to an audience." So at least I seem to agree with myself. It also jibes reasonably well with the classic notion of the web as a collection of resources.

To be clear, I would count something computer-generated -- say a compilation of statistics on something else -- as content. The computer generated the exact data, but someone decided to make that data available.

So far, so good, but under that definition is anything not content? This comes back, I think, to Ackoff's distinction between data and information. To take an example, it's possible to measure how many people from, say, Idaho, accessed this site last week. That's data, but (to my knowledge) no one has actually pulled it together and presented it to the world as information. Now, if I tell you that, according to the statistics available to me, no one from Idaho has accessed this site directly this week, then it becomes information and, by my attempted definition, content.

(Except, where would I get that information? Google can supply it to me, but only if I ask. Is it "content" before then? Does it become "content" when I ask for it? Or only when I present it to the world? Well, I don't know what content is, but I know it when I see it ...)

Wednesday, April 8, 2009

Actually ... a bit more on Omegle

First, in the process of writing the previous post, I stumbled across a basic point about anonymity and handles: You only need handles for long-running dialogs. If you just want to access sites anonymously, a persistent handle would be an active hindrance. If you just want to contact someone once, but don't need or want them to get back to you, a persistent handle is of no help. Only if you want to contact someone and receive replies repeatedly do you need a handle.

Second, while Omegle itself seems Mostly Harmless, it's possible that it's used as a source of cover traffic for a real, behind-the-scenes anonymizer. The more people you might possibly be, the more anonymous you are. Ideally, you could blend with a large number of faces in a crowd. One problem in anonymizers, however, is that this works both ways.

If you might be mistaken for me, I might be mistaken for you. If you're up to no good and I'm just being anonymous for whatever unknowable reason, then you stand to gain from the confusion and I stand to lose. In that case, why should I use an anonymizer? But if everyone thinks like I do, there's no one for you to blend in with. See here and here and the "anonymity" tag in general for more detail.

On the other hand, if you have a ready supply of random people happily chatting with each other anonymously, you (probably) also have a ready supply of unknowing parties to offload risk onto. Just to be clear, I have no reason at all to believe Omegle is doing this. It's not even clear that Omegle would provide better cover traffic than other, sneakier schemes. It's just an interesting point for pure speculation.

Finally, when I mentioned the memory of a goldfish, I had a MythBusters result in mind.

Go ahead and talk to strangers

Here's one of those interesting viral-y things you find on the web that's probably not the Next Big Thing, but still intriguing: Someone or something called Omegle. Appropriately enough, they're so new even Wikipedia doesn't seem to have heard of them. They have an omega in their logo, so it's probably either "oh MAY gul" or "oh MEE gul", depending on the side of the pond [It's oh-meg-ul, according to the web site].

[Amazing what you can come up with via the most rudimentary checking: According to the Omegle blog, Omegle is a Python application written by an 18-year-old guy from Brattleboro, Vermont named Leif K-Brooks. It went live on March 25th 2009 and has been growing steadily. Of course, that could just be a cover story (see next post), but I doubt it.]

What have they done? They've launched a site that's brilliant in its simplicity and doesn't seem to have been done in quite this form before: it lets you talk to strangers.

All you do is start a chat. The site picks a random other person starting a chat and pairs you up. You show up in the transcript as "You". The other person shows up as "Stranger". Disconnect and try again, and once again "You" are talking to "Stranger". Most likely it's a different Stranger, but who knows? Presumably it's the same "You."
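Just to make the mechanics concrete, here is a minimal sketch of random pairing with no handles. It is purely hypothetical and says nothing about how Omegle is actually built: the function names, the connection ids and the shuffle-then-pair approach are all assumptions for illustration.

```python
import random

def pair_up(waiting, rng):
    """Randomly pair up everyone waiting to chat.

    Returns (pairs, leftover), where leftover is the one person still
    waiting if the head count was odd, else None.
    """
    people = list(waiting)
    rng.shuffle(people)
    pairs = list(zip(people[0::2], people[1::2]))
    leftover = people[-1] if len(people) % 2 else None
    return pairs, leftover

def render(sender, text, viewer):
    """Label a message the way each side sees it: no names, no handles."""
    who = "You" if viewer == sender else "Stranger"
    return f"{who}: {text}"

# Hypothetical connection ids; a real site would use sockets or sessions.
pairs, leftover = pair_up(["c1", "c2", "c3", "c4", "c5"], random.Random())
for a, b in pairs:
    print(a, "sees ->", render(a, "hello", viewer=a))
    print(b, "sees ->", render(a, "hello", viewer=b))
print("still waiting:", leftover)
```

The point of the sketch is what's missing: nothing is stored between chats, so there is nothing to tell one Stranger from the next.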

For that matter, if you did end up talking to the same person twice, how would you know? You could give out some identifying information, or sneakily glean some from the other party, but that would sort of defeat the purpose, wouldn't it?

From a brief trial, it would appear that the level of conversation varies from the occasional spambot flogging a web site, to juvenile antics, to the kind of civil conversation one might strike up while waiting in line for something.

It's about as anonymous as you can get [but see note below]. In some anonymous setups, a particular party goes by a consistent handle. You may not know who they are, but you might know that last week they said such-and-such and that so-and-so says they're into modern jazz. Omegle, by contrast, has the memory of a goldfish (well, actually, less memory than a goldfish). Every conversation uses the same two handles, which is to say there are no handles at all.

It's not clear how strong the actual anonymity guarantee is. The easy way to do this, just accept direct connections and hook them up, is not secure. If you say "the stolen diamonds are hidden under the bridge on Fourth Street," Omegle might well know which IP sent the message. On the other hand it might be using some stronger anonymizer, in which case it wouldn't know. Who knows?

What is pretty clear is that Omegle is not a practical system for most kinds of cloak-and-dagger mischief. If agent 86 wants to contact agent 99, Omegle won't help. If agent 86 wants to tell some random person where the stolen diamonds are, that's a different story -- but it's hard to see how it's a useful one.

Omegle won't connect you to a particular web site, or email address, or chat handle, even if it's random and anonymous. It will just put you in a virtual room with a stranger. Some stranger. Any stranger. As such, it seems like mostly harmless fun, if you like that sort of thing.

[I wasn't able to confirm this on Wikipedia or Omegle's own site, but my understanding is that Omegle sets up a point-to-point connection between the two participants.  This allows each participant to see the other's IP address using standard networking tools, making it not very anonymous at all --D.H. Jan 2016]

Friday, April 3, 2009

Galaxy Zoo 2

In celebration of the 100 Hours of Astronomy event, Galaxy Zoo aims to get one million galaxies classified before the hundred hours are up. At this writing, it looks like they'll easily meet that goal (well done, everyone!), but you can still try your hand at it any time you want. Who knows? You might even find your very own "voorwerp".

The new classification is more fun than the old, which was not bad to begin with. There are more questions, the interface is nicer, you can go back and review the galaxies you've classified and mark images as favorites. You can invert the image you're looking at, which both looks cool and can help you see features you might otherwise miss. If the galaxy you're looking at has a bar or spiral arms, you get to play "Galaxy Wars" (kinda cheesy name, but what the hey) to gauge whether the features are more or less prominent compared to some other galaxy.

If you think of a sequence from indistinct blob to disk with a blob in the middle to full-featured galaxy, you can imagine the galaxies actually form over stupefyingly long periods of time -- which is more or less the point of the whole exercise. [Actually, no. It turns out that Hubble was quite clear that his now-famous sequence of galaxy forms was meant as a taxonomic device and makes no claim about the forms through which galaxies evolve. Current theory holds that elliptical galaxies form via mergers of spiral galaxies.]

Oddly fascinating fun. Recommended.

Wednesday, April 1, 2009

Not much more about Conficker.c

Some of this I already suspected and some of it I'd have learned sooner if I'd been paying closer attention:

All Hell has not broken loose. This is not particularly surprising. All Hell has a history of not breaking loose on cue. One likely reason is that the people behind this appear to be in it to make money, and the successful parasite does not kill its host. There's even a plausible guess as to the business model: charge to rent out the infected machines as a distributed password-cracking compute server, sort of like SETI@home but up to no good and under remote control.

For example, if you know the last four digits of someone's Social Security number, there are no more than 100,000 possibilities for the other digits. If you have 100,000 computers at your beck and call, it will take very little of any particular computer's time to try all of the combinations. Of course, there are problems with the approach, particularly if trying a number involves contacting, say, some bank's server, which might find it suspicious that the customer has forgotten her SSN and has resorted to trying all possible combinations in quick succession. But you get the idea.
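The arithmetic behind that is short enough to spell out. This is just a back-of-the-envelope sketch: it assumes a nine-digit number with four digits known, treats every combination of the remaining digits as possible (hence "no more than"), and splits the work evenly across machines.

```python
def search_space(total_digits, known_digits):
    """Upper bound on the combinations of the unknown decimal digits."""
    return 10 ** (total_digits - known_digits)

def per_machine(candidates, machines):
    """Candidates each machine gets if the work is split evenly (rounded up)."""
    return -(-candidates // machines)  # ceiling division

candidates = search_space(total_digits=9, known_digits=4)
print(candidates, "possibilities at most")
print(per_machine(candidates, machines=100_000), "per machine with 100,000 machines")
```

Five unknown digits give 10^5 = 100,000 combinations, so with 100,000 machines each one has a single guess to make.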

What about the notion that if your computer is infected, thieves will be able to track your every keystroke and steal your secrets? Well, one can't rule anything out, but that kind of behavior doesn't fit well with the "distributed password cracking" scenario. If I'm leeching off your PC's processing power, the last thing I want to do is draw attention to myself.

I previously said there were "many, many" Conficker infections. What's "many"? The actual figure is thought to be in the millions or low tens of millions, which is large enough, but consider that there are somewhere in the high hundreds of millions of computers in use.