Oh. My. Goodness.
It seems the outgoing government of Italy has seen fit to put up, with no prior warning, a web site with the tax details of every Italian taxpayer. The information includes, at least, income and tax paid.
Can they do that? Looks like they just did.
Human nature being what it is, the site was soon overwhelmed with traffic and, according to the BBC, has been or is to be taken down. It's not completely clear to me whether the "privacy watchdogs" mentioned have the authority to get it taken down, or are just demanding it be taken down, but the statement that the site was up for 24 hours suggests that someone took it down. In which case the interesting question becomes who was able to scrape and store how much data about whom during that window ... [My understanding from later articles is that the watchdogs in question were governmental agencies with the authority to have the site taken down, and that it was taken down ... but not before a lot of people had had their fun]
I'd be curious to know how all this squares with Article 8 of the EU Charter of Fundamental Rights, the one on protection of personal data. My guess would be that it doesn't. But then, my understanding is that the Charter doesn't actually have any legally binding status.
Wednesday, April 30, 2008
Engineering, in general
Today I put up a tent. This wasn't some high-tech mountain climbing special, but an ordinary one from one of the major retail chains, the same brand as the ones we used when I was growing up. It was reasonably light and compact for what it was, certainly lighter and more compact than the comparable model a generation ago.
It wasn't too hard to put together, even a year after last having done the exercise and without looking at the instructions. There were two colors of poles, black and gray. The black poles slipped through black openings and attached to the tent with black clips. The gray poles slipped through gray openings and attached with gray clips.
At any point, there were only a few sensible things to do. Most of them worked, and if you got something wrong -- tried to seat a pole at the wrong spot, for example -- it would soon be clear that it wasn't going to work. If you got the three main poles bent into place, you had something resembling a tent in shape. Everything after that made it a better tent -- roomier, more stable, easier to get into and out of, shaded, what-have-you.
If you neglected to do something small -- clip a particular clip or fasten a particular little strap -- the result was only slightly less good than if you hadn't. If you left out something major -- left the fly off or decided not to stake it down -- there would be a more noticeable effect. The result might be sub-optimal, but you could tell something was missing and choose to fix it if you wanted.
This particular tent was old enough (and cheap enough) that the elastic cord holding two of the poles together had broken. That made it harder to keep the sections of the poles together at first, but once the poles were under load there was no functional difference. The design still worked even with a less-than-perfect implementation.
In other words, it was a very nice piece of engineering. You could tell that the company had been making tents for quite a long time. The lessons to be drawn, say regarding user interface and design for robustness, are so obvious I won't bother to point them out.
Or as I used to say, if the average software shop ran as well as the average sandwich shop, the world would be a better place.
The video flood
Shockingly, I'm not the only person pondering the impact of video on the internet. Under the provocative headline Does online video threaten the net?, Auntie Beeb, in a somewhat indecisive mood, concludes yes, possibly, well, no probably not.
One interesting point, lurking but not prominent in the piece, is that the projected buildout to video-friendly bandwidth will most likely benefit the larger players and shake out smaller ones who are too busy undercutting each other to accumulate the kind of capital needed to make major upgrades to the backbone and other infrastructure.
All of this assumes that the net will be the way to go for video. It may well be. Cisco is confident that the bandwidth will be there, and Cisco knows a bit about the topic.
On the other hand, I'm not quite ready to say it's a foregone conclusion. Quite possible, sure. Likely, even. But given the inherent efficiency of broadcasting broadcast video, maybe not inevitable. Back on the one hand, though, if internet video works and it doesn't cost too much, why not?
One thing that jumped out at me was the projection that by 2011, 30% of bandwidth would be video and 43% would be peer-to-peer file sharing of video. I'd be interested to know the assumptions behind those figures. To what extent do people use peer-to-peer because the only alternative is to shell out for a DVD or hope the movie you like is available on demand on cable?
I was also intrigued by the BBC's iPlayer service, but I wasn't able to try it since it only works if you can convince their server you're in Britain. An international version is promised. That's got to be about £icensing.
Two ways to digitize books
I recently said that I didn't expect everything in print to be available online digitally any time soon. One reason is that the fundamental question of who gets paid when and how for copyrighted material is far from settled. Another is the sheer volume of books out there.
All I know is what I read in the paper, um, I mean the online version of the paper, but as I understand it there are two competing approaches to this at the moment. Google and Microsoft will come in and digitize your books for you. All they ask is that they retain certain rights to the digital version, like the exclusive right to index it online.
The Open Content Alliance, on the other hand, will digitize the book and make it available to all. But it will cost you: you, along with the alliance and its benefactors, cover a share of the $30 or so it costs to digitize each book.
Despite the cost, many research libraries are finding it more in keeping with their mission to make the digital content available without restriction. This will be an interesting test of the "information wants to be free" theory.
Sunday, April 27, 2008
Non-intrusion and the web
In responding to a comment on a previous post, I mentioned that there are ways to refer precisely to a piece of something on the web, even if that something isn't specifically set up for it. It's easier and more robust if the document you're linking to has fine-grained markup, but it's not impossible to say "so many characters into document X" or "right after the words 'Chapter II'" or something similar. At worst, with a little scripting (I wave my hands here) you could pull up the right passage more or less automagically on a browser and no one need know the difference.
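To make the hand-waving slightly more concrete, here is a minimal Python sketch of resolving a "right after the words X" reference against a page that knows nothing about us. The URL and anchor phrase are made up for illustration, and a real version would want to handle scripts, styles and encodings more carefully.

```python
# Minimal sketch: resolve a "right after the words ..." style reference
# without the target document having to cooperate. The URL and anchor
# phrase below are made-up examples, not real resources.
import urllib.request
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the visible text of a page, ignoring markup."""
    def __init__(self):
        super().__init__()
        self.chunks = []
    def handle_data(self, data):
        self.chunks.append(data)

def find_passage(url, anchor_phrase, length=200):
    """Return the `length` characters following `anchor_phrase` in the page at `url`."""
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    parser = TextExtractor()
    parser.feed(html)
    text = " ".join(" ".join(parser.chunks).split())  # normalize whitespace
    pos = text.find(anchor_phrase)
    if pos == -1:
        return None  # the document changed, or the anchor was never there
    start = pos + len(anchor_phrase)
    return text[start:start + length]

# Hypothetical usage:
# print(find_passage("http://example.com/some-document", "Chapter II"))
```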
There is a general principle at work here. Changes to any given system may be intrusive or non-intrusive. An intrusive change requires changes to what's already there. A non-intrusive one doesn't.
If I wanted to compile an annotated list of terms people use for colors, I could use an approach like the one above and get a list that might get out of sync if someone updates a document I'm pointing at, but which should work well enough in practice. If I tire of having to keep my hand-crafted links in sync, I could try to get some particular tag inserted at any mention of a color. That would make my job a lot easier, and the people maintaining documents in my list could edit to their hearts' content so long as the tags got maintained properly.
This is the basic tradeoff. An intrusive change requires cooperation from all involved, but can result in a more cohesive system. A non-intrusive change expects less, but possibly at the cost of cohesion. The architectural sweet spot is a resource that can provide convenient handles for anything useful it provides, without knowing or caring who's using it or what it's being used for. This is often easier said than done.
The web is a big place. The chances of convincing everyone else to change for the sake of your change diminish sharply the more people you have to convince. If it's more than just a few close friends, you generally need a really good reason. "You want me to re-do my page so you can track color terms I use in it? Um, no."
That's not to say you can't offer a good reason for making a change, while still offering something of lesser value in case the change doesn't happen. People do that all the time, and it often works. There's a big difference between offering and demanding.
It's not surprising that non-intrusion is a fundamental part of what makes the web work, and what makes the web the web. Off the top of my head:
- Once you've given something a URL, anyone can reference it in any way they want. You don't have to give out a new URL, or do anything at all, whenever someone wants to reference your resource.
- Search engines index huge masses of documents by content without those documents having to do anything special at all (if you want to, you can add "meta" tags to help this along, but you don't have to).
- More fundamentally, there's no requirement that a resource even exist for a given URL. Obviously a page with lots of broken links for no reason is not a very useful page, but you don't have to finish everything before publishing anything. And a good thing, else nothing would ever get published. Wikis are a famous example of this principle in action.
- Tagging services like del.icio.us attach metadata to pages without changing the pages themselves.
- Mashups are all about re-using resources in new and unanticipated ways.
Lexicographical note: Backward-compatible is more specific than non-intrusive, but otherwise very closely related. A backward-compatible change is generally within a given system: The new one will do everything the old one will and more. Non-intrusion is more about being able to add an entirely new piece without anyone else even having to know about it. I'll generally know that I've moved up from version 6.1 to version 6.2, backward compatibly, but I may not know that, say, my web page is being mashed up with someone else's in ways I never anticipated, non-intrusively.
Labels:
del.icio.us,
imperfection,
metadata,
tagging,
Web 2.0
Friday, April 25, 2008
Edward R. Murrow on computers and communication
One question I've been exploring in these posts is to what extent the features and problems of the web are technical and to what extent they're human. My bias tends to be that the human element is more pervasive than one might think. As usual, though, someone else has beaten me to the punch:
The newest computer can merely compound, at speed, the oldest problem in the relations between human beings, and in the end the communicator will be confronted with the old problem, of what to say and how to say it. — Edward R. Murrow (1964)
Murrow had quite a few interesting things to say about the media, and about life in general. I'll leave you with another choice bit from the same speech:
The speed of communications is wondrous to behold. It is also true that speed can multiply the distribution of information that we know to be untrue.
Tuesday, April 22, 2008
Harry Potter and the compendious lexicon
In broad strokes, since the full story is all over the news: Ridiculously famous author J. K. Rowling (along with her publishers) is suing dedicated fan Steven Vander Ark over the pending publication in print of The Harry Potter Lexicon, a listing of terms, people, places, spells and suchlike from Rowling's Harry Potter series so all-inclusive that Rowling herself admits to having used the online version for reference on occasion.
This is not the first time that print publication of an online reference work has led to legal wrangling. Eric Weisstein and Stephen Wolfram's disputes with CRC come to mind. Legal action is by no means inevitable in such cases -- I can spot at least one book on my shelf that was spun off of a web site without apparent trouble -- but when it does happen there is a sizable risk of stirring up a real hornet's nest.
In the Potter case, Rowling is not arguing with Vander Ark's right to maintain his lexicon. The basis of the legal suit is that the Lexicon uses too much copyrighted material from the books to fall under "fair use".
What's troubling here is that there didn't appear to be any issue while the lexicon was online, but now that Vander Ark is seeking to publish his version in print, and in competition with Rowling's own upcoming lexicon, comes the lawsuit.
Should the print version be treated differently from the online version? If so, why? The thrust of Rowling's legal argument appears to be that the print version is a commercial work. Whether the copying is commercial is one of the four basic questions of fair use. That's all fine, except that the online lexicon carries Google ads, and fairly prominently I might add. So again, what's the difference?
There's an interesting technical issue in the background here, distinct from the well-known economic differences between web publication and print publication. Vander Ark's lexicon contains passages from the books because the books themselves are not directly available online.
If everything is on the web, then it's easy not to actually copy copyrighted material -- just link to it. Granted, it can be difficult to link to, say, a particular paragraph in a work, but it's not technically impossible. In practice, inclusion by reference (i.e., linking) works fairly well. Linking from a web page to a printed book, not so much. Even with the full seven volumes handy, and with links realized as "book B, page P, line L" or similar, thumbing through to an exact page and scanning for the right sentence would be a pain to say the least.
Likewise, once a web site goes to print, its links go dead. You do see URLs in footnotes and references fairly routinely, but not woven into the printed text. Putting down the book, pulling up a browser and typing in a link is no better than thumbing through a book.
This lack of interconnectivity potentially makes the print and online versions fundamentally different beasts, regardless of the economics of printing. In the case at hand, it encourages copying that would not take place if everything were online. Thinking it over, I'm not sure this really bears directly on copyright cases. Both versions of the Lexicon copy text because linking to print is impractical. However, it might help explain how we got here.
Perhaps this will all go away when everything is online, indexed and searchable, but I don't expect that to happen for quite some time.
Thursday, April 17, 2008
Now that I've tagged it, how do I find it?
I try to assign a few relevant tags to every post, as is common practice. Blogger's tagging facility (along with lots of others, I expect) will suggest tags you've previously used. For example, if I type "bl" into the space, up comes a list containing blacksmithing, BLOBs, blogger, blogs and Eubie Blake. Hmm ... I'd forgotten I'd used "blogger" as a tag before. Curiously, J, and only J, pulls up only personal names. I've been known on occasion to run through the entire alphabet to make sure I haven't missed anything.
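For what it's worth, here's a guess, as a small Python sketch, at the sort of prefix matching such a tag box might be doing. The tag list is the one from the example above; the match-any-word rule is my own assumption, offered only because it would explain how "bl" pulls up Eubie Blake.

```python
# A guess at prefix-based tag suggestion. Case-insensitive, and matching
# any word in a multi-word tag -- an assumption, not Blogger's documented
# behavior, but it's consistent with "bl" pulling up "Eubie Blake".
existing_tags = ["blacksmithing", "BLOBs", "blogger", "blogs", "Eubie Blake"]

def suggest(prefix, tags):
    p = prefix.lower()
    return sorted(t for t in tags
                  if any(word.lower().startswith(p) for word in t.split()))

print(suggest("bl", existing_tags))
# ['BLOBs', 'Eubie Blake', 'blacksmithing', 'blogger', 'blogs']
```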
Conversely, if nothing pops up for a tag, you know you haven't used it. That happened for DMCA on the previous post. That's funny. I'm sure I've mentioned the DMCA before. Aha, there are a couple. All I have to do is search. A couple of quick edits and they're tagged now, too.
Now, if anyone could find those by searching, why add the tag? Indeed, I almost didn't add the tag, because those posts don't happen to be about DMCA. They mostly just mention it in passing. I opted to tag those because the passing mention was apropos of what DMCA is. On the other hand, this post contains the letters "DMCA" but has nothing to do with it.
Likewise, there are a couple of posts, like this one or maybe even this one, that don't mention DMCA specifically, but might be relevant. On review, I'm not tagging those DMCA either. They're both tagged "copyrights" now.
While reviewing that I ran across another post that should have been tagged "copyrights" but wasn't. That's a recurring problem with tagging. Which tags should I use here? Which tags did I put that article under? Did I think to put it under "copyrights"?
All of this leads me to a few general points on tagging:
- It's not particularly new. It's much the same as traditional indexing. In web form, it goes back at least to categories on wikis.
- It's time consuming and subjective. Care and feeding of categories is an important part of wiki gardening, for example.
- It's most useful exactly where more automated tools don't work. If you want to find posts here that mention DMCA, just search. "DMCA" is probably not a particularly useful tag. "Annoyances" and "Neat hacks" make better use of the tool.
- Likewise, tools like sub-categories or other schemes for tags to aggregate other tags, though useful, aren't foolproof.
- On the other hand, tags are still nice for browsing, particularly the more abstract ones.
A little more on copyrights
First, a big honking obvious disclaimer: I AM NOT A LAWYER. But you knew that. It doesn't say "lawyer" on my profile, it says software geek. Even if it didn't say anything at all, it still wouldn't say "lawyer". Unless someone can show you that they are licensed to practice law in your jurisdiction and you've engaged them to represent you, anything they say about the law is just an interesting bit of information.
Why beat that to death so obnoxiously? Well, first I don't want anyone running off and making an important legal decision on the basis of something I said here. Not likely, I understand, and not my fault if it does happen, but at least I feel better about it now.
While I'm at it, I should also disclaim that I'm going by U.S. copyright law, though international treaties ensure a certain degree of harmony among the various national laws. For whatever it's worth I have, at various times, included "Alle rechten voorbehouden" and "Tutti i diritti sono riservati" in source code.
More relevantly, though, it comes back to another point that Linus made in the discussion I mentioned. You can put anything you want in a copyright notice, but that doesn't make it legally binding. I could say "you may only run this application while you are thinking pure thoughts," but so what? Copyright law has nothing to say about such matters.
If you choose to write your own copyright notice or license agreement, understand that some portion of it is likely to be either redundant -- because certain rights and restrictions apply even if there is no notice at all -- or invalid -- because it asserts something contrary to, or simply not covered by, the relevant law. The GPL is now in its third revision and has been subject to all kinds of legal scrutiny. There's a reason so many people just use it as-is and leave the legal questions to the lawyers.
It's also worth noting that the heart of the GPL has to do with distribution, not use: If you distribute this, you must make the source code available and you must include this notice so that anyone you distribute it to knows what's up.
The other point I wanted to make is that a legal kludge like (in the case at hand) including a poem in a source code comment is likely to be as legally fragile as a comparable kludge in the code itself would be technically fragile. In this case, what are you protecting? If it's only the poem, then I can just take that part out. If it's the source as a whole, why do you need the poem? Again, I'm not a lawyer, but it sure doesn't seem like much to stand on.
Labels:
copyrights,
GPL,
Intellectual Property,
law,
Linus Torvalds
Linus on the gray area in copyright law
I stumbled across an interesting thread while looking for something else (isn't that what you're supposed to do on the web?). Evidently someone suggested having the Linux kernel refuse to load any module (kernel plugin) that wasn't under the GPL, and even went so far as to compose an original poem in order to ensure copyrightability of the kernel (wouldn't any of several existing colorful comments do just as well?).
Linus objects, fairly forcefully, arguing that Linux trying to use technical means to control what can and can't be hooked in is essentially the same as, and no better than, the recording industry trying to use technical means to control who plays what when and where. Both interfere with fair use. You can bring it in, he says, but not in my tree [that is, I won't let you bring in this thing that won't let the kernel bring in other things, but you can make your own kernel with your thing-that-won't-bring-in-certain-other-things ...]. As far as I can tell, the effort failed. Certainly ndiswrapper, which would raise the same sort of issue if removed, is still around.
He then goes on to argue that the concept of a "derived work", which the proponents of the proposal invoked to argue that hooking a module into the kernel brings the module under the same copyright restrictions as the kernel, 1) doesn't work that way and 2) is a gray area with no bright line around it and 3) is deliberately a gray area and moreover it's a good thing it is.
He also makes the very cogent point that copyright is about distribution, not use.
Arching over all this, I think, is a view of how technology relates to the law.
Technology is all about well-defined limits. Code doesn't necessarily do quite what we say it does, but whatever it does do, it does unequivocally. Law is about human nature and behavior. It's built on slipperier notions like precedent and intent. Law is open to interpretation. This is a defining feature. There is no meaningful law without judges.
What I get from Linus here, and maybe this is just because I believe it myself, is that trying to use technology to put well-defined limits on the law is generally counterproductive.
A successful law can't be severely at odds with technical reality. It would be pointless, say, to make it illegal to transmit packets with an even number of bytes. Occasionally someone suggests something similarly asinine, but very few such proposals make it out of committee. I'm going to take the opportunity here to blatantly dodge the issue of whether the DMCA is a successful law in this sense.
However, law does best when it steers clear of the technical details. "It's illegal to sell someone else's work without their permission. I don't care how you do it." is much better law than "All computers must provide means to prevent copying of DVDs", which in turn is much better than "All computers must provide this particular means to prevent copying of DVDs."
The first version leaves room for lawyers to pitch arguments to judges over what "work" and "permission" mean, but that's what lawyers and judges are supposed to do.
Likewise, code does best when it steers clear of the legal details.
Labels:
copyrights,
DMCA,
DRM,
Intellectual Property,
law,
Linus Torvalds
Monday, April 14, 2008
Deja vu by satellite
As I wrote the post We want the world and we want it reasonably soon, it became clearer and clearer that I'd run across the scenario before. By the end, I was pretty sure where, and when I asked around, sure enough, it was Geocast (the name has since been recycled).
Ironically, I remember discussing the concept at the time with a friend who was directly involved and my not quite getting it. Technically, it may have been ahead of its time, but there was also the problem of content providers not being comfortable sending out semi-permanent copies to all and sundry. That part is arguably even more of a problem now, with bigger, cheaper storage.
On the other hand, worthy ideas have a way of coming around again ...
How many is a crowd?
According to a recent Wall Street Journal article, it's hard to tell how many centenarians there are. There are several reasons for this, but paradoxically, one reason is that there aren't that many of them. As a result, it's difficult to make a generic statement like "There are N people in this county over 100 on Social Security," because N is liable to be 1 or 2, in which case you've just told the world that those particular people are on Social Security. One comes to value one's privacy as one grows older.
The same situation arises in tabulating election results in sparsely-populated areas. Announcing that six people in the county voted for candidate A and two voted for candidate B is somehow different from announcing six thousand and two thousand. Granted, in a county with eight voters, lack of anonymity kind of goes with the territory.
So at what point do we start to feel that aggregate results are comfortably anonymous? There's almost certainly no single point, and I'm being deliberately vague with "comfortably anonymous", but it's an interesting question. Discussion of web communities often revolves around social networks, in which anonymity is specifically not such a concern, or "megacommunities", which are big enough that aggregates are almost certainly comfortably anonymous.
But even on the web, if you slice out a small enough niche, the question can come up. If I hear that forty anonymous members of the left-handed Dutch-speaking pigeon fancier's society are in favor of some proposition, I may not have too much trouble figuring out who that is.
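One blunt but common answer is small-cell suppression: don't publish any aggregate that falls below some threshold. A minimal sketch follows, with the threshold of 5 chosen arbitrarily for illustration; a real system would also have to make sure a suppressed cell can't simply be recovered from published totals.

```python
# Minimal sketch of small-cell suppression: publish a count only if it
# meets a minimum threshold. The threshold (here 5) is an arbitrary
# illustration, not a legal or statistical standard.
def publishable_counts(counts, k=5):
    """Replace any count below k with None (i.e., 'suppressed')."""
    return {group: (n if n >= k else None) for group, n in counts.items()}

votes = {"candidate A": 6, "candidate B": 2}
print(publishable_counts(votes))
# {'candidate A': 6, 'candidate B': None}
```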
Intelligence by default
One of the workhorses of GUI design is the radio button: a set of choices, of which you're allowed to pick exactly one. If you click on a choice that's not selected, the previously selected choice winks out and the new one lights up.
This device has been around for a long time, particularly in car radios (thence the name). The ones I remember as a kid were big honking things that made a satisfying clunk when you pushed them and mechanically wound the tuner dial to fairly near the station you wanted. Push another one and, sproing-chunk, the old one popped out and the dial moved again. Good for endless amusement.
As an aside, there doesn't seem to be much call for, say, an "exactly two" widget, though I do remember a Bruce Tognazzini article on how one might implement such a thing. "Exactly one" makes sense for a single-valued function, like the frequency of a radio tuner, and "pick all you like" makes sense in many contexts, but other restrictions are trickier. It's just as likely to be "if you pick A you can't pick B" or "pick up to N units worth".
Introducing a special widget for such cases would mean working out the expected behavior and making sure the user knows what to expect. For example, if it's "pick exactly two", which one goes away when you pick a new one? The top one? The oldest one?
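For what it's worth, here's a sketch of one possible answer, in Python rather than any particular GUI toolkit: a "pick at most N" group that drops the oldest selection when the limit is exceeded. It's an illustration of the design question, not anyone's actual widget.

```python
# Sketch of one possible "pick at most N" behavior: when the user picks
# one too many, the oldest selection is dropped. An illustration of the
# design question above, not any real toolkit's widget.
from collections import deque

class PickAtMostN:
    def __init__(self, limit):
        self.limit = limit
        self.selected = deque()          # oldest selection at the left

    def toggle(self, choice):
        if choice in self.selected:      # clicking again deselects
            self.selected.remove(choice)
        else:
            self.selected.append(choice)
            if len(self.selected) > self.limit:
                self.selected.popleft()  # the oldest one "pops out"
        return list(self.selected)

group = PickAtMostN(limit=2)
print(group.toggle("A"))   # ['A']
print(group.toggle("B"))   # ['A', 'B']
print(group.toggle("C"))   # ['B', 'C'] -- A, the oldest, went away
```

With limit set to 1 this behaves like an ordinary radio button group, which suggests "the oldest goes away" is at least a consistent rule.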
Anyway, the useful thing about radio buttons in real radios is that you get to pick the stations yourself. This extends to a more general concept of "presets". Fiddle with the buttons and then store the setup, retrievable at the push of a button. You can get back to that state whenever you want, without having to set everything up again, and you can use that state as a default baseline for further fiddling.
Presets go back at least to the development of the pipe organ and remain important in electronic music. It would be interesting to trace the history of the idea.
Working with an interface or web site that allows presets (or templates, or similar setting of defaults) is night-and-day more pleasant than working with the same interface without them. The benefit is partly from the programmability, but it doesn't matter so much whether I stored the settings or you did, so long as they're the right settings and I can get to them easily. It's not that an interface with presets seems smart, so much as one without them seeming dumb.
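As a sketch of how little machinery the idea needs: settings as a dictionary, presets as named snapshots of it. The radio-ish field names are just for illustration.

```python
# Sketch of presets as named snapshots of a settings dictionary:
# store the current state under a name, recall it later, and use the
# recalled state as the baseline for further fiddling.
class Presets:
    def __init__(self):
        self.slots = {}

    def store(self, name, settings):
        self.slots[name] = dict(settings)      # snapshot, not a live reference

    def recall(self, name):
        return dict(self.slots.get(name, {}))  # copy, so fiddling doesn't alter the preset

radio = {"frequency": 99.1, "band": "FM", "volume": 7}
presets = Presets()
presets.store("button 1", radio)

radio["frequency"] = 88.5              # fiddle with the dial...
radio = presets.recall("button 1")     # ...then pop back with one push
print(radio["frequency"])              # 99.1
```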
Friday, April 11, 2008
Sorry, the pizza server is down
A colleague asks "Since when do you need a network connection to make a pizza?"
I don't know the answer, but I appreciate the question. Apparently the problem wasn't that the folks at the national pizza chain in question couldn't make a pizza, it was that they couldn't sell it without their computer being able to read the barcode on the box and send the secrets therein back to the mothership, presumably so that Corporate would know how many slices of pepperoni needed to be accounted for.
So the next time someone wants to sell you an "integrated, web-enabled solution" to connect up all the outposts of your empire, you might want to ask "But what happens if the connection's down?" (and particularly, what if someone decides they want to try to take that connection down). A good vendor should have a good answer, but it's still a good question.
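One plausible answer, sketched here in Python with an invented endpoint and record format, is store-and-forward: journal the sale locally when the mothership is unreachable and sync later. A real system would also need something to drain the journal when the connection comes back.

```python
# Sketch of one answer to "what happens if the connection's down?":
# journal the sale locally and sync later. The endpoint URL and the
# record format are invented for illustration.
import json, time, urllib.request

JOURNAL = "pending_sales.jsonl"

def record_sale(sale, endpoint="http://example.com/sales"):
    data = json.dumps(sale).encode("utf-8")
    req = urllib.request.Request(endpoint, data=data,
                                 headers={"Content-Type": "application/json"})
    try:
        urllib.request.urlopen(req, timeout=5)
    except OSError:
        # Mothership unreachable: keep selling pizza, sync later.
        with open(JOURNAL, "a") as f:
            f.write(json.dumps({"at": time.time(), "sale": sale}) + "\n")

record_sale({"item": "pepperoni", "slices": 8})
```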
Tuesday, April 8, 2008
Timing is something, at least
Back when nothing was connected to anything, if you needed to reset the clock on your computer -- say, to work around a "Y2K" bug, but surely never to thwart an overzealous licensing scheme -- you could just do it. At most you'd have to re-date some files when you were done. That's no longer the case.
If you're on the net, you pretty much have to have your clock set correctly. For one thing, most communication on the net is timestamped in one form or another. "Did you get my email?" "No. When did you send it?" If your clock is off by five minutes, people won't care, but five days or five months is a different matter, and servers are liable to drop suspiciously dated mail on the floor.
Timing is also important in security. If I send you my encrypted password, someone in the middle can grab a copy and try to send it later. The main way of dealing with this sort of replay attack is to require the message containing the password also to contain a nonce -- a bit of data that is different every time.
One way to do this is to send a random number when requesting a response: "Please send your message and the number ######### in encrypted form". Another way is to have the sender include its idea of the current time. If it's not reasonably close to the receiver's idea of the current time, the receiver rejects the message. This approach is particularly useful when protecting a series of messages, since it doesn't require continual requests and responses, but it will only work if the clocks are synchronized.
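Here's a minimal sketch of the timestamp flavor, assuming a shared secret key and an HMAC for the signing; the 30-second window is an arbitrary choice. A real system would typically also remember recently seen messages so that a replay inside the window gets caught.

```python
# Minimal sketch of timestamp-based replay protection: the sender signs
# its message together with its idea of the current time; the receiver
# rejects anything outside a tolerance window. The shared key and the
# 30-second window are arbitrary choices for illustration.
import hmac, hashlib, time

KEY = b"shared-secret"
TOLERANCE = 30  # seconds

def send(message):
    ts = str(int(time.time()))
    mac = hmac.new(KEY, f"{ts}|{message}".encode(), hashlib.sha256).hexdigest()
    return ts, message, mac

def receive(ts, message, mac, now=None):
    now = time.time() if now is None else now
    expected = hmac.new(KEY, f"{ts}|{message}".encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        return "rejected: bad signature"
    if abs(now - int(ts)) > TOLERANCE:
        return "rejected: too old (possible replay)"
    return "accepted"

packet = send("open the pod bay doors")
print(receive(*packet))                         # accepted
print(receive(*packet, now=time.time() + 300))  # rejected: too old (possible replay)
```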
A variant of this is a keycard that generates a new security code every few seconds. When you log in with such a card, the server will reject any codes that are too old.
If phrases like "reasonably close" and "too old" give you the idea that time on the net is somewhat fuzzy, that's because it is. If you and I can only communicate through messages that take a small but non-zero time to reach their destinations, then there's no meaningful way to say "I did X at the same time you did Y." (Einstein had some things to say on similar topics, but let's not go there now)
How would we prove such an assertion? I could send you a message, timestamped by my clock, and you could do the same. We could also note the times at which we each received our messages. But what if, say, the relevant timestamps are identical, but my clock is really a bit fast of yours, or a bit slow? What if one message got slightly delayed by a transient network traffic jam? There's no way to know.
This can actually be a pain if, say, you are picking up file A from a remote server and creating a local file B from it. File A might change, so you want to make sure that you re-create file B whenever file A changes. A popular development tool, which shall remain nameless, assumes that file B needs to be rebuilt if file A has changed more recently than it has. Really, you want to rebuild B if A has changed since the last time you used it to build B. These are basically the same thing if everything is on the same host or if the hosts' clocks are tightly synchronized, but not if one clock is allowed to drift away from the other.
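The difference between the two policies is easier to see in code than in prose. A sketch, leaving the tool in question unnamed as above; the caller of the second function is assumed to have stashed A's modification time the last time it built B.

```python
# Sketch of the two rebuild policies described above. The "newer than"
# test compares two clocks (A's host and B's host) and breaks down when
# they drift; the "changed since we last used it" test compares A's
# clock only with itself.
import os

def needs_rebuild_newer_than(a, b):
    """Rebuild B if A's timestamp is more recent than B's (the fragile policy)."""
    return (not os.path.exists(b)) or os.path.getmtime(a) > os.path.getmtime(b)

def needs_rebuild_changed_since(a, last_seen_mtime):
    """Rebuild B if A has changed since the last time we built B from it."""
    return last_seen_mtime is None or os.path.getmtime(a) != last_seen_mtime
```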
Fortunately, there are ways of ensuring that clocks in different computers are very likely to be in sync within a given tolerance (which depends on the latency of the system, and other factors). They involve measuring the transit time of messages among servers, or between a given server and "upstream" servers whose clocks we trust, as with NTP.
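The arithmetic at the heart of it is surprisingly small. Here is a sketch of the core calculation only -- no filtering, no multiple servers, no gradual slewing of the local clock -- using the four timestamps gathered in one request/response exchange.

```python
# The core arithmetic behind NTP-style synchronization: four timestamps
# per exchange yield estimates of round-trip delay and clock offset.
# This is only the formula, not the full protocol.
def offset_and_delay(t0, t1, t2, t3):
    """
    t0: client sends request (client clock)
    t1: server receives it   (server clock)
    t2: server sends reply   (server clock)
    t3: client receives it   (client clock)
    """
    delay  = (t3 - t0) - (t2 - t1)
    offset = ((t1 - t0) + (t2 - t3)) / 2
    return offset, delay

# If the client clock runs 5 seconds slow and each leg takes 0.05 s:
print(offset_and_delay(100.00, 105.05, 105.06, 100.11))
# (5.0, 0.1)
```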
Time may be fuzzy on the net in one sense, but from a practical point of view it's not fuzzy at all. Without really trying that hard, I now have several accurate clocks at my disposal. The first one I got used the radio time signals broadcast from WWV in Fort Collins, Colorado. My cell phone gets periodic pings from the nearest tower, the towers being synchronized I-know-not-how. My cable box shows the current time according to the feed upstream. And every computer in the house keeps good time thanks to NTP.
I haven't checked rigorously, but none of them ever seems to be more than seconds off of the others. In theory, the radio clock and the computers should be within less than a second of each other. Under good conditions, NTP can maintain sync to within tens of milliseconds, or less time than most packets take to reach their destination over the internet (under ideal conditions, it can do better than a millisecond).
Except for the radio case, all this is by virtue of the clocks in question belonging to one or another network. Particular measurements and results on a network are fuzzy, but the aggregate can be quite robust.