Monday, December 12, 2016

The topological sort

I can imagine two reactions to that title:
  • "Topological sort?  What's that?"  Which I aim to answer shortly.
  • "You really have run out of things to say about the web, haven't you?  Now you're just dragging out stuff from your CS classes."  To which I can confidently reply ... maybe.  But let's see where this goes.
Suppose I'm baking a cake.  Before I do anything, I need to buy ingredients.  Then I need to mix up the batter.  I'll also need to warm up the oven and grease the pan.  I'll probably want to make frosting.  After the oven is warm I can put the cake in and bake it, then let it cool and frost it.  Also I should clean up, so let's say that after I mix up the batter and bake the cake I do the dishes.

We can make this a little more formal.  I'll use a letter with an arrow to another letter to show that one task has to happen before the other.  So:
  • I → M (you have to have the ingredients before you mix up the batter)
  • M → B (you have to mix up the batter before you bake it, duh)
  • W → B (warm up the oven before baking)
  • G → B (grease before baking)
  • F → A (make frosting before, um, A is for applying the frosting)
  • B → C (bake before cooling because, um, that's how cooling works)
  • C → A (cool the cake before applying the frosting)
  • M → D (no point in doing the dishes until you're done mixing up the batter)
  • B → D (gotta clean up that cake pan as well)
We can put this all together into a little ASCII-art diagram (or I suppose I could draw a picture with boxes and arrows, but this is quicker):

I -> M ---+-----------+--> D
          |           |
W --------+--> B -----+
G --------+    |
               +--> C ---+
                         +--> A
          F -------------+

This kind of diagram is generally called a dependency graph, since it shows what depends on what.  It leaves out some important things like how long it takes to do things, and for this example I've quietly required that you have to do all the dishes at once, but this ought to be enough for purposes of illustration.  Some key features of a diagram like this are:
  • The tasks are labeled, as opposed to just being featureless dots, so we can tell W from G and so forth.
  • A task might depend on more than one thing, as B depends on M, W and G.
  • A task might have more than one thing depending on it, as M has both B and D depending on it.
  • There are no cycles, where two tasks each depend on the other, whether directly or indirectly.
  • In particular this means that each line can only have an arrow on one end.  You can't have M depend on I and also have I depend on M.
A diagram like this, whether it's describing dependencies between tasks or something else, is called a directed acyclic graph or DAG.  Directed because the lines have a direction to them, and acyclic because there are no cycles, whether direct or indirect.

If you take any two tasks, they can't each depend on the other (that is, the rules say no cycles), so either
  • One depends on the other, whether directly or indirectly.  For example A depends on W, because W has to happen before B, which has to happen before C, which has to happen before A.  This can be true for more than one reason.  For example D depends on M directly, but also because D depends on B and B depends on M.
  • Neither depends on the other.  For example, you can mix up batter and grease the pan in either order.  You can start the oven warming before you buy ingredients (if you don't mind having the oven on while you're out shopping), or vice versa.
The mathematical name for this is, reasonably enough, a partial order.  Partial orders and DAGs are just two different ways of describing the same situation.

Now suppose there's only one of you in the kitchen and you can only do one task at once.1 You need to make this partial order into a total order, where you can always say which task comes next and as a consequence each task except the last has exactly one task immediately after it.  Here's how you do that:
  • Find a task that doesn't depend on any other task in the diagram.
  • Write it down in the answer and remove it from the diagram.
  • Do this until there are no tasks left.
In this example, you have your choice of I, W, G or F.  Once you remove I, M also doesn't depend on anything in the diagram, so you have your choice of M, W, G or F.  Keep this up until there's nothing left and you might get I, M, G, W, B, D, F, C, A.  You might also get any of a number of different answers, but in every case you'll get something that you could actually do in a kitchen.  You'll never be asked to bake the cake before you have the ingredients or anything else impossible.
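For the curious, this remove-whatever's-ready procedure is generally known as Kahn's algorithm.  Here's a small Python sketch of it, using the cake dependencies (single-letter task names as above; I break ties alphabetically, which is arbitrary):

```python
from collections import deque

# Edges X -> Y mean "X must happen before Y" (the cake example).
edges = [("I", "M"), ("M", "B"), ("W", "B"), ("G", "B"),
         ("F", "A"), ("B", "C"), ("C", "A"), ("M", "D"), ("B", "D")]

def topo_sort(edges):
    tasks = {t for e in edges for t in e}
    # Count how many unfinished prerequisites each task has.
    indegree = {t: 0 for t in tasks}
    successors = {t: [] for t in tasks}
    for before, after in edges:
        indegree[after] += 1
        successors[before].append(after)
    # Start with the tasks that don't depend on anything.
    ready = deque(sorted(t for t in tasks if indegree[t] == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for nxt in successors[task]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:   # last prerequisite just finished
                ready.append(nxt)
    if len(order) != len(tasks):
        raise ValueError("cycle detected; this wasn't a DAG")
    return order

print(topo_sort(edges))  # → ['F', 'G', 'I', 'W', 'M', 'B', 'C', 'D', 'A']
```

Different tie-breaking gives different, equally valid orders, which is exactly the "any of a number of different answers" point above.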

This is about as simple an algorithm as you can get.  The only tricky bit is keeping track of which tasks do and don't have other tasks before them, and in most cases that's not particularly hard either.  Flattening a partial order into a total order is a lot simpler than sorting a bunch of items that are totally ordered.  Books have been written about that.

OK, so what is this good for and what does it have to do with the web?  Applications for topological sorting crop up all over the place.  Planning tasks, like in the baking example.  Figuring out how to build a large application that uses libraries that depend on other libraries (web browser depends on HTTP depends on TCP depends on low-level networking plus a few hundred others).  Recalculating cells in a spreadsheet.   Finding the best route from point A to point B. Typesetting.

Typesetting?  Yep.  If you're trying to find the best places to break a paragraph into lines, which places you can break the second line depends on where you break the first line, and so forth.  That gives you a DAG of dependencies between lines and we're off to the races.  Here's a pretty good summary (if a bit technical) of several ways to solve the problem, including a topological sort.

Suppose you're trying to learn a new subject.  If I want to learn about Frobnitizian Mechanics (not actually a thing), I might need to know about Hamiltonians in physics (actually a thing).  To understand that, I need to know some multivariate calculus and Newton's laws.  Both of those depend on single-variable calculus, and so on down to basic algebra.  Where should I start?  Start with anything that doesn't require anything you don't already know.  Repeat.

You already knew that, of course, but it's nice to know that that's all you have to do.

Of course in the real world, particularly on the web, not everything breaks cleanly into a DAG.  There are plenty of cases where, say, page A points to page B points to page C points back to A and there's no obvious place to start.  How you handle a situation like this in computing depends on what exactly you're trying to do, but it can be handled.  In real life, you should probably just start reading because hey, you gotta start somewhere.

There's one other piece of information that comes out of a topological sort that can be just as important as what has to happen before what: what doesn't have to happen before what.  In the baking example, I, W, G and F are independent of each other and we can remove any of them from the dependency graph first.

But in many situations you don't have to remove just one.  If I had someone helping me, I could send them out for groceries while I got the pan and the oven ready.  Most likely in that case I'd still be done with that before they got back and I'd have to wait before I could go further, but I'll still get done sooner than if I had to make the grocery run myself.

There are two reasons2 that computers can do so much more today than they could in the past.  The one everyone knows about is Moore's law.  Processors are faster, memory is bigger and cheaper and so forth.  The other is parallel processing.  If I can break a problem down into pieces that can run independently of each other, I can throw more of this faster, cheaper hardware at the problem.  If the structure of the problem limits me to having one thing happening at a time, it doesn't matter how many processors I have.  A topological sort will show you when you can parallelize and when you can't.
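To make that concrete, here's a Python sketch of the same idea that, instead of producing one task at a time, pulls off everything that's ready at each step.  Each "wave" could in principle be done in parallel (a sketch for the cake example, not a real scheduler):

```python
# Edges X -> Y mean "X must happen before Y" (the cake example).
edges = [("I", "M"), ("M", "B"), ("W", "B"), ("G", "B"),
         ("F", "A"), ("B", "C"), ("C", "A"), ("M", "D"), ("B", "D")]

def parallel_waves(edges):
    tasks = {t for e in edges for t in e}
    prereqs = {t: set() for t in tasks}
    for before, after in edges:
        prereqs[after].add(before)
    done, waves = set(), []
    while len(done) < len(tasks):
        # Everything whose prerequisites are all finished can go now.
        wave = sorted(t for t in tasks - done if prereqs[t] <= done)
        if not wave:
            raise ValueError("cycle detected; this wasn't a DAG")
        waves.append(wave)
        done.update(wave)
    return waves

print(parallel_waves(edges))
# → [['F', 'G', 'I', 'W'], ['M'], ['B'], ['C', 'D'], ['A']]
```

With unlimited helpers the whole job takes five waves instead of nine single-file steps, which is the payoff parallel processing is after.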

1 Yes, in real life you could and probably should mix up batter while the oven is warming, but I'm not going to fix the diagram to reflect this.  You could do that by splitting W into W1 for "start the oven warming" and W2 for "the oven finishes warming up", talking about "events" instead of "tasks" and adding extra dependencies W1 → W2 and W2 → B.  Then you could have the order W1, M, W2 with only one thing happening at a time.  I don't think this really helps the discussion.

2 Well, there are more than two, really.  Every so often someone will find a significantly faster algorithm for a key problem, for example.  One could also point to improvements in software engineering.  I mean, you'd think we'd be better at this than we used to be, and I'm pretty sure we are. Nailing down exactly how that's true, with proper respect to the very good engineering that's been done in the past, is a trickier problem.  Quantifying processor speed, parallelism or algorithmic complexity is generally straightforward.

Tuesday, October 25, 2016

Names and namespaces

I'm not the only David Hull in the world.  I may not even be the only one with my exact name.  I've met a couple of other David Hulls.  There was also a prominent philosopher by the same name, not to mention musicians, athletes and others.  Mine is not a particularly common name, but, at least if we look at first and last names only, it's definitely not unique.  It would be interesting to know what portion of names in the world are unique.  There are plenty of Pat Smiths or Zhang Weis, for example, but (as far as I know) only one Dweezil Zappa (born Ian Donald Calvin Euclid Zappa).

In daily life it's generally not a huge problem for two people to have the same name.  If there are two David Hulls working in the same office, one might be "Dave" and the other "David", or one might be "Dave in sales", or one might go by "Walrus" for whatever reason.  If we want to be more precise, we can always add more identifiers, such as middle name, birth date, place of birth and so forth.  It's very unlikely that (to make up an example), there was more than one Patricia Terpsichore Smith born on February 29th, 1940 in Saint Paul, Minnesota.

If you want to be really sure, you assign everyone a unique identifier, such as the Social Security number in the US (SSN for short).  In theory there should never be more than one person with a given social security number, regardless of their name, age or place of birth.  In practice, that doesn't really hold up.  There are actually millions of people with multiple SSNs and/or SSNs assigned to multiple people.  Leaving that aside, though, it's worth taking a closer look at how SSNs and identifiers like them are built.  I'll use US examples here since that's what I'm familiar with, but many other countries have similar schemes.

A Social Security number is split into three parts.  Up until 2011, these followed a specific pattern.  For example, LifeLock CEO Todd Davis's is 457-55-5462.  The 457 part is in the range 449-467, which is assigned to Texas.  The 55 means, in this case, that the card was issued in 1982.  Exactly which year the middle digits map to depends on the first three digits, presumably because not all three-digit prefixes are used every year.  The last four digits are issued in numerical order, so putting it all together, Davis would have been the 5462nd person to be issued an SSN starting with 457-55, and the card was issued in Texas in 1982.

This sort of scheme is not uncommon.  US phone numbers are built in a similar fashion.  The full history would be material for a separate post, but during the "long distance" era before the advent of cell phones, a US phone number consisted of a three-digit area code, having a middle digit of 0 or 1, followed by a three-digit exchange associated with a piece of equipment in a particular location, followed by a four-digit number.  For example, the White House phone number is (202) 456-1111.  The area code for Washington, D.C. is 202, 456 is one of the exchanges there and 1111 is easy to remember, easy to dial on old-fashioned rotary phones and hey, the White House can pick whatever number it likes.

Likewise, US Zone Improvement Plan codes (zip codes to most of us) use the first digit to denote a particular region of the country, the second two to denote a particular area within that region (and typically a particular sorting and distribution center), and the last two a smaller area within that.  Here's a nice illustration I mentioned in a previous post.  The later ZIP+4 scheme takes that down to individual blocks, apartment buildings, large businesses and so forth.

The parts in schemes like this often nest.  Area codes comprise exchanges which comprise individual phones.  Zip code regions comprise distribution centers which comprise smaller areas.  Social security numbers are a bit different, in that the first five digits together denote a region and year associated with a block of individual numbers, but you can't really say that there are years within regions and vice versa.

People's names are also a bit borderline.  You could think of families comprising individuals, but the family name doesn't really correspond to biological families for a variety of reasons.  The approximately 22% of Koreans named Kim (김) are not a biological family (though there are some 384 clans with the Kim name).  It's not much more meaningful to talk of all the Hulls than it is to talk of all the Davids.

All of these schemes have special cases, for example:
  • SSNs never start with 000, while 700-728 were originally reserved for railroad employees
  • SSNs never start with 666, and this number does not even appear on the Social Security Administration's historical list.
  • Area code 800 (along with several others now) is reserved for toll-free services
  • Exchange 555 includes special numbers such as 555-1212 (directory assistance) and has blocks that are guaranteed not to be used for real phone numbers, which is why a US phone number you see in a movie almost always starts with 555.
  • Zip codes starting with 569 are reserved for the USPS Parcel Return Service.
  • In the US legal system, John Doe, Richard Roe and other names are used in various contexts for persons whose actual names are unknown or withheld.
It's easy to think of more of these identifiers made of parts, usually with special cases.  Credit card numbers.  Place names like Paris, Texas, USA as distinct from Paris, Kentucky, USA or Paris, Kiribati or Paris, France.  Three cases are of particular interest on the web:
  • domain names, which consist of parts nested from right to left (e.g.,
  • IP addresses, which (in version 4) consist of four parts (sort of) nested left to right (e.g.,
  • URLs, which consist of a protocol (e.g., http), an authority (e.g.,, a path (e.g., /blogger.g), a query (e.g., ?blogID=21299...) and a fragment (e.g., #overview/src=dashboard), again nested (basically) from left to right
Again there are special cases, such as IP addresses starting with 192.168 and the about: pseudo-protocol Chrome uses.
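Python's standard library will happily split a URL into exactly these parts.  (The URL here is made up for illustration; example.com is reserved for exactly this kind of use.)

```python
from urllib.parse import urlsplit

# Split a URL into protocol, authority, path, query and fragment.
parts = urlsplit("http://www.example.com/blogger.g?blogID=123#overview")
print(parts.scheme)    # 'http'
print(parts.netloc)    # 'www.example.com'
print(parts.path)      # '/blogger.g'
print(parts.query)     # 'blogID=123'
print(parts.fragment)  # 'overview'
```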

All of the naming schemes I've mentioned so far make some use of the idea of a namespace, that is, a context in which names are meant to be unique.  Within a given family, siblings typically share a family name but have distinct first names.  I say "typically" because there are plenty of exceptions in real life, ranging from ordinary blended families to George Foreman's five sons named George (who nonetheless appear to go by their own nicknames in daily life).

You could think of an area code as a sort of family name with exchanges as given names or, one level down, think of the exchange as a family composed of individual numbers.  Some of these analogies make more sense than others.  Sure, every SSN starting with 457 was assigned in Texas (if it was assigned before 2011), but there's no good way to get from the middle digits to a year without knowing the first three digits.  Real life is a bit messy.

Even so, schemes like this are a decent fit for the way we think, which should not come as a great surprise.  But this has its drawbacks.  Maybe you don't really want someone to know in what state and year you got your social security card.  Maybe you'd like to give out your phone number without giving away a reasonably good idea of where you live.

Besides the privacy implications, there are practical concerns.  In theory there are a billion possible SSNs, enough to keep up with the US population for a while yet.  In practice, not all numbers can be used.  If only 500 people have been issued numbers starting with XXX-YY by the end of the year, the remaining numbers starting with XXX-YY will go unassigned, and I'm sure there are other inefficiencies.  This is not unique to SSNs.  Any numerical scheme that allocates blocks of numbers will tend to leave some blocks unfilled.

For these and other reasons, many kinds of ID numbers are assigned in a single "flat" namespace, as SSNs are now.  One way to do this is with a serial number that's incremented with each new ID, but (again at least partly for privacy and security reasons), that's often not the case.  For example, Blogger gives this blog post an ID of 8084382145281586649.  The blog itself has an entirely different ID.  The two have nothing (obvious) to do with each other.  I certainly haven't written 8 quintillion posts for this blog, nor are there anywhere near that many posts in all of Blogger.  The previous post on this blog (from, um, just a little while ago), has ID 236347809273236220.

This way of using longish, apparently random strings of digits has a few often-useful properties:
  • Because the numbers are big enough, there is generally very little chance that the same ID number will be given out twice.  And by "very little" I mean "not liable to happen in our lifetimes" and sometimes much longer, not "eh ... this'll happen from time to time but don't sweat it".  As a rule of thumb, if there are N digits in an ID, the number of things you'd need to get a collision is an N/2 digit number.  If blog post IDs are 18 digits or so long, you'd need billions of posts before there was a significant chance of a collision, even if they're not explicitly checking whether a supposedly new ID has already been used.  Generally, "universally unique ID" (UUID) schemes use a lot more than 18 digits, making the chances of collision ridiculously small.
  • Almost all UUID schemes use some sort of secure hash.  This means that, generally speaking, changing even one bit of the input will change about half of the bits of the ID.  This and other properties make it, as far as anyone currently knows, infeasible to learn anything about the thing being assigned the ID from the thing itself.  For example, the IDs of the two posts give no clue that they identify adjacent posts in the same blog, much less what's in them.  The URLs given to the posts, in contrast, make an effort to provide at least some useful information (e.g.,  But that's fine.  As long as you have a unique ID you know exactly which item you're dealing with and you can give it any kind of friendly name you like.
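That N/2-digit rule of thumb is the "birthday bound".  Here's a quick back-of-the-envelope check in Python, using the standard approximation that the chance of at least one collision among k random IDs drawn from a space of size S is about 1 - e^(-k(k-1)/2S):

```python
import math

def collision_probability(num_ids, space_size):
    # Birthday-bound approximation: p ≈ 1 - exp(-k(k-1) / (2S)).
    k = num_ids
    return 1 - math.exp(-k * (k - 1) / (2 * space_size))

space = 10 ** 18  # 18-digit IDs, roughly the size of a Blogger post ID
print(collision_probability(10 ** 6, space))  # a million posts: ~5e-7
print(collision_probability(10 ** 9, space))  # a billion posts: ~0.39
```

A million posts is perfectly safe; around a billion (the square root of the space, i.e., a 9-digit number of things for 18-digit IDs) collisions start to become likely.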
You can still have namespaces of a sort with a hashing scheme.  If I form my IDs by hashing the string "fieldnotesontheweb" followed by the title of the post and you use "myawesomeblog" instead of "fieldnotesontheweb", there is pretty much no chance we'll ever use the same ID, even if we happen to pick the same post title.  This gives the same kind of uniqueness as the "given names within a group of siblings" model.  You just can't tell from the IDs.
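A sketch of what that might look like, using SHA-256 from Python's standard library (an illustrative scheme of my own, not how Blogger actually assigns its IDs):

```python
import hashlib

def post_id(namespace, title):
    # Hash the namespace and title together; different namespaces
    # give unrelated IDs even for identical titles.
    digest = hashlib.sha256(f"{namespace}:{title}".encode()).hexdigest()
    return int(digest[:16], 16)  # keep 64 bits' worth for a shortish ID

title = "The topological sort"
print(post_id("fieldnotesontheweb", title))
print(post_id("myawesomeblog", title))  # different, despite the same title
```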

It's not uncommon for a naming scheme to evolve from a hierarchical structure, like SSNs before 2011, to a flat structure, like SSNs from 2011 onwards.  Given that, there's a good argument to be made that you should just start with a big, flat namespace and save the headache of conversion.

Friday, May 6, 2016

A couple of updates on Satoshi

In the previous post I said it seemed "really odd" that Wright hadn't publicly demonstrated he had Satoshi's private keys by signing a message1.  Wright has since said that he lacks the "courage" to do this, and at least one of the Bitcoin experts who had previously said that Wright was certainly Satoshi has since walked that back.

The simplest explanation here is that Wright doesn't have the key.

Re-reading that post, I see I commented to the effect that, while Bitcoin is often described as a means of anonymously transferring money, that wasn't necessarily so.  I said I'd get back to that, but didn't really, so here's a bit more:

Bitcoin's blockchain very publicly ties a given transaction to a timestamp and the keys of the sender and recipient (and probably other stuff I'm leaving out).  That's quite a bit of information to mine, even though the keys don't have real-world identities attached to them.  For example:

  • If you see a series of transactions with the same sender and recipient over time, you can assume they're doing business.
  • If you know that someone ordered a doomsday device online for one million dollars, and you see exactly one Bitcoin transaction for $1,000,000 in a reasonable time period around the purchase, that's a pretty good clue that that wallet is associated with the sale of a doomsday device.

The doomsday device is a contrived example, of course.  A real evil genius wouldn't be so obvious (though a henchman might ...)  The buyer and seller could, say, break the transaction down into unremarkably-sized pieces and use separate wallets for each part.  Nonetheless, the cardinal rule of anonymity on the web is that you've probably given away more than you think you have, if your adversary is really looking.  There are protocols built on top of Bitcoin to mitigate this, but Bitcoin itself makes no guarantee.

Another weak point is that at some point you need to get reserve currency in and out of the system, unless you really, really believe in Bitcoin as a currency in its own right.  The easiest way to do this is via an exchange, which ties your transactions to a particular bank account.  It would not be wise to assume that these accounts can be kept perfectly anonymous.  Also, if there are relatively few people converting Bitcoin to and from reserve currency, the fact that everyone else has to go through them will leave its mark on the transactions in the blockchain one way or another.  Again, I'm sure there are ways to mitigate this, but they're not built into Bitcoin.

But then if you're buying and selling doomsday devices online, you really don't need my advice.

1In Bitcoin circles the preferred method would be to transfer some small amount of Bitcoin from Satoshi's wallet.  This process includes producing a digital signature using Satoshi's key.

Tuesday, May 3, 2016

On the interwebs, no one knows you're Satoshi Nakamoto

Several major news outlets have recently announced that Australian Craig Wright has come forward as the person behind the pseudonym Satoshi Nakamoto, inventor of the Bitcoin protocols.

I have no particular reason to doubt this, but more on that in a bit.  First, an update on Bitcoin.

In previous posts I've expressed skepticism that Bitcoin, considered as a currency, would come to supplant reserve currencies such as the Dollar, Pound or Yen.  I think that case is made by now.  The main points are
  • The current Bitcoin money supply is a fraction of a percent of the dollar money supply
  • The dollar value of Bitcoins actually spent in a day is an even smaller percentage of the number of dollars spent in a day
  • While several major retailers have announced that you can "pay in Bitcoin", this generally means they're partnering with someone who converts Bitcoin to and from a reserve currency.
  • The merchants themselves quote prices in a reserve currency and take payments in that currency.
  • Bitcoin's exchange rate with reserve currencies is much, much more volatile than reserve currencies' exchange rates with each other
  • As a result, there is little evidence that anyone holds on to Bitcoin except as a speculation
  • All of this has been the case for years now
That doesn't mean that the whole exercise has been a waste of time or, for that matter, that a few people haven't made large piles of (reserve currency) money from it.  Bitcoin may not be a viable currency, but it's interesting considered as a payment method.  In particular, the "blockchain", used to provide a public, hard-to-forge record of transactions, has seen interest from a number of players, including existing banks.

It's worth noting that the blockchain doesn't provide the anonymity that Bitcoin is rightly or wrongly known for.  After all, by design it's a public record.  But before we get to that, let's do a quick review of public key cryptography with the caveat that, although I've studied cryptography with some care, I'm far from a cryptography expert.

Public key cryptography uses a publicly known algorithm and two keys: a public key that everyone can see and a private key that only you should see.  In the usual version, RSA, the two keys are "inverses" of each other.  If you take a message and use one key to apply the algorithm, you get gibberish.  If you apply the other key to the result, you get the original message back.  This means you can use the system in two ways:
  • Encryption:  If I run a message through the algorithm using your public key then (as far as we know), only you (or at least, only someone in possession of your private key) can turn the resulting gibberish back into the original message.
  • Signing: If you run a message through the algorithm with your private key to get gibberish, then anyone with the public key can recover the original message but (as far as we know) only you (or at least, only someone in possession of your private key) could have produced the gibberish that you sent out.
Typically these are used together, along with a cryptographic hash function that can take any message and boil it down to a largish number -- a few hundred digits -- in such a way that (as far as we know), no one could have produced a fake message that boils down to that same number.  Putting it all together, if I want to send you a message securely, and assure you that I actually wrote that message:
  • I use the hash function to boil my message down to a largish number, called the hash value, that (as far as we know) could only have been produced from my message.
  • I use my private key to turn that number into a different largish number, that (as far as we know) only I (or ... you get the picture) could have produced and (as far as we know) could only have been produced from my message.  I add this "signature" to the bottom of my message.
  • Finally, I use your public key to turn the whole thing1 into gibberish that (as far as we know) only you can decrypt, using your private key.
  • After you do this, you use my public key to turn the signature back into the original hash value, and you use the cryptographic hash function to verify that the message hashes to that same value.
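If you want to see the "two keys are inverses" idea in action, here's the classic textbook-RSA toy example with tiny primes.  It is utterly insecure at this size (and skips the hashing step described above), but it shows signing and encryption as mirror images of each other:

```python
# Textbook-RSA toy example with tiny primes -- nothing like a real
# key size, but it shows that the two keys undo each other, which
# is what both encryption and signing rely on.
p, q = 61, 53
n = p * q                # 3233, the public modulus
phi = (p - 1) * (q - 1)  # 3120
e = 17                   # public exponent (part of the public key)
d = pow(e, -1, phi)      # private exponent (2753) -- keep this secret

def apply_key(m, key):
    # Applying the algorithm with one key; applying the other inverts it.
    return pow(m, key, n)

message = 42
# Signing: transform with the private key, recover with the public key.
signature = apply_key(message, d)
assert apply_key(signature, e) == message
# Encryption: transform with the public key, recover with the private key.
ciphertext = apply_key(message, e)
assert apply_key(ciphertext, d) == message
print("signature:", signature)
```

Real systems use keys hundreds of digits long and, per the steps above, sign a hash of the message rather than the message itself, but the inverse-keys structure is the same.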
What do I mean "as far as we know"?

All of these algorithms and protocols are widely published and widely studied by everyone from undergrads to renowned experts to the folks at No Such Agency and its counterparts.  This is a key tenet of modern cryptography.  Only private keys are private.  Everything else is public.  "Security through obscurity" is frowned upon.  While it is possible that someone has cracked one or more pieces of this system, it's unlikely for several reasons:
  • The same entities that would be cracking the system are using it for highly sensitive information (they might also be using some secret system for really really sensitive information, but the information known to be secured using the widely-known systems is pretty sensitive -- military communications, money transfers ...).  If you know your system can be cracked you have to assume your enemy could figure that out as well.
  • There are known ways to test whether your crypto has been hacked, mainly noting carefully who acts on what secret information (whether real information or carefully selected false information).  Military history has several famous cases of battles deliberately lost to avoid giving away that the enemy's code had been cracked.  Eventually, though, that information will be put to use.  Not long after that, the jig is up.
  • No academic researcher has published a significant crack of the current "best practice".  By contrast, several no-longer-widely-used systems have been cracked in the literature (MD5, anyone?).  Cracking a widely-used system is a pretty good way to advance on the tenure track and/or line up a lucrative job and/or develop serious street cred.
If someone has cracked the current public-key encryption infrastructure, they're using that information very carefully.

OK, how about my other disclaimer, "or someone in possession of your private key"?  Key management is where we separate the pros from the amateurs -- and even the pros get it wrong way more often than they'd like.

Even if you're using some sort of uncrackable-in-principle quantum cryptography, at some point you're going to want to actually read a message encrypted to you and put your signature on something you're sending to someone else, and to do that you need some sort of key.  With current technology, typically that key is a file that's been encrypted using a passphrase.  If I want to decrypt or sign something, I put in the passphrase and the crypto software decrypts my private key to use for the real task at hand.

There are several weak points here.  An attacker could steal my key file and guess my passphrase.  An attacker could install a hacked version of the crypto software that sends the decrypted private key to the attacker, or encrypts the message with the attacker's public key along with the real recipient's.  An attacker could install a hacked version of the crypto software that uses a not-so-random number generator in places where we expect an unguessable random number to be used (see the footnote below for more on how those last two would work).

This is just off the top of my head.  Real experts spend a lot of time coming up with more sophisticated attacks and ways to prevent them.  The point is that no matter how strong the actual encryption algorithm, there needs to be a key of some sort, and there are plenty of ways to steal or tamper with the keys.

So if someone comes forward claiming ownership of a private key that was created years ago and used in a very visible way and that could be worth hundreds of millions of dollars, you may want to be a little careful before taking such a claim at face value.

Apparently, the only publicly available cryptographic evidence of the claim that Craig Wright is Satoshi Nakamoto is a digital signature that could have been copied from the publicly available blockchain.  This seems really odd.  You don't prove you have a private key by showing something already signed with that key.  You prove it by taking some new message of someone else's choosing, signing that and having the signature checked, ideally by yet a third person.  And anyone claiming to be Satoshi ought to know that.

On the other hand, reliable sources report that Wright has done essentially that using a different key that only Satoshi should have.  Additionally, Bitcoin experts who have met Wright claim he's the real deal -- he acts like Satoshi, he has the technical knowledge that Satoshi would have, he knows historical details that only Satoshi is likely to know, and so forth.

This is not to say that Craig Wright is lying or otherwise up to no good.  I have no reason at all to believe or imply that.  My point is that it would be a mistake to judge his or anyone's claim to be Satoshi Nakamoto based on cryptographic evidence alone.

So ... who knows?  At this point, we will probably never know for sure who Satoshi Nakamoto is or was, or even whether it was only one person.  Which seems somehow appropriate.

Postscript: The private key published by Craig Wright was used in an early block -- block 9 -- in the Bitcoin blockchain, in which a quantity of Bitcoin was transferred to the late Hal Finney.  As it happens, Hal Finney commented on a post on this blog.  The post was on anonymity, but his comment led me to his blog, where I saw a bunch of interesting stuff on digital rights management (DRM).  Naturally, I posted a reply to that, rather than to his actual comment.  You can see the whole exchange here.

1. In practice, you don't use the public key system directly for large messages.  Rather, you use the public key system to encrypt a randomly generated key for a faster, non-public-key system that is then used to encrypt the actual message.  Besides being faster, this also allows you to send a message to multiple recipients without having to generate a separate encrypted version of the whole message for each recipient key.  Instead, you encrypt the randomly generated key for each recipient.  I trust you can see why I left all this out of the main post.
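Here's a toy sketch of that key-wrapping arrangement.  Every primitive here is a stand-in I've invented for illustration -- a SHA-256 keystream plays the fast symmetric cipher and tiny textbook RSA plays the public-key system; real software uses things like AES and full-size keys -- but the structure is the point: one encrypted body, plus one small wrapped copy of the session key per recipient.

```python
# Toy hybrid encryption: one symmetric-encrypted body, the session key
# wrapped separately for each recipient.  All primitives are insecure
# stand-ins, kept small for readability.
import hashlib, secrets

def stream_xor(key: bytes, data: bytes) -> bytes:
    # Keystream from SHA-256(key || counter); XOR, so encrypt == decrypt.
    out = bytearray()
    for i in range(0, len(data), 32):
        block = hashlib.sha256(key + i.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[i:i + 32], block))
    return bytes(out)

def toy_keypair(p, q, e=17):
    # Textbook RSA with tiny primes -- illustration only.
    n = p * q
    return (n, e), pow(e, -1, (p - 1) * (q - 1))

alice_pub, alice_priv = toy_keypair(61, 53)
bob_pub, bob_priv = toy_keypair(67, 71)

def wrap(session_int, pub):            # encrypt session key with public key
    n, e = pub
    return pow(session_int, e, n)

def unwrap(wrapped, priv, pub):        # decrypt it with the private key
    n, _ = pub
    return pow(wrapped, priv, n)

# The whole message is encrypted exactly once...
session_int = secrets.randbelow(3000)  # small enough for both toy moduli
session_key = session_int.to_bytes(2, "big")
body = stream_xor(session_key, b"the actual message, encrypted once")

# ...and only the small session key is wrapped per recipient.
wrapped_keys = {"alice": wrap(session_int, alice_pub),
                "bob":   wrap(session_int, bob_pub)}

# Bob recovers the session key, then the message:
k = unwrap(wrapped_keys["bob"], bob_priv, bob_pub)
print(stream_xor(k.to_bytes(2, "big"), body))
```

Adding a third recipient costs one more small wrapped key, not another copy of the whole encrypted message.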

Saturday, January 16, 2016

On the responsibility of "flash mobs"

Hmm ... where did I put that big honking I AM NOT A LAWYER disclaimer?  Ah ... here it is.

OK ... where was I?

In an old post about a pillow fight that got out of hand, I speculated about the responsibilities of flash mobs.  One point that the original post only mentions in passing is that the pillow fight was only a flash mob activity by the loosest of interpretations.  It was, after all, already an annual event, pretty much the opposite of a spontaneous occurrence.

So why call it a "flash mob"?  While the event was scheduled for a definite place and time (Valentine's Day in Justin Herman Plaza) and now even has a Facebook page, the event itself is open to anyone who happens to show up.  There are no tickets and there is no official organizer or organizing body.  If you show up with a pillow on Valentine's day and start swinging, you're in.  Otherwise you're not.

Leaving aside some interesting questions of identity and language usage for the other blog, it seems that the key point here is that people sometimes gather unofficially to do things, they've been doing that forever, and, most important, being unofficial does not absolve anyone of responsibility.  If I get together with ten close friends and twenty people they invited and 35 people those people invited, it doesn't matter whether we did this over the web, or whether I know everyone there.  It matters what we do.

If we decide to go clean up a city park, good for us.  If we decide to trash the same park, we're responsible for that instead.  Which is why the original headline, "S.F. may crack down on 'flash mob' antics" misses the point.  As the article itself made clear, the city was dealing with a particular case of a not-officially-sanctioned group of people making a mess.  Nothing particularly flash-mobby or webby about it.

Sunday, January 3, 2016

Print ... yet again still not dead

A few years ago I speculated on what it might mean that Politico was getting a good share of its revenue from its local print edition in DC -- evidently it was the print revenue that was keeping the magazine afloat.  Politico is still around, including the print edition.  I don't know whether the print edition is still critical to the operation, but it says something that it's still around years later.

There's an even more blatant example, though, one that I don't recall noticing earlier even though it's been around since 2005: WebMD has a print edition.  I've seen it in a couple of doctors' offices in the past few months.  I'm sure I've seen it many times before and just not registered it.

It makes perfect sense, of course.  Patients waiting in doctors' offices are the classic captive audience.  Even today, when people are likely to have smartphones and/or tablets to read from -- or could just bring a book like in the olden days -- it's clearly still worth it to have a pile of paper around to browse through.  A medical magazine aimed at the general reader makes perfect sense.  If you're into sports or celebrity gossip you've probably already read the stories in those months-old magazines, but chances are you haven't browsed through WebMD, no matter how old it is.  Being in a doctor's office, you might well be in the mood to.

As with Politico, it's particularly interesting that a primarily web-based outlet -- you can't get much webbier than WebMD -- is choosing to publish a print edition, and sticking with that decision for years at a time.