Wednesday, August 23, 2017

Tags and finding things

Putting together these recent posts, and posts on the other blog, I notice I'm much more casual about tagging.  I can't bring myself to stop altogether.  A post without tags seems somehow incomplete.  But every time I add a tag I find myself asking "Why am I doing this?"

For years and years it's been possible to add "site:fieldnotesontheweb.com" to a search and find whatever you want on this blog (or likewise any other), whether I've tagged it or not.  The difference, if any, is more a matter of curation.

Donald Knuth, in putting together The Art of Computer Programming, made a great effort to compile a complete index, partly out of frustration with the textbooks he'd had to read as an undergrad.  To him, this wasn't just a matter of searching for all occurrences of a given term (which was possible since the text of TAOCP was in digital form), or dumping out a concordance of terms by page.  Context mattered.  For example, the index entry for C. A. R. Hoare might include pages mentioning quicksort, even if Hoare's name doesn't appear on those pages.

I think tags on a blog serve a similar purpose.  If you click on the link for a tag, you'd expect to see posts on that particular topic, regardless of the exact words.  The link for annoyances on this blog turns up posts about several annoying things, whether or not I happened to include the word annoy or its forms in the posts.  Machines are getting better at this sort of inference, but they're not great yet.
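
To make the distinction concrete, here is a rough sketch in Python.  The posts, their text, and the helper names are all made up for illustration; none of this is how this blog or any search engine actually works.  It just contrasts matching on words with following a hand-applied tag.

```python
# Hypothetical posts: titles, text, and tags are invented for illustration.
posts = [
    {"title": "Pop-up ads",
     "text": "These things really annoy me.",
     "tags": {"annoyances"}},
    {"title": "Autoplay video",
     "text": "Why does every page start playing sound?",
     "tags": {"annoyances"}},
    {"title": "Indexing",
     "text": "Knuth's index is a model of care.",
     "tags": {"books"}},
]


def search_text(posts, word):
    """Return titles of posts whose text contains the word (case-insensitive)."""
    word = word.lower()
    return [p["title"] for p in posts if word in p["text"].lower()]


def search_tag(posts, tag):
    """Return titles of posts that a human labeled with the given tag."""
    return [p["title"] for p in posts if tag in p["tags"]]


by_text = search_text(posts, "annoy")     # only posts that use the word
by_tag = search_tag(posts, "annoyances")  # every post curated under the tag
print(by_text)
print(by_tag)
```

The text search finds only the post that happens to use the word, while the tag also picks up the one that is about an annoyance without ever saying so.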

I think that's a good theory, anyway, and I think human curation is still useful.  On the other hand, I don't really have time to post on this blog, much less read through it and fix up tags.  I've done some re-reading, but I've only really been through a couple hundred posts, and then only fixing typos and adding the occasional note or update.  So what you get here is hit or miss.  Not so much a careful taxonomy as a record of whatever I happened to be thinking at the time.

If I had time, I would probably trim the set of tags down significantly, particularly getting rid of tags that are completely redundant with search results, and probably consolidating a few similar tags down to one canonical choice.  But not today, and not any time soon.  If the tags as they stand make for more interesting browsing, great.

(By the way, I'm not particularly proud that annoyances is currently the most populated tag on this blog.)

1 comment:

earl said...

End of 1st line, "web" should be "blog."