Putting together these recent posts, and posts on the other web, I notice I'm much more casual about tagging. I can't bring myself to stop altogether. A post without tags seems somehow incomplete. But every time I add a tag I find myself asking "Why am I doing this?"
For years and years it's been possible to add "site:fieldnotesontheweb.com" to a search and find whatever you want on this blog (or likewise any other), whether I've tagged it or not. The difference, if any, is more a matter of curation.
Donald Knuth, in writing The Art of Computer Programming, made a great effort to compile a complete index, partly out of frustration with the textbooks he'd had to read as an undergrad. To him, this wasn't just a matter of searching for all occurrences of a given term (which was possible since the text of TAOCP was in digital form), or dumping out a concordance of terms by page. Context mattered. For example, the index entry for C. A. R. Hoare might include pages mentioning quicksort, even if Hoare's name doesn't appear on those pages.
I think tags on a blog fill a similar purpose. If you click on the link for a tag, you'd expect to see posts on that particular topic, regardless of the exact words. The link for annoyances on this blog includes several annoying things, whether or not I happened to include the word annoy or its forms in the posts. Machines are getting better at this sort of inference, but they're not great yet.
I think that's a good theory, anyway, and I think human curation is still useful. On the other hand, I don't really have time to post on this blog, much less read through it and fix up tags. I've done some re-reading, but I've only really been through a couple hundred posts, and then only fixing typos and adding the occasional note or update. So what you get here is hit or miss. Not so much a careful taxonomy as a record of whatever I happened to be thinking at the time.
If I had time, I would probably trim the set of tags down significantly, particularly getting rid of tags that are completely redundant with search results, and probably consolidating a few similar tags down to one canonical choice. But not today, and not any time soon. If the tags as they stand make for more interesting browsing, great.
(By the way, I'm not particularly proud that annoyances is currently the most populated tag on this blog.)
Showing posts with label tagging. Show all posts
Wednesday, August 23, 2017
Thursday, January 27, 2011
The no-tag tag
I recently ran across a blog with a tag I don't think I'd seen before: "No particular tag"
What's the point? Well, for one thing it gives you an easy way to bring up all the posts that don't have any other tag, and which otherwise couldn't be reached at all through the tag list.
This distinction between nothing and a label for nothing comes up again and again: The empty set vs. no set at all; a null value vs. an empty string or other collection; Odysseus getting Polyphemus to say that "Noman" was attacking him ...
It's a double-edged sword. It's certainly useful, probably even necessary, to have a something-that-stands-for-nothing, but it can also cause no end of confusion. Any number of bugs come down to losing track of the distinction between no value and an empty value.
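To make the distinction concrete, here is a minimal Python sketch. Everything in it is made up for illustration: the post titles, the `build_tag_index` helper and the idea of recording unknown tags as `None`. It is not how Blogger or any other real platform handles labels, just one way the no-value/empty-value split could play out in a tag list.

```python
from collections import defaultdict

# Hypothetical sentinel label: the something-that-stands-for-nothing.
NO_TAG = "No particular tag"

def build_tag_index(posts):
    """Map each tag to the titles filed under it.

    `posts` maps a title to a list of tags, or to None if the tags
    were never recorded at all.
    """
    index = defaultdict(list)
    for title, tags in posts.items():
        if tags is None:
            # No value: we don't know the tags, so the post is left out.
            continue
        if not tags:
            # Empty value: known to have no tags, so file it under the sentinel.
            index[NO_TAG].append(title)
        for tag in tags:
            index[tag].append(title)
    return dict(index)

posts = {
    "The no-tag tag": ["tagging"],
    "Odds and ends": [],       # explicitly untagged
    "Imported draft": None,    # tags unknown
}
index = build_tag_index(posts)
print(index)
```

Treating `None` and `[]` the same way (say, with a bare `if not tags:` before the `None` check) would quietly file the unknown post under the sentinel, which is exactly the sort of bug described above.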
It's a neat idea, adding a tag for no tag, but I'm not sure how much demand there is for it. If there were much, I'd expect to see more of it. But perhaps I should leave the definitive statement to the experts:
Everybody knows that more wars have been won with a shovel than a sword. Give a man a hole and what does he have? Nothing, but give a man a shovel and he can dig a hole to contain the nothing.
Sunday, April 27, 2008
Non-intrusion and the web
In responding to a comment on a previous post, I mentioned that there are ways to refer precisely to a piece of something on the web, even if that something isn't specifically set up for it. It's easier and more robust if the document you're linking to has fine-grained markup, but it's not impossible to say "so many characters into document X" or "right after the words 'Chapter II'" or something similar. At worst, with a little scripting (I wave my hands here) you could pull up the right passage more or less automagically on a browser and no one need know the difference.
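As a rough sketch of the hand-waved scripting, here are the two addressing schemes in Python. The helper names `by_offset` and `after_marker` and the sample text are invented for illustration, and a real script would also have to fetch the document and cope with markup.

```python
def by_offset(text, start, length):
    """Return `length` characters beginning `start` characters into `text`."""
    return text[start:start + length]

def after_marker(text, marker, length):
    """Return the `length` characters right after the first `marker`, or None."""
    pos = text.find(marker)
    if pos == -1:
        # The document changed and the marker is gone.
        return None
    begin = pos + len(marker)
    return text[begin:begin + length]

# Made-up sample document standing in for a page on the web.
document = "Chapter I. Down the hole. Chapter II. The pool of tears."

print(by_offset(document, 11, 14))
print(after_marker(document, "Chapter II. ", 17))
```

Neither scheme needs any cooperation from the document, but both are fragile in different ways: an offset silently points at the wrong text after an edit, while a marker either still matches or visibly fails.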
There is a general principle at work here. Changes to any given system may be intrusive or non-intrusive. An intrusive change requires changes to what's already there. A non-intrusive one doesn't.
If I wanted to compile an annotated list of terms people use for colors, I could use an approach like the one above and get a list that might get out of sync if someone updates a document I'm pointing at, but which should work well enough in practice. If I tire of having to keep my hand-crafted links in sync, I could try to get some particular tag inserted at any mention of a color. That would make my job a lot easier, and the people maintaining documents in my list could edit to their hearts' content so long as the tags got maintained properly.
This is the basic tradeoff. An intrusive change requires cooperation from all involved, but can result in a more cohesive system. A non-intrusive change expects less, but possibly at the cost of cohesion. The architectural sweet spot is a resource that can provide convenient handles for anything useful it provides, without knowing or caring who's using it or what it's being used for. This is often easier said than done.
The web is a big place. The chances of convincing everyone else to change for the sake of your change diminish sharply the more people you have to convince. If it's more than just a few close friends, you generally need a really good reason. "You want me to re-do my page so you can track color terms I use in it? Um, no."
That's not to say you can't offer a good reason for making a change, while still offering something of lesser value in case the change doesn't happen. People do that all the time, and it often works. There's a big difference between offering and demanding.
It's not surprising that non-intrusion is a fundamental part of what makes the web work, and what makes the web the web. Off the top of my head:
- Once you've given something a URL, anyone can reference it in any way they want. You don't have to give out a new URL, or do anything at all, whenever someone wants to reference your resource.
- Search engines index huge masses of documents by content without those documents having to do anything special at all (if you want to, you can add "meta" tags to help this along, but you don't have to).
- More fundamentally, there's no requirement that a resource even exist for a given URL. Obviously a page with lots of broken links for no reason is not a very useful page, but you don't have to finish everything before publishing anything. And a good thing, else nothing would ever get published. Wikis are a famous example of this principle in action.
- Tagging services like del.icio.us attach metadata to pages without changing the pages themselves.
- Mashups are all about re-using resources in new and unanticipated ways.
Lexicographical note: Backward-compatible is more specific than non-intrusive, but otherwise very closely related. A backward-compatible change is generally within a given system: The new one will do everything the old one will and more. Non-intrusion is more about being able to add an entirely new piece without anyone else even having to know about it. I'll generally know that I've moved up from version 6.1 to version 6.2, backward compatibly, but I may not know that, say, my web page is being mashed up with someone else's in ways I never anticipated, non-intrusively.
Labels: del.icio.us, imperfection, metadata, tagging, Web 2.0
Thursday, April 17, 2008
Now that I've tagged it, how do I find it?
I try to assign a few relevant tags to every post, as is common practice. Blogger's tagging facility (along with lots of others, I expect) will suggest tags you've previously used. For example, if I type "bl" into the space, up comes a list containing blacksmithing, BLOBs, blogger, blogs and Eubie Blake. Hmm ... I'd forgotten I'd used "blogger" as a tag before. Curiously, J, and only J, pulls up only personal names. I've been known on occasion to run through the entire alphabet to make sure I haven't missed anything.
Conversely, if nothing pops up for a tag, you know you haven't used it. That happened for DMCA on the previous post. That's funny. I'm sure I've mentioned the DMCA before. Aha, there are a couple. All I have to do is search. A couple of quick edits and they're tagged now, too.
Now, if anyone could find those by searching, why add the tag? Indeed, I almost didn't add the tag, because those posts don't happen to be about DMCA. They mostly just mention it in passing. I opted to tag those because the passing mention was apropos of what DMCA is. On the other hand, this post contains the letters "DMCA" but has nothing to do with it.
Likewise, there are a couple of posts, like this one or maybe even this one, that don't mention DMCA specifically, but might be relevant. On review, I'm not tagging those DMCA either. They're both tagged "copyrights" now.
While reviewing that I ran across another post that should have been tagged "copyrights" but wasn't. That's a recurring problem with tagging. Which tags should I use here? Which tags did I put that article under? Did I think to put it under "copyrights"?
All of this leads me to a few general points on tagging:
- It's not particularly new. It's much the same as traditional indexing. In web form, it goes back at least to categories on wikis.
- It's time consuming and subjective. Care and feeding of categories is an important part of wiki gardening, for example.
- It's most useful exactly where more automated tools don't work. If you want to find posts here that mention DMCA, just search. "DMCA" is probably not a particularly useful tag. "Annoyances" and "Neat hacks" make better use of the tool.
- Likewise, tools like sub-categories or other schemes for tags to aggregate other tags, though useful, aren't foolproof.
- On the other hand, tags are still nice for browsing, particularly the more abstract ones.
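The point about tags earning their keep where search falls short can be sketched in a few lines of Python. The three posts below are invented stand-ins, and `search` and `tagged` are hypothetical helpers, far cruder than a real search engine or Blogger's label pages.

```python
# Invented sample posts; the bodies deliberately avoid the word "annoy".
posts = [
    {"title": "Post A", "body": "A passing mention of the DMCA.",
     "tags": ["copyrights"]},
    {"title": "Post B", "body": "Why does this dialog box keep popping up?",
     "tags": ["annoyances"]},
    {"title": "Post C", "body": "This update broke my bookmarks again.",
     "tags": ["annoyances"]},
]

def search(posts, term):
    """Titles of posts whose body contains `term`, ignoring case."""
    term = term.lower()
    return [p["title"] for p in posts if term in p["body"].lower()]

def tagged(posts, tag):
    """Titles of posts a human filed under `tag`."""
    return [p["title"] for p in posts if tag in p["tags"]]

print(search(posts, "DMCA"))        # literal terms: search does fine
print(search(posts, "annoy"))       # abstract topic: search comes up empty
print(tagged(posts, "annoyances"))  # the tag recovers the topic
```

A "DMCA" tag would add nothing that the first search doesn't already find, while the "annoyances" tag carries a judgment that no literal string in the posts does.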
Wednesday, August 29, 2007
Tag everything!
How many things can I tag?
- I can tag my email
- I can tag pictures on the web with flickr (and others)
- I can tag pictures on my local disk (and on the web) with Picasa
- I can tag web sites on del.icio.us (and others)
- I can tag and rate songs with my favorite media player
- I can rate movies and books through Amazon, Netflix, Blockbuster etc.
- I can tag files on disk, to varying extents, depending on my operating system
- I can tag entries in my calendar, depending on my calendar app
