Monday, February 5, 2024

Do what I say, not what I mean

While watching a miniseries on ancient history, I got to wondering how quickly people could move around in those days.  The scriptwriters mostly glossed over this, except when it was important to the overall picture, which seems fine, but it still seemed odd to see someone back in their capital city discussing a battle they'd taken part in a thousand kilometers away as though it had happened yesterday.

So I did a search for "How far can a horse travel in a day?".  The answer was on the order of 40 kilometers for most horses, and closer to 150 for specially-bred endurance horses.  At the higher figure, that would make it about a week to cover 1,000 km, assuming conditions were good, except that a horse, even a specially-bred one, needs to rest.

What if you could set up a relay and change horses, say, every hour?  At this point we're well off into speculation, and it's probably best to go to historical sources and see how long it actually took, or just keep in mind that it probably took a small number of weeks to cross that kind of distance and leave it at that.  But speculation is fun, so I searched for "How far can a horse travel in an hour?"
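
For what it's worth, here's the back-of-the-envelope arithmetic as a quick Python sketch.  The daily figures are the rough numbers from the search above; the relay pace and riding hours are pure assumptions on my part, since that's exactly the question I couldn't get answered.

    # Rough travel-time arithmetic for 1000 km on horseback.
    # The per-day figures are the rough numbers from the search results above;
    # the relay pace and riding hours are assumptions for the sake of speculation.

    DISTANCE_KM = 1000

    scenarios = {
        "ordinary horse (40 km/day)": 40,
        "endurance horse (150 km/day)": 150,
        # Assumed: a relay changing horses every hour might sustain ~15 km/h
        # for ~10 hours of riding a day -- and unlike a single horse, the
        # relay as a whole never needs a rest day.
        "hourly relay (assumed 15 km/h x 10 h/day)": 15 * 10,
    }

    for label, km_per_day in scenarios.items():
        print(f"{label}: about {DISTANCE_KM / km_per_day:.0f} days")

    # ordinary horse (40 km/day): about 25 days
    # endurance horse (150 km/day): about 7 days
    # hourly relay (assumed 15 km/h x 10 h/day): about 7 days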

It may not surprise you that I didn't get the answer I was looking for, at least not without digging, but I did get answers to a different question: What is the top speed of a horse in km/hr?  (Full disclosure: I actually got an answer in miles per hour, because US, but I try to write for a broader audience here.)  How fast a person or animal can sprint is not the same as how far that person or animal can go in an hour.

This seems to be the pattern now that we have LLMs involved in web search.  I don't know what the actual algorithms are (and couldn't tell you if I did), but it seems very much like:

  • Look at the query and create a model of what the user really wants, based on a Large Language Model (LLM)
  • Do text-based searches based on that model
  • Aggregate the results according to the model

It's not hard to see how an approach like this would (in some sense) infer that I'm asking "How many kilometers per hour can a horse run?", which is very similar in form to the original question, even though it's not the same question at all.  There are probably lots of examples in the training data of asking how fast something can go in some unit per hour and not very many of asking how far something can go in an hour.  My guess is that this goes on at both ends: the search is influenced by an LLM-driven estimate of what you're likely to be asking, and the results are prioritized by the same model's estimate of what kind of answers you want.
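
In other words, something shaped roughly like the sketch below.  To be clear, this is my guess at the overall shape, not anyone's actual architecture, and every name in it is a placeholder.

    # Purely speculative sketch of the shape of LLM-assisted search.
    # Every function here is a placeholder, not any real search engine's API.

    def llm_search(query, llm, index):
        # 1. Use the language model to guess what the user "really" wants.
        #    This is where "how far in an hour" can quietly collapse into
        #    the much more common "how fast, in km per hour".
        intent = llm.guess_intent(query)

        # 2. Run ordinary text searches based on that guessed intent,
        #    not on the literal query.
        results = index.text_search(intent.reworded_queries())

        # 3. Rank the results by how well they match the model's idea of
        #    what kind of answer the user wants.
        return sorted(results,
                      key=lambda r: llm.answer_likelihood(intent, r),
                      reverse=True)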

It's reasonable that questions like "How fast can a horse go?" or even "How fast is a horse?" would be treated the same as "How many km/hr can a horse run?".  That's good to the extent that it makes the system more flexible and easier to communicate with in natural language.  The problem is that the model doesn't seem good enough to realize that "How far can a horse travel in an hour?" is a distinct question and not just another way to phrase the more common question of a horse's top speed at a sprint.

I wish I could say that this was a one-off occurrence, but it doesn't seem to be.  Search-with-LLM's estimate of what you're asking for is driven by the LLM, which doesn't really understand anything, because it's an LLM.  It's just going off of what-tends-to-be-associated-with-what.  LLMs are great at recognizing overall patterns, but not so good at fine distinctions.  On the question side, "How far in an hour?" associates well with "How fast?" and on the answer side, "in an hour" associates strongly with "per hour," and there you go.

That's great if you're looking for a likely answer to a likely question, but it's actively in the way if you're asking a much-less-likely question that happens to closely resemble a likely question, which is something I seem to be doing a lot of lately.  This doesn't just apply to one company's particular search engine.  I've seen the same failure to catch subtle but important distinctions with AI-enhanced interfaces across the board.

Before all this happened, I had pretty good luck fine-tuning queries to pick up the distinctions I was trying to make.  This doesn't seem to work as well in a world where the AI will notice that your new carefully-reworded query looks a lot like your previous not-so-carefully-worded query, or maybe more accurately, it maps to something in the same neighborhood as whatever the original query mapped to, despite your careful changes.
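
To make the "same neighborhood" idea concrete, here's a toy example that scores questions purely by word overlap.  Real systems use learned embeddings, which are far more sophisticated than counting words, but the point is the same: questions that are only superficially alike can land very close together.

    # Toy illustration of the "same neighborhood" problem: score questions by
    # surface word overlap (cosine similarity of word counts).  Real systems
    # use learned embeddings rather than word counts; this just shows how
    # near-miss questions can end up scored as near-duplicates.
    import re
    from collections import Counter
    from math import sqrt

    def cosine(a, b):
        wa = Counter(re.findall(r"[a-z]+", a.lower()))
        wb = Counter(re.findall(r"[a-z]+", b.lower()))
        dot = sum(wa[w] * wb[w] for w in wa)
        return dot / (sqrt(sum(v * v for v in wa.values())) *
                      sqrt(sum(v * v for v in wb.values())))

    common   = "How many km per hour can a horse run?"
    original = "How far can a horse travel in an hour?"
    reworded = "What distance can a horse cover in an hour?"

    print(cosine(original, common))   # about 0.56 -- already "the same" question
    print(cosine(reworded, common))   # about 0.44 -- careful rewording barely helps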

Again, I'm probably wrong on the details of how things actually work, but there's no mystery about what the underlying technology is: machine learning (ML) models built on neural networks trained with backpropagation.  This variety of ML is good at finding patterns and similarities, in a particular mathematical sense, which is why there are plenty of specialized models finding useful results in areas like chemistry, medicine and astronomy by picking out patterns that humans miss.

But these models aren't even trying to form an explicit model of what any of it means, and the results I'm seeing from LLM-enhanced systems are consistent with that.  There's a deeper philosophical question of to what extent "understanding" is purely formal -- that is, whether it can be obtained by looking only at how formal objects like segments of text relate to each other -- but for my money the empirical answer is "not to any significant extent, at least not with this kind of processing".


Back in the olden days, "Do What I Mean", DWIM for short, was shorthand for a system's ability to catch minor errors like spelling mistakes and infer what you were actually trying to do.  For example, the UNIX/GNU/Linux family of command-line tools includes a command ls (list files) and a command less (show text a page at a time, with a couple of other conveniences).  If you type les, you'll get an error, because that's not a command, and nothing will ask you, or try to figure out from context, whether you meant ls or less.

A DWIM capability would help you figure that out.  In practice, this generally ended up as error messages with a "Did you mean ...?" based on whatever valid possibilities were close in spelling to what you typed.  These are still around, of course, because they're useful enough to keep, crude though they are.
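
This crude kind of DWIM is easy enough to sketch.  Here's roughly the idea in Python, using the standard library's string matcher; real shells and compilers each have their own notion of "close in spelling", but the shape is the same.

    # Minimal "Did you mean ...?" sketch: suggest known commands that are
    # close in spelling to what was typed.
    import difflib

    KNOWN_COMMANDS = ["ls", "less", "cat", "grep", "cp", "mv"]

    def did_you_mean(typed, commands=KNOWN_COMMANDS):
        # Return up to three known commands whose spelling is close enough
        # to what the user typed, best match first.
        return difflib.get_close_matches(typed, commands, n=3, cutoff=0.6)

    print(did_you_mean("les"))    # ['less', 'ls']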

There are now coding aids that will suggest corrections to compiler errors and offer to add pieces of code based on context.  In my experience, these are a mixed bag.  They work great in some contexts, but they are also good at suggesting plausible-but-wrong code, sometimes so plausible that you don't realize it's wrong until after you've tried it in a larger context, at which point you get to go back and undo it.

There's always been a tension between the literal way that computers operate and the much less literal way human brains think.  For a computer, each instruction means exactly the same thing each time it executes and each bit pattern in memory stays exactly the same until it's explicitly changed (rare random failures due to cosmic rays and such can and do happen, but that doesn't really affect the argument here).  This carries over into the way things like computer languages are defined.  A while loop always executes the code in its body as long as its condition is true, ls always means "list files" and so forth.

Human brains deal in similarities and approximations.  The current generation of ML represents a major advance in enabling computers to deal in similarities and approximations as well.  We're currently in the early stages of figuring out what that's good for.  One early result, I think, is that sometimes it's best just to talk to a computer like a computer.

Saturday, February 3, 2024

What's in a headline? Find out here

Goodness, it looks like 2023 was an all-time low for this blog, with one (1) post.  Not sure how that happened.  I honestly thought I'd posted at least one more.  On the other hand, I suppose it's consistent with the overall handwringing about whether there's even anything to post here.  But this post won't be that.

When I was in journalism class in high school, which was more than a few years ago to be sure, I was taught the "inverted pyramid": put the most important information -- the who, what, where, when, why and how -- at the top of the article, then the important details, then other background information.  The headline should concisely sum up the most important facts at the top.

Some typical headlines might be

  • Pat's Diner closing after 30 years
  • New ordinance bans parking on Thursdays
  • Midtown High senior wins journalism award

If you've noticed that the titles (that is, headlines) of posts here don't exactly follow that rule, that's because I'm writing opinion here, not news.  That's my story, and I'm sticking with it even as I go on to complain about other people's headlines.

One of the worst sins in old-school journalism was to "bury the lede", that is, to put the most important facts late in the story ("lead" as in lead paragraph is spelled "lede", probably going back to the days of lead type, where the usual spelling might invite confusion).  If Pat's Diner is closing, you don't start with a headline of "Local diner closing" and a paragraph about how much people love their local diners and only later mention that it's Pat's Diner that's closing.

Except, of course, that's exactly what happens a lot of the time.  Here are some examples from the articles currently on my phone:

  • Windows 11 looks to be getting a key Linux tool added in the future
  • Nearly 1 in 5 eligible taxpayers don't claim this 'valuable credit', IRS says
  • 46-year-old early retiree who had $X in passive income heads back to work -- here's why

I've tried to get out of the habit of clicking on articles like these, not because I think it will change the world (though if everybody did the same ...), but because I almost always find it irritating to click through on something to find out that they could have just put the important part in the headline:
  • Linux sudo command may be added to Windows 11
  • Nearly 1 in 5 eligible taxpayers don't claim earned income credit, IRS says
  • Early retiree with $X in passive income back to work after house purchase and child

One of these rewrites is noticeably shorter than the original and the other two are about the same length, but they all include important information that the originals leave out: which Linux tool?  Which tax credit?  Why go back to work?

The lack of information in the originals isn't an oversight, of course.  The information is missing so you'll click through on the article and read the accompanying ads.  The headlines aren't pure clickbait, but they do live in a sort of twilight zone between clickbait and a real headline.  If you do get to the end of the article, you'll probably see several more links' worth of pure clickbait, which is an art form in itself.

Real headlines aren't dead, though.  Actual news outlets that use a subscription model tend to have traditional headlines above traditional inverted-pyramid articles.  They probably do this for the same reason that newspapers did: Subscribers appreciate being able to skim the headline and maybe the lede and then read the rest of the article if they're interested, and that sells subscriptions.

I'm pretty sure half-clickbait headlines aren't even new.  The newspaper "feature story" has been around considerably longer than the web.  Its whole purpose is to draw the reader in for longer and tempt them to browse around -- and either subscribe for the features or spend more time on the same page as ads, or both.  For that matter, I'm pretty sure a brief survey of tabloid publications in the last couple of centuries would confirm that lede-burying clickbait isn't exactly new.

I started out writing this with the idea that the ad-driven model of most web-based media has driven out old-fashioned informative journalism, and also those kids need to get off my lawn, but I think I'm now back to my not-so-disruptive technology take: Clickbait and semi-clickbait aren't new, and the inverted pyramid with an informative headline isn't dead.  In fact, when I checked, most of the articles in my feed did have informative headlines.

In part, that's probably because I've stopped clicking on semi-clickbait so much, which is probably changing the mix in my feed.  But it's probably also because the web hasn't changed things as much as we might like to think.  All three kinds of headline/article (informative, semi-clickbait, pure clickbait) are older than the web, and so are both the subscription and ad-based business models (though subscription print publications often had ads as well).  It's not too surprising that all of these would carry through.