Showing posts with label division of labor. Show all posts

Wednesday, August 23, 2017

Tags and finding things

Putting together these recent posts, and posts on the other web, I notice I'm much more casual about tagging.  I can't bring myself to stop altogether.  A post without tags seems somehow incomplete.  But every time I add a tag I find myself asking "Why am I doing this?"

For years and years it's been possible to add "site:fieldnotesontheweb.com" to a search and find whatever you want on this blog (or likewise any other), whether I've tagged it or not.  The difference, if any, is more a matter of curation.

Donald Knuth, in putting together The Art of Computer Programming, made a great effort to build a complete index, partly out of frustration with the textbooks he'd had to read as an undergrad.  To him, this wasn't just a matter of searching for all occurrences of a given term (which was possible since the text of TAOCP was in digital form), or dumping out a concordance of terms by page.  Context mattered.  The index entry for C. A. R. Hoare might include pages mentioning quicksort, even if Hoare's name doesn't appear on those pages, for example.
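The distinction is easy to see in code.  A mechanical concordance can only list the pages where a term literally appears; a curated entry can point elsewhere.  Here's a toy sketch in Python (the page texts and the `concordance` helper are made up for illustration):

```python
def concordance(pages):
    """Map each word to the set of (1-based) pages it literally appears on."""
    index = {}
    for num, text in enumerate(pages, start=1):
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(num)
    return index

pages = ["Hoare devised quicksort", "quicksort partitions the array in place"]
idx = concordance(pages)

# The mechanical index finds Hoare only where the name appears:
assert idx["hoare"] == {1}

# A human indexer can also point to page 2, which is about Hoare's algorithm
# even though his name never appears there:
curated_hoare = idx["hoare"] | {2}
```

The first part is what a search engine gives you for free; the second part is the judgment call only the indexer can make.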

I think tags on a blog fill a similar purpose.  If you click on the link for a tag, you'd expect to see posts on that particular topic, regardless of the exact words.  The link for annoyances on this blog includes several annoying things, whether or not I happened to include the word annoy or its forms in the posts.  Machines are getting better at this sort of inference, but they're not great yet.

I think that's a good theory, anyway, and I think human curation is still useful.  On the other hand, I don't really have time to post on this blog, much less read through it and fix up tags.  I've done some re-reading, but I've only really been through a couple hundred posts, and then only fixing typos and adding the occasional note or update.  So what you get here is hit or miss.  Not so much a careful taxonomy as a record of whatever I happened to be thinking at the time.

If I had time, I would probably trim the set of tags down significantly, particularly getting rid of tags that are completely redundant with search results, and probably consolidating a few similar tags down to one canonical choice.  But not today, and not any time soon.  If the tags as they stand make for more interesting browsing, great.

(By the way, I'm not particularly proud that annoyances is currently the most populated tag on this blog.)

Tuesday, October 12, 2010

"The computer knows"

The other day someone asked me whether it was supposed to be cold out that week.  I didn't know offhand.  "That's OK," they said, "I'll check the computer.  The computer knows."

It occurred to me that if someone were trying to convince a skeptical public back in the 80s that this whole "personal computer" thing was really going places, and that person were allowed just one ten-second glimpse into the faraway world of 2010 to show the audience, they would probably give their eyeteeth for that particular glimpse.  Ditto for a budding AI researcher.

Except ... the viewer from thirty years ago would naturally take "the computer knows" at face value.  Computers in the 21st century would be so fast and so smart that the personal computer in the kitchen could predict the weather.

Today, by contrast, we don't generally assume that computers "know" much of anything, but we do assume that they can easily direct us to someone who does, in this case the people at a weather service.  Granted, said forecasters are making use of computers that, as far as computing power, could swallow an 80s-era supercomputer whole without a hiccup.  Nonetheless, we don't assume that our own computers could do any such thing, or even that a supercomputer is so omnipotent as to make weather forecasters redundant.

That's the difference between having a PC and being on the web.  The primary function of most computing devices -- personal computers, phones, netbooks, routers, etc. -- is communication.  That's not to say that computers aren't essential in producing and cataloging data, but data is only useful if you can get to it.

Thursday, September 2, 2010

Online customer service, only without the service

I don't generally like to criticize customer service reps.  It's a thankless job.  However, this particular one might have been a little more careful with those boilerplate macro keys.  It would be helpful, also, if SomeCompany's system would allow a password reset* given:
  • Account number
  • Username, being the service provider's home-grown email address for the customer
  • Customer's personal email address
  • Customer's full name and home address
  • Last four digits of customer's SSN
  • Customer's home phone number
  • IP address associated with the account (from which the system was already able to find the username)

It's sort of a division of labor anti-pattern: a human and a computer working together end up more obtuse than either alone.  Pointing the customer back to the service they can't log into and the chat support that didn't help is a nice parting touch.

What follows is an anonymized and lightly edited transcript of an actual customer chat sent by one of my "army of stringers, researchers, fact-checkers and miscellaneous hangers-on."


Problem: Trying to sign in; need password

Hello Customer, Thank you for contacting SomeCompany Live Chat Support. My name is Service Rep. Please give me one moment to review your information.  I'm ready to assist you today. How are you doing by the way?

Fine, thanks.

Nice to know that you are doing good.

I was trying to log in to your service


As what I have understood, you would like to have your password for you to sign in right?


Yes.  I thought I'd already set up an account and your website found a user name from looking at my IP address, but I can't reset the password.  Also, I'd rather choose my own user name rather than use the assigned one (wemadethisup@somecompany.com), if possible.

Oh, I see.
  I understand that it is very important for you to know the password of your here.
  I also know that you would like to have your email address personalized and change it.
  There is no need to worry since as your service representative today, I want you to know I am more than willing to help you today with your issue. I can assure you that we can have a positive resolution since we will be working on this together.

Here’s what I can do, Since your password is not allowing you to log in, and since we do not store our customers’ passwords, I can give you a randomly system generated password would that be okay?

That would be fine, thanks


Alright. Please allow me to pull up your account information so that we can resolve it in the most efficient way possible. I will be verifying security information to protect your account privacy. May I please have the account number, account holder's full name, home address, and the last 4 digits of your SSN?

<Customer gives the information>

Thank you.
  May you also verify your phone number and the Email address that you are using?

<Customer gives phone number and personal email address>

Thank you.
  I am referring to the SomeCompany email address that you would like to reset the password.

I don't use SomeCompany for email, so that address is only useful to me as the login ID.  I'm not set up to check that account and I would rather not have to be.  But I think you mean wemadethisup@somecompany.com.

Thank you.  While waiting, I will share with you a feature of SomeCompany that you can truly benefit. Are you aware of the customer self-help on SomeCompany.com? SomeCompany.com has an extensive series of Frequently Asked Questions (FAQs) that cover all of our products. Customers do not have to sign in to access the FAQs. Quick steps to do it...Open a web browser window and go to http://www.somecompany.com/. On the home page, the navigation menus are on the left side of the window and click on Customers then Help and Support.


I have already pulled up your account.

I don't think the FAQ will help.  Please just reset the password.


Okay. 
  Now, for the password, since we do not keep it for security reasons, I can reset it and provide you with a randomly generated one. Do not worry about changing it because you would be able to change it to your preferred password once you are able to log in. Would that be okay with you?

Yes, please.

Sure, now for me to push through the process and reset your password, may you please give me your security pin?

I don't remember setting a security PIN.

A security pin is like a password to your account. This will be sent via a postal mail to you, a few weeks after your service is started. It is a 4 digit number.   May you please try to check your postal mails?

OK.  I might have the mail somewhere.  I have no idea where.
 [time passes] Sorry, I can't seem to find anything.

Since you have not provided the Security pin, in order to push through with this process, I would have to call you right now on your phone number to authenticate. Would that be okay with you?

Unfortunately, no.  My kids are sleeping.
  Perhaps I should try again during the day?

Yes, you may always contact us. 
 We are available 24/7, Customer.

So there's nothing else you can do?

Customer I really know how important it is to have your password. I would like to apologize however, we need to call you to authenticate so that we can reset your password.

OK. I'll try again during daytime hours.

Thank you so much for your time, Customer.

[time passes]


Customer, here’s what we have done on this chat today, I have assisted you with your SomeCompany inquiry on resetting your password however we need to call you to authenticate.
 Customer, it has been my pleasure serving you today and I truly appreciate your understanding and cooperation. Do you have other concerns for me today? I will be glad to assist you further.

No, that will be all, thank you.

We strive to exceed your expectations and hope that you will take a moment to complete the 3 question survey that will follow our interaction, your feedback will help us to continue improving how we serve you. 
 Do you want to use our service? Go to http://www.somecompany.com. Thank you for choosing SomeCompany as your service provider and have a great day! SomeCompany appreciates your business and values you as a customer. Our goal is to provide you with excellent service. If you need further assistance, you can chat with one of our Customer Support Specialists 24 hour a day, 7 days a week at http://www.SomeCompanySupport.com



* Actually, SomeCompany is probably right to want better authentication.  It's quite possible that someone, say, found their neighbor's bill, with the account number, and leeched onto their non-secured WiFi or used other chicanery so as to connect from the right IP address and thence obtain the user name.  It's conceivable that such a person also somehow happened to know the customer's personal email address and last four digits of the SSN.

Calling the phone number of record (which the customer was challenged to give and the service rep is able to verify) would raise the bar significantly.  Likewise, assuming the snail mail with the PIN didn't also have the account number, the would-be thief would have had to steal two separate pieces of mail, typically delivered on different days.

The annoyance here is that the stronger authentication is strong on its own.  That is, "Tell me the PIN we mailed you" is about as secure as "Tell me the PIN we mailed you and several pieces of not-too-hard-to-find information", and "So you want a password reset?  Let me call you at the phone number listed on the account" is about as secure as "Tell me several pieces of not-too-hard-to-find information and I'll call you on the phone number listed on the account."  Unfortunately, service reps are generally required to go through the whole account verification cha-cha-cha before doing anything meaningful.

One wonders, though, why this bundle of not-too-hard-to-find information is good enough to let the customer access the account information, but not good enough to let the customer use the service itself.

Sunday, August 15, 2010

Wikipedia 1.0: journey vs. destination

While browsing through the Wikipedia policy pages (it was either that or just tattoo "Geek" on my forehead and be done with it) I ran across something I remembered running across a while ago, more or less shrugging at and moving on, namely an offline edition of Wikipedia. There seem to be two approaches:
  • The "German model": Distribute a snapshot of Wikipedia on CD. Why, I'm not sure. Perhaps to reach that select audience of people who have heard of Wikipedia but don't have an internet connection to access it*?
  • The "Wikipedia 1.0" model: Select the best, most polished articles and publish them, whether on paper, CD/DVD, read-only web site, or whatever.
The Wikipedia 1.0 project was proposed in 2003. At this writing, several versions have been released and 0.8 will be out Real Soon Now. That's not to say that 1.0 will be two versions from that. The beauty of the x.y version numbering scheme is that you don't have to go from 0.9 to 1.0. You can release 0.91, 0.95 ..., you can release 0.10, 0.11 ..., you can release 0.9a, 0.9b ... [But it looks like we'll go into 2016 still on version 0.8 ... my guess is that 1.0 isn't going to happen -- D.H. Dec 2015]
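Component-wise comparison is what makes the scheme work: version numbers aren't decimal fractions, so there's always room for another release before 1.0.  A quick Python sketch (the `parse` helper is my own, using one common convention):

```python
def parse(v):
    """Compare versions component-wise, not as decimal fractions."""
    return tuple(int(part) for part in v.split("."))

# Under component-wise ordering, 0.10 comes *after* 0.9, so the
# sequence before 1.0 never has to run out:
assert parse("0.9") < parse("0.10") < parse("0.11") < parse("1.0")
# The other scheme mentioned in the text works too:
assert parse("0.9") < parse("0.91") < parse("0.95")
```

Which convention a given project uses is, of course, up to the project.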

For my money, it's not particularly important whether 1.0 ever comes out. Plenty of good has come out of attempting the exercise at all, in particular as a spur toward improving the quality of core articles and encouraging the development of Wikipedia's quality and importance ratings. These exhibit a nice division of labor: People rate articles and computers aggregate the best-rated ones.

The main reason not to just leave it at that and integrate the ratings more directly into the UI, is that vandalism still has to be filtered by hand and, despite the lack of imagination exhibited by most vandals, always will be. But most likely even that could be handled without an explicit release mechanism, by means of "flagged revisions," which allow editors to flag particular revisions as being free of vandalism and otherwise up to snuff. Apparently the mechanism has been in place for a while but the community is still figuring out how best to use it.

What's the proverbial "simplest thing that could possibly work" here? Perhaps just allowing anyone -- or anyone with an account -- to tag a revision however they like, and allow readers to filter what revisions they see. E.g., only show me revisions that the quality rating committee has rated "good" or better and my friend Jimbo has rated "funny". The proposal for "sighted revisions" looks pretty close to this, though less flexible.
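A minimal sketch of that filtering idea (the data shapes here are my own invention, not Wikipedia's actual flagged-revisions model): each revision carries a set of (tagger, tag) pairs, and a reader's filter keeps the revisions that have every pair asked for.

```python
def visible(revisions, wanted):
    """Keep revisions carrying every (tagger, tag) pair the reader asks for."""
    return [rev for rev in revisions if wanted <= rev["tags"]]

revisions = [
    {"id": 1, "tags": {("committee", "good")}},
    {"id": 2, "tags": {("committee", "good"), ("jimbo", "funny")}},
]
wanted = {("committee", "good"), ("jimbo", "funny")}
assert [rev["id"] for rev in visible(revisions, wanted)] == [2]
```

The people supply the tags; the machine does the trivial set arithmetic.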


* That's a bit glib, as there are communities with access to computers but with limited or no bandwidth, but given it was the German edition at 3 Euros per CD, I doubt this was the intended audience. Nonetheless, 40,000 people opted to buy it.

Wednesday, May 26, 2010

It's a whole Zooniverse now

Almost two years ago now (has it really been that long?), I learned about the Galaxy Zoo. At the time it looked like an interesting approach to try -- invite the general public to classify galaxies, a task which is
  • useful (to astronomers, at least)
  • reasonably easy for humans
  • not at all easily or well handled by computers
  • able to be split into millions of independent pieces
Given those characteristics, the original project had at least a chance of succeeding, and indeed it has succeeded handsomely. Last April, it announced the 60 millionth classification, and as a result of all this work it now has "an incredibly robust, well-defined and scientifically valid catalogue of Sloan Digital Sky Survey galaxies." It has also produced some significant results, with more in the pipeline.
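The "millions of independent pieces" criterion is what makes the computer's half of the job trivial: collect many volunteers' answers per image and aggregate.  A toy majority-vote sketch (the labels and data layout are invented for illustration, not Galaxy Zoo's actual pipeline):

```python
from collections import Counter

def aggregate(classifications):
    """classifications: image -> list of volunteer answers.
    Returns image -> (majority label, fraction of volunteers agreeing)."""
    result = {}
    for image, answers in classifications.items():
        label, count = Counter(answers).most_common(1)[0]
        result[image] = (label, count / len(answers))
    return result

votes = {
    "img1": ["spiral", "spiral", "elliptical"],
    "img2": ["elliptical", "elliptical", "elliptical"],
}
agg = aggregate(votes)
assert agg["img2"] == ("elliptical", 1.0)
assert agg["img1"][0] == "spiral"
```

The agreement fraction is a bonus: low agreement flags exactly the interesting borderline objects.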

Along with all that, it's given hundreds of thousands of people (myself included) a chance to participate in real scientific research and see the same images working astronomers see. If that's not a clear win I don't know what is, and it simply couldn't have happened without the web.

As it became clear how well things were working, a second project was launched, aimed at spotting supernovas, again with good results. That project has since been joined by others and the whole crowd has overflowed the original galaxyzoo.org domain into the Zooniverse.

The latest addition, Moon Zoo, is aimed at classifying craters and other features in the heaps and heaps of data from the Lunar Reconnaissance Orbiter. The problem looks to fit all four criteria above and so have every bit as good a chance of success as its predecessors.

For a bonus, some lucky classifiers will run across human artifacts, from orbiters to footprints of astronauts. Take the resolution needed for that times the surface area of the moon and you've got some idea how ridiculously much data is involved.

Not every bright idea on the web makes a significant impact on the outside world, but some do. If you accept that basic research is significant to the world at large, then the Zooniverse has to rank as a major success story.

Wednesday, March 24, 2010

New tag: Division of labor

I wouldn't normally devote a post to announcing a new tag; they come up all the time in the natural course of things. However, I've recently found another common thread running through several posts: The division of labor between humans and computers.

One of the lessons of early AI work was that there's not a lot of overlap between what humans naturally do well and what computers naturally do well. I say "naturally" because much of the work in the ensuing decades has been in enabling machines to do things that don't map naturally to their capabilities.

For example, it's not hard to program a computer to calculate a million decimal digits of pi. It takes some cleverness to produce, say, a billion digits reasonably quickly, but the basic problem is not that hard. On the other hand, it's quite hard to get a machine to recognize faces or walk without tripping over, things which are easy for us.
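That first claim is easy to back up.  Here's a rough sketch using Machin's formula and Python's arbitrary-precision integers (my own toy code, not a tuned implementation; it handles a few thousand digits in a blink, though the cleverness mentioned above is exactly what you'd need for a billion):

```python
def arctan_inv(x, one):
    """Return arctan(1/x), scaled by the big integer `one`."""
    power = one // x          # first term of the series, 1/x
    total = power
    x2 = x * x
    n = 1
    while power:
        power //= x2          # next odd power of 1/x
        n += 2
        # series: 1/x - 1/(3x^3) + 1/(5x^5) - ...
        total += -(power // n) if n % 4 == 3 else power // n
    return total

def pi_digits(digits):
    """Digits of pi via Machin: pi = 16*arctan(1/5) - 4*arctan(1/239)."""
    one = 10 ** (digits + 10)                              # 10 guard digits
    p = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)
    return str(p // 10 ** 10)

first_21 = pi_digits(20)   # -> '314159265358979323846'
```

Note what's doing the work: big-integer arithmetic the machine is effortlessly good at, applied to a formula humans worked out centuries ago.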

One crucial aspect of engineering is making the best use of the resources you have. If your resource is a computer, try to put it on problems that involve crunching large amounts of data, not, say, perception, judgment, natural language processing or recognizing objects in the natural world. Machines can deal with those problems, too, to various degrees, but not nearly as easily as we can.

I've called this theme dumb is smarter. Division of labor is complementary. It has to do with putting humans in the loop so that the machines only have to do what they're good at.

Another division of labor

Knowledge Generation Bureau (yep, KGB) has been around for a while now, but their latest TV campaign has been visible enough that even I caught it. When you've got the Baldwin brothers working for you, all things are possible.

There are a few key differences between KGB and the various web-based search engines:
  1. KGB is phone-friendly. You text them a question -- any question. They send an answer in concise English, not a list of links.
  2. KGB is only semi-automated. Typically, there will be an actual person involved at some point in producing your answer.
  3. KGB charges $0.99 per question.
Item 1 clearly distinguishes KGB from search engines, regardless of how the answers are found. A list of links just isn't a great fit for a phone, even a recent-vintage smart phone. The promise of answering any question also separates them from a text-based interface to a search engine. Item 3 separates KGB from the various ad-supported answers pages (and from vanilla search engines). Given that, and the longevity of all three flavors so far, I doubt that any is going to supplant the others. Sometimes a quick list of links is the right thing -- particularly if you don't know exactly what you're looking for -- sometimes text Q&A is good enough to pay for, and sometimes you're just curious and might rather wait for a horde of strangers to ponder your question.

What originally prompted me to post, however, was item 2. Clearly, KGB must have a well-developed database to help answer routine questions quickly [or ... perhaps they use one of the major search engines? -- D.H. Dec 2015], and I'd assume their business model depends heavily on being able to answer most questions with little or no human involvement. But to be able to deliver coherent answers to arbitrary plain English questions quickly and accurately, there has to be a human in the loop at some point. The trick is to tilt the balance as heavily as possible toward the machine. Another effort along those lines is Pandora, which relies on humans for perception and (some) judgment, but leaves the heavy lifting to the machines.

I'm not really inclined to shell out 13 bucks to try out the baker's dozen on KGB, but the results might be interesting.

[As of May 2015, KGB seems to still be going strong.  Their home page seems to give answers for free, but with ads, but their phone service still charges --D.H.]

Wednesday, February 10, 2010

Pandora's division of labor

A while ago Roku added Pandora to its selection of channels and a shorter while ago I got around to trying it out. I like it, though I don't listen to it all day long (I generally don't listen to anything all day long).

Pandora's main feature is its ability to find music "like" a particular song or artist you select. This is nice not only because it will turn up the familiar music you had in mind, but it will most likely also turn up unfamiliar music that you'll like. As I understand it, that's a major part of its business model. Record labels use Pandora to expose music that people otherwise wouldn't have heard, and Pandora takes a cut.

To that end, it will only allow you to skip so many songs in a given time (though there is at least one way to sneak around this). They pick out likely songs for you and they would like you to listen. You can, however, tell Pandora that you like or dislike a particular selection. Pandora will adapt its choices accordingly.

So how does it work? Pandora is based on the Music Genome Project, which is a nicely balanced blend of
  • Human beings listening to music and characterizing each piece on a few hundred scales of 1 to 10 (more precisely, 1 to 5 in increments of 0.5).
  • Computers blithely crunching through these numbers to find pieces close to what you like but not close to things you don't like.
This approach is very much in the spirit of "dumb is smarter". Rather than try to write a computer program that will analyze music and use some finely-tuned algorithm to decide what sounds like what, have the software use one of the simplest approaches that could possibly work and leave it to humans to figure out what things sound like.
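That "blithe crunching" can be as simple as nearest-neighbor search in the attribute space.  A rough sketch (three made-up attributes instead of a few hundred, and an invented catalog; `math.dist` needs Python 3.8+):

```python
import math

def most_similar(seed, catalog):
    """Rank songs by Euclidean distance from the seed's attribute vector."""
    return sorted(catalog, key=lambda name: math.dist(catalog[name], seed))

catalog = {                      # human ratings on made-up 1-to-5 scales:
    "song_a": [4.5, 1.0, 3.0],   # guitar pickin', aggressive drums, dissonance
    "song_b": [4.0, 1.5, 3.0],
    "song_c": [1.0, 5.0, 1.0],
}
ranked = most_similar([4.5, 1.0, 3.0], catalog)
assert ranked[0] == "song_a" and ranked[-1] == "song_c"
```

The humans never touch the distance function, and the machine never has to know what "dub influences" sound like.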

Even the human angle has been set up to favor perception over judgement. The human judge is not asked to decide whether a given song is electroclash or minimalist techno, but rather to rate to what degree it features attributes like "acoustic guitar pickin'", "aggressive drumming", a "driving shuffle beat", "dub influences", "use of dissonant harmonies", "use of sitar" and so forth. There are refinements, of course, such as using different lists of attributes within broad categories such as rock and pop, jazz or classical, but the attributes themselves are designed to be as objective as possible.

This combination of human input and a very un-human data crunching algorithm is a powerful pattern. Search engines are one example, Music Genome is another, and if there are two there are surely more. In fact, here's another: the "People who bought this also bought ... " feature on retail sites.
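The retail version is just co-occurrence counting over orders.  A sketch with made-up data:

```python
from collections import Counter
from itertools import combinations

def also_bought(orders):
    """orders: iterable of sets of items bought together.
    Returns item -> Counter of items co-purchased with it."""
    co = {}
    for order in orders:
        for a, b in combinations(sorted(order), 2):
            co.setdefault(a, Counter())[b] += 1
            co.setdefault(b, Counter())[a] += 1
    return co

orders = [{"book", "lamp"}, {"book", "lamp", "pen"}, {"book", "pen"}]
co = also_bought(orders)
assert co["book"]["lamp"] == 2 and co["book"]["pen"] == 2
assert co["lamp"]["pen"] == 1
```

Again the same shape: humans supply the signal (what they actually buy), and the machine does nothing smarter than counting.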

Friday, June 12, 2009

Baker's dozen: Crowdsourcing

As we've seen, getting a computer to understand a simple English question is not necessarily easy.  People, on the other hand, are reasonably good at the task.  So instead of trying to get a computer to answer a question, why not use the computer purely as a means of communication in order to connect a question with someone's direct answer?  Two efforts along those lines come to mind.

The creation of Wikipedia founder Jimmy Wales, Wikia Search officially folded its tent last month. Naturally, Wikipedia has an article on the topic, not all of which has quite made it into past tense. The Wikia search site now redirects to Wikianswers, not to be confused with WikiAnswers.com, which I'll get to.

The first question of the baker's dozen to get an answer other than "This question has not been answered." is number 6: Who starred in 2001? This gets a "Magic answer", presented in a curtained frame with black background and a magician's top hat in one corner. The answer is attributed to Yahoo! answers and begins "It is an excellent movie. I give it four stars out of 5." The title of the movie is nowhere mentioned, but it appears to have starred Nicole Kidman and have been set during "gee umm WWI or WWII". A couple of minutes on IMDB identifies the film as The Others. Curiously, the more specific question Who starred in 2001: a Space Odyssey? gets no answer.

I also got a magic answer from Yahoo! on Who invented the hammock? and this time it's relevant: the hammock "originated in Central America more than 1,000 years ago." There seem to be two schools of thought on this one: Central America and Amazon basin. I say it was Colonel Mustard in the library with a lead pipe.

WikiAnswers.com is much the same beast as Wikianswers but commercial and -- according to Wikipedia -- more heavily trafficked. The results are not particularly different from those of Wikianswers, but it does answer How far is it from Bangor to New York?

Going a bit further afield, what about using Twitter as a search engine? If you've got a question, send it out as a tweet and see what comes back. There has apparently been some buzz about this concept, and indeed it's one of the options Wikianswers (the first one, not WikiAnswers.com) gives if it can't answer a question. Farhad Manjoo offers a contrasting viewpoint on Slate.com. The gist, if I understand aright, is that in order to sort through the responses, you need a real search engine, so why not just hook Twitter up with an existing search engine and be done with it?

All in all, crowdsourcing doesn't seem to deliver great results here. Why would that be?

Crowdsourcing, at least the free and open Wiki-style variety, depends on each person being able to get more out than they put in. This is possible because information is not consumed, only used -- if you learn something from a source, that doesn't prevent someone else from learning something from it later. It's also possible because sharing knowledge can be its own reward, but I suspect that's a smaller factor.

The classic case is Wikipedia. If 10,000 people read an article, and only 1/10th edit it, and only 1/10th of those edit it in a substantially useful way, you've still got a hundred people working on the article. Naturally I'm making up those numbers, but real experience suggests something of the kind is at work.

Single, discrete answers are not the same as in-depth articles. For example, suppose there are 10,000 places of interest. There are then 100,000,000 questions of the form "How far is it from X to Y?" You can get rid of the 10,000 cases where X and Y are the same and half of the rest because it's just as far from X to Y as from Y to X, but that still leaves about 50,000,000 possible questions.
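The arithmetic checks out:

```python
n = 10_000
pairs = n * n       # all ordered (X, Y) combinations
pairs -= n          # drop the cases where X and Y are the same
pairs //= 2         # X-to-Y is the same distance as Y-to-X
assert pairs == 49_995_000   # the "about 50,000,000" above
```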

The odds of any particular question coming up more than once will depend on the prominence of the places. It's quite possible that many people will be interested in how far it is from LA to New York, but if I'm doing a tour from Schenectady to Poughkeepsie to Paducah to Tehachapi to Tonapah, I'm probably not going to find that someone else has already asked and had answered those particular combinations.

If I keep striking out asking questions, why should I go to any trouble to pass along the answers I finally do dig up elsewhere? The canonical answer is for the good of the wiki as a whole, and more selfishly to improve the odds I find my answer next time on the assumption everyone is doing likewise. But if I can generally find the answer without the wiki, why do I care whether the wiki can also answer it? Wikipedia wins because it gathers information that's not readily found in one place elsewhere.

On the other hand, a map database, once it's learned the 10,000 places and the routes between them, will gladly answer any and all distance queries with equal ease.

Not every potential question for a crowdsourced engine has the odds stacked so strongly against it. Probably lots of people want to know celebrity du jour's birthday. Unfortunately, that's just the kind of information that's fairly easy to track down with existing tools.

The True Knowledge experience showed another potential problem. Making information easy to find means indexing it, and indexing is a different beast from asking questions. Wikipedia, for example, provides two basic means of structuring information, as distinct from just typing it in: categorizing (tagging) it and organizing the body text into articles, sections, subsections etc. The results are not perfect, but they're very helpful and probably about as much as we can expect from the crowd. Trying to have the crowd too intimately involved in the mechanics of a search mechanism itself is probably not a good fit.

On the other hand, crowd-generated content is great. A large portion, though not 100%, of the web is crowd-generated. As a result, just searching Wikipedia often works well. I prefer it when the result I'm after is something like an encyclopedia article. Along with its take, Wikipedia will provide links to sources and if that's not enough I can still Google. I'll use Wikipedia's native index if I know the particular topic (or can get close). Otherwise I use Google and happily read any relevant Wikipedia articles that show up.

This seems a good division of labor. People write the content and machines search and collate.

Saturday, June 28, 2008

Hunting the elusive voorwerp

Since it's been slashdotted and has appeared on the major news feeds, there's a good chance you've heard of Hanny's Voorwerp by now. It's a we-haven't-seen-anything-quite-like-this found by a Dutch schoolteacher named Hanny as part of the Galaxy Zoo project (I've seen voorwerp variously translated as "object" or "thing" -- the etymology suggests there might be a shade of meaning we're missing, so better not to translate).

The Galaxy Zoo is an interesting bit of crowdsourcing. Unlike SETI@home, GIMPS or similar projects it relies on human processing power rather than the idle cycles of millions of PCs. In the typical distributed-computing project, the algorithms are well-understood but require massive amounts of computing power. In this case, no one knows a good algorithm for classifying galaxies, but people can do it reasonably well, even with no training in astronomy.

Galaxy Zoo literally gives the rest of us a look at what kinds of things astronomers spend their time looking at, plus the chance of turning up something truly new. It also relieves the astronomers of the effort of taking a first look at millions and millions of images and turns up oddities, like the voorwerp, that would otherwise have gone unnoticed. Going by the entries on the Galaxy Zoo blog and the acknowledgments in the submitted papers, they're very grateful for the assist.

Apart from the human visual system's ability to recognize shapes, Galaxy Zoo takes advantage of another facet of human perception, namely its imprecision. While most people could easily distinguish a well-defined spiral galaxy from an elliptical one, many actual galaxies are harder to characterize.

In such cases a particular algorithm distributed to everyone's PC would always give the same answer, whether it's right or wrong. With some effort, you could distribute several different algorithms [or several versions of the same one, or one with some random "fuzz"] and look for discrepancies, but you'd have to write several different algorithms, or at least discover tuning parameters that made a significant difference. This is not impossible, by any means, but you get it for free by having several people look at the same image (or, quite possibly, just from having the same person look at the image every so often).

Objects that produce varying answers are likely to be the interesting borderline cases, whether because there's more going on with them, or simply because we humans have more trouble figuring it out.

[Hanny's Voorwerp has since been identified as a "quasar ionization echo", but there's still plenty of research to be done.  Similar objects (voorwerpjes, or "little voorwerps") have been found elsewhere.  As usual, Wikipedia has a summary -- D.H. Dec 2018]