Wednesday, December 30, 2015

Prediction update

In mid-2009 I said:
Personally, I still don't see I Robot coming to life any time soon, but I do see things that got written off as impossible during the dead-and-debunked phase starting to stir to life again. I'm thinking, say, competent machine translation or robots that can pick things up and carry them around a house, happening gradually in the next decade or so.
So we're about 70% of the way through that decade now.  I retried the round-trip translations from this post with a popular translation tool and got considerably better results, maybe even what you might call "competent".  I don't see any signs of thing-lugging robots appearing in the stores as mass-market consumer products in the next couple of years, but I wouldn't be surprised if someone somewhere is selling such a thing before long -- or has already.

That's the nice thing about vague predictions -- it all depends on how you count.

Wikipedia considered harmful ... or not

In an old post on Wikipedia, I said that

[I]t's easy to spot a backwater article that hasn't seen a lot of editing. This is not necessarily a bad thing. Obscure math articles, for example, tend to read like someone's first draft of a textbook, full of "Let x ..." and "it then clearly follows that ..." The prose may be a bit chewy, but whoever wrote it almost certainly cared enough to get the details right.

My feeling was that if you were really interested in, say, the functoriality of singular homology groups, you'd probably have enough context to chew through prose like "This generality implies that singular homology theory can be recast in the language of category theory."*

In a recent ars technica article, John Timmer argues that impenetrable technical articles are actively harmful: "The problematic entries reinforce the popular impression that science is impossible to understand and isn't for most people—they make science seem elitist. And that's an impression that we as a society really can't afford."

I think that's a good point, but I'm not sure how bad the problem really is in practice.  A hard-to-read article is most likely to be harmful if a lot of people are seeing it, which also makes it more likely that someone will be able to improve it.  This is a fundamental assumption of Wikipedia in general, I think.  As such, it would be interesting to see some data behind it -- is there a strong correlation between the number of times a page is landed on and the number of edits (or edits per word, or such)?
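For what it's worth, here is a rough sketch of how one might check that, assuming you had already collected page views, edit counts and word counts for a sample of articles.  Everything here is a placeholder: the page names and numbers are made up, and the function names are my own.  It uses a rank (Spearman) correlation, since page views tend to be wildly skewed.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return cov / (spread_x * spread_y)


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up sample: (page, views, edits, words).  Not real Wikipedia data.
sample = [
    ("page A", 90000, 4200, 9000),
    ("page B", 12000, 800, 5000),
    ("page C", 3000, 90, 2500),
    ("page D", 400, 15, 1800),
]

views = [row[1] for row in sample]
edits_per_word = [row[2] / row[3] for row in sample]
print("rank correlation, views vs. edits per word:", spearman(views, edits_per_word))
```

A value near 1 would support the assumption, but only for whatever sample you fed in; picking that sample fairly is the hard part.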

Assuming that correlation holds, then someone coming to Wikipedia to learn about, say, physics should have a good chance at a gentle introduction.  Let's try:

  • The main article on Physics seems like a perfectly good Wikipedia page.  It starts with a general introduction, goes into history, core theories, relation to other fields and so forth.  Let's look at one of those fields:
  • Condensed matter physics still seems to be in good shape.  The first sentence doesn't seem completely useful at first: "Condensed matter physics is a branch of physics that deals with the physical properties of condensed phases of matter," but the next paragraph goes on to explain nicely what a condensed phase of matter is.  The rest of the article continues in a well-structured way to give an outline of the field.  Let's look at one of the theoretical aspects:
  • Symmetry breaking "needs attention from an expert in Physics".  I'd agree with that assessment.  The general idea is still there, but we're definitely getting technical: "In physics, symmetry breaking is a phenomenon in which (infinitesimally) small fluctuations acting on a system crossing a critical point decide the system's fate, by determining which branch of a bifurcation is taken."  For example, what's a bifurcation?  Well, we can at least chase the link and find out:
  • Bifurcation theory is actually a better-structured article, but no less technical.  It's actually a math article, not a physics article.
I would say that either of the last two would be intimidating to a non-physicist/mathematician.  I don't know if you could say the same about the first two.  Yes, there are still technical terms and concepts, but it's pretty hard to get away from that and still cover the material.  I would also say that a non-physicist interested in physics in general would be far more likely to land on the Physics article than the other three.

I also noticed that, while I have run across a few really impenetrable technical articles in Wikipedia, it didn't seem -- in this particular random walk, at least -- that the quality of the articles dropped off steadily as one went off the beaten path.  Fields intersect, and perhaps you're never too far from someone's beaten path.  I did chase a link from Bifurcation Theory to Stationary point, which was marked "The verifiability of all or part of this article is disputed", not something one expects in math articles, but it didn't seem particularly better or worse than the previous two, the warning notwithstanding.

Let's say that the random walk above is fairly representative -- and I think it is, based on other experience browsing Wikipedia.  What of Timmer's claim that general interest articles such as The Battle of the Wilderness are accessible, while technical articles such as Reproducing kernel Hilbert space are hostile?

I suspect selection bias.  Timmer (like me, and anyone else who browses a lot of technical articles) sees a lot more technical articles than the average reader.  In fact, we should broaden that a bit and say "specialized" instead of "technical".  Just as math geeks might read a lot of math articles, history geeks will read a lot of history articles, sports geeks a lot of sports articles and so forth.  Should we really expect Reproducing kernel Hilbert space to be as newbie-friendly as The Battle of the Wilderness when the battle site has a plaque on a public highway talking about it?

Let's try following History like we did for Physics above, since Timmer's example of an excellent article is from that field.  The first area of study is Periodization, and already we see a notice that the article's sources remain unclear.  A large chunk of the article covers Marxian periodisation, giving the almost certainly false impression that this is the only significant way that historians divide history into periods.  This section seems to be a copy of the corresponding section in Marx's theory of history, suggesting that one of the most basic Wikipedia cleanups -- making sure that the bulk of the information is in one definitive place and everything else links to that place -- hasn't been done.

Just as impenetrable math or physics articles may give the impression that scientists are a bunch of elitists, the article on Periodization -- the second history article I tried -- may give the impression that historians are a bunch of leftists.  So maybe Wikipedia as a whole is unfriendly to academics as a whole?  That would be a sad irony indeed, but I doubt that's really what's going on.  Wikipedia is filling more than one function, after all.  It's a general introduction in some places, and a specialized reference work in others.  That seems fine.

There's a lot more to be said here.  Maybe survey articles like the ones on physics and history are the wrong place to look -- not that we've really defined what we're looking for.  I tried putting "The battle of" into the search bar and clicking on a few of the battles that came up.  All the articles seemed quite well written.  Perhaps the Wikipedia process works better for documenting specific events?  Or perhaps the search bar was just showing me the most popular articles, which in turn would be popular for being well written and well written for being popular?

Overall, the proposition that Wikipedia articles on the sciences are bad for the sciences seems like a testable hypothesis, at least in principle, but properly testing it requires a lot more machinery -- methodology, statistical models, surveys, etc. -- than you'll find in a blog post or op-ed.  Without a more thorough study, we should take assertions like Timmer's as good starting points, not conclusions.

* I should point out that the article I linked to is not necessarily the kind of article that Timmer is complaining about.  I just picked a handy example of a technical article that would probably not be too familiar to most readers.  Yep, I ... just happened ... to be reading up on singular homology theory over my winter break.  What can I say?

Friday, December 25, 2015

Field notes: Proudly answering none of your pressing questions

Browsing through old posts and statistics, one thing that jumped out is that a fair chunk of traffic comes from people seeking answers to questions I specifically don't answer.  Examples:
  • Information age: Not dead yet was a reply to someone else's blog post, arguing that you couldn't really say when an "age" began or ended, and in any case "ages" tend to overlap rather than each new age supplanting the last.  If you came there searching for "When did the information age begin/end?", as several people have, you may be disappointed.
  • Off topic: Welcome to the new decade was, well, first of all, off topic.  It's a pet peeve of mine that people will argue that everybody gets some fine point of usage wrong, in this case when decades start.  Somewhere I ran across someone's argument that a decade is just a block of ten years and you can start one whenever you want, and that clicked.  I really should have linked to wherever I read that, but then, I doubt it was their original idea either.  In any case, if you landed there looking to settle a bar bet about whether 2010 was the first year of a new decade, you may be disappointed.
  • Now what happened to my bookmarks? was further speculation on a previous post about why I no longer found myself using my browser's bookmark feature so much and never really ended up doing much with (Remember them? They still seem to be around, actually, under plain  Naturally, pretty much anyone who ended up there was searching for help in recovering lost browser bookmarks.  I added some links for that up at the top a few years back, and even updated them a couple of times, but I have no idea if they're still useful.  As I said, I don't really use bookmarks myself anymore.
  • "When I use a word, it means just what I choose it to mean" is the most popular Field Notes post of all time.  I'm not sure exactly what brought people to it -- the bulk of the activity seems to have scrolled off the easy-to-find summaries -- but it's just a rant on an annoying customer service experience involving spectacular overuse of boilerplate macros.  If you were thinking Alice in Wonderland, you would probably have given up a little past the title.
Two of these are genuine Field Notes posts -- analysis of things webby.  But those are the two that people seem to land on mainly in search of something else.  The other two, that is, the ones people might end up on more-or-less on purpose (depending on what was driving traffic to the last one), don't really convey what this blog is mostly about.  Of those two, the one that people probably land on looking for answers doesn't really answer the question.

And I'm not even disappointed.  To the contrary, there's something in all this that's deeply reflective of the web as I understand it.

Monday, December 14, 2015

My phone, then and now (or: Maybe coverage is the new coverage?)

(August? Really?  This must be a record gap even for the new, unhurried Field Notes.  Oh well ... it's been busy.)

Re-reading through the blog, I ran across a short post on wireless carriers' advertising having shifted from coverage to bandwidth.  Most old posts seem to hold up well, but this one seemed remarkably out of date:
  • With 4G building out, coverage is very much an issue -- at least in ads
  • At the time I was using a "feature phone", mainly as an alarm clock and ... as a phone.  I finally took the plunge with a proper smart phone a while back.  I use it as ... 
    • An alarm clock
    • A phone
    • A GPS
    • A way to check email
    • A camera (but I also used the old feature phone as that)
    • A way to browse news stories, check sports scores, weather etc.
    • A way to text -- it's noticeably easier to text with autocomplete, though not miraculously so
    • A way to schedule appointments and to check my schedule and reminders
    • A way to look up stuff quickly on the web
    • A few other random applications
It's interesting that even feature phones had several of those -- phone, clock, text, camera, calendar -- and probably more or less the right set of them to get the most out of the limited bandwidth and CPU.  The newer iterations are generally better (auto backup of photos to the cloud comes to mind), but not mind-blowingly, night-and-day better.

My attitude at the time was "Meh ... I'm usually near my laptop and it has a bigger screen."  While I find I use my phone quite a bit during the day, I'm still near my laptop most of the time, and I'd only really miss the phone for truly "mobile" applications, like
  • phone
  • GPS
  • camera
  • receiving texts anywhere
  • checking email everywhere
Oh ... and as an alarm clock.