Wednesday, December 30, 2015

Wikipedia considered harmful ... or not

In an old post on Wikipedia, I said that

[I]t's easy to spot a backwater article that hasn't seen a lot of editing. This is not necessarily a bad thing. Obscure math articles, for example, tend to read like someone's first draft of a textbook, full of "Let x ..." and "it then clearly follows that ..." The prose may be a bit chewy, but whoever wrote it almost certainly cared enough to get the details right.

My feeling was that if you were really interested in, say, the functoriality of singular homology groups, you'd probably have enough context to chew through prose like "This generality implies that singular homology theory can be recast in the language of category theory."*

In a recent Ars Technica article, John Timmer argues that impenetrable technical articles are actively harmful: "The problematic entries reinforce the popular impression that science is impossible to understand and isn't for most people—they make science seem elitist. And that's an impression that we as a society really can't afford."

I think that's a good point, but I'm not sure how bad the problem really is in practice.  A hard-to-read article is most likely to be harmful if a lot of people are seeing it, which also makes it more likely that someone will be able to improve it.  This is a fundamental assumption of Wikipedia in general, I think.  As such, it would be interesting to see some data behind it -- is there a strong correlation between the number of times a page is landed on and the number of edits (or edits per word, or such)?
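
To make that question a little more concrete, here is a minimal sketch of how one might measure such a correlation, assuming you had already collected view counts and edit counts for a handful of articles.  I've used a rank (Spearman) correlation, since page traffic is likely to be heavily skewed, but that's my choice and not the only reasonable one.  The function names are hypothetical and the numbers are made-up placeholders, not real Wikipedia statistics.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the whole run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; assumes neither list is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up placeholder numbers for five imaginary articles.
page_views = [90000, 12000, 800, 650, 300]
edit_counts = [4000, 900, 120, 150, 40]

print(round(spearman(page_views, edit_counts), 2))
```

A value near 1 would be consistent with the assumption that heavily visited pages also get heavily edited.  A value near 0 would suggest that traffic and editing attention are largely unrelated.  Either way, a real study would need far more articles than this, and some care about how they were sampled.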

Assuming that correlation holds, then someone coming to Wikipedia to learn about, say, physics should have a good chance at a gentle introduction.  Let's try:

  • The main article on Physics seems like a perfectly good Wikipedia page.  It starts with a general introduction, goes into history, core theories, relation to other fields and so forth.  Let's look at one of those fields:
  • Condensed matter physics still seems to be in good shape.  The first sentence doesn't seem completely useful at first: "Condensed matter physics is a branch of physics that deals with the physical properties of condensed phases of matter," but the next paragraph goes on to explain nicely what a condensed phase of matter is.  The rest of the article continues in a well-structured way to give an outline of the field.  Let's look at one of the theoretical aspects:
  • Symmetry breaking "needs attention from an expert in Physics".  I'd agree with that assessment.  The general idea is still there, but we're definitely getting technical: "In physics, symmetry breaking is a phenomenon in which (infinitesimally) small fluctuations acting on a system crossing a critical point decide the system's fate, by determining which branch of a bifurcation is taken."  For example, what's a bifurcation?  Well, we can at least chase the link and find out:
  • Bifurcation theory is a better-structured article, but no less technical.  It's actually a math article, not a physics article.

I would say that either of the last two would be intimidating to anyone who isn't a physicist or mathematician.  I don't know if you could say the same about the first two.  Yes, there are still technical terms and concepts, but it's pretty hard to get away from that and still cover the material.  I would also say that a non-physicist interested in physics in general would be far more likely to land on the Physics article than the other three.

I also noticed that, while I have run across a few really impenetrable technical articles in Wikipedia, it didn't seem -- in this particular random walk, at least -- that the quality of the articles dropped off steadily as one went off the beaten path.  Fields intersect, and perhaps you're never too far from someone's beaten path.  I did chase a link from Bifurcation theory to Stationary point, which was marked "The verifiability of all or part of this article is disputed", not something one expects in math articles, but it didn't seem particularly better or worse than the previous two, the warning notwithstanding.

Let's say that the random walk above is fairly representative -- and I think it is, based on other experience browsing Wikipedia.  What of Timmer's claim that general interest articles such as The Battle of the Wilderness are accessible, while technical articles such as Reproducing kernel Hilbert space are hostile?

I suspect selection bias.  Timmer (like me, and anyone else who browses a lot of technical articles) sees a lot more technical articles than the average reader.  In fact, we should broaden that a bit and say "specialized" instead of "technical".  Just as math geeks might read a lot of math articles, history geeks will read a lot of history articles, sports geeks a lot of sports articles and so forth.  Should we really expect Reproducing kernel Hilbert space to be as newbie-friendly as The Battle of the Wilderness when the battle site has a plaque on a public highway talking about it?

Let's try following History like we did for Physics above, since Timmer's example of an excellent article is from that field.  The first area of study is Periodization, and already we see a notice that the article's sources remain unclear.  A large chunk of the article covers Marxian periodisation, giving the almost certainly false impression that this is the only significant way that historians divide history into periods.  This section seems to be a copy of the corresponding section in Marx's theory of history, suggesting that one of the most basic Wikipedia cleanups -- making sure that the bulk of the information is in one definitive place and everything else links to that place -- hasn't been done.

Just as impenetrable math or physics articles may give the impression that scientists are a bunch of elitists, the article on Periodization -- the second history article I tried -- may give the impression that historians are a bunch of leftists.  So maybe Wikipedia as a whole is unfriendly to academics as a whole?  That would be a sad irony indeed, but I doubt that's really what's going on.  Wikipedia is filling more than one function, after all.  It's a general introduction in some places, and a specialized reference work in others.  That seems fine.

There's a lot more to be said here.  Maybe survey articles like the ones on physics and history are the wrong place to look -- not that we've really defined what we're looking for.  I tried putting "The battle of" into the search bar and clicking on a few of the battles that came up.  All the articles seemed quite well written.  Perhaps the Wikipedia process works better for documenting specific events?  Or perhaps the search bar was just showing me the most popular articles, which in turn would be popular for being well written and well written for being popular?

Overall, the proposition that Wikipedia articles on the sciences are bad for the sciences seems like a testable hypothesis, at least in principle, but properly testing it requires a lot more machinery -- methodology, statistical models, surveys, etc. -- than you'll find in a blog post or op-ed.  Without a more thorough study, we should take assertions like Timmer's as good starting points, not conclusions.

* I should point out that the article I linked to is not necessarily the kind of article that Timmer is complaining about.  I just picked a handy example of a technical article that would probably not be too familiar to most readers.  Yep, I ... just happened ... to be reading up on singular homology theory over my winter break.  What can I say?
