Saturday, August 4, 2012

Answering my random question


I recently asked whether there were more than a Britannica's worth of Britannica-quality articles in Wikipedia.  Looking into it a bit, I'd have to generally agree with Earl that no, there aren't.

Britannica has about half a million articles (according to Wikipedia's page on Britannica).  English Wikipedia has about four million.  I would not say that one in eight Wikipedia articles is up to Britannica standards.

Granted, the famous Nature study of 2005 found that Wikipedia science articles are nearly as accurate as Britannica articles -- and that Britannica is far from flawless.  One can dispute the methodology and conclusions of that study, and Britannica did, but the overall conclusion seems at least plausible.

However, science articles are only part of the picture, and the writing in Wikipedia is uneven and full of Wikipedia tics.  Britannica, with its full-time writers and editors, ought to be a bit better, and I tend to think this is where Wikipedia generally falls short.  Factually, the two are comparable.  In style and organization, not so much.

Taking content and writing together, there are probably relatively few Britannica-quality articles in Wikipedia, but there are more than enough that are close enough.


Now CAPTCHA-free

A while ago I turned on "Word Verification", which makes people leaving comments read a hard-to-read word in order to prove they're not a bot.

This seems to have done more harm than good.  I still get the occasional spam comment, and it's a pain for people to leave real comments.  To see what to do about it, I Googled blogger captcha, and up popped this post urging bloggers to "kick Captcha to the curb".  The gist is, no, that extra inconvenience to real readers isn't really worth it.  Spam filters catch spam even if word verification is turned off.

And, of course, "It flags your blog as less professional".  If there's anything this blog stands for, in tone, subject matter and publication schedule, it's iron-clad professionalism.

So I'm turning word verification off.  If it turns out to be a horrible mistake, I can always turn it back on.  Otherwise, no news is good news.

More rumblings in the world of academic publishing

I've written before about the use of online outlets for quick publication of informal (that is, non-peer-reviewed) results, and arXiv in particular.  In The Case for Books, Robert Darnton expresses concern about the state of academic publishing and the power that the major publishers hold over academic researchers and libraries, and wonders what will come of it all.

Now it seems things are heating up.  There is a boycott in progress against Elsevier, the academic publishing juggernaut that owns such publications as The Lancet.  A number, and evidently a growing number, of academics are simply refusing to publish in or otherwise participate in Elsevier publications, on the grounds that Elsevier's high prices, profit margins and overall practices are harmful to those who must publish in them, to the institutions that must buy the publications, and to the free exchange of ideas itself.

At this writing, 12,558 people have signed up, giving their full names and affiliations in a searchable list.  These are not random people taking potshots from behind pseudonyms.  These people are putting their reputations on the line publicly and, by walking away from one of the major sources of recognition and exposure, potentially hindering their academic careers.  Their names may be found on thecostofknowledge.com.


The basic issue here is that to have a career in academia, one must produce a steady stream of work.  The universal standard for measuring that stream of work is the number and quality of papers one publishes.  "Publish or perish."

Since anyone at all can print up a paper on a topic of research (and many do), there has to be some mechanism to determine whether a result has any real merit.  In the academic world, that mechanism is peer review.  If you submit a paper to a refereed journal, the editors will select a set of reviewers in your field to go over it.  The reviewers will either reject the article outright or accept it, likely with revisions.

Different journals have different standards for inclusion.  This allows readers to have some idea up front how worthwhile an article is, and provides some means of rating a researcher's output beyond the sheer number of articles published.  In principle, and for the most part in practice, the peer review process ensures that articles in journals are accurate and relevant, at least as far as the reviewers can tell at the time.  Essentially, journals provide brand names.

Peer review is clearly a valuable service, beyond printing and distribution of paper volumes, which is, of course, on the wane.  But there are problems.  In the call to action which started the current boycott,  Timothy Gowers puts forth several complaints:

  • Journals cost too much, particularly since the authors and reviewers are paid by their institutions, not the publisher, and it's largely the same institutions that pay for subscriptions to the journals they're paying to produce.
  • Online access is behind expensive paywalls.
  • Publishers drive the overall cost up by bundling, that is, requiring institutions to buy large numbers of journals, many of which literally go unread, in order to subscribe to the ones they really care about.  An institutional bundle from a given publisher can run into the millions of dollars per year.
  • While many publishers produce expensive journals and require bundling, Gowers calls out Elsevier in particular for several reasons, including supporting legislation that restricts access to published results and playing hardball with institutions that try to resist bundling.
In short, publishers are in serious danger of losing their relevance, and in the view of those joining the boycott, Elsevier is one of the worst offenders.


It's all well and good to object to publishers' behavior and organize a boycott, but the academic world also seems actively engaged in building a more open, web-enabled alternative.  This includes:
  • Blogging as a means of informal sharing and discussion.  Indeed, Gowers' call to action appeared on his blog (which, with a mathematician's precision, he calls "Gowers's Weblog").
  • Sites, notably arXiv, for collecting unrefereed preprints.
  • New online refereed journals aiming to take the place of old ones.  Normally establishing a brand can be difficult, but if the editorial board of the new journal is made up of disaffected board members from old journals, their reputations come with them.

While writing this, I was wondering what would be a really webby way to do this.  Here's a sketch:
  • Articles would be published in something more like a wiki form, with a full revision history and editors making changes directly.
  • Since reputation is particularly important here, changes would ideally be digitally signed.
  • Individuals could put their imprimatur on (a particular revision of) an article they thought worthy.
  • The quality of papers could be judged by the reputation of those approving of them, which in turn would be judged by the quality of the papers they'd produced ...
And then it occurred to me that in practice there would probably come to be groups of people whose approval was particularly significant within particular fields.  It would be good to be able to establish groups of, say, experts in homology or complex analysis.  It would also be good to have people who were good at steering new works to the appropriate groups of experts.
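
To make that concrete, here's what the data model might look like in a few lines of Python.  It's purely hypothetical -- none of these names or classes exist anywhere -- and a plain SHA-256 content hash stands in for a real digital signature, which in practice would use public-key cryptography:

    # Purely hypothetical sketch of the scheme above -- not any existing system's API.
    # A SHA-256 content hash stands in for a real digital signature.
    import hashlib
    from dataclasses import dataclass, field

    @dataclass(frozen=True)
    class Revision:
        article_id: str
        revision_no: int
        author: str
        text: str

        def digest(self) -> str:
            # Identifies exactly this revision of this article.
            payload = f"{self.article_id}:{self.revision_no}:{self.author}:{self.text}"
            return hashlib.sha256(payload.encode()).hexdigest()

    @dataclass
    class Endorsement:
        # One person's imprimatur on one particular revision.
        endorser: str
        revision_digest: str

    @dataclass
    class Registry:
        endorsements: list = field(default_factory=list)

        def endorse(self, endorser: str, revision: Revision) -> None:
            # Endorsements bind to a digest, so later edits don't inherit approval.
            self.endorsements.append(Endorsement(endorser, revision.digest()))

        def score(self, revision: Revision, reputation: dict) -> float:
            # Crude quality estimate: sum of the endorsers' reputations.
            d = revision.digest()
            return sum(reputation.get(e.endorser, 0.0)
                       for e in self.endorsements if e.revision_digest == d)

The circular part -- papers scored by the reputation of their endorsers, endorsers scored by the quality of their papers -- is essentially the same fixed-point computation that PageRank-style reputation schemes perform, so it's workable, but not something to bolt on casually.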

Hmm ... except for the revision history and digital signatures bit, this sounds an awful lot like a peer-reviewed online journal.

Friday, August 3, 2012

Cookies in the UK (or should that be "biscuits"?)

I haven't tracked down whether Parliament decreed this, though it seems likely.  In any case, a number of UK sites I've visited in the past couple of months show you a brief popup or other announcement to the effect that they use cookies (small files that your browser stores on your disk and hands back to the site on later visits so the site can tell it's you).  The announcement is typically a couple of simple sentences with a link for further information.  For example:
This site uses cookies. By continuing to browse the site you are agreeing to our use of cookies.  Find out more here.
The linked page details in clear, precise language what cookies are and what the site uses them for.  It explains how to set your browser to disable cookies for the site, with the understanding that you might not have as nice an experience since the site won't be able to remember who you are.  Once you dismiss the announcement you don't see it again, because -- of course -- it has set a cookie and knows not to come back (unless you disable cookies or later clear that cookie).
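
For what it's worth, the mechanism is simple enough to sketch.  Here's a hypothetical toy version using Flask -- the cookie name, the wording and the one-year lifetime are my own inventions, not anything the sites in question actually do:

    # Hypothetical toy version of the announce-once mechanism, using Flask.
    from flask import Flask, request, make_response

    app = Flask(__name__)
    CONSENT_COOKIE = "cookie_notice_dismissed"   # made-up name

    @app.route("/")
    def home():
        page = "<p>Normal page content here.</p>"
        if not request.cookies.get(CONSENT_COOKIE):
            # No cookie yet (first visit, or cookies cleared or disabled):
            # show the brief announcement with a link to the details page.
            page = ('<div>This site uses cookies.  '
                    '<a href="/cookies">Find out more here.</a></div>' + page)
        resp = make_response(page)
        # Remember that the notice was shown by -- of course -- setting a cookie,
        # so it doesn't come back on later visits.
        resp.set_cookie(CONSENT_COOKIE, "1", max_age=60 * 60 * 24 * 365)
        return resp

If you've disabled cookies for the site, the browser simply ignores the set_cookie, no cookie comes back on the next visit, and the announcement reappears -- which is exactly the behavior described above.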


Wow.  They Got It Right.  Well done!


Random question

Are there now more than a Britannica's worth of Britannica-quality articles on Wikipedia?

Is there a UX crisis?

Back in the early days of computing, a software crisis was declared.  Projects were being launched with high expectations -- this was back when computers could do absolutely anything -- only to end up late, over budget, disappointingly lacking in features, buggy to the point of uselessness, or not delivered at all.

Many solutions were proposed.  Software should be written in such a way that it could be mechanically proved correct.  Software engineering should become a proper engineering discipline with licenses required to practice.  Methodologies should be developed to control the development process and make it regular and predictable.  There were many others.

None of these things has happened on a significant scale.  A proof of correctness assumes you understand the problem well enough to state the requirements mathematically, which is not necessarily easier than writing the code itself.  For whatever reason, degrees and certificates have not turned out to be particularly important, at least in the places I've worked over the past few decades.

Methodologies have come and gone, and while most working engineers can recognize and understand a process problem when they see it ("Why did I not know that API was about to change?" ... "How did we manage to release that without testing feature X??"), there is a high degree of skepticism about methodologies in general.

This isn't to say that there aren't any software methodologies -- there are hundreds -- or that they're not used in practice.  I've personally seen up close a highly-touted methodology that used hundreds of man-years and multiple calendar years to replace an old mainframe system with a new, state-of-the-art distributed solution that the customer -- which had changed ownership at least once during the wait -- was clearly unhappy with.  And well they should have been.  Several months in, it had been scaled down as it became clear that the original objectives weren't going to be met.

I've also seen "agile" methodologies put in place, with results that were less disastrous but not exactly miraculous either.  Personally I'm not at all convinced that a formal methodology is as helpful as a good development culture (you know it when you see it), frequent launches, good modularity and lots of testing.

Several things have happened instead of a cure, or cures, for the software crisis.  Languages and tools have improved.  Standards, generally de facto, have emerged.  Now that a lot of software is out, both customers and developers have more realistic expectations about what it can and cannot do.  Best practices have emerged (Unit tests are your friend.  Huge monoliths of code aren't.).  Projects get delivered, often late, over budget, lacking features and buggy, but good enough.  And it's just code.  We can always fix it.  I can sense the late Edsger Dijkstra shaking his head in disapproval as I write this, but nonetheless the code is running and a strong case can be made that the world is better for it.

We don't have, nor did we have, a crisis.  What we have is consistent disappointment.  We can see what software could be, and we see what it is, and the gap between the two, particularly in the mistakes we get to make over and over again, is disheartening.


Which leads me back to a persistent complaint: UXen, in general, suck.

Yes, there are plenty of examples of apps and web sites that are easy to use and even beautiful, but there are tons and tons that are annoying, if not downright infuriating, and ugly to boot.  For that matter, there are a fair number of pretty-but-useless interfaces.  Despite decades of UX experience and extensive research, basic flaws keep coming back again and again.  Off the top of my head without trying too hard:
  • Forms that make you re-enter everything if you make a mistake with anything (these actually seem to be getting rarer, and a good browser will bail you out by remembering things for you -- and in many cases that's a perfectly fine solution).
  • Lists of one item that you have to pick from anyway as though there were an actual choice.
  • "Next" buttons that don't go away when you get to the last item (likewise for "Previous")
  • Links to useless pages that just link you to where you wanted to go in the first place.
  • Security theater that pretends to make things safer.  Please make it stop.
  • Forms that require you to use a special format for things like phone numbers.  Do I include the dashes or not?  (A sketch of a more forgiving approach follows this list.)
  • Wacky forms for things like dates that throw everything you know about keys like backspace and tab out the window.
  • Error handling that tells you nothing about how to fix the problem.
  • Layouts that only line up right on a particular browser.
  • Pages that tell you to "upgrade" if you're not running a particular browser.
  • General garish design. Text that doesn't contrast with the background, which is too busy anyway.  Text that contrasts too much.  Cutely unreadable fonts.  Animated GIFs that cycle endlessly.
  • Things that pop up in front of what you're trying to look at for no good reason.
  • Editors that assume, a la Heisenberg, that the mere act of opening an edit window on a document causes unspecified "unsaved changes" that you must then decide whether or not to save (yeah, Blogger, you're guilty here).
And so forth.  I've ranted about several of these already, though for some reason the industry doesn't seem to have taken heed.
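
To pick on just one of those, the phone number formats: the forgiving approach is to accept whatever reasonable format the user types and normalize it, rather than rejecting anything that isn't in the One True Format.  A minimal sketch, assuming ten-digit North American numbers purely for illustration:

    # A deliberately forgiving phone number field, for illustration only.
    import re

    def normalize_phone(raw: str) -> str | None:
        # Return a canonical NNN-NNN-NNNN string, or None if it can't be one.
        digits = re.sub(r"\D", "", raw)      # strip dashes, dots, spaces, parens
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]              # tolerate a leading country code
        if len(digits) != 10:
            return None                      # genuinely not a phone number
        return f"{digits[0:3]}-{digits[3:6]}-{digits[6:]}"

    # All of these come out as "555-867-5309", dashes or no dashes:
    for attempt in ["555-867-5309", "(555) 867 5309", "555.867.5309", "1 555 867 5309"]:
        assert normalize_phone(attempt) == "555-867-5309"

Dashes, dots, parentheses, spaces, a leading 1 -- all fine.  The only thing the form should complain about is input that can't be a phone number at all.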

How does this happen?

How does any less-than-satisfactory design ever happen?  One answer is that reality sets in.  Any real project is a compromise between the desire to produce something great and the need to get something out in front of the customer.  Perfect is the enemy of good enough.

In an ideal world, people would be able to describe exactly what they want and designers could just give it to them.  In the real world, people don't always know what they want, or what's reasonably feasible, and designers don't always know how to give it to them.  In the ideal world a designer has at hand all possible solutions and is never swayed by the desire to use some clever new technique whether it really applies or not.  In the real world designers are humans with limited resources.

This isn't unique to software by any means.  Doors have been around for millennia, and people still don't always know how to design them.

I should pause here to acknowledge that UX is difficult.  There are rules and methods, and tons of tools, but putting together a truly excellent UX that's both pleasant and fully functional, that makes easy things easy and hard things possible, takes a lot of thought, effort and back-and-forth with people actually trying to use it.

Again, though, that's not a property of UX.  It's a property of good design.  The question here is why UX things that seem simple enough -- like avoiding useless buttons and links -- go wrong so often in practice.  A few possible answers:
  • Actually, UX designers get it more-or-less right most of the time.  We just notice the failures because they're really, really annoying.
  • It's harder than it looks.  It's not always easy to figure out (in terms even a computer can understand) that a link or button is useless, or how to lay something out consistently on widely different screens.
  • The best tools aren't always available.  Maybe there's a really good widget for handling a changing list of items that allows for both quick and fine-grained scrolling and so forth.  But it's something your competitor wrote, or it's freely available but not on the platform you're using.
  • Dogma.  Occasionally guidelines require foolish consistency and UX is not in a position to bend them.  This may explain some tomfoolery regarding dates, social security numbers and such.
  • Plausible-sounding reasoning that never gets revisited.  It may seem like a great idea to make sure you have a valid social security number by requiring the user to put in the dashes as well.  That way you know they're paying attention.  Well, no.
  • Reinvented wheels.  The person doing the UX hasn't yet developed the "this must already exist somewhere" Spidey sense, or thinks it would be Really Cool to write yet another text editing widget.
  • Software rot.  The page starts out really nicely, but changes are jammed in without regard to an overall plan.  Inconsistencies develop and later changes are built on top of them.
Hmm ... once again, none of these seems particularly unique to UX.  Time to admit it: UX is a branch of software engineering, liable to all the faults of other software engineering endeavors.  Yes, there is an element of human interaction, but if you think about it, designing a library for people to code to is also a kind of UX design, just not one with screens and input devices.  You could just as well say that the same things that make UX development error-prone make library design error-prone, and vice versa.

To answer the original question, there is no UX crisis, no more than there was a software crisis.  We just have the same kinds of consistent disappointment.

But who asked?  Well, I did, in the title of this post.  Interestingly enough, no one actually seems to have declared a UX crisis, or at least the idea doesn't seem to have taken off.  Maybe we have learned a bit in the past few decades after all.