Monday, September 26, 2011

Real science, hot off the web

A while ago I commented on an Economist article claiming that Web 2.0 tools "were beginning to change the shape of the scientific debate."  My contention was that the web wasn't so much changing the debate as changing the means of publication.  In particular, there had always been a trade-off between speed of publication and thoroughness of review, and the web was becoming a publishing mechanism of choice on the lightly-reviewed end of that continuum.

More recently, looking for something I no longer recall, I ran across Cornell's arxiv.org (I assume the x is meant to represent a Greek χ), a repository for "Open access to 703,281 e-prints in Physics, Mathematics, Computer Science, Quantitative Biology, Quantitative Finance and Statistics."  The number 703,281 was current when I scraped it. It's probably higher by now.

That's a lot of articles, but does anyone use it for anything important?  Well, one recent entry is "Superluminal neutrinos in long baseline experiments and SN1987a" (Cacciapaglia, Deandrea, Panizzi et al., and yes, those neutrinos).  Indeed, there seems to be a lot of activity in the experimental high-energy physics section overall, which makes sense.  It's useful to have experimental results available quickly, bearing in mind that there can be quite a bit of calibration, number-crunching and checking before an experimental paper is ready for public consumption (months, in the case of the neutrino paper).

Submissions to arxiv.org "must conform to Cornell University academic standards".  It's not immediately clear to me what process is in place to ensure this, but from a little browsing it's clear that these are serious academic papers.  It also seems reasonable to assume that most of the papers have not been through the full process of peer review required for publishing in a major journal.  Indeed, the published version is almost certainly not going to appear on such a site, if only for reasons of copyright.

It seems like a good niche to fill.  If you have a significant result that you're comfortable sharing with the world and staking your reputation on, there should be some way to make it immediately available, with the implied trade-off between speed and thoroughly careful vetting.  Publishing under the aegis of a major university gives everyone some assurance that you're at least doing real research.  I notice from random sampling that the reference sections generally don't cite arxiv.org, giving some indication of the preliminary nature of the publications.

With that in mind, arxiv.org looks like a great resource not only for working academics but for the curious layperson as well.

Wednesday, September 14, 2011

Urbandictionary vindaloo

Sometimes it's good to remember that www stands for World-Wide Web.  One fun example is Samosapedia, a burgeoning collection of slang from India and thereabouts, built on the same basic user contribution and rating scheme as its older cousin Urbandictionary, but twelve time zones away and, from my brief survey, without so much outright gamy material (somewhat off-color material, on the other hand ...).

Slang is an interesting window into a culture or cultures.  The slang here is every bit as lively as anyone else's, and thanks to the web we can all get a glimpse.  A few more-or-less random examples:
  • Cup ice cream -- literally, what you'd expect, but the local flavor is in the details
  • Item number -- a bit of Bollywood
  • Do the needful -- um ... just how would you say that in, say, American English?
My favorite, on a par with the mighty might could, is cannot able to (listed under cannot able to be and, commentary notwithstanding, no more "warped" than many an "acceptable" construct).

Forbes on passwords

Wandering through the web (but not necessarily figuring it out as I went along) I ran across a slide show in Forbes on the subject of passwords, with what seems to me mostly reasonable advice.  Some highlights, mostly common-sense stuff that bears repeating:
  • Change important passwords frequently and don't reuse them.
  • Use different passwords for different purposes.
    • Important passwords (e.g., for bank accounts) should be unique. 
    • Less important accounts can share passwords, but be aware that if one account is compromised you should consider all of them compromised.
  • Don't choose a password that's ever appeared elsewhere.  This rules out memorable phrases like "We the people of the United States of America".
  • Passwords should contain nothing personally associated with you (basically a version of the previous item).
  • Password managers may be useful.  The advantage is you can use random gibberish and the manager will remember it for you.  The disadvantage is that if the master password is ever cracked, you're completely hosed.
  • Use HTTPS when logging in.  HTTPS encrypts the connection and uses digital certificates to ensure that you're really talking to whom you think you are (just exactly how secure this system is is a whole other can of worms, but for now let's assume it's basically OK).  You can tell a site is using it because its address starts with "https://" instead of "http://", and browsers now indicate whether you have a secure connection.
  • Don't type your password into anyone else's machine.
  • Assume that a public WiFi access point is just that, public (the actual slide says to avoid it entirely).  If you're not using an encrypted connection of some sort (HTTPS, SSH, a VPN or such) assume that anyone can see your network traffic, including passwords you type when you log in.  Also assume that any random person can see anything that's publicly shared on your computer (another fine can-o-worms).
  • Don't depend on passwords generated by web sites or random software.  Even if everything's on the up-and-up, it's very easy to get password generation wrong, typically by using a weak random number generator (see this post for more on generating passwords).
  • Archive your important passwords in case of catastrophe, for example by writing them down on a piece of paper and storing it in a safe deposit box that can be opened in an emergency.
  • In general, if you're going to record a password somewhere, do it on a physical medium separate from your computer (see disadvantage of password managers, above).
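To make the point about password generation a bit more concrete, here is a minimal sketch in Python of what "not getting it wrong" might look like. It assumes the standard library's secrets module, which draws on the operating system's cryptographically strong random source, in place of the general-purpose random module. The function name, the alphabet and the default length are illustrative choices, not anything taken from the Forbes slides.

```python
import secrets
import string

# Illustrative alphabet: letters, digits and punctuation (94 symbols).
# Some sites reject certain punctuation, so this may need trimming.
ALPHABET = string.ascii_letters + string.digits + string.punctuation


def generate_password(length: int = 16) -> str:
    """Return a random password drawn uniformly from ALPHABET.

    secrets.choice uses the OS's cryptographically strong source,
    unlike random.choice, which is predictable if its state leaks.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(generate_password())
```

With 94 possible symbols per character, each character contributes a bit over six bits of entropy, so a 16-character password from this sketch has roughly 100 bits.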
There are also a few items that don't seem actively harmful, but probably don't help greatly either:
  • When replacing letters with numbers and such, use non-obvious numbers, e.g., r7place instead of r3place.  This will add a few bits of entropy, which is good, but not really good enough on its own.  If your base word is in a dictionary of 500,000 words and you replace up to three characters with one of 15 replacements, you have about 30 bits of entropy, which is not that much.
  • Add a number to the end of sentence-based passwords "for extra uniqueness".  Adding a number adds about three bits of entropy.  Meh.
  • Scramble a password when writing it down.  This will make it harder, but not impossible, for someone who finds your written password to figure out the actual password, but it will also make it harder for you to come up with the actual password at two in the morning when you discover you don't quite remember how you scrambled it and the Very Important Site locks out accounts with more than three login failures.  Of course, you could write down how you scrambled it ...
  • Deliberately misspelling words can make passwords more secure.  Yes, but not very much more secure.
  • Use a sentence with lots of words, and include punctuation.  In theory this can work, but in practice people come up with much-less-than-random words, particularly if the sentence actually makes sense.  Also, surprisingly many systems get indigestion if you try to use a long password.
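
The entropy figures in the list above can be checked with a few lines of arithmetic. This is only a rough sketch under simplifying assumptions: the base word is picked uniformly from the 500,000-word dictionary, exactly three characters are replaced, each replacement is picked uniformly from 15 options, and the appended number is a single decimal digit. Real passwords are rarely chosen uniformly, so these numbers are best read as upper bounds. The variable names are illustrative.

```python
import math

# Picking one word uniformly from a 500,000-word dictionary.
base_word_bits = math.log2(500_000)

# Three replaced characters, each with 15 equally likely replacements.
# (Assumes the positions themselves are fixed, which keeps things simple.)
substitution_bits = 3 * math.log2(15)

# Total for the "replace letters with numbers" scheme.
total_bits = base_word_bits + substitution_bits

# Appending a single decimal digit to a password.
digit_bits = math.log2(10)

if __name__ == "__main__":
    print(f"base word:     {base_word_bits:.2f} bits")
    print(f"substitutions: {substitution_bits:.2f} bits")
    print(f"total:         {total_bits:.2f} bits")
    print(f"one digit:     {digit_bits:.2f} bits")
```

Under these assumptions the total comes to a little under 31 bits, in line with "about 30", and a single digit adds a bit over three. For scale, 30 bits is about a billion possibilities, which is not much for an attacker who can try guesses offline.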