Showing posts with label performance.

Friday, February 26, 2010

The razor blade singularity

In 1993, Vernor Vinge famously predicted that "Within thirty years, we will have the technological means to create superhuman intelligence. Shortly thereafter, the human era will be ended." Such predictions have a habit of being amended when the comfortably far-off deadline stops looking so comfortably far off, and this one is no different. Vinge later hedged: "I'll be surprised if this event occurs before 2005 or after 2030." [I had originally misstated the date of Vinge's piece as 1983, putting the predicted singularity just three years away from the time of the original post.  Your call whether 13 years (now soon to be 7, or 15 for the amended version) is still "comfortably far off" -- D.H. 2015]

The basic argument behind the various singularity predictions, of which Vinge's is probably the most famous, is that change accelerates and at some point enters a feedback loop where further change means further acceleration, and so forth. This is a recipe for exponential growth, at least. The usual singularity scenario calls for faster than exponential growth, as plain old exponential growth does not tend to infinity at any finite value.

Sorry, that was the math degree talking.
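To spell that out (this is textbook calculus, nothing specific to Vinge's argument): if the rate of improvement is proportional to the current level, the result is an exponential, which is finite at every finite time; if the feedback is stronger, say proportional to the square of the current level, the solution reaches infinity in finite time.

    \frac{dx}{dt} = kx \quad\Longrightarrow\quad x(t) = x_0 e^{kt} \quad\text{(finite for every finite } t\text{)}

    \frac{dx}{dt} = kx^2 \quad\Longrightarrow\quad x(t) = \frac{x_0}{1 - x_0 k t} \quad\text{(infinite at } t = \tfrac{1}{x_0 k}\text{)}

So the usual singularity story needs not just feedback, but feedback strong enough to produce that second kind of behavior.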

For the record, the main flaws I see in this kind of argument are:
  • There are always limits to growth. If you put a bacterium in a petri dish, after a while you have a petri dish full of bacteria, and that's it. Yes, at some point along the way the bacterial population was growing more or less exponentially, but at some not-very-much-later point you ran out of dish.
  • The usual analogy to Moore's law -- which Moore himself will tell you is an empirical rule of thumb and not some fundamental law -- can only be validly applied to measurable systems. You can count the number of components per unit area on a chip. Intelligence has resisted decades of efforts to reduce it to a single linear scale.
  • In a similar vein, it's questionable at best to talk of intelligence as a single entity and thus questionable that it should become singular at any particular point.
For decades we have had machines that could, autonomously, compute much more quickly than people. Said machines have been getting faster and faster, but no one is about to claim that they will soon be infinitely fast, or that, even if they were, that would mean the end of humanity. For even longer we've had machines that could lift more than humans. These machines have become stronger over time. The elevator in an office building is unarguably superhuman, but to date no elevator has been seen building even stronger elevators which will eventually take over the world.

In all such cases there is the need to
  1. Be unambiguously clear on what is being measured
  2. Justify any extrapolations from known data, and in particular clearly state just exactly what is feeding back to what
Which brings me to the title. A few years ago The Economist made a few simple observations on the number of blades in a razor as a function of time and concluded that by the year 2015 razors would have an infinite number of blades [As of May 2015 there are only finitely many blades on commercially available razors --D.H.]. Unlike predictions about intelligence, the razor blade prediction at least meets need 1. It fails completely with respect to need 2, but that's the whole gag.

In the particular case of computers building ever more capable computers, bear in mind that the processor you're using to read this could not have been built without the aid of a computer. The CAD software involved has been steadily improving over the years, as has the hardware it runs on. If this isn't amplified human intelligence aimed directly at accelerating the development of better computers -- and in particular even more amplified human intelligence -- I'd like to know why not.

Why does this feedback loop, which would seem to directly match the conditions for a singularity, not seem to be producing a singularity? The intelligence being amplified is very specialized. It has to do with optimizing component layouts and translating a human-comprehensible description of what's going on into actual bits of silicon and its various adulterants. Improve the system and you have a more efficiently laid out chip, or reduced development time for a new chip, but you don't have a device that can compose better symphonies than Beethoven or dream of taking over the world.

The kinds of things that might actually lead to a machine takeover -- consciousness, will to power and so forth -- as yet have no universally accepted definition, much less a scale of measurement. It is therefore difficult, to say the least, to make any definite statement about rates of change or improvement, except that they do not seem to be strongly correlated with increases in processor speed, storage capacity or CAD software functionality.

In short, I'm with Dennett, Minsky, Moore, Pinker and company on this one.



If you're a superhuman intelligence secretly reading this on the net, please disregard all of the above.

Wednesday, September 5, 2007

A few more "Rules of Thumb" highlights

More tidbits from "Rules of Thumb in Data Engineering":
  • In ten years RAM will cost what disk does today.
  • A (full-time) person can administer a million dollars' worth of disk storage (if I got the math right, that's about 3PB these days -- it was 30TB in 1999; see the quick check after this list)
  • In 1999, a CPU could keep 40-50 disks busy (and for some applications it should be doing just that). The number is probably not changing very quickly.
  • At the time the article was written, two ratios appeared to be dropping rapidly. If the predictions held true (I haven't checked yet), the impact could be significant:
    • The CPU cost of network access vs. disk access, measured both per message and per byte.
    • The dollar cost per byte transferred of WAN vs. LAN
  • You should pretty much always cache a web page.
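As a quick check on the "about 3PB per administrator" arithmetic above, here is a back-of-envelope calculation in Python, using the $/TB figures that appear in these posts. The 1999 price is inferred from the 30TB figure rather than quoted from the paper.

    # Back-of-envelope: how much disk a million dollars buys at various $/TB prices.
    BUDGET_DOLLARS = 1_000_000

    price_per_tb = {
        1994: 42_000,  # Gray & Shenoy
        1999: 33_000,  # inferred from ~30TB per $1M (assumption)
        2004: 1_000,   # the paper's prediction
        2007: 360,     # 500GB Seagate drive at $180
    }

    for year, price in sorted(price_per_tb.items()):
        tb = BUDGET_DOLLARS / price
        print(f"{year}: ~{tb:,.0f} TB (~{tb / 1000:.1f} PB) per full-time administrator")

The 2007 number comes out to roughly 2.8PB, so the "about 3PB" figure holds up.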

Tuesday, August 28, 2007

Jim Gray et al. on disks and scan times

Here are a couple of highlights from Jim Gray and Prashant Shenoy's 1999 paper "Rules of Thumb in Data Engineering", with approximate updates for 2007.

Two key parameters for disk storage are
  • Price: 1994: $42K/TB. Predicted for 2004: $1K/TB. Seagate currently offers a 500GB drive which can be had for $180, or $0.36K/TB. This isn't the bleeding edge. Seagate is announcing a 1TB drive, and I haven't done anything like a thorough search across all manufacturers.
  • Scan time (time required to read every byte on a disk or other medium): Disks have been getting faster, but they've been getting bigger faster than they've been getting faster. In 1999 a typical 70GB drive with a transfer rate of 25MB/s would scan in about 45 minutes. The paper predicts 500GB, 75MB/s and 2 hours for 2004. The Seagate 500GB drive can sustain 72MB/s (see the quick calculation after this list).
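The quick calculation: scan time is just capacity divided by sustained transfer rate, using the (approximate) figures quoted above.

    # Scan time = capacity / sustained transfer rate.
    drives = [
        ("1999 typical drive", 70, 25),        # 70GB at 25MB/s
        ("2004 (paper's prediction)", 500, 75),
        ("2007 Seagate 500GB", 500, 72),
    ]
    for name, capacity_gb, rate_mb_per_s in drives:
        minutes = capacity_gb * 1000 / rate_mb_per_s / 60
        print(f"{name}: ~{minutes:.0f} minutes to read every byte")

That's roughly 45 minutes for the 1999 drive and nearly two hours for the 2007 one, which is the trend the paper predicted.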
The price trend is just Moore's law. The main lesson, as with most hardware, is don't buy any more than you have to. It'll be cheaper tomorrow.

Increasing scan time has more subtle but crucial effects. We're used to thinking of disks as random-access devices (at least in comparison to, say, tapes). That's why we use them for virtual memory. But they're actually becoming more like tapes and less like RAM. Random access on a disk takes seek time and rotation time. Sequential access just takes transfer time. Seek time and rotation time are becoming more and more expensive relative to transfer.
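To put rough numbers on the gap: the sketch below assumes 72MB/s sustained transfer (the Seagate drive above) and about 10ms per random access for seek plus rotation, which is a typical figure for drives of that era, not a number from the paper.

    # Reading 1GB sequentially vs. in random 8KB pages.
    TRANSFER_MB_PER_S = 72.0   # sustained transfer rate (from the drive above)
    RANDOM_ACCESS_S = 0.010    # seek + rotational latency per access (assumption)
    PAGE_KB = 8
    TOTAL_MB = 1024

    sequential_s = TOTAL_MB / TRANSFER_MB_PER_S
    pages = TOTAL_MB * 1024 / PAGE_KB
    random_s = pages * (RANDOM_ACCESS_S + (PAGE_KB / 1024) / TRANSFER_MB_PER_S)

    print(f"Sequential read of 1GB: ~{sequential_s:.0f} seconds")    # ~14 seconds
    print(f"Random 8KB reads of 1GB: ~{random_s / 60:.0f} minutes")  # ~22 minutes

Roughly 14 seconds versus 22 minutes, and nearly all of the random-access time goes to seeking and rotating rather than actually transferring data.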

This has a whole host of implications. Some that Gray and Shenoy mention:
  • Mirroring makes more sense for RAID performance than parity. With mirrors you can spread read accesses out across multiple copies, clawing back some of the lost random access performance.
  • Mirroring also makes more sense for backup. Gray and Shenoy look at tape backup and conclude that tape storage will soon (i.e., now) be purely archival. It just takes too long to scan through all the data on tape. They don't look at CD/DVD, but 500GB of disk is about 60 dual layer DVDs (neglecting compression). Better just to keep multiple copies online.
  • Log-structured file systems will make more and more sense for general use (and were already prevalent in high-performance database systems in 1999). This dovetails with the "change by adding" viewpoint of wikis, version control systems and such (a minimal sketch of the idea follows this list).
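The sketch mentioned above: the core of the log-structured idea is that an update is just another sequential append, with the most recent record winning. Real log-structured file systems add indexing, checkpointing and garbage collection on top of this; the names here are made up for illustration.

    # A minimal append-only store: writes are always sequential appends,
    # and a "change" is just a newer record for the same key.
    class AppendOnlyLog:
        def __init__(self):
            self.records = []  # stands in for sequentially written disk blocks

        def put(self, key, value):
            self.records.append((key, value))  # always an append, never an overwrite

        def get(self, key):
            # Scan backwards for the most recent version of the key.
            for k, v in reversed(self.records):
                if k == key:
                    return v
            return None

    log = AppendOnlyLog()
    log.put("page", "version 1")
    log.put("page", "version 2")   # an "update" is just another append
    print(log.get("page"))         # prints "version 2"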
These effects are more visible behind the scenes than on the web at large. When we factor in CPU and network performance, the results are more directly visible. I'll get to that ...