
The Myth of In-Memory Data - dedalus
http://www.interana.com/blog/the-myth-of-in-memory
======
dalke
"The legend of “in-memory” seems to have been born from stories out of
Facebook."

As a general statement, that's not correct. In my field of chemical
information, that 'legend' started in the mid-1990s. Quoting from
[http://www.daylight.com/dayhtml/doc/theory/theory.finger.htm...](http://www.daylight.com/dayhtml/doc/theory/theory.finger.html)
:

> Computer memories are roughly 10E5 times faster than disks. If one can read
> the bitmaps of a structural key or fingerprint into memory and search it
> there, the speed of the screen itself increases a corresponding amount.
> About the time fingerprints were developed, computer memory prices reached a
> point where the fingerprints from a relatively large database all could be
> loaded into memory. ...

> The evolution of the above-described screening techniques has now reached
> the point where the SMILES and fingerprints of all chemicals known in the
> world (roughly 10-15 million structures) can fit in the memory of a large
> workstation-class computer; it doesn't even take a "mainframe" computer,
> much less a supercomputer. An ordinary database of tens or hundreds of
> thousands of molecules can easily fit into the memory of today's run-of-the-
> mill workstations. Speed increases are correspondingly dramatic: an ordinary
> workstation can screen 100,000 to 1,000,000 structures per second using in-
> memory folded fingerprints.
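
To make the quoted idea concrete, here is a minimal sketch of an in-memory fingerprint screen. It is not Daylight's code: the function names, the 8-bit toy fingerprints, and the use of Python ints as bitsets are all assumptions for illustration. It assumes the usual screening rule that a target can only match if every bit set in the query is also set in the target, and that folding means OR-ing the two halves of a fingerprint together.

```python
# Toy sketch only: fingerprints are Python ints used as bitsets.
# screen() and fold() are hypothetical names, not a real library API.

def fold(fp, nbits):
    """Fold an nbits-wide fingerprint to nbits // 2 bits by OR-ing its halves."""
    half = nbits // 2
    low_mask = (1 << half) - 1
    return (fp >> half) | (fp & low_mask)

def screen(query_fp, fingerprints):
    """Return indices of fingerprints that have every query bit set.

    Survivors are only candidates; a real system would follow up
    with a full (slower) structure match on each one.
    """
    return [i for i, fp in enumerate(fingerprints)
            if (fp & query_fp) == query_fp]

# Made-up 8-bit fingerprints standing in for a database held in memory.
database = [0b10110010, 0b01100001, 0b11110011]
query = 0b10100010

candidates = screen(query, database)

# Folding keeps the subset property, so the same test works on folded data.
folded_db = [fold(fp, 8) for fp in database]
folded_candidates = screen(fold(query, 8), folded_db)
```

The whole screen is one AND and one compare per structure over data that is already in RAM, which is why no disk seek is needed per structure. Folding can let extra false candidates through but never drops a true one.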

You can see all of this in the literature of that era. In the late 1980s,
computer searches of ~15M structures took 5 minutes on cluster hardware. For
the previous 20 years, everything was on disk because there simply wasn't
enough RAM.

But chemical information has a doubling time of about _18 years_. 1990 was
around the transition point where it was finally possible to put everything
into memory. And a system designed around in-memory random access lookup
really was about 100,000 times faster than the software designs of the previous 20 years,
which either did random seeks on a hard disk or full linear scans.

The company that first did this made a _lot_ of money. The literature shortly
afterwards shows a bunch of other companies explaining that they would be out
with an in-memory version "within a year" while they tried to redesign
their systems to catch up.

