Hacker News new | comments | show | ask | jobs | submit login

Write cache is useful of course but it doesn't make random reads any faster. If your dataset is too big to cache, disk latency becomes your limiting factor. The question then becomes how to most effectively deploy as many spindles as possible in your solution - SANs are one way, sharding/distribution across multiple nodes another.

No, you would never waste write cache on random reads, instead you'd buy more RAM for your server. Why would anyone ever buy a whole new chassis, CPU, drives, etc when all you need is more RAM? As for how to effectively deploy spindles the answer is generally external drive enclosures. Generally you can put a rack of disks and save a 2-4U for the server.

I said "where it's too big to cache" -- that means too big for RAM, aka >50GB or >200GB depending on what kind of server you have.

SANs are one way,

Agreed, if you include fast interconects like SAS and exclude the network[1] requirement of SANs.

sharding/distribution across multiple nodes another.

I disagree, for the sam reason that doing so with iSCSI over ethernet isn't: too much added latency.

Infiniband may help, but I have yet to try it empirically.

[1] Switching/routing, multiple initiators, distances longer than a few dozen meters.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact