In the machine learning world, some of the industrial-workhorse algorithms (e.g., all the common GBM libraries) require you to have your dataset in memory, and will walk over it many times.
You may be able to perform some gymnastics and let the OS swap your terabyte+ dataset around inside your 64GB of RAM, but then training takes forever as the algorithm constantly thrashes swap.
tl;dr - in the machine learning context, training a model on a terabyte dataset may very well require that much RAM, plus some overhead, to be available.
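A quick back-of-envelope sketch of why this happens (the row/column counts below are hypothetical, just to show the arithmetic for a dense numeric dataset):

```python
def dataset_footprint_gib(rows: int, cols: int, bytes_per_value: int = 8) -> float:
    """Estimate the in-memory size of a dense numeric dataset in GiB.

    Assumes every value is held as a float64 (8 bytes); real libraries
    may use float32 or histogram-binned representations that shrink this.
    """
    return rows * cols * bytes_per_value / 2**30

# Hypothetical example: 1 billion rows x 125 float64 columns
# is ~1 TB of raw values before any library overhead.
print(f"{dataset_footprint_gib(1_000_000_000, 125):.0f} GiB")
```

And that is before the training library's own copies, gradient buffers, and so on, which is where the "plus some overhead" comes from.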
I just ran `time cp /dev/nvme0n1 /dev/null` on the 1TB 970 Pro. The result:
What will cause you serious and unavoidable trouble is if you cannot structure things to have any spatial locality. If you only want one 64-bit value out of the 4kB block you've fetched, and you'll come back later another 511 times to fetch the other 64-bit values in that block, then your performance deficit relative to DRAM is greatly amplified (because the DRAM equivalent would be 64B cache lines fetched 8x each, instead of 4kB blocks fetched 512x each).
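The read-amplification arithmetic in that worst case can be sketched as follows (using the usual 64B cache-line and 4kB NVMe-page granularities):

```python
# Worst-case traffic when reading every 8-byte value in one 4 kB block,
# one value per fetch, with no reuse of what each fetch brings in.
VALUE = 8          # bytes per value you actually want
CACHE_LINE = 64    # DRAM fetch granularity
NVME_BLOCK = 4096  # SSD fetch granularity

fetches = NVME_BLOCK // VALUE            # 512 separate reads

dram_traffic = fetches * CACHE_LINE      # each DRAM fetch moves 64 B
nvme_traffic = fetches * NVME_BLOCK      # each SSD fetch moves 4 kB

print(dram_traffic // NVME_BLOCK)        # 8   -> DRAM re-reads the data 8x
print(nvme_traffic // NVME_BLOCK)        # 512 -> NVMe re-reads it 512x
```

So poor locality costs you roughly 8x on DRAM but 512x on the SSD, on top of the SSD already being slower per fetch.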
Having 4/16TiB servers, or "memory DB servers" as I thought of them, solved a lot of problems outright. You still need huge I/O, but less of it depending on your workload.
> The kernel and your other processes need some space to malloc, and you don't want to page in/out.
Some space, like "most of 20-50 gigabytes"?
You do want to account for how the working space used by joins fits into memory, but 2-5% of a terabyte is an extremely generous allocation for everything else on the box.
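For the record, here is where the "20-50 gigabytes" figure comes from (trivial arithmetic, but it keeps the percentages honest):

```python
# 2-5% of a 1 TB box, in round decimal GB.
total_gb = 1000  # 1 TB of RAM

headroom = {pct: total_gb * pct // 100 for pct in (2, 5)}
print(headroom)  # {2: 20, 5: 50} -> the "20-50 gigabytes" range
```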