Some of my favorite tools it includes are csvsql and csvlook.
sort -t$'\t' -k2,2 -k1,1 -k3,3 contacts.tsv
sort -t$'\t' -k1,1 -k2,2 -k3,3 contacts.tsv
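One gotcha worth spelling out: `-k2` on its own means "from field 2 to the end of the line", so any later keys are effectively dead; `-k2,2` bounds the key to field 2 alone. A small worked example with made-up `contacts.tsv` data (file name and columns are illustrative):

```shell
# Hypothetical contacts.tsv: last name, first name, city
printf 'Smith\tAlice\tBoston\nDoe\tBob\tAustin\nSmith\tBob\tAustin\n' > contacts.tsv

# Bounded keys: sort by first name, then last name, then city.
# -t picks the tab as the field separator instead of the default
# blank-to-nonblank transition.
sort -t"$(printf '\t')" -k2,2 -k1,1 -k3,3 contacts.tsv
```

Here Alice sorts first, and the two Bobs are tie-broken by last name (Doe before Smith); with unbounded `-k2` the first key would compare "Alice\tBoston" against "Bob\tAustin" as one string and the later keys would never be consulted.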
To avoid the slowdown of disk I/O, one workaround is to point sort's temporary directory at an mfs or tmpfs mount, something like:
mount -t tmpfs tmpfs /dir
TMPDIR=/dir sort -t$'\t' -k2,2 -k1,1 -k3,3 contacts.tsv
TMPDIR=/dir sort -t$'\t' -k1,1 -k2,2 -k3,3 contacts.tsv
Instead, your best bet in that case is to give sort as much physical memory as you can spare:
sort -S 95% -k1,1 huge.tsv
Note: in the special case that your dataset is slightly larger than physical memory, splitting it up in advance such that one of the `sort -m` input files lives on a tmpfs should indeed be faster.
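A minimal sketch of that split-then-merge idea, using throwaway numeric data (file names are illustrative; in the real scenario one of the pre-sorted parts would live under a tmpfs mount):

```shell
# Throwaway dataset: 10 shuffled numbers.
seq 10 | shuf > all.txt

# Sort the two halves independently; in the slightly-bigger-than-RAM
# case, each half fits in memory on its own.
head -n5 all.txt | sort -n > part1.sorted
tail -n5 all.txt | sort -n > part2.sorted

# sort -m merges already-sorted inputs in a single streaming pass,
# without re-sorting.
sort -n -m part1.sorted part2.sorted > merged.txt
```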
Other things to check out if you need Very Fast Large Sorts:
- Use `sort --parallel=N` to control how many cores are used. GNU sort already defaults to the number of available processors (capped at 8), but you can raise or pin N explicitly.
- Use `sort --batch-size=NMERGE` to increase the number of files merged at once. Otherwise you may be doing more mergesort stages than are necessary.
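A hedged sketch combining the flags above on throwaway data (the specific values are illustrative, not tuned):

```shell
# Generate a shuffled throwaway dataset.
seq 10000 | shuf > big.txt

# -S caps the in-memory buffer (forcing temp files on real data),
# --parallel sets the worker count, and --batch-size sets how many
# temporary files are merged per pass.
sort -n -S 64M --parallel=4 --batch-size=32 big.txt > sorted.txt
```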
> Miller is like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON.
Use to8 to convert from UTF-32/UTF-16 (LE or BE) etc. to UTF-8 first, then sort with this tool.
I've used iconv for years and it's never let me down.
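For the iconv route, a minimal sketch (file names and the UTF-16LE source encoding are assumptions for the example; sort expects a byte-oriented encoding and will mangle raw UTF-16 input):

```shell
# Fabricate a small UTF-16LE file to stand in for the real input.
printf 'banana\napple\ncherry\n' | iconv -f UTF-8 -t UTF-16LE > data.utf16

# Convert to UTF-8 first, then sort the stream.
iconv -f UTF-16LE -t UTF-8 data.utf16 | sort > sorted.txt
```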