Hacker News

Unix command-line tools are seriously undervalued for processing data, even big data.

Need parallelization? Try xargs -P on a multi-core machine.
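A minimal sketch of that: `-n 1` hands each job one input item, and `-P 4` keeps up to four jobs running at once (`echo` is a stand-in for whatever per-item command you actually need):

```shell
# Run up to 4 jobs in parallel, one input item per job.
# "echo" stands in for a real per-item command (gzip, curl, ...).
seq 1 8 | xargs -n 1 -P 4 echo
```

Output order is not guaranteed, since the jobs finish independently.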



Don't forget cut, sort and uniq.

If you have a big csv file, you could list the unique values in the fourth field:

cut -d , -f 4 [file] | sort | uniq > values.txt

You could then grep for the records containing a particular value of interest.
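For instance (a sketch with hypothetical data; the `$` anchors the match to the last field, which only works when the value of interest is in the final column):

```shell
# Hypothetical 4-column csv; pull records whose fourth field is "widget".
printf 'a,1,x,widget\nb,2,y,gadget\nc,3,z,widget\n' > sample.csv
grep ',widget$' sample.csv
```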

If a file is too big for an editor, you can still see the record structure with head.
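head reads only the start of the file, so this is cheap no matter how large the file is (here a large file is faked with seq for illustration):

```shell
# Fake a large file, then peek at its first few records.
seq 1 100000 > big.csv
head -n 3 big.csv
```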



