Ask HN: What do you use to clean text data for ML/DL? - k4ch0w
======
ktpsns
I guess you mean converting CSV files to another format so that you can load
them in some ML code?

Vim. It can easily handle large multi-gigabyte text files.

For batching: head, tail, awk, grep -- the good old command line gems. They
are still hard to beat for speed.
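
If you would rather script it, here is a rough Python sketch of the same
idea: stream the file one line at a time, awk/grep style, so it never has to
fit in memory. The name `clean_rows`, the column count and the sample data
are made up for illustration, and the "wrong field count means bad row" rule
is only an assumption about what "clean" means for your data.

```python
import csv
import io


def clean_rows(lines, expected_cols):
    """Yield whitespace-stripped rows, skipping blank or malformed ones."""
    for row in csv.reader(lines):
        if len(row) != expected_cols:
            # Blank lines parse as [] and truncated rows have too few
            # fields; both are dropped here.
            continue
        yield [field.strip() for field in row]


# Hypothetical sample standing in for a large file opened with open(...).
sample = io.StringIO("id,text\n1, hello \n\nbroken\n2,world\n")
cleaned = list(clean_rows(sample, expected_cols=2))
print(cleaned)
```

For a real file, pass the open file object instead of the `StringIO` and
write each row out as it is yielded rather than collecting a list.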

If you mean "clean" in terms of some standardization (thinking of natural
language recognition), I can hardly imagine there is a single tool that
covers all use cases...

