
GNU datamash - jonbaer
http://www.gnu.org/software/datamash/
======
jph
This is great work, and runs fast. The documentation is well done and has
plenty of examples.

Here's an example of datamash and R with timing.

    
    
        time datamash sstdev 1 < data.txt
        288891.28552648
        0.76s user 0.01s system 99% cpu 0.775 total
    
        time R --vanilla --slave -e \
        "x <- read.table('data.txt', header=F); sd(x\$V1);"
        288891.3
        2.68s user 0.06s system 99% cpu 2.761 total
    

(The data.txt file has 1 million lines, each a random number from 1 to 1
million. Timings are from a 2014 13" Retina MacBook Pro.)
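
For anyone wondering what is actually being computed: both commands report the sample standard deviation (n - 1 in the denominator), which is why the two outputs agree. Here is a minimal Python sketch of that calculation. The function names and the tiny in-line data set are my own, made up for illustration, and splitting on tabs is an assumption that matches datamash's default field separator.

```python
import math

def sample_stdev(values):
    """Sample standard deviation, using the n - 1 denominator."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return math.sqrt(squared_deviations / (n - 1))

def first_column(lines):
    """Parse the first tab-separated field of every non-empty line as a float."""
    return [float(line.split("\t")[0]) for line in lines if line.strip()]

# Hypothetical stand-in for data.txt; the real file has 1 million lines.
lines = ["2\n", "4\n", "4\n", "4\n", "5\n", "5\n", "7\n", "9\n"]
print(sample_stdev(first_column(lines)))
```

To run it against the real file, pass `open("data.txt")` to `first_column` instead of the list.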

~~~
mbq
That is not fair; you count R's start-up time and all the guesswork which
read.table does and datamash doesn't have to do.

~~~
JamesMcMinn
Well, it is fair if all you want to do is mash some data together. Why
shouldn't startup times be taken into consideration?

~~~
mbq
Because with R you can and usually do multiple things with multiple data
sources within one session, which effectively dissolves the start-up and load
time. Even if reading one file and calculating mean or something with a single
script is the only thing you do, you can use Rscript which runs R without
loading heavy stuff like the methods package.

------
bagrow
Welp, I'm outta business...
[https://github.com/bagrow/datatools](https://github.com/bagrow/datatools)

~~~
thyrsus
You provide multivariate statistics, and this doesn't.

------
theophrastus
Apologies for the tangential question, but how does one find the public key
for (something like) datamash?

Downloaded: datamash-1.0.6.tar.gz and datamash-1.0.6.tar.gz.sig

Then did:

    
    
      gpg --verify datamash-1.0.6.tar.gz.sig datamash-1.0.6.tar.gz
    

Which results:

    
    
      gpg: Signature made Tue 29 Jul 2014 03:30:23 PM PDT using   RSA key ID 3657B901
      gpg: Can't check signature: public key not found
    

Where can one import that public key, and is it the public key for datamash or
gnu?

~~~
jonbaer
-> % gpg --search-keys 3657B901

(1) Assaf Gordon <agordon@wi.mit.edu> 4096 bit RSA key 2272BC86, created:
2014-07-09, expires: 2015-07-09

Initial announcement:
[http://lists.gnu.org/archive/html/info-gnu/2014-07/msg00007.html](http://lists.gnu.org/archive/html/info-gnu/2014-07/msg00007.html)
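
If you want to script the lookup, here is a rough Python sketch that pulls the key ID out of the "public key not found" message quoted earlier in the thread and builds the matching fetch command without running it. The regular expression only assumes the message wording shown in that output (other gpg versions may phrase it differently), and the function names are made up for this example.

```python
import re

# Matches the "key ID XXXXXXXX" wording in the gpg output quoted in this thread.
KEY_ID_PATTERN = re.compile(r"key ID ([0-9A-Fa-f]{8,16})")

def missing_key_id(gpg_output):
    """Return the key ID named in a gpg --verify message, or None if absent."""
    match = KEY_ID_PATTERN.search(gpg_output)
    return match.group(1) if match else None

def fetch_command(key_id):
    """Build (but do not execute) the gpg command that requests a key."""
    return ["gpg", "--recv-keys", key_id]

output = (
    "gpg: Signature made Tue 29 Jul 2014 03:30:23 PM PDT using   RSA key ID 3657B901\n"
    "gpg: Can't check signature: public key not found\n"
)
key_id = missing_key_id(output)
print(" ".join(fetch_command(key_id)))
```

Fetching a key from a keyserver does not by itself show who owns it, so it is still worth comparing the fingerprint with the one published in the release announcement.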

~~~
theophrastus
Thank you! (Moral of the story: don't start the search by visiting keyserver
sites like [https://pgp.mit.edu/](https://pgp.mit.edu/))

------
thristian
Don't forget the FreeBSD 'ministat' tool, which supports fewer operations but
will draw ASCII-art histograms:

[https://github.com/thorduri/ministat](https://github.com/thorduri/ministat)

------
dredmorbius
Sweet.

I've had a little awk routine that I wrote some years back that does much of
this -- it computes (or tabulates) n, sum, min, max, mean, median, standard
deviation, and percentiles of the input data series. For generating quick
stats, it's quite useful.
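
For comparison, here is roughly what such a summary routine looks like in Python. This is not the awk script itself, just a sketch of the same statistics. The percentile uses linear interpolation between the closest ranks, which is only one of several common conventions, and all names here are invented for the example.

```python
import math

def percentile(sorted_values, p):
    """p-th percentile by linear interpolation between the closest ranks."""
    if not sorted_values:
        raise ValueError("no data")
    rank = (len(sorted_values) - 1) * p / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    fraction = rank - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * fraction

def summarize(values):
    """Return n, sum, min, max, mean, median, sample sd, and 90th percentile."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("no data")
    total = sum(data)
    mean = total / n
    # Sample standard deviation is undefined for a single value.
    if n > 1:
        sd = math.sqrt(sum((v - mean) ** 2 for v in data) / (n - 1))
    else:
        sd = float("nan")
    return {
        "n": n,
        "sum": total,
        "min": data[0],
        "max": data[-1],
        "mean": mean,
        "median": percentile(data, 50),
        "sd": sd,
        "p90": percentile(data, 90),
    }

print(summarize([1, 2, 3, 4, 5]))
```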

I'm looking forward to datamash turning up in my Debian repos.

------
dufferzafar
The page mentions Windows, but there aren't any binaries available for it. Am
I missing something?

------
grepinsight
I LOVE the interface and the variety of operations, especially the grouping
functionality! Thank you for making my life much easier. I would love to see
more R operations such as "sample" or "rnorm" added in a later version.
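
To show what the grouping does, here is a small Python sketch of a group-and-sum over tab-separated rows, along the lines of `datamash -g 1 sum 2`. The function name and sample rows are invented for illustration. One difference to keep in mind: this sketch groups keys wherever they appear, while my understanding is that datamash expects the input to be sorted on the group column unless told to sort it first.

```python
from collections import defaultdict

def group_sum(rows, key_col, value_col):
    """Sum value_col for each distinct key_col value; columns are 1-based."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_col - 1]] += float(row[value_col - 1])
    return dict(totals)

# Hypothetical two-column input: a group label and a number.
rows = [line.split("\t") for line in ["a\t1", "b\t2", "a\t3"]]
print(group_sum(rows, 1, 2))
```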

------
pavanred
I used to choose awk/gawk, Python, or R for different file, numeric, textual,
and statistical operations. This is great; I would definitely use it.

------
_of
I love it. No more loading tables into R just to transpose them. Just run:

    cat table.txt | datamash transpose
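
For the curious, the operation itself is tiny. Here is a Python sketch of a transpose over tab-separated rows. The function name and sample table are made up, and rejecting rows with differing field counts is a choice made for this sketch.

```python
def transpose(rows):
    """Swap rows and columns; every row must have the same number of fields."""
    widths = {len(row) for row in rows}
    if len(widths) > 1:
        raise ValueError("rows have differing field counts")
    return [list(column) for column in zip(*rows)]

# Hypothetical stand-in for table.txt.
table = "a\tb\tc\n1\t2\t3\n"
rows = [line.split("\t") for line in table.splitlines()]
for row in transpose(rows):
    print("\t".join(row))
```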

------
voltagex_
[http://www.gnu.org/software/datamash/manual/datamash.html](http://www.gnu.org/software/datamash/manual/datamash.html)

This looks pretty cool. Anyone used it in "real life"?

