
How to Shuffle and Sample on the Command-Line - jpalardy
http://blog.jpalardy.com/posts/how-to-shuffle-and-sample-on-the-command-line
======
ahh
Interesting. I did the same investigation myself a few years back, but was
frustrated by the lack of the -r flag for shuf(1). It seems that's been added
at some point recently (though many of my systems do not have it--GNU
coreutils percolates slowly through older Debian/Ubuntu versions. :))

Good to know things are still getting better in coreutils!

~~~
keithpeter
Oh, nice, on Fedora 23 beta, I can simulate die rolls

    
    
         shuf -r -z -n 100 -e 1 2 3 4 5 6;echo -e "\n"
    

And rolls of a non-transitive (Grime) die

    
    
         shuf -r -z -n 100 -e 3 3 3 3 3 6;echo -e "\n"
    

Which is timely. I sometimes forget how flexible the terminal prompt is...
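
A quick way to tally a batch of those rolls (dropping -z here so each roll
lands on its own line; sort | uniq -c is the usual counting idiom):

    
    
         shuf -r -n 100 -e 3 3 3 3 3 6 | sort | uniq -c
    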

~~~
zatkin
It's pseudo-random though, right?

~~~
keithpeter
Oh, I'd imagine so. Good enough for illustrative purposes and for catching any
gross errors in my arithmetic when analysing the games. Not good enough for
anything 'real'.

------
IgorPartola
I recently discovered shuf since I needed to shuffle a fairly large number of
URLs in a file to allow multiple processes to work through them in parallel
(yes, I could have done this with a queue and a producer/consumer, but this
was a one-time deal, so it was faster to just throw a bit more hardware at
it). What amazed me is that 'cat urls.txt | shuf > new-urls.txt' took just
about a second to complete even though the original file was about 1GB. How
does it work so incredibly fast?

------
pixelbeat
This uses a few utils and techniques to deal 5 random cards:

[https://twitter.com/pixelbeat_/status/587703133717057537](https://twitter.com/pixelbeat_/status/587703133717057537)

    
    
        paste -d '' <(printf '%s\n' $(seq 2 9) T J Q K A | sed 'p;p;p') \
        <(yes $'H\nD\nS\nC' | head -n52) |
        shuf -n5

------
latkin
On Windows, Get-Random handles the basic cases nicely

    
    
        > 1..100 | Get-Random
        72
        > 1..100 | Get-Random -count 3
        16
        96
        56
    

but doesn't have something like -r to resample, or a nice way to simply
shuffle the whole collection (the workaround is to pass -count as large as or
larger than the collection).

------
vortico
Nice! I've been using

    
    
        sort -R | head -n100
    

but obviously this requires the entire file to be shuffled before printing the
first 100 lines.

~~~
hnov
Since the -R option isn't available in OS X's sort, you might do something like

    
    
      awk "BEGIN { srand($RANDOM) } { print int(rand() * 1000000), \$0 }" | sort -n | cut -d' ' -f2-
    

to shuffle an input

~~~
bla2
[http://bost.ocks.org/mike/shuffle/compare.html](http://bost.ocks.org/mike/shuffle/compare.html)

~~~
epistasis
Note that hnov's awk command is the equivalent of "sort (random order)" at
that link and shows good randomness properties in the plot. However, that
link shows "sort (random comparator)" by default, which looks terrible at
randomly sorting lists. hnov's awk script should be suitable for most needs,
though I'd tweak it a bit:

    
    
         awk 'BEGIN { srand() } { print rand(), $0 }' | sort -g | cut -d' ' -f2-
    

which is shorter and allows more than 1,000,000 distinct random values,
namely ~52 bits in awk implementations backed by 64-bit doubles.

------
wodenokoto
How does this handle files that are gigs in size? It looks to me like you load
the entire file into memory and pass it along, shuffled.

~~~
sgk284
See: reservoir sampling [1]

[1]
[https://en.wikipedia.org/wiki/Reservoir_sampling](https://en.wikipedia.org/wiki/Reservoir_sampling)
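
The idea: stream the input once and keep a uniform random sample of k lines
in O(k) memory, so the file size doesn't matter. A minimal sketch in awk
(Algorithm R; k=5 and the seq input are just for illustration):

    
    
        # Algorithm R: keep the first k lines, then replace a random
        # reservoir slot with probability k/NR for each later line
        seq 1 1000000 |
        awk -v k=5 'BEGIN { srand() }
          NR <= k { r[NR] = $0; next }
          { j = int(rand() * NR) + 1; if (j <= k) r[j] = $0 }
          END { for (i = 1; i <= k; i++) print r[i] }'
    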

------
falsedan

        man shuf

~~~
lcswi
It's not the documentation, it's knowing which tools exist in the first place.

~~~
falsedan

        man coreutils

?

ooh _topical_ :

    
    
        ls /usr/bin | shuf -n1 | xargs man

