

Results of May's HN unix command survey (spoiler: lots of Rails) - hackerpants
http://jvns.ca/projects/unix-command-survey/graph.html

======
robrenaud
The graphs are nice, but I think raw histograms of the most used commands and
a table of the highest correlations for each command would also be useful and
more digestible.

If you still have the per user data around, I'd also love to see the most
often used commands by the community that I never or very rarely use. Maybe
I'll find a gem of a unix command I didn't know about yet.
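
A minimal sketch of that "popular with others, unfamiliar to me" query, assuming the per-user data is available as one dict of command counts per respondent. The data layout and the name `unfamiliar_popular` are hypothetical, not taken from the survey.

```python
from collections import Counter

def unfamiliar_popular(my_counts, others, max_mine=0, top=5):
    """Return (command, number_of_users) pairs for commands many other
    users run but that I run at most `max_mine` times."""
    popularity = Counter()
    for counts in others:
        # Count each command once per user, not once per invocation.
        popularity.update(counts.keys())
    candidates = [
        (cmd, users)
        for cmd, users in popularity.most_common()
        if my_counts.get(cmd, 0) <= max_mine
    ]
    return candidates[:top]

# Made-up example data, for illustration only.
mine = {"git": 50, "ls": 20}
others = [{"git": 10, "awk": 2}, {"awk": 1, "ls": 5}, {"awk": 3, "git": 1}]
suggestions = unfamiliar_popular(mine, others)
```

Counting users rather than invocations keeps one heavy user from dominating the ranking.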

~~~
hackerpants
I've spent quite some time looking at those histograms, but I haven't found a
good way to display them usefully yet.

The most popular commands are 'git', 'ls', and 'cd', and the table of most
used commands is too long to digest easily. Any suggestions would be
appreciated, though!

~~~
IanChiles
It'd be awesome to see a dump of the data so that we can examine it on our own
- many people may think of ways to use it that you haven't, or draw different
conclusions from it. :D

~~~
hackerpants
Yeah! This is something that I'm interested in, but there are a bunch of
privacy issues with a raw data dump (references to specific files/URLs and
possibly passwords), and I haven't gotten around to making sure it's safe yet.

(note: I haven't actually seen any passwords, but that doesn't mean there
aren't any)

~~~
IanChiles
Oh, that totally slipped my mind. Maybe run something to recognize URLs
(replace with $URL, or some kind of placeholder), and then try to obfuscate
filenames and the like in a similar way? By normalizing the data like this,
you could get much better results with regard to command line switches and
the like.
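
A rough sketch of that kind of normalization, assuming each history entry is a single line of text. The two regular expressions and the `$PATH` placeholder are illustrative guesses, not a vetted scrubber; anything that looks like a password would still need separate handling.

```python
import re

# Hypothetical patterns; a real privacy scrub would need careful review.
URL_RE = re.compile(r"\b(?:https?|ftp)://\S+")
# Whitespace-delimited tokens that start with /, ~/, ./ or ../
# Bare relative names such as foo/bar.txt are deliberately left alone.
PATH_RE = re.compile(r"(?<!\S)(?:~|\.{1,2})?/\S+")

def normalize(line: str) -> str:
    """Replace URLs and obvious file paths with placeholders."""
    line = URL_RE.sub("$URL", line)   # URLs first, since they contain slashes
    line = PATH_RE.sub("$PATH", line)
    return line

example = normalize("curl http://example.com/a?b=1 -o /tmp/out.html")
```

With the placeholders in place, `curl $URL -o $PATH` counts as one pattern no matter which URL or file was involved, so the switches stand out.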

------
hackerpants
Someone pointed out to me the 'ls' typo constellation =D

[http://jvns.ca/projects/unix-command-survey/graph.html?cente...](http://jvns.ca/projects/unix-command-survey/graph.html?center=sl&degree=2)

~~~
burgeralarm
Similar to the mysql-semicolon cluster:
[http://jvns.ca/projects/unix-command-survey/graph.html?cente...](http://jvns.ca/projects/unix-command-survey/graph.html?center=;)

------
stared
In general, a very nice thing and thanks for sharing! But...

1\. Are you sharing the raw data? (It would be great!)

2\. It would be useful to see frequencies (e.g. as sizes of nodes).

3\. Why can't we see 'git', 'ls', 'cd'?

4\. At first glance, things like "pytho", "sourc", "worko" look like a
glitch. Also, when there is a cluster of commands starting with the same
prefix, the subgraph is hardly informative. How about scaling the text
(instead of truncating it)?

5\. When it comes to a measure of co-occurrences, a nicer quantity than
correlation is the following -
[http://stats.stackexchange.com/questions/6047/does-this-quan...](http://stats.stackexchange.com/questions/6047/does-this-quantity-related-to-independence-have-a-name)
with a direct interpretation of "how does the observed coincidence rate
correspond to the expected one for independent variables".

I have used it a few times, after testing other measures of co-occurrence
(including conditional probability) and being dissatisfied with the results,
especially with measures that favour edges for big or small nodes. Examples
(with their recipes) below:

My StackExchange visualization:
[https://github.com/stared/tag-graph-map-of-stackexchange/wik...](https://github.com/stared/tag-graph-map-of-stackexchange/wiki)

And my visualization of themes in books:
[http://stared.github.io/wizualizacja-wolnych-lektur/polish_books_themes.html](http://stared.github.io/wizualizacja-wolnych-lektur/polish_books_themes.html)
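
To make point 5 concrete, here is a small sketch of an observed-over-expected ratio. It assumes the quantity meant is P(A and B) / (P(A) * P(B)), which is how the description above reads, and that the data is one set of commands per user. The function name and input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations

def observed_over_expected(users):
    """users: list of sets of commands, one set per user.

    Returns {(a, b): ratio} for every pair seen together at least once,
    where ratio = P(a and b) / (P(a) * P(b)).
    """
    n = len(users)
    single = Counter()
    pair = Counter()
    for cmds in users:
        single.update(cmds)
        # Sort so each pair has one canonical (a, b) ordering.
        pair.update(combinations(sorted(cmds), 2))
    return {
        (a, b): (count / n) / ((single[a] / n) * (single[b] / n))
        for (a, b), count in pair.items()
    }

# Made-up data: four users, for illustration only.
ratios = observed_over_expected(
    [{"git", "rails"}, {"git", "rails"}, {"git"}, {"ls"}]
)
```

A ratio of 1 means the pair co-occurs exactly as often as independence would predict. Values above 1 mean more often, values below 1 less often.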

~~~
hackerpants
I've answered 1) and 3) below.

Thanks for the link in 5)! That's really useful.

I really like the two visualizations you've made -- I'll definitely look into
incorporating some of those ideas when I have time.

------
ColinDabritz
This is pretty nifty!

Where is 'git'? I can't seem to find it using the search.

~~~
hackerpants
Good question! The issue is that _everyone_ uses git, so it's not correlated
with any other commands.
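
A small sketch of why that happens, assuming the correlation is computed over per-user "uses this command" indicators. That is an assumption for illustration; the thread does not spell out the exact method. If every user has a 1 for git, the column has zero variance and Pearson correlation is undefined.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation, or None when either input has zero variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # a constant column cannot be correlated with anything
    return cov / sqrt(vx * vy)

# Made-up indicators for four users.
git = [1, 1, 1, 1]    # everyone uses git
rails = [1, 0, 1, 0]
git_vs_rails = pearson(git, rails)
```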

