
Unexpected Interaction of Features (2018) - ColinWright
https://www.solipsys.co.uk/new/UnexpectedInteractionOfFeatures.html?sb17h
======
mannykannot
Technically speaking, this may be a case of undefined behavior. From my man
page:

    -u, --unique
            Unique keys.  Suppress all lines that have a key that is
            equal to an already processed one.  This option, similarly
            to -s, implies a stable sort.  If used with -c or -C, sort
            also checks that there are no lines with duplicate keys.
            ...
    -n, --numeric-sort, --sort=numeric
            Sort fields numerically by arithmetic value.  Fields are
            supposed to have optional blanks in the beginning, an
            optional minus sign, zero or more digits (including
            decimal point and possible thousand separators).

When you use -n without a key field specification, the key is the whole line,
and a line like "15 aerodynamically" does not meet the requirement for numeric
sorting; only its leading "15" does.
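A minimal sketch of that collapse, assuming GNU coreutils sort. The input lines are made up in the style of the article's file, and numeric_unique is a hypothetical helper name:

```shell
# Made-up sample lines; with -n only the leading numbers matter.
input='15 differentiation
1 a
15 aerodynamically
5 which'

# No -k option: each line's numeric key is the number at its start, so the
# two "15 ..." lines compare equal and -u keeps just one of them.
numeric_unique() {
    printf '%s\n' "$input" | sort -n -u
}

numeric_unique
```

Only one of the two "15" lines survives, because -u compares keys, not whole lines.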

This sort does give me the intended output:

    $ sort -k1,1n -k2 -u ~/tmp/sort.txt
    1 a
    5 which
    10 exotically
    15 aerodynamically
    15 differentiation
    20 electroencephalogram
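To check that the key-based version still removes genuine duplicates, here is a sketch with one exact repeat added. The input is hypothetical (it only mimics the shape of ~/tmp/sort.txt) and sort_by_number_then_word is an illustrative name:

```shell
# Hypothetical input shaped like ~/tmp/sort.txt, with one exact repeat.
input='15 differentiation
10 exotically
15 aerodynamically
1 a
10 exotically'

# Key 1 is field 1 alone, compared numerically; key 2 runs from field 2 to
# the end of the line, compared as text. -u treats two lines as duplicates
# only if both keys match, so just the repeated "10 exotically" is dropped.
sort_by_number_then_word() {
    printf '%s\n' "$input" | sort -k1,1n -k2 -u
}

sort_by_number_then_word
```

Both "15" lines are kept because their second keys differ, while the repeated line goes.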

Whether or not deduplication on _keys_ is ideal behavior, it is what is
specified here. What is not explicitly specified is what is considered to be
the key when you try to sort a non-numeric line numerically.
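One way to probe what the key is for a non-numeric line is to feed sort lines with no leading digits at all. This sketch assumes GNU sort, whose manual says an empty number is treated as 0; the words and the helper name are made up:

```shell
# Made-up input in which no line starts with a number.
input='pear
apple
fig'

# Assumption (GNU sort): a line with no leading number gets an empty numeric
# key, treated as 0, so all three keys compare equal and -u keeps one line.
numeric_unique_on_words() {
    printf '%s\n' "$input" | sort -n -u
}

numeric_unique_on_words
```

If every line's key is 0, all three lines compare equal and only one of them is printed.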

This is the sort of problem that you get with duck typing: it does what you
expect and intend, except in those corner cases where it doesn't.

------
trav4225
Ah, yes. The sort command definitely has a few gotchas like this. It's too bad
that we all seem to learn them the hard way. :)

Another one that used to bite me: locale-dependent sorting. These days, I
rarely use the sort command without LC_ALL=C.
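A small illustration of why LC_ALL=C makes the results predictable; the input and the helper name are made up. In the C locale lines are compared by byte value, so every uppercase ASCII letter sorts before every lowercase one, whereas a locale such as en_US.UTF-8 will typically interleave the cases instead:

```shell
# Made-up mixed-case input.
input='banana
Apple
apple
Banana'

# In the C locale, A-Z (0x41-0x5A) all sort before a-z (0x61-0x7A).
sort_bytewise() {
    printf '%s\n' "$input" | LC_ALL=C sort
}

sort_bytewise
```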

------
userbinator
_Excellent, but now I realise there are repeated lines, and I need to
de-duplicate. So I use sort -u to do that_

I would just pipe it to uniq, which is the solution the article ultimately
proposes, because that makes more sense to me. I have never used the '-u'
option of sort, nor would I have expected it to have such an option (sort is
for sorting, not for removing duplicate lines). Maybe that's because I'm more
used to the "UNIX philosophy" than the GNU one?
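A sketch of the pipe-to-uniq approach on made-up input (sort_then_uniq is a hypothetical name). By default uniq only drops a line that is identical to the one immediately before it, so an exact repeat goes but the two different "15" lines both survive:

```shell
# Made-up input: one exact repeat plus two lines sharing the number 15.
input='15 differentiation
5 which
15 aerodynamically
5 which'

# sort -n makes the identical "5 which" lines adjacent; uniq then removes
# only whole-line duplicates, never lines that merely share a numeric key.
sort_then_uniq() {
    printf '%s\n' "$input" | sort -n | uniq
}

sort_then_uniq
```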

~~~
microtherion
I agree. Often when I want uniquing, I want counts as well, so I have to go
through uniq -c anyway.
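A sketch of the counting pipeline on made-up input (count_lines is a hypothetical name). The padding of the count column may differ between uniq implementations, so anything that parses the output should strip leading blanks:

```shell
# Made-up input with repeated lines.
input='apple
banana
apple
cherry
apple
banana'

# The first sort makes identical lines adjacent, uniq -c prefixes each
# distinct line with its count, and sort -rn orders by count, highest first.
count_lines() {
    printf '%s\n' "$input" | sort | uniq -c | sort -rn
}

count_lines
```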

