
This is nice. I use `column` for pretty-printing CSV/TSV, but this fixes two tiny gaps: one in `sort` (skipping header lines) and one in `jq` (parsing CSV input; `jq` supports `@csv` for output conversion but not input).

  $ cat example.csv 
  color,shape,flag,index
  yellow,triangle,1,11
  red,square,1,15
  red,circle,1,16
  red,square,0,48
  purple,triangle,0,51
  red,square,0,77
  
  # pretty printing
  $ column -ts',' example.csv 
  color   shape     flag  index
  yellow  triangle  1     11
  red     square    1     15
  red     circle    1     16
  red     square    0     48
  purple  triangle  0     51
  red     square    0     77
  
  # sorting with skipped headers is a mess. 
  $ (head -n 1 example.csv && tail -n +2 example.csv | sort -r -k4 -t',') | column -ts','
  color   shape     flag  index
  red     square    0     77
  purple  triangle  0     51
  red     square    0     48
  red     circle    1     16
  red     square    1     15
  yellow  triangle  1     11



> `jq` supports `@csv` for output conversion but not input

Actually, `jq` can cope with trivial CSV input like your example: `jq -R 'split(",")'` will turn each CSV line into an array, and slurping those gives an array of arrays. To then sort it in reverse order by the 4th column (index 3, since `jq` arrays are zero-based) and retain the header, the following fell out of my fingers (I'm beyond certain that a more skilled `jq` user than me could improve it):

     jq -R 'split(",")' example.csv | jq -sr '[[.[0]],.[1:]|sort_by(.[3])|reverse|.[]]|.[]|@csv'

    "color","shape","flag","index"
    "red","square","0","77"
    "purple","triangle","0","51"
    "red","square","0","48"
    "red","circle","1","16"
    "red","square","1","15"
    "yellow","triangle","1","11"

NB. There is also an entry in the `jq` cookbook for parsing CSVs into arrays of objects (and keeping numbers as numbers, dealing with nulls, etc.): https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-f...
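
For comparison, here is a rough Python sketch of the same idea (rows as objects, numeric-looking fields kept as numbers). It is not the cookbook's `jq` code; the `coerce` and `csv_to_objects` names are made up for illustration, and treating an empty field as `None` is my assumption.

```python
import csv
import io


def coerce(value):
    """Turn a CSV field into None, int, or float when it looks like one."""
    if value is None or value == "":
        return None  # assumption: an empty or missing field means "null"
    for convert in (int, float):
        try:
            return convert(value)
        except (ValueError, TypeError):
            pass
    return value  # leave everything else as a string


def csv_to_objects(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return [{key: coerce(val) for key, val in row.items()} for row in reader]


sample = "color,shape,flag,index\nyellow,triangle,1,11\nred,square,0,48\n"
rows = csv_to_objects(sample)
print(rows[0])
```

With the rows as dicts, sorting by `index` becomes a numeric sort rather than a text sort.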


Another option (based on https://unix.stackexchange.com/q/11856/109046):

    (sed -u '1q' ; sort -r -k4 -t',') <example.csv | column -ts','


> sorting with skipped headers is a mess

I like these command line tools, but I think they can cripple someone actually learning a programming language. For example, here is a short program that does your last example:

https://go.dev/play/p/9bASZ97lLWv


It's a matter of perspective.

I like programming languages, but I think they can cripple someone actually learning Unix!

At the end of the day, you should just use whatever tools make you the most productive most quickly.


The whole point of UNIX userland is to not have to write a custom program for every simple case that just needs recombining some existing basic programs in a pipeline...


> simple case

That's just it though: the last example is not a simple case, which is why it is awkward by the commenter's own admission. Command line tools are fine, but you need to know when to set the hammer down and pick up the chainsaw.


> the last example is not a simple case

As far as shell scripting goes, this is hardly anything to write home about. Looks simple enough to me.

It just retains the header by printing the header first as is, and then sorting the lines after the header. It's immediately obvious how to do it to anybody who knows about head and tail.

And with Miller it's even simpler than that, still on the command line...
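
To make the "header first, then sort the rest" idea concrete for anyone who does not read shell, here is a small Python sketch of the same steps. The `sort_keep_header` name is hypothetical, fields are split on a plain comma, and keys are compared as text, roughly like `sort -k4 -t','` on this file, where field 4 is the last one.

```python
def sort_keep_header(lines, column, reverse=False):
    """Return the header line followed by the remaining lines, sorted.

    `column` is a 1-based field number, like `sort -k`. Fields are split on
    a plain comma and compared as text. Assumes `lines` is not empty.
    """
    header, *body = lines
    body.sort(key=lambda line: line.split(",")[column - 1], reverse=reverse)
    return [header, *body]


lines = [
    "color,shape,flag,index",
    "yellow,triangle,1,11",
    "red,square,1,15",
    "red,circle,1,16",
    "red,square,0,48",
    "purple,triangle,0,51",
    "red,square,0,77",
]
result = sort_keep_header(lines, column=4, reverse=True)
print("\n".join(result))
```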


To me the last example is still simple. When I encounter this in the wild, I don't really care about preserving the header.

  tail -n +2 example.csv | sort -r -k4 -t','

Or more often, I just do this and ignore the header:

  sort -r -k4 -t',' example.csv

Keeping the header feels awkward, but using `sort` to reverse sort by a specific column is still quicker to type and execute (for me) than writing a program.
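
One caveat about the `sort -k4` calls in this thread: without `-n`, `sort` compares the key as text. That happens to give the right order for the sample file because every index has two digits. A quick Python sketch of the difference, using made-up values (9 and 100 are not in the example file):

```python
# Hypothetical index values with different digit counts.
values = ["11", "15", "9", "100"]

# Text comparison, which is what a sort without a numeric flag does.
as_text = sorted(values, reverse=True)

# Numeric comparison, the equivalent of asking sort for a numeric key.
as_numbers = sorted(values, key=int, reverse=True)

print(as_text)
print(as_numbers)
```

The text ordering puts "9" first and "100" last, which is probably not what you want for an index column.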


I thought the Go code looked way too complex and Python would be simpler. Yes and no.

  import csv
  
  filename = 'example.csv'
  sort_by = 'index'
  reverse = True
  
  with open(filename, newline='') as f:
      lines = list(csv.DictReader(f))
  
  # Convert the sort column to int so rows compare numerically, not as text.
  for line in lines:
      line[sort_by] = int(line[sort_by])
  lines.sort(key=lambda line: line[sort_by], reverse=reverse)
  print(','.join(lines[0].keys()))
  for line in lines:
      print(','.join(str(v) for v in line.values()))


Perhaps a `DictWriter` would simplify things:

    import csv
    import sys
    
    filename = "example.csv"
    sort_by = "index"
    reverse = True
    
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)
        writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(sorted(reader, key=lambda row: int(row[sort_by]), reverse=reverse))


I thought about that but 1) it seemed like cheating to write to standard out, 2) you're assuming that the column to sort by is an integer whereas I broke that code up a little bit.

But yours has the advantage of being able to support more complex CSVs.
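
A small sketch of what "more complex CSVs" can mean in practice: a quoted field containing a comma. The row below is made up for illustration; splitting on `,` breaks it apart, while the `csv` module keeps it as one field.

```python
import csv
import io

# Hypothetical row: the second field contains a comma inside quotes.
line = 'red,"square, rounded",0,48'

# Naive splitting, as in a ','.join / split(',') round trip.
naive = line.split(",")

# Parsing with the csv module, which understands the quoting.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```

The same applies on output: `csv.writer` and `csv.DictWriter` add the quotes back when a field needs them, which a plain `','.join(...)` does not.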



