This is nice. I use `column` for pretty printing CSV/TSV but it fixes two tiny g...

darrenf · on March 16, 2023

> `jq` supports `@csv` for output conversion but not input

Actually, `jq` can cope with trivial CSV input like your example: `jq -R 'split(",")'` will turn a CSV into an array of arrays. To then sort it in reverse order by the 4th column (`index`) and retain the header, the following fell out of my fingers (I'm beyond certain that a more skilled `jq` user than me could improve it):

    jq -R 'split(",")' example.csv | jq -sr '[[.[0]],.[1:]|sort_by(.[3])|reverse|.[]]|.[]|@csv'

    "color","shape","flag","index"
    "red","square","0","77"
    "purple","triangle","0","51"
    "red","square","0","48"
    "red","circle","1","16"
    "red","square","1","15"
    "yellow","triangle","1","11"

NB. there is also an entry in the `jq` cookbook for parsing CSVs into arrays of objects (and keeping numbers as numbers, dealing with nulls, etc) https://github.com/stedolan/jq/wiki/Cookbook#convert-a-csv-f...
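
To make "trivial" concrete: splitting on commas only holds up while no field contains a quoted comma. Here is a minimal Python sketch comparing a naive split (the same idea as `split(",")`) with the standard `csv` module. The sample line is made up for illustration and is not taken from example.csv.

```python
import csv

# Hypothetical line whose second field is quoted and contains a comma.
line = 'red,"square, large",0,77'

# Naive split on every comma: the quoted field is broken in two.
naive = line.split(",")

# csv.reader honours the quoting and keeps the field in one piece.
parsed = next(csv.reader([line]))

print(naive)   # 5 pieces, quotes still attached
print(parsed)  # 4 fields, as intended
```

For input like the example in the article, where no field is quoted, both approaches give the same four fields per line.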

asicsp · on March 16, 2023

Another option (based on https://unix.stackexchange.com/q/11856/109046):

    (sed -u '1q' ; sort -r -k4 -t',') <example.csv | column -ts','

2h · on March 16, 2023

> sorting with skipped headers is a mess

I like these command line tools, but I think they can cripple someone actually learning a programming language. For example, here is a short program that does your last example:

https://go.dev/play/p/9bASZ97lLWv

cosmojg · on March 16, 2023

It's a matter of perspective.

I like programming languages, but I think they can cripple someone actually learning Unix!

At the end of the day, you should just use whatever tools make you the most productive most quickly.

coldtea · on March 16, 2023

The whole point of UNIX userland is to not have to write a custom program for every simple case that just needs recombining some existing basic programs in a pipeline...

2h · on March 16, 2023

> simple case

That's just it though: the last example is not a simple case, which is why it is awkward by the commenter's own admission. Command line tools are fine, but you need to know when to set the hammer down and pick up the chainsaw.

coldtea · on March 16, 2023

> the last example is not a simple case

As far as shell scripting goes, this is hardly anything to write home about. Looks simple enough to me.

It just retains the header by printing the header first as is, and then sorting the lines after the header. It's immediately obvious how to do it to anybody who knows about head and tail.

And with Miller it's even simpler than that, still on the command line...

carb · on March 16, 2023

To me the last example is still simple. When I encounter this in the wild, I don't really care about preserving the header.

  tail -n +2 example.csv | sort -r -k4 -t','

Or more often, I just do this and ignore the header

  sort -r -k4 -t',' example.csv

Keeping the header feels awkward, but using `sort` to reverse sort by a specific column is still quicker to type and execute (for me) than writing a program.

gabrielsroka · on March 16, 2023

I thought the Go code looked way too complex and Python would be simpler. Yes and no.

  import csv
  
  filename = 'example.csv'
  sort_by = 'index'
  reverse = True
  
  with open(filename, newline='') as f:
      lines = list(csv.DictReader(f))
  
  for line in lines:
      line['index'] = int(line['index'])
  lines.sort(key=lambda line: line[sort_by], reverse=reverse)
  print(','.join(lines[0].keys()))
  for line in lines:
      print(','.join(str(v) for v in line.values()))

cmdlineluser · on March 16, 2023

Perhaps a `DictWriter` would simplify things:

    import csv
    import sys
    
    filename = "example.csv"
    sort_by = "index"
    reverse = True
    
    with open(filename, newline="") as f:
        reader = csv.DictReader(f)
        writer = csv.DictWriter(sys.stdout, fieldnames=reader.fieldnames)
        writer.writeheader()
        writer.writerows(sorted(reader, key=lambda row: int(row[sort_by]), reverse=reverse))

gabrielsroka · on March 16, 2023

I thought about that but 1) it seemed like cheating to write to standard out, 2) you're assuming that the column to sort by is an integer whereas I broke that code up a little bit.

But yours has the advantage of being able to support more complex CSVs.
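
One way to get both properties (no integer assumption, proper CSV quoting on output) is to try a numeric conversion in the sort key and fall back to the raw text. This is only a sketch: `sort_key` and `sort_csv` are hypothetical names, the sample data is a few rows in the shape of the example above, and it assumes every row has a value in the sort column. It builds a string instead of writing to standard out.

```python
import csv
import io

def sort_key(value):
    """Order numeric-looking values numerically, and any other text after them."""
    try:
        return (0, float(value), "")
    except ValueError:
        return (1, 0.0, value)

def sort_csv(text, sort_by, reverse=False):
    """Return CSV text with the header first and the rows sorted by one column."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(
        sorted(reader, key=lambda row: sort_key(row[sort_by]), reverse=reverse)
    )
    return out.getvalue()

sample = (
    "color,shape,flag,index\n"
    "red,circle,1,16\n"
    "red,square,0,77\n"
    "yellow,triangle,1,11\n"
)
print(sort_csv(sample, "index", reverse=True))  # numeric column
print(sort_csv(sample, "color", reverse=True))  # text column, same code path
```

The tuple key keeps numbers and text from being compared with each other directly, which would raise a `TypeError` in Python 3.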