
List of command line tools for manipulating CSV, XML, HTML, JSON, INI, etc. - networked
https://github.com/dbohdan/structured-text-tools
======
chrisweekly
[lnav](https://www.lnav.org) is a terrific little
tool, a "mini-ETL" of sorts with an embedded SQLite client and a clean,
powerful interface. Its sweet spot is logfiles, but given regex-based custom
formats, it works great with any semi-structured input. lnav easily handles a few
million rows at a time. IME it pairs really really well with eg
mitmproxy/mitmdump for client request logs, as well as webserver logs.

~~~
hoistbypetard
Thanks for linking that. It's going to make my life easier this week, and I
had not heard of it. I was weighing setting up something like Graylog for some
troubleshooting and kind of dreading it. lnav looks like a perfect middle-
ground between that and my wiki page full of grep commands.

------
dancek
This looks like a great resource. The tools you'd like to have for a specific
problem are often quite un-googlable. So you either need complex hacks to get
inferior tools to work or you spend an hour googling the tools for a tiny
problem.

Of course, it would be even better if you could easily tell which of the dozen
JSON query tools is the best choice for the task at hand, or which you should
code if you only want to ever use one of them.

In fact I'd love if someone would like to share their set of tried-and-true
tools. Personally I mostly go with the POSIX tools, plus jq or gawk on
occasion (but I have to read their docs every single time...).

~~~
tannhaeuser
Nit: awk _is_ a POSIX tool, and has multiple implementations that you've
probably used (Debian/Ubuntu comes with mawk, and Mac OS with nawk).

[1]:
[http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk...](http://pubs.opengroup.org/onlinepubs/009695399/utilities/awk.html)

~~~
jwilk
Besides mawk and gawk, Debian also ships the original awk:

[https://www.cs.princeton.edu/~bwk/btl.mirror/](https://www.cs.princeton.edu/~bwk/btl.mirror/)

~~~
tannhaeuser
Yes, and it works well on Debian. But I can't recommend switching to gawk on
Ubuntu.

[1]: [https://askubuntu.com/questions/1011414/gawk-is-crashing-for...](https://askubuntu.com/questions/1011414/gawk-is-crashing-for-complex-regexp-when-lang-is-set-to-utf-8-on-16-04)

------
Lio
This is great.

One thing I could suggest for the XML list is xmllint. It can be really useful
for converting xml to canonical format so you can then use diff to compare it.

E.g. something like diff <(xmllint --c14n first.xml) <(xmllint --c14n second.xml)

I’d love to hear about more command line SOAP tools if anyone can recommend
some.

~~~
arundelo
I'll look into xmllint. I currently use HTML Tidy for this:

    
    
      tidy -xml -indent -wrap 0
    

or

    
    
      tidy -xml -indent -wrap 0 -quiet

------
geocar
kdb+/q is another really good choice for dsv[1] and json[2]. You can certainly
create single-file databases (if you really want to e.g. for exchange), but
splayed table[3] is faster so you'd usually do that.

[1]: [http://code.kx.com/q/ref/filenumbers/#load-csv](http://code.kx.com/q/ref/filenumbers/#load-csv)

[2]: [http://code.kx.com/q/ref/dotj/](http://code.kx.com/q/ref/dotj/)

[3]: [http://code.kx.com/q/cookbook/splayed-tables/](http://code.kx.com/q/cookbook/splayed-tables/)

~~~
dancek
The problem with that might be the licensing costs once you use it
commercially (eg. at work). IIRC the license prices aren't public, but you're
looking at over $10k in any case.

I personally prefer J to K in the APL family of languages. They also have a
relatively cheap database, Jd [1]. Individual licenses are $600. Still a bit
too much for my data mangling needs. :)

[1]
[http://code.jsoftware.com/wiki/Jd/Index](http://code.jsoftware.com/wiki/Jd/Index)

~~~
geocar
$10k isn't a lot (assuming that's right; it could be). I mean, it's a lot if
you're used to something like MySQL or Postgres-levels of quality, but I've
seen quotes for Oracle being almost $50k per core. MS-SQL is something like
$7k per core, and kdb+ is definitely a lot more useful to me than MS-SQL.

There's also a per-core/minute pricing which might be useful.

~~~
dancek
Sure, kdb+ would probably be worth every penny even at $100k/year when it's
the right tool for the job. I gather it's genuinely the best in-memory
database for computing arrays of varying rank.

But a lot of the use cases these other tools are good for are small tasks
every now and then. I feel kdb+ is in a different category.

~~~
aplorbust
Anecdote: I frequently use kdb+ for small tasks. For me, it's in the
"all-purpose" category. The limitations are only in the ability I have to use it.

For example, removing nonconsecutive, duplicate lines from a file, such as a
CSV file:

    
    
       exec echo "k).Q.fs[l:0::\`:$1];l:?:l;\`:$1 0:l"|exec q >&2;
    

where Q.fs is a function in a script that's bundled with the interpreter; the
chunk size for reading the file into memory is adjustable by editing the
function.
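
For comparison, the same kind of order-preserving de-duplication can be sketched with POSIX awk. This is not the kdb+ approach above, just a common awk idiom; the helper name `dedupe_lines` is made up for illustration. Unlike the q one-liner it writes to stdout instead of rewriting the file, and it keeps every distinct line in memory.

```shell
# dedupe_lines [FILE...]: print each distinct line once, keeping the first
# occurrence and the original order. (Hypothetical helper name.)
dedupe_lines() {
    # seen[$0]++ evaluates to 0 the first time a line appears, so
    # !seen[$0]++ is true only for first occurrences; awk's default
    # action then prints the line.
    awk '!seen[$0]++' "$@"
}

# Usage: the duplicates do not need to be adjacent.
printf 'a,1\nb,2\na,1\nc,3\nb,2\n' | dedupe_lines
```

With the sample input this should leave the three lines a,1, b,2 and c,3 in that order.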

~~~
geocar
You can make it simpler:

    
    
        l:0;.Q.fs[{if[x~l;:];-1 l::x}each]`:input
    

or if you have memory:

    
    
        -1 distinct read0`:input
    

...or if you want to use k:

    
    
        -1@?0:`:input

~~~
aplorbust
Stupid question: With -1, how would I suppress the return value? Use a
function?

    
    
       k)a:{-1@?0:`:input;};a[]

~~~
geocar
It's not a stupid question.

    
    
        ;
    

What this does is return generic-null :: which .Q.s doesn't print.

------
bjoli
I took the time to learn recutils a long time ago, and it has been the gift
that keeps on giving.

Sure, it is not as fast as many other formats, but on the other hand it
integrates very well into Emacs and org-mode. I manage a large part of my
different collections using a combination of both, and the Emacs integration
means it is all less than 2 seconds away.

------
jwilk
If you're curious what's "tabular JSON":

[https://johnkerl.org/miller/doc/file-formats.html#Tabular_JS...](https://johnkerl.org/miller/doc/file-formats.html#Tabular_JSON)

------
netol
I don't understand why csvkit is listed in the SQL-based utilities section.
csvkit is a suite of multiple command-line tools, including csvcut, csvsort,
csvgrep, csvjson, csvstat, csvstack, csvjoin, etc. and multiple converters, so
it is not only csvsql.

------
netol
Awesome. But a list like this could grow indefinitely. I wrote two CSV
utilities a few years back; a data generator
([https://github.com/pereorga/csvfaker](https://github.com/pereorga/csvfaker))
and a column randomizer
([https://github.com/pereorga/csvshuf](https://github.com/pereorga/csvshuf))

------
manaktir
On macOS there's also textutil, a pre-installed utility for working with text
in different formats. Manpage:
[https://developer.apple.com/legacy/library/documentation/Dar...](https://developer.apple.com/legacy/library/documentation/Darwin/Reference/ManPages/man1/textutil.1.html)

------
AdamJacobMuller
I'm very glad to see the 'silly' tools there, cut/join/paste/sort/uniq. While
I would never build anything 'important' with them, they're extremely
useful tools to have in your toolkit.

~~~
badsectoracula
Why silly? I use those (especially sort and uniq) all the time, both in my
scripts and in command line.
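
As a rough illustration of how far those basics go, here is a sketch that counts the most frequent values in one column of a simple CSV. The function name `top_values` and the sample data are invented for the example, and it assumes there are no quoted fields containing commas, which cut cannot parse.

```shell
# top_values COLUMN [FILE...]: count how often each value occurs in a
# comma-separated column, most frequent first. (Hypothetical helper.)
# Assumes plain CSV without quoted commas or embedded newlines.
top_values() {
    col=$1
    shift
    cut -d, -f"$col" "$@" |
        sort |              # uniq only collapses adjacent lines, so sort first
        uniq -c |           # prefix each distinct value with its count
        sort -k1,1nr -k2    # by count descending, then by value
}

# Usage: which request method shows up most often in column 2?
printf 'alice,GET\nbob,POST\ncarol,GET\ndave,GET\nerin,POST\n' | top_values 2
```

The exact padding in front of the counts depends on the uniq implementation, so anything that parses the output should split on whitespace rather than rely on fixed columns.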

------
stevoski
Can anyone recommend a command line tool for manipulating Excel files, that
runs on macOS?

Edit: I’m looking for a command line tool that allows me to open an Excel
file, make a few simple changes, and then save again as an Excel file.

~~~
gwn7
Try visidata.

[https://github.com/saulpw/homebrew-vd](https://github.com/saulpw/homebrew-vd)

~~~
reificator
What an unfortunate choice of repository name. I definitely do not want to get
Homebrew VD.

------
nol13
csvfix, prob some overlap, but i've found this one invaluable.

[http://csvfix.byethost5.com/csvfix15/csvfix.html](http://csvfix.byethost5.com/csvfix15/csvfix.html)

~~~
zabzonk
CSVFix author here - please note a better link is
[https://neilb.bitbucket.io/csvfix/](https://neilb.bitbucket.io/csvfix/)

~~~
nol13
Sweet, ty! Staring at what was starting to look like a much larger python
script than I'd anticipated, then realizing I could do it in 16 lines of (very
basic) bash with csvfix + csvcut/sed/iconv was a big day for me! Some of my fav
code never written I think. Actually had most of those files copied locally
because I was afraid the byethost link would disappear.

That said, the link to the manual on the bitbucket page is not working.

------
zatkin
This is missing comparison tables.

~~~
jwilk
What do you want to be compared?

