
Finding data items in one field that contradict data items in another field - eaguyhn
https://www.polydesmida.info/BASHing/2018-09-30.html
======
codeulike
Ok so the title of the blog is 'Data ops in the Linux command line'. Sounds
fun, but in the same way that 'paintball with blindfolds' would be fun.

e.g. this is one of the examples - a kindof sanity check on two fields in a
tab separated file to see if one is less than the other. The fields are
identified by their position (31, 33 etc)

    
    
        awk -F"\t" 'NR>1 && $31!="" && $33!="" && $33>$31' fish | wc -l
    

Surely much better to just import it into a database and do the analysis in
SQL. The SQL equivalent of the above would be something like:

    
    
        SELECT *
        FROM FishSpecimenData
        WHERE MinDepth > MaxDepth
          AND MinDepth is not null AND MaxDepth is not null
    

If you're worried about type conversions while importing into SQL, just import
everything as a varchar. You've still got a fairly easy job to compare the
numbers:

    
    
        SELECT *
        FROM FishSpecimenData
        WHERE Cast(MinDepth as int) > Cast(MaxDepth as int)
          AND MinDepth is not null AND MaxDepth is not null
          AND IsNumeric(MinDepth) = 1 and IsNumeric(MaxDepth) = 1
    

edit: To be fair, on this page
[https://www.polydesmida.info/cookbook/index.html](https://www.polydesmida.info/cookbook/index.html)
the author explains the rationale for using command line tools:

 _I 'm a retired scientist and I've been mucking around with data tables for
nearly 50 years. I started with printed columns on paper (and a calculator)
before moving to spreadsheets and relational databases (Microsoft Access,
Filemaker Pro, MySQL, SQLite). In 2012 I discovered the AWK language and
realised that every processing job I'd ever done with data tables could be
done faster and more simply on the command line. Since then my data tables
have been stored as plain text and managed with GNU/Linux command-line tools,
especially AWK_

So I guess the point of the blog is to promote that approach. Fair enough.

~~~
gjulianm
I don't think it's much better to use SQL, it's literally killing flies with
nuclear bombs. Awk does the job perfectly well and you don't have to spin up a
SQL database, import the data, deal with any possible problem when importing
the data. I don't know why a lot of people have this aversion to the console
when it can do the job fairly well, specially in these cases where you just
want a quick check.

~~~
mercer
How would SQLite fare in regards to your concerns? You won't have to 'spin up'
a database and importing from various sources, AFAIK, is pretty easy these
days (and OP even goes out of their way to deal with VARCHAR fields).

It strikes me as much easier than having to figure out Awk, but that's
assuming most people are more likely to know SQL and/or use it in the future.

I mean, I'd be happy to try and solve the problem with Awk because it's been
high on my list of things to learn, but SQL does seem like an easier solution
in most cases.

~~~
gjulianm
You still have to create the SQLite database and have the client handy. Awk is
there by default in all *nix systems, and for these basic tasks it's
incredibly easy (see
[http://www.hcs.harvard.edu/~dholland/computers/awk.html](http://www.hcs.harvard.edu/~dholland/computers/awk.html)
for a quick intro). Once you know it you can use it to process a lot of text
files in the console without any kind of worry. Maybe it's a preprocess step
before importing data to a database, maybe you're just exploring a dataset,
maybe you want to modify some command's output in the middle of a pipe.

I learned awk a few years ago and it does not have equivalent when the only
thing you want to do is to consume column-based text and do some simple
processing: it is simple and incredibly fast.

~~~
mercer
Thanks for the link and the motivation to keep reading the AWK book I started
a while ago!

------
lolc
Oh I thought the article would be about scientific fields. Not data fields. I
became increasingly irritated about the pedantry before I realized that this
was the topic.

Once the confusion lifted I could enjoy the read.

~~~
stephengillie
Last week I signed a lease that stated "Mold have existed since the beginning
of time." While scientifically inaccurate, I believe legally it is close
enough.

~~~
NullPrefix
Is that a waiver so you could agree to live in a moldy house? Do you have an
option of not living there?

Fixing your health issues could cost more than the savings you get from living
in that shack.

~~~
stephengillie
Mold can become a bad issue if your bathroom isn't ventilated after showers.
I've rented for decades in WA State and every lease I've signed has had a mold
clause. The building was built 4 years ago; I watched them build it.

