
Ask HN: If your job involves continually importing CSVs, what industry is it? - iamwil
I was wondering if people still use CSVs for data exchange now, or if we've mostly moved to JSON and XML.
======
_petronius
We do a _lot_ of CSV imports (inventory/data management for art galleries).
This is for a few reasons, most of which boil down to "it's an easy format to
share with clients":

\- Clients that don't already have their data in an inventory management
system usually have Excel spreadsheets, so we can just export that to CSV,
parse it, and stick it in the database.

\- If they really have nothing (some commercial galleries, especially ones
that have been around for 30+ years still do everything on index cards), then
you can give someone a spreadsheet template to do the data entry into, and
script the import step.

\- Even if they do have something, when preparing data for import it's easy for
the clients (and for us) to look through the data and spot potential
errors/inconsistencies that need to be corrected during the import.

\- Small industry (data management services for commercial art galleries is
maybe 2-3 companies in Europe, 1-2 in the US, and that's about it), all
writing proprietary software, no semblance of agreed formats/standards. When a
client quits one company and moves to another, it's easy to dump their data to
CSV and leave the company doing the import to sort it out on the other end.
Trying to piece together random-looking values across 15+ CSV files to see
what the last legacy system meant by some string is a pain, but it's something
that the client will pay for in most cases.
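A minimal sketch of the "spreadsheet template plus scripted import" step described above, assuming a hypothetical three-column template (`title`, `artist`, `year`) and an SQLite table named `artworks`. None of these names come from the original comment. Rows that fail basic checks are set aside with their line number so someone can review them, rather than being silently dropped.

```python
import csv
import io
import sqlite3

# Hypothetical template columns; a real gallery inventory would have more.
REQUIRED = ["title", "artist", "year"]

def import_inventory(csv_file, conn):
    """Insert valid rows into the database and return rows needing review."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS artworks (title TEXT, artist TEXT, year INTEGER)"
    )
    rejected = []
    reader = csv.DictReader(csv_file)
    for row in reader:
        # Missing cells come back as None or "", so normalise both to "".
        values = {col: (row.get(col) or "").strip() for col in REQUIRED}
        if not all(values.values()) or not values["year"].isdigit():
            # reader.line_num is the source line the row ended on.
            rejected.append((reader.line_num, row))
            continue
        conn.execute(
            "INSERT INTO artworks VALUES (?, ?, ?)",
            (values["title"], values["artist"], int(values["year"])),
        )
    conn.commit()
    return rejected

# Invented sample data standing in for a client's exported spreadsheet.
sample = io.StringIO(
    "title,artist,year\n"
    "Untitled No. 4,A. Example,1987\n"
    "Study in Blue,,c. 1990\n"
)
conn = sqlite3.connect(":memory:")
rejected = import_inventory(sample, conn)
imported = conn.execute("SELECT title, artist, year FROM artworks").fetchall()
```

Here the second data row has no artist and a non-numeric year, so it ends up in `rejected` instead of the database.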

------
hamandcheese
E-commerce advertising. I worked at a company that managed online advertising
(mostly Adwords at the time) for ecommerce companies. Most would provide us
with a CSV with all their products - this would sometimes be hundreds of
thousands of entries. We imported enough CSV (gigabytes a day maybe) that it
was worthwhile to do things like implement custom CSV parsers to deal with
some of our larger customers' noncompliant CSV.

JSON might have been nice, but I can imagine it coming with its own trouble as
well. CSV worked well since our tools would allow analysts to interpolate
columns into the copy or key words when doing bulk ad creation. The nestable
nature of JSON would have complicated the interface given to our nontechnical
analysts.
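As a rough illustration of interpolating columns into ad copy, here is a sketch using Python's `string.Template`. The feed columns and the template wording are invented for the example, and the company's actual tooling is not described in the comment. The point is only that a flat row maps directly onto named placeholders.

```python
import csv
import io
import string

# Hypothetical product feed; the column names are invented for this sketch.
feed = io.StringIO(
    "sku,brand,name,price\n"
    "A100,Acme,Trail Running Shoe,79.99\n"
    "B200,Acme,Rain Jacket,120.00\n"
)

# An analyst-written template that refers to columns by name.
ad_template = string.Template("$brand $name for only $price USD")

def build_ads(csv_file, template):
    """Return one ad string per product row."""
    # substitute() raises KeyError if the template names a missing column,
    # which surfaces typos in the template early.
    return [template.substitute(row) for row in csv.DictReader(csv_file)]

ads = build_ads(feed, ad_template)
```

With nested JSON, the template language would need some way to address paths inside each record, which is the extra interface complexity mentioned above.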

I also imagine many of our clients wouldn't even know what JSON is.

------
westurner
Arguing for the CSVW (CSV on the Web) W3C Standards:

\- "CSV on the Web: A Primer"
[http://w3c.github.io/csvw/primer/](http://w3c.github.io/csvw/primer/)

\- Src: [https://github.com/w3c/csvw](https://github.com/w3c/csvw)

\- Columns have URIs (ideally from a shared RDFS/OWL vocabulary)

\- Columns have XSD datatype URIs

\- CSVW can be represented as RDF, JSON, JSONLD

With CSV, which extra metadata file describes how many rows at the top are for
columnar metadata? (I.e. column labels, property URI, XSD datatype URI, units
URI, precision, accuracy, significant figures) ...
[https://wrdrd.com/docs/consulting/linkedreproducibility#csv-...](https://wrdrd.com/docs/consulting/linkedreproducibility#csv-csvw-and-metadata-rows)

... CSVW: [https://wrdrd.com/docs/consulting/knowledge-engineering#csvw](https://wrdrd.com/docs/consulting/knowledge-engineering#csvw)

    
    
      @prefix csvw: <http://www.w3.org/ns/csvw#> .
    

@context: [http://www.w3.org/ns/csvw.jsonld](http://www.w3.org/ns/csvw.jsonld)
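To make the idea of a separate metadata file concrete, here is a small sketch of a CSVW-style description driving type conversion. It is not a conforming CSVW processor: it handles only three datatypes, matches headers against each column's `name` as a simplification, and the file name, column names, and `example.org` property URLs are all made up.

```python
import csv
import io
import json
from datetime import date
from decimal import Decimal

# A small metadata document in the shape CSVW describes: a table "url" plus a
# "tableSchema" listing columns. All names and URLs here are illustrative.
metadata = json.loads("""
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "readings.csv",
  "tableSchema": {
    "columns": [
      {"name": "station", "datatype": "string",
       "propertyUrl": "http://example.org/vocab#station"},
      {"name": "measured_on", "datatype": "date",
       "propertyUrl": "http://example.org/vocab#measuredOn"},
      {"name": "temp_c", "datatype": "decimal",
       "propertyUrl": "http://example.org/vocab#temperatureCelsius"}
    ]
  }
}
""")

# Only three datatypes are handled; a real processor supports many more.
CASTS = {"string": str, "date": date.fromisoformat, "decimal": Decimal}

def read_typed(csv_file, meta):
    """Yield rows as dicts with values cast per the metadata's datatypes."""
    columns = meta["tableSchema"]["columns"]
    for row in csv.DictReader(csv_file):
        yield {c["name"]: CASTS[c["datatype"]](row[c["name"]]) for c in columns}

data = io.StringIO("station,measured_on,temp_c\nKBOS,2016-05-01,11.5\n")
rows = list(read_typed(data, metadata))
```

The CSV itself stays a plain table with a single header row; the types and property URIs live alongside it in the metadata.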

------
byoung2
I am currently working on a project for a client in real estate management.
They are aggregating data from state licensing agencies, insurance rating
agencies, etc., and there are a lot that only provide CSV data. I imagine on
the other side of the fence there is some guy or gal managing all this info in
Excel, and CSV is the easiest path for the data to flow out.

Previously I worked for a company that provided a dashboard for small
businesses to manage their listings (Yelp, Facebook, Google, Tripadvisor,
etc). For multilocation clients, for initial setup we needed a list of all
locations, addresses, phone, etc., and not a single client said "here's our
api, grab the data in JSON format". Instead, we always got a CSV. We
eventually gave them a CSV template file for them to copy/paste into.

~~~
iamwil
Oh, so like information from data.gov or other public data to verify people.
That makes sense.

I'm guessing small businesses want to do integration with the web, but they
didn't really have the engineers to do an API integration?

~~~
byoung2
Some of these companies didn't even have the data in CSV format...we had to
buy it from another vendor in CSV form
([https://www.aggdata.com](https://www.aggdata.com)).

------
cocktailpeanuts
I am also curious, but just to share my own insight on this, I've actually
looked into a lot of CSV providers and they all seem to cater towards raw data
with less semantics compared to the type of data that we would normally see
through JSON or XML.

For example a lot of CSV data is used for plotting/visualization, because all
they have is numbers. If they had more metadata they probably would have
ported them to JSON.

Another observation: A lot of CSV dumps online are really just that: data dumps.
Each is a huge file that's meant to be used after downloading, not for
streaming like JSON/XML. You don't see many JSON APIs that return huge
payloads, but huge CSV files are common.

~~~
iamwil
What sort of people provide the CSVs?

I myself have had to integrate with 3rd party logistics companies, and the way
we got them to send out our inventory was to send them a CSV of our orders
twice a day.

------
lazyjones
Former company: comparison shopping engine / website, used for transferring
e-commerce offers from merchants to our database. CSV is vastly superior (at
least to XML, maybe JSON) due to its size and quick line-by-line parsing. The
downside is lots of quirky and broken formats; there's no real standardisation
in e-commerce.
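A sketch of the line-by-line parsing point, assuming a hypothetical merchant feed with three columns. The generator holds only one record in memory at a time, and rows whose field count does not match the header are collected separately instead of aborting the whole import. Real feeds are broken in many other ways that this does not cover.

```python
import csv
import io

def stream_offers(csv_file, bad_rows):
    """Yield one offer dict per well-formed row; record the rest in bad_rows."""
    reader = csv.reader(csv_file)
    header = next(reader)
    for fields in reader:
        if len(fields) != len(header):
            # Keep the source line number so the merchant can be told where.
            bad_rows.append((reader.line_num, fields))
            continue
        yield dict(zip(header, fields))

# Invented sample feed; the second data row is missing its price.
feed = io.StringIO(
    "merchant,product,price\n"
    "shop-a,USB cable,4.99\n"
    "shop-b,HDMI cable\n"
    "shop-c,Power strip,12.50\n"
)
bad_rows = []
offers = list(stream_offers(feed, bad_rows))
```

In practice the caller would iterate over `stream_offers(...)` directly rather than building a list, so memory use stays flat regardless of feed size.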

------
jtcond13
Yes - Insurance. For one thing, many people can code well enough to move data
from CSVs to a database (or vice versa) but not well enough to read/write an
API. I guess the main reason, though, is that many back-office applications
don't need to be 'real-time' and for those it's always going to be easier to
send files and have someone import them to a DB.

------
davelnewton
There's CSV all over the place.

Tables of data are best represented with... well, tables of data. For data
that's not nested it's a perfectly acceptable format. Importing well-formed CSV
(and there's the rub) is trivial and a well-known process.
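One small example of where the rub tends to be: a field that contains the delimiter or a quote character. The sample line is invented. Splitting on commas miscounts the fields, while Python's standard `csv` module handles the quoting.

```python
import csv
import io

# A well-formed line: the second field contains a comma, and the third
# contains doubled quotes that stand for literal quote characters.
line = 'ACME-1,"Widget, large","He said ""hi""",9.99\n'

# Naive splitting breaks the quoted field apart.
naive = line.rstrip("\n").split(",")

# A real CSV parser respects the quoting rules.
parsed = next(csv.reader(io.StringIO(line)))
```

Here `naive` has five pieces while `parsed` has the intended four fields.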

------
bhassfurt
Yes, many of our customers in the public utility industry use CSV. Use of JSON
is somewhat rare, but XML is quite common.

------
htwillie
I work with a lot of environmental data coming from disparate sources.

.csv is like the lowest common denominator of data formats.

------
et2o
CSVs are often superior for the specific types of structured, tabled data used
in bioinformatics and statistics.

------
jpindar
Electrical engineer here.

EDA tools often export BOMs as CSV, which we import to Excel.

ATE programs often export test data as CSV, also imported to Excel for
manipulation and graphing.

------
jetti
I'm in health care and we use CSVs as outbound file transfers. We also use X12
(EDI) for transfers, which is not XML/JSON.

------
herbst
> moved to xml

No, people still use the obviously superior format for structured tabular
data. Why would anyone use something that is highly suboptimal for things like
these?

~~~
iamwil
\s ?

~~~
herbst
Why do people seriously think csv is obsolete? Have you seen all the metadata
a json or xml file contains? Why in the world would anyone use that for
structured/tabular data?

It's like when people started saving photos as png because they read somewhere
how "shitty" jpg is...

------
maplechori
In social science we use Stata/SAS when possible but CSVs show up a lot.

------
jackgolding
Media was a lot of CSVs (exports from TV buying programs), had some logistics
guys use CSVs a lot too

------
kidlogic
Finance

~~~
iamwil
What aspect of finance? Or rather, who are you getting the CSV from and what
do you do with it?

