
Design and Implementation of CSV/Excel Upload for SaaS - dennisgorelik
http://www.kalzumeus.com/2015/01/28/design-and-implementation-of-csvexcel-upload-for-saas/
======
davidw
> I simply didn’t feel right getting that amount of value for free from two
> projects which are run by very small teams, so I approached both and
> convinced them to sell me an enterprise license to their project. It is
> equivalent to the usual OSS license, except it comes with an invoice.

That's really cool of you, patio11! Most open source stuff I've worked on...
I'm happy if I get a "thanks!" from time to time. Sure, it has other benefits,
and I wouldn't be where I am today without open source, but handing out some
actual cash is very classy.

I wonder if more companies would really consider doing this, though - handing
over money that you don't have to is not something I've seen a lot of. I've
worked for people who don't even want to let the world know they're using
various bits of open source software, let alone contribute anything back.

Also, there is a concern that money can really change the dynamics of a
community, but by and large, I'd rather see a lot more money funneled to open
source than there currently is. For instance:

[https://twitter.com/antirez/status/557851219088375808](https://twitter.com/antirez/status/557851219088375808)

~~~
zrail
Convincing businesses to pay for OSS is a frustrating problem. I built a
gem[1] that has had almost 5000 downloads, 89 since Monday night (rough proxy
for production users), related to payments. Exactly zero people have had any
interest in paying for the commercial license I offer, even though the gem is
directly related to payments.

The pro offering is probably not quite where it needs to be, but to have zero
interest at all is pretty discouraging.

[1]: [https://www.payola.io](https://www.payola.io)

~~~
pc86
Off topic: This is literally the first gem/package/piece of software I've seen
distributed in this way that has sales tax attached to it for residents of a
certain state. Is there some Michigan law pertaining to sales tax on software?

My business offers consultancy-based software development, so we are not
bound by my state's sales taxes; I was just curious whether Michigan has a
software-specific law.

~~~
zrail
It's terrible. I'm a resident of Michigan and so I have to remit sales tax on
"packaged software". The rules for what constitutes "packaged software"
recently got changed to include anything downloadable. SaaS is specifically
excluded, of course.

------
pjungwir
What a great tool! I think one challenge putting this into production will be
all the "extra" stuff people put into spreadsheets:

- one or more titles on top ("Patient Database", "current as of Jan 5, 2014",
etc.)

- data in a bunch of different sheets.

- multiple types of "stuff" all packed into one sheet, maybe separated by a
few blank rows or maybe a summary pivot table pasted in the top-left corner.

It's easy to handle that if you are pre-processing the spreadsheets yourself,
but getting to where non-programmers can prep a spreadsheet for uploading
seems hard. How are they supposed to know that column headers belong on the
first row, that each row of data should have the same "kind" of thing, etc.?
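
A minimal sketch of one way an importer could cope with title rows and trailing "stuff", assuming the expected column names are known in advance. The function names and the `min_matches` threshold are hypothetical, not taken from the article: scan for the first row that contains enough of the expected headers, then read records until the first blank row.

```python
import csv
import io


def find_header_row(rows, expected_headers, min_matches=2):
    """Return the index of the first row that looks like a header row, or None."""
    expected = {h.strip().lower() for h in expected_headers}
    for index, row in enumerate(rows):
        cells = {cell.strip().lower() for cell in row if cell.strip()}
        if len(cells & expected) >= min_matches:
            return index
    return None


def extract_records(text, expected_headers):
    """Skip titles above the header row and stop at the first blank row below it."""
    rows = list(csv.reader(io.StringIO(text)))
    header_index = find_header_row(rows, expected_headers)
    if header_index is None:
        raise ValueError("no header row found")
    headers = [cell.strip().lower() for cell in rows[header_index]]
    records = []
    for row in rows[header_index + 1:]:
        if not any(cell.strip() for cell in row):
            break  # a blank row probably separates a different kind of data
        records.append(dict(zip(headers, (cell.strip() for cell in row))))
    return records
```

This is only a heuristic: it handles a single sheet exported as CSV, and a pivot table pasted beside the data would still confuse it.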

Anyway, good luck and thank you for the great writeup!

~~~
dceddia
One of the commenters on the article suggested sending the customer an "import
template," with all of the column headers predefined. The customer simply
needs to copy-paste their data into your CSV import template, and upload it.

Never tried it myself, but it sounds like a decent solution to the problem of
wacky formatting.

~~~
DennisP
I tried it. Customers pretty frequently made their own spreadsheets that
looked mostly like the template, but not quite. I was constantly fixing their
uploads.

I built a system to validate their uploads and let them fix their errors on a
website, and all that hassle went away.

(I still gave them the template so they'd know what it was supposed to look
like, I just didn't rely on that alone.)
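
A rough sketch of the validation idea, not the actual system described above: check the upload against a template and collect per-cell errors that a website could show the customer to fix. The `TEMPLATE` columns and regex patterns are made-up assumptions for illustration.

```python
import csv
import io
import re

# Hypothetical template: column name -> (required, pattern the whole value must match)
TEMPLATE = {
    "name": (True, re.compile(r".+")),
    "phone": (True, re.compile(r"\+?[\d\-\s().]{7,20}")),
    "email": (False, re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")),
}


def validate_upload(text):
    """Return (valid_rows, errors); each error is (row_number, column, message)."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return [], [(1, None, "file is empty")]
    headers = [h.strip().lower() for h in rows[0]]
    errors = [(1, col, "missing column") for col in TEMPLATE if col not in headers]
    if errors:
        return [], errors
    valid = []
    for row_number, row in enumerate(rows[1:], start=2):
        record = dict(zip(headers, (cell.strip() for cell in row)))
        row_errors = []
        for col, (required, pattern) in TEMPLATE.items():
            value = record.get(col, "")
            if not value:
                if required:
                    row_errors.append((row_number, col, "required value is missing"))
            elif not pattern.fullmatch(value):
                row_errors.append((row_number, col, f"unrecognized value: {value!r}"))
        if row_errors:
            errors.extend(row_errors)
        else:
            valid.append(record)
    return valid, errors
```

Rows that pass go straight through; rows with errors come back with a row number and column name, so the customer corrects them instead of support doing it.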

------
Weeekend
Thanks Patrick, I now have a url I can send to anyone who complains that
$OSS_PROJECT they depend on isn't supported to the degree they'd like.

~~~
patio11
That's probably worth an entire essay, but I don't think I'll be the guy to
write it, as the topic makes me positively volcanic.

~~~
Harkins
Justin Searls' recent talk "The Social Coding Contract" has a nice take on why
expectations get so out-of-sync:
[http://blog.testdouble.com/posts/2014-12-02-the-social-codin...](http://blog.testdouble.com/posts/2014-12-02-the-social-coding-contract.html)

Nobody is "the bad guy", but creators and late adopters have very different
understandings of what OSS is for, why it exists, and what to expect.

------
rwmj
A very common and very real problem. At my last company we addressed this by
writing a CSV library that really did handle all the Excel weirdness.

[https://forge.ocamlcore.org/projects/csv/](https://forge.ocamlcore.org/projects/csv/)

Actually some of that is not automatable, so this cannot by its nature be a
complete solution, but with heuristics -- ugh -- it worked. Months and months
of effort to get it to work, and we didn't even have the general public as
customers, just a small set of B2B companies.

------
daniel-levin
Hey Patrick! I'm a huge fan of your work, and really enjoy reading your blog.
I have one question though.

AR is HIPAA compliant, which implies that there is (medically) sensitive
information hitting your servers. Why is it not an issue for you and your
support agents to actually see that data yourselves (as you would when
manually fixing CSV errors)? If your seeing this data doesn't violate the
_letter_ of HIPAA, surely the ethical impetus behind the act would prevent you
from doing so?

~~~
patio11
I have spoken all the eldritch rituals which legally permit a doctor to share
patient information with me personally as long as they have a contract with my
name signed in blood on it.

Just kidding. It isn't actually that bad. Appointment Reminder is a "Business
Associate" of Happy Teeth Dental. I'm its HIPAA compliance officer, attend a
yearly training session, have been threatened with the most severe of
sanctions if I misused patient data, see only the data required for my job,
and have my name and access rights recorded in a spreadsheet ready to be
audited (along with my access logs). That's probably half of the list. Clearly
HIPAA can't completely ban non-doctors from seeing medical data or the entire
medical sector grinds to a halt, right?

With regards to support agents, some people at the company are approved for
access and some are not. The system enforces access rights, naturally.

------
sandGorgon
I had a discussion with the author of sheetjs about a year back as a response
to my comment[1] on VBA and Excel. We went back and forth on building a
kickstarter campaign - especially for his "transpiler" that converts VBA to
JS.

I'm glad to see this has come so far ahead. My offer to contribute to a
kickstarter still stands!

[1]
[https://news.ycombinator.com/item?id=4361091](https://news.ycombinator.com/item?id=4361091)

------
christiangenco
Importing data from Excel shouldn't be this painful.

I'd love to see a service that takes in a user's mangled spreadsheet and some
regex validation for each column and spits back perfect JSON-formatted data,
walking the user through corrections along the way (i.e., intelligently guessing
column mappings, highlighting malformatted cells, column joins, down/up/title
casing, and string substitutions). Something that could be integrated in a
line or two of javascript would be fantastically valuable (especially if that
"$100,000 in engineering time" heuristic is accurate).

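For what it's worth, the "guessing column mappings" part of such a service can be sketched in a few lines. This is an illustration under assumptions, not an existing product: the `ALIASES` table and function names are hypothetical, exact alias lookup is tried first, and the standard library's `difflib` is used as a fallback for typos.

```python
import difflib
import json
import re

# Hypothetical alias table: canonical field -> normalized header spellings
ALIASES = {
    "name": ["name", "fullname", "patientname"],
    "phone": ["phone", "phonenumber", "telephone", "mobile"],
    "email": ["email", "emailaddress"],
}


def normalize(header):
    """Lowercase a header and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", header.lower())


def guess_column_mapping(uploaded_headers, cutoff=0.8):
    """Map each uploaded header to a canonical field, or None if no good guess."""
    alias_to_field = {a: field for field, aliases in ALIASES.items() for a in aliases}
    mapping = {}
    for header in uploaded_headers:
        key = normalize(header)
        if key in alias_to_field:
            mapping[header] = alias_to_field[key]
            continue
        # Fall back to fuzzy matching so small typos still map somewhere.
        close = difflib.get_close_matches(key, list(alias_to_field), n=1, cutoff=cutoff)
        mapping[header] = alias_to_field[close[0]] if close else None
    return mapping


def rows_to_json(headers, rows, mapping):
    """Emit JSON records using canonical field names, dropping unmapped columns."""
    records = []
    for row in rows:
        record = {}
        for header, cell in zip(headers, row):
            field = mapping.get(header)
            if field is not None:
                record[field] = cell.strip()
        records.append(record)
    return json.dumps(records)
```

Headers that map to `None` are the ones a real tool would need to ask the user about; the casing fixes, column joins, and string substitutions mentioned above are left out of this sketch.
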
~~~
patio11
This would be _particularly_ useful if you could somehow do it in such a way
that the service doesn't need to actually be in possession of the
CSV/Excel/etc data at any point. (That would be a non-starter for privacy
reasons at many companies.)

Would I have paid, oh, $500 a month for this? Heck yes. I would have paid
for it on day #1 and continued paying for it for each of the last 4 years, and
it would _still_ be cheap at the price.

~~~
christiangenco
Huh. That'd be tricky, but not impossible. You'd have to do everything on the
browser, which would limit the size of the spreadsheet you could import (it
would have to fit in memory), and may limit browser support. It'd also be
harder to productize since the secret sauce is now just javascript.

I really want this to exist, though. Maybe it could be open sourced and
survive on your proposed enterprise licenses. Hmm...

------
wallflower
My assumption is Apache POI to "parse" native Excel files was never considered
because OLE2 is famous for being an inception-like file format.

[http://poi.apache.org/casestudies.html](http://poi.apache.org/casestudies.html)

------
Splendor
Thanks for sharing! The last section about enterprise licenses for OSS was
especially great.

------
dkersten
I haven't heard of SheetJS before, but now I will definitely look into it! I
have recently started using Handsontable though and it's absolutely fantastic.

------
TheSisb2
I can't find any information on browser compatibility - does anyone know?

