
Ask HN: My school needs a data storage solution - antigen8
I work at a New York Public High School, and we're sick of dealing with the legacy Department of Education systems. You cannot imagine. We have to keep them up to date by hand anyway, so while we're at it, we want to build our own data store so that we can work with our data to help our students.

Our data isn't time-series, we need someplace to store our stuff, and we need advice from experts.
======
mseebach
My expert advice is, Don't do it, unless you can buy the exact system off the
shelf, and allocate the budget to run and support it. If you can't, stick with
the crappy legacy system you have.

Building actual software for actual use cases for actual people is extremely
hard, and a lot harder than it feels like when you're just a few visionaries
with an itch talking about these things. I'll say it again, slowly. It is
EXTREMELY hard.

Sure, you can probably get a couple of enthusiasts together and build an
infinitely better 80% solution in Ruby on Rails over a weekend, but you will
learn the hard way just how important those last 20% are. That unpleasant
senior teacher that never liked you? He uses a feature in those 20% and he's
friendly with the principal and the union rep, and before you know it, you're
up to 3am, not building awesome solutions for the entire school, but building
bespoke solutions to that guy's convoluted backwards workflows that barely
made sense in the 70s when he came up with them -- and you don't have the
political capital to tell him to shut up and get with the program, however
warranted that message would be.

If you do pull through, your reward will be a pat on the back, and a lifetime
of being on 24h call.

Oh, and that's your _best case scenario_. There's a high risk of not actually
achieving a system that is substantially better than the legacy system, even
in the 80% case. And a substantial risk of introducing a subtle bug that
screws up grading at the worst possible time (you know, how an administrator
pulls a transcript giving an A+ student a B- average and sends it to his
prospective college, screwing up his future, and the error is only caught
after the deadline, and it's all literally, personally your fault).

To pull this off, you need a team, a budget AND hard executive sponsorship.
That sounds unlikely for a public high school.

If you think you have the vision and the domain knowledge, and the technical
chops to pull this off (but no budget and/or no executive cover), your best
bet might be something like YC: fix this problem for ALL schools.

~~~
crdb
Spot on.

(Been there done that, saving your excellent comment for future "let's rewrite
this" situations.)

Also summarised well here:
[http://www.joelonsoftware.com/articles/fog0000000069.html](http://www.joelonsoftware.com/articles/fog0000000069.html)

I'd only add:

\- nitpicking, but with Ruby on Rails, you'd probably use ActiveRecord (the
ORM) to manage the data; by not enforcing constraints in the database itself,
you'll get a tiny bit of orphaned data. Fine for an e-commerce website where
customer service can send a voucher; sucks for the few students whose life is
affected by your non-provably-correct solution.

\- most non-technical users (most of the teachers I know included) confuse
good and pretty UI. "Bad" legacy system usually means "not flat design/doesn't
look like my iPad". This usually pairs with the idea that "it would just be a
simple app, we can get it done for a grand".
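
To make the first point concrete, here is a minimal sketch of a constraint
enforced by the database rather than by application code. It uses Python's
built-in sqlite3 module instead of Rails, and the students and grades tables
are hypothetical names chosen for illustration, not anything from a real
school system:

```python
import sqlite3


def demo_orphan_protection():
    """Show the database itself refusing writes that would orphan a grade."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute(
        """CREATE TABLE grades (
               id         INTEGER PRIMARY KEY,
               student_id INTEGER NOT NULL REFERENCES students(id),
               course     TEXT NOT NULL,
               grade      TEXT NOT NULL
           )"""
    )
    conn.execute("INSERT INTO students (id, name) VALUES (1, 'Example Student')")
    conn.execute("INSERT INTO grades (student_id, course, grade) VALUES (1, 'Algebra', 'A')")

    rejected = []
    try:
        # There is no student 99, so this row would be an orphan.
        conn.execute("INSERT INTO grades (student_id, course, grade) VALUES (99, 'Algebra', 'B')")
    except sqlite3.IntegrityError:
        rejected.append("orphan grade")
    try:
        # Deleting a student who still has grades would orphan those grades.
        conn.execute("DELETE FROM students WHERE id = 1")
    except sqlite3.IntegrityError:
        rejected.append("delete of referenced student")

    grade_count = conn.execute("SELECT COUNT(*) FROM grades").fetchone()[0]
    conn.close()
    return rejected, grade_count
```

With the foreign key declared in the schema, both bad writes are refused by
the database no matter what the application layer forgets to check.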

------
gnoway
Having worked in a public educational setting before, I would recommend
finding out whether you actually have an option to do something yourself
before spending a lot of time on it. My experience tells me that the people
running the system you're trying to get away from are likely to be
territorial; it's extremely disappointing to get really invested in making
something better only to have politics scuttle the effort.

There may also be regulatory things (FERPA, etc.) your solution would have
to comply with.

Aside from that, your request is more ambiguous than you realize. You're going
to have to be more specific about what you're trying to do.

~~~
antigen8
You're right- it's a very complex process even trying to talk about bringing
new systems into a public school. FERPA is not an inconsiderable force.

I've tried to give more concrete insight into what I'm trying to do, above;
but I think part of the ambiguity is because it's very difficult for me to
understand what we need so far. We need a technical solution to serve people
who can't be expected to directly interact with it...

------
antigen8
Update:

Thank you to everyone responding, especially those who are helping enormously
by giving me real talk. Here's some more information about my situation:

The legacy system I referred to has limitations that make it very
uncomfortable to contemplate continuing to use it as our sole data source.
Some issues:

\- The database is updated once a day; any changes you add to it today won't
show until tomorrow.

\- Student absences (important so we can see cutting, lateness, etc.) can only
be recorded once per day; students are either "at school" or not.

\- There are numerous categories of data that we have that can't be tracked or
stored, for example, whether a student is involved in a sports team.

We have a large body of instructional data about how students are doing in
class, but we're struggling to bring it together to get a full picture because
as far as our main system is concerned, students receive a grade at the end of
their course for their transcript, and that's that.

I'm not going to ignore what I'm being told and I think that building a new
solution is beyond our capacity; we have to keep our focus on teaching the
kids as well as we can. But we have a very young staff and an administration
committed to exploring our options. I've already been looking at Tableau and
it's so powerful and approachable that we can definitely use it. The
difficulty I now face is what tool can we use to complement and expand the
system that's causing a lot of grief.

We need to be able to store types of information including absences, student
biographical info, grades; we want to be able to store student ID pictures,
lesson plans, notes from teachers. Points of contact with parents. What we
think we know is that we need a SaaS solution, we have regulatory compliance
issues (not to mention ethical requirements) around security and privacy of
our data, which many of you noticed.

I'm reading every single comment and will respond by email and here where it
makes sense. We know we need help even to understand our problem. YC 2016
would be top of my priorities right now, but there are ethics rules preventing
me from being a vendor to any NYC public school while I'm working at one. As
much as it makes sense ethically, it really seems like that shouldn't be the
department's focus when they have kids who can't read yet and they can't tell
you which kids they are.

~~~
bonobo3000
Hey - I'm a recent CS grad so this may be naive, but here's the easiest
possible solution: export all the data from the old system, and throw it into
a Google Docs spreadsheet - Google Apps is free for education. [1] That's the
simplest UI possible, though not that easy to use.

If you want a more custom tailored solution, you need a programmer to 1.
create a dump of all your data 2. create a web frontend to display it and
update it nicely. A club in my college did this kind of thing - they worked on
simple websites for local companies. College students are a great option
because this is an easy enough project (technically that is, I don't know about
the politics) and they will work for free.

I don't know much about slicker SaaS solutions - Tableau is definitely pretty
great. Sometimes SaaS companies will offer demos of the product over Skype and
can answer your questions - I say take advantage of that, tell them your
problem and let them answer how well their product can work for you.

[1] [https://www.google.com/edu/products/productivity-tools/](https://www.google.com/edu/products/productivity-tools/)

~~~
Nilef
Google Spreadsheets is absolutely not what a government school would consider
private or secure, nor is it likely to be able to handle a database like this,
as it tends to crap out a little ways into the 5-figure row region.

They also won't pay some college kids to do it. Enterprises, especially
government enterprises, tend to require contractual support, which is probably
outside the realm of what a college kid can or would want to do.

------
patio11
You need two competent programmers and, more importantly, buy-in from about
six people to let you hire them and substantially more once you go to
implement the recommendations pulled from the data. You do not need a very
complicated data storage solution, but your local IBM or Oracle rep would be
more than happy to explain the benefits of their $100,000 offering over a
steak dinner.

~~~
oneweekwonder
Can I recommend IBM Cloudant? The first $50 is free if I remember correctly,
and you can hit the ground running with a webapp using one of the many
solutions out there.

If you like software that has been around for decades, use Oracle MySQL, or
if you are liberal MariaDB will do! I would personally recommend Postgres.

Now the questions of how much they can get out of these datasets and how they
are going to collect them are up for discussion.

But maybe just one competent teacher and a class full of enthusiastic kids
can make for an interesting project.

------
bonobo3000
I'm assuming 2 things - 1\. The data is accessed by people, so 50ms of
latency doesn't matter so much for most purposes. Writes/modifications are
infrequent so even a global lock when writing would be ok. 2\. You have about
a few hundred GBs of data at most.

If that's the case, the technical problem doesn't really require anything
novel. A simple SQL engine like SQLite works - it's very common, most
programmers know or can pick up SQL and it's mature/stable/battle-tested. I
would go with that.
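
As a rough sketch of what that could look like, assuming Python's built-in
sqlite3 module and a hypothetical attendance table with one row per student
per class period (all table, column, and function names here are made up for
illustration):

```python
import sqlite3

SCHEMA = """
CREATE TABLE students (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE attendance (
    student_id INTEGER NOT NULL REFERENCES students(id),
    day        TEXT    NOT NULL,            -- ISO date string
    period     INTEGER NOT NULL,            -- class period within the day
    status     TEXT    NOT NULL CHECK (status IN ('present', 'absent', 'late')),
    PRIMARY KEY (student_id, day, period)   -- one record per student per period
);
"""


def open_store(path=":memory:"):
    """Open (or create) the store and make sure the schema exists."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record(conn, student_id, day, period, status):
    """Record or correct one student's status for one period."""
    # "with conn" commits on success and rolls back if the insert fails.
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO attendance VALUES (?, ?, ?, ?)",
            (student_id, day, period, status),
        )


def periods_missed(conn, student_id, day):
    """Return the periods a student was marked absent on a given day."""
    rows = conn.execute(
        "SELECT period FROM attendance "
        "WHERE student_id = ? AND day = ? AND status = 'absent' ORDER BY period",
        (student_id, day),
    )
    return [row[0] for row in rows]
```

The whole database is a single file, so backups are a file copy; whether that
is acceptable for student records is a separate compliance question.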

For a more detailed answer, maybe you could bring in a programmer or
consultant to take a look, although they'll probably charge you way more than
it's worth.

~~~
antigen8
bonobo3000- I think you're right that, generally, the technical sophistication
of this problem isn't super demanding. We're not storing millions of records,
we're not (as far as I can tell) asking for anything "complicated". We
probably don't have the technical sophistication to manage this ourselves, so
we're probably looking at a hosting SaaS, as has been said in other places

------
thaumaturgy
What kind of data do you need to store? This is important, it will have
impacts on everything from access to security to support to backups and
redundancies.

What level of the architecture are you looking at? Do you need better
hardware, a better filesystem, a better RDBMS, a better reporting interface
for the data you already have?

What kind of budget do you have? It's a public school, so it's probably safe
to assume $0, but it would be good to know what kind of administrative support
you have for this project.

What, specifically, is wrong with what you have now? What isn't working? What
are your frustrations with it?

You may not need two (or any) competent programmers, and I'd be reluctant to
recommend Oracle or IBM unless you're looking to solve a problem that they are
uniquely good at solving.

~~~
antigen8
Thanks thaumaturgy-

I used your questions in writing the update post above, which hopefully
provides some more clarity

------
afiedler
Hey antigen8,

I am working on something very similar with a non-profit in NYC, for NYC
schools. It's very exciting, we are building on NodeJS and MongoDB, and
already have some data syncing set up with the DOE's systems. Can you shoot me
an email (in my profile)? I would love to talk with you and make an
introduction if it makes sense.

~~~
brianwawok
New projects are still using MongoDB? I thought people started to like their
data still being there in 6 months..

------
brudgers
I won't pretend to be an expert, but from reading the thread it sounds like a
separate data store with the additional information [e.g. who is playing
sports, who is not showing up in 5th period, etc.] could readily run alongside
the existing system.

To put it another way, the existing system is built to solve a particular set
of problems. The set of problems that you would like to solve are mostly
orthogonal. The existing system is designed to include every student, the
problems you want to solve only involve some students. Essentially the
problems of interest are fine grained and the legacy system is inherently
coarse grained. The problems have soft answers, the legacy system is for hard
answers.

In reality, the duplication of effort for a parallel system is pretty much
entering the name of the student of interest into the new system. Everything else...sports
teams, fine grained absences, etc. has to be entered regardless...and a lot of
information such as home address doesn't really need to come across because
it's only relevant at the point where a specific action requires it.

This is a case where most of the problems of interest don't depend on data
normalization because the problems are mostly related to a small number of
individuals and are handled on a case by case basis. Document store and search
are fine.
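
A minimal sketch of that kind of side store, assuming Python's built-in
sqlite3 and json modules. The notes table, the student_id key, and the helper
names are all hypothetical; the only thing shared with the legacy system is
the student identifier:

```python
import json
import sqlite3


def open_notes(path=":memory:"):
    """Open a small document store that runs alongside the legacy system."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS notes (
               id         INTEGER PRIMARY KEY,
               student_id TEXT NOT NULL,   -- the one field copied from the legacy system
               kind       TEXT NOT NULL,   -- free-form label, e.g. 'sports'
               body       TEXT NOT NULL    -- arbitrary JSON document
           )"""
    )
    return conn


def add_note(conn, student_id, kind, **fields):
    """Store one free-form document about a student."""
    with conn:
        conn.execute(
            "INSERT INTO notes (student_id, kind, body) VALUES (?, ?, ?)",
            (student_id, kind, json.dumps(fields, sort_keys=True)),
        )


def search(conn, text):
    """Plain substring search over the stored JSON text.

    LIKE treats % and _ in the search text as wildcards, which is
    acceptable for a sketch but not for untrusted input.
    """
    rows = conn.execute(
        "SELECT student_id, kind, body FROM notes WHERE body LIKE ? ORDER BY id",
        (f"%{text}%",),
    )
    return [(sid, kind, json.loads(body)) for sid, kind, body in rows]
```

For example, add_note(conn, "S001", "sports", team="soccer") followed by
search(conn, "soccer") returns that one record. Nothing here needs a schema
change when a new kind of note shows up.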

Build a system that scratches the actual itches as you have them, not the
system to end all systems.

Good luck.

------
gtf21
YC W16 applications are open, sounds like you could give it a go if you think
that this problem could apply to all schools and that schools would be willing
to pay for it.

~~~
antigen8
I think we all know that Education is one sector, not unlike legal services,
that could stand for a big wake up call.

Lots of challenges to getting new or useful tech into schools, not least of
all that everyone is too busy teaching to be able to pull new tools into their
chain.

You're not wrong though: it would be grand to chip away at some of the
entrenched, systemic inefficiencies.

------
fgutmann
I assume you are managing classical administrative data like student records and
so on. For this case I advise a relational database like Postgres.

------
bliti
I work in ed tech and just implemented a similar system. Shoot me an email.
I'll gladly chat about it privately.

------
kuyan
> we need someplace to store our stuff

What do you mean by "stuff"?

------
arm55
This isn't going to be a problem you can solve with a quick question on HN.
You need to sit down with someone with a list of project goals and a budget.
I'm happy to chat, because I know how frustrating it can be from your side of
things. My email is in my profile.

~~~
jgeralnik
The email field on hn profiles is hidden, if you want people to see it you
should post it in the about field.

~~~
arm55
Oops thought it was. Thanks!

------
vellum
You should update your post and give examples of what you're storing. Test
scores, word docs?

~~~
patja
yes...it could be as minimal of a need as just needing a file server/NAS. I
teach at a middle school where even the simple task of turning in an
assignment as a file and not a piece of paper is a huge chore with no
institutional support. Everything is either sneaker nets of USB flash drives,
or convoluted processes for working around online services' general
expectation that everyone has an email address.

Anyways, OP has given so little info it seems a bit of a waste that they have
received so many long replies without even knowing what the problem is.

~~~
antigen8
You're totally right, and one of the limitations on the info I was able to
include was that it's very difficult for me to know how to describe what we're
trying to do, not having the expertise. We seem to be trying to find a SaaS
solution that will allow us to store academic records, and is flexible enough
to accommodate things like student work samples, assessment data, and other
forms of information that we might not completely anticipate.

I know we can't build what we need from scratch, and I know that we need
something more sophisticated than a NAS- we don't want to invest the capital
and time to create an inhouse technology stack, we use google apps for our
student and staff accounts and document hosting, and it seems like our data
storage needs to be a similar kind of managed database product which allows
our office staff to input new records through internal web forms and similar
approachable tools.

------
codingdave
It depends on what kind of data you are talking about. There are legislative /
regulatory requirements for some data, as it is required to be public. Other
data is required to be private. There are vendors that have systems you can
buy off the shelf, specifically for schools/districts, and other vendors that
have more generic solutions that you can customize yourself.

But it is hard to even start such a conversation without knowing what you are
really talking about.

You might get some value from having your superintendent talk to other
districts - you may be able to share solutions. Also, NYSSBA has some
partnerships with software vendors, so if your district is a member, there may
be some answers from them as well.

------
douche
Don't buy Blackboard? Without knowing a little more about what you really want
to do, I couldn't recommend anything else.

Educational software (admin or teacher-facing especially) tends to be garbage

------
jasoncrawford
You could check out Fieldbook:
[https://fieldbook.com/?rc=VJTdQhbp](https://fieldbook.com/?rc=VJTdQhbp)

It's a simple data store that feels like a spreadsheet but lets you organize
like a database (disclosure: I'm a founder). Happy to help you get set up with
it.

------
ChicagoBoy11
I'm an NYU student in Education and Digital Design - HIGHLY interested in your
problem. Can you describe a bit more about what the issue you are facing is?
(what kind of data/how it is accessed/ etc.) With this info people on here are
bound to give more specific and useful advice.

~~~
antigen8
Hey ChicagoBoy11-

I wrote an update above with some more information on the types of data we
need to accommodate, but feel free to reach out on the email listed in my
profile, I'd love to make contact with you

------
AlecJacobs
Good managed database solutions:

[https://aws.amazon.com/rds/](https://aws.amazon.com/rds/)
[https://cloud.google.com/sql/](https://cloud.google.com/sql/)

~~~
antigen8
Thanks Alec, I'll be reading a lot about these today!!

------
markjspivey
for an open spec meant for your specific use case, check out "Tin Can API" /
"Experience API" / "xAPI":

[http://tincanapi.com/overview/](http://tincanapi.com/overview/)

specifically the idea of the "Learning Record Store".

it is the new open standard for learning, training, and experience, etc...
backed by the DoD and ADL to replace SCORM (previous standard used by things
like Blackboard and MOODLE and LMSs 'learning management systems').

let me know if you have any questions, i work and research in this space.

------
protomyth
Can you give a little more information on what exactly you want to do? My
"home" gig is a community college and we have done data gathering and storage,
but I'm not quite sure what your needs are.

~~~
antigen8
Hey protomyth,

Mine is a public high school with a desire to collect information about
student assessment, attendance, and other educational records from their
disparate sources together into one home base where the data can then be
explored using tools like tableau.

We have limited knowledge of databases, SaaS services, and managed database
products. What I think we need is a SaaS data hosting service with an aligned
mission, and I'm not sure how best to evaluate the offerings or what the
possible products might be.

I know that my ideal product is one which allows our teachers and office staff
to send arbitrary types of data to the store using internal webforms and such
so that they don't need to have a direct understanding of how to use, interact
with, or manage the data itself.

~~~
protomyth
Not sure what to tell you. We have a current project to do the same, but it is
still in the infancy stage and has no requirements other than retrieve the
data on demand (search is fine). I had an old project with a bit of structure
and that was quite a lot easier. Post if you find anything and I will do the
same as we move forward.

------
brooklyndude
And it all makes you wonder if the NYC school system is the next to be
disrupted. A budget of 25 billion ($$$) a year. That's 10X the total VC invested
in NYC last year.

Expenditures of $20K per student.

------
Drahflow
My school used the products from [https://iserv.eu/](https://iserv.eu/) They
sell a complete software solution for servers at schools.

------
oneJob
You don't want to build your own data store. You want a better solution than
the legacy system you're dealing with now.

I work contract jobs, which has given me the opportunity to see many different
approaches to IT infrastructure, team organization, workflow management, IT
solutions, etc, in companies from some of the largest in the world to mom and
pop businesses. I've only ever seen companies in the large to enormous range
engaged in the type of project you're proposing.

It wouldn't just require software engineers and software licenses, you'd also
need prob two business/data analysts with appropriate experience in data
warehouse design and software migration to draw up requirements (if you don't
want it to crap its pants 10 months into production use, thus making it the
new "Legacy System") and end user / operations documentation / training
material. Then you need the software engineers (prob one application engineer
and one database engineer) to turn those requirements into a working product.
Oh, and these people mentioned thus far will be too busy with this work for
3-9 months to be doing anything else, and then generally not stay on after
because there won't be any appropriate work for them once it's up and running
and they've trained the end users. Next you'll need at least one person whose
primary role is maintaining the system, even if 50% of their time is free to
do other stuff. They need to be able to drop what they're doing at all times
to address production issues, which will inevitably occur from time to time.
This is assuming your organization's needs are static and you won't be keeping
a developer on to iterate new releases in the future with new / different
functionality, to keep your shiny new system relevant and from becoming
"legacy".

Or, you can do what most of the business world does and get a SaaS solution.
Many SaaS (Software as a Service) vendors provide free / cheap licensing
options for educational institutions. Even if you can't find a solution which
offers a free license, for your use case it'll be cheaper in the long run to
go SaaS rather than attempting to bootstrap your own Dev shop, and paying for
all the mistakes that come along with that enterprise.

What you're looking for is an end user SaaS solution from a vendor that will
continue to innovate and has open-source at its core (these type of companies
generally don't leverage closed source code to lock-in customers, and if they
do they are at least easier to migrate away from). You're also looking for a
vendor whose mission is in line with your organization's mission. This
requirement is an intangible, but one that will pay massive dividends down the
road. Finally, steer clear of any solution that includes the first step of:
"Pay our consultant to come to your office to migrate your legacy system to
Solution XYZ (which we didn't actually design and build, but trust us, we're
certified)." I could make a living on just the contract jobs where I'm paid to
come in after these companies and clean up their mess or finish implementing
functionality they weren't able to implement before going over budget.
Additionally, this type of company almost never cares to understand the domain
specific issues or instance specific issues relevant to the customer. They may
"specialize in and only do educational software / institutions", but more
often than not, the result is boilerplate code from their last three jobs
dropped into your servers, totally ignoring the specifics of your system,
historical data, and historical work arounds.

To prevent this issue, and to do it right, you should consider either
hiring or training one employee to be full time on this software, and have
that employee be someone that has intimate knowledge of both your old system
and your organization's operations. This employee must also show a willingness
to adopt best practices and work from a place of consensus (both as relates to
technical AND organizational decisions), rather than someone who will
stubbornly implement what they are already comfortable with and refuse to
listen to anyone that is not as technologically savvy as they consider
themselves to be. This person should consider themselves a steward of the
school's system and its many stakeholders (and maintain buy in and seek the
input of the other stakeholders, who should include a senior administrative
"champion" of the project who also fosters consensus rather than deliver
decrees), and not consider the school a user of their system. This person
should aim to learn the solution inside and out, and develop an effective
workflow / ticketing system to see that the system is responsive to its end
users' needs. One of the first orders of business for this person should be to
document all aspects of their job. This documentation should be updated
whenever any aspect of the role changes. This is the so called "bus manual".
If that person is hit by a bus on the way into work any particular morning,
the mysteries of operating your system do not go with them. Morbid, but if you
only take two things from this post, let this be the first.

Insist on ease of use of the product and clarity of terms of software licensing
(so that you aren't surprised by unforeseen costs if you require scaling up,
adding new users, or making the system available via a web site, all things
which will inevitably happen). To avoid vendor lock-in and avoid paying shoddy,
expensive consultants to implement the system, insist, I say again, insist on
the following two points. If you take away only two things from this post, let
this two-part point be the second thing:

2a) An open data model and unfettered / unlimited API access to your data

2b) Excellent end user documentation, both for the API and the application
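
One way to read point 2a in practice: you should be able to pull every record
out in a plain format whenever you like. The sketch below is only an
illustration of that test, assuming the data happens to sit in a SQLite
database reachable through Python's built-in sqlite3 module. A real SaaS
product would expose its data through its own API, which is not shown here,
and export_all_tables is a made-up helper name:

```python
import csv
import io
import sqlite3


def export_all_tables(conn):
    """Return {table_name: csv_text} for every user table in the database."""
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
        )
    ]
    dumps = {}
    for table in tables:
        # Table names come from sqlite_master, not from user input,
        # and are quoted as identifiers.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        buffer = io.StringIO()
        writer = csv.writer(buffer, lineterminator="\n")
        writer.writerow([col[0] for col in cursor.description])  # header row
        writer.writerows(cursor)  # one CSV line per database row
        dumps[table] = buffer.getvalue()
    return dumps
```

If a vendor cannot offer something equivalent to this, moving away from them
later will probably be painful.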

I'd suggest starting with checking out Tableau, Socrata
([http://www.socrata.com](http://www.socrata.com)), and Informatica. You want
something that is as dead easy to use and as powerful as Tableau. And you want
something that is as awesome with your data as Socrata. I've worked with a Dev
from Socrata at a hackathon, I was really impressed. They've really got their
heart in the right place and they have an awesome product. Informatica would
be a more powerful/expensive alternative to Socrata, but likely overkill for
your needs. I'm sure there are other products out there that will work for
your organization with which I'm unfamiliar. Sit with and look over the
shoulder, for a whole day, of people that already implemented any solution
you're seriously considering. Ask them about their experience of implementing
it and utilizing it. Also speak with people that may have gone with another
solution, but started out with the same problem you're trying to solve. Ask
them what they learned, would have done differently, biggest headaches /
opportunities, etc. Don't be afraid to solicit guidance from someone in a
meetup or university through the process. I'm certain someone
would be happy to serve as a sounding board. Just be wary of anyone eager to
take an active role in the project. They are not a long term stakeholder, so
they should only be fielding questions and offering wisdom, not driving the
process, at all.

Best of luck! :) I have several family members who are educators; I wish you
success!!!

~~~
antigen8
oneJob,

I can't thank you enough for the depth and thoughtfulness of your response-
you're 100% right about looking for a SaaS solution, about finding one that is
aligned with our mission. I've been experimenting with Tableau for the last
two days and it strikes the perfect balance of incredible power and usability.

From what I can tell, Socrata is focused on open data and sharing data with
outside organizations. Their product is what we're looking for, their mission
is spot on with ours, but our data has to remain secure and private. We need a
socrata for internal use.

I've shared your thoughts with my administrators, thanks again

~~~
thoumasd
May I suggest you have a look at OpenDataSoft? This is a SaaS solution
which lets you publish data both externally AND internally with fine grained
access control. It has advanced data processing and data visualization
capabilities. Feel free to get in touch if you would like to give it a try.

[http://www.opendatasoft.com](http://www.opendatasoft.com)

Disclaimer: I work for OpenDataSoft.

~~~
antigen8
Thanks for mentioning this, I'll definitely check it out

------
a3voices
If you choose to build this, there is an opportunity cost because you could be
building something more important.

~~~
antigen8
a3voices-

It would be great if you could expand on this.

