
Ask HN: SQL or NoSQL? - steinsgate
I need to analyze some football match data using a neural net. The data is in XML, one XML file per match, and the files are really dirty. Does it make sense to extract the data from the XML and structure it in a SQL DB first? Or is it advisable to work directly with the XML files? If anyone has experience with this, I would really like to hear about it.
======
rubyfan
This is less SQL vs. NoSQL and more that you just need to extract a usable
data set.

XML in the raw is not directly suitable for modeling or analysis. NoSQL
solutions that query or otherwise make XML accessible are probably not worth
the added trouble of their administration overhead.

Extracting the XML into tabular data sets is simple, and those can then feed a
model, database, data frame or anything else.
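
As a rough illustration of that extraction step, here is a minimal Python sketch using only the standard library. The element and attribute names (`match`, `team`, `side`, `name`, `goals`) are made up for the example, since the real schema isn't shown in the question, and it assumes the files are at least well-formed XML. Values that are missing or non-numeric become `None` rather than raising.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout; the real match files will look different.
SAMPLE = """<match id="m1" date="2015-03-01">
  <team side="home" name=" Arsenal " goals="2"/>
  <team side="away" name="Everton" goals="n/a"/>
</match>"""


def to_int(text):
    """Return an int, or None when the value is missing or not numeric."""
    try:
        return int(text.strip())
    except (AttributeError, ValueError):
        return None


def match_to_rows(xml_text):
    """Flatten one match document into one dict per team."""
    root = ET.fromstring(xml_text)
    rows = []
    for team in root.iter("team"):
        rows.append({
            "match_id": root.get("id"),
            "date": root.get("date"),
            "side": team.get("side"),
            "team": (team.get("name") or "").strip(),
            "goals": to_int(team.get("goals")),
        })
    return rows


rows = match_to_rows(SAMPLE)
```

A list of flat dicts like `rows` can go straight into a CSV writer, a data frame constructor, or a database insert.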

------
geophile
A lot depends on what kind of cleanup you need to do. If each cleanup step can
be done in the context of a single XML file, then your simplest approach is
probably to skip databases completely, just process one file at a time in your
favorite language.

If you need set-oriented operations, it's hard to imagine you can do better
than use SQL, although that presumes that you have normalized the XML into
SQL, which may or may not be trivial. Depends on the structure of your XML
docs.
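
To make "set-oriented" concrete, here is a small sketch with Python's built-in `sqlite3` module. The `team_match` table and its columns are an assumed schema, not something taken from the actual files; the point is only that once the rows are normalized, an aggregate across all matches is a single query.

```python
import sqlite3

# Hypothetical rows, as they might look after extraction from the XML files.
rows = [
    ("m1", "Arsenal", "home", 2),
    ("m1", "Everton", "away", 1),
    ("m2", "Everton", "home", 0),
    ("m2", "Arsenal", "away", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE team_match (
           match_id TEXT NOT NULL,
           team     TEXT NOT NULL,
           side     TEXT NOT NULL CHECK (side IN ('home', 'away')),
           goals    INTEGER,
           PRIMARY KEY (match_id, side)
       )"""
)
conn.executemany("INSERT INTO team_match VALUES (?, ?, ?, ?)", rows)

# Set-oriented step: totals per team across all matches.
totals = conn.execute(
    """SELECT team, COUNT(*) AS played, SUM(goals) AS goals_for
       FROM team_match
       GROUP BY team
       ORDER BY team"""
).fetchall()
```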

After the cleanup: hard to say what's best, but it depends on the structure of
the data, what you want to do with it, and how much of it there is. To me, SQL
is nearly always the tool of choice, unless you have requirements for data
volume or data structuring that are incompatible with it. On the latter point (data
structuring), since you have XML, I would guess that it would not be difficult
to define a SQL schema. I.e., the schemaless aspect of NoSQL systems might not
be important for you.

~~~
steinsgate
Thanks, that analysis helped me a lot.

------
saluki
I'm working on a fantasy football (NFL) app.

You might need to write an importer that parses your XML and cleans it up.

I just set up an importer for weekly game results. I cut the data off the
website that hosts our league and paste it into a form, select the season and
week and then click import. It cleans up the text, sets the teams, gets the
points, etc. and inserts it into MySQL.

I basically then display the data, run some calculations, etc.

This app is for a history of our league so we have our results if we change
providers, etc.

As for the best fit for you: what is your goal? Do you want to access this
data, run reports and show historical data? If so, I would extract it to MySQL.

Or if you're just analyzing a single match, maybe import the XML file and show
the results as a one-time process.

Good luck.

~~~
steinsgate
Our final goal is to predict results using deep learning. Good luck to you
too!

------
systems
You need to ask yourself this:

       * do you want to create a database of the match data
       * are you more comfortable with SQL than with other languages
       * are you more comfortable with NoSQL
       * do you have technical (i.e. performance) constraints or requirements
       * will others need to access and analyze this data
       * what reporting tools do you plan to use
       * what reporting tools are your users used to using

Once you start asking yourself the right questions (i.e. analyzing), the
answer should present itself to you.

------
einhverfr
Depends on what "analyze data" means. Relational math is _wonderful_ for many
kinds of data analysis, so my first inclination is to assume that SQL would be
preferable.

But on the other hand... It sounds like this is training data for a neural
net? In that case maybe working with the XML directly makes more sense?

In the end, I would probably go with SQL to start with just because the
analysis features of relational databases would provide some really nice ways
to check results. That may be secondary but it is significant.
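
For instance, a couple of sanity checks of the kind I mean, sketched with Python's `sqlite3`. The `team_match` table and the sample rows are invented for the example; the real columns depend on how the XML gets normalized.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_match (match_id TEXT, team TEXT, side TEXT, goals INTEGER)"
)
conn.executemany(
    "INSERT INTO team_match VALUES (?, ?, ?, ?)",
    [
        ("m1", "Arsenal", "home", 2),
        ("m1", "Everton", "away", 1),
        ("m2", "Everton", "home", None),  # score missing in the source
        ("m2", "Arsenal", "away", -1),    # impossible value
        ("m3", "Chelsea", "home", 1),     # away side was never extracted
    ],
)

# Check 1: every match should have exactly one home row and one away row.
incomplete = conn.execute(
    """SELECT match_id
       FROM team_match
       GROUP BY match_id
       HAVING SUM(side = 'home') <> 1 OR SUM(side = 'away') <> 1
       ORDER BY match_id"""
).fetchall()

# Check 2: goals should be present and non-negative.
bad_goals = conn.execute(
    """SELECT match_id, team
       FROM team_match
       WHERE goals IS NULL OR goals < 0
       ORDER BY match_id, team"""
).fetchall()
```

Running checks like these before training makes it easier to tell a bad model from bad input.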

~~~
steinsgate
Yes, this is indeed training data. But you are absolutely right, we would
definitely need to inspect and analyze the data for sanity checks. Therefore
storing to a DB that has nice analysis features probably makes sense. Thanks
for your answer!

------
cauterized
You should definitely extract the data from the XML. Whether to extract it
into a SQL or a NoSQL DB is a separate choice. You're likely to want a DB
regardless: indexing makes lookups more efficient than scanning XML, even with
a document store, and a database is more flexible than custom native classes
and data structures (though you may also want to build those on top of the DB
layer).

