
Ask HN: Help needed on database schema designs - antoaravinth
Hi HN,<p>I work as full stack developer, but never really spent time on Database designs. I had written code to retrieve data out of DB, but never ever created a schema from scratch. So, I thought of doing some side project to understand how database design involves.<p>I had decided to create a Resume Builder Application. Here are some of the features that I had thought about:<p>* User can sign up to create his&#x2F;her own resumes.<p>* User can have multiple resumes. (only applicable for paid users, for free users only one resume)<p>* Resume can be built from the UI (from available components) and user can save it as draft.<p>* Resume can be published via url.<p>* Published resumes stats like visitors count needs to be calculated.<p>* User can delete the resume.<p>* User can upload his&#x2F;her pic.<p>* User can add description about themselves.<p>Given all these, I have come up with this schema:<p>https:&#x2F;&#x2F;ibin.co&#x2F;3CoJEfeQTuAp.png<p>Here are the questions I have:<p>1. I have a general knowledge on RDBMS, so my schema targets databases like Oracle, MySql etc. But can we choose NoSql for this problem? If so why?<p>2. Is my schema correct? Does it work on large data with ease?<p>3. I thought of saving the resume layouts (layouts for each resume is decided by the user via say UI) in JSON format. Is it a good practice to use JSON datatypes?<p>4. ResumeData table serves to store the user data for each resume. Is it a good to have a separate table for the same?<p>5. If the answer for the first question is RDBMS, which database you will prefer? I had decided to use PostgreSql.<p>Thanks a lot for spending time on answering these questions.
======
brudgers
It is probably easier to design a schema from a data-centric viewpoint rather
than a user centric viewpoint. Schema are ontologies and ontologies answer the
question, what the hell is it?

For example, a resume contains photo*. CRUD by the user is no different that
CRUD by the site owner and no different from CRUD by a recruiter. Differences
don't happen at the schema level they happen at the application level:
recruiters and users and administrators might all stored in the people table
and the application grants them different CRUD privileges and generates
different views.

Ontology is largely orthogonal to whether the persistence is in a relational,
key-value, or graph backing store.

Good luck.

~~~
antoaravinth
That's a very interesting reply. I have never heard about Ontology w.r.t
database. Thanks!

~~~
brudgers
The term is common when discussing meta-data. What is a database but meta-data
about an application that describes the state of the mechanism (just as the
source code describes the mechanism's bits and pieces)?

Anyway, it was sort of where I landed when I started writing my comment about
data-centric design. Seemed like a good abstraction and I'm on my second mug
of coffee.

~~~
antoaravinth
Thanks brudges. One noob question, how should we select our database? For
example we should go ahead with RDBMS or NoSql? Are there any tips for the
same? Or does it often comes via experience? Or for my problem which database
is good, RDBMS or NoSql?

~~~
brudgers
With the caveat that I'm a pundit not an expert.

Relational databases are built on the mathematical model described by the
relational algebra. There is approximately forty five years of literature,
thirty five years of wide-spread use, and three open source projects:
Postgres, MySql, and SQLite with approximately two decades of use.

NoSQL does not have a mathematical basis. It is a term invented for a
conference about non-relational databases. There are mathematical bases for
many things that are called 'NoSQL' and good products and technologies behind
them. But there is no commonality between them.

SQL unites relational databases and makes the practical differences between
relational databases differences of features rather than architecture. NoSQL
databases differ in architecture: key value stores versus graphs versus semi-
structured data.

There is a well understood and common meaning of 'database schema' when
referring to relational databases. While a particular NoSQL database might
have a strong notion of schema (or might not) it will differ from NoSQL
database to database and among them the common notion of schema is very
abstract at best and almost certainly grounded against the definition of
schema in the relational database world.

In terms of learning, the relational model seems like a better starting point:
better learning resources, mature products, mathematical structure, and
constituting the common basis for discussing databases in general.

On the relational side I'm probably biased toward Postgres as a starting point
because there is more signal to noise than with MySQL. It's not that there's
anything wrong with MySQL it's that the success of the LAMP stack means there
are lots of 'How to build a potato chip rating website with MySQL' blog posts
whereas with Postgres there is a higher likelihood of useful technical detail.

That's not to say that a resume site needs a relational database (or a NoSQL
database). Files on disk would probably work and that might be an interesting
alternative exercise.

~~~
antoaravinth
Thanks a lot brudgers!

~~~
brudgers
You're more than welcome. I had one more tangential thought about databases
last night. The files idea sort of got synapses firing.

Frameworks like Ruby on Rails abstract away the a lot of the complexities of
schema design with objects and Classes. There is a full relational database
schema in a Rails application and the programmer pretty much is discouraged
from touching it directly.

So that's another approach.

~~~
antoaravinth
Yeah that's right. I have worked with Groovy and Grails which has the similar
philosophy.

------
mhd
You might want to look into versioning your resumes. A 'version' column in
your ResumeData could be sufficient. And I wouldn't store images as blobs.

If resume components are highly volatile and mostly text, I'd probably go for
a simple KV table, instead of adding new columns to ResumeData. Then you could
also put versioning there. Alternatives would be a JSON column or a proper
table-per-component schema.

I wouldn't worry about "large data" if you're going with Postgres. You'll
mostly enter that territory by storing lots of changes and updates, which just
doesn't happen that much with a traditional

~~~
antoaravinth
Thanks for your answer. I have few questions:

>> You might want to look into versioning your resumes. A 'version' column in
your ResumeData could be sufficient.

I didn't quite get this. Why do we need to add 'version' column to ResumeData?
Lets say when user updates the resume, my intention was to just replace the
data accordingly (as now you have told, I might use JSON column and just
replace it with the value). In this case why 'version' is required?

And why not storing images as blobs? Any drawbacks?

But thanks for your insights, it really helped me.

~~~
mhd
Even if you don't want to let (paid?) users access their previous revisions,
it's an easy way to have basic undo functionality. You'll get "Help, I deleted
my whole work history and don't want to enter it again!" support requests or
disappointed users ("I can just hit Ctrl-Z in Word, why do I bother with
this?"). Take care of stuff like this in the beginning and it will save you
some headache later.

Images as blobs for webapps are generally a bad idea. You can't do a lot with
them in databases (browse, analyse, join etc.). There's some overhead if you
fetch them from the DB for each request, so they'll end up in some file-based
cache anyway. Save that step and just put the path or filename in the DB. Or
if you've got one image anyway, just ditch the column and have
/img/users/<db_pk>.png.

~~~
antoaravinth
Thanks a lot! It really helps!

