
For someone who knows nothing about NoSQL but is decent with MySQL, can someone give a brief idiot's overview of how NoSQL works? If I add a record to, say, a table named "news", where is the data stored? If I need to search by news id or description, what is the front-end API like, and what happens at the backend when the API is called?

I would very seriously recommend reading the paper on Amazon Dynamo, which was a major influence on the design of Apache Cassandra: http://s3.amazonaws.com/AllThingsDistributed/sosp/amazon-dyn...

PS: Why aren't we in the habit of citing papers like this? So many links are to websites rather than direct sources, and the info in papers is usually highly readable and extremely informative about the details.

I agree so much! Most blogs fail to summarize such papers properly. And if I want a summary or introduction, I can find those in the papers, too.

The same goes for RFCs and W3C documents. All that stuff is written very well, both technically and in terms of readability. Why put a layer of blogs and websites around them, which actually lowers the perceived quality?

Also, I'm frequently annoyed by articles that write about what some other people (e.g. RMS) wrote. These aren't much shorter than the source article, and aren't written nearly as clearly as the original.

I appreciate it when an author just links to the source and elaborates on the topic, stating his own opinion, and prefers to quote rather than paraphrase what's already clearly written in the source article.

http://wiki.apache.org/cassandra/ArchitectureInternals links some other papers that influenced our design too.

But yeah, Dynamo is the most important of those.

I'm new to this myself but I'll attempt an explanation. Then someone can correct me if needed :)

The basic idea is that data is stored as key > value. You can sort of relate it to the MySQL practice of denormalization and storing a bunch of data from different tables as an array in one cell.
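To make the key > value idea concrete, here is a minimal sketch in Python that uses a plain dict as a stand-in for the store. The `news:<id>` key scheme, the field names, and the `put`/`get` helpers are all made up for illustration; they are not the API of any particular NoSQL product.

```python
import json

# A plain dict stands in for the key-value store (assumption: a real
# system would persist and distribute this, which is not shown here).
store = {}

def put(key, value):
    """Serialize the whole record and store it under one key."""
    store[key] = json.dumps(value)

def get(key):
    """Fetch and deserialize a record, or return None if the key is absent."""
    raw = store.get(key)
    return json.loads(raw) if raw is not None else None

# A denormalized "news" record: the author and tags are embedded in the
# value instead of living in separate tables that would need a join.
put("news:42", {
    "title": "NoSQL explained",
    "author": {"id": 7, "name": "alice"},
    "tags": ["databases", "nosql"],
})

article = get("news:42")
print(article["author"]["name"])
```

A lookup by news id is a single key fetch. A search by description has no equivalent here: you would have to scan every value or maintain a separate index yourself.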

You don't want to (or can't) do joins because the data is not normalized. Instead, you compute the results you want separately in a batch process called map/reduce.
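As a rough illustration of the map/reduce idea, here is a tiny single-process sketch in Python that counts articles per tag. The records and function names are hypothetical, and a real system would run the map and reduce steps in parallel across many machines, which this does not attempt.

```python
from collections import defaultdict

# Hypothetical denormalized records, as they might come out of the store.
records = [
    {"id": 1, "tags": ["databases", "nosql"]},
    {"id": 2, "tags": ["nosql"]},
    {"id": 3, "tags": ["databases", "mysql"]},
]

def map_record(record):
    """Map step: emit one (tag, 1) pair for each tag on a record."""
    for tag in record["tags"]:
        yield tag, 1

def reduce_counts(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def map_reduce(records, mapper, reducer):
    # Shuffle: group the emitted values by key before reducing.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

tag_counts = map_reduce(records, map_record, reduce_counts)
print(tag_counts)
```

In SQL this would be a GROUP BY over a join table. Here the grouping is done by the batch job, and the result can be stored back under its own keys for fast reads.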

It all boils down to this one fact: NoSQL is write heavy in order to improve read performance. Because your data is stored in multiple places and you do batch processing ahead of time to get it into the right format or to precompute values, you save time when the user requests it. This is one of the reasons it's good for scalability. The other reason seems to be that denormalized data is easier to scale out than normalized data.
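A small sketch of that write-heavy trade-off, assuming a hypothetical "articles by tag" view that is kept up to date on every write. The names are invented for illustration, and in-memory dicts stand in for the actual storage.

```python
from collections import defaultdict

articles = {}                         # primary copy: article id -> record
articles_by_tag = defaultdict(list)   # precomputed view: tag -> article ids

def add_article(article_id, title, tags):
    """Write path: store the record, then update every view that needs it."""
    articles[article_id] = {"title": title, "tags": tags}
    for tag in tags:
        articles_by_tag[tag].append(article_id)

def titles_for_tag(tag):
    """Read path: one lookup in the view, then one lookup per article."""
    ids = articles_by_tag.get(tag, [])
    return [articles[article_id]["title"] for article_id in ids]

add_article(1, "Intro to Cassandra", ["nosql", "databases"])
add_article(2, "MySQL tuning", ["databases"])
print(titles_for_tag("databases"))
```

Each write touches several places (the record plus one view entry per tag), but the read needs no join and no scan.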

You should also look up CAP theorem and NoSQL.

A few links that I've been saving to teach myself:
- http://stackoverflow.com/questions/1189911/non-relational-da...
- http://stackoverflow.com/questions/2170152/nosql-best-practi...
- http://www.linux-mag.com/cache/7579/1.html
- http://blogs.computerworld.com/15510/the_end_of_sql_and_rela...

You can try MongoDB and Redis in your browser, with these excellent interactive tutorials:



The above site has been discussed today on HN: http://news.ycombinator.com/item?id=1181714

The mongo shell above needs a Stop Eval/Reset button, as it can get into a non-responsive state after a bad parse:

    > db.scores.find({'sdf':)
    >> })

Also, there is no copy/paste in the browser version, AFAICT.

SQL is relational. Databases that fall into the NoSQL camp relax the relational integrity constraints to get performance and scaling benefits.

Think BASE (Basically Available, Soft state, Eventual consistency) instead of ACID (Atomicity, Consistency, Isolation, Durability).
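Here is a toy sketch of what "eventual consistency" can look like, in Python. It assumes two in-memory replicas and a simple "highest version number wins" rule; that rule is a simplification chosen for illustration (the Dynamo paper, for example, describes vector clocks), and the `Replica` class and `sync` function are hypothetical.

```python
class Replica:
    """One copy of the data; each value carries the version that wrote it."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Keep the incoming value only if it is newer than what we hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

def sync(a, b):
    """Exchange entries in both directions so the two replicas converge."""
    for key, (version, value) in list(a.data.items()):
        b.write(key, value, version)
    for key, (version, value) in list(b.data.items()):
        a.write(key, value, version)

r1, r2 = Replica(), Replica()
r1.write("news:42", "draft", version=1)
sync(r1, r2)
r1.write("news:42", "published", version=2)  # only r1 has seen this write

stale = r2.read("news:42")   # r2 still serves the old value (soft state)
sync(r1, r2)
fresh = r2.read("news:42")   # after syncing, the replicas agree
print(stale, fresh)
```

Between the second write and the sync, a reader hitting r2 gets an out-of-date answer. An ACID database would not allow that, but it would have to coordinate the replicas on every write to prevent it.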

eBay's paper on BASE: http://queue.acm.org/detail.cfm?id=1394128

(I believe this is where the term originated)

This gives a little background and might help answer some of your questions (I have similar questions to yours): http://www.facebook.com/note.php?note_id=24413138919

There isn't an answer that fits all of the different systems called 'NoSQL'. While you can count on any common RDBMS to be accessible through SQL, there is no 'NoSQL' language or specification. In general, access is somewhat like using an ORM.

The data is stored in files of a special format, just like an SQL database.
