Hacker News new | comments | show | ask | jobs | submit login

It's even harder to VERIFY correctness inside the DBMS with respect to keeping it managable and debuggable. That's why we pull it out into code behind an ORM. We can run a fully automated test suite in 30 minutes which will verify that the application will function as expected from the domain model.

Believe me, we still need ACID/MVCC and by no means do I want to break the consistency part of CAP theorem if possible.

Also set operations break down under the following conditions:

1. when you have to aggregate a LOT of data together they don't scale. If it scaled well we wouldn't need OLAP solutions and could do everything with live, perfectly normalised data. That is just not the way it is.

2. They really don't work for temporal or complex non-uniformly structured data at all well.




It is not harder to verify correctness inside the RDBMS. Rather, it is easier because RDBMSs are highly engineering and extensively tested.

You say you still need ACID, but you advocate manually writing joins on top of an ORM layer which does not give you ACID and introduces massive latency. Using an ORM on top of an a key value store doesn't absolve you of ACID or having joins. You still do joins and transactions, but you need to write them manually and test correctness manually. Both are very expensive to do correctly and running some unit tests is not sufficient to show correctness at this level. You are writing these things without recognizing what they are.

The whole NoSQL movement is fundamentally about trading correctness for performance, and usually read performance at that. There is nothing else new about NoSQL databases and ironically many even use a variant of SQL for queries! SQL is just a language, not an implementation.

As for aggregates, yes they are hard. By doing it yourself you are not making the problem less hard. Rather, you are possibly implementing solutions that RDBMS's can't yet do. I suspect that with a reasonable relational design you could probably do whatever aggregates you need to in a high performance manner while still taking advantage of the work done in an RDBMS.

It sounds like you don't understand the requirements of your application. If you need full ACID, your solution with nosql + ORM + manually written joins and queries is failing you. Unless, of course, if you are also writing your own global lock or using some form of lock-free algorithm.


The ORM layer does give you ACID or rather it doesn't take it away. It does joins and transactions. It's just an abstraction. I suggest you read up on PoEEA [1] as I don't think you understand what an ORM is properly or you've had experience of some rubbish ones like ActiveRecord or Django's ORM. See Unit of Work pattern, identity map etc to start with and work back.

NHibernate, SQLAlchemy, Hibernate are where you need to look and compare.

The ORM does a few joins where a performance gain is possible but we avoid them though as they are expensive on IO and transfer.

Most of our tables match the UI 1:1 and we populate these from the domain model when it is mutated or when we can afford it to be done (either directly or via message queues). Effectively they are materialized views which are updated when data changes or before it is required. Google CQRS.

I'm not suggesting NoSQL - just caution when using RDBMS features. I'm not an advocate of NoSQL.

[1] http://www.amazon.co.uk/Enterprise-Application-Architecture-...


One can't just say "joins are expensive on IO and transfer"; you have to specify exactly what kind of join you're talking about and how it's being used. The way an INNER JOIN will influences how rows are retrieved is completely different from how an OUTER JOIN or CROSS JOIN will.

In all those cases, the devil remains in the details. For example if you're just looking to add one or two columns to the result set, an INNER or OUTER JOIN usually reduces the amount of data that has to be shoved through the pipe. That's because you avoid transmitting the keys that are needed to join the data in-app. On the other hand, if you're joining a small number of larger values to a table with lots of rows, perhaps the join duplicates those values enough to really bulk up the result set. But it's effectively impossible to know whether that will be the case without measuring the performance for a couple different options.

Here's an alternative consideration: ORMs tend to encourage a style of coding wherein every column in the table is always grabbed all the time, regardless of how much of that information is actually needed for the task at hand. This is done in order to be able to generate full-fledged objects like the ORM wants to, and to reduce the amount of custom mapping that needs to be maintained. Now if you're looking for a needless waste of bandwidth, there's a needless waste of bandwidth.

If you're consistently having performance problems on joins, it's likely that the real culprit is poor configuration of keys and indexes. Or if you're not watching how the ORM chooses to join the data too closely, perhaps you've got some N+1 queries sneaking in there.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: