Hacker News new | past | comments | ask | show | jobs | submit login
A survey of the current state of Golang ORMs (go-steel-programmers.org)
39 points by nnx on Oct 5, 2014 | hide | past | favorite | 48 comments

I read this presentation some months ago and I chose to stay with golangs builtin "database/sql". That way I'm using the purest solution with a well designed api. If I really wanted to use a 3rd party package I would choose sqlx because it stays close to golangs official sql package and carefully enhances it.

3rd party packages come and go but "database/sql" is here to stay.

What are you even talking about.

The database is hands down the worst in the entire standard library.

Importing a package to pick your driver? Side effects much? Who needs multiple drivers in the same scope anyway?

Consistent behavior or test suits for drivers? nah... why bother.

The database API was poorly thought out. That's why most people use no sql solutions, and what tools do exist for it (eg. test suites) are cobbled together 3rd party collections that have to be manually updated when a new driver is written.

I'm literally flabbergasted to hear someone singing its praises.

I'm confused. sql.Open's first argument is which driver you want to use, in string format. These drivers become available after they've been imported, and the side effects seem insignificant[1]. This seems to make it perfectly possible to use multiple drivers within the same scope. The drivers themselves also seem to have several unit test files, though I can't judge on the quality of those.

[1] https://github.com/go-sql-driver/mysql/blob/master/driver.go...

See... that's the thing. It might work. If the driver plays nicely. Or it might not. In fact, you have no way of knowing without reading the source code.

The sqlite driver used to not work with the postgres one. I don't know why. I never looked into it... but, that this can happen is pretty bad, I'd argue.

Packages that invoke code when you import them and require this sort of import:

    import _ "github.com/go-sql-driver/mysql"
Aren't idiomatic. They're hacks, because the database module didn't really consider how it was going to work properly.

I've found that https://github.com/jmoiron/sqlx (mentioned in the slides) works very well for SQL and mapping values to structs in Go. It's not an ORM though. You still write plain parameterized SQL queries, but mapping results to structs is very elegant.

The mapping process is very similar to how other packages in Go work (e.g. encoding/json), so if you already know Go it feels very natural.

Yup, totally agree with you.

I'm still of the opinion that ORM frameworks are just too much of an abstraction away from the database.

What's wrong with just writing pure SQL?

In my many years I've seen four reasons that stand:

1) Maintainability between the database and the software models. Specially when starting a new project, the database model can change a lot and it can be pain.

2) Every project I've seen using SQL DAOs, end up writing some sort of ORM. You can see this when you find a class called "Select" in your persistence code.

3) Not every developer is a good SQL developer.

4) Your persistence/DAO layer has to keep up with other stuff which an ORM usually provides: connection management, transactions, caching, key restrictions, error handling. This is a critical bug waiting to happen scenario.

I'm also not a big fan of ORM's (and actually have been trying to stay away from them in new projects) but most of the times it is the best solution.

3) Not every developer is a good SQL developer.

I find that developers are better SQL developers than their ability to successfully design a domain model.

After years of using Hibernate and NHibernate, considered best of breed and most well thought out and thoroughly engineered ORMS, I agree. They fall of a productivity cliff pretty quickly, particularly on larger projects (I'm talking things with 100+ domain objects). I've settled on:

Use CQRS (Command Query Response Segregation) and stuff ORMs where the sun doesn't shine.

Separate Update/Insert/Delete commands and queries. Commands return nothing and are assumed to work unless an exception is thrown, then the whole batch is rolled back. Queries return data. Transaction boundaries are arbitrary based on the requirement of the calling service, not the unit of work and/or session. By default all transactions are imperative.

Stored procedures, triggers and views are banned.

Commands are classes with properties attached (easy to validate, test and provide good encapsulation) and are internally represented as raw SQL with bound parameters.

Queries return enumerable lists of objects and are internally represented as a raw SQL query with bound parameters.

Everything is programmed to generic interfaces: IStorageCommand<T> and IStorageQuery<T>

That is all. I've used Dapper on .Net to do the internal representation. It is portable across non SQL stores as well and has been used to interface with cache and Redis as well.

Plus due to the loosely coupled abstraction, you can use AOP to stick read-through cache interceptors in between the IStorageQuery<T> interface and the consumer. Also you can stick cache invalidators between IStorageCommand<T> and the consumer.

Moving one of our core subsystems over to this with some careful design and DBA consultation knocked our 48 Core Xeon SQL box from 60% utilisation to 5% (mainly IOPS/storage throughput. CPU is a non issue there).

At the end of the day you're still doing object relational mapping.You just built something custom instead of using Hibernate,so you're still using an ORM.

Your domain objects are still a graph that need to be hydrated from the database. And dont tell me that writing hydrators by hand is fast and dry,especially when you have 100+ domain objects,compared to a few XML files or annotations that can later take advantage of query builders or HQL.

Fair point but it's a terribly light ORM behind an abstraction boundary that doesn't make assumptions about the type of store it is communicating with.

We don't write any hydrators. On our previous solution they are generated at compile time. Then they were replaced by runtime code generation inside Dapper.

There is no XML, there are no query builders, there is no HQL. There are too few guarantees in each of those technologies.

> Stored procedures, triggers and views are banned.

So... NoSQL? I'm only half-joking -- I find it strange how people will use an SQL DBMS, and yet choose not to use the main features. Model, Normalize/refactor, then apply views/triggers/stored procedures to present a simple "API" (or maybe just "view") of your datamodel to the application?

If you don't use the DBMS for modelling the data, do you really need it for storing your data (obviously, one valid answer is "no" -- see "NoSQL-movement", Object DMBS, Document DBs etc...).

Actually we use both. Both have their advantages so we use SQL Server and Redis.

The data model is what we're exposing, not the table model.

(database here is shorthand for your SQL RDBMS of choice)

Out of curiosity, where do you fall in terms of having business logic in the database via functions and triggers? Why do you feel that way?

I'm kind of of the opinion that database should only store data, maintain relations, and handle filtered querying (read-only)...anything more complicated should be handled by application-side logic. What's your stance on that?

My stance is exactly the same as yours. All the logic sits in the service layer in front of this abstraction and nothing touches anything but the service layer. Nothing touches the tables at all unless it goes through both layers. This was a massive battle to get the organisation to accept.

I've seen many many piles of fail where all the logic is pumped into the database where possible. The result is making scale up expensive and scale out impossible. Also where people do ETL straight into the tables resulting in constraint failures outside the limited capability of the database engine.

No we killed those expectations, everyone sees the benefits.

Awesome, thanks for the response--I'm not a DBA, so I'm having to try and play catchup. :)

How do you (or others?) feel about foriegn keys? They seem like a really good idea for read-heavy workloads and for documentation and maintenance and interop with tools, but for write-heavy loads leaving them off seems to have non-trivial performance benefits (reference case is 9.2 PGSQL, if it matters). Again, my instincts are that they're probably the right thing to do, and that removing them later is going to be easier (if it turns out we need to) than trying to shoehorn them in too late.

From my perspective, foreign keys are important as they are the last guarantee that two bits of data belong to each other. Also they do have a positive performance impact in JOIN scenarios.

If FK performance is an issue, then there are better solutions before you break them (denormalise on write etc).

> knocked our 48 Core Xeon SQL box from 60% utilisation to 5%

Do you believe it was the move to CQRS specifically that caused the drop in utilization or mostly other factors (caching, the DBA consult)? A quick read through of CQRS makes it seem like its mostly a way to organize your data access code and should have little affect to performance.

Yes definitely. The previous solution was thoughtless and poorly abstracted so there were queries with tens of joins all of which multiplied IOPS and index lookups considerably. The dataset is too large to fit in any reasonable amount of RAM (2.2Tb) so we rely on fast IO. Utilisation went down to about 10% after refactoring this.

The caching is the icing on the cake and is used to avoid execution latency rather than reduce IO as it can run async with the database queries. It deals with only the most common datasets due to the complexity of what looks like even simple cache invalidations.

For ref, I'm a DBA as well as a pile of other things...

Yes, nice, same here.

One question out of curiosity - on your IStorageQuery interface what methods do you have? Do you have the usual by Id's etc or do you have a query object that you pass in that can build/translate to the relevant SQL?

Query object. The interface is as follows (I simplified a bit above:

    public interface IStorageQuery<T,Q> where Q : IStorageQueryCommand
        IEnumerable<T> Query(Q query);
IStorageQueryCommand is an empty marker interface.

Internally this is all bound to the query by property name.

You should write a blog post.

I'd need a blog for that and the time to do it :)

Not a bad idea though.

CQRS/Event Sourcing sounds like a good idea, but not something easy to introduce into an enterprise, unlike e.g. packaging or micro-services.

CQRS and event sourcing are separate concepts. I wouldn't introduce event sourcing into any product. It makes consistency very hard and expensive; more so than to ignore it and buy much larger machines. In fact, after watching people follow Udi Dahan down the ES hole a couple of times, scrap the architecture entirely and keep the CQRS bit it's a proper computer science ghetto I don't want to see again.

Packaging and micro-services have a lot more promise if you ask me, so I'm all for that. Unfortunately the complexity of some systems, particularly in the financial sector, tend to make micro-services impossible as the smallest possible abstraction is huge and everything is tangled like spaghetti.

It's repetitive (because it doesn't support abstraction well enough), the tooling is poor, and it's difficult to test. If I want to write a function to retrieve these three objects based on these criteria, it should be possible to separate the logic from the actual retrieving from the database. And the interface between those two should be something more structured than String.

For one point of data, I found migrating a medium-sized Django web app from MySQL to Postgres was trivially easy in the parts where I'd used an ORM, and annoyingly difficult in the parts where I'd dropped to SQL for performance.

> What's wrong with just writing pure SQL?

What's wrong with wanting to abstract something that only the database care about? when you want to read a file in go,do you execute a shell command like "cat myfile.txt" and pipe the result? or do you use a pure go api ?

Each time you do an http request,do you send headers and body through tcp sockets as text,or do you use an http api?

I'm no sure the point your making, but you appear to be agreeing with the OP.

Go's os.Open function is a thin veneer around the open system call, which is the same thing cat uses.

The HTTP package provides as thin a layer over actual HTTP as possible.

The story with database/sql is the same.

my point is ,you dont write raw tcp text commands when you do an http request in Go(well you can,but what's the point when... )...There is an abstraction layer,you deal with "objects",not HTTP text commands directly.

I dont see why it's nice to have an abstraction for HTTP but as soon as it comes to RDBMSs, string commands should be the way to go...

Your analogy is flawed. Using a httplib for http is like using a database driver for SQL. An ORM for http would be something like Ruby's Active Resource or some other abstraction that his all URLs and query strings from you.

Because it's a massive impedance mismatch for most nontrivial cases. Relational and object systems have never played particularly well together. The contention is, and it's one that I agree with, is that the tooling that inevitably builds up to address this impedance mismatch is too hard to usefully generalize in a way that doesn't end up taking over your project.

(This is why I dig mappers for document-based databases and avoid them for relational ones.)

The whole purpose of ORM's /is/ abstraction.

The idea is to take the raw concepts of data atoms and abstract them into programmatic entities. This way, you can work with the data directly and not need to worry about SQL semantics and nuance.

Additionally, there's the side effect of encouraged utilization of DB server functionality/optimization (views [materialized and direct], triggers, procedures, enforced foreign keys, indices, etc).

Alternately to my sibling comment here, how about reading (and rebutting if you see fit) [slide 1][1] of the link before commenting on it. -.-'

[1]: http://present.go-steel-programmers.org/talk-review-orms/gsp...

Why stop there? You should just provide the query opcode program directly to the db!

Point being, different folks like to work at different levels of abstraction.

Yes, you can use a relational database as a dumb object store, but do you want to? There seems to be a great fear of SQL among programmers, which is puzzling, because the basic operations aren't very complicated.

There's an annoying mismatch between SQL and data structures. SQL's approach of making all queries a big string is a pain, especially since careful escaping is needed to avoid SQL injection vulnerabilities. However, in Go, it's not too hard to get data in and out of SQL as an "[]interface", which is a mixed array of any type. Converting that to a struct is straightforward. This comes up often enough that Go should have compiler support for struct <-> []interface. It can be done through introspection, as some of the ORMs do, but that's slow. The layout of the struct is fixed at compile time, so this can generate fast code, mostly type checks and moves. In general, we should have more compiler support for marshalling. It's done a lot, it's easy to compile, and it's slow to do interpretively.

If we had that, much of the need for the ORMs described would go away.

It's worth noting that this presentation is nearly one year old.

I recently started using gorm [0] and have found that it meets my needs and works nicely, and the maintainer jinzhu has been helpful and pleasant to interact with. So with that being said, it's still an ORM and comes with the usual laundry list of caveats.

[0] https://github.com/jinzhu/gorm

Do you mind sharing this laundry list? Ie in summary what is wrong with ormappers?

Look for Martin Fowler's ORM Hate and the discussion around it on HN. Read the paper he links to ("The Vietnam of Computer Science"), and perhaps search HN for ORNs or your favorite particular ORM. It really is a laundry-list.

I just updated the go-modeldb library, for anybody who wants to use a non-ORM solution.


You can see it used extensively in this sample application:


But, it looks like sqlx may be a better alternative.

I use https://github.com/coopernurse/gorp on daily basis. But I find it doesn't support Postgres array type. Do you know what ORM is supporting it?

As far as I know no ORM support arrays natively. I have a few projects made around sqlx, and I had to support arrays. So I ended up doing the support myself.

I see. Do you have it open sourced somewhere? Or do you have some references on this subject for me to explore? thanks.

I started with this Gist. https://gist.github.com/adharris/4163702

But ended up with a custom type implementing Scan & Value.

Check this out: https://groups.google.com/forum/#!topic/golang-nuts/aPJVwRQq...

Good luck :-)

"Supports SQLite, MySQL, PostgresSQL, MongoDB"

we're just never going to learn....

Never going to learn what?

Could you please make your comment explicit?

I've stumbled upon this presentation while researching about "idiomatic" Go support for Postgres 9.4's new Binary JSON type.

Any recommendation?

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact