Poll: What database does your company use?
697 points by daniel_levine on June 22, 2011 | 355 comments
Upvote please if you think it's an interesting question so that more people will respond

Last year I asked this question (http://news.ycombinator.com/item?id=1411937) and I think it was useful to a bunch of people. Figured it's worth asking again and the diffs will be interesting.

2432 points
1239 points
839 points
683 points
535 points
530 points
521 points
CouchBase (Couch, Membase included)
123 points
115 points
46 points

SQLite all over the place – it's great having a super portable DB format for quick little hits.

There's nothing quite like sending a DB as an email attachment.

SQLite is so pervasive that I'm pretty sure everyone uses it at some point. It's in client applications, it's used by yum (the package manager found on CentOS/RHEL/Scientific Linux systems), it's part of many web applications, it's part of many spam filtering systems, it's part of Android (maybe iOS, too?), it's basically impossible to avoid it if you're a nerd.

Also, it's awesome.

SQLite is indeed built into iOS and Mac OS X as well. Core Data's store uses SQLite as an option (XML and Binary are the others), and it is accessible via third party frameworks like FMDB.

It's also built into Android.

Unfortunately, it's quite slow on Android. And make sure never to put your db on the sdcard, or else your whole phone becomes unusable while doing writes to the db.

SQLite is slow overall if you write item-by-item, no matter what platform you use. Write multiple items in a single transaction, and understand what you gain and what you lose by writing all items in a single transaction.
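A minimal sketch of the difference, using Python's built-in sqlite3 module (the table and data are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway demo database
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")

names = [("apple",), ("banana",), ("cherry",)]

# Slow pattern: one implicit transaction (and one fsync) per row.
# for name in names:
#     conn.execute("INSERT INTO items (name) VALUES (?)", (name,))
#     conn.commit()

# Fast pattern: batch all rows into a single transaction.
with conn:  # opens a transaction, commits on success, rolls back on error
    conn.executemany("INSERT INTO items (name) VALUES (?)", names)

count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
print(count)  # 3
```

The trade-off mentioned above: a crash mid-batch loses the whole transaction rather than just the last row, which is the "what you lose" part.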

And iPhones, and BlackBerries. It is by far the most widely used relational DB in the world (despite what the MySQL people claim on their website, they're not even close).

Firefox uses it.

So does Chrome. Actually, this comes in handy: http://sqlite.org/famous.html

So good to see Sun's logo there :)

They should replace it with Oracle's logo now. Just for giggles.

Ouch, PHP got a mention, but they snubbed Rails, despite the fact that it comes bundled ... DHH won't like that one bit!

At this point, it would be faster for them to just list the companies who aren't using sqlite.

Rails doesn't bundle sqlite3, just lists it as a default dependency for new projects.

Probably couldn't use the Rails logo because of stupid trademarking, so they left it off.

So does Chrome

For me JSON has replaced SQLite, with a large reduction in code and complexity. [ Admittedly I've written my own routines to access it more simply/directly on iOS ]

Yeah and like HTML5 has totally replaced our web server.

He means a flat file of JSON records. Your sarcasm is unnecessary and uncharitable.

The sarcasm was perhaps a little mean but he has a point. If a flat text file storing JSON could replace your database, then you probably never needed a DB in the first place.

Lots of people use full-blown RDBMSs when they don't need to (see blogs); it's not a silly point, and the sarcasm was unwarranted.

Sarcasm is never unwarranted.

If you can use SQLite, you probably didn't need a real database. So the use cases overlap a bit with flat files and JSON.

The JSON flat file sounds like variable length records where length is determined by parsing each record with global reader/writer locking?

It's like public storage where you have to sift through everyone else's crap to get to yours, every item is stored in a bulk cargo box and only one customer gets to store their stuff at a time.

If you're interested in DB internals, here are a few algorithms that MySQL uses. Note this doesn't cover InnoDB, which performs far better under high-concurrency loads and offers row-level locking, clustered indexes, an excellent caching algorithm, foreign key constraints with cascade, etc.


The way I've stored things in flat files is to make use of the file system's lookup capabilities. I wouldn't suggest storing data that may be requested or written to by more than one client at once with a method such as this, either. I wouldn't try to use it like a C array or anything. One file per user or more makes sense. I've also used stuff like this for system admin scripts.

Something like

   #get the file, decode, etc.
It's a quick and dirty caching method that has good persistence, of course, is relatively performant under low loads and easy to understand.
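A minimal sketch of what such a one-file-per-key store might look like in Python; the cache directory, keys, and error handling here are all invented for illustration:

```python
import json
import os
import tempfile

# Hypothetical cache location; one file per key, leaning on the
# filesystem itself for lookups.
CACHE_DIR = os.path.join(tempfile.gettempdir(), "json_cache_demo")

def cache_path(key):
    return os.path.join(CACHE_DIR, key + ".json")

def cache_get(key, default=None):
    try:
        with open(cache_path(key)) as f:
            return json.load(f)   # get the file, decode, etc.
    except (OSError, ValueError):  # missing or corrupt file
        return default

def cache_put(key, value):
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(cache_path(key), "w") as f:
        json.dump(value, f)

cache_put("user42", {"name": "Ada", "visits": 3})
print(cache_get("user42")["visits"])  # 3
print(cache_get("missing"))           # None
```

As the comment says, this is only sane for single-writer, low-load use; there's no locking here at all.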

Thanks, I'll check out the MySQL thing, but I'm not actually intending to build my own database.

what's your cutoff line for something to be a "real database"? sqlite is fully acid compliant.

sqlite is single-user and locks a bit too frequently to be very scalable, so while it's quite useful, I mean sqlite isn't capable of fully replacing a typical engine such as postgres or mysql.

How do you know he means that?

Because it's the only interpretation that makes sense, and charitable discussion demands that I assume my interlocutors are reasonable people who say things that make sense until I have evidence to the contrary.

AND fun

I think you could make your point better by asking a question about the negative consequences/limitations of using JSON.

Sarcasm doesn't aid in making points online because people who don't know anything about the issue at hand make sarcastic 'points' as easily as an expert.

I left it unsaid that I assumed SQLite was normally used in places other than the traditional back end server data store - I tend to use PostgreSQL / MySQL or perhaps a NoSQL variant such as CouchDB/Mongo for server-side data.

So I meant that instead of using SQLite on mobile devices, PC-based local fat clients or in the browser... I now use JSON where I might have used SQLite before [ I also use JSON where I might have used XML or Windows config files etc ]

I did not intend to say JSON is replacing MySQL / PostgreSQL.

You're getting jumped on, but you have a good point. It's often easier to just stringify a dictionary and read it back as the object instead of handling the database creation/insertion/selection if all you need is to store and access a few values.
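The stringify-and-read-back round trip being described is literally two calls, e.g. in Python (the settings dict is made up):

```python
import json

settings = {"theme": "dark", "font_size": 14}

# "Stringify" the dictionary...
blob = json.dumps(settings)

# ...and later read it back as the same object, no schema,
# no CREATE TABLE, no INSERT/SELECT plumbing.
restored = json.loads(blob)
print(restored == settings)  # True
```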

I can sorta see many cases where that would be useful, but personally I am happy letting SQLite do the heavy lifting extracting things with indexes, grouping and using the where clause.

But yeah, JSON is great for small amounts of data.

I also tend to use JSON as the main messaging format for client/server communications [ where I might have used a custom binary protocol over sockets previously ].

This means I can reuse the same REST style data 'provider' from within a Javascript web UI and within an iOS or Android client app.

So there are quite a few reasons why using JSON simplifies things for me, particularly that there's a whole layer of ORM-style boilerplate code that I don't need now [ whether it's SQLite <-> Object or XML <-> Object ].

I actually like the SQLite implementation!

It's pretty handy, and an easy upgrade to a 'real' database if the need ever arises - especially mongo/couch.

Because JSON is well known for its ACID compliance.

Love SQLite when doing some Django development. Great to easily scrap the database with a quick `rm` and repopulate with good data.

I've been breaking myself of that habit: I try and use the flush command so I don't have to keep doing the superuser creation step.


Good point, as the superuser creation bit does tend to get annoying. But you have to admit–there's something much more fun about getting to rm your db ;)

yeah, but if you are modifying the schema and are not using south or something like that, you will still want to rm & syncdb again....

Agreed. SQLite also has a great copyright notice in its headers:

  ** The author disclaims copyright to this source code.  In place of
  ** a legal notice, here is a blessing:
  **    May you do good and not evil.
  **    May you find forgiveness for yourself and forgive others.
  **    May you share freely, never taking more than you give.

It's also worth noting that SQLite is, for the time being, the underlying store for Membase

You mean the eventual persistence store.

In the microsoft world SQL CE 4 is pretty nice for doing the same sort of thing.

I still much prefer to use SQLite on the MS stack, given how reliable and ubiquitous it is.

In some scenarios I prefer Microsoft Excel over SQLite. Yes, Excel can be used as a database as well!

Do you mean as a database engine using the ODBC driver for Excel or as a database-like application?

Also damn useful when plugged into NHibernate for bringing up test cases using in memory databases for performance.

I'd vote for PostgreSQL multiple times if I could. I'm a consultant DBA (-ish; I do other stuff as well, but that's what puts most of the food on my table), and have multiple clients using pg.

Why do you use pgsql over mysql?

I've used mysql a lot, and pgsql a little. I can't tell the difference, other than pgsql being slower and having less support. Some people swear by it, so I'm curious what I am missing.

When I first started doing serious DB work (back in the PostgreSQL 7.x/MySQL 3.x days), there were some show-stopper flaws in MySQL that made it a non-starter for my scenario. (In particular, you couldn't self-join a table, and "February 31" was treated as a valid date.)

In performance terms, granted, pg was a bit of a dog in those days, but it outperformed MySQL in every benchmark I could throw at it, with minimal tuning. So, if you're experiencing it as less performant, you're most likely running with a default configuration — which is deliberately tuned for something like a 486-class box with 128MB RAM or so — and/or haven't ANALYZEd your data. (Those are complete WAGs, knowing nothing more than what you've said about your scenario, but they tend to be among the more common reasons for lackluster performance.)

As for support, as a complete noob, I had a weird performance problem I couldn't make sense of, so I went to the mailing lists. Within a few hours, I was exchanging stack traces and other sundry debugging/profiling dumps with Tom Lane. (He's in the Wikipedia. Even if you've never touched PostgreSQL, you use his code every day of your life.) I don't think you can get much better support than that. Since, I've never encountered a problem that I haven't been able to have addressed, or at least get pointed in the right direction, by asking on — or searching the archives of — the relevant mailing list.

From there, it was largely a matter of, "This is the one I already know how to use...", along with the better feature-set (not mentioned by any of the sibling posts thus far: transactional DDL); the lack of a known-evil corporate overlord who could pull the plug at any time; the consistent tens-of-percent performance improvements in every major release; a development community that will punt a feature to the next release if it's not 100% ready and provably correct; and, let's be honest, the fact that, as someone who's been doing pg work for as long as I have, I can command a very comfortable hourly rate on the basis of that depth of experience — particularly when it's been focused in high availability and replication.

(EDIT: proofreading.)

I'll second this. For the longest time mysql was the quickest way on the web to lose your data. As in corrupt databases.

It also didn't support transactions where postgres did. Essentially mysql was a dumb datastore with sql interface, whereas postgres was a database.

That's changed now, but postgres is still ahead in reliability (shit just doesn't break) and feature-set, and, I hear, in speed these days. But I no longer care about speed, as an SSD-backed postgres handles anything I can possibly throw at it.

Be very, very careful using SSDs under your DB, whether PostgreSQL or anything else. If you don't have supercaps on your drives, you will lose data in a power loss situation, even with a battery-backed RAID controller. That data loss could take the form of anything from silent corruption of a table or index, to unrecoverable filesystem loss. (Briefly, the drive's controller uses the on-board cache to accumulate writes into erase-block sized chunks before flushing to the NAND media. If you disable the on-board cache, performance drops through the floor — USB thumb drives look fast by comparison — and you shave at least an order of magnitude off the drive's lifetime.)

At present, it looks like the best choice is the forthcoming Intel 710 series drives, but if you need an SSD now, their 320 series, the Sandforce controller based drives with supercaps (like the OCZ Vertex Pro models, though I wouldn't touch those with a competitor's database), or a FusionIO card are the only remotely safe options.

This is why I will be able to go back tonight and honestly say that I didn't waste time on Hacker News in the morning. :)


Except if you want to do a nice pagination interface over a set of data. select count(*) in postgresql takes forever on large tables. select count(*) over myisam is speedy. select count(*) over innodb is speedy but an estimate (I'd still prefer a fast estimate over exceedingly slow accuracy for many scenarios, such as paginated web interfaces).

That's because of PostgreSQL's MVCC (Multi-Version Concurrency Control) architecture. And, yes, while some aggregate queries can tend to suck on very large tables (for which there are a good half-dozen workarounds; this is the FAQ-est of FAQs), the upside of MVCC is that read queries can never block write queries, and write queries can never block read queries.

The benefit that has in terms of increased concurrency is worth far, far more than having to implement a workaround for quick-and-dirty row counts for simple things like pagination, IMO.

(Aside: InnoDB is also MVCC-based, which is why its COUNT(*) is an estimate. The MySQL folks apparently decided that it was better to provide an estimate than an exact count, while the PostgreSQL folks decided the other way. There's a part of me that wants to call that symbolic of the way the two projects operate on a much broader level...)

I personally picked PostgreSQL instead of MySQL since I don't trust Oracle. Why support a free database server when it competes with Oracle RDBMS? Granted Oracle RDBMS is extremely expensive, but still, something doesn't sit well with me...

I felt the same mistrust for Oracle, but at Railsconf, I had the chance to talk with the team from Percona, which is a free fork of MySQL that makes some performance improvements and sells consulting (think Red Hat).

The Percona team said that Oracle has a lot of customers who are using Oracle RDBMS for big stuff and MySQL for smaller stuff, and they like being able to sell support for both, but they don't see MySQL as competing with their flagship DB. He also said that Oracle has so far kept their promise to keep developing MySQL and that Percona has pulled their upstream changes.

Besides that, if Oracle decides to stop supporting MySQL, Percona is a drop-in replacement, as is MariaDB, developed by some of the original MySQL developers. So you won't get stuck.

All of that said, however, I'm interested in PostgreSQL just because I've heard that it's a well-made database. There's nothing wrong with switching for other reasons; I just don't think that uncertainty about MySQL's future is a good reason right now.

Thanks for the clarification billybob. :) Part of my perception was driven by the unknown; what would happen if Oracle stopped putting money into it? It's nice to hear that Percona appears to be a potential candidate to continue its development if anything bad ever happened to it on Oracle's end.

This is a good point, but only became the case fairly recently. I might lean towards pgsql in the future based on that, actually.

MySQL is licensed under the GPL, and that isn't going to change. Oracle can't kill MySQL, nor can they force you to pay for it. I think your fears are unfounded.

• EXPLAIN output is awesome (once you understand the format, you'll know exact algorithm postgres uses for the query with bottlenecks highlighted)

• Subqueries are optimised as well as JOINs. I can go all-Inception in my queries and they perform well (I find subquery style often easier to understand than equivalent JOIN).

• You can do UPDATE … SELECT on the same table.

• Postgres has query rewrite (RULE) that can be used to implement writeable VIEWs (which is awesome for migrating legacy applications to new schema)

The only thing I really miss from MySQL is ON DUPLICATE KEY UPDATE. Postgres has only weak 1-row hacks emulating this, and "standard" MERGE syntax for this is horribly ugly.

I'm a longtime Postgres guy, but I've used MySQL on a few sizable projects. MySQL has come a long way over the years, such that a lot of my original criticisms are no longer valid. My reasons for not using it these days are mainly the lack of transactional DDL and the fact that adding indices requires a full table lock. There are some interesting solutions to the latter problem that the Percona guys have pulled together, but Postgres handles it all out of the box. Things like partial and expression indices are icing on the cake.
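Transactional DDL means a CREATE TABLE can be rolled back like any other statement. SQLite happens to support this too, so here's a runnable sketch of the idea (Postgres behaves the same way; MySQL instead issues an implicit commit around most DDL, so the rollback there would silently undo nothing):

```python
import sqlite3

# isolation_level=None: pass BEGIN/ROLLBACK through verbatim.
conn = sqlite3.connect(":memory:", isolation_level=None)

conn.execute("BEGIN")
conn.execute("CREATE TABLE migration_test (id INTEGER)")  # DDL inside a transaction
conn.execute("ROLLBACK")  # the table creation is undone with everything else

tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE name = 'migration_test'"
).fetchall()
print(tables)  # [] -- the CREATE TABLE was rolled back
```

This is what makes schema migrations safe: a half-applied migration simply disappears instead of leaving the schema in a mixed state.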

My impression (having used mysql at a number of jobs over the years, and talked to several very sharp pgsql fans) was that pgsql used to have a big lead in cool features that mysql didn't have. That lead has been eroding, but I don't think mysql is 100% caught up, though I could be wrong. In any case, even when mysql had implemented the various cool features, pgsql's implementations naturally tended to be more mature and stable for a while.

Disclaimer: I don't think of myself as a database guy, and I couldn't give you a definitive list of the differences. I remember some times in the past when things like transactions, triggers, and stored procedures were on the list of differences the pgsql fans quoted to me.

I'm no longer a database "poweruser" due to changing requirements in my job, but not too long ago I used to be, and MySQL had no CHECK constraints. Also, triggers are less flexible than in Postgres; e.g., a cascading delete/update won't fire them.

Historically, PostgreSQL has been considerably more SQL-feature-rich than MySQL. It still is, though to a lesser extent (see previous paragraph) due to MySQL catching up. For example, MySQL has only had views, triggers and stored procedures since version 5, while Postgres has had them since at least 7 or 8, which were already mature at the time MySQL 5 was still in development. I guess a lot of people became converts at that time.

Atomic transactions, perhaps at the expense of pgsql being slower.

Subqueries, but we don't require this in production.

Wariness of Oracle's conflict of interest.

Mysql has had subqueries since version 4.1, circa 2004!

But the optimiser is really bad at doing sensible things with them.

Worse still, subqueries in the FROM clause are (documented to be) implemented as an unindexed temp table.

Google apparently compiles subquery support out of their mysql instances so people don't mistakenly think they're usable.

First off - whatever you're doing, you'll probably be fine using either.

That said, I'm generally more frustrated when using MySQL than when using PG. Here's a sample of the problems I've encountered from using both MySQL and PG. I haven't updated my list in a while now - please feel free to correct me on things - but hopefully it's a little more illustrative than that Wikipedia feature matrix, and a little more specific to MySQL vs. PG. (This list is a cleaned-up selection from my notes wiki at http://yz.mit.edu/notes/Hackery.)

No referential integrity.

No constraints (CHECK).

No sort merge join, let alone hash-join. http://www.dbms2.com/2008/07/10/how-is-mysqls-join-performan..., http://www.mysqlperformanceblog.com/2006/06/09/why-mysql-cou...

Generally poor at analytical workloads, since it's designed for transactional workloads.

Can't reopen TEMP table - WTF? (Still not fixed!) http://bugs.mysql.com/bug.php?id=10327

Multiple storage engines have always restricted progress: http://www.mysqlperformanceblog.com/2010/05/08/the-doom-of-m... (PG also supported multiple storage engines in the '80s, then concentrated on one)

No WITH clause: http://stackoverflow.com/questions/324935/mysql-with-clause

Crappy errors: “Incorrect key file for table ‘stock’; try to repair it” on “alter table stock add constraint pk_stock primary key (s_w_id, s_i_id);” where stock is in InnoDB (which has no “repair table”) means I have no /tmp space (no Google answers)

Crappy EXPLAIN output - somewhat better when using the visual-explain tool from Percona.

InnoDB auto-extends ibdata1 file; only way to trim (garbage collect) is dumping and loading.

Scoping is broken:

  mysql> create table t(a int, b int); Query OK, 0 rows affected (3.30 sec)
  mysql> select a, (select count(*) from (select b from t where a = u.a group by b) v) from t u;
  ERROR 1054 (42S22): Unknown column ‘u.a’ in ‘where clause’
Optimizer leaves plenty to be desired, e.g. not pruning unnecessary joins.

“InnoDB is still broken…Just last week we had to drop/re-create an InnoDB-table in one project because it would not allow to add an index anymore, no matter what we tried…Mysql::Error: Incorrect key file for table 'foo'; try to repair it: CREATE INDEX [...]” http://news.ycombinator.com/item?id=2176062

MySQL only recently got such things as per-statement triggers and procedural language support.

MySQL has only its own internal auth system, whereas PG supports a wide array of auth providers.

PG has a more supple ALTER TABLE implementation.

MySQL doesn’t support ASC/DESC clauses for indexes http://explainextended.com/2010/11/02/mixed-ascdesc-sorting-...

Optimizer only recently started working properly with certain subqueries

OK documentation, but still considerably unpolished compared to PG's. Random omission: auto_increment jumps up to next power of 2 but inconsistently across versions (platforms?).

(Older issue, not sure if it's still relevant) Crappy concurrency, >3 cores sucks vs PG: http://spyced.blogspot.com/2006/12/benchmark-postgresql-beat...

These days, Postgres is faster than MySQL+InnoDB, and scales much better across multiple CPU cores. (MyISAM is still faster, but that's not an appropriate comparison.)

Some features that make Postgres awesome:

* Transactional DDL. You can do "create table" in a transaction. _Everything_ is transactional, it's not a tacked-on feature, it's the basis of everything.

* No legacy cruft. Compared to MySQL, which is filled to the brim with historical warts. The Postgres people have been careful to weed out obsolete functionality. There are essentially no sneaky border cases that a developer needs to be aware of, no weird special cases like "0000-00-00 00:00" having special meaning.

* No need for a "strict" mode, since Postgres is always strict. Postgres doesn't allow invalid dates, doesn't allow byte sequences that violate character encodings, etc. It diligently enforces constraints and generally doesn't allow you to screw up. To Postgres, data integrity is paramount.

* PostGIS. Simply awesome. (MySQL's geospatial stuff also tries to implement the OGC API, but last I looked, it was a half-hearted attempt that neglected to provide the fast R-tree-based (actually GiST-based) indexing that makes PostGIS so super fast.)

* Replication. It's late to the party, but I rather prefer how Postgres has implemented its replication, even though it has some downsides where it will abort a long-running query if some data has changed under its feet (but if you're using transactions it's easy to simply restart the query). 9.1 will be getting synchronous replication, which is pretty cool.

* Extensions. Postgres can integrate languages like R and Ruby as first-class languages that can be called from SQL. It also has a module system that can extend the type system (a bit of trivia: This was originally the main reason why Michael Stonebraker invented Postgres) with new types, eg. multidimensional matrix columns, or new features, like remote tables.

* The "text" type. Seriously, why do people keep writing things like varchar(255)? Postgres' text type is an unlimited string. Unlike MySQL's text type, it can be efficiently indexed without limitations. (Varchar is internally implemented as a bounded text type.)

* Cost-based planner backed by row-level statistics. This is the stuff that allows Postgres to do complex nested queries and still perform incredibly well.

* Partial indexes. You can do something like "create index ... on themes (name) where color = 'blue'". Whenever you do a query that falls within the expression's range, Postgres will use that index, potentially vastly reducing the search space.

* Functional indexes. You can do something like "create index ... on (lower(name))". If you then do a query such as "select ... where lower(name) = 'xyz'", then Postgres will recognize that it's the same expression, and it will be able to use the index.

* Windowing functions and recursive queries, both from ANSI SQL99 iirc. Look this up, they're great.
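As it happens, SQLite later borrowed both the partial-index and expression-index ideas with essentially the same syntax, which makes for an easy runnable sketch of the two features above (table, data, and index names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE themes (name TEXT, color TEXT)")

# Partial index: only rows with color = 'blue' ever enter the index.
conn.execute("CREATE INDEX idx_blue ON themes (name) WHERE color = 'blue'")

# Expression (functional) index: indexes lower(name), not name itself.
conn.execute("CREATE INDEX idx_lower ON themes (lower(name))")

conn.executemany("INSERT INTO themes VALUES (?, ?)",
                 [("Ocean", "blue"), ("Forest", "green")])

# Because the WHERE clause repeats the indexed expression verbatim,
# the planner can recognize it and use idx_lower.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM themes WHERE lower(name) = 'ocean'"
).fetchall()
print(plan)

row = conn.execute(
    "SELECT color FROM themes WHERE lower(name) = 'ocean'"
).fetchone()
print(row[0])  # blue
```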

There are some bad points, none of them significant and all of them a matter of taste:

- I have never really liked Postgres' text indexing, which feels a bit creaky and antique. At least with 8.x, GIN index updating was slow as hell.

- Partitioned tables are a great feature, but I will never use it because of the requirement that you do the plumbing yourself (creating partitioning rules and so on); I keep waiting for something like Oracle's automatic partitioning.

- Stored procedures -- i.e., running logic inside the database -- feel wrong to me, and always have. For some people this is a requirement, so I'm not really complaining. In some cases, writing a stored procedure can be essential to speed up queries/operations by saving on database roundtrips.

- Still no "upsert" SQL command (aka "insert or replace", "insert or update") for asserting the existence of a row atomically.
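The usual workaround is the non-atomic update-then-insert dance, sketched here with Python's sqlite3 module for runnability (the schema is invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER)")

def upsert_hit(conn, name):
    # Emulate "insert or update": try the UPDATE first...
    cur = conn.execute(
        "UPDATE counters SET hits = hits + 1 WHERE name = ?", (name,))
    if cur.rowcount == 0:
        # ...and INSERT only if no row was updated. Not atomic: a
        # concurrent writer can sneak in between the two statements,
        # which is exactly why people want a real upsert command.
        conn.execute(
            "INSERT INTO counters (name, hits) VALUES (?, 1)", (name,))

upsert_hit(conn, "home")
upsert_hit(conn, "home")
hits = conn.execute(
    "SELECT hits FROM counters WHERE name = 'home'").fetchone()[0]
print(hits)  # 2
```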

You should add the Neo4j (http://neo4j.org/) graph database to the list.

Graphs are a much more modern and elegant way of storing relational data. I've used Postgres for over 10 years, but it's not a graph database. With graph databases you don't have to mess with tables or joins -- everything is implicitly joined.

And Neo4j is ridiculously sweet -- store 32 billion nodes (http://blog.neo4j.org/2011/03/neo4j-13-abisko-lampa-m04-size...) with 2 million traversals per second (http://www.infoq.com/news/2010/02/neo4j-10), and you can use Gremlin with it (the graph traversal language), which lets you calculate PageRank in 2 lines.

Neo4j is open source, and the Community Edition is now free (https://github.com/neo4j/community).

I recommend pairing it with the TinkerPop stack (http://www.tinkerpop.com/).

We are also using Neo4j in an academic environment and we are really pleased with it. We recommend it despite the lack of a good native Python binding; we had to build neo4j-rest-client (https://github.com/versae/neo4j-rest-client) on top of the REST API.

There is an open-source Python persistence framework in the works called Bulbs that connects to Neo4j through the Rexster REST server, and there are binary bindings in the works as well.

There is also a Python open-source Web development framework for graph databases called Bulbflow that is based on Bulbs and Flask.

Both frameworks should be released in the next few weeks.

> Graphs are a much more modern and elegant way of storing relational data.

Actually, storing data as graphs is older than relational approaches. It used to be called "network databases". They were not supplanted for the hell of it, relational databases have certain advantages.

For select applications, object databases are absolutely the way to go. But for most purposes relational is hard to beat.

For tabular data, relational databases rock. But the relational model doesn't align well with object-orientated programming so you have an ORM layer that adds complexity to your code. And with relational databases, the complexity of your schema grows with the complexity of the data.

The graph-database model simplifies much of this and makes working with the modern-day social graph so much cleaner.

Graphs allow you to do powerful things like find inferences inside the data in ways that would be hard to do with relational databases.

How would you calculate PageRank using a relational database? As I said, with a graph database and Gremlin, you can do it in 2 lines.

To see the types of things you can do with graphs, check out Marko's short screencast on Gremlin (http://www.youtube.com/watch?v=5wpTtEBK4-E).

And also check out Peter Neubauer's introduction to graph databases and how they compare to RDBMS' and where they stand in the NOSQL-movement (http://www.infoq.com/articles/graph-nosql-neo4j).

> But the relational model doesn't align well with object-orientated programming so you have an ORM layer that adds complexity to your code.

The mismatch comes about for a number of reasons:

    * OOP has no formal basis so it can't be reliably transformed into relational terms.
    * OOP is identity-bound -- each object is essentially
      an address in memory, not a relation in a set.
    * The biggie: OOP mixes data with behaviour.
      Relational does not.
> How would you calculate PageRank using a relational database? As I said, with a graph database and Gremlin, you can do it in 2 lines.

For something on PageRank's scale, a custom datastore based on matrices and their multiplication makes business sense. Or MapReduce over a distributed key-value store (note that these are both OLAP approaches).

Still. SQL's a bit verbose, but these days we have recursive queries. For Oracle users, I'm talking about CONNECT BY. If I find myself running the social graph every minute, I develop an ETL package that periodically moves data from my write-bound system to my read-bound system with a more query-friendly schema. Depending on how you look at it, relational systems invented "eventual consistency".
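For a concrete sense of what recursive queries buy you on graph-shaped data, here's a sketch of multi-hop traversal using the ANSI WITH RECURSIVE form (CONNECT BY is the Oracle-specific equivalent). It's runnable against SQLite from Python; the follows table is invented, and Postgres accepts the same query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE follows (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO follows VALUES (?, ?)",
                 [("alice", "bob"), ("bob", "carol"), ("carol", "dave")])

# Everyone reachable from alice, however many hops away.
rows = conn.execute("""
    WITH RECURSIVE reachable(person) AS (
        SELECT dst FROM follows WHERE src = 'alice'
        UNION
        SELECT f.dst FROM follows f JOIN reachable r ON f.src = r.person
    )
    SELECT person FROM reachable ORDER BY person
""").fetchall()
print([r[0] for r in rows])  # ['bob', 'carol', 'dave']
```

The UNION (rather than UNION ALL) also deduplicates along the way, which keeps the recursion from looping forever on cyclic graphs.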

More to the point, boring old database greybeards have learnt that OLTP and OLAP are very different use cases. The 3/4/5NF of the OLTP database will be very different from the star schema of the OLAP database.

There's really not much about NoSQL that hasn't already been done, under a different name, by the relational crowd.

I still think there is a place for NoSQL. It's just not as universal a replacement for relational systems as many make it out to be, whether we're talking about document stores, distributed key-value stores, graph stores and so on.

The "NoSQL vs SQL debate" name will eventually change. Instead of isolating SQL from the other datastores, the debate will move toward discussing which database is the right tool for the job, where SQL/relational databases will have a specific role.

Putting relational databases into their own group is silly. I believe this came about because relational databases have dominated for so long, and this has caused developers to try and fit every problem into the relational database model.

In the early days of the Web, choosing a database meant choosing between relational database management systems -- Oracle was king but expensive, Microsoft SQL Server if you were on a Microsoft stack, and PostgreSQL or MySQL were the primary open-source options (with one being a real RDBMS while the other was basically just an SQL interface to the file system).

Ten years ago the RDBMS was the only good option so that's what everyone used even though many problems and programming languages didn't match up well with it -- it was like trying to fit a square peg in a round hole.

You don't have to do that anymore so relational databases can stop being the one-size-fits-all solution, and instead we can move toward using them for the specialized cases where they're the right fit.

Sorry, but this is very amusing: IBM were doing this, with IMS, in the 1960s and IMS is still their highest-revenue product today.

But isn't IMS a hierarchical database (http://en.wikipedia.org/wiki/Hierarchical_database_model), which is a tree-like structure?

This is somewhat different than a general graph database (http://en.wikipedia.org/wiki/Graph_database) where nodes are not restricted to being a hierarchy.

I'm surprised to see that Oracle even has as many mentions as it does, given how rarely you read about it here on HN. We're locked into it at my workplace, and every time I'm reminded of this PG quote I cringe a little:

"The more of an IT flavor the job descriptions had, the less dangerous was the company. The safest kind were the ones that wanted Oracle experience. You never had to worry about those."

I'm probably starting to sound like a broken record around here, but:

1. I build products on Oracle at a startup.

2. Oracle has many compelling features that really don't have first-class open source alternatives: OLAP, encryption (wire and at-rest), VPD, materialized views with query rewriting, object and document storage, monitoring and tracing, among others...

3. Oracle got more compelling since being offered--license included!--on Amazon RDS.

There are a ton of responses for Microsoft; why doesn't anyone question that? Actually, I'm curious now. Is there a new incentive for SQL Server? Maybe something in BizSpark that makes it attractive?

It is perplexing that Microsoft (presumably SQL-Server) has got so many votes. In my experience, absolutely nobody chooses to use it. They just have aggressive sales people who use it as a bargaining chip to elbow out Oracle and help win other business.

I actually did.

I've never had a bad experience with it, it needs a lot less maintenance than Oracle does (or did, at least), it performs pretty well (we're an insurance company with large datasets), is integrated in a .NET stack (while you might not want to go there, especially as a startup, it is very nice for corporate work), it came with Reporting Services which we used to replace Crystal Reports, it has lots of (admittedly non-standard) extremely useful SQL functions, data types, etc. And I love SQL Server Management Studio and the other Microsoft tools.

MySQL felt like a toy database in comparison (especially the management aspect), and the non-relational stuff is out of the question for now. We might not have done due diligence by not looking at other alternatives (notably Postgres, I guess) but they seem like a poor fit given our developers' strong Microsoft-centric backgrounds. Most of us also have at least one Microsoft training course in SQL Server, and Microsoft has very strong support in my country (Uruguay) versus nonexistent support for most other platforms.

I started out with Oracle on AIX, then Sybase on Solaris, then Postgres 8.3 on Windows and Linux, and am currently using MS SQL Server. Postgres (8.3 on Linux) is currently used for an inventory db in house, and SQL Server is used for most apps that reside at customer premises.

Sybase was better than Oracle, and Postgres was just as good as Sybase at that time. But SQL Server (version 2005 on) has stood taller than all of them in my view. Excellent management tools, stable, reliable and no performance issues for our apps. Most of our apps are write light and read heavy with thousands of users hitting the db - served off one Win 2003 server (we have a warm spare).

So there is some truth to your statement about nobody choosing to use it - for us the primary motivation to not use it internally was cost. At our customer premises they picked up the tab. We have MSDN so sql server development licenses are not an issue. Have not met a Microsoft salesperson yet!

We actually have our own extensions to use Perl filehandles to write directly to Oracle LOBs. I don't know if that is Oracle-specific or if it can be used elsewhere. I've never cracked open the implementation.

Also, do not forget or underestimate the tongue-in-cheek "Nobody got fired for buying Oracle" in this case.

Having Oracle as your db doesn't mean that you are not working on interesting problems. Every company is not a startup. Some of us work on well established products used by huge customer base who use Oracle and in turn we are forced to.

Above all, are you helping a customer solve his problems instead of compounding them? That, my friend, is more important than choosing Oracle or MongoDB.

    Having Oracle as your db doesn't mean that you 
    are not working on interesting problems
First, I consider consultancy for customers using Oracle to be OK, as long as you earn something from it.

But if your company chooses Oracle, it means that you're bogged down by legacy, stupid company processes and/or clueless managers.

And in most such environments, the harder you try to change it for the better, the harder it fights back, putting yourself in an awkward position in which you are considered the bad apple of the team. So you end up either adapting (not giving a shit), quitting or finding some small project with no perspective for the company (i.e. less controlled) that can bring you pleasure.

Of course, some companies, like Adobe for example, use Oracle when it doesn't impact their core competencies, as it's a safe choice for corporate types. But a company like Adobe doesn't earn money from projects that are relying on Oracle and other projects inside Adobe are also using HBase and MySql and their own distributed file-system that can be queried and so forth.

Either way that quote is correct. I'm not promoting the latest fads (personally I'm not into NoSql unless it makes absolute sense), but you can safely ignore companies that make decisions based on brochures and lap dances.

I hope you realize that a company might choose to use Oracle for a reason other than "brochures and lap dances", and that a "legacy" decision to use Oracle doesn't necessarily mean you are "bogged down" or that the decision was stupid.

I don't understand how using Oracle for your RDBMS is suddenly equated with all the things you mention.

Such as? Really, I'd like to know the reason for why a company might choose Oracle.

Oracle is a bear to get set up. Nearly impossible, best I can tell, but the feature set is incredible. It's like Excel 2003 -- pretty much every feature you might want is in there somewhere, you just need to find out how to expose it. For example, I've seen people say that its auditing and regulatory support has no peer. I don't really know that stuff well, but the people who told me spent non-trivial amounts of time with other solutions before concluding that Oracle made the most sense.

My point: if all you ever need from a database is solved with a key/value store, then Oracle is not even mildly interesting.

Because they might want to run some of the applications that Oracle sells that run on top of their database - the example that I am most familiar with being Oracle Hyperion HFM which is by far the leading financial consolidation application for large corporations. Even Google uses Hyperion:


[If 12 dimensional hierarchical databases with complex logic are your thing then Hyperion is really rather cool in a perverse kind of way].

BTW If anyone here is interested in this kind of space I have had some interesting experiences in building extensions to HFM...

  true clustering (Oracle RAC)
  multi platform

Oracle costs at least twice as much as any other database. Yet companies still buy it. Why do people buy things that cost twice as much? Hint: it's better.

Forget price for a moment, any feelings you might have for Larry Ellison, the good/evil nature of the company, or whatever. Consider the software itself.

Many people aren't all that familiar with the basic things a database is supposed to do. The list is large. I can't possibly do it justice off the top of my head, but here are a few that come to mind: make it possible to see data in a consistent state (as of a single point in time), without being blocked, even as other people are changing it; support "transactions" -- a group of requests that either all succeed or all fail -- so that the database cannot end up in an inconsistent state; guarantee recoverability to a consistent state even when people change their minds, statements fail, power fails, hardware fails, or all of these things happen at the same time; and many, many other things like security, support for the relational model, SQL, joins, etc. Real databases aren't simple key-value stores.
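To make the "all succeed or all fail" point concrete, here's a minimal sketch using SQLite from Python (the `accounts` table and amounts are invented for illustration): an overdraft trips a CHECK constraint mid-transaction, and the rollback undoes the half-finished transfer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()

def transfer(amount):
    try:
        with conn:  # transaction: commits on success, rolls back on error
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = 'a'",
                (amount,))
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = 'b'",
                (amount,))
    except sqlite3.IntegrityError:
        pass  # the CHECK failed: neither update survives

transfer(150)  # overdraws 'a': the CHECK fails and the whole transfer rolls back
transfer(40)   # fine: both updates commit together
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(balances)  # {'a': 60, 'b': 40}
```

Without the transaction, the failed transfer could have left the debit applied but not the credit, exactly the inconsistent state the comment describes.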

Historically, Oracle obtained a huge lead in market share because it delivered the most complete mix of these basic things long before anyone else.

Something not widely recognized, but that should be obvious to programmers, is that low-level architectural decisions have a huge impact on how well a database performs these basic duties. Oracle maintained its lead for a long time because it got many of the low-level architectural decisions right. It is really hard to catch Oracle if you are trying to polish up a bad locking model, for example. Oracle still does the basics better than most. That, in and of itself, is a reason to consider Oracle.

Oracle remains viable, even as others continue to catch up, because Oracle builds on its solid foundation by adding additional capability and features relentlessly. I challenge you to read the new features guide for any new release of Oracle and to remember just the names of 20% of the new features. The Oracle documentation, as of 6 or 8 years ago (10g), was 40,000 pages. No telling what it is now. I can tell you this. If you have something you need to do with databases, Oracle probably figured out how to do it a long time ago.

I, personally, am a huge fan of PostgreSQL (and its freeness), but I recognize that Postgres is never going to be able to touch Oracle in features. It's impossible. Working with Postgres is just going to require a lot more manual labor. Some things aren't going to be possible. Performance may just have to suffer sometimes. Let's hope Postgres does the basics well (it does). That's the most important thing. But when it comes to building spatial indexes on hierarchical dimensions, or whatever, Postgres just isn't going to have a feature for that. I'll have to figure that one out for myself.

One key point: if you need to build a large, high-performance, data-driven application that provides nearly instantaneous response for thousands of simultaneous users, Oracle is one option that can get the job done. If you need to build something huge, say billions of rows, that provides nearly instantaneous response to dozens of users, Oracle is one option that can get that job done. So no wonder that some companies consider Oracle.

The last thing I'll mention is that it takes a lot of time to learn something like Oracle. I would be surprised if, after a year of using it, for example, you could really make it hum better than any of its top competitors. If you do use it for a while I think you'll find that it is really good at the normal things and in a different league when it comes to the unusual things. This comes in handy when you are being paid to get things done (by a company that can afford Oracle) -- hence the reason that most startup-oriented people don't have much appreciation for Oracle.

"Oracle remains viable, even as others continue to catch up, because Oracle builds on its solid foundation by adding additional capability and features relentlessly. I challenge you to read the new features guide for any new release of Oracle and to remember just the names of 20% of the new features. The Oracle documentation, as of 6 or 8 years ago (10g), was 40,000 pages. No telling what it is now. I can tell you this. If you have something you need to do with databases, Oracle probably figured out how to do it a long time ago."

Are these good things?

We can also reverse this argument: if you're using Oracle, you will never have a use for 99% of its features.

Or - as I picked up from some long forgotten blog post:

If you need Oracle, you'll know it. If you don't know that you need Oracle, you don't need Oracle.

I run MySQL, Oracle, and SQL Server -- hundreds of databases, a couple in the 'many thousands of queries per second' range.

There are reasons for each database platform.

> But when it comes to building spatial indexes on hierarchical dimensions, or whatever, Postgres just isn’t going to have a feature for that. I’ll have to figure that one out for myself.

In this case, yes it is, Postgres probably has the best type system and extensibility of all SQL databases. GiST and GIN are very good technologies that also exist in Informix.

If I had to choose something that is missing in Postgres and very far away for sheer want of implementation effort, it's parallel execution of queries.

Oracle RAC is also quite a crazy and neat feature, except when it has problems, in which case you get even more crazy for that crazy.

On the other hand, Oracle is also not a strict superset of the features in PostgreSQL. For example, 9.1 has the only credibly fast implementation of SERIALIZABLE isolation that I am aware of. There are also interesting features like exclusion constraints that have no equivalent in Oracle, and that solve problems hard to solve otherwise (example constraint: there are no overlapping circles in this table).
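For reference, the non-overlapping-circles constraint mentioned above looks roughly like this in PostgreSQL (syntax per the Postgres documentation; an INSERT whose circle overlaps an existing row is rejected):

```sql
CREATE TABLE circles (
    c circle,
    EXCLUDE USING gist (c WITH &&)  -- no two rows may have overlapping circles
);
```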

Amazon uses Oracle. I think to completely disregard Amazon because of this fact would be a mistake.

As does Facebook I believe

Facebook uses MySQL as its primary data store, with some HBase, Cassandra, and other storage solutions in more minor roles in various places.

Facebook uses Oracle for certain things.

I suspect that it's the HNers still toiling away in large companies that are voting for Oracle. You'd probably see about as many votes for DB2 had it been an option.

You are correct in my case.

We use Oracle, MS SQL Server, and Sybase because that's what the vendorware we use requires. We do have some open-source software that uses MySQL for administrative purposes.

Same with Sybase.

Oracle is all over the place. My daily bread-and-butter relies on Oracle. For some things it is, as I like to say, more 1980s than Top Gun crossed with Duran Duran -- still no SQL boolean type after 30 years! -- but for other things it has no peer except DB2, Sybase and SQL Server.

Oracle Express actually has a free license. The Express version has most of the same functions as the expensive versions, except for a 5GB limit on database size. If your app doesn't require lots of data but needs all the SQL power, it's a good choice.

I was wondering the same thing. Our startup currently uses it when doing research on data sets, but it's a db I don't hear about very often when talking about startups. I'm assuming a big factor is price.

I used to work with Oracle daily (previous job) and now I use MySQL mostly, with some Postgres and SQL Server at times (lots of client work).

I would say all these databases are worthwhile as long as the DBA is competent. Given that, unless you have a reason to choose Oracle (e.g. familiarity, integration with other applications, a specific Oracle-only feature, contractual obligation), I see no reason to choose Oracle. Many companies choose Oracle because Oracle built a solid database/brand early on while catering to big businesses. MySQL and Postgres have made great progress in the past decade, making them valid alternatives.

The licensing is really an issue, especially when you get into the higher end. They don't charge per computer, they charge per CPU. So you'll have to pay twice as much for a dual-CPU system (I'm not sure if they have a similar licensing structure for dual-core or quad-core systems).

Yeah, Oracle is mostly used by bigger companies.

Riak! We're ( http://bu.mp ) using more Riak every day. So far so good.

Ditto (http://dropc.am). Very write heavy load for us, which Riak handles without blinking. Fault tolerant, robust, and easy to administer. Every machine is identical, no special "master" nodes or anything like that.

I've heard this about Riak and I was quite excited to test it out for a new project, but in the limited testing I've done Cassandra and HBase both absolutely smoke Riak in terms of write performance. Not really apples to apples I suppose, but I was really surprised at how slow Riak was when handling many (millions) of small writes.

We haven't finished our testing/profiling phase yet, so any hints on how to optimize a large number of small writes (on the order of ~dozen bytes each) would be appreciated.

Without going too much into specifics and picking on individual databases (which I could do, boy do I have the scars...)

When you hit a certain traffic level, scalability, latency and robustness become far more important than single-node ops/s. I need to be able to add nodes and repair failed nodes while under load--I need the 99.9% latency mark to stay ~100ms while doing so. I don't really care how many bajillions of ops a second your database can do in some concocted scenario, b/c you're not going to do that many in the real world anyway (trust me, we tried). The disk subsystem is going to give you a few hundred, maybe a few thousand if you're lucky, IOPS, then your latency will spike to hell and your phone will wake you up at night.

Maybe in the world where 99% of ops are reads, you will put up impressive numbers, but now you're just showing you are pretty good at using the disk cache. That's a relatively easy problem.

The riak guys seem to get all this better than most: http://blog.basho.com/2011/05/11/Lies-Damn-Lies-And-NoSQL/

So, to give you a short answer to your direct question:

Use SLC SSDs + md + RAID-0. Have at least 5 nodes. Use bitcask, but realize that your keys will need to fit in memory. Also, realize that really small values aren't a great fit for Riak in some ways b/c the overhead per value is at least a few hundred bytes.

Also, it's important to note this is where I'm at right now, but maybe not where you (generally) are at. Riak may not make you happy at server #1, but it will make you pretty happy at server 10 and server 100.

Riak's sweet spot is people with scaling pains. If you only need a server or two to try some stuff, and you don't have any users yet, you might cause yourself more headaches than you need. Sometimes you don't need a locomotive, you need a motorcycle.

(These guys have a pretty great motorcycle: http://rethinkdb.com/ )

For everything you've said about Riak re: stability and happy scalability, that's exactly why I was excited to include it in this test. I would like to prevent myself/my team from acquiring too many of those scars, so please elaborate on what caused them ;) Especially if those scars came from Cassandra or HBase!

The test hasn't been a "concocted scenario", it's measuring the performance[1] of a prototype implementation of what will be an essential piece of our infrastructure and process (bulk loads of large numbers of small records, very read heavy after the initial load). Riak's write performance was completely adequate, just nowhere near what we got out-of-the-box with the bulk insert operations available in Cassandra and HBase. I asked on the #riak channel on freenode and got told to use protocol buffers (which we already were); I'd really appreciate advice beyond this.

> Also, realize that really small values aren't a great fit for Riak in some ways b/c the overhead per value is at least a few hundred bytes.

This is pretty much what I've chalked it up to. It's unfortunate because that is the use case we currently need to provide a solution for, and once we've got some of our data in one distributed data store, it's convenient (and considered less risky) to use that same technology for the next project. (This is really a culture thing though; it's taking us months to get the necessary buy-in and approval for a postgres 8.2 -> 9.0 upgrade rolled out for a different product, where we know it would solve a specific issue we have.)

[1] We've been running our tests on a 4 node cluster, each node has an 8 core 2.8ghz xeon, 32gb of ram, and a woefully inadequate disk: the machines were repurposed from a system that required them to have redundancy and didn't require write performance, so the drives are RAID1. We also need to make recommendations to IT for their hardware purchase plan after our testing.

> The test hasn't been a "concocted scenario"

Btw, b/c I can totally see why you'd read it that way, that particular barb wasn't directed at you, more directed at some of the public benchmarks touted by (non-distributed) NoSQL database systems.

Re: cassandra, please see my reply to the sibling on this thread.

Also, feel free to email me jamie@bu.mp if I can answer any specific questions for you with things we ran into with various database systems.

Can you make a blog post about the issues instead? It would greatly benefit everyone.

I'd love to, but there are things politically complicated about all this in the very small startup world. So maybe one day, but I can't do it right now.

Ah, that's too bad. Thanks anyway.

Have you looked into using ets storage instead of bitcask storage? Millions of keys with ~12-byte values would fit just fine in memory without the bitcask overhead, and as long as your N value is > 1 you shouldn't have to worry about data loss unless your whole cluster loses power.

Was your experience with Cassandra different? Happy at server 10 and 100, that is?

TBH, we didn't seriously pursue Cassandra when we considered distributed database systems b/c the vast majority of "back-reference checks" we did on the YC network and other area startups was "stay away."

We got some very frank advice from some people whose opinions on databases I take very seriously to stay away, including reports from within FB.

Having said that, I cannot claim to have firsthand proven or disproven anything about Cassandra.

Facebook for a long time didn't use the vastly (and I mean vastly) improved open source version of Cassandra, instead opting for their internal fork. And rather than adopt the open source version, I believe they have now switched to HBase, mainly for its easier consistency model. So I would take their advice with a grain of salt, because it's probably based on their experiences with an old fork.

There are a few people (YC companies even, alas) who are very vocally negative about Cassandra, but I also saw some of those same people ignoring direct advice given to them in #cassandra on IRC, and then turning around and bashing it when it didn't work as planned. Simply following the advice could have made for a completely different story.

I suppose the lesson to learn is that you need to develop software in a way that simply won't allow developers to shoot themselves in the foot, because people never want to blame themselves for doing it, they blame the gun.

Eric, Jonathan Gray said it very clearly in his talk at Berlin Buzzwords: Facebook is now using HBase instead of Cassandra. http://berlinbuzzwords.de/content/realtime-big-data-facebook... You can find a lot of info about the FB process of choosing HBase in favor of Cassandra. This one for example: http://facility9.com/2010/11/18/facebook-messaging-hbase-com...

That's exactly what I'm talking about: that facility9 blog post explaining why they chose HBase had many factual errors about Cassandra when it was posted, and had to be revised after several respected people in the space contacted the author.

Quora's decision not to use Cassandra and Adam's answer regarding it led me to the same conclusion (http://www.quora.com/Quora-Infrastructure/Why-does-Quora-use...). Evidently few from Facebook are advocating it.

Quora on MySQL failed outright when AWS EBS failed, companies on AWS using Cassandra like SimpleGeo and NetFlix did not. To their credit, Facebook were clear enough on their reasons for using HBase over MySQL and Cassandra, such as wanting to double down on their current Hadoop system/knowledge and having easily obtainable ordering guarantees on messages. It's also clear they've invested in making HBase good enough.

At large loads and footprints, imvho, Riak, Cassandra and HBase present viable options. But there are some factors to consider that don't seem to get mentioned in the pop tech press:

- What are you able to operate in production?

- What are you able/willing to debug and patch?

- What hardware options do you have?

- What are your workloads?

- Which variable of CAP, when you lose it, most damages your business?

- Will your company's choices be evaluated in the press?

- Does your board/investors have capital tied up in businesses that are using something else?

- What architecture tradeoffs and styles sit well with you?

- What kind of data access and consumption patterns make you money?

- Can you pay for help?

The right choice is context sensitive, and I'm fairly sure for this class of systems at this point in time, there's no free lunch. That means you have to do the legwork for yourself and make your own choices and commitments; doing what you heard worked for someone else is a cargo cult.

Or maybe it's wisdom. "A fool learns from his mistakes, but the truly wise learn from the mistakes of others." -- Otto von Bismarck

Loving Riak as well! Our balance is definitely on the write-heavy side of things.

I love Riak. It's become my go to for "this just has to work" (and I actually work on problems that need to scale, not ones I hope will have to scale).

The only improvement you could make to it would be adding some of the fancier bits that make Redis really nice, like sets and lists.

Which datastore is it closer to? Mongo, Redis, Postgres? I haven't looked much into it, but I hear so many good things that maybe I should.

Cassandra. It's an eventually consistent, fully-distributed database, in the Dynamo mold: http://www.allthingsdistributed.com/files/amazon-dynamo-sosp...

Thank you.

It's Amazon S3, but without the outrageous per query costs.

+1 for Riak, it's suiting our needs very well so far.

I haven't had a chance to use Riak in a production application yet but am definitely looking for the first possible excuse to use it.

We're using it in production and loving it.

Hear hear! Mongo but no Riak? Come on.

Riak isn't in the same ballpark as MongoDB. They are closer to Hadoop.

MySQL and PostgreSQL aren't in the same ballpark as Mongo. It's irrelevant.

Based on merits, architecture, and implementation Riak handily beats several of the DBs listed. It's a glaring omission and not the only one. As many chose Other as did Oracle.

What kind of data are you storing in Riak? And is it write or read heavy usage?

All kinds of stuff, mostly ~1-4k values. Read:Write is some low integer, ~2.

SQL Server unfortunately. I was reminded why I hate it today when I was trying to set up a simple remote connection to the instance running on my new Windows 7 machine. TCP/IP enabled, IPv4 addresses activated and enabled, remote connections enabled, firewall disabled (for now), and it still didn't work!

I tried introducing MySQL over a year ago only to have some hilarious emails with a senior programmer about how we would have to pay for MySQL. The GPL is not hard to read, but some people don't consider anything not made by Microsoft worth using. Like Linux. "No one uses Linux in the real world!" "PHP is for small personal websites!" - real quotes, sadly.

I've never worked much with databases other than SQL Server. But the one time I had to work with Oracle and MySQL, I had trouble getting a good enough client (like SQL Server Management Studio) even though I was ready to pay.

Maybe it's my personal preference, but I tried many clients (TOAD, DBArtisan, SQL Developer, SQL*Plus, etc.) and found that none was as polished (not to mention functional) or integrated as well with Windows as the one from Microsoft.

It's off topic, but the same applies for Visual Studio and other IDEs.

The SQL Server Management tool is pretty good. And to be honest, I've actually never had a problem using SQL Server. It's been a solid, dependable database on any number of applications for me over the years. Granted, the businesses have been less on the startup side and more on the larger older organisation side of things.

I will say that the SQL Server Management Studio is a great tool, but because it's a GUI tool everything has to be buried in menus. If you plan on using it, read a guide first, because it's not entirely intuitive. It has the occasional cryptic error message or even unhandled exception, but my biggest annoyance is this: You can copy in rows from results and paste them to insert rows, but say you made a mistake and paste in hundreds of rows that don't fit the database schema: you will get hundreds of messagebox popups. The only way out I've found is to end the process.

> but some people don't consider anything not made by Microsoft worth using.

Also some people don't consider anything made by Microsoft worth using.

Perhaps your senior programmer did fall into the former category, I don't know. However, you should take care not to fall into the latter.

This isn't something I would do; I know that every tool has its use. I basically own the website and wanted to do a PHP/MySQL site, but we are under contract to use .Net and my supervisor wants to not do anything that could be considered a breach of contract. However, now that I'm using Linux as my main operating system I am constantly being reminded of how the different philosophies between the operating systems affect me. Ninite is a great tool, but installing everything I want from packages is simply awesome. And of course, Linux systems don't need to be restarted in order to apply updates, but on Windows you always need to. This is why I have to possibly go to our data center tomorrow morning if the Server 2008 box our database is on doesn't come back up for remote access when I update it tonight.

> you should take care not to fall into the latter

Like pg? "I never used Microsoft software" http://www.paulgraham.com/microsoft.html

If licensing and running on Windows would not be an issue, I'd choose SQL Server over Mysql any day.

I sort of would too... but then again PostgreSQL runs just fine in Windows.

I only have experience with SQL Server Express (my employer was still using their licensed copy of SQL Server 2000 when I got there in 2009). There are a lot of features I'd love to have that are missing from the free version such as replication and automation tools.

Have you tried to do the same thing with MySQL and had success? There are a lot of things in your example that could be causing you grief other than the type of SQL Server you are running, e.g. network security, local security configuration (both machines), user error (is this something you set up all the time?).

I had similar problems with previous installs, but those are fairly infrequent. I'm sure I'll go in tomorrow and figure it out. This was one of the few things I didn't document how I did it, probably because I was so relieved at getting it to finally work. A student I went to school with just got a new job and one of the tests they had him perform was to log into a local instance of SQL Server using the Management Studio. The catch was the service wasn't started. Too easy... they should have made him do this!

Just as a side note, make sure you have the SQL Browser service running! It is not running by default in some setups.

As far as I can tell, it doesn't work with SQL Server Express. It's there on the configuration tool, but it's disabled completely.

The GPL doesn't forbid charging for software licensed with it. The Free Software Foundation even encourages you to sell your GPL'ed software [1]. However, once you have the software, you can do whatever you like with it, and that includes giving it away for free.

[1] http://www.gnu.org/philosophy/selling.html

Right, which is why some companies (such as MySQL AB) charge for support and custom tools instead of the software itself (which is always free to use). The senior programmer I spoke of called them, and their marketing staff was of course eager at the chance to make some money, so she came away thinking that we would have to pay for MySQL. And of course, there's the whole "you have to release all of your source code if you use MySQL anywhere in our system!" fallacy.

What error were you getting? The first time I had to set up SQL Server it was a total ballache and I struggled with the same problem for a while. But like most things, after you do it once it becomes easy mode from then on.

At Yammer we use Postgres, Vertica, Redis, Riak, and Memcache. Here's a video about how we use Riak (and Scala) in production: http://blog.basho.com/2011/03/28/Riak-and-Scala-at-Yammer

PS: https://www.yammer.com/jobs

MySQL, but I want to move us over to PostgreSQL soon; we are stuck with MySQL because we use WordPress for the public facing pages so my cofounder can use it as a sort of "CMS" without my needing to build any of that...

Use Drupal 7 - as good (better?) a CMS as Wordpress. You can buy themes off the net or build your own. Very hacker friendly - you can manage/script your Drupal site using "drush", the drupal shell.

First class support for Postgres.

If SQLServer is going to be referred to as "Microsoft" shouldn't MySQL be lumped under the "Oracle" grouping?

MS really only has one relevant offering in this discussion, unless you want to add Exchange as a document-oriented store.

It would not surprise me if MS Access were used at more companies than MS SQL Server.

I'll leave it to you whether you call it a database :-)

Hah! Don't discount it that quickly - MS Access can be surprisingly useful, particularly for RAD work at a large company where you need to push stuff out to a bunch of MS Office users. I've delivered a bunch of "light" Access/VBA applications to support business initiatives (cranked out in under a week or two) which probably account for over 80% of the value created by my hands on technical work (several of which provided the "chassis" for multi-million dollar performance improvement programs).

Access plays well with everyone (including Oracle, the MS Office family, and SharePoint), can be heavily extended if you know VBA and are willing to hack a bit, and can often be handed off to a non-technical person for support. The trick is to use the same coding/design standards that you would use in a real language - the internal parts of my Access/VBA work are influenced by Python and C++.

And from a consulting perspective, it has the advantage of being aligned with what 90% of your Fortune 500 end-users are familiar with (MS Office)... simpler handoffs.

That being said...not my first choice for after hours work (do a lot of MySQL, looking to step up to Postgres).

Incidently (for the other thread) - one really nice thing about oracle is their statistical analysis functions - very nice set of tools, co-located with your data, and can be accessed through SQL. Only free solution I've seen that is competitive in terms of functionality is Postgres...

Lightswitch ( http://www.microsoft.com/visualstudio/en-us/lightswitch ) also seems to be aimed at the sort of thing you describe. RAD, but providing a solid MS architecture in the background. Have you looked at it yet?

I think it would be fair to say that Jet (Microsoft Access) is at least as much a database as SQLite.

I won't argue, but Jet hasn't been the default backend to Access in a long time. It is SQL Server Desktop Edition now, IIRC.

A nice distinction would be SQL Server and SQL Server Express. Express being free, it would be interesting to see how many people are using SQL Server Express as a free alternative to MySQL and how many are using SQL Server as the paid alternative

Yes, sadly FoxPro died when it was acquired by Microsoft.

Microsoft has a key-value database on Azure in the form of Azure Tables. They also have SQL Azure, which is different from plain SQL Server (though very similar).

I have to agree, and planned to comment just to say the same. SQL Server is incredibly powerful, easy to use, and scales okay. If StackOverflow is any indication, people are seriously underestimating (and mis-representing) the Microsoft offerings.

Just to name a few:

* SQL Server

* SQL Server Express

* SQL Compact 4

* SQL Azure

Hey Redis folks, if you haven't played with http://radishapp.com/ yet I'd love to hear your feedback on it.

Also, go Postgres! Woot!

Failure. You don't know your audience.

Fortune 500's will pay you to spit out graphs, but they don't use bleeding edge tools like Redis.

Anyone who uses Redis in any serious fashion already knows that the numbers you show on the homepage are easy to get to. Just write a script to pump them into $graphing_software.
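"Write a script to pump them into $graphing_software" really is about that simple. As a rough sketch (the sample text is a made-up excerpt, but the `key:value` line format matches what `redis-cli INFO` actually emits):

```python
# Parse the "key:value" line format of Redis INFO output.
# The sample below is a made-up excerpt for demonstration.
sample = """# Clients
connected_clients:42
# Memory
used_memory:1024000
"""

def parse_info(text):
    stats = {}
    for line in text.splitlines():
        # Skip section headers ("# Clients") and blank lines.
        if line and not line.startswith("#") and ":" in line:
            key, value = line.split(":", 1)
            stats[key] = value
    return stats

stats = parse_info(sample)
print(stats["connected_clients"])  # 42
```

From there, shipping the numbers off to whatever graphing software you like is a one-liner per metric.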

My advice would be to give the current product away for free (yes right now), making the barrier to entry lower than doing it yourself and capturing future customers. Then focus on selling people things that are hard, like real actionable intelligence. Everything you charge for should answer a question like "what", "who", "where", "why". Free stuff should answer questions like "how many".

[Edit: Just realized you are the Hoptoad guys (of which I am a happy user), so I know you have the potential]

I'd totally use an open source clone of that.

I think this same argument goes for any kind of SaaS. Hopefully with us we provide enough value over rolling your own Redis monitoring/data analysis that you won't need to.

Agreed. Or at least a free version of it.

There's a 30 day free trial...sign up and give it a spin!

Interesting to see Tokyo with so few votes. On benchmarking it had the smallest memory footprint when loading ~17GB of packed binary data. Been awhile but I remember testing redis, postgres, mongo, and bdb. This let us keep all the low latency read-only data we needed available in memory and provision the smallest possible machine to do it.

I bet a lot of the proportion is down to most people writing in with what they use in their day jobs.

Also, while I love Tokyo's speed, it is just a better bdb - which is an impressive feat, but I can see why a lot of people would use something slower but more featureful.

Tokyo is my standard database for personal projects and goes great with Lua. Both are lightweight and fast, Cabinet has an excellent binding, and Tyrant has embedding.

How about adding IBM's DB2? It seems there are very few companies actually using it.

A lot of companies use DB2, they just don't write blog posts about it.

When you have a problem with DB2, you don't write a blog post about it, you call IBM.

Nailed it :)

With each new PostgreSQL major release (the latest is 9.0, with built-in replication) it's getting better and better. The community rocks, and there are exciting new features coming in 9.1 (e.g. synchronous replication).

Full disclosure: My daily job is postgres developer/consultant and I love it :)

Legacy apps, Ingres.

All sorts of odd little things, SQL Server.

Various legacy data processing and newer data warehousing jobs, SAS.

New projects, in theory Oracle but there seems to be a degree of resistance. It'd be interesting to see how that pans out but I won't be around there much longer :)

Ingres is underrated, I find it quite ok to work with.

It might be good potentially but it certainly isn't as we're currently using it. Quite aside from the incredibly basic console apps we're having to use to browse and test against the DB server, performance isn't exactly its strong point and it has sometimes been crashed by running individual queries. I wasn't impressed.

Large corporations also consider IBM's db2 equivalent to Microsoft's/Oracle's database. There should've been an option for that too.

The relatively low vote count for Couch is what surprises me. I thought it was fairly widely used (in the NoSQL world)?

These numbers seem representational to me from what I've seen/heard in the SF Bay Area.

MongoDB is typically introduced to optimize part of a stack, although more and more it is used as a sole/primary data store. I think ORMs and use of Mongo by prominent consulting shops helped boost adoption.

The particularities of CouchDB replication are very well suited to an enterprise application with a distributed architecture that I'm working on. I hope it sticks around for a long time, even if it doesn't have the biggest dev user base.

> The particularities of CouchDB replication are very well suited to a enterprise application with a distributed architecture that I'm working on.

That is what made it stand out for us. CouchDB carved itself a nice niche in that area and currently doesn't have any viable competitors. We have a cluster that runs continuous replications to sync data around and it works great. Had we used Erlang we would have used Mnesia (the dataset and pattern of access fits it well) but we use Python & C so CouchDB works great for us.
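For anyone curious what "continuous replication" looks like in practice, the setup is just a small replication document posted to CouchDB's replication API. The field names below are CouchDB's real ones; the hosts and database names are made up:

```python
import json

# A CouchDB replication document; "continuous": True tells the
# server to keep syncing as changes arrive rather than doing a
# one-shot copy. Hosts/db names here are hypothetical.
replication = {
    "source": "https://node-a.example.com/app_db",
    "target": "https://node-b.example.com/app_db",
    "continuous": True,
    "create_target": False,
}
body = json.dumps(replication)
print(replication["continuous"])  # True
```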

Also wondering what future will bring and if they'll ever end up with some hybrid of Membase+Couchbase? Reliable document saving and replication + fast key value store, rolled all in one?

Well, we're not using it ourselves, but at one client our application connects to DB2... on an AS/400... using JRuby :).

nitpick: DB2 on the iSeries (née AS/400) is DB2 in name only.

That being said, I do the same thing using Python.

Percona builds of MySQL. They're awesome, and Percona support is much, much better than official MySQL support.

Percona builds should become the default install for MySQL (only half kidding).

MySQL, with SQLite for development environments. I want to learn/migrate to Postgres though, what with MySQL being in Oracle's hands.

One of the best things you will ever do.

Agreed. I had to migrate a copy of TeamCity from MySQL to PostgreSQL a few days ago (due to a buggy mysql version in the Ubuntu 10.04 LTS package repository). You just don't get crap like that with Postgres. It just works.

Voted "other".

We're using the App Engine master/slave datastore for getcloak.com. We're moving over to the HRD soon; the role of the HRD in App Engine's future wasn't clear when we started building our app.

Same here. Voted other. I have a big database on the App Engine, and I'm moving more of my data from mysql to the datastore.

Out of curiosity, are you on the HRD already? We have some fairly sensitive billing code and... well, I'm nervous about the transition to HRD...

Do it. We made the switch a couple of months ago, and haven't looked back. Very little impact on performance, huge impact on reliability - we hardly ever get datastore timeouts now. And of course no maintenance outages either.


How do you move data between datastores? Do you have to create a second app and migrate manually?

Yes; you can use their bulk dump/upload tools to help make it happen...

Ah, thank you. Unfortunate that they don't provide an automatic migration tool...

InnoDB with clustered indexes is pretty hard to beat for many scenarios. I hate that Oracle owns it, but damn it works well. Also frequent cascading deletes are surprisingly fast, especially in high lock contention environments. [As a side note, it's unfortunate that Wordpress still uses MyISAM by default]
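The cascading-delete behavior is declared in the schema with ON DELETE CASCADE. A minimal runnable sketch, using SQLite from Python's stdlib since InnoDB accepts the same DDL (note SQLite needs foreign keys switched on per connection):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific; off by default

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE posts (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users(id) ON DELETE CASCADE,
        body TEXT
    )
""")
conn.execute("INSERT INTO users VALUES (1, 'alice')")
conn.executemany("INSERT INTO posts VALUES (?, 1, ?)",
                 [(1, "first"), (2, "second")])

# Deleting the parent row cascades to its child posts.
conn.execute("DELETE FROM users WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0]
print(remaining)  # 0
```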

Redis has also been incredibly fast for our ad network. We throw around 7000 qps at the thing with a low number of writes and we haven't restarted the daemon for months. [Thanks Salvatore!]

>InnoDB with clustered indexes is pretty hard to beat for many scenarios. I hate that Oracle owns it, but damn it works well.

After Oracle stopped distributing the source on their website, an open-source project picked up its development here: http://www.haildb.com/

FileMaker. Yes, just laugh. I also wish they would replace their proprietary database system with something more standard; however, the development process is really, really fast and easy to learn.

They should make the DB some SQL variant and use web technologies as their layout engine and it would be a pleasure to work with. I tried to learn Rails and find it hard to get into, though I'm lacking experience as a developer.

My company has 130,000 employees, so I suspect every one of the above is represented somewhere, but the code that I hack on runs on PostgreSQL.

None. I recently re-designed my project to use pre-generated static files.
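The pre-generation approach amounts to rendering every page up front instead of querying at request time. A minimal sketch (the post data, template, and output layout are all made up for illustration):

```python
import tempfile
from pathlib import Path

# Hypothetical posts that might otherwise live in a database.
posts = [
    {"slug": "hello", "title": "Hello", "body": "First post."},
    {"slug": "static", "title": "Going static", "body": "No DB needed."},
]

def render(post):
    # Trivial inline template; a real site would use a template engine.
    return f"<html><h1>{post['title']}</h1><p>{post['body']}</p></html>"

out_dir = Path(tempfile.mkdtemp())
for post in posts:
    (out_dir / f"{post['slug']}.html").write_text(render(post))

generated = sorted(p.name for p in out_dir.glob("*.html"))
print(generated)  # ['hello.html', 'static.html']
```

Any dumb web server can then serve `out_dir` with no database in the request path at all.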

I am surprised that nobody mentions IBM DB2.

In the bank where I work, the banking system uses DB2 on mainframes.

Our data warehouse uses SAS.

I love couchDB, I'm surprised its not more popular here.

I've voted MySQL because it's the one I personally choose where possible, but applications I write interact with various kinds of datastore including Oracle and Microsoft SQL Server. (We probably have more of those.) One application I made has its canonical datastore inside a series of SharePoint lists, which gets sent to me in CSV via Microsoft Access. Well-meaning but technically incompetent users will find all sorts of ways to store their data!

LDAP as a hierarchical data store with same flat schema for each node containing application specific serialised Python dictionaries.

Oh, and PostgreSQL for reporting.
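The poster doesn't say which serializer they use, but the round trip is simple regardless; here's a sketch using json to stuff a Python dict into a single attribute value (LDAP attribute values are bytes):

```python
import json

# Application-specific settings to store on an LDAP entry.
prefs = {"theme": "dark", "page_size": 50}

# Serialize for storage in an attribute value (bytes on the wire).
attr_value = json.dumps(prefs, sort_keys=True).encode("utf-8")

# ... later, after reading the entry back from the directory ...
restored = json.loads(attr_value.decode("utf-8"))
print(restored == prefs)  # True
```

json is safer than pickle here, since anything that can write to the directory could otherwise inject arbitrary pickled objects.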

dan you should add 'hosted kv' as an option to account for appengine, simpledb etc.

It's really hard to compare results between the two polls. It would be nice if pg added percentages next to each item, or at least sorted them.

Here is a graph of this poll: http://koldfront.dk/misc/hn/database2011/data.png

... and one of the 2010 poll: http://koldfront.dk/misc/hn/database2010/data.png

(Updated every ~4 hours currently; I will be fading out the update rate.)

The bars are different sizes and in different orders, this makes it much harder to compare by flipping between browser tabs.

They reflect the polls, so they also tell you that the polls didn't have the same choices.

The two sets of data ought to be shown in the same graph, really.

Edit: I have now joined the two years in one graph, and plot the percentages to make it easy to see the change: http://koldfront.dk/misc/hn/database/data.png
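Plotting percentages rather than raw counts is what makes two polls of different sizes comparable. The normalization is trivial; a sketch with made-up numbers (the real counts per option aren't matched to labels here):

```python
def to_percentages(counts):
    """Convert {option: votes} into {option: share of total, in %}."""
    total = sum(counts.values())
    return {k: round(100 * v / total, 1) for k, v in counts.items()}

# Made-up numbers for two poll years.
poll_2010 = {"MySQL": 500, "PostgreSQL": 200, "Other": 300}
poll_2011 = {"MySQL": 900, "PostgreSQL": 500, "Other": 600}

p10 = to_percentages(poll_2010)
p11 = to_percentages(poll_2011)
# Change in share, in percentage points, between the two years.
delta = {k: round(p11[k] - p10[k], 1) for k in p10}
print(delta)  # MySQL's share falls even though its raw count grew
```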

The really surprising thing is the upward movement of Microsoft SQL Server, on HN of all places.

On a lighter note, does this support the theory of HNers who complain that HN is not the same any more and has gone mainstream?

Maybe more Microsoft SQL Server users follow Hacker News now than before?

SQL Server! I especially love the way I can push in a large dataset and have it upsert the lot in a couple hundred milliseconds!

Almost all of the above. Because my company ships an appliance, we've got Postgres, Tokyo, Redis and several other semi-databases (Judy, Memcached, etc) on the appliances, and Postgres, Couch and HDFS/HBase on our side.

Much of the reason for the large number is for legacy code that is quickly being replaced. We're settling into Postgres, Redis and Couch.

flat files!

My company uses actual filing cabinets that have actual physical files.

Do you work at an elementary school, by any chance?

I'm not trying to be a wise guy - it's more of a comment that elementary schools are often last to implement technologically savvy solutions (i.e. solutions that cost a lot of money), so I wouldn't be surprised if you said "yes" to my question.

what kind of sharding scheme do you use?

Duh: Customers with a last name starting with A through E go in the drawer labelled A-E, etc. We just implemented a RAID 1+0 solution that involves a fleet of entry level workers furiously photocopying every document and putting them in boxes for Iron Mountain...
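In the spirit of the drawer labels, that's just range sharding on the last-name initial; a toy sketch:

```python
import string

# Fixed alphabetical ranges, one "drawer" per five letters.
DRAWERS = ["A-E", "F-J", "K-O", "P-T", "U-Z"]

def drawer_for(last_name):
    initial = last_name[0].upper()
    idx = string.ascii_uppercase.index(initial)
    # The last drawer takes six letters (U-Z), hence the min().
    return DRAWERS[min(idx // 5, len(DRAWERS) - 1)]

print(drawer_for("Edwards"))  # A-E
print(drawer_for("Zappa"))    # U-Z
```

Rebalancing when the S drawer fills up is left as an exercise for the entry-level workers.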

I'd imagine by filename :)

In school, before learning SQL, I wrote my personal site's blog engine on my own flat-file database hack. :) Fun times.

Likewise. Perl's DBD::CSV module was my greatest discovery - SQL queries across flat files! No joins though. :)
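The Python-flavored equivalent of that trick is to slurp the flat file into an in-memory SQLite table and query it with real SQL (and unlike DBD::CSV, joins work too). The data here is made up:

```python
import csv
import io
import sqlite3

# A flat file, inlined as a string so the demo is self-contained.
flat = io.StringIO("name,score\nann,3\nbob,7\ncid,5\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
reader = csv.DictReader(flat)
conn.executemany("INSERT INTO scores VALUES (?, ?)",
                 [(row["name"], int(row["score"])) for row in reader])

top = conn.execute(
    "SELECT name FROM scores ORDER BY score DESC LIMIT 1"
).fetchone()[0]
print(top)  # bob
```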

Me too... in Excel 95's VB for applications! fun times.

I hear you.

IBM DB2. It's still unmatched for XML support.

What do you think of MarkLogic?

It's not really comparable. We need the flexibility of supporting both relational and hierarchical models in a single database with high performance for OLTP applications. MarkLogic doesn't target that market.

I work for US Steel and what you use here depends on the type of programming you do. On the mainframe, everything here is either DB2 or IMS (aka DB1). If you do "alternate platform development" (read: Windows and/or Linux) then you use Oracle. I'm not a mainframer so everything I create personally has to work with Oracle.

Speaking of databases: has anyone here at HN got first- or second-hand experience migrating from Oracle 10g to PostgreSQL?

Let's say migrating a multi-node 10g RAC instance: what would I lose in terms of functionality? Distributed transactions, limits on the number of replicated nodes, RAT, etc.?

Just curious how far it has come in terms of replacing Oracle.


What is it about Postgres that you don't like?

FileMaker Pro in my case. I build my own DB apps, but then I'm only a SOHO, so I'm stretching the definition of "company."

Despite a desire to use Postgres, we use MySQL in production. MySQL is what I knew and thus still what I know. Given that most future work involves an ORM, we'll probably move to Postgres as soon as I learn more about configuring the server.

Currently using memcached (not memcachedb) for caching, thinking of trying out Redis.

It might be more interesting to see what purposes the database is being used for. Are companies choosing different databases for embedding in saleable products, internal applications or saleable services? We use PostgreSQL for our product that we sell, but we use MySQL for in-house applications.

Not really in the line of work where my company makes any significant use of database software, but I've been using Postgres lately for some personal projects. Based on my testing, it's a bit slower than MySQL but uses significantly less memory, which is important to me. Very happy so far.

No Salesforce option on the poll? Or do you just pick Oracle for that?

edit: Why am I getting downvoted for this question?

