PostgreSQL was built with their priorities being: correctness first, then features, then performance.
MySQL was built with their priorities being: performance, then features, then correctness.
Also, as somebody who's being very careful with which fields I update, keeping in mind indexes, I'm still in shock about learning that PostgreSQL updates all indexes on a table, even for columns not included in that write - that ha to have a cost.
Edit: non-updated columns, I mean.
`/dev/null` is the fastest "database" for writes ;)
Interesting, I've found the exact opposite to be true the last time I looked. Most comparisons which I've looked at which have shown PG to be faster are usually making really basic mistakes in scaling up the MySQL config (which is, admittedly, not a trivial task, but there are a lot of resources out there).
Can you point to any benchmarks or usage patterns which show PG to be the better performer?
- Postgres Example: to set up Postgres correctly, you need a Postgres user in the OS, who runs the database. That should enforce to not run the database process as root.
> For examples, the Unicode collation sorts "ß" like "ss", and "Œ" like "OE" as people using those characters would normally want, whereas utf8mb4_general_ci sorts them as single characters (presumably like "s" and "e" respectively).
If some row is deleted due to a foreign key cascade constraint, delete triggers are not executed. When you delete directly, they run obviously.
It's worse than having no trigger support.
I always got the impression (since I started using it in Prod in the late 90s) that the MySQL team had never worked on an RDBMS before, even as users, and didn't really understand it. Back then they would say, you don't need foreign keys, you can just enforce consistency in your application, you don't need transactions, you can just handle it in your application, and so on and so on. Monty Widenius was very arrogant and thought he knew everything. Eventually they matured a bit and realized that actually, yes, the entire rest of the database community weren't idiots, maybe there is something to these features, but now they needed to find a way to retro-fit them onto what they had and now - 20 years later - they still haven't figured out how to smush it into their architecture.
Whereas the Postgres crew were fresh out of the Ingres project led by the genius Michael Stonebraker, they had a solid foundation and very clear vision and a culture of doing the right thing, not the easy thing. There's a steeper learning curve to Postgres but once you have learned a few things you can easily guess the rest because it's so consistent. For MySQL you really need a "guru" because there are so many dark corners and weird edge cases that make no sense, you just have to "know" them.
IIRC that's actually correct.
I was working for an ISP at the time and responsible for a gTLD, my project was called DNSQL, it was as you can imagine a DNS server backed by MySQL so we could edit it with SQL statements rather than manipulating the zonefiles in Perl or sed... Relational constraints and transactions would have made my job a lot easier... A few years later I inherited a 2Tb MyISAM database, which was "fun".
* Thread-per-connection (as opposed to process-per-connection with PG) means much lower memory overhead with many concurrent connections; with PG you might need to use something like pgbouncer, which doesn't come without its own issues.
* Logical replication; the MySQL format is compatible between minor release versions (e.g. between all 5.x releases), which enables upgrading individual replicas or master without complete shutdown. With PG this often a becomes way more complex.
* Better caching/buffer management. Essentially, InnoDB has its own implementation of LRU cache while PG relies on kernel page cache. This generally allows finer tuning on the database configuration level and you don't have dive into kernel page cache tuning (which will be always OS-specific).
EDIT: most of this stuff is probably even better described in the Uber article, linked in other comments here.
Logical replication has the problem that missed messages can have really nasty effects on your data down the road.
* Statement-based is logging raw write SQL
* Row-based is logical effects of executing that SQL
* Mixed uses statement-based for everything except statements that are non-deterministic or otherwise unsafe for statement-based replication
Physical replication in MySQL would be storage-engine dependent. For example, in InnoDB physical replication would mean shipping the InnoDB transaction logs rather than the separate binary logs. Alibaba built something to do this, and from what I understand Amazon may be doing something along these lines under the hood in RDS MySQL and/or RDS Aurora.
but switched back to mysql in 2016 which generated a lot of discussion:
There were a lot of followup posts afterwards arguing both sides if you search for them:
It has a lot of problems, but when you need to scale, you can do so much more affordably with MySQL.
Here is a video "Real Time Analytics at UBER Scale":
EDIT: I'm referring to https://news.ycombinator.com/item?id=14067406 ; my reply to that comment included more specific questions. Unsure of why receiving downvotes here; being somewhat of an InnoDB expert myself, I am legitimately curious why the parent keeps making assertions about InnoDB being a K/V store.
Also, you're describing how InnoDB is designed, not how MySQL is designed. Furthermore, you can effectively get a heap in InnoDB if a table has no PK and no unique indexes. InnoDB will internally use a hidden auto-incremental value in this case for the primary lookup, which more or less effectively acts as a row pointer.
MyISAM isn't used anymore. MySQL has ACID transactions and foreign keys. Get over it. The engine is still there for users who want to use it and that's not a reason to knock MySQL.
Almost every one of the correctness and SQL standard flaws in MySQL is handled by the sqlmode flag. It allows users to choose correct behavior, but doesn't suddenly break the millions of MySQL apps when they upgrade. To suddenly break millions of apps to be correct by default without sufficient time for apps to migrate makes no sense.
The fact is, MySQL is significantly faster for simple CRUD applications which is really common in today's REST and Microservice designs.
That being said, Postgres is an amazing RDBMS, and would probably become the #1 database server if administration tasks like SHOW DATABASES weren't accomplished through \l (or if \l where a shortcut).
Postgres users wonder why people use MySQL. They wonder why people use NoSQL when storing JSON in Postgres can be faster. The fact is, the learning curve of Postgres is needlessly higher than these other systems.
> Postgres is an amazing RDBMS, and would probably
> become the #1 database server if administration
> tasks like SHOW DATABASES weren't accomplished
> through \l
The layman administers MySQL through Wordpress installs, cpanel, and phpmyadmin where nobody even runs a single query against the database by hand.
Those are the things that made MySQL popular.
Not \list vs SHOW DATABASES. C'mon.
I would like to re-enforce a good point you mention: quirks can be a side-effect of making sure you have a good upgrade story. Sure; we would prefer not to have them, but you want a vendor which has a strong commitment to backwards compatibility too. There is a balance: and I think we've gotten better at it in the last 5 years. The default is now STRICT mode on (since 5.7/2015 and new configurations since 5.6/2013).
Upgrade is a feature too :-) Because of the rate of CVEs at the moment, it is important to keep within the vendor EOL policy. Because MySQL Replication is logical, most users do major-version upgrades without downtime.
For anybody depending on MySQL transactions, be sure to keep its implicit commit behaviour in mind. Don't assume that rolling back an active transaction will roll back all the stuff you've done since you called BEGIN. MySQL is still weird.
> would probably become the #1 database server
> if administration tasks like SHOW DATABASES weren't
> accomplished through \l
For the programmer who chafes at such terse command-line syntax, I imagine that they're the type of developer who wouldn't use psql at all, but instead one of the many graphical tools that list databases and so forth on start-up for you.
We also use tokudb storage engine for a large number of servers. This engine is designed for datasets that do not fit into memory, and we've had a great experience with it. We are able to load CSV's in 1/3 to 1/5th the time of innodb. Most queries aggregate queries run faster, there is online index creation and column changes. From a DBA prospective tokudb prints detailed statuses in the show processlist! It is definitely a niche storage engine, and not right for all cases but this has contributed to our continued use of MariaDB of PostgreSQL.
With PostgreSQL 10 things are looking interesting. We tested out 9.6 with it's parallel query and saw some significant improvements on aggregation queries. It was not enough by itself to warrant use moving off of MariaDB but has me very excited going forward.
MariaDB is not sitting still, there are many improvements coming to 10.2 which should be released soon. For most applications I'd use whatever database you are familiar with. If you have a specific case, terabytes of data, millions of incoming writes, requirements for live aggregation, that is where the PostgreSQL vs MariaDB really gets exciting.
Can you explain a bit more about why this is "a thing" please? Because pretty much every database does this, and has done since the 1970s.
While most engines will still work with frequent disk paging, performance can degrade rapidly. An engine designed for the paging can be smarter about performance, at the cost of performance on smaller datasets.
Postgres shines if you need advanced functionality: recursive queries, lateral joins, window functions, foreign data wrappers, partial indexes, advanced types... The list of things MySQL doesn't do as well or at all, be it user facing or under the hood, is fairly long.
There are plenty of examples of both DBs scaling successfully in the wild.
IMO it's a coin toss decision for most CRUD apps. For more advanced apps, you might want to pick Postgres over MySQL for the same reason some companies prefer niche languages over Java or PHP: the pool of competent engineers is smaller but tends to be of higher quality.
I ran into an issue  with MariaDB a year and a half ago: 'SELECT * FROM table' failed to return a large fraction of the rows of that table. Selecting them in a number of different queries with ranges of ids worked fine. Nothing fancy, no transactions, on an idle database. I might have been unlucky, but I will pick PostgreSQL over MySQL/MariaDB any day if I have the choice. Having had a cursory look at the MariaDB source code and a more in-depth look at the PostgreSQL source code confirms that preference.
They're protocol compatible, but MariaDB has been drifting away from the core storage engines and optimizer for some time. For good and ill (I think you hit the latter).
It isn't really a question of how advanced your app is, unless your app completely relies on some specific advanced pg feature. There are areas where MySQL is substantially more advanced than Postgres, such as replication options, which in turn affects operational maintainability at scale.
> There are plenty of examples of both DBs scaling successfully in the wild.
This is true, but there are far, far more examples of ludicrous-scale MySQL than there are for pg. Compare the list of largest sites/apps using MySQL (see some of my comments from a few months ago) vs a similar list for pg and the discrepancy is quite noticeable.
For MySQL, the largest users were (and still are) the biggest drivers of scalability-related features... and that's despite MySQL being far less of a community-driven open source effort than PG.
From my perspective, the biggest roadblocks for PG adoption at high-volume OLTP use-cases are its historical weakness in replication options, its process-per-conn model, and its lack of clustered index support.
PG replication is improving at a great rate, especially with the addition of logical replication, but the overall spread of options is still behind MySQL (re: multiple kinds of logical replication; multiple options for async vs semi-sync vs sync replication; large ecosystem of tools to interact with the replication stream).
The other two issues are a bit more fundamental. But given the number of excellent engineers working on PG, I'm sure that solutions will be developed in the near future.
My team spent many months trying to mangle MySQL into doing a similar thing (which it will do) but eventually we hit a performance stop with MySQL.
We did many things to improve the performance; switched to using fastsockets in the kernel, changed memory allocators (tried two different ones) but at 40cores and 8 pci-e ssds it was spin locking some function in memory.
Not to mention it's history with throwing data away.
Most people use MySQL because they know it already- and I would say that's fair comment. But getting started with PostgreSQL is easier and it scales better, so it's hard for me to agree with using it in 2017.
To get started with a random test db you can do:
pg_ctl -D database initdb
pg_ctl -D database start
psql -h database
I wonder if redis can be convinced to flush data to disk. AFAIK it does it periodically?
If only SQLite was a tiiiny bit more scalable. It's the only DB where you have to explicitly disable force-flushing of transactions to disk.
$ createdb mytest
$ psql mytest
Simply having the binaries is enough- this is something that gives pgsql another edge on mysql.
May I ask what company you are working for?
Facebook and Youtube use MySQL, YT via Vitess, in 2017. I am just curious if you could exactly point out what part of PSQL scales better.
Facebook has spent literally thousands of man-years working to make this system scale and perform in the way they need it to, but there is nothing about this effort that required MySQL and in fact a lot of effort was expended working around glaring deficiencies in MySQL. If they had it to do over again I think that no one at Facebook would select MySQL at this time.
There are far, far fewer than 100 people working on MySQL at Facebook. I say this as a former member of the team, and I am still in close contact with several team members.
Additionally, every engineer at FB has the ability to issue ad hoc SQL queries, and many teams do use special-case MySQL DBs that they interact with directly. I was the lead on FB's DBaaS so trust me that I know what I'm talking about here. Over 1000 engineers had used the DBaaS at the time I left the company.
There is no legacy issue involved. FB is changing storage engines in MySQL right now; they have the resources to completely move away from MySQL if there was a compelling reason. Your statement of "If they had it to do over again I think that no one at Facebook would select MySQL at this time" is completely incorrect.
Even if you ignore FB, a much larger percentage of the largest sites on the internet use MySQL than Postgres.
I will simply state that having also spent a long time inside FB I disagree with your assessment regarding the high regard in which you think MySQL is held within the company.
You can say with absolute certainty that you know the opinion of thousands of engineers at Facebook? This is clearly hyperbole. It does not appear you are interested in having a meaningful discussion on this topic.
> I disagree with your assessment regarding the high regard in which you think MySQL is held
Where did I say it's held in high regard? I simply said there is no compelling reason to move away from it, and also said your numbers were way off base in terms of how many people are working on MySQL at FB, as well as how many people directly interact with MySQL at FB.
I also greatly question how closely you were interacting with the managers of the db teams if you think there were 100 people on these teams!
[EDIT: It's also possible that we just have very different definitions of people "on the db team". For people "working on MySQL" I would only include MySQL Eng, DB Client, MySQL Infra PE, and the majority of Data Performance PE. Arguably you could include some RocksDB people at this point, though their work is used in many non-MySQL ways too, and even then that brings the total to still way under 100 people. Overall I would not include teams like Wormhole or TAO as "working on MySQL", though even if you add them in, still significantly under 100.]
In your second reply (sibling to this one) you seem to state that Facebook's db engineers are motivated by self-preservation (re: staying with MySQL), and you imply they are over-paid. If self-preservation was a motivation at all, FB wouldn't be migrating major data sets from InnoDB to MyRocks, which has a completely different set of performance characteristics and administrative/operational/automation concerns. As for pay, yes the db teams are very well-compensated because they are literally among the top experts in the field, working on an absolutely massive billion-daily-user OLTP system and all of the insane challenges that entails.
Put yourself in my shoes. You worked in security, yes? If I posted some factually incorrect comments about your team, and the engineering motivations and talents of your team, on a public forum like this, how would you react?
I still maintain that while MySQL was the right choice in 2006, if you were starting FB today it would be hard to convince someone that MySQL is a better choice than Postgres.
> if you were starting FB today it would be hard to convince someone that MySQL is a better choice than Postgres
Not sure I agree. Seems like more of a toss-up at the very least. How would you find or build the right talent pool? With MySQL, Facebook was able to hire the top folks from both Google (which cancelled its MySQL efforts around the exact right time for FB to poach) and MySQL AB / Sun.
I also strongly suspect it would be substantially harder to build something like Wormhole on top of Postgres replication.
That said, I will include my usual disclaimer that Postgres has a lot of compelling features that MySQL lacks, and I can definitely see the other side of the argument.
Why's that? It seems like there's all the infrastructure needed for that?
Arguing "x uses y therefore it's good" is a poor metric, I personally spent months with a lot of very expensive hardware finding out if we could go forward with mysql (there was a strong push for mysql), but I found PGSQL not having any of the same single server scaling issues (IE; all 40 cores could be uses, all memory could be used, all disk channels could be used) with no hacking of the kernel or memory allocators.
At some point, it's easier to throw resources at your old stack to make it scale instead of migrating to something else.
Example: Facebook and PHP. They decided it was easier to write an entirely new runtime instead of migrating to something else.
Often, this results in significant improvements (MySQL and PHP are great examples).
If you spend actual time in the database using SQL, Postgres offers vastly more functionality, e.g. window functions, broader and more mature GIS support, more data modelling options. Postgres also certainly _can_ outperform MySQL in a variety of situations - it supports different indexing options (i.e. last I checked MySQL didn't have partial indexes), and is moving to more parallelism in its query plans. MySQL's partitioning support is a little more natural than Postgres's but that's changing in 10.0.
So, I would say, if you just want a back end for an app and have never _really_ cared about your RDMBS, use MySQL or whatever you're most familiar with. If you have complex query requirements, time series data, analytics workloads etc, try Postgres.
But you're also making it sound like MySQL is easier to deal with than Postgres and I find that to be completely untrue. Postgres has far better documentation, sane defaults and only one engine whereas the mysql documentation is all over the place and "googlable information" conflicts depending on your engine or hell, even whether you're using MySQL or MariaDB.
In fact if you don't want to have to care about administration, I'll go as far as recommending Amazon RDS Postgres.
With that in mind, I don't come to a different conclusion though. Just about every database I've worked with has had some timeseries data in it, but separating data out into multiple systems/models is usually something I would avoid until there is no other option.
Better how? FWIW I'm currently running a 1TB time series DB on Postgres. It runs butter smooth, provides great performance (I'm using table inheritance), great querying tools, plus admin load is minimal.
You're mixing your metaphors there. Vertica is a relational DB which deals with time series just fine...
The best advice I can offer is, "Use what you're more comfortable with." Both can be configured on a single machine to serve a ton of traffic. Both can be backed up to recover when they falling over. Both can be configured to keep your data consistent and safe. Don't worry about scaling, don't worry about sharding or how much better indexing method A is than B. Worry about your application.
Now that the obvious is out of the way, I personally prefer MySQL over PostgreSQL. It will be easier to scale when the need arises (the corollary to "avoid premature optimization" is "design to make optimizations easier"), it performs like a madman when configured properly, and there are simply more people out there who can help you configure it properly.
Tangent time: yes, it is comparatively feature poor. I can't store native JSON documents in it, I can't run Python stored procedures, I can't create nearly the same breadth of indexes. Then again, I have never needed to. JSON maps nicely to objects, and an ORM maps those objects (somewhat less nicely) to native tables and columns. I don't run stored procedures. I use purpose built tools for handling (for example) full text indexing when the built-in searches get too slow.
My biggest question, when everyone points out how many features PosgreSQL offers, is "why are you doing that in the database?" Your database is your biggest chokepoint. It's your biggest point of failure. If you're using it for anything more than "transactionally store and retrieve data", you're asking for even more trouble when it goes down.
Not if, when.
It also makes scaling harder; if you're doing most of your computations in the DB, you have to scale the DB when one machine can no longer handle it. Scaling DBs is inherently harder than scaling just about any other component in your stack; you can't just add another server to the pool.
In summary, if you use your database as a database, and only as a database, either option will stand you in good stead. The biggest advantage MySQL offers at that point is that there are simply more experts out there who can help you with it when you need that help (and will probably cost you less professionally).
> there are simply more people out there who can help you configure it properly.
There are also more people out there who can help you mess it up. MySQL has two engines and three different mainlines. Advice that applies to MariaDB doesn't necessarily apply to MySQL. Advice that applies to MyISAM doesn't necessarily apply to InnoDB.
A lot of people in this thread are underestimating how much of a cognitive burden that is. Postgres has one mainline, one engine, one set of very solid documentation and its community is relentless about strictness. This greatly improves the quality of help you will receive (be it through Google, documentation, or contractors) compared to MySQL.
> "why are you doing that in the database?"
I'm a poor person to answer this but a lot of your post really stems on this question. You don't see the need to do certain things in the db, thus it doesn't bother you that you can't do those things in MySQL. That makes sense. At any sort of load though doing certain things in the database is a requirement. You may have to scale the DB earlier than you otherwise would, but it also sometimes means that application-level code you would spin up 8 dedicated mapreduce instances for requires orders of magnitudes less work and you don't have to deal with scaling as early in the first place!
You shouldn't be doing any CPU intensive calculations in the database if you can avoid it, definitely. But these postgresql features are often about improved efficiency. Take a look at new things coming in postgresql 10. All the new features in there are either performance neutral or big performance wins.
FWIW, I've spent the last few years fixing performance problems in PostgreSQL databases. The solution is most often to de-normalize/pre-compute something, which means using stored procedures and triggers, which are most definitely doing more than "transactionally store and retrieve data".
MySQL adhered to a philosophy of "make it fast, then if someone complains enough, consider making it correct".
The result of this is predictable: Postgres is now fast and correct in its implementations, while MySQL is fast and burdened with a large number of "gotchas" and legacy incorrectness.
MySQL's traditional "fast" table type doesn't support transactions. So you have to select a different table type if you want transactions, but then when you do, you discover that you need to usually go "up" one level from what you'd expect, because the REPEATABLE READ level only affects SELECT, not INSERT, UPDATE or DELETE queries. Which in turn means you can have an inconsistency between the set of rows you'd see in a SELECT with a set of conditions, and the set of rows that would actually be affected if you issued an UPDATE with that same set of conditions. You have to go up one level to READ COMMITTED in order to start solving this.
MySQL's Unicode collations have some bugs; when I was at Mozilla, one we ran into was that we had to set up our own custom collation to get, for example, "e" and "é" to be considered distinct characters (if you ever used MDN, and once saw a strange bug where French versions of topic tags showed on an English article? That's why -- MySQL thought "CSS Reference" and "CSS Référence" were the same tag).
MySQL infamously accepted almost any kind of garbage values regardless of declared column type, accepted values larger than the declared column width (i.e., you could insert a 300-character string into a 256-character VARCHAR column), and would silently or near-silently mangle, truncate or just drop the data.
I cannot at this point imagine using MySQL for any kind of serious project.
InnoDB is the traditional table type now. Myisam was forgotten long time ago.
> we had to set up our own custom collation to get, for example, "e" and "é" to be considered distinct characters
You could just use "varchar CHARACTER SET utf8 COLLATE utf8_bin"
> MySQL infamously accepted almost any kind of garbage values regardless of declared column type
There is a setting that allows strict checking.
> I cannot at this point imagine using MySQL for any kind of serious project.
It is used by many for serious projects.
I found the above link in another HN thread some time ago. When I showed it to some PHP devs recently (I unfortunately don't remember exactly who I talked about it to) I was promptly told PHP does not do what is described in the above link (from 2006) and is generally smarter nowadays.
It's sad that languages and runtimes take so long to fix their core insanities (if ever), and it's also sad when systems are built without front-and-center regard to design correctness from the start. But on the flip side, the only people capable of envisioning good designs of this level of scale are often worn out shells of their former selves at the "you'd have to pay me 7 figures" stage... and of course you have to make sure you pick people who aren't completely out of touch with the latest developments in $category science...
You don't get UTF-8 if you do that though, do you? That was MySQL's first attempt at UTF-8 support. You have to tell MySQL to use utf8mb to get complete support for UTF-8.
Edit: In MySQL 5.5 and 5.6 it takes around 1 second but in 5.7 it takes around 70 seconds.
I work on the MySQL team; we are 2x the size under Oracle, and have made many improvements. Please check out my list here: http://www.thecompletelistoffeatures.com/
My boss got a call from an Oracle "sales representative" or something about getting our licenses - and we only use the default MySQL that comes with AMI on AWS.
I think we concluded he had entered contact information into one of the forms while trying to download something.
create table test (name varchar(3));
insert into test (name) values ("hello world");
>Query affected 1 row // success!!
Even though my varchar size is 3, Mysql didn't even complain about my string size in insert statement. The weird thing is that, when you try to retrieve the value it returns only the first 3 chars which is "hel" (What??). I was expecting an error message but Mysql was happy to eat my error!
>> Error, value too long for the given varying (3)
The Postgres looks doing simple things correct for me.
For my own startup which the primary use-case is analytics, I have moved back from Postgres to MemSQL.
The four reasons were :-
1) In-memory rowstore, thus eliminating my in-memory nosql dependency.
2) The ability to join in-memory tables with disk to provide up-to the second data is awesome.
3) Multiple read/writer group nodes.
4) The biggest for me. Columnar storage.
Some examples on how columnar storage is so awesome.
Note that I am comparing MariaDB vs MemSQL on a old desktop. 8 cores, 8 GB Ram, HD (non ssd) drive.
- A query with a count of 1 table with 50m rows.
MariaDB - 49.60 sec
MemSQL - 1.12 sec
- A query with a count, sum, where clause and group by, with 50m rows.
MariaDB - 17 min 44.84 sec
MemSQL - 8 seconds
Here are some things to be aware. Prior to migrating over to Citus. We were shopping for MySQL consultants to see whether they could improve on the above figures. They quoted $70-100k to work on the project, without guaranteeing results.
With Citus I was able to see far better results and this is why the company switched over. However fast forward to now, MemSQL beats Citus soundly. 1 desktop beats a cluster.
I wager, that with a MemSQL cluster with decent hardware on the cloud, I will be seeing sub 1 second queries across the board.
Oh and the amount of data? It's tens of billions of rows a month. That's the workload I'm talking about here.
This post isn't to bash postgres. I loved using it. If Postgres 11/12, included columnar storage. I would literally scream! :)
PostgreSQL ensures on-disk data consistency for each transaction. MemSQL does not implement this kind of complex logic.
VoltDB and MemSQL are of totally different kind of products. There is the CMU database course which explains the details.
For primary operational datastores, it's probably not a fit, but it's great for real time data warehouse needs.
VoltDB is on the other end of the spectrum and more exotic, designed for purely real-time in-memory SQL transactions, almost similar to in-memory datagrids like Apache Ignite/GridGain or Hazelcast.
As a VoltDB dev, I don't accept this. Performance and sync disk persistence for every transaction don't have to be mutually exclusive, they just require the right architecture and engineering work. We get our speed from memory-centric data structures, and by never waiting on disks/users in a transaction. Doing batch commits (fsyncs) to the WAL every few MS means you can keep your throughput very close to an async thoughput without trading any persistence.
I'm not sure how exotic VoltDB is, but I agree it's a different animal than MemSQL. VoltDB is focused on transactional and operational workloads, while MemSQL is more of a pure analytics play. It has "transactions", but with weaker guarantees all-around. We take our guarantees very seriously (see Jepsen https://aphyr.com/posts/331-jepsen-voltdb-6-3).
VoltDB does do "real-time analytics", which we typically define as analytics you can do in milliseconds. This leverages materialized views, special sorted indexes with ranking support, and lots of other tricks that MemSQL doesn't focus on. This enables our users to transactionally respond to an event as it happens in a few milliseconds, all while leveraging gigabytes or terabytes of state.
This is what I said. There's still a lag, regardless of disk speed, batching, append-only parallel writes, etc. It's physics.
MemSQL and VoltDB and other in-memory db with persistence do it the same. Not much of an issue these days with distributed systems anyway since the network is faster and the data is already on another node.
2) The way VoltDB does persistence is actually quite different than MemSQL. VoltDB is able to write deterministic logical ops before the work even starts, while Mem has to write binary logs after the work is finished. The difference in practical terms is dramatic and will show up in almost any benchmark.
... so not really a good comparison then. You're comparing the speed of your 8GB of RAM to an old, spinning bit of metal on an old desktop. Anyone who would use such a benchmark to drive their tech choices is being rather silly.
I'm showing that a columnar store beat out a rowstore. With insane results.
Not only that, but MemSQL on a really crappy desktop beat out a CitusDB cluster of 5 bare-metal servers, each with E5-2620 CPU v2, 128GB Ram, 10x4TB 7200 RPM SATA Drives, RAID 10.
I actually still have the communication with Citus and here's similar workloads they worked on for me:
Query 7: local 1139.0s, distributed 76.2s (~15x improvement)
I could dig up the query, but I'm pretty sure it's similar. But guess what. It took days of consulting and optimising with CitusDB and one of their engineers. With MemSQL I was getting insane results in a matter of hours.
Talking about costs. A MemSQL cluster is going to be significantly cheaper to run than a Citus/Postgres cluster when looking at tens of billions of rows. I know because I'm looking at the CitusDB numbers and the MemSQL numbers when using enterprise products and bare-metal servers.
Postgres has CStore today but even with included columnstore, it still isn't distributed which is one of the issues. Citus is looking into it though so maybe it'll all come together in a few years.
Has anyone here experience with ClickHouse?
That's pretty troubling, but at least they're open about it. That said their performance claims are pretty spectacular, and it seems solidly engineered. Further if you're not planning on using replication it certainly seems interesting. I'd be curious to hear about someone's production experience as well, since the list of companies running it seems rather thin.
Yes, replication in ClickHouse is asynchronous by default. For intended use cases (OLAP queries aggregating data from many rows) data that is a few seconds stale is usually okay. In a serious production deployment you absolutely should enable replication, otherwise you risk losing all your data, not just last couple of seconds of inserts.
That said, sometimes synchronous replication is necessary despite the latency penalty that comes with it. This feature is actually implemented but not yet considered ready for prime time.
We have several years of production experience with ClickHouse (as a DBMS powering Yandex.Metrica - second largest web analytics system in the world). If you have questions - just ask.
It's been good so far. Good support from the maintainers, too.
As a result of this design decision the resource management of each major part, like the buffer pool, the lock manager, etc is explicit and straightforward.
This, in turn, leads to indeed stable and predicable runtime behavior during high loads.
Basically, PostgreSQL is one of very few canonical examples of what sanity and attention to details could accomplish.
- add a null column to a big table => have fun and plan your maintenance accordingly. Good luck estimating it. This is basically a no-op in Postgres
- partial indices; absolutely important for big tables. Think about ticketing systems where in general the number of "open tickets" is much lower than the closed ones
- being able to use advanced SQL features like window functions can easily be underestimated; you can't use potential you don't have
- renaming indexes, just like that. If you're into refactoring and lessen developer surprises, you want to have proper names and fix accidently mistakes or just legacy stuff
- in my experience (but I'm no DBA), the logging of PostgresSQL is much more useful then what I get with MySQL (e.g. deadlocks or slow logs)
- the EXPLAIN (ANALYZE, BUFFERS) output is ... what you get is beyond comparison to MySQL
The only technical downside we experienced with Postgres (but in comparison, these things are benign):
- limit of length of identifiers; even with aliases. This can trip ORMs which try to do some clever naming under the hood
- you cannot change the order of columns, you can only append at the end
To me, everything I do with Postgres feels much more predictable, whereas with MySQL it always felt a game of gamble.
- MySql is the choice for niche workloads and to be used by experts not only in DB but in MySql Quirks.
- PG is the sane choice for everyone. Experts benefit more, but inexperienced developers will be better served from a more solid foundation, and still PG is so good that will be along your side for years to come.
PG maybe will not shine in some niche workloads but we are talking about problems that most DO NOT HAVE.
Years before, I have read (and believed!) that "PHP, MySql, JS" was "better" for inexperienced (or new) developers, because are more "easy" to learn and use.
Then eventually is show how facebook and others used them.
However, after +18 years on the field, I become more and more convinced that is totally the reverse:
- PHP, JS, MySql, etc are tools for experts.
- More strict tools are better for everyone.
- Only expert will use correctly any tool, even if have quirks.
- This is even more notorious if the tool is more dynamic than static, or provide more "freedom"/less "safety", like C VS. Pascal.
- More well designed or strict tools provide equal benefits to experts, but sometimes a tool with not safety-protection allow to do hacky things easier (or possible).
"Jack of all trades master of none" and all that. What's the natural conclusion to this or does it stay this way forever?
It's like there's an online task force to make MySQL look like shit.
And for most cases it won't matter and one should pick what they're more comfortable with. And if the requirements aren't that simple, people should do their own research.
Yes, we call them the MySQL development team. For a long time MySQL was simply the default choice for many projects, and this status has persisted long after it was the best choice and is extending into the period where it is no longer even a reasonable choice. If you have a specific reason to use MySQL then go ahead and use it (and 'it is what I know and am comfortable with' is a reasonable justification) but there are few reasons to suggest it to someone else except for the fact that misery loves company.
It is a mediocre at best technology when there are better options available, and anyone who suggest it to someone else for a greenfield project should be considered an untrustworthy source of technical advice.
There are also a number of tools to work around it for other for other cases:
Basically it gives me everything I need in a relational database and let's me worry about other things.
When I used MYSQL, I found its EXPLAIN counterpart much more rudimentary.
But if I were to teach somebody SQL, I would probably just use sqlite because it just uses a simple binary file as a backend and doesn't need a running process :)
I am not really OPS or DB guy, so I can't comment on meirts of the databases in production.
Seriously, what is "not easy" about Postgres? Just read the official documentation, which is actually readable and good.
About hobby hosting… there is Heroku Postgres, but no Heroku MySQL. (There are third party addons for everything, yeah). Cheap shared Apache hosting is TERRIBLE. It should NOT be used for any new projects.
* MySQL is a really awful database. It's usable only for small hobby projects, like a simple Wordpress site or so. For serious datasets, it's a horror. Complex queries on large amounts of data are just broken. Many features don't work or work incorrectly. The design is fundamentally flawed.
* Use PostgreSQL instead.
* If you absolutely have to use MySQL, use MariaDB instead. It's a fork that fixed some of the issues.
* Still better use PostgreSQL. Or even Oracle or SQL Server. They're fine, MySQL is not.
> use MariaDB instead. It's a fork that fixed some of the issues.
Which ones exactly? ;)
One also needs to consider the entire landscape. For example, the LAMP stack is mature community with out-of-the-box, ready-to-ship web sites. There exists lots of support and tools. Software stacks like Django for Python typically have community involvement for a particular database. If one is using an ORM like SQLAlchemy, what is SQLAlchemy best used with?
Here are just some of the things that should be qualified when asking database choice:.
1. Transaction support.
2. Scale out support.
3. NoSQL and other data store integration.
4. Cloud support if deploying in a cloud. Really depends on your cloud vendor.
5. Location, location, location. Is this an internal application on your company LAN, or a social networking application to be installed around the world? Replication in MySQL quite useful.
6. SQLite? Why did you exclude SQLite? I'm assuming because the person posting this question already knows something about the workload that defeats SQLite? What is that workload? Testing against SQLite is trivial, just create an in memory database.
7. ETL support. Are bulk updates due to ETL processing of files and such common place? Bulk updates happen better when indexes are not attached, etc.
8. Reporting support.
9. Denomalized versus normalized support. It is common place in web applications with data distributed applications around the world to shard a database table, completely denormalized.
The point is that designing a database schema is heavily dependent on the expected workload. The choice of SQLite, PostesSQL , MySQL, SQL server, Oracle or what have you depends heavily on the workload. It is nonsensical to ask what the right tool for the job is without knowing the job.
depending upon workload this can be significant.
Was actually kinda of surprised after using SQL server for years then looking at Pg that it didn't have IOT at all. Thought everyone had that option.
MySQL has phpMyAdmin, which is (compared to some of the tools out there) incredibly basic (and ever so slightly buggy), but really helpful for getting started and providing a barebones "look at the database" interface. (In all fairness, it does have some pretty decent features.)
pgPhpAdmin... isn't really the same. In my case I couldn't even get it working - I don't remember what errors I got but as a complete beginner I wasn't sure how to respond (I think the errors I was seeing weren't giving any hits on google either).
So, I'm curious what other web-based or Linux-compatible postgres administration tools are out there?
btw, if you want to work with mysql you might want to use mysql workbench instead of phpmyadmin.
pgAdmin looks great, and TIL about MySQL Workbench, which looks awesome (wish there was a PostgreSQL Workbench! :P) - this will be great for learning.
Databases are one of those things I know I badly need to learn more about, this is great start. Thanks again.
-> Pg - Pg - Pg - Mo - Pg - Pg - Pg - Pg - My - My - My - My
Round14 - Preliminary data!
-> Pg - Pg - Pg - Pg - Pg - My - My - Pg - Pg - Pg - Ny ....
In 2011, we started out using Drupal with PostGres. After a year or two, we migrated to MySQL.
I think it's likely that the situation is quite similar with WordPress.
it would be easier if you listed what you mean by "cons". Postgres is much much more powerful than MySQL if you are really into SQL and exploiting an RDBMS, MySQL administration is simpler.
This question is a bit too vague to warrant a specific answer.
I would say this used to be true in 5.5 and before but it's less and less of an argument. 5.6 and 5.7 bring a lot of features like JSON or geospatial.
Here's a list of the new feature of 5.7:
(Want to be able to support Marten document store on more than 1 database)
If there was one, I will switch in a heartbeat as I will love to have access to some more advanced SQL than what's available in MySQL (like window functions)
I started using Jetbean's DataGrip product last year when they released it. It is built on the IntelliJ IDEA tech stack and provides a fantastic multi-DBMS platform that has everything I need not only for Postgres but MSSQL and others.
It can also manage SQLite and a few others.
It seems like Everytime on HNs people say postgre is so much better. Why is MySQL so much more popular?
PostgreSQL wins at everything else and the gap is closing fast.
Facebook is in the process of changing MySQL storage engines (from InnoDB to MyRocks), which is a massive migration involving a large number of engineers, yet they're still staying with MySQL overall because it's the best fit for their OLTP workload. Since pg doesn't even support pluggable storage engines, it would be a total non-starter, among many other reasons.
I'm actually using MySQL (MariaDB, actually) in production because it does work fine, does all we need in an OLTP data store, ops tooling is very good and we need reliable multi-master (Galera) for load balancing.
I'm not an expert on databases, but I've heard so many DBAs tell me how PostgreSQL is superior with regards to features + correctness (features that I don't actually need for our use case, but still) except for scaling out.
What's your take on this? If you were to start on a green field, what database would you recommend for a typical SaaS/OLTP workload?
I'm not sure there is a "typical" workload in the general case ;)
Don't get me wrong; Postgres is a really really good piece of software with many appealing qualities. But its fanbase is a bit overly rabid, or at least over-eager to trash MySQL for things that haven't been true in years or never were true in the first place. There are some very valid MySQL complaints mixed in there too, but it can take some effort to separate them out...
Our team is definitely going to stick with MySQL (no reason not to) and I have no reason to hate it, but I always like these sort of comparisons since you learn a lot about the technology.
I'm really interested in PostgreSQL for "real time OLAP" use cases with things like Citus - apparently, many companies use Citus in particular to run distributed analytics queries on their main database.
I had little experience with Oracle SQL at school, and MySQL's bad design became very apparent to me, even when I was a junior dev. I'm afraid devs are repelled from SQL because all they've seen is MySQL. Postgres has everything Oracle has, with an even better design, and minus the price.
PosgreSQL now has its legs, and has a native replication story (incomplete, but improving), but MySQL has a lot more intertia, and has also been improving in that same time. As a result, MySQL just has more (and thus less expensive) experts, which makes a big difference when picking a DB for your company.
MySQL is MySQL or MariaDB.
Not to be trite, but I think the biggest high-level Pro/Con comes down to this: Postgres has the more active OSS community, MySQL has Percona. Seriously, if you're in a situation where you have the option of spending money to solve problems, then having the option of hiring Percona to fix your problems alone makes MySQL a sound decision. If you don't have a large cash reserve, but are willing to spend time combing through message boards and blog posts, pick Postgres.
In my experience, this is unfounded. The Postgresql project is far more careful about data corruption and correctness. A few examples:
Mysql used to truncate records silently (I think this was fixed a while back with strict mode)
Mysql used to let you insert nonsense like 'notadate' into dates (now fixed I think with strict mode, is it in default?)
Data corruption - I've seen data corruption in Mysql when a server was rebooted which led to the server being unable to read the database data, and required a repair before it would restart. I've never seen something similar with Postgresql.
For stable and reliable I'd recommend Postgresql over Mysql without hesitation.
Thats not what I have heard of and experienced with postgreSQL. Its feature set isn't stable in the sense that they add new features all the time, but it is stable in the sense that features that are present stay. They carefully consider features before committing to adding them (example: upset support)
I also haven't heard of unreliability claims w.r.t. PostgreSQL, certainly not more than with MySQL.
Also, MySQL's deviations from the SQL standard (silently chopping data of long strings to fit them in the database, for example) for me are a cause for concern.
I would even say that it's one of the most safe and stable software to ever exist.
The benefit of MySQL is that partitioning the database is much easier to do than in Postgres. There are also cons to mysql in how it handles data by default (silent coercion, not checking for invalid data like invalid dates, strings too long get truncated, etc.)