I had the misfortune to use MongoDB at a previous job. The replication protocol wasn't atomic. You would find partial records that were never fixed in replicas. They claimed they fixed that in several releases, but never did. The right answer turned out to be to abandon MongoDB.
> Did any of you actually read the article? We are passing the Jepsen test suite and it was back in 2017 already. So, no, MongoDB is not losing anything if you know what you are doing.
Can you imagine saying the phrase "if you know what you are doing," in public, to your users, as a DevRel person? Unbelievable.
- The system warns about unsafe usage at either compile time or runtime, and you ignore at your peril.
- The system does not warn, but official documentation is consistently verbose about what is required for safety.
- Official documentation isn’t consistently helpful and can be downright dangerous, but the community picks up the slack.
- The company gaslights the community into believing it is possible for a non-core-team member to “know what they are doing” from one of the above levels when Jepsen provides written evidence that this is not true.
I’m fine with things that are the third level from the top. I like to live dangerously. But I don’t think anyone can look at that last level and say “people are giving informed consent to this.”
However I can _quite easily_ see how a non-native English speaker could use the phrase “if you know what you are doing” to mean “if you are careful”.
He just replaced "cannot consistently synchronize data" with "cannot consistently deploy a system that can consistently synchronize data". But what's the difference between those statements to people that need to solve that problem? None.
But obviously it is a tactic of the doomed.
This kind of thing is a scourge. I had a Chinese friend respond to something I said once by saying "that's nice". It looks so innocent... but it's really hard to overlook the fact that "that's nice" is a serious insult coming from a native speaker. I had to ask them to please never use that phrase to me again.
You can also be explicit about who's being nice. "That's nice of you."
This really depends on what you want to mean by "nice".
To my ear (American who grew up in the South, lives in the Midwest, and works with people largely from the Mountain West), "that's nice" could definitely be used dismissively or sarcastically, but there's any number of ways to say it that are actually sincere and genuine, and I can't imagine a situation in which it would be a "serious insult".
I mean, I can imagine that in a professional setting, if the coworker was saying it with a sarcastic tone or being showily bored or dismissive when saying it, that their behavior might be insulting. But anything can be insulting if it's delivered in an insulting way. "That's nice" has no particular edge to it to my ear.
So I have to assume you aren't American, or that this is a regional thing that I don't have experience with. In any case, your reaction to "that's nice" reminds me of an American friend of mine who moved to London and when her coworkers would ask her if she had a preference about where to get lunch, she would reply "I don't care", which would be totally normal (to me) in the US. But to her British colleagues, that word choice made it a very rude thing to say (the appropriate reply being "I don't mind" to indicate that she didn't have a preference and was willing to go anywhere).
I'm much more concretely worried by a software design for which the authors (not hostile critics) consider "if you know what you are doing" an acceptable safety and quality standard for data integrity.
Actual example: a long time ago someone in my company introduced a race condition into their product because they didn't know about transaction isolation levels (or the locking facilities exposed by MSSQL). I can give many more, as I'm sure anybody here can.
All complex tools need a large investment of time to understand. I suspect MongoDB's team are using that as an excuse, but in general you must know what you're doing with any tool.
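For anyone who hasn't been bitten by this yet, here is a minimal sketch of the kind of race described above. It is not the actual MSSQL bug from that product (the details of that aren't known here); it just scripts the interleaving by hand on one in-memory SQLite connection, with a hypothetical `account` table, to show how a read-modify-write loses an update and how a single atomic statement avoids it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, used only for illustration.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")


def read_balance():
    return conn.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


# Unsafe pattern: workers A and B both read, then both write back a value
# computed from their (now stale) read. The interleaving is scripted here.
seen_by_a = read_balance()
seen_by_b = read_balance()
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (seen_by_a + 10,))
conn.execute("UPDATE account SET balance = ? WHERE id = 1", (seen_by_b + 20,))
lost_update_balance = read_balance()  # A's +10 has been overwritten

# Safer pattern: let the database do the arithmetic in one statement.
conn.execute("UPDATE account SET balance = 100 WHERE id = 1")
conn.execute("UPDATE account SET balance = balance + ? WHERE id = 1", (10,))
conn.execute("UPDATE account SET balance = balance + ? WHERE id = 1", (20,))
atomic_balance = read_balance()

print(lost_update_balance, atomic_balance)
```

When the logic can't be folded into one statement, that is where isolation levels or explicit locking come in, and where you really do have to read the manual for your particular database.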
If it happens because the driver just forgot to signal, it's a mistake and it's hard to blame the car.
It's like knowing what transaction isolation levels can be used, but asking for the wrong one or trusting a random default.
On the other hand, if the directional lights remain off despite pulling the lever because the car is in a probably illegal power-saving mode, activated by turning the headlights off, and the driver should have reactivated directional lights with an ambiguously named item in the depths of some options menu, blaming the accident on user error would be very partisan: it's the car that doesn't know what it's doing.
It's like figuring out the correct transaction isolation level according to DBMS documentation, asking for it, and expecting it is actually used.
I imagine things are better now.
on edit: between the lines.
I could store it in a table with key and value columns, but since I always use it together as one thing, I don’t see the benefit and it just means more rows would need to be accessed.
Maybe it's not a good design, but it works well for me and makes my life easier.
Hyperbole aside, the best option often is somewhere in between. I find that a relational database with columns for primary keys/relations/anything used in WHERE statements in the normal application path, and a json blob for everything else that's just attributes for the current row/object/document, makes for a very flexible and successful DB schema. You get all the strong relational consistency benefits of traditional schemas, plus the flexibility to add and modify features that don't require relation changes without touching the schema, and the ability to represent complex structures while still allowing ad-hoc and admin path queries to peek into them where necessary.
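A rough sketch of that hybrid layout, for the curious. It uses SQLite's JSON functions from Python only because that runs anywhere; it assumes a SQLite build with JSON support, and the `users` table, its columns, and the sample rows are all made up for illustration. In PostgreSQL the blob would be a `jsonb` column and the lookup would use `attrs->>'ui_language'` instead of `json_extract`.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: real columns for keys and anything in WHERE clauses,
# one JSON blob for incidental attributes.
conn.execute(
    """
    CREATE TABLE users (
        id       INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE,
        email    TEXT NOT NULL,
        attrs    TEXT NOT NULL DEFAULT '{}'
    )
    """
)
sample_users = [
    ("alice", "alice@example.com", {"ui_language": "en", "theme": "dark"}),
    ("bob", "bob@example.com", {"ui_language": "de"}),
    ("carol", "carol@example.com", {"ui_language": "en"}),
]
conn.executemany(
    "INSERT INTO users (username, email, attrs) VALUES (?, ?, ?)",
    [(name, mail, json.dumps(attrs)) for name, mail, attrs in sample_users],
)

# Normal application path: only touches the relational columns.
email = conn.execute(
    "SELECT email FROM users WHERE username = ?", ("bob",)
).fetchone()[0]

# Ad-hoc report: peek into the blob without any schema change.
language_counts = dict(
    conn.execute(
        "SELECT json_extract(attrs, '$.ui_language') AS lang, count(*) "
        "FROM users GROUP BY lang"
    ).fetchall()
)
print(email, language_counts)
```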
In fact, most "fully normalized" databases end up reimplementing a key/value store or twelve in there anyway (e.g. config settings, user attributes, and the like). Might as well just use JSON at that point.
So I have to agree with GP wondering why JSON is so important to people, and is even portrayed as a relief or saviour to developers. In my experience, JSON in a relational DB is always a sign of organizational failure, developers using the wrong tool for the job, or not knowing what they want or do.
Fully normalized databases are a nice academic idea, but the supposed benefits of going all the way don't materialize in the real world. That kind of approach is just old school cargo cult - just like full NoSQL is new school nonsense. Good developers know that the answer isn't to use whatever the fad was when they were in school, be it fully relational databases or NoSQL or anything else, but rather to look at the available technologies and take the most appropriate bits and pieces in order to make a project successful.
After all, if JSON were nonsense, why would a serious relational database like PostgreSQL be adding full support for it? They know it has good use cases.
I know it has good use cases, so I use it, along with proper primary and foreign keys and a properly relational base data model. Yes, all my primary objects are tables with proper relations and foreign keys (and a few constraints for critical parts; not for everything because 100% database side consistency is also an impossible pipe dream, as not every business rule can be encoded sanely in SQL). Just don't expect me to add a user_attribute table to build a poor man's KVS just because people in the 90s thought that was the way to go and databases didn't support anything better. I'll have an attributes JSONB column instead.
And yes, JSON is just a trivial data serialization format that happens to have become de facto. It has an intuitive model and the concept isn't new. It just happens to have been what became popular and there is no reason not to use it. Keep in mind that PostgreSQL internally stores it in a more compact binary form anyway, and if clients for some programming languages don't yet support skipping the ASCII representation step that is an obvious feature that could be added later. At that point it ceases to be JSON and just becomes a generic data structure format following the JSON rules.
What's the point of putting, say, every single user management field into columns in a "users" table? The regular application is never going to have to relate users by their CSRF token, or what their UI language is, or any of the other dozens of incidental details associated to a user, some visible, some implementation details.
What matters are things like the username, email, name - things the app needs to actually run relational operations on.
If you look at any real application, pretty much everyone has given up on trying to keep everything relational. That would be a massive pain in the ass and you'd end up with hundreds of columns in your users table. You'll find some form of key/value store attached to the user instead. And if you're going to do that, you might as well use a json field.
And if you do use a json field with a database engine that supports it well, like PostgreSQL, you'll find that it can be indexed if you need it anyway, and querying it is easier than joining a pile of tables implementing a KVS or two. Because yes, I might want to make a report on what users' UI language is some day, but that doesn't mean it has to be a column when Postgres is perfectly happy peeking into jsonb. And I don't need an index that will just cause unnecessary overhead during common update operations that don't use it, when I can just run reports on a DB secondary and not care about performance.
I designed an application in this manner and it has turned out exceedingly well for me. We have only had about a dozen schema changes total across the lifetime of the app. One of them involved some JSON querying to refactor a field out of JSON and into its own table (because requirements changed) and that was no problem to run just like any other database update. If we need to move things to columns we will, but starting off with an educated guess of what will need to be a column and dumping everything else into JSON has undoubtedly saved us a lot of complexity and pain.
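To give an idea of what that kind of refactor looks like, here is a small sketch. It is a simplification, not the migration from the app described above: it promotes a field to a column rather than to its own table, it uses SQLite (assuming a build with JSON support) instead of PostgreSQL, and the `events` table and `venue` field are invented for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical starting point: everything lives in the JSON blob.
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, config TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO events (config) VALUES (?)",
    [
        (json.dumps({"venue": "Hall A", "capacity": 200}),),
        (json.dumps({"venue": "Hall B"}),),
    ],
)

# Requirements changed: "venue" now needs to be a real column.
conn.execute("ALTER TABLE events ADD COLUMN venue TEXT")
# Copy the value out of the blob...
conn.execute("UPDATE events SET venue = json_extract(config, '$.venue')")
# ...then drop it from the blob so there is a single source of truth.
conn.execute("UPDATE events SET config = json_remove(config, '$.venue')")
conn.commit()

migrated = conn.execute("SELECT venue, config FROM events ORDER BY id").fetchall()
print(migrated)
```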
One of our tables is just a single primary key and then a json blob. It stores event configuration. Why? Because it's always loaded entirely and there is never any reason to run relational ops on it. It's a huge json blob with lots of little details, only edited administratively, with nested fields and sub-fields (which is way more readable than columns, which are a flat namespace), including a sub-document that is actually a structure that drives generation of a pile of HTML forms. If I'd tried to normalize this I would have ended up with a dozen tables, probably more complexity than the entire rest of the DB, and a huge pile of code to drive it all, and every single application change that added an admin knob would've had to involve a database schema change... All for what? Zero benefit.
I don't need JSON per se, I want to store data with a predefined, recursive format which most of the time will be just serialized / deserialized but occasionally also queried without having any idea ahead of the time what the query will be. For all I care, it could be Python pickle, PHP serialize, XML or whatever ungodly format you want as long as the above stands. (XML actually works in MySQL thanks to ExtractValue and UpdateXML but alas, I break out in rashes when I touch XML :P)
Let's say you want to display a page of text and images. (Doh.) But you want to give your authors great flexibility and yet nice styling so you give them components: a rolodex, a two-by-two grid, tab groups and so forth (we have 44 such components). This stuff is recursive, obviously. Are you going to normalize this? It's doable but a world of pain is an accurate description for the results. The data you get back from deserializing JSON should map pretty well to the templating engine.
Rarely you want to query and update things, some analytics, some restructuring etc. If it were just a key-value store with page id and serialized data the only way to do maintenance would be to deserialize each page and manually dig in. Sure, it's doable but having the database do that is simply convenient. That's the reason we use database engines, right? At the end of the day, you don't need SQL, you can just iterate a K-V store and manually filter in the application in whatever ways you want -- it's just more convenient to have the engine do so. Same here. The nicest thing here is that if someone wants ongoing analytics you actually can add an index on a particular piece in the blob and go to town.
I know that you, dear developer, would never produce inconsistent data. But lots of other developers do.
It is often the case that you will need to query that inconsistent json data, but either the pain is too great, or the value too low, to normalize that data. Thus, you dump it as is into a json field or database.
I pretty much refuse to deploy a new instance of it now, I've been burned too often.
As an intern at Shopify, I got an email from MongoDB asking us to switch. Shopify was 10 years old at the time. Plus several coworkers would also receive similar emails two years later (and some in between of course).
I have a shirt from MemSQL that says "Friends don't let friends NoSQL" and I wear it proudly.
For us, as a result, it means anytime we add a new model or change a table, we write the table definition in more verbose api, and sometimes resort to sql commands for adding things like defaults in the migrations. (sequelize for some reason can not specify uuid default value for postgres, for example, so we set a default ourselves, even though we don't need one as we have time dependent uuid generator on the client to help with indices).
We kind of learned working around sequelize shortcomings :)
I am still looking for a tool that'd make incremental backups on postgres easier than it is, but for now things are ok.
The biggest annoyance I had with migrations was anything that needed a model to be defined inside the migration. I eventually gave up on that and just wrote my own scripts to work around it. But perhaps I just wasn’t persistent enough...
But I'd think MongoDB the company increasing in revenue isn't totally related to the quality of MongoDB the database. In fact a lot of their products seem to be targeting the "I don't want to learn how to set it up and understand indexes" crowd.
No. I’d consider adding an index. An index is not free, it comes at a cost and that cost may well be higher than the costs of not having that index. For example, if a reporting query that runs once every few hours is lacking an index, the cost of updating that index on every write (and the disk space/memory used) may well exceed its benefits.
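A back-of-the-envelope version of that trade-off, as a sketch. The function and every number in it are hypothetical, and it only counts time; it ignores the disk space and memory the index uses, which the comment above also mentions.

```python
def index_pays_off(writes_per_hour, extra_ms_per_write,
                   queries_per_hour, scan_ms, indexed_ms):
    """Compare hourly index maintenance cost with hourly query time saved."""
    maintenance_ms = writes_per_hour * extra_ms_per_write
    savings_ms = queries_per_hour * (scan_ms - indexed_ms)
    return savings_ms > maintenance_ms


# Hypothetical reporting query: runs once every 4 hours, a 30 s scan
# versus 50 ms with the index, on a table taking 500,000 writes per hour
# that would each pay 0.1 ms extra to maintain the index.
rare_report = index_pays_off(500_000, 0.1, 0.25, 30_000, 50)

# Hypothetical hot lookup on the same table: 10,000 queries per hour,
# 200 ms scan versus 1 ms with the index.
hot_lookup = index_pays_off(500_000, 0.1, 10_000, 200, 1)

print(rare_report, hot_lookup)
```

With these made-up numbers the rarely run report does not justify the index and the hot lookup does, which is the whole point: it is a judgment call, not a reflex.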
Just to note, this is referring to the features for a hosted database on cloud.mongodb.com, and not something built into MongoDB the database.
I guess to each their own.
But it auto-suggests what index to use and has a button for you to immediately apply it. I'd say it definitely intends for you to avoid learning how indexes in MongoDB work. The index suggestions it makes are often terrible.
> performance [which is laughable even with indexes]
It really depends on your use-case and how you can structure and query your data. For one such service I'm the lead on I haven't found anything faster that doesn't sacrifice the ability to do more generic queries when we occasionally need to. (Yes we've benchmarked against postgres json fields which is always everyone's first suggestion)
But I've always been curious: as a person who likes MongoDB, do you have an opinion about Mongo vs. Couch? Keep in mind that you don't have to convince me that there are niches where that style of DB is appropriate ;-) I'm mostly just interested in the comparison since I've never spent any time looking at MongoDB.
It's great for things like a realtime layer of some app that merges data with a slower and more historical layer of data running on a SQL engine or something safer. Or for services that provide realtime or recent-time analytics, while storing your historical data somewhere else (see any patterns here so far? :P ). In my case the main usage is for an advertising bid and content serving engine, which was pretty much the ideal example use-case for MongoDB mentioned in books I read years ago when first learning it.
Just to note, yes the data integrity problems are "fixed", but only if you configure your instances properly and your read and write statements properly. It's not terribly hard to do, but I don't know if I would really go recommending MongoDB for newbies. If you know how to configure it properly for your data-safety needs, and would benefit from being able to have a flexible schema in early development.... I'd still maybe suggest looking at other document DBs unless you need the read/write speed Mongo can give on simple queries (and fresh projects probably do not need that)
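For reference, "configure your read and write statements properly" mostly comes down to write concern and read concern. Below is a minimal sketch that only builds a connection string, so it runs without a server or driver. `w`, `journal`, `readConcernLevel` and `replicaSet` are standard MongoDB connection string options; the host names, database name, and the choice of `majority` for both are assumptions for illustration, not a recommendation from this thread, and which levels you actually need depends on your workload.

```python
from urllib.parse import urlencode


def safe_mongo_uri(hosts, database, replica_set):
    """Build a MongoDB URI that asks for majority-acknowledged, journaled
    writes and majority reads instead of relying on defaults."""
    options = {
        "replicaSet": replica_set,
        "w": "majority",                 # wait for a majority of nodes
        "journal": "true",               # wait for the on-disk journal
        "readConcernLevel": "majority",  # only read majority-committed data
    }
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"


# Hypothetical hosts and names.
uri = safe_mongo_uri(
    ["db1.example.internal:27017", "db2.example.internal:27017"],
    "app",
    "rs0",
)
print(uri)
```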
For situations where you don't know the schema or for different schemas per record mongo is a great place to dump.
For data when you care about speed and don't care about losing some data. Think sending back a game screen when the client moves and requires a redraw. Depending on how fast the screen is changing dropping a screen isn't the biggest deal.
Reporting was a little bit more difficult but somehow rewarding.
You’d be astounded how common it is at so-called “enterprise” startups. It blew my mind.
A lot of people simply never went through the LAMP stack days and have little/no experience with real databases like Postgres (or even MySQL). It’s disheartening.
Are you sure?
The docs you likely found on the wiki are dated, but MongoDB is definitely being used in Enterprise Engineering.
Source: I'm currently on the EE traffic team.
It's difficult to make accurate blanket assertions about large technical organizations, which Facebook certainly is.
I am amused by the downvotes that my previous comments received.
Thanks for the clarification.
As an example: would you consider the backend software stack that manages physical access to the campus 'mission critical'?
In much the same way I wouldn't say that my site is powered by Microsoft Excel; but you can be sure Microsoft Excel is used in my company.
Mission critical - essential for operating Facebook.com
Is it a company-wide system? It's all fun and games until the cooling in a DC starts to have issues and the facilities team can't gain access because... mongo?
Disclosure: I worked at Facebook, but not in that department, or anywhere near any MongoDB.
Curiously, MongoDB omitted any mention of these findings in their MongoDB and Jepsen page. Instead, that page discusses only passing results, makes no mention of read or write concern, buries the actual report in a footnote, and goes on to claim:
> MongoDB offers among the strongest data consistency, correctness, and safety guarantees of any database available today.
We encourage MongoDB to report Jepsen findings in context: while MongoDB did appear to offer per-document linearizability and causal consistency with the strongest settings, it also failed to offer those properties in most configurations.
This is a really professional way to tell someone to stop their nonsense.
MongoDB explains that pretty well: https://www.mongodb.com/faq and https://docs.mongodb.com/manual/core/causal-consistency-read...
Postgres most certainly does fsync by default.
It's true, you can disable it, but there is a big warning about "may corrupt your database" in the config file.
My point is that the people complaining about MongoDB are most likely the ones not using it. MongoDB is very different from 10 years ago.
I like to remind people that PG did not have an official replication system 10 years ago and as of today is still behind MySQL. No DB is perfect; it's about trade-offs.
So wal is synced before commit returns, and if you power cycle immediately after, the wal is played back and your transaction is not lost? So it's fine?
It does not need to sync all writes, only the records needed to play back the transaction after restart. This is what all real databases do.
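A toy illustration of that idea, for anyone who hasn't seen it spelled out. This is a sketch of the general write-ahead-log pattern, not how PostgreSQL implements it: the `TinyWal` class and its JSON-lines file format are invented here. The only point is the ordering: append the record, fsync the log, and only then acknowledge the commit, so that replaying the log after a restart restores every acknowledged write.

```python
import json
import os
import tempfile


class TinyWal:
    """Hypothetical key-value store whose only durable structure is a log."""

    def __init__(self, path):
        self.path = path
        self.state = {}
        self._replay()
        self._log = open(path, "a", encoding="utf-8")

    def _replay(self):
        # On startup, rebuild in-memory state from the log.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn last record: it was never acknowledged
                self.state[record["key"]] = record["value"]

    def commit(self, key, value):
        self._log.write(json.dumps({"key": key, "value": value}) + "\n")
        self._log.flush()
        os.fsync(self._log.fileno())  # durable before we acknowledge
        self.state[key] = value       # only now is the commit "returned"

    def close(self):
        self._log.close()


wal_path = os.path.join(tempfile.mkdtemp(), "tiny.wal")
db = TinyWal(wal_path)
db.commit("balance", 100)
db.commit("balance", 130)
db.close()  # stand-in for a power cycle: the in-memory state is gone

recovered = TinyWal(wal_path)
recovered_balance = recovered.state["balance"]
recovered.close()
print(recovered_balance)
```

Only the log needs the fsync on the commit path; the main data files can be written back lazily, which is why this is much cheaper than syncing every write.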
So PG keeps data consistent by default - unlike MongoDB.
> MySQL and PG are not truly consistent per default, they don't fsync every writes. MongoDB explains that pretty well [links]
Where in those MongoDB doc links is there anything about MySQL or PG?
Whatever failings MySQL or PostgreSQL may or may not have are not important at all here.
On MySQL: https://dev.mysql.com/doc/refman/8.0/en/innodb-dedicated-ser...
InnoDB uses O_DIRECT during flushing I/O, but skips the fsync() system call after each write operation.
The fsync thing is more complex than it looks like.
And obviously that's not a bug; it's designed to do so.
Also, if you write with O_DIRECT, a fsync is not needed, as it's how you tell the OS to block until written.
>>> I have to admit raising an eyebrow when I saw that web page. In that report, MongoDB lost data and violated causal by default. Somehow that became "among the strongest data consistency, correctness, and safety guarantees of any database available today"! <<<
It's not wrong, just misleading. Seems overblown given that most practitioners know how to read this kind of marketing speak.
So basically whatever MongoDB was doing 10 years ago, they are continuing to do. They did not change at all. Yesterday or two days ago there were a few people defending Mongo, saying that indeed in its early years Mongo wasn't the greatest, but it is now, and people should just stop being hung up on the past.
The reason why people lost their trust with mongo wasn't technical, it was this.
Based on this, my understanding is: most of the time you want a relational database. If a relational database becomes a bottleneck for certain data, and you don't want to do typical scaling solutions for relational data, then you need to know what you'll trade for the higher performance. Based on what you trade, you then decide what kind of data store you will use.
* MySQL: I don't like to rock the boat, and MySQL is available everywhere
* PostgreSQL: I'm not afraid of the command line
* H2: My company can't afford a database admin, so I embedded the database in our application (I have actually done this)
* SQLite: I'm either using SQLite as my app's file format, writing a smartphone app, or about to realize the difference between load-in-test and load-in-production
* RabbitMQ: I don't know what a database is
* Redis: I got tired of optimizing SQL queries
* Oracle: I'm being paid to sell you Oracle
Did I miss something huge?
Arguably the world's most popular database is Microsoft Excel.
If a customer's API was down, the event would go back on the queue with a header saying to retry it after some time. You can do some sort of incantation to specifically retrieve messages with a suitable header value, to find messages which are ready to retry. We used exponential backoff, capped at one day, because the API might be down for a week.
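The delay calculation for that kind of retry is tiny. A sketch, assuming a one-minute base delay (the comment above only gives the one-day cap, so the base and the function name are made up):

```python
ONE_DAY_SECONDS = 24 * 60 * 60


def retry_delay(attempt, base_seconds=60, cap_seconds=ONE_DAY_SECONDS):
    """Exponential backoff: the delay doubles per attempt, capped at one day."""
    return min(cap_seconds, base_seconds * 2 ** attempt)


# attempt 0 is the first retry; the base of 60 s is an assumed value.
delays = [retry_delay(attempt) for attempt in range(12)]
print(delays)
```

The cap matters because without it an API that is down for a week would push the next retry out by months; with it, every event gets retried at least once a day.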
I didn't think of RabbitMQ as a database when I started that work, but it looked a lot like it by the time I finished.
But also no, RabbitMQ and Kafka and the like are clearly message buses and though they might also technically qualify as a DB it would be a poor descriptor.
Back when I worked in LA my CTO used to joke that most places use Microsoft Outlook as a database and Excel as BI tool.
[Source: I was friends with the guy who wrote it as well as other EToys employees. God that was a trainwreck.]
Basically if you're manually scanning the hierarchy for anything but a consistency check or garbage collection you've already lost.
18:35 $ tree .git/objects/
│ └── 9581d0c8ecb87cf1771afc0b4c2f1d9f7bfa82
│ └── 97b950623230bd218cef6aebd983eb826b2078
10 directories, 10 files
Would love to talk to anyone on the EToys team or anyone who has done something similar.
I'm @akamaozu on twitter.
Many of my employer's applications started out as a shared spreadsheet or Access database.
Our development team worked with the users and built a web application to solve the same problem.
This approach has a lot of advantages:
* The market exists and has an incumbent. There's a lower risk of a write-off.
* The users are open to process changes. You still have to migrate people off of the spreadsheet, though.
* It's easy to add value with reporting, error checking, concurrent access, and access control.
* You can import the existing data to make the transition easier. This will require a lot of data cleaning.
Edited to add the following text from another post.
You can cover most of the requirements with a set of fixed fields.
The last 10% to 20% of the use cases requires custom reports and custom fields.
Users should be able to define their own reports and run them without your involvement.
They should also be able to define custom field types with validation, data entry support, etc.
If your web application has these two features and other advantages then you should be able to replace Excel.
I had to work on a tool that shows what's wrong with an assembly line: missing parts, delays, etc... So that management can take corrective action. Typical "BI" stuff but in a more industrial setting.
The company went all out on new technologies. Web front-end, responsive design, "big data", distributed computing, etc... My job was to use PySpark to extract indicators from a variety of data sources. Nothing complex, but the development environment was so terrible it turned the most simple task into a challenge.
One day, the project manager (sorry, "scrum master") came in, opened an excel sheet, imported the data sets, and in about 5 minutes, showed me what I had to do. It took me several days to implement...
So basically, my manager with Excel was hundreds of times more efficient than I was with all that shiny new technology.
That experience made me respect Excel and people who know how to use it a lot more, and modern stacks a lot less.
I am fully aware that Excel is not always the right tool for the job, and that modern stacks have a place. For example, Excel does not scale, but there are cases where you don't need scalability. An assembly line isn't going to start processing 100x more parts anytime soon, and one that does will be very different. There are physical limits.
The devil is in the details, and software is nothing but details. The product owner at the company I work for likens it (somewhat illogically, but it works) with constructing walls. You can either pick whatever stones you have lying around, and then you'll spend a lot of time trying to fit them together and you'll have a hell of a time trying to repair the wall when a section breaks. Or you can build it from perfectly rectangular bricks, and it will be easy to make it taller one layer at a time.
Using whatever rocks you have lying around is like building a prototype in Excel. Carefully crafting layers of abstraction using proper software engineering procedures means taking the time to make those rectangular bricks before building the wall. The end result is more predictable when life happens to the wall.
Unfortunately, which specific features of Excel are acceptable to remove is unknown until you have already way overinvested in the project.
The best I've seen this done is having Excel as a client for your data store. Where read access is straightforward and write can be done via csv upload (and heavy validation and maybe history rollback).
That way the business can self-service every permutation of dashboard/report they need and only when a very specific usecase arises do you need to start putting engineering effort behind it.
I suppose you can also supplement the Excel workflow with a pared down CRUD interface for the inevitable employee allergic to excel.
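A sketch of what "write via CSV upload with heavy validation" might look like at its simplest. The column names and rules are hypothetical; a real importer would have many more checks, plus the history/rollback part, which is not shown here.

```python
import csv
import io

REQUIRED_COLUMNS = ("part_id", "quantity")  # hypothetical upload schema


def validate_upload(csv_text):
    """Return (accepted_rows, errors) for an uploaded CSV document."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [], [f"missing columns: {', '.join(missing)}"]

    accepted, errors = [], []
    # Data starts on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        part_id = (row["part_id"] or "").strip()
        quantity = (row["quantity"] or "").strip()
        if not part_id:
            errors.append(f"line {line_no}: part_id is empty")
        elif not quantity.isdecimal():
            errors.append(
                f"line {line_no}: quantity {quantity!r} is not a non-negative integer"
            )
        else:
            accepted.append({"part_id": part_id, "quantity": int(quantity)})
    return accepted, errors


accepted, errors = validate_upload("part_id,quantity\nA-1,5\n,3\nB-2,many\n")
print(accepted)
print(errors)
```

Reporting errors by line number is deliberate: the person fixing the file is going to be looking at it in Excel, so the message should point at the row they can see.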
Here is another option that we use instead of CSV import.
Our applications support custom reports and custom fields.
Users can define new reports and run them on demand.
They can also define custom field types with validation, data entry support, etc.
This combination provides some of the extensibility of Excel while retaining the advantages of an application.
Edited for wording changes.
You can complain about their solution or see it as an opportunity.
I posted elsewhere in this thread about my employer's practice of replacing shared spreadsheets with web applications.
This approach works quite well for us and I would encourage you to consider it as an option.
Confluent, the company behind Kafka, are 100% serious about Kafka being a database. It is however a far better database than MongoDB.
ksqldb is actually a database on top of this.
The thing is that they have an incrementally updated materialized view that is the table, while the event stream is similar to a WAL ("ahead of write logs?" in this case).
Because eventually you can't just go over your entire history for every query.
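The idea in miniature, as a sketch. This is not how ksqlDB is implemented; the class and the account/amount events are invented. It just shows a table that is kept up to date by applying only the events it has not seen yet, and that ends up identical to what a full replay of the log would produce.

```python
from collections import defaultdict


class AccountBalances:
    """Hypothetical materialized view over an append-only event log."""

    def __init__(self):
        self.log = []                 # the event stream (the "WAL")
        self.view = defaultdict(int)  # the table derived from it
        self.applied = 0              # how many events the view has seen

    def append(self, account, amount):
        self.log.append((account, amount))

    def refresh(self):
        # Incremental update: only touch events after the last offset.
        for account, amount in self.log[self.applied:]:
            self.view[account] += amount
        self.applied = len(self.log)

    def full_replay(self):
        # What you would have to do per query without the view.
        totals = defaultdict(int)
        for account, amount in self.log:
            totals[account] += amount
        return totals


balances = AccountBalances()
balances.append("alice", 100)
balances.append("bob", 50)
balances.refresh()
balances.append("alice", -30)
balances.refresh()
print(dict(balances.view))
```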
I'd probably choose Postgres over MySQL for a new project just to have the improved JSON support, but there's upsides to MySQL too:
- Per-thread vs per-process connection handling
- Ease of getting replication running
- Ability to use alternate engines such as MyRocks
Though I'm willing to put up with it due to its incredible compression capabilities...
It used to be that bargain basement shared-hosting providers would only give you a LAMP stack, so it was MySQL or nothing. But if you're on RDS, Postgres every time for my money.
For MySQL, I haven't found anything that beats SequelPro. For Postgres, I haven't found anything that comes close to parity, but my favorite is Postico.
I know people that swear by IntelliJ for their db stuff, it just never hit home for me personally though.
I can't compare against SequelPro as I don't have a Mac, but DBeaver's worth a try for anyone looking for a cross platform DB editor
It’s the only DB client that doesn’t look like it was built in the 90’s. Slick UX & UI. Nice balance between developer tool & admin tool
It's really sad because all the contributors to Postgres have made an AMAZING database that's such a joy to work with. And then there's PgAdmin4 where it's almost like they just don't care...
I don't feel I'm smart enough to contribute anything to PgAdmin4 to try to make it better. So I stick to DataGrip and DBeaver.
Oracle is great if and only if you have a use case that fits their strengths, you have an Oracle-specific DBA, and you do not care about the cost. I have been on teams where we met those criteria, and I genuinely had no complaints within that context.
Every time I need to work with an Oracle DB it costs me weeks of wasted time.
For a specific example, I was migrating a magazine customer to a new platform, and all of the Oracle dumps and reads would silently truncate long textfields... The "Oracle experts" couldn't figure it out, and I had to try 5 different tools before finally finding one that let me read the entire field (it was some flavor of JDBC or something). To me, that's bonkers behavior, and is just one of the reasons I've sworn them off as anything other than con artists.
I gotta say, as much as I hate it with a passion, and as often as it breaks for seemingly silly reasons (so many deadlocks), it's at least tolerable (even if I feel like Postgres is better by just about every metric).
sqlite> create table foo (n int);
sqlite> insert into foo (n) values ('dave');
sqlite> select count(*) from foo where n = 'dave';
SQLite does not use column typing except in integer primary keys.
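The same session can be reproduced from Python's built-in `sqlite3` module, along with one way to get stricter behaviour if you want it. The `bar` table and its `CHECK` constraint are an addition for illustration, not something from the session above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same statements as the shell session: an INT column accepts a string.
conn.execute("CREATE TABLE foo (n INT)")
conn.execute("INSERT INTO foo (n) VALUES ('dave')")
matches = conn.execute(
    "SELECT count(*) FROM foo WHERE n = 'dave'"
).fetchone()[0]
stored_type = conn.execute("SELECT typeof(n) FROM foo").fetchone()[0]

# Illustrative workaround: a CHECK constraint on typeof() rejects values
# that could not be converted to an integer.
conn.execute("CREATE TABLE bar (n INT CHECK (typeof(n) = 'integer'))")
try:
    conn.execute("INSERT INTO bar (n) VALUES ('dave')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(matches, stored_type, rejected)
```

So the string is stored as text in the `INT` column and the query finds it, unless you add the constraint yourself.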
I think most people have realised weak typing is not a good idea in programming languages. It’s especially not a good idea in databases.
> As far as we can tell, the SQL language specification allows the use of manifest typing. Nevertheless, most other SQL database engines are statically typed and so some people feel that the use of manifest typing is a bug in SQLite. But the authors of SQLite feel very strongly that this is a feature. The use of manifest typing in SQLite is a deliberate design decision which has proven in practice to make SQLite more reliable and easier to use, especially when used in combination with dynamically typed programming languages such as Tcl and Python.
It's intended behavior that's compatible with the SQL spec.
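For anyone who wants to see what the session above actually stores, here is a minimal sketch using Python's standard `sqlite3` module. The helper name `stored_types` is made up for illustration; the table and query mirror the `foo` example. It assumes an ordinary (non-STRICT) table, which is what a plain `create table` gives you.

```python
import sqlite3

def stored_types(values):
    """Insert values into an INT column and report what SQLite actually stored."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE foo (n INT)")
    conn.executemany("INSERT INTO foo (n) VALUES (?)", [(v,) for v in values])
    # typeof() reports the storage class of each stored value,
    # not the declared column type.
    rows = conn.execute("SELECT n, typeof(n) FROM foo ORDER BY rowid").fetchall()
    matches = conn.execute("SELECT count(*) FROM foo WHERE n = 'dave'").fetchone()[0]
    conn.close()
    return rows, matches

rows, matches = stored_types(["dave", "42", 7])
print(rows)     # text that looks like an integer is converted; 'dave' stays text
print(matches)  # number of rows equal to 'dave'
```

The INT declaration only gives the column an integer affinity: `'42'` is converted to an integer, while `'dave'` is accepted and kept as text, so the count query finds it.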
The big issue is that sqlite does full db locking for any operation, so during any write you can't just easily read at all.
This can be fixed with WAL mode, but WAL mode is broken in its early versions, and new versions of sqlite aren't in all distros yet, despite being out for almost a decade. And even WAL mode gets abysmal performance.
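A rough sketch of what switching to WAL mode looks like from Python's standard `sqlite3` module, in case it helps. The function name `wal_demo` and the table `t` are hypothetical. It only shows that a second connection can keep reading the last committed snapshot while a write transaction is open; it says nothing about throughput.

```python
import os
import sqlite3
import tempfile

def wal_demo(path):
    """Enable WAL on a file-backed database and read while a write is in progress."""
    # isolation_level=None puts the module in autocommit mode,
    # so BEGIN/COMMIT are issued explicitly below.
    writer = sqlite3.connect(path, isolation_level=None)
    mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    writer.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
    writer.execute("INSERT INTO t VALUES (1)")

    reader = sqlite3.connect(path, isolation_level=None)

    writer.execute("BEGIN IMMEDIATE")            # take the write lock
    writer.execute("INSERT INTO t VALUES (2)")   # not yet committed
    seen_during_write = reader.execute("SELECT count(*) FROM t").fetchone()[0]
    writer.execute("COMMIT")
    seen_after_commit = reader.execute("SELECT count(*) FROM t").fetchone()[0]

    reader.close()
    writer.close()
    return mode, seen_during_write, seen_after_commit

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(wal_demo(os.path.join(d, "demo.db")))
```

The journal mode is persistent for a database file, so it only has to be set once. WAL still allows only one writer at a time, and it needs a file on a filesystem that supports shared memory, so it does not apply to `:memory:` databases.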
They have their own layer on top that happens to use SQLite as the storage format on disk. This layer means they aren't using full SQLite at the application level, but rather using their custom database in the application, and SQLite within their custom database.
Further, they've customised the SQLite codebase as far as I can tell to remove much of the functionality that SQLite uses to ensure that multiple instances can safely edit the same file on disk together, then they memory map the file and just have many threads all sharing the same data.
FoundationDB also does this, and scales to thousands of nodes. The trick is that it's essentially _many_ separate, very simple SQLite databases, each being run independently.
I'm familiar with the variant, "InfoSec won't let us deploy a DB on the same host".
I can tell you this emphatically as I spent 6 months trying to eke out performance with MySQL (5.6). PostgreSQL (9.4) handled the load much better without me having to change memory allocators or do any kind of aggressive tuning to the OS.
MySQL has some kind of mutex lock that stalls all threads; it's not noticeable until you have 48 cores, 32 databases, and completely unconstrained I/O.
EDIT: it was PG 9.4 not 9.5
They were both the latest and greatest at the time
> redo the benchmark today and I’ll be surprised if you come to the same results.
I would, but it was not just a benchmark; it was a deep undertaking including but not limited to: optimisations made in the Linux kernel, specialised hardware along with custom memory allocators, and analysing/tracing/flamegraphing disk/memory access patterns to find hot paths/locks/contention (and at different scales: varying the number of connections, transactions per connection, number of databases, size of data, etc).
It was 6 months of my life.
> PGsql even has a wiki page where they discuss implementing MySQL features and changing their architecture so they can scale.
Just because MySQL has some good ideas doesn't mean it scales better. I know for a fact that it didn't in 2015. I doubt that they have fixed the things I found, though I could be wrong. But it would have to be a large leap forward for MySQL, and PostgreSQL has had large performance improvements since then too.
Also, I read that page and it says nothing about scaling, just that some storage drivers have desirable features (memory tables are very fast, and PGSQL doesn't support them; archive tables are useful for writing to slower media, which you can do with partitioning but it's not intuitive).
yes, I should run the test again, but it was 6 months of my life, and I don't see how much could have changed.
Logical replication or synchronous multimaster replication may meet your needs.
Almost none of this is remotely accurate, e.g. RabbitMQ isn't even a database.
It may be a good idea to take a break from the computer and find something less stressful to do.
The fact that you put Kafka and RabbitMQ in the same category sort of makes me feel like you're out of your element, Donnie.
We use it for a very specific use case and it's been perfect for us when we need raw speed over everything. Data loss is tolerable.
Edit: never mind, I think the other URL - http://jepsen.io/analyses/mongodb-4.2.6 - deserves a more technical thread, so will invite aphyr to repost it instead. It had a thread already (https://news.ycombinator.com/item?id=23191439) but despite getting a lot of upvotes, failed to make the front page (http://hnrankings.info/23191439/). I have no idea why—there were no moderation or other penalties on it. Sometimes HN's software produces weird effects as the firehose of content tries to make it through the tiny aperture of the frontpage.
I'd pay to watch Kyle screaming at people in the MongoDB offices, not that he screams or anything. Just a spectacular mental image: "IT'S NOT ATOMIC! IT COULDN'T SERIALIZE A DOG'S DINNER!"