Hacker News | bit_flipper's comments

Definitely an interesting question. Some things that may explain why --

Mongo was always AGPL and relicensed to SSPL. This had the following consequences:

* Very few companies and zero large cloud companies ever attempted to run the MongoDB codebase in production as a managed service, other than MongoDB the company.

* Mostly because of the above, MongoDB did not receive many code contributions that did not originate from within the company. There were some, but not nearly to the extent of the others you listed.

* The difference between AGPL and SSPL is not nearly as large as the difference between BSD and SSPL or Apache and SSPL.


The license matters less than copyright ownership. Prior to the license change, MongoDB insisted on copyright transfers. Every line of code was owned by them. That's why they were able to re-license it. Elastic, which used to use the Apache 2.0 license, did the same thing: they insisted on copyright transfers as well.

Other people who didn't own the copyright to any MongoDB source code of course had the right to take the source code and fork it under the AGPL. But there would have been no choice about the license under which to distribute that fork because of how strict the AGPL is. By insisting on copyright transfers, MongoDB was able to dodge that and re-license the entire code base without having to ask permission from anyone, because they owned all of it.

For the same reason, there never was much of a community of contributors outside of Mongo. Most large companies would have steered clear of that legal mess and declined to contribute or fork. The flip side, of course, is that this strong ownership marginalized MongoDB as a community even before the license change. It simply didn't matter much to most large companies, as they would have steered clear of it anyway.

With Redis, this is not the case. Redis the company was an active contributor to the code base, but most of the contributions actually came from the outside, and they never owned the copyright to those contributions. The BSD license allows anyone (including Redis Inc.) to redistribute the code under whatever license they choose. Which is why Redis can do this. But for the same reason everybody else can continue as-is using Valkey without having to worry much about Redis the company having retired from what otherwise is a thriving OSS community.


Were there any MongoDB cloud offerings of the AGPL version?

If SSPL is effective as a poison pill against AWS and Co. but AGPL is not, that's a big difference in my book.


Yes, MongoHQ, later renamed compose.io: https://en.wikipedia.org/wiki/Compose.io


mLab/MongoLab, which MongoDB acquired when they started to seriously pursue their own cloud offering.


None that immediately spring to mind. AWS reimplemented just the API.


I've run Postgres at large scale (dozens of machines) at multiple companies. I've also run MongoDB at large scale at multiple companies. I like both generally. I don't really care about data modelling differences - you can build the same applications with approximately the same schema with both if you know what you're doing.

I don't understand how folks seemingly ignore Postgres' non-existent out-of-the-box HA and horizontal scaling support. For small scale projects that don't care about these things, fair enough! But in my experience every Postgres installation is a snowflake with cobbled-together extensions, other third-party software, and home-rolled scripts to make up for this gap. These third-party pieces of software are often buggy, half-maintained, and under-documented. This is exacerbated by Postgres' major-version file format changes making upgrades extremely painful.

As far as I can tell, there is no interest in making these features work well in Postgres core because all of the contributors' companies make their money selling solutions for HA/sharding. This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things. I don't really want to run another Postgres cluster myself again. For a single node thing where I don't care about HA/scaling I do quite like it, though.


You'll never see true support for horizontal scalability in Postgres because doing so would require a fundamental shift in what Postgres is and the guarantees it provides. Postgres is available and consistent. It cannot truly be partitionable without impacting availability or consistency.

When an application grows to such a scale that you need a partitionable datastore it's not something you can just turn on. If you've been expecting consistency and availability, there will be parts of your application that will break when those guarantees are changed.

When you hit the point that you need horizontally scalable databases you must update the application. This is one of the reasons that NewSQL databases like CockroachDB and Vitess are so popular. They expose themselves as a SQL database but make you deal with the availability/consistency problems on day 1, so as your application scales you don't need to change anything.

Context: I've built applications and managed databases on tens of thousands of machines for a public SaaS company.


Because vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more. There's just little incentive.

Especially since:

- Servers will keep getting better and cheaper with time.

- Data is not only in Postgres; you probably have Redis, ClickHouse and others, so the load is spread. In fact you may have several dedicated Postgres instances, like one for GIS tasks.

- Those hacky extensions are damn amazing. No product in the world is that versatile.

- Postgres has much better support from established frameworks like Django/RoR/Laravel than NoSQL alternatives. People shit on ORMs, but they enable a huge, well-integrated plugin ecosystem that makes you super productive, and PG happily and transparently handles all that.

- If by some miracle you actually reach the point you need this, you'll have plenty of money to pay for commercial HA/sharding, or migrate. So why think about it now?


> vertical scaling can take you so far these days that 99% of companies will never, ever reach the scale where they need more

It's less about the scale and more about HA and service interruption: your service will be down if the server dies.


Never heard of docker/k8s?


I don't think these two words will buy you HA automagically. You will need 3 layers of various open source components on top, and I'm not sure whether they will improve or reduce HA in the end.


> This is an area where MySQL is so significantly better than Postgres (because so many large Internet companies use MySQL) that it surprises me people aren't more unhappy with the state of things.

I’m not sure precisely what you mean by “HA”, but, in my experience, out-of-the-box support for the most basic replication setup in MySQL is pretty bad. Just to rattle off a few examples:

Adding a replica involves using mysqldump, which is, to put it charitably, not a very good program. And the tools that consume its output are even worse!

There is nothing that ships with MySQL that can help verify that a replica is in sync with its primary.

Want to use GTID (which is the recommended mode and is more or less mandatory for a reasonable HA setup)? Prepare for poor docs. Also prepare for the complete inability of anyone’s managed offering to sync to an existing replica set via mysqldump’s output. RDS will reject the output due to a rather fundamental permission issue, and the recommended (documented!) workaround is simply incorrect. It’s not clear that RDS can do it right. At least Azure sort of documents that one can manually read and modify the mysqldump output and then issue a manual API call (involving the directives that you manually removed from the dump) to set the GTID state.

Want point-in-time recovery? While the replication protocol supports it, there is no first-party tooling. Even just archiving the replication logs is barely supported. Postgres makes it a bit awkward, but at least the mechanisms are supported out of the box.

But maybe the new-ish cluster support actually works well once it’s set up, as long as you don’t try to add managed RDS-style nodes?


> Adding a replica involves using mysqldump

That's one path, but it is not the only way, and never has been.

MySQL 8.0.17 (released nearly 5 years ago!) added support for physical (binary) copy using the CLONE plugin. And MySQL Shell added logical dump/reload capabilities in 8.0.21, nearly 4 years ago.

Third-party solutions for both physical and logical copy have long been available, e.g. xtrabackup and mydumper, respectively.

And there was always the "shut down the server and copy the files" offline approach in a pinch.


CLONE is indeed nifty. But why is it a plugin? And why don’t any of the major hosted services support it? (Or do they? The ones I checked don’t document any support.)

I wouldn’t call xtrabackup or mydumper an out-of-the-box solution.


What's wrong with CLONE being a MySQL plugin? I mean a good chunk of this page is people praising Postgres for its plugins.

As for support in hosted cloud providers, that's a question for the cloud providers, no one else can answer this. But my best guess would be because they want you to use their in-house data management offerings, snapshot functionality, etc instead of porting MySQL's solution into the security restrictions of their managed environment.

Yes, xtrabackup and mydumper are third-party tools, as I noted. If you needed something out-of-the-box prior to CLONE, the paid MySQL Enterprise Edition has always included a first-party solution (MySQL Enterprise Backup, often abbreviated as MEB). Meanwhile Community Edition users often gravitated to Percona's xtrabackup instead as a similar FOSS equivalent, despite not being a first-party / out-of-the-box tool.


Citus is open source and well financed. This comment may have made sense a few years ago, but no longer.


By "well financed" you mean "owned by Microsoft"?

That situation raises a separate set of concerns, especially in the context of Microsoft's main database cash cow being SQL Server, not Postgres/Citus.


How is that different than owned by Oracle?


Yep, exactly. Apologies, my previous comment was semi-sarcastic but in retrospect that was way too vague :)

On average, HN leans anti-MySQL, with concerns about Oracle ownership frequently cited in these discussions (mixed in with some historic distrust of MySQL problems that were solved long ago). But I rarely see the same sentiment being expressed about Citus, despite some obvious similarities to their ownership situation.

Personally I don't necessarily think the ownership is a huge problem/risk in either case, but I can understand why others feel differently.


I'm as skeptical of MS as anyone. However it is licensed GNU AGPL, so not particularly worried.


I guess some people really, really dislike Oracle (understandably).

And MariaDB is lagging behind, less and less compatible with MySQL etc leading to various projects dropping support for it - notably Azure. I wouldn't pick it for a new project.


This depends on what level you consider HA and horizontal scaling to be required. I could make the same argument, based on my personal experience, that postgis ought to be included out of the box. Of course, I'll assume most people don't need it :)


I feel like I have read this exact comment before verbatim


You can also build whatever you want with SSPL, as long as absolutely everything you use to run a service that supports it is also licensed as SSPL. It's not that different from the AGPL in spirit.


> as long as absolutely everything you use to run a service that supports it is also licensed as SSPL.

There isn't an SSPL-licensed OS available, is there? Is that not included in "absolutely everything you use to run"? I actually don't know; I haven't tried to make sense of the license. Is there a boundary such that you are allowed to run it on a non-SSPL OS? Where is the boundary exactly? I might be using many other open-source-licensed (or even third-party proprietary) tools in my total ops stack -- which of them don't have to be SSPL?


By which metrics are you evaluating those companies' license changes? Both are significantly more profitable than before they changed licenses, MongoDB especially. I'm not sure there's a causal relationship, but it doesn't seem to have significantly harmed them.


This article seems to have inspired others to look at MongoDB again, so I'll give my thoughts after using it recently.

MongoDB Atlas is a surprisingly good managed database product. I'm not a huge fan of someone else running my databases, but I think it might be the best one you can run across any cloud. If you like MongoDB (and, ignore the memes, there is a lot to like nowadays), and are OK paying a bit more to have someone run your database, I'd strongly consider Atlas.


The problem is when you grow. They can be really tough to work with on pricing. Also, their licensing does not allow servers past a certain size. Can you imagine Oracle telling the CIA they can't use servers with more than 256GB of RAM? Just silly.


I'm not sure what experience you have, but I've run both their Enterprise licensed database on prem as well as migrated to Atlas and there have never been any licensing issues preventing vertical scaling of databases. One of our clusters on Atlas right now has machines larger than 256GB of RAM -- you're more limited by what your cloud vendor has available than Atlas.


Actually yeah, for Atlas I guess they automatically bill you as if it was 2-3x Enterprise Advanced licenses, so there's no discussion. I thought it was the same as Enterprise Advanced but I guess not. With EA each unit above 256GB is billed as an additional license. See [0] [1] [2].

[0] https://www.mongodb.com/community/forums/t/for-mongodb-enter...

[1] https://www.mongodb.com/community/forums/t/for-mongodb-enter...

[2] https://www.linkedin.com/pulse/mongodb-sizing-guide-sepp-ren...


Oracle charges per 2 vCPU. This is quite standard.


In our case in the support call they just told us it wasn't allowed. Now I'm realizing they were wrong.


Yeah, Atlas with federated queries and whatnot makes thinking about the entirety of an application's storage layer a breeze. And GPT is even better at generating Mongo queries than it is at SQL, which is a nice unexpected boost in day-to-day usage.


No support for collation is an enormous dealbreaker. It means you can't have case-insensitive searches or keys (e.g. foo@bar.com is treated differently than Foo@bar.com). You also can't rename databases.

The query syntax is also a massive pain IMHO. Especially when you get into nesting expressions. E.g. I can never remember if it's `{$id: $regex{'/pattern/'}}` or `{$id: ${regex: '/pattern/'}}` or `{$id: $regex{ pattern: '/pattern/' }}` or something totally different.
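For the record, the operator nests as a sub-document keyed by `$regex`, with flags in a sibling `$options` key. A sketch of the filter written as the plain Python dicts pymongo would send (the `email` field name is hypothetical):

```python
# MongoDB query filters are nested documents; the regex operator lives
# inside an operator document keyed by "$regex". Case-insensitivity is
# requested via "$options": "i" rather than /pattern/i literal syntax.
case_insensitive_email = {
    "email": {"$regex": "^foo@bar\\.com$", "$options": "i"}
}

# With a live pymongo collection this would be used as something like:
#   db.users.find(case_insensitive_email)
```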

SQL is still superior.


>It means you can't have case-insensitive searches or keys (e.g. foo@bar.com is treated differently than Foo@bar.com). You also can't rename databases.

Can't you just query for a lowercase version of the input and sanitize data going into the DB so it's only lowercase? I'm not a mongo user but that doesn't seem like a dealbreaker to me


Yes, you'd do something just like that. Document databases aren't relational databases. People need to think about solving the same problems a little differently. A lot of folks used to relational databases push for some perfect third normal form as if disk space were still the major constraint in massive databases. For document databases, you design them based on how they are going to be used. Even if that means duplication of data, like storing the lowercase version of something for indexing, or rolling up your data into summary collections instead of performing those sorts of queries or collection aggregations on demand.
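The "store the lowercase version for indexing" idea above can be sketched as a write-time transform (field names are hypothetical; the index creation is shown only as a comment):

```python
def with_lowercase_email(doc: dict) -> dict:
    """Return a copy of the document with a denormalized lowercase
    email field, so case-insensitive lookups stay index-backed."""
    out = dict(doc)
    out["email_lower"] = out["email"].lower()
    return out

# At write time you'd insert the transformed document and index the
# derived field once, e.g. (pymongo): db.users.create_index("email_lower")
stored = with_lowercase_email({"email": "Foo@Bar.com"})
```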


Relational DBs often involve duplication for these kinds of use cases. A simple case-insensitive match can be handled with a custom index that's automatically derived from the column data, but there are more advanced cases where you need manually denormalized tables. Or sometimes it makes sense for performance reasons, but you only do this after you notice the normalized way being too slow.


Sure. It's not like all folks who lean on relational databases are third normal form zealots. But third normal form zealots exclusively come from a relational database background. Which makes sense.


They probably do, but idk, I've never considered 3NF or the preferred type of DBMS as part of someone's personal identity. These are just tools.


I’ve been using DynamoDB for the first time recently and this is some good perspective to bring to it!


SSPL has no provision even close to the reach of the "anti-competition" clause Hashicorp is using. While SSPL is not considered open source, it isn't that far off from the AGPL. The difference between SSPL and AGPL is that SSPL (1) is in effect regardless of modification of the service and (2) extends copyleft virality to all programs which support running the service, including those that interact with the software over a network.

MongoDB, Elastic, etc. cannot stop you from running a competitor based on the terms of their licenses, they just ask that you publish the source code for whatever service you're running in its entirety (I acknowledge there are disagreements about how far "entirety" extends). The clause in Hashicorp's license actually revokes the right to use their software at all if you're a direct competitor.

OK, no one is going to build an open source competitor to Elastic or MongoDB because then you have no moat and your business will probably fail, I get it, but it's still possible to do without repercussion. It's not like the AGPL is that far off in terms of limitation, either, which is why you don't see many copyleft services run by large corporations unless they've been dual-licensed.


Not your main point, but MongoDB didn't commission Kyle to do that report as they had in the past, he did it on his own time. That's why his report doesn't mention repeat testing. They do actually run his tests in their CI and those new tests were used to isolate that specific bug. Moreover, some of the complaints about weak durability defaults for writing were later fixed: https://www.mongodb.com/blog/post/default-majority-write-con.... They still do default to a weak read concern, but writes are fully durable unless you specifically change the behavior. For what it's worth I agree with Kyle that they should have stronger defaults, but I don't really see a problem with MongoDB's response to the report because there is room to disagree on that.


Do you have a source for this? I got the impression at the time that there was some commissioning of his services, but that they didn't like the report. But he publishes work, and released the report, which forced them to deal with it.

Every distributed tech fails when he tests it, but the tenor and nature of the report for MongoDB was different. It basically said between the lines "do not use this product".

MongoDB has a history of really crappy persistence decisions and silently failed writes, and as soon as it gets publicized, saying "we fixed it in the next release". The same thing happened here of course. I simply don't trust the software or the company.

Mysql has the same annoying pattern in its history, although I have more confidence in the software because of the sheer number of users.

Still, I would probably pick PostgreSQL for both relational and document stores.


Source for which claim? Kyle was paid for work testing 3.4.0-rc3[1] and 3.6.4[2] which analyzed single document concurrency in a sharded configuration. Those tests run in their CI [3]. MongoDB had some somewhat misleading copy on their website about the result of those tests, so Kyle decided to test the new multi-document transactions feature for 4.2.6 and found some bugs.

It's fair to not trust the database or company, I don't blame you for that. But I think Kyle's MongoDB 4.2.6 report was not nearly as concerning as his PostgreSQL 12.3 report which found serializability bugs in a single instance configuration, among other surprising behaviors. MongoDB's bugs were at least in a new feature in a sharded configuration. I don't think his most recent report was actually as negative as it may read to you. I say this as someone who mostly runs PostgreSQL, by the way!

As a side note I believe there are consistency bugs existing right now in both MongoDB and PostgreSQL (and MySQL and Cassandra and Cockroachdb and...) waiting to be discovered. I'm a jaded distributed systems operator :)

[1] https://jepsen.io/analyses/mongodb-3-4-0-rc3

[2] https://jepsen.io/analyses/mongodb-3-6-4

[3] https://github.com/search?q=repo%3Amongodb%2Fmongo+jepsen&ty... (note: not an expert in when or what suites it runs, just have seen it running before as a demo)


Where is the root of trust for package signatures? Who is verifying signatures: the package index or end-users? How do you distribute public keys? PGP is mostly maligned because of its support for old cryptography standards, some needless cruft, and especially the poor usability of its defacto standardized implementation in GPG, but cosign by itself doesn't actually make any of the trust questions I mentioned go away. There are major tradeoffs to be made about who-trusts-who and what that actually means in terms of security beyond just theatre. I'm not convinced that there exists a good trust mechanism that a package index can enforce that actually moves the needle on supply chain security.


It's closer to the TLS CA setup, with ephemeral certificates and a public log of issuance, than to PGP's individual trust circles and semi-static keys.

https://www.sigstore.dev/how-it-works


If you use Musl 1.2.4+ (or Alpine 3.18+), there are no longer the same DNS fallback issues: https://www.openwall.com/lists/musl/2023/05/02/1

To summarize the issue: DNS is done optimistically over UDP because it's faster, but this doesn't work when DNS responses are large because of the design of UDP. TCP should be used as a fallback mechanism when responses are large. This is uncommon normally, but increasingly DNS responses are large in special scenarios; for instance when you're querying an internal DNS for service discovery (read: k8s or nomad deployments, most commonly).
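The fallback decision hinges on the TC (truncation) bit in the DNS header: a resolver sends the query over UDP, and if the response comes back with TC set, it should retry over TCP. A minimal sketch of that check against hand-built headers:

```python
import struct

def is_truncated(dns_response: bytes) -> bool:
    # Bytes 2-3 of a DNS header hold the flags; the TC (truncation)
    # bit is 0x0200. When set, the client should retry over TCP.
    (flags,) = struct.unpack("!H", dns_response[2:4])
    return bool(flags & 0x0200)

# Two hand-built 12-byte headers (id, flags, qd/an/ns/ar counts):
truncated_hdr = struct.pack("!6H", 0, 0x8200, 1, 0, 0, 0)  # QR + TC set
normal_hdr = struct.pack("!6H", 0, 0x8000, 1, 0, 0, 0)     # QR only
```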

Musl's maintainer interpreted the spec for a libc's resolver to not require TCP fallback (source: https://twitter.com/RichFelker/status/994629795551031296?lan...), so for a long time Musl simply didn't support this feature, justifying it as better UX because of the more predictable performance.

I don't agree with the maintainer on this interpretation, but I am glad the feature was added and the issue is no longer a concern as an otherwise very happy Alpine user!


I’d found bits and pieces of this, but I didn’t have all the context. Thank you for summarizing!


I'd say he was wrong here, and his assumption was incorrect.

RFC2181 specifically says 'Where TC is set, the partial RRSet that would not completely fit may be left in the response'

'may be' being the key words. This would mean that it's up to the implementation to decide whether to include any records at all, and many do not.


This article doesn't touch on the actual reasons why Mercator is still in widespread use:

* It was the first widespread projection because of its practical use for nautical navigation (where it is still the best projection available), so it was easy for map makers to sell for non-nautical uses, even after "better" projections became available. And inertia is a hard thing to overcome for something considered somewhat inconsequential.

* Mercator and its cousin Web Mercator are extremely simple and fast to calculate relative to other projections. Compare the formula for Web Mercator (https://en.wikipedia.org/wiki/Web_Mercator_projection#Formul...) to Equal Earth, an excellent compromise projection for general use (https://en.wikipedia.org/wiki/Equal_Earth_projection#Formula...). Web Mercator is very easy to generate and serve tiled maps out of, Equal Earth and the like require somewhat non-trivial engineering to make serving those maps at scale to users in a web browser economical and quick.

* Preserving angles is legitimately important still for large scale (very zoomed in) road maps. Projections which preserve size can cause things like 90 degree road intersections to render at very strange angles which confuses drivers. Mercator and Web Mercator are therefore excellent choices of projection for local road navigation, which is by far the most common use of maps today for most people.
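On the "simple and fast to calculate" point above: the entire Web Mercator forward projection fits in a few lines. A sketch using the standard tile-coordinate convention:

```python
import math

def web_mercator_tile(lat_deg: float, lon_deg: float, zoom: int):
    """Project WGS84 lat/lon to fractional Web Mercator tile coordinates.
    The whole projection is two lines of arithmetic, which is part of why
    tiled web maps standardized on it."""
    n = 2 ** zoom  # number of tiles per axis at this zoom level
    x = (lon_deg + 180.0) / 360.0 * n
    y = (1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n
    return x, y
```

At zoom 0 the null island point (0, 0) lands in the middle of the single world tile, i.e. (0.5, 0.5).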

I strongly recommend folks interested in map projections to read this from Mapbox: https://www.mapbox.com/blog/adaptive-projections. Google Maps now has similar features, but both companies relied on Mercator for many years with good reasons before technology caught up and better solutions became available.


No map shown to a driver at the detail level of navigating intersections should have to care about projections much. At that small a scale earth is approximately flat, and any half decent projection should have minimal distortions of any kind.


You can read about why projections still very much matter for large scale (which is what I think you meant; small scale would be something that shows you whole countries and not used for road navigation) maps in the article I linked. Google Maps tried out an alternative projection back when it was still Keyhole and ran into problems with angular distortion when zoomed in. The original post is sadly lost to Google shutting down their product forums, but here's a quote from a Google Maps engineer on their use of Mercator and why it matters:

The first launch of Maps actually did not use Mercator, and streets in high latitude places like Stockholm did not meet at right angles on the map the way they do in reality. While [Mercator] distorts a “zoomed-out view” of the map, it allows close-ups (street level) to appear more like reality. The majority of our users are looking down at the street level for businesses, directions, etc… so we’re sticking with this projection for now.

Sourced from https://ilyabirman.net/meanwhile/all/map-and-reality-distort..., but there are other citations of the same quote as well.


Why does Apple have the only map (that I know of) that projects Earth as a ball, as it is? It seems like a very obvious solution.


In the 90s, my family (inveterate roadtrippers) always kept a Rand McNally road atlas in the car. Each page had a map covering (usually) a whole state. I don’t know what projection it used—Albers, perhaps?—but I remember as a child wondering why some straight‐looking state borders were actually subtly curved.


Navigation concerns these days may include getting there as fast as possible (i.e. traveling on the geodesic), avoiding bad weather areas (forecasts), using as little fuel as possible, etc.

Stereographic projection (of the half-sphere on which the origin and destination lie) solves the geodesic issue, and it's different from Mercator.
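The "traveling on the geodesic" point boils down to great-circle math; a minimal sketch via the haversine formula (mean Earth radius assumed, so accuracy is a few tenths of a percent):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (geodesic) distance between two lat/lon points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A quarter of the equator (0°E to 90°E along the equator) comes out to roughly 10,007 km, matching one quarter of the Earth's circumference.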

