The disproportionate influence of early tech decisions (brandur.org)
184 points by kiyanwang on Aug 2, 2022 | 105 comments



This article is timely for me! I think about this a lot while working on Notion — and I’m drafting an RFC for a change that future engineers will either praise or curse.

Right now the eng team is ~150, so we’re exiting the period when an experienced senior engineer (i.e. me) can easily make a profound tech stack change. Our database (Postgres) and backend stack (Express/TypeScript) are already quite solidified. There is still time to change our front-end stack: although React is probably a given, we can still change our API contract layer, our state management pattern, and possibly our client-side replicated ORM. It’s tricky to weigh continued speed today against long-term speed. I think a lot about the tradeoff between “invest in infra improvements below the interface line” and “change the interface now, while there’s still time” when it comes to our ORM/state manager. We have a bunch of interesting challenges across all layers (see https://www.notion.so/blog/data-model-behind-notion) so it’s not obvious which changes today will really change the constraints on our solutions tomorrow.

My current philosophy is that we need to add new abstraction layers now if there’s a chance we need them, before detangling later becomes an insurmountable lift.


A vote for adding now the abstractions you think you’ll need later. I found that once we crossed around the 1,000-engineer mark, the main abstractions were so baked in that refactoring projects started taking years: longer than the typical employee tenure, which effectively meant they couldn’t be done. 1. People generally don’t even start projects they won’t be around to finish, and 2. A refactoring project that turns over all its staff forgets what it was trying to do and grinds to a halt one way or another. (For instance because the junior staff, the only ones willing to get involved with a long-embattled project, don’t have the political capital to keep it funded in terms of headcount.)


Do you need (more?) new abstraction layers or do you need extremely clear un-abstracted separations of concern?

In the cases I've worked on where we tried to build abstraction layers in advance "in case we need them", they were rarely sufficiently right-shaped for what the future feature need ended up being. Having more direct access to the primitives usually works better in terms of avoiding "the business logic is split across three layers of service calls" type situations.


We share the same record types (primitives?) across the stack - but because of that sharing, those types are hard to evolve, especially when it comes to their semantics. Too much code interprets them directly, which ossifies those primitives. How do you enforce separation of concerns when such data structures are shared liberally?

> the business logic is split across three layers of service calls

we are thankfully very far from this world.


Convert the records at the service boundaries into something better suited to that service; then everything doesn't depend on everything else.
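
A minimal sketch of what that boundary conversion might look like in TypeScript (the shared record and the service-local type here are hypothetical, not anyone's actual schema):

  // Hypothetical shared record type used across the stack.
  interface BlockRecord {
    id: string;
    type: string;
    properties: Record<string, unknown>;
    last_edited_time: number;
  }

  // Service-local view: only the fields this service cares about,
  // with semantics the service itself controls.
  interface SearchDocument {
    id: string;
    title: string;
    updatedAt: Date;
  }

  // The conversion happens once, at the service boundary. Code inside the
  // service depends on SearchDocument, not on the shared record shape, so
  // the shared type can keep evolving without ossifying.
  function toSearchDocument(record: BlockRecord): SearchDocument {
    return {
      id: record.id,
      title: String(record.properties["title"] ?? ""),
      updatedAt: new Date(record.last_edited_time),
    };
  }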


Your advice is "add abstractions", which is basically the opposite of the grandparent commenter's advice. Right now I am on board the "add abstractions" train.


I suppose, but converting it to another structure isn't a huge abstraction. Maybe it can complicate things if the conversion isn't obvious.


Modularization will give you the flexibility to untangle the spaghetti and swap pieces if needed. Nothing has to be forever. The worst part is fixing data, though, not code.


I’m curious what you imagine when you say “modularization.” It might be different from what I imagine.


In my mind there are two types of modules:

- technical, e.g. db abstraction layer

- business, e.g. checkout

What I'm suggesting is having this structure to help you with future swaps/refactors, because they will stay very localized.

Of course, where to cut a module out of a codebase is a skill in itself. Cut too much and you have indirection everywhere; cut too little and you have multiple responsibilities, bigger PRs, and harder refactors.
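
As a rough sketch of that split (all names made up), a business module like checkout can expose a narrow interface while keeping its use of a technical module like the DB layer behind the boundary:

  // Technical module: a thin persistence abstraction (think "db.ts").
  interface Db {
    query<T>(sql: string, params: unknown[]): Promise<T[]>;
  }

  // Business module: the only surface other code imports (think "checkout/index.ts").
  interface CheckoutService {
    placeOrder(cartId: string): Promise<{ orderId: string }>;
  }

  // Implementation details stay behind the boundary, so a future swap
  // (new database, new pricing logic) stays localized to this module.
  function createCheckoutService(db: Db): CheckoutService {
    return {
      async placeOrder(cartId) {
        const rows = await db.query<{ id: string }>(
          "insert into orders (cart_id) values ($1) returning id",
          [cartId],
        );
        return { orderId: rows[0].id };
      },
    };
  }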


This has always felt like the catch-22/anthropic principle of software development to me:

1. Early on you favor decisions that enable rapid iteration and growth without being able to put much deep thought into long-term productivity. Most of these tend to be fine in the long run, but there's probably at least one or two that will haunt the company for the rest of its days. Not taking these "shortcuts" would probably kill the business.

2. Companies that survive largely have their architecture dictated by #1 and are fairly constrained in how radically they can change the decisions in #1.

So most of us end up working on a product which has some pretty critical architectural flaws that are very hard to repair.

The only companies I've seen that do work their way out of it are the printing presses (Google, LinkedIn, Yahoo, Amazon, etc.) that are able to invest in very strong platforms and move their applications onto them, and even then it's a long, drawn-out process.


There's a way out of this, but it requires discipline and, besides, most companies don't want to do it.

Prototype.

Deliberately build "one to throw away." Iterate fast and figure out what you need. When you've learned the important lessons about architecture, structure and collaborations, make a note of those lessons in a design document and then build the "real" product with proper engineering, but with a lot less risk.

I worked for an organization that did this and it results in good quality software, but you need buy-in at all levels of management or you run into "why did your team just waste 6 months on throw-away work?" questions.


Aren't you just describing the business from #1 that uses this method and fails because they have no money and no customers?


It really doesn't have to be.

If you come from a hardware background, you get used to the idea of quickly prototyping with techniques (e.g. breadboard electronics) and materials (e.g. 3D prints in plastic instead of CNC milled parts) because it is vastly faster and cheaper. See also scale models etc.

The problem with software is that it's nearly infinitely malleable, so the temptation is always there to think that you will be faster if you ship-of-Theseus your current prototype into production code.

It works sometimes, but often it's faster to actually prototype, then do it "right" from what you've learned. It's a skill though, to know what to concentrate on and what to skip/fake/ignore.


It’s important to time this well and start doing it as soon as you have your feet on the ground, and the hard part is to hit the ground running. Too early and you crash, too late and you grind to a halt.

edit: missed “6 months”, I live in another world — a week to prototype and throw it out. Spending 6 months to prototype anything means you’re biting off too much


> spending 6 months to prototype anything

I come from the world of medical devices. Nothing happens in a week ;-)


Most of the medical device industry could move a lot faster than they do. There are an interesting collection of reasons why they don't, but it's not inherent.


Disagree. Everyone will make at least one major blunder in #1, and probably more. I don't care how much thought you think you put into it, you will make mistakes if you move proactively and don't just sit there and wait until you are forced to act or the problem goes away.

Companies live or die by how nimble and effective they are at fixing the messes made in #1 while moving forward.


Why should it be otherwise? Any software development interesting enough to be worth working on will be at least partly a research project, and research always involves wrong directions and dead ends. If you didn't learn something that would radically change your understanding of how you should architect your system, you launched too late.


The problem is that most people/orgs don't think this way. They still think their solution is scalable and easy to start with (like RoR), and no one optimizes for deletion, so eventually you are stuck with all those initial choices until you spend a lot of money to overcome them. (Literally in every company I have worked at, authentication, MVC - with more than 3k LOC controllers - and monoliths were "impossible" - as in not economically sustainable - to refactor.)

I think there's a misunderstanding of the word "easy": Django, Spring Boot, Laravel, RoR, etc. are easy to start with, as most architectural choices come out of the box, but then you can never leave that box.


Yep. And then they put tight timelines (meaning like 1.5 years) on finishing those massive migrations, so we take more shortcuts to finish them, and I guess technically we end up on the new platform, but the implementation is about as bad as the original.


Early tech decisions can become myopic in the long run. If you're not careful to admit that you made quick decisions early on, those early decisions can become entrenched as gospel. This can alienate later, more experienced developers and become a factor in burnout as they are forced to deal with "this is how we do things around here."

One way to deal with tech debt is to take advantage of the experience and skill of your developers and embrace change.

You might not want to be switching databases every time someone suggests, "X is better than Y," for sure! Inertia has its uses. But you don't want to stagnate either. The conclusion of TFA is solid: the truth is somewhere in the middle. The keyword for me is sufficient: the local minimum that meets all of our requirements. If we need to be able to ship changes faster and it takes weeks to get changes merged, something is not sufficient in our stack/process/etc.


It can be helpful to write down the context of why you made your decision, so that if/when you revisit it later it’s more clear. ADRs are one way to do this: https://adr.github.io/madr/


Short of something structured, any format that captures key assumptions and intentions is good. The problem we are solving is X. The "pre-mortem" idea is that if we fail, it would be because of Y; then in your design or decision, you address Y as a contingency. The collateral idea for code would be "good until": this is good until (we expect to hit) some threshold.


I've run into at least a few people who can't understand why my answer changes when the context changes. I said we were going to do A because of C, D and E. Why are you acting like it's a gotcha when we find out E doesn't apply anymore and now I agree B is a better option?

Did you think we were joking when we said, "It depends"?


Great advice. I also use learning milestones to manage our novelty budget and record our findings/decisions/context/etc. I don't know if this is formalized or written down anywhere.

The idea is that when your team wants to try something and learn from it, you record your thoughts and ideas in a document. You then have a triggering condition in that document for when to review it later. When you do the review, you record what you learned and what decisions you made as a result.

Over time your teams build up little libraries of these learnings, and it can help build organizational continuity. People come and go, leaving teams or the company, joining other teams, etc. It's a good practice to avoid having to re-transmit what your organization has learned through retelling folklore.


Had to deal with this in the past, where junior engs made choices on the initial architecture and by the time I joined they were in management/senior roles. They refused any change that was too complicated for them to understand and never tried to improve their knowledge. Meanwhile our team was suffering under the load of technical debt while non-technical POs wanted to dish out feature experiments fast.

> this is how we do things around here

resonates so much


Some of the most uselessly toxic behavior is defending X now because it was critical to our survival 3 years ago.

This world is absolutely rife with ideas, actions and substances that are appropriate to a situation but are detrimental or even deadly once that moment has passed. Oxy because of a bone graft is not the same as Oxy because it's a Tuesday.

We aren't questioning your wisdom because you did this thing three years ago. We are questioning your wisdom because you won't let us kill it right now.


One solution is to design things so swapping out one component isn't harder than it needs to be.

Easier said than done, but still quite doable with modular design, which you should do anyway.


that's also how we end up with an excessive number of microservices


There’s a flip side to this when people who come later totally underestimate the problem space and difficulty when trying to “fix” things.


Definitely, and it highlights another reason why it’s important to document these things for future maintainers and developers.


Good in theory, but in practice nobody is going to have the time to document every one of the hundreds (or thousands) of little decisions with potentially large implications while trying to keep the product afloat. Even for those they do document, in my experience, nobody bothers to read them...


Can’t help people who don’t want to help themselves.


It’s true, but it’s worth keeping in mind that there are some underwater rocks the OGs didn’t tell anyone about because they had better things to do.


Early decisions in every field have disproportionate influence, not because they were the 'right' or the best choice, but because it's much easier not to change, or it's easier to improve the existing choice (making the current database faster) than to jump to another.

a) The politicians who drafted the US constitution probably have had more impact than most presidents since.

b) Train tickets, timetables, different-class carriages, ticket offices and train guards are mostly unchanged from when they were introduced by GWR in the 1840s.

c) Car pedals, typewriter keyboards, pizza recipes, etc.


The US Constitution is a good example. While it's a politically controversial opinion, if the framers had extended 'indentured servitude' status to all slaves in the colonies at the time (a common status for many European migrants in the Middle Colonies, generally involving a ten-year period of forced labor before freedom was granted), it's entirely possible that the US Civil War could have been avoided. Some argue that southern slave states would never have gone along with this, however. The distribution of free laborers, indentured servants, and chattel slaves across the American colonies in the 17th century is a very interesting history that's typically neglected in education:

https://www.oercommons.org/authoring/6734-slavery-and-indent...

Another example might be how the plantation system took hold in the American South but not in the New England Colonies.


America may not have become the absolute economic powerhouse it did if it weren't for the 2+ centuries of uncompensated labor, but IIRC America was the only nation that had to fight a war to end slavery. Most nations simply bought the slaves from the slaveowners and then freed them, which might have actually been cheaper than fighting the Civil War.


The south wasn't a meaningful contributor to the US's economic powerhouse status.

That's the whole reason they lost the civil war — it was an economic backwater with almost no factories, and what useful goods they traded abroad were fairly easily replaced (so no European country bothered trying to help them).


Most nations had slaves for far longer than 2+ centuries. Poor families were still selling their children to rich families to work until the early 20th century in China.


> While it's a politically controversial opinion, if the framers had extended 'indentured servitude' status to all slaves in the colonies at the time (a common status for many European migrants in the Middle Colonies, generally involving a ten-year period of forced labor before freedom was granted), it's entirely possible that the US Civil War could have been avoided. Some argue that southern slave states would never have gone along with this, however.

I'm pretty sure the South wouldn't have gone along with something like that unless the date was set very far in the future (e.g. much longer than a lifetime). The Constitution banned import of new slaves starting 20 years after ratification, and IIRC that only passed because the slaveholders thought they'd have enough by then to have a self-sustaining population.


"path dependence" - nobody has said the words that are the proper rubric for this discussion, so I thought I would. If you want to find some of the previous bibliography on this subject, them's the magic words.


In counterfactual regret minimization and in reinforcement learning this shows up too. The solution there tends to discount early iterations with something like a weighted average. It makes convergence to the true values happen faster.

A similar thing happens in k-means clustering, but it is more horrifying because you can get arbitrarily wrong solutions rather than just taking longer to converge. The solution there is to construct a sampling distribution when making your initial decision - instead of making a decision according to the principle of indifference, make a decision after getting more informed about the problem. This doesn't mean you won't have bad solutions, but it gives a bound on just how bad those solutions can be.
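
For the k-means case, the "more informed initial decision" described above is essentially k-means++-style seeding: pick later centroids with probability proportional to their squared distance from the centroids already chosen. A rough TypeScript sketch (not any particular library's API):

  type Point = number[];

  function squaredDistance(a: Point, b: Point): number {
    return a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0);
  }

  function seedCentroids(points: Point[], k: number): Point[] {
    // First centroid: uniform at random (the "uninformed" choice).
    const centroids = [points[Math.floor(Math.random() * points.length)]];

    while (centroids.length < k) {
      // Weight each point by its squared distance to the nearest chosen centroid.
      const weights = points.map((p) =>
        Math.min(...centroids.map((c) => squaredDistance(p, c))),
      );
      const total = weights.reduce((a, b) => a + b, 0);

      // Sample the next centroid from that distribution: points poorly served
      // by the current centroids are more likely to be picked.
      let r = Math.random() * total;
      let idx = 0;
      for (; idx < points.length - 1; idx++) {
        r -= weights[idx];
        if (r <= 0) break;
      }
      centroids.push(points[idx]);
    }
    return centroids;
  }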

It happens in other places too, I'm sure, because this isn't really a software problem - this is a fundamental problem. Uninformed choices are uninformed, but they influence the choices that come after them.

Certain heuristics, like delaying your decision to the last responsible moment, can help you avoid making decisions with insufficient information, but it is important to realize what type of decision you are making. Are you in a case where getting things wrong just means you go slower but eventually converge? Or are you in a case where you can't reverse course and will have your outcomes determined by the decision you make, without any guarantee that it will turn out well and no simple path backward? I hear people talk about this with terms like "one-way doors" versus "two-way doors". In the irreversible, one-way-door case you are much better off thinking more deeply through the problem.

We're not really solving new fundamental problems. We're solving fundamental problems in a new context.


One thing I realized working on a large company codebase is that if you are the first person to do something you have an additional moral responsibility to do it well.

In large codebases, the default answer to "how do I do X?" should always be to do it the way it has been done before. Consistency is hugely valuable.

As a result, both junior and senior engineers, when faced with a thing they need to do, will start by checking to see if that thing has been done elsewhere in the codebase already.

This means that if you are the FIRST person to do something, it's on you to set a really great example - because it will inevitably be copied many times over in the future.


Some early tech decisions matter, but most don't matter as much as people think they do. Other than the OS, I haven't really seen it be a complete deal breaker for anyone that's willing to put in the work in the first five years.

Postgres not handling things well? Slow migration to DynamoDB or to sharding.

Python falling over? Roll the hot codepaths in Cython or horizontally scale.

Node doesn't have great data science libraries? No worries, build a couple of small python data science services and get on with life.

The only reason I think the OS choice matters at all is because I've never seen someone move from .NET on Windows to Linux or vice versa. Could be done, but people generally don't do it. So your hiring pool is going to look different: generally more scrappy on the Linux side, generally more careful on the Windows side, but even there that's mostly about culture and priorities. Both stacks can handle basically anything.


>The only reason I think the OS choice matters at all is because I've never seen someone move from .NET on Windows to Linux or vice versa. Could be done, but people generally don't do it.

Done it myself. Seen it being done all the time in my org. We're migrating large amounts of legacy .NET Framework apps running on Windows on EC2 to .NET Core / .NET apps running in our K8s cluster using Linux containers.


I have taken projects of significance to me from MySQL to Postgres. It was certainly a short-term cost, but the upside justified it (first-class IPv6 types, JSON).

I have tried to do the language port thing several times. I never succeeded in bringing others with me (Perl to Python), but that said, others I work with have done Perl -> Java and C -> Perl -> C and Perl -> Java -> Rust, so I think language may be one of the "tractable" problems in many cases.

We tried AWS for low-end k8s and it was too complex. GCP/GKE was just better presented. It has the same complexity, but gcloud/kubectl seemed to work "better" for us, which led directly to helm outcomes we liked. I think the AWS experience we had could be called at best "toe in the water"; maybe try-before-you-buy works?

The one which has proved the most sticky is VM/cloud provider, for routing. If you are in a world which cares about BGP, the one you pick that works, you probably stay with. We partnered with CloudFlare for some stuff, and Linode for other stuff. I do wish at times we'd gone with DigitalOcean, or Fastly, or even Akamai (which is behind Linode apparently), but the BGP surface we have is the one we want to maintain because we understand it. Maybe it's not that different from AWS vs Azure vs GCP.

Also DNS provider. You tend to stick to the underlying NS and related registrar inside your chosen namespace registry.

Also Cryptography TA. Well.. once you move to letsencrypt you stop moving I guess.

I do think the DNA baked in early tends to persist. If you do monolithic solutions, you aren't walking to microservices without a huge conversation. If you went XML/SOAP you would be asking about the cost/benefit of JSON, given JSON and XML can be considered strictly equivalent for data content. If you went Apache and mod_<something>, then moving to nginx or Caddy incurs massive cost for what?

But if you wind up behind a dead/legacy platform (Apache/mod_python), that can suck.

Even just moving off wordpress costs.

Moving off one CI to another CI costs. I don't know any CI which works all the time; they all seem to incur opex burdens.

We moved from self-hosted GitLab to GitLab on k8s to GitLab on k8s in GCP, and that went OK. Maybe when the abstraction moves to the package, changes in how the package is presented matter less?


I'm a bit surprised that Stripe uses Mongo? For financial transactions?

Is it the primary backing store or a cache?

I always thought of Mongo as the DB with questionable durability and performance claims ... i.e. the "quick and dirty" DB. And also under-featured compared to relational DBs.

But I haven't used it so maybe that is not accurate.


AFAIK, most of the FUD around Mongo came from this article (and the subsequent "telephone" game that followed it): http://www.sarahmei.com/blog/2013/11/11/why-you-should-never....

In practice, Mongo is a solid-enough database, with its most valuable asset being its query language. I've worked on projects where it handles millions of records without any issue. Reaching this size of footprint is difficult unless you have a massive user base or you're working with transactional data. FWIW, the project I have in mind has a disk size of ~5GB, which isn't much at all. I imagine most projects won't even get close to testing the limits of MongoDB and base their opinions of it on hearsay.

Disclaimer: I'm a developer not a full-blown DB admin so I'm sure there's a specialist who has more details about specific performance issues.


Mongo FUD has more to do with the fact that RDS...es work better with almost any line of business application. The ONLY application where I have ever seen Mongo, or any other key:document store, work well is integrating with form.io for custom forms. And even then, RDS might have been a better choice.


Anecdotally, I've been at several companies where MongoDB was a crucial part of a tech stack due to early tech decisions. In every case, most of the engineers working with it wished there was a reasonable way to migrate away from MongoDB to PostgreSQL. I've never seen the opposite (engineers wanting to transition from SQL to MongoDB).


Sorry, but "any other key:document store" seems to imply that you're not aware of the value of consistent secondary indexes. A lot of people are not aware, and conflate K/V stores, which are fundamentally inflexible, with more general-purpose alternatives like Postgres and MongoDB. Frankly, these two are both able to express a wide variety of modern needs, and whether you prefer one versus the other is a question of tradeoffs. The world is not as reductive as you imply.


MongoDB isn't a general purpose alternative. You're right that there are tradeoffs, but relational databases are the general purpose "good at everything" engines. MongoDB is amazing at some things, but worse at others. Indexes can help, but - as with any index - don't always help. Some performance characteristics are more structural than that, which is why document databases exist.


I struggle to understand how relational isn't a subset of documents (namely flat documents): what am I missing?


I'm no expert but my understanding is that a lot of these specialised databases' strengths and weaknesses come from laying the data out on disk differently to how a relational database would.

A relational database saves data by table, and within each table by row. This means it's very good at getting a row of data out of a table, and pretty good at column operations (e.g. sum the "price" column in this list) and pretty good at joining data (e.g. "fetch me related data from these other tables").

A document database such as MongoDB saves all the data per-key together, so it doesn't have to do work to relate data. It just reads and returns it. That means it's exceptional at getting related data (as long as you saved all the related data in one document), but terrible at joining to other documents by field, and terrible at column operations across documents (e.g. "sum the price of these books, where each book is a different document").

A columnar database is amazing at column-oriented operations (e.g. "compress this data" or "sum this column"), but bad at getting all the data for one record, as it has to traverse all the columns to find the right data, and therefore bad at relating data, as it has to do column traversal across multiple tables.

A graph database is amazing at getting related tables or entities (e.g. "I'm a person, now through a self-referential edge back to the Person node get me all my friends of friends of friends"). I can't think off the top of my head what that would be bad at, but perhaps it's bad at summing columns, that sort of thing.

Again: I'm no expert. That's just my understanding.


Hopefully someone corrects you if you got something wrong, because to me that was an amazingly succinct and clear explanation of the advantages and disadvantages of different database types.

One side note is that with document databases in the cloud, like Cassandra, I also have to worry about partition keys, even though most businesses don't work at that scale (https://www.baeldung.com/cassandra-keys).


Yes, I had the dubious pleasure when I dabbled (thankfully briefly) with Azure's CosmosDB!


Ha, indeed. Let's be honest, most Azure building blocks are not up to par, but this one takes the cake in terms of peak embarrassment.


I think there's some false equivalence going on here.

When you point out that behind the scenes rows sit adjacent to each other on disk in an RDBMS, that's the same thing that's happening behind the scenes with documents... they're sitting adjacent to each other on disk. Documents instead of rows is all it is: and guess what, a flat document without nested documents or arrays is a relational-style row.

You jump to the ability to speed up operations with column-scoped operations, but really what you're getting at here is a question of indexing... in a document-oriented database with rich secondary indexes you can do aggregations across fields (fields are like columns in an RDBMS, but in a document), which can include arrays. Frankly, that's why it's so important to have consistent secondary indexes that can express the breadth of your query patterns and their evolution, and frankly it's why key/value stores like DynamoDB are absolutely not general purpose; in other words, they cannot be used to express the bread-and-butter workloads that folks would have used an RDBMS for in the decades leading up to this moment.

You are correct that column-store engines are a different category entirely, optimized for super-efficient lookups on individual columns as opposed to entire rows, and therefore less optimized for updates. But again, frankly, there's no reason why such an engine can't also express a document structure; it's just custom running back to the 1970s that causes us to think and use terms like this.

Now, bringing up joins is a great point, and it's true that one of the things document databases allow people to do is store data in a denormalized way, which offers scalability opportunities and, perhaps more importantly, a way to store data the way you think about it in your code. But this is certainly not a requirement... it's just the best practice where appropriate. You still have to express different business objects in distinct parts of your data model and occasionally have to join them. Join performance is simply a question of the engine's implemented optimizations and is not related to the data model.

By the way, graph databases maintain data structures optimized for closeness, but there's frankly no reason why such a data structure can't also be an optimization on another data engine; again, it's essentially distinct from the data model. The industry has leaned into a false understanding that somehow these all need to be separate categories by default.


Thanks for the detailed answer, but I don't think this is right. I will attempt to be as civil as you in my rebuttal :)

The key point I think your perspective foregoes is that documents aren't like tables, and columns aren't like fields. A document holds one complete entity. A table holds a part of the information for all entities.

As an example, if I were a bookshop, and I chose to store books in an RDBMS, I might store their prices in a price table that just has id, book_id and price_in_cents in it (ignoring valid_from and valid_to niceties). If I store them in a document database, I would have a price field embedded in each Book document.

If we wanted to operate on all the prices, say to generate a histogram of all current book prices, I can query my RDBMS table for each price per book. The RDBMS knows exactly where to look in the data file(s) as it operates on fixed byte offsets defined in the table schema. The document database needs to jump into each document, pull out the price, find the next document and repeat. This is very far from fixed offsets, and likely involves loading far more data into memory in the first place.

I await a rebuttal with interest :)


This is what secondary indexes enable: if you have a sub-field of the document for price and you have a secondary index on that sub-field, you can do a covered query straight out of the index on an aggregate of that price field.

Essentially, in your example, you could think of the price table that you described as effectively a secondary-index-style data structure in a user-space table... but why not just make it an index that the database engine manages consistently for you?


Well, I suppose because indexes can slow some things down (e.g. writes), and can also fail to speed other things up. They're not a magical answer to everything, or they'd be on by default on everything!

But let's take your example of using an index to get all prices. How would that work?


You would do an aggregate query on the price field grouped by price
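
With the official Node.js driver, that might look roughly like this (collection and field names are made up; whether the plan is actually covered by the index depends on the query shape and server version):

  import { MongoClient } from "mongodb";

  const client = new MongoClient("mongodb://localhost:27017");
  await client.connect();
  const books = client.db("shop").collection("books");

  // Secondary index on the embedded price field.
  await books.createIndex({ "price.amountInCents": 1 });

  // Histogram of current prices: group by price and count occurrences.
  const histogram = await books
    .aggregate([
      { $group: { _id: "$price.amountInCents", count: { $sum: 1 } } },
      { $sort: { _id: 1 } },
    ])
    .toArray();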


> Mongo FUD has more to do with the fact that RDS...es work better with almost any line of business application.

Nope. That's simply not true, and it has nothing to do with why people dislike MongoDB (you don't see similar attacks on Cassandra for example). The issue is that it has a lot of cases of data loss especially in its default configuration, and that's not a document database problem, it's a MongoDB problem. (Indeed it's very similar to the reputation that MySQL used to enjoy, for those of us who can remember back that far).


Work better in what way? Speed? Reliability? Scalability? And is that based on direct experience or indirect?


Simple operations like joining data together or filtering a table for a specific value are way more performant.

They are also extremely common operations.

MongoDB became obsolete when PostgreSQL added the JSON column type.
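
For comparison, a rough sketch of the Postgres equivalent with node-postgres and a jsonb column (table and field names are made up):

  import { Pool } from "pg";

  const pool = new Pool({ connectionString: "postgres://localhost/shop" });

  // A jsonb column gives document-style storage inside a relational table.
  await pool.query(
    `create table if not exists books (
       id  serial primary key,
       doc jsonb not null
     )`,
  );

  // Expression index on an extracted field so filters on it stay fast.
  await pool.query(
    `create index if not exists books_price_idx
       on books (((doc->'price'->>'amountInCents')::int))`,
  );

  // Filter on a field inside the document, much like a normal column.
  const { rows } = await pool.query(
    `select id, doc->>'title' as title
       from books
      where (doc->'price'->>'amountInCents')::int < $1`,
    [1500],
  );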


Extensive experience with the lower-end (in terms of revenue) Fortune 1000 companies.

Every single cloud provider and MongoDB itself push horizontal scaling and sharding. This is rarely a necessity in practice.

Traditional RDS normalization is almost always necessary to keep data clean. As I said, I have used MongoDB extensively for non-normalized, client-created forms, and it was fine for that, but I wouldn't do it again given the choice now.

In simpler terms - when I work for a staffing agency and delete a profession from the database, I want the DB to yell at me about FK constraints that need to be considered (e.g. millions of jobs and candidates set to that profession_id).


MongoDB is used by enterprises all over the world. It's true that it's the easiest DB to get started with, and it's also true that in the distant past (pre 3.0, back in ~2015), they had some issues. But they now run the financial backends of companies as varied as banks and coinbase and... https://www.mongodb.com/who-uses-mongodb


MongoDB 3.x was broken too[0]. Their reputation for consistently shipping databases with fundamental design defects is well-deserved. Data loss was a hallmark of that product across multiple major versions. Maybe they've suddenly turned a new leaf in the last couple years but I wouldn't want to bet my business on it. Most of the companies I know that use it, use it for data that doesn't matter such that data loss isn't a big deal.

[0] https://jepsen.io/analyses/mongodb-3-4-0-rc3


I wrote for the technical writing team at MongoDB when this "security issue" made the news. The precise problem was whether authentication was turned off/on by default for the default database, I believe, during installation. Product management initially chose this configuration to make it quicker for evaluators to get something up and running as quick as possible. The assumption was that no developer would actually use that in production. Further, the documentation clearly stated that this default configuration option was not secure.

Blaming that data loss on MongoDB rather than the noob dev who deployed that evaluation configuration is as erroneous as the logic used by the noob dev. We hear it from developers all the time, so let me repeat it: User error. It's a thing.


That evaluation isn't about security; it's about durability -- does the database actually save your data under all circumstances?

> In this Jepsen analysis, we develop new tests which show the MongoDB v0 replication protocol is intrinsically unsafe, allowing the loss of majority-committed documents. In addition, we show that the new v1 replication protocol has multiple bugs, allowing data loss in all versions up to MongoDB 3.2.11 and 3.4.0-rc4. While the v0 protocol remains broken, fixes for v1 are available in MongoDB 3.2.12 and 3.4.0, and now pass the expanded Jepsen test suite.

It does look like it was fixed though.

I remember some mealy mouthing but maybe I got them confused with others subject to Jepsen tests ...


The parent refers to an issue from 2016 when 30,000 (!) DBs on the internet got deleted/ransomed. The fact that they blame the customers for this and totally confuse it with data loss is kinda telling, huh? Sounds like data integrity wasn’t a big thing internally at Mongo after all.


I think we're conflating two different issues.


Yes, this https://www.zdnet.com/article/hacker-ransoms-23k-mongodb-dat... has nothing to do with the Jepsen report pointing out durability issues within the DB engine itself, which Mongo was known for.


Yeah, that was a separate issue, unrelated to the default authentication setting of the default db. And, yes, we fixed it.


> We hear it from developers all the time, so let me repeat it: User error. It's a thing.

And as developers, we should do our best to prevent the user from being able to make those errors.


Agreed. Unfortunately, getting most developers to pay attention to their users has historically been quite difficult. Asking them to distinguish between differing levels of technical expertise with the product within the same audience and for a specific context - noob devs with no product experience in this case - requires empathy for the user that is missing in most developers.


> The assumption was that no developer would actually use that in production. /.../ the noob dev

Ah, yes, classic "you're holding it wrong".


I worked for a company that opted for Windows Embedded back in the early 2000s, probably inspired by Tektronix scopes due to them also making scientific instruments.

They boomed. They must have had good relationships with Microsoft because they produced kernel patches just for them.

Anyway, after 15 years of "it works now, why change it", they started a new project to rewrite the cash cow. It was a Phoenix Project, and it also had a name associated with rebirth, as most of these projects do. Turns out, Windows Embedded was still not a good idea 20 years later, and they couldn't change their code because it was so tightly coupled with Windows. The project took 250% more time than budgeted, and even then it was an MVP.

A competitor decided to do the same product, but in Yocto Linux using modern tech, and remade the cash cow product in less than their budgeted time. The interface was snappy, there were barely any bugs, etc.

Software entropy is real, and I've seen it in many startups that got bought out after they boomed. Then the entropy is left for those who weren't there during the booming phase as a poisoned gift. I like to call it tech inflation.


To me, your story is “Linux good, Windows bad” and you have created a narrative around that.

Anecdotally, I have seen multiple successful greenfield rewrites on Windows. I am sure we could find plenty of examples of unsuccessful rewrites from Windows onto Linux. That is: your cause/effect analysis is probably incorrect. Rewrites are hard, and the OS probably plays only a minor role.

Disclaimer: I am a long time Windows hater, but Windows has been used to put food on my table.


It's more about the right tool for the job. Also, were your greenfield projects in Windows Embedded specifically?


This is why I like playing Go (the game). The decisions made early in the game have huge effects on the rest of the game. Lingering effects of dead formations (aji) come back to haunt and shape future decisions.

Playing Go exercises this, and helps develop intuition about the effects of architecture outside of the game.

I am again also reminded of Christopher Alexander’s ideas in living architecture.


Go is the most beautiful game. Chess is the dominant game where I live, and it is fun to play quick blitz games and look for tactics, but the feeling is nothing like a game of Go.

As far as early decisions, one of the most important in Go is knowing when to rethink your plans. You may place a stone with the intention of defending it, but if the board state changes, it might end up serving better as a local influencer or a sacrificial stone instead. To that extent, your early decisions are key, but also you have to remain flexible and attentive to the evolving game as moves are played, adapting your early decisions to unexpected later complications.

Playing a good Go game feels almost zen or enlightening; you pick your moves with careful determination, but your plans flow like water around the fluctuating board state. There's a reason there are so many Go "proverbs", something in the game mechanics makes it a perfect metaphor for life or the universe.


Or it's wildly discouraging as you discover that 'cause you're human, you "tunnel" to parts of the board too much and you just can't stop it. AI plays very differently than any human, by not-tunnelling. Associative meat-minds don't seem to work that way.


AWS is early? I was thinking about all the crap Unix did in the 1970’s that we’re still stuck with. The tty subsystem, for one example.


It was an early decision in Stripe's history, which is the example system in the article.


>these were known to be very imperfect implementations even when they were originally added, but they were added anyway in the name of velocity, with an implicit assumption baked in that they’d be shored up and improved later

I've heard this a million times but don't remember ever seeing it actually done, at least not with full coverage. This is why it's critically important to get the right first team of developers: a team that knows this strategy is rarely actually carried out.

Don't feel too bad, it's probably in every live codebase on the planet, just a matter of degree.


I feel another point is that early tech decisions have a huge impact on the type of people you will have to hire. For example, Ruby attracts ruthless productivity, and Go attracts people who prefer longer-standing apps with fewer dependencies and better maintainability.

I make it sound like Go may be the 'better' choice here but that is not the case, as the author mentions, it's a balance.


"Cloud provider: AWS originally, and with so much custom infrastructure built on its APIs, unlikely to ever change."

Who cares? It literally doesn't matter unless you are doing something very, very specific with ML. As far as line-of-business projects go, every service you need has a direct equivalent. Also, NO ONE EVER SWITCHES CLOUDS; this is a TERRAFORM MYTH. Terraform being USEFUL in cloud transitions is a MYTH. I have direct experience with this, where a corp started competing with Amazon (once Amazon went into groceries) and had to move to Azure. This is the ONLY instance where it made sense.

"The underlying database: Mongo originally begot more Mongo, but switching any database anywhere is unusual."

RDS is best 90% of the time.

Other than that, just keep it modular. Definitely no MVC front ends married to the BE - API always.


People do switch clouds, but sure it's never trivial. https://about.gitlab.com/blog/2019/05/02/gitlab-journey-from... for example.


Thanks for the link! I also know of a national grocery store that switched from AWS to Azure once Amazon started competing with them directly. With that said, though, I think it's very, very rare, and I hate how people use it as an argument to justify Terraform, which largely doesn't help because all the APIs are different enough to still require manual adjustments.


“Software has inertia.”


> "And this fails the other way too, where major believers in academic-level correctness agonize over details to such a degree that projects never ship, and sometimes never even start."

Anxiety paralysis. This is also why many people aren't too enthusiastic about the hottest new thing, and would prefer to stick with say, a language like C for low-level development. Experts in the field trying to push the boundaries aren't necessarily good examples for everyone else to follow.


Heh. Read about the cosmic microwave background: it's an imprint of some microscopic disturbances in the baby-age Universe, now permeating all of it.

It's just how this world works. Tiny early events grow to colossal scale as the thing that experienced it grows, be it a code base, a company's culture, biological structures, all the way up.

No matter what, we have to put up with that, and tread lightly when anything of consequence depends on us. For which, of course, there's never enough time.


It still blows my mind that Facebook invented a better PHP runtime instead of carving off bits of functionality into other technologies.


They did that too.


Inertia is a thing, especially around mental models and ways of working. It's hard to change an early decision, because many other parts and people have subsequently made decisions on top of it. Like the Ship of Theseus, replacing parts is possible, but the replacement needs to fit in the gap, and the inertia means that even if you replace everything, it's always a ship.


It's pretty easy to find flaws or old parts in any stack. The trick is figuring out which ones need replacing, in which order, with an ROI you can justify. Most of the time you can't, because that same time is better used for other things.

It's not to say it's not possible. Just that most ICs, managers, PMs, etc. don't build careers on proactively tackling technical debt.


Also, it depends on how fast the software/company is growing. A slowly growing company could reasonably decide to refactor their stack.

Whilst one that moves fast won't want to risk blowing up everything, nor want to spare resources on something that customers don't notice until it all blows up.


Even so, it's still better to start something and make mistakes (thus gaining experience) rather than getting stuck in analysis paralysis.

Technical debt is real, and has an interest rate that would make a loan shark blush. Keep that in mind, and you should be just fine.


I like to describe this kind of thing via the analogy of the Big Bang: the little ripples in the plasma, after expansion, led to the galaxies we see today. The smallest parts of the early state have MASSIVE implications later on.


Sometimes the early decisions are obviously the right ones in the space, which is why they are early.

E.g. eyeglasses having nose pads, and arms that rest on the ears.

Attempts to replace pneumatic tires with airless have been failing for over 100 years.


Is this any different to any other technology? One could easily point to the rapidly diminishing advantage of AC over DC for power. ICE over electric cars. I'm sure I could extend this list.


Not to stretch the analogy too far, but isn’t this true of any non-trivial dynamical system? They are highly sensitive to initial conditions.


Often this dynamic is called path-dependence too.



