It seems like the biggest issue Etsy had was a lack of strong engineering leadership. They relied on an outside consultant to rewrite their application in a fundamentally flawed way. The lack of engineering leadership resulted in both poor technical decisions and bad engineering processes for how the app was developed.
Etsy's biggest technical mistake was building an application using a framework that no one understood. This led to the predictable result that when they deployed the application to production, it didn't work. Even if the application had worked, Etsy still would have needed to maintain the Twisted application indefinitely. Maintaining an application written in a framework no one understands sounds like a recipe for disaster. Sooner or later you are going to run into issues that no one knows how to fix.
Process-wise, Etsy made the mistake of not properly derisking the Twisted application. They only found out that it didn't work when they were deploying it to production. They made the same mistake a second time when they tried to deploy the replacement. When I'm building a new service, the first bit of code I write is a test that the fundamental design of the new service will work. Usually this takes only a few days, instead of the months it takes to implement the full service. It sounds like Etsy could have set up a simple Twisted app that did a fraction of what the final version did. If so, they would have found a number of fundamental flaws in the design before spending a ton of time building out the service.
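To make the idea concrete: a "walking skeleton" spike stands up the thinnest possible end-to-end slice of the new service and actually hits it, before months go into the real build. A minimal sketch (using Python's stdlib here rather than Twisted, purely for illustration):

```python
import threading
import urllib.request
from http.server import HTTPServer, BaseHTTPRequestHandler

class Ping(BaseHTTPRequestHandler):
    """Thinnest possible slice of the new service: one endpoint, no real logic."""
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # keep the spike quiet
        pass

# Bind to an ephemeral port and serve from a background thread.
server = HTTPServer(("127.0.0.1", 0), Ping)
threading.Thread(target=server.serve_forever, daemon=True).start()

# The "test": prove the end-to-end path works before building anything else.
port = server.server_address[1]
body = urllib.request.urlopen(f"http://127.0.0.1:{port}/").read()
server.shutdown()
print(body)
```

A spike like this takes an afternoon; if the fundamental design (event loop, concurrency model, deployment story) doesn't survive even this, that's a few days lost instead of months.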
To be honest, this story shows how a business can succeed even with a bad engineering team. It would be one thing if this sort of incident killed off Etsy. Instead Etsy has gone on to become the $5 Billion company it is today. I'm not saying engineering doesn't matter. All I'm saying is you can build a pretty successful business even with a bad engineering team.
I know many engineers at Etsy; this is absolutely the problem, and it persists to this day. The engineers are fine and do their best to self-organize, but there is no cohesive engineering vision. They complain about this issue over beers with some frequency.
Even if the Python rewrite didn't immediately fail, what would success look like? What does failure look like? How would you measure it?
Not picking on Etsy here in particular as lots of companies have this problem. But it's not something people often think about as being crucial for good engineering culture.
I completely agree. The most important lesson I’ve learned in the past decade of software development is:
Good product/UX design can save bad engineering, but it doesn’t work the other way around.
Tackling a real problem > Having a big addressable market > Good product / UX > Great Engineering.
Great Engineering pays back on a much longer time scale and only becomes an enabler or a breaker at the critical scaling stage. You can get surprisingly far with a protoduction service and still build something huge (see Twitter).
Great design is atomic and homogeneous. Weak technology leadership ends up opening a vacuum that gets filled by Architecture Astronauts (Etsy), every team doing their own thing technology-wise (SoundCloud), or at the very worst, some dogmatic homebrew NIH stack (various places I've worked at).
Every unhealthy technology organization looks unhealthy in its own way, but the great ones all look alike: clean, simple, rather homogeneous, and logical.
Depends on how "bad". I've seen _many_ counterexamples, where no amount of lipstick could fix the underlying flaws. What good is a pretty website if it's slow, or if it doesn't work at all half the time?
Case in point: all servers were configured to accept a maximum of 5 connections, so when the load balancer got more than x concurrent requests, they started failing.

Instead of fixing the problem, we can just scale the number of servers (horizontally scalable, yay) until the problem goes away.

Now you have tons of 8GB/4C machines sitting (almost) idle, and an immense amount of money wasted.
This works if the bad engineering causes a linear increase in scaling costs. If the bad engineering causes an N! increase or a 2^N increase in scaling requirements, then there's a good chance all the servers on the planet aren't enough.
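To put rough numbers on that (with a hypothetical 10x load multiplier):

```python
import math

n = 10  # hypothetical: load grows 10x

# Servers needed if bad engineering inflates scaling costs...
linear = n                          # linearly: annoying but survivable
exponential = 2 ** n                # exponentially: 1,024 servers
combinatorial = math.factorial(n)   # factorially: 3,628,800 servers

print(linear, exponential, combinatorial)
```

At a 20x multiplier the factorial case already exceeds 10^18 servers, which is the "all the servers on the planet aren't enough" regime.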
I'd say engineering is no easier to fix by throwing money at it than UX is.
The two most important things are data and UI, yet the one thing we developers spend all of our time thinking and talking about is code :)
Because data >>> UI >> code.
> Also, business logic is encapsulated in code - if that's borked, no amount of UI will save it.
The assumption here is that the bare minimums are fulfilled. If the software doesn't work at all you have none of the above.
If you’ve spent any time in and around Etsy the reality of this is a little more complicated. Etsy has had some very impressive people on its technical team throughout and an unusually large number of really talented developers. It’s also suffered from some serious disorganization in its technical path and complicated leadership situations.
It's the mix of both that produced Etsy. I think with a bunch of "bad" developers they would have been dead long ago. It's been more like an (often) good engineering team making (often) questionable decisions.
Someone who cares deeply about the company will understand their core product is flawed and work to find someone to make it better. Not ignore devs
I liked your comment.
I would say "bad engineering". Look at all the banking stuff (disclosure: finance/banking lead) and you will immediately recognize this as a reality.
Why? Because there are no KPIs that relate to speed, sustainability, or maintenance. There is a distinction between input, output, and outcome.
Also, a lot of architecture is done using PowerPoint. Drawing a line in PowerPoint is easy; connecting the services in reality is often anything but trivial.
From a lot of the "startup stories" I read on HN, I rather gather that often there simply isn't an architectural vision, PowerPoint or otherwise.
Many, many years ago, I was asked to take on a new role as the system administrator for a group of advanced-degreed engineers, with their Unix workstations and FEA/CFD software. I set up centralized servers to host their software, a user directory, shared home directories, backup systems, and a Citrix server for Office. It was a tremendous success, and everyone loved the new setup. Upgrades to software became a breeze, people could access their files from anyone else's machine, and most got rid of the secondary PC they had kept just for email.
The drafting department, with all of their CAD machines, was a different story. Software and data directories were cross-mounted around all of the machines. It was such a mess that, when we had a power outage, it would take them THREE DAYS to get everything going again. This happened a few times a year.
I moved on to another new role to set up a Sun E10K, 3 cabinets of EMC disk, a 384-tape backup machine, and all the stuff that went with them. I was trying to explain the difference in these setups to my (very senior) IT manager, to get a point across about what I was doing. (I might have been arguing about setting up a third, "backup area network," in addition to the second "administrative network," but memory fades.) I got done, and she stared at me like I had a hunchback, and said (in reference to the drafting department), "Yeah, but it WORKS, right?"
You're absolutely right. Senior execs look at the fact that business is improving, and think to themselves, hey, we must be doing something right, and move on to other things. However, it is a source of continual disappointment that there is SO MUCH lost opportunity in the decisions that get made about how IT should be done. And, what's more, those bad decisions are being made by people who, by their role and their pay, should really be expected to understand the difference in my 2 examples, and what the extra overhead is costing.
I mean, if you had two physical production lines in a plant, in series with each other, and one of them couldn't do anything for 3 freaking days after a power outage while the other carried on as if nothing had happened, plant managers would immediately fire a line manager who told them, "Yeah, but it WORKS, right?"
I think these situations endure because the emperor has no clothes. The people who are calling the shots can't understand the technology that MAKES the difference, and don't want their ignorance exposed.
If the product is Digital Ocean or DynamoDB or embedded software installed in Ford F-250s, UI/UX and slick business aren't going to mask bad engineering.
Etsy was originally built by two NYU students while they were still in college. It was their first production software project. It grew super fast in a period where the whole industry was still figuring out how to scale web applications; perhaps only Amazon & eBay had actually figured it out by then.
Then VCs got ahold of Etsy and tried to put out the fires by replacing the "kids" who built the thing. This was around 2006, which predates even Twitter's fail whale and Series A funding round.
The founding team wasn't a bad engineering team. It was a young founding team who built a thing that grew like crazy, and which, had it not been built that way, would have meant no engineer could have ever come along to preside over a rewrite. But because it was crashing and dollars were at stake, and because there was some drama that prevented the founding team from having the time to build a proper scale-up engineering organization, a bunch of people came in and immediately reached for a rewrite.
They had a working thing, on a not-that-uncommon stack, but the new engineers insisted on a rewrite. Why?
Because new engineers always insist on a rewrite. Because, as my friend James Powell once put it, "Good code is code I wrote; bad code is code you wrote."
Python and Twisted. I am a big fan of Python and watched the evolution of Twisted -- the rise in popularity, and then the decline. I can imagine how they came across a group of engineers who were absolutely convinced Twisted was the silver bullet to solve their engineering challenges, at that specific moment in time -- when Twisted was at a peak of popularity.
Just like Django/Rails was the silver bullet in 2011. (Or Scala?) Just like Angular was in 2015 (or Go?). Just like React is now (Or k8s?).
Of course, I am not using the "silver bullet" term accidentally. Tech trends change, but "there is no single development, in either technology or management technique, which by itself promises even one order of magnitude [tenfold] improvement within a decade in productivity, in reliability, in simplicity." (Brooks, 1986)
New tools are just an excuse for a rewrite, and rewrites are the ultimate shiny object to a new engineering team, brought in to "fix" an existing and working system.
As an aside: the founding software team at Etsy didn't get nearly enough credit for building a thing that worked and that grew from nothing. If they had focused on the "right" architecture, none of us would ever have heard of Etsy, and we certainly wouldn't be debating the rewrite. That founding team didn't get enough credit, neither in equity nor in history. But that's another story.
It is very nice to try new things, though, because without that we can't move forward. But how much forward is enough seems to be the troubling part, as investing time learning the "new unproven thing" every year or so is taxing.
> As an aside: the founding software team at Etsy didn't get nearly enough credit for building a thing that worked and that grew from nothing. If they had focused on the "right" architecture, none of us would ever have heard of Etsy, and we certainly wouldn't be debating the rewrite. That founding team didn't get enough credit, neither in equity nor in history. But that's another story.
Very unfortunate reality here.
> “the second system is the most dangerous one a man ever designs”
Never thought about it that way. Feature creep, and overcorrecting for whatever you thought went wrong last time, both get tested here.
> So, all in all, we did several “upgrades to stable” that were actually bleeding edge upgrades in disguise.
I can relate. Being burned by certain upgrades can push one to err on the side of caution even if the previous version had bugs. Software is never finished, as they say.
Good code is code that is testable and tested; bad code is code that is not tested (and may not even be testable).
As someone who's written too much non-tested (and sometimes non-testable) code, I'm pushing myself to test, and to hold that line, on projects both large and small.
Came in to a project last year that had been around 4 years. It was not only untested, but the original code in place turned out to be somewhat untestable altogether. 16 months later, we're still finding "rock bottom" moments of "holy shit, this is garbage". By "garbage" I mean finding out that data people thought was being recorded was actually being discarded or overwritten, for months. Financial data being simply wrong, and 'correct' data not being recoverable.
First reaction looking at it was "this needs to be rebuilt". No no no, we were told - it's already working, it just needs a few more features. To the extent that things were "working", they were working by accident. There were no docs or tests to demonstrate that things were working as expected.
The last year has been spent keeping it 'going' while trying to unravel and repair years of bad data, more of which is uncovered every few weeks. "We don't have time to rewrite!" The fact is, probably 60% of the code that was in use before has been rewritten, but it's taken over a year. It would have been faster to do it from scratch, and worry about importing/transforming data as we verified it after the fact.
So... good code is testable and tested. Absent these qualities, if I'm the 'responsible party', I will advocate strongly for a rewrite. If that 'rewrite' is 'in place', people will be told up front everything will take 2-3x longer than they expect, because we're now doing discovery, planning, testing and whatever 'new feature dev' you were asking for.
Part of the problem with this particular system was that it was started as a rewrite, but for the wrong reasons, or executed wrong. "We can't understand the old code", so they tacked a second framework on alongside the first and started blending the code, but still documented nothing, wrote no tests, and had no repeatable build process. Nothing. So instead of one undocumented system, they had two, but one was in their heads, so it was 'OK'. Based on what we inherited, one could only assume they really didn't have any understanding of the domain problems they were tackling, which just compounded every bad decision.
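One concrete way to hold that line on an inherited system is a characterization test: pin down what the code currently does, even when that behavior is wrong, before touching it. A minimal sketch (names are illustrative, not from the actual project):

```python
def net_total(rows):
    """Hypothetical inherited helper: sums 'amount', silently dropping bad rows."""
    total = 0
    for row in rows:
        try:
            total += row["amount"]
        except (KeyError, TypeError):
            pass  # data silently lost! -- exactly the bug we're pinning down
    return total

# Characterization tests: document current behavior, right or wrong,
# so each refactoring step can be verified against it.
assert net_total([{"amount": 5}, {"amount": 3}]) == 8
assert net_total([{"amount": 5}, {}, None]) == 5  # malformed rows vanish silently
print("current behavior pinned down")
```

The second assertion deliberately encodes the wrong behavior; once the discard bug is fixed on purpose, that test gets updated consciously instead of the change slipping through unnoticed.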
A good business model covers a multitude of technical sins.
I can’t say whether the Twisted consultant’s advice was good or bad re: business needs, but your comment seems very wrong regardless. A solution built on Twisted would be nearly the opposite of something no one understands, and there would be big communities to go to for help, and you wouldn’t even have too much vendor lock-in since many frameworks that are competitors to Twisted could be used with minimal code changes.
I wonder what's the motivation though - is it the likes? The informal setting?
I really miss those days when we just had content sitting with default font styles inside plain ol' HTML tables.
1) It's not about you the reader it's about me
2) I don't intend to have so many tweets but after writing 1 or 2 I am inspired in a way that keeps me going and I end up at 20.
3) I have not figured out a way to channel my creative energy when faced with something that feels like work rather than a reaction to an immediate creative inspiration.
We went from blogs to Twitter for, you know, whatever reasons. Now we are tweeting out blogs as series of threads and using something like ThreaderApp [https://threader.app/] to recreate the blog.
It does look like some sort of attempt to garner likes rather than an innovative means of publishing content?
Then again, that's twitter.
While we have been able to scale it out for volume of requests, it did not scale well for multiple teams and multiple services. It became a single point of failure whereby one team's access patterns to their own databases could affect other teams' ability to access theirs.
We've since moved to a new model whereby teams own their datastores and access methods, speeding up development and reducing the negative impact teams can have on each other's data layer. Legacy access continues to be migrated off endpoint by endpoint, database by database. I look forward to the day I never have to look at Twisted code again. The framework is aptly named.
If you are looking for a similar solution for MySQL, I currently recommend ProxySQL. It allows for a topology whereby teams can control their own proxy layer and still have most of the benefits outlined above.
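As a sketch of what that topology looks like via ProxySQL's admin interface (the hostname and username below are made up; the `mysql_servers`/`mysql_query_rules` tables and the LOAD/SAVE commands are ProxySQL's standard admin schema):

```sql
-- Register a team's backend under its own hostgroup...
INSERT INTO mysql_servers (hostgroup_id, hostname, port)
VALUES (10, 'team-a-primary.internal', 3306);

-- ...and route that team's traffic to its hostgroup:
INSERT INTO mysql_query_rules (rule_id, active, username, destination_hostgroup, apply)
VALUES (1, 1, 'team_a', 10, 1);

LOAD MYSQL SERVERS TO RUNTIME;  SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;  SAVE MYSQL QUERY RULES TO DISK;
```

Each team can manage its own hostgroups and rules without touching anyone else's routing, which is what makes the proxy layer per-team rather than a shared bottleneck.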
Managing a DB + middle layer with a small, already stressed DB team? Can't imagine that going well. The problem was that they spent too much time doing reviews of simple SQL patterns for CRUD, plus operational issues like debugging database performance problems or watching devs perform migrations. Of course they have other things to do in their job.
In my opinion, the root of the problem was database choice and practices. If all I want to do is simple CRUD, then give me a well-scaled Redis cluster (AOF enabled), Dynamo, or something that constrains the query model. The DB team can worry about cluster management, leaving me to worry about structure. I could consult the DB team if I needed opinions for particular cases. Give me a way to watch DB performance as well, so the DB team does not need to watch over a migration at 12am or some other off period.
Sometimes I may need higher performance or more dynamic queries, so just creating a table or elastic search with only indexable values to get the ids works. Use those IDs to fetch from original store.
Throughout my entire career I have come across very few simple crud applications that will actually work properly with denormalized data structures. It’s not even about scaling users, it’s about scaling your schema. The very first instance you encounter where a document has a nested array will start to cause problems, and the only way to solve it is to push more complexity and state management onto the client. Which is bad interface design, and quickly gets out of control.
If you really do have a simple crud app, and don’t want to put too much effort into running your RDBMS, just use MySQL imo.
Why should my queries be stored procs or prepared statements if all I need is to get, insert, update, or delete an object? Yes, the auto-generated code wrote to the database as an object, i.e. with all fields included in the query. Why? There were no transactions, foreign keys, etc., since some tables were stored in different databases. Using a relational DB as an object store also meant people used it as an object store inefficiently.
In my own projects, if I need the speed, columns are indexed while unindexed data is binary. I can explode out (unserialize) the binary data into cache.
Point of my post was centralizing database access tends to be a bottleneck for team efficiency. If not the devs then the db owners. Augment my abilities to execute don't try to completely change it where it brings a whole set of unwanted problems that were not there before.
That’s not to say all use cases for such technology are niche. A lot of CMS applications can fit into that paradigm very well for instance.
I'd also say that storing binary data in an RDBMS is a separate anti-pattern altogether.
Whether it's an anti-pattern depends on the use case. I absolutely do not agree with your generalization: storing binary data in an RDBMS is not an anti-pattern. Binary data can be of any size; the bigger the binary data, the fewer rows you should expect to store. At some point (maybe the binary data is images/watermarks, etc.), you have to choose a replicated file system as part of the datastore operations.
Furthermore, I've only worked on high qps applications, so maybe I'm a bit biased on how to use the database efficiently. :)
Storing all data as binary is an anti-pattern too, regardless of your qps.
I wouldn’t store anything but metadata in the database, the blob can be somewhere like S3.
You can assert anti-pattern, but knowing how to structure your tables matters. SQL has BLOB, BINARY, and VARBINARY; choose the proper type depending on the trade-off. Models that include blobs can be structured in the DB to avoid IO issues (indexed data includes the id). Go to S3 with the id. How does what you are saying differ? My first post literally says this.
Not only can it be structured to avoid IO issues, it is also protected via a cache where the unindexed data is exploded.
Of course with high QPS I want to offload CPU cycles away from the DB. Scaling the app is easier than a DB.
Scratching my head here. Are you arguing for just use only SQL? Why would I do that.
This kind of thing is the stuff of horror stories. Now I have no doubt that you’ve convinced yourself that this is a great approach, but it’s not, and it will be replaced as soon as you leave (assuming you’re not a one man band).
You may be surprised to learn that several very large and well-known social networks use this technique -- serializing unindexed columns into a blob (typically compressed at some level) -- for their core product tables. It's not really that "weird", if you consider that literally tens of thousands of engineers work at companies doing this.
Conceptually it's exactly the same technique as putting all unindexed columns in a single JSON or HSTORE column. Newer companies use those; older companies tend to have something more homegrown, but ultimately equivalent, typically wrapped by a data access service which handles the deserialization automatically.
This technique is especially advantageous if you have a huge number of flexible fields per row, and many of those fields are often missing / default / zero value, and the storage scheme permits just omitting such fields/values entirely. Multiplied by trillions of rows, that results in serious cost savings.
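A toy version of the scheme (JSON + zlib here stand in for the homegrown binary formats; only indexed fields become real columns, sparse fields are packed into one blob, and empty values are simply omitted):

```python
import json
import sqlite3
import zlib

# Hypothetical sparse user record: only 4 of ~100 possible fields are set.
user = {"id": 42, "country": "US", "bio": "hello", "theme": "dark"}

INDEXED = {"id", "country"}  # real, queryable columns
packed = zlib.compress(
    json.dumps({k: v for k, v in user.items() if k not in INDEXED}).encode()
)

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, country TEXT, props BLOB)")
db.execute("INSERT INTO users VALUES (?, ?, ?)", (user["id"], user["country"], packed))

# Reads go through a data-access layer that re-inflates the blob.
country, blob = db.execute("SELECT country, props FROM users WHERE id = 42").fetchone()
props = json.loads(zlib.decompress(blob))
print(country, props["theme"])
```

The savings come from never materializing the ~96 absent fields at all, rather than storing NULLs for each of them in every row.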
Also, https://www.skeema.io/ looks like a good product that I'll have to check out. Looks like a better product so far compared to solutions like Flyway/Liquibase. A full-featured suite for DB migration from dev -> prod is exactly what I've been raving about. It is "boring" tech, in that no one really wants to touch it, but it is the easiest thing to screw up, and products like this really take it to the next level.
The responses I've seen in this post appear to be from people who've never used dynamic fields in a db and advise against it by saying it is an anti pattern. If it is an anti-pattern, bring on all the anti patterns as I'd like to not wake up at night or be pinged for slowness.
Thank you for the kind words re: Skeema :)
If you put something in a JSON column you give your DB some expectation about the data.
If you store your data in a binary column you are just doing what the database already does in its backend with its normal columns, and any optimizations it might make are lost.
It might result in cost savings, but in my experience cost savings at the expense of transparency/usability are always a bad idea.
At large scale (especially social network scale that I'm describing and personally spent most of the past decade working in), the cost/benefit calculus is quite different. I'm talking 8-figure cost savings here. And there's no negative impact to "transparency/usability", since application queries are all going through a data access layer that understands the serialization scheme, and rando staff members are not allowed to arbitrarily make manual queries against users' data anyway.
As for optimizations: a custom serialization scheme can easily be more optimized than how the database stores JSON/BSON/hstore/etc. A custom scheme can be designed with full knowledge of the application's access patterns, conceptual schema, and field cardinality. Whereas whatever the db natively gives you is inherently designed to be generic, and therefore not necessarily optimized for your particular use-case.
If we're talking about the scale you're speaking at, your data storage costs might go down, but at the same time you're now doing a deserialization action for every row, and you have no option of retrieving only half the values, so you're always going to fetch all fields now. The costs for your data access layer rise correspondingly (but maybe that's a different team, so it doesn't matter).
If you find you want to filter on one of those fields now, do you run a reindexing and update operation on a few trillion? rows (guess that might happen regardless though).
Whenever something sounds extremely counterintuitive, people show up defending it because it’s what they have been doing for years and it makes a sort of twisted sense.
I do appreciate all your messages on this topic though, they have been much more informative than the GP’s original comment, and I have no doubt there’s actually people doing it now.
If I ever run into a situation where storing all my fields as one big blob makes most sense, I’ll revise my opinion.
That said, in large companies and large complex systems, it's not unusual for any given team to have no clue about how various far-flung parts of the stack work. Dev environments tend to look quite a bit different there too; for example there's often no notion of a singular local dev database that you can freely query anyway.
In terms of transparency and debugging, this usually means some combination of in-house tooling, plus using a REPL / code to examine persisted values instead of using a db client / SQL. Also automated test suites. Lots of tests -- better to programmatically confirm your code is correct, rather than manual querying. Many SREs view having to SSH to debug a prod server being an anti-pattern, and you could apply the same notion to having to manually query a database.
Regarding deserialization overhead, sure there's some, but deserialization is heavily optimized and ultimately becomes pretty cheap. Consider that more traffic is from apps than web these days, and app traffic = APIs = deserialization for every request itself. And lots more deserialization for each internal service call. And significantly more deserialization for every cache call, since caches (redis, memcached, etc) don't speak in columns / SQL anyway. Adding deserialization to database calls is honestly a drop in the bucket :)
Point of my original comment got lost because of this "binary data in rdbms is anti pattern" detour. It was the least important thing that several people wanted to zero in and pile on.
> "Congratulations, you’ve offloaded to your app what your database is designed to deal with.
> I wouldn’t store anything but metadata in the database, the blob can be somewhere like S3."
> If you find you want to filter on one of those fields now, do you run a reindexing and update operation on a few trillion? rows (guess that might happen regardless though).
And also my comment above answered this filtering question from the get-go which underscores this distraction detour: "Sometimes I may need higher performance or more dynamic queries, so just creating a table or elastic search with only indexable values to get the ids works. Use those IDs to fetch from original store."
Let's be honest: questions like "how to structure your tables" or "how to make your datastore operations more efficient" (a non-exhaustive list, it seems I need to point that out) verge on consulting, which is typically paid for, so of course comments on the topic will have less information than you would like. Skeptics who make blanket assertions will usually be disregarded. Some may be willing to divulge and others may not.
I feel the need to thank the people that even so make an effort to elaborate their point.
Don’t get me wrong, I’m still convinced it’s an anti-pattern, it was just very politely explained to me by someone so I feel obliged to respond with the same.
Like I said in the other post, I’ll let you know if I ever reach a point where this is the only reasonable solution and I can’t think of something else.
Not in an RDBMS though. The issues with blob storage I’ve mentioned in this thread relate specifically to RDBMS, and aren’t relevant to technology like Hadoop.
I've personally worked on the database tiers of multiple social networks and am stating this concretely via many years of first-hand experience. At no point am I talking about Hadoop in any way, not sure why you've assumed that.
Because every large social network uses Hadoop to solve the type of problems you were describing.
> I am describing the core sharded relational product tables of well-known social networks.
Well, depending on how you set your clusters up, you're probably not actually using an RDBMS for blob storage. If you have a multi-master setup with any form of weak consistency guarantees, then you've introduced a way to violate key and check constraints, and no longer have a system that complies with Codd's 12 rules for defining an RDBMS. If you're writing to a single node and offloading blob I/O to a separate node, then you're essentially just using a MySQL server as an object store external to your RDBMS.
Hadoop is not used for serving real-time requests (user-facing page loads) in any social network that I am aware of.
You may be misunderstanding my comments. I'm describing use of the blob column type to store serialized data for unindexed fields. For example, metadata describing a particular user or piece of content. Let's say there are 100 possible fields/properties/columns for each user, but the data is sparse, meaning many of these fields are null/zero/default for each row. Rather than having 100 "real" columns mostly storing NULLs, social networks tend to serialize all these together (omitting the ones with empty values) into a single blob column, ideally using a strategy that maps the field names to a more compact numeric format, and storing this in a binary format (conceptually similar to BSON).
This has nothing to do with "blob storage" meaning images/audio/video.
This has nothing to do with Hadoop. Social networks / UGC sites don't use Hadoop for "blob storage" either, btw.
This has nothing to do with multi-master setups, which are generally avoided by social networks due to risks of split brain and other inconsistency problems completely tangential to this topic.
This has nothing to do with foreign key constraints or check constraints, which are implemented entirely at the application level in a sharded social network, again for reasons completely tangential to this topic.
My original comment in this subthread was in response to a claim that using blobs for unindexed fields is "very very weird". I feel that claim is incorrect because it is a common approach, used e.g. by Facebook, Pinterest, and Tumblr (among others). I think you may have misread or misunderstood this subthread entirely.
I understand perfectly. This is a very inefficient use of an RDBMS. It violates every single normal form, and will lead to all of the performance bottlenecks I mentioned.
> This has nothing to do with foreign key constraints or check constraints, which are implemented entirely at the application level
As I stated before. This is not an RDBMS, it’s simply using RDBMS components as a storage service. An RDBMS must enforce key and check constraints, if this is enforced entirely at a higher level then you’ve violated the nonsubversion rule, and you’re dealing with a system that is not an RDBMS.
An RDBMS can only be called so if it conforms to Edgar Codd’s relational model, that is the exclusive definition of an RDBMS. Simply managing relational data on some level does not make an RDBMS, excel does that and it’s certainly not an RDBMS. So does MongoDB, and it’s not an RDBMS either.
There are well over a hundred companies using a sharded database layout. This typically entails not using the RDBMS's built-in support for foreign key constraints, since they can't cross shard boundaries.
In your mind, does this mean a sharded MySQL or Postgres environment is no longer an RDBMS? That's an extreme view if so, and IMO not a terribly useful or relevant distinction in the real world.
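To make the application-level constraint concrete, here's a toy sketch with plain Python dicts standing in for database shards (the shard count, routing rule, and table names are all illustrative): orders are routed by order id and users by user id, so the two sides of a "foreign key" can land on different shards, and no single database instance can enforce the constraint.

```python
# Toy in-memory stand-ins for sharded databases.
NUM_SHARDS = 4
shards = [{"users": {}, "orders": {}} for _ in range(NUM_SHARDS)]

def shard_for(key: int) -> dict:
    """Illustrative routing rule: modulo hash on the row's key."""
    return shards[key % NUM_SHARDS]

def create_user(user_id: int, name: str):
    shard_for(user_id)["users"][user_id] = {"name": name}

def create_order(order_id: int, user_id: int, amount: int):
    # Application-level FK check: the referenced user may live on a
    # different shard than the order row itself, so the check happens
    # here rather than in any one database.
    if user_id not in shard_for(user_id)["users"]:
        raise ValueError(f"user {user_id} does not exist")
    shard_for(order_id)["orders"][order_id] = {"user_id": user_id, "amount": amount}

create_user(7, "alice")          # lands on shard 3
create_order(100, 7, 25)         # lands on shard 0, references shard 3
```

Note the check is best-effort: without a cross-shard transaction, a concurrent delete can still race it, which is part of why sharded systems tend to tolerate (and periodically clean up) dangling references.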
> will lead to all of the performance bottlenecks I mentioned.
In the situation I've described (a large number of sparse fields), serializing the non-empty, non-indexed fields into a single blob results in smaller row sizes than making all the fields into real columns. And generally speaking, smaller row sizes tend to improve performance.
Look, I've literally spent a significant chunk of my life working on some of the largest databases that exist. I post here using my real name and have my information in my profile. If you want to have pseudonymous pedantic semantic arguments about why you claim the largest database fleets are no longer "relational", I suppose that's your prerogative, but kindly refrain from trying to lecture me about vague unsubstantiated "performance bottlenecks" in widely-used real-world approaches.
What you've described is:
* An object store
* A bespoke mechanism for defining schemas
* A bespoke mechanism for querying data
* A bespoke mechanism for mutating data
* A bespoke mechanism for enforcing relationships
* A bespoke mechanism for enforcing check constraints
The performance bottlenecks I described are not vague at all. They are well-known constraints of RDBMSs. To come into a discussion about such constraints and say “well, if you only use the RDBMS for object storage, and then implement your own bespoke systems to replace all other RDBMS functionality with something completely different, then you don’t need to worry about that” is completely asinine, and to claim that such a system is still an RDBMS is just factually wrong.
You might be an excellent distributed system engineer, but it really seems like you don’t actually know what an RDBMS is.
The original post here is entitled "Scaling Etsy". The ultimate solution that Etsy used, to solve the problem described in those tweets, was to move to sharded MySQL. With no database-side foreign key constraints. This architecture has served them well for over a decade now.
By your rigid definitions, Etsy isn't using an RDBMS, which would seemingly make all your RDBMS-related comments off-topic.
I have seen zero examples so far of the fears and dissent raised by several people. Threats of horror stories, bottlenecks, or "just because you can do it doesn't mean you should" are dismissive social anti-patterns that aren't conducive to learning.
I think we can leave it at that. There's no point in this endless replyfest. Some people will stick with their beliefs and never be open to what is done in practice and works, in hopes of just being "right".
There is also nothing to disbelieve about the system described in the parent comment. It's a perfectly sound design pattern; it's just not an RDBMS. It is a system that uses an RDBMS as an object store, and then uses external systems for enforcing referential and check constraints, schema definition, querying, mutation, and consistency. If you're just using a collection of RDBMSs to store objects, and performing all other operations in external systems, then RDBMS design constraints are mostly irrelevant.
(Storing unindexed columns (as JSON rather than binary though) is where I'm headed at my job, due to dynamic schema, the cost of coordinating DDL, and the excruciating cost of joins.)
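A small sketch of that pattern using SQLite from the standard library (the table and field names are made up): indexed lookup columns stay as real columns, and everything else rides in one JSON text column that the application decodes.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
# Only the columns we query by are "real"; the rest is a JSON payload.
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, extra TEXT)")
db.execute("CREATE INDEX idx_email ON users(email)")

extra = {"bio": "hello", "theme": "dark"}  # sparse, unindexed fields
db.execute("INSERT INTO users VALUES (?, ?, ?)",
           (1, "a@example.com", json.dumps(extra)))

# Lookups use the indexed column; the JSON is decoded app-side.
row = db.execute("SELECT extra FROM users WHERE email = ?",
                 ("a@example.com",)).fetchone()
print(json.loads(row[0])["theme"])  # dark
```

The trade-off is the one discussed upthread: adding a field requires no DDL, but the database can no longer filter or validate anything inside `extra` (short of engine-specific JSON functions).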
As I commented below, unsubstantiated generalizations are what I distrust. From your comment here, you say "horror stories"; where is the substantiation? Notice how my comment specifically said "If I need the speed" and "database interpretation" of values, and mentioned the use of caching for exploded unindexed values. I'm not sure what exactly you don't like other than binary data (512 B to 64 KB) in a database that is protected by a cache. There is literally no change to existing patterns of how to store data in a DB other than telling the DB not to interpret the values. Are you smearing the idea of look-aside caching in front of a DB (a SQL one, no less)? Plenty of companies use it.
I prefer to use SQL data types unless I need the speed. What's so unreasonable about that? I never said I use it everywhere, as you purport.
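For what it's worth, the look-aside (cache-aside) pattern mentioned above looks roughly like this; a plain dict stands in for memcached/Redis, and the schema is invented for illustration. Reads check the cache first and fill it on a miss; writes invalidate the entry so the next read refills it.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE profiles (id INTEGER PRIMARY KEY, blob BLOB)")
db.execute("INSERT INTO profiles VALUES (1, ?)", (b"serialized-profile",))

cache = {}  # stand-in for an external cache tier

def get_profile(pid: int) -> bytes:
    if pid in cache:            # cache hit: the DB is never touched
        return cache[pid]
    row = db.execute("SELECT blob FROM profiles WHERE id = ?", (pid,)).fetchone()
    cache[pid] = row[0]         # fill on miss
    return row[0]

def update_profile(pid: int, data: bytes):
    db.execute("UPDATE profiles SET blob = ? WHERE id = ?", (data, pid))
    cache.pop(pid, None)        # invalidate, don't update in place

print(get_profile(1))  # b'serialized-profile'
```

Invalidate-on-write (rather than write-through) keeps the cache from ever holding a value the database rejected, at the cost of one extra miss after each update.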
I think we can agree on that. What I don’t think we agree on is that storing all non-indexed columns as binary is a good idea.
There are a lot of non-obvious issues with storing blobs in your relational database.
I’m sure you can mitigate them, but ultimately I think it’s better to just not give yourself that headache.
Let's do a Google search for 'storing blobs' to see what people have to say about their experiences.
First link: https://dba.stackexchange.com/questions/2445/should-binary-f...
>Most RDBMS have optimizations for storing BLOBs (eg SQL Server filestream) anyway
Second to last link: https://www.quora.com/Is-it-a-bad-design-to-store-images-as-...
> Bill Karwin is correct that you can't make unequivocal statement that storing images in a database is always bad. I like a lot of his points. Most people who are against images in the DB usually are promoting the only thing they know how to do.
And I'm not talking about images. The object data I would serialize can vary between 512 B and 64 KB. (It's always interesting to see the scale of memory nowadays; KB is so small.)
Then you have the fact that those answers are ignoring a lot of context. The answer you referenced from Bill Karwin talks about mutating blobs, which isn't a reasonable design pattern at all: you create a new blob and update the reference. The downsides to storing blobs in an RDBMS are so numerous that you'd really have to have a very strong justification for doing so, and I'm struggling to think of any that are actually technical.
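A sketch of the create-new-blob-and-swap-the-reference pattern, with an invented two-table schema: each update inserts a fresh blob row and repoints the document at it, so blob rows themselves are never mutated in place and readers never observe a half-written value.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE blobs (id INTEGER PRIMARY KEY AUTOINCREMENT, data BLOB)")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, blob_id INTEGER)")

def write_doc(doc_id: int, data: bytes):
    # Insert an immutable new blob row, then swap the pointer.
    cur = db.execute("INSERT INTO blobs (data) VALUES (?)", (data,))
    db.execute("INSERT OR REPLACE INTO docs VALUES (?, ?)", (doc_id, cur.lastrowid))

write_doc(1, b"v1")
write_doc(1, b"v2")   # new blob row; the doc's pointer now references it

row = db.execute(
    "SELECT data FROM blobs JOIN docs ON blobs.id = docs.blob_id WHERE docs.id = 1"
).fetchone()
print(row[0])  # b'v2'
```

Old blob rows (like `b"v1"` here) become garbage to be collected asynchronously, which is the usual price of this append-and-swap approach.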
Sort of like piles of SQLite DBs on S3.
It is theoretically possible that there’s a good use case for storing all fields in a blob. It’s just that none of the ones you mentioned are ones I agree with.
The only thing that makes sense to store in a blob is something that is otherwise incomprehensible (e.g. a jpeg file, doesn’t fit in any other column type), and even that is a bad idea.
A lot of NoSQL design patterns lead you into this trap tbh. If you’re trying to make NoSQL work with relational data, you end up in a lot of situations where you have to choose between complex, over-engineered interfaces, or pushing more work into the client. For example, requiring the client to manage intermediate states because you can’t make efficient use of transactions.
One is making it much easier to copy graphs of data, e.g. copying an account with a bunch of attached users, widgets, and their images.
When everything's in the DB, this is just "INSERT INTO ... SELECT FROM".
When images are held in e.g. S3, this gets an order of magnitude harder.
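For example, when the images live in the same database, the copy the parent comment describes is a single statement (schema invented for illustration, shown here via SQLite):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE widgets (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    account_id INTEGER, name TEXT, image BLOB)""")
db.executemany(
    "INSERT INTO widgets (account_id, name, image) VALUES (?, ?, ?)",
    [(1, "w1", b"png..."), (1, "w2", b"png...")])

# Clone everything under account 1 to a new account 2, images included,
# in one statement and one transaction.
db.execute("""INSERT INTO widgets (account_id, name, image)
              SELECT 2, name, image FROM widgets WHERE account_id = 1""")

print(db.execute("SELECT COUNT(*) FROM widgets WHERE account_id = 2").fetchone()[0])
```

With images in S3, the same copy means listing objects, issuing per-object copy requests, and reconciling partial failures yourself, with no transaction around the whole operation.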
Nearly every tweet describes scenarios that can only happen when engineering management is M.I.A. or too inexperienced to recognize when something is a bad idea.
- Attempting to solve problems with a rewrite in a different programming language. This can be the correct long term decision in specific scenarios, but it’s rarely the correct answer for bailing out a sinking ship. You need to focus on fixing the current codebase as a separate initiative rather than going all-in on a rewrite to fix all of your problems. Rewrites take far too long to be band-aid solutions.
- Rewriting open-source software before you can use it. If Django doesn’t fit your startup’s needs, the solution is never to rewrite Django. The solution is to use something else that does fit your needs. Your startup’s web services needs are almost never unique enough to merit rewriting popular open-source frameworks as step 1. Pick a different tool, hire different people if necessary, and get back to focusing on the core business problem. Don’t let the team turn into open-source contributors padding their resume while collecting paychecks from a sinking startup. Save the open-source work for later when you have the extra money to do it right without being a distraction.
- Hiring consultants to confirm your decisions. Consultants can be valuable for adding outside perspective and experience, but the team must be prepared to cede some control to the consultant. If you get to the point where you’re hiring a “Twisted consultant” instead of a web store scaling consultant, you’re just further entrenching the team’s decisions.
- “Nobody was in charge”. Common among startups who pick a “technical cofounder” based on their technical skills, rather than their engineering leadership skills. When you assemble a team of highly motivated, very smart engineers, it’s tempting to assume they can self manage. In my experience, the opposite is true. The more motivated and driven the engineers, the more you need explicit leadership to keep them moving in the same direction. Otherwise you get the next point:
- Multiple competing initiatives to solve the problem in different ways. Letting engineers compete against each other can feel like a good idea to inexperienced managers because it gets people fired up and working long hours to prove their code is the best. That energy quickly turns into a liability as engineers go full political, hoarding information to help their solution succeed while casually sabotaging the competing solutions. In a startup, you need everyone moving in the same direction. It’s okay to have disagreement, but only if everyone can still commit to moving in the one chosen direction. If some people can’t commit, they need to be removed from the company.
- The “drop-in replacement” team. This is just a variation of having engineers compete against each other. Doesn’t work.
- Allowing anyone to discuss “business logic” as if it’s somehow different work. This leads to engineers architecting over-generalized “frameworks” for other people to build upon instead of directly solving the company’s problems. At a startup, never let people discuss “business logic” as something that someone else deals with. Everyone is working on the business logic, period.
I have to admit that when I was younger and less experienced, I plowed right into many of these same mistakes. These days, I’d do everything in my power to shut down the internal competition and aimless wandering of engineering teams.
Ironically, these situations tend to benefit from strategically shrinking headcount. It’s not a fun topic, but it’s crucial for regaining alignment. The key is to remove the dissenters, the saboteurs, the politicians, and the architects creating solutions in a vacuum. You need to keep a cohesive core team that can move fast, commit to one direction even when they disagree, and not let their egos get in the way of doing the right thing.
The real challenge is that those employees tend to fly under the radar. The people quietly doing the important work and shipping things that Just Work can be overshadowed by bombastic, highly opinionated “rockstar” engineers. Founders need to be willing to let those rockstars go when they no longer benefit the company, no matter how good their coding skills might be in isolation. A coordinated team of mediocre but diligent engineers will run circles around a chaotic team of rockstars competing against each other.
So much this. The bombastic rockstars not only create unnecessary ego-driven political fights that distract from a business delivering value, but on the occasions they do deliver, it's often not what was promised. Yet their social skills (or privilege) allow them to cruise through failures.
Whereas a team of devs who work well together, are hungry to learn, are willing to have open-minded conversations about systems, and deliver consistently: they are what really keeps the business from falling apart.
(I made the tweets)
I really want to hear this story.
I know it's unprofessional to comment negatively on other developers' work, but I do see it often in the workplace, and it does serve to distance an individual from the group that is failing.
It's sad but true, and this is an example: employers say they want professionalism, but they incentivize against it.
I wonder if this team would have done better with tools that offer better observability (metrics, logs, tracing etc.). An example of a rewrite that went well https://avc.com/2019/12/grinding/
> Sorry this content is not available on Thread Reader
> It has been removed by the author.
But I'm not a seller. My username is simple and a common first name. I've approached them a few times, and there's no interest in fixing it. Yesterday, I found where I could open/sever the connection between those messages and my email. It's been going on for years.
Many confused users, although I have little idea how long they stay that way.