The biggest thing we're going to regret looking back on the early Cloud era is the foolish notion that you need 100x as many servers to do 100x as much work.
Servers don't scale linearly. It's more likely you'll need, at a minimum, 107-110x as many servers: somewhere between 100 + log(100) and 100 + sqrt(100). So making your code 100x faster saves you more than 100 servers.
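The math is admittedly fuzzy, but taking those bounds literally (and assuming base-10 log), a quick Python check:

    import math

    n = 100  # how much more work we want to do

    # Overhead bounds suggested above: between log(n) and sqrt(n) extra servers.
    low = n + math.log10(n)   # 100 + log(100)  = 102
    high = n + math.sqrt(n)   # 100 + sqrt(100) = 110

    print(f"servers for {n}x the work: roughly {low:.0f} to {high:.0f}")
    # servers for 100x the work: roughly 102 to 110

Either way, the overhead is real but small next to the fleet itself, which is exactly why the 100x code speedup saves more than 100 servers.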
I think the arguments never end because it's all fuzzy math. Get too attached to running the whole thing on one server, and your architecture starts making irreversible decisions about where the source of truth is, and you get locked into a single server.
Give up too early, and you spend a lot of energy herding cattle instead of building features. And speed is something you outsource to the guys who write the checks. That's fine if you lock in your vertical, but I've worked on projects that lost out to a more responsive or financially efficient competitor. It really, really sucks.
Adding 100 servers damages the environment 100x more and costs jobs. There should be no question that optimization is the responsible thing to do; just buying servers is wasteful, greedy, and irresponsible.
We need a commitment from all the (supposedly) environmentally conscious SV companies to pick optimization if given a choice between making the software faster and just throwing more computers at it. That would be relevant and meaningful, not just the usual "our office AC isn't set as cold as possible, so we are green".
I mean, as a general expectation, that’s ridiculous. Businesses will always optimise for profitability; expecting them to voluntarily eat their margin for environmental concerns is unlikely to be a solution.
As a policy, though, it makes sense - tax the environmental externalities appropriately to reflect the actual shared cost. If it’s still economic to add the servers once those costs are factored in, then you wouldn’t have any grounds to complain.
Problem is, taxing electricity appropriately shuts down your whole economy. Prices would need to be sky-high before SV would feel the squeeze; until then, mom and pop won't be able to run their electric hob and your neighbourhood steel smelter will move to China even faster. Taxes may seem like an easy solution, but if you look closely the problem is far more complex.
There's nothing saying you have to tax the first watt like the millionth watt.
Just design the tax regime so that it is high usage users who are paying the high costs, and exclude any usage levels that would impact homes, small businesses, and "normal" acceptable uses.
I agree that tax policy is more complex than just "tax the shit out of it", but every time a tax is proposed, there comes a group of people who argue that taxation is so complex that we can't possibly grok it. That is clearly not the case.
Every time a tax is proposed, people like me suggest looking at the German Energiewende. Small businesses and private homes are taxed to hell and back on electricity, to keep prices for large businesses internationally competitive, because those businesses can just pack up and go elsewhere. Or cause a revolution when going bankrupt. Taxing large consumers more also doesn't work. Taxes are never the single answer; you always need to combine them with import duties, ending trade agreements, leaving the WTO, maybe even leaving the EU to be able to levy duties on other European countries with a lesser energy tax. Because that is what it takes to create a fair tax regime on energy.
I guess only the very radical greens would want to stomach that backlash...
I'm not sure why you are being downvoted, that sounds pretty reasonable to me. The idea that we'll have to make (at least) this level of compromise to survive as a species seems to still not be taken seriously in the tech world.
I can sympathize with downvoters; demanding change is always easier when the demand is aimed at others, not yourself or your peer group. And I'm not sure if this level of compromise is really necessary, or if optimizing computer power consumption is just negligible on a global scale and the resources spent to optimize here (it will cost money after all, even if the initial investment pays off later) would be better spent elsewhere.
My comment was aimed more at the hypocrisy of claiming to be green and then skipping the obvious savings in optimizations.
No, it definitely doesn't have that direct an impact on the environment. If you're using cloud servers, what it costs you is not always connected to how many physical servers are running. On top of that, AWS, for example, claims to use renewable sources for over 50% of its servers.
Moreover, there are other considerations. If you're spending too much time on optimization, the company may not survive... leading all the work done to potentially go to waste. What about the environmental impact of waste there?
> Yeah, but "making your code 100x faster" might cost significantly more than just adding 100x servers.
Cost? No. But it certainly requires more talent.
Unfortunately, programming talent doesn't scale with cost at all. In that it's like a creative endeavor: you can hire 10000 "professional writers" if you want, but that won't get you another Shakespeare.
Which is why I find the "software engineer" moniker silly. Software isn't at all like engineering.
The term “software engineer” was invented to describe the work being done by Margaret Hamilton and her team, when developing the guidance software for the Apollo moon landings. It definitely is a form of engineering.
From Wikipedia:
> When I first came up with the term, no one had heard of it before, at least in our world. It was an ongoing joke for a long time. They liked to kid me about my radical ideas. It was a memorable day when one of the most respected hardware gurus explained to everyone in a meeting that he agreed with me that the process of building software should also be considered an engineering discipline, just like with hardware. Not because of his acceptance of the new 'term' per se, but because we had earned his and the acceptance of the others in the room as being in an engineering field in its own right.
> The term “software engineer” was invented to describe the work being done by Margaret Hamilton and her team, when developing the guidance software for the Apollo moon landings.
There's considerable debate about when and where this term was "invented", with some oral histories apparently placing it as early as the (late?) 1950s.
As for the grandparent's comment, I think it was referring to the fact that the nature of the job is completely different. For example, a large portion of a traditional engineer's job deals with fighting the natural world. A bolt may rust away and stop doing what it was doing until then. A source code line in the form of
a=a+7;
won't do any such thing. As long as it exists, it will keep doing the same thing. Hence software engineers have to deal with a completely different set of issues, to the point that calling it "engineering" is somewhat metaphorical.
Code rots/rusts faster than most physical objects. Requirements change, libraries update at a breakneck speed. I've had to deal with some (functional) languages that don't allow you to change the variable at all, where others would allow that sort of modification just fine for your simple example. Change is everywhere.
> Code rots/rusts faster than most physical objects. Requirements change, libraries update at a breakneck speed.
That's a forced metaphor as well, not actually an identical process. If "requirements change", that's not the same thing as a bolt rusting away. Surely the requirements for the rusting bolt didn't change, or the requirements for its "dependencies", i.e., the things it's attaching together.
> I've had to deal with some (functional) languages that don't allow you to change the the variable where some would allow that sort of modification just fine for your simple example.
I completely fail to see the relevance of that. Or maybe it is somewhat relevant, but in the completely opposite way, i.e., that this phenomenon does not even have an analogy in the physical world, thus underlining the vast differences between software engineering and classical engineering disciplines. (Likewise the dichotomy between a computational procedure and the process generated by said procedure, the latter being constrained in the way you describe, doesn't exist in traditional engineering either.)
That has nothing to do with talent IMO. Every software engineer worth their salt should be able to inspect a full stack and measure and evaluate its performance. Instead, 80% of people are stuck in whatever the current fashion is (JS, I look at you) and can't tell a .png from a .so.
So in the end, it is more about attitude, not talent.
Agreed. When we focus on the part of the spectrum of optimizations that do require talent, we're thinking of the exceptional, interesting, makes-a-viral-post-on-HN stuff.
The bulk of the work, in my experience, is simpler than the programming part. It's just a culture/priority thing. Like how most of the industry used to not use source control. Now we do, we didn't need to add "talent" to get that done either.
The depressing part of all of this is performance optimization is really fun! I love sitting down with flamegraphs and other profiler output and making things faster.
Talented people usually cost a lot to optimize your code to be faster. And the more time they spend optimizing existing code, the less time they spend writing new code.
Fwiw engineering talent doesn't scale with cost either. What you've described is actually one of the areas software and engineering are similar, not divergent.
I’m not sure what you’re talking about. I’m not going to ask my developer - who I pay $2000 a day - to spend 10 days optimising my codebase to avoid an extra $5000 in server costs.
It doesn’t matter if they’re a 10xer or a 1xer, their time always has a price tag.
That doesn't fit in my accountant's brain at all. $20k one off cost vs $5k recurring cost could mean anything if you omit the time period! $5k a year depends on your cost of capital, $5k per month is a no brainer
Engineering is using science to build technology, applying engineering principles that are common to all engineering disciplines. Software Engineering is using those same engineering principles to build software.
Project management methodologies like scrum are implementation-level details, together with the platform (user interface, programming language, database, etc.). Software Engineering starts at the feasibility study, followed by detailed design, and then comes the implementation phase. Most Software Engineering job posts only list implementation-phase requirements, which I reckon has led to SE being misunderstood.
Most likely, squeezing 100x performance out of your code will significantly complicate it. For example, you're probably going to be dealing with cache invalidation a lot, and you will pay for that complexity forever, as you try to build new features.
It depends on the baseline, I suppose. I could see code written (poorly) in a dynamic language to be 100x slower than necessary. Maybe it was fast enough when written, so nobody looked too hard. Inefficient DB queries, excessive IO.
But I agree with the sentiment. In a code base where performance matters, that level of optimization would often be impractical.
It depends. In many benchmarks Java is 50x faster than Ruby. So having code 100x as fast can just mean choosing a fast language and framework for your project.
That's under the assumption that code doesn't have to be maintained or updated with new features, and that writing it in any programming language has the same cost.
Otherwise we would all be writing assembler.
Humans are way, way more expensive than servers in almost every case. Exceptions might be small pieces of code run by millions of devices, but that's far from being the common case.
Architecture (x86_64, or ARM on mobile) hasn't changed in decades. The way to optimize for it changes over the years, but you just get a slightly less optimized version. I'm sure software written in assembly in 2007 for a Linux PC works now, and fast. I think assembly is not cheap because the language is tiny, offers no abstraction, and you're on your own for everything, not because of architecture turnover.
If the code has been optimized to the point of being illegible/difficult for others to maintain then, assuming you’d like to update that code at some point, you’ll need to pay for that too.
> The biggest thing we're going to regret looking back on the early Cloud era is the foolish notion that you need 100x as many servers to do 100x as much work.
> Between 100 + log(100) and 100 + sqrt(100).
I bet it doesn't even make the top ten. You have so many other problems by the time you're scaling to hundreds and thousands of servers and beyond. A few percent difference in server count? Whatever. That's still almost perfect scaling, and not going to be your bottleneck or your huge cost center.
Servers can and do scale effectively linearly depending on the workload.
The broader point stands though. The more servers you have, the more money a 1% cpu optimization saves you in raw dollar amounts. At a certain scale it always makes sense to optimize your code.
The moment the servers need any consensus at all, you've opened the door to Amdahl's law. That could be sharing caches, running Raft, or talking to a transactional store. Even if it's just to keep my UI in sync between the desktop and mobile site, you have costs.
But since so many sites have social aspects now, you need to be able to communicate between accounts, which means interprocess communication.
The Amdahl's law fraction is at least logarithmic to the number of users, but more likely to be proportional to the 'surface area' - sqrt(n). Every new server gets you a little bit less oomph than the one before.
There is basically no overlap with Amdahl's law here. Amdahl's law just says that there are diminishing returns on the whole when increasing parallelization in a program with significant serial parts. It is a fairly obvious observation, but people seem to distort it and apply it to general scalability.
First, everything needs to be synchronized but that doesn't mean synchronization is expensive or blocks computation from happening. Lots of synchronization isn't even dealt with explicitly by users, it happens in queues in the kernel, in networking hardware, or even in the memory controller when talking about reading from memory.
The closest thing is the database becoming a bottleneck, but that is not because of Amdahl's law or even synchronization; it is just about reads, and those are extremely parallel, so it likely comes down to total CPU power when it is a problem. That's why query caching can be so effective - it isn't about synchronization and definitely not about serial parts of a program, it is just about usage.
For connected social media sites, not everything has to be done on a page request. Anything done ahead of time is also done in parallel.
And remember, even if we aren't sending more and more data between our users, we're collecting more and more telemetry data from our servers all the time. Coalescing logs and gathering stats requires funneling data from everywhere down to one machine (the one that builds the graphs).
I actually wasn't thinking of social sites when I said "certain workloads".
It's true that every extra server has a bit more overhead because of stuff like caching / load balancing / whatnot, but the extra amount can be tiny even at very large scale, for certain workloads that are parallelizable.
(It'd be fun to talk in details from personal experience instead of in general terms, but you know those pesky NDAs...)
Unfortunately the probability of failure also scales up.
Let's say you bought good hardware and the probability it will fail is x.
Let's say you buy a hundred of those. The probability that one of them fails is now roughly 100x. You just increased the probability of failure a hundredfold. You will need a load balancer that can detect failure and stop forwarding to that machine. The load balancer can fail, too. Now it's 101-fold. You also need a separate database if you share state: 102-fold. Make it redundant: 103-fold. Add routers / switches: 105-fold. Make those redundant: 107-fold.
I have never used more than one crappy PC for any project of mine, and some of those have had/have pretty substantial load on them. When I say one server, I mean one PC but with two power supplies and RAID storage. I'm not trying to tempt fate here.
But I have no devops team, no admins juggling failing machines, I don't have racks. I have one slot in a rack somewhere. The machines have been running for years. Every few years, as I approach MTBF of the hardware, I move to a new server. That entails a small downtime.
Result: People use my servers in their "is the internet up" scripts. Because my failure rate is a fraction of what distributed systems have.
In fact, practically all of my downtimes have been software bugs in my code, or me fat-fingering input somewhere. I can't even remember the last time the hardware failed. Not even sure I ever had an actual hardware failure that caused downtime. I once had a raid hdd throwing errors and it had to be replaced. I think that's it.
From my experience it's almost always the database. On any sufficiently complex web app with an ORM, it's easy to trigger literally 100 queries by loading a single page if you don't make a specific effort to avoid it.
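For anyone who hasn't been bitten by it, here's a minimal, self-contained sketch of the shape of the problem an ORM tends to hide (plain sqlite3 so it actually runs; the eager-loading call in your ORM of choice boils down to the JOIN version):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
        INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
        INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
    """)

    # N+1: one query for the orders, then one extra query per order
    # (this is what lazy attribute access on an ORM object does behind your back).
    orders = conn.execute("SELECT id, customer_id FROM orders").fetchall()
    names = [conn.execute("SELECT name FROM customers WHERE id = ?", (cid,)).fetchone()[0]
             for _, cid in orders]

    # 1 query: a JOIN replaces the 1 + N round trips (select_related / eager loading).
    names = [row[0] for row in conn.execute(
        "SELECT c.name FROM orders o JOIN customers c ON c.id = o.customer_id")]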
If you already have a load balancer, you should be able to scale on the server part from 1-100 exactly linearly. Where are your extra 7-10 coming from?
And if you already have a database, you usually just upgrade its specs linearly too. And if god forbid you need to shard... that's perfectly linear too.
For pretty much all practical planning purposes, servers do scale linearly. The equations you're describing seem to imply a deeper hierarchy of server connections, but that's not usually how it works -- servers (whether load balancers, web servers, or databases) just come in bigger pools, not trees.
Where are you getting your log() or sqrt() terms from? Is it some kind of application-specific caching layer or something you've used in the past?
> If you already have a load balancer, you should be able to scale on the server part from 1-100 exactly linearly.
I think this only counts in the simplest of cases? Such as serving HTML straight from memory.
In other cases there are various micro or macro services to call, state to communicate on the enterprise service bus or something equivalent, separate database servers to run queries on, etcetera. A single request might hold up several other requests due to badly-thought-out dependencies between services and physical servers. More servers wouldn't scale exactly, and it might be difficult to find out where the next bottleneck is.
Thanks... but all the things you're describing are still all linear though in terms of servers, no? All the services you're describing will scale linearly.
And if some external bottleneck exists... that's a separate issue and adding more servers isn't going to solve any problems at all.
I guess I still just don't see it... I know servers don't scale "exactly" in the sense that one service will require another server before another service does, but that doesn't make it nonlinear. It just gives it variance (fluctuating both upwards and downwards) around a line. It doesn't turn it into a curve.
> Yeah, but "making your code 100x faster" might cost significantly more than just adding 100x servers.
And moreover, it takes time to make code faster.
Very realistic scenario is that you have 2 months of runway in the bank and investors have committed to the next round but are holding back the money and still having 3-hour coffees and 2-hour lunches with you about how you can improve your go-to-market strategy instead of letting you get back to work on actually scaling your tech.
First, spend some time and try to optimize. If that's too hard or time-consuming for proper reasons, then scale up. If you see many long bars in htop on the biggest machine your provider has to offer, then scale out.
Of course, you might want to scale out earlier for other reasons e.g. fault tolerance, if you want to allow different kinds of failures.
We’re a startup and focused on easy horizontal scalability early on. From a management point of view, premature optimizations are cost-intensive, especially while we’re still figuring out what our product is and how it works. In the end there are a lot of changes all the time to functionality and implementation, and early optimizations get wasted very quickly. We'd rather purchase new servers. A powerful server costs 100 USD/month and the development team can focus on implementing features and functionality which move our product forward; optimizations are opportunity losses at the beginning of a product. Of course, as soon as bad performance impacts user experience it’s a different story, and also if you start to have hundreds or thousands of servers. But then you’ll know what your product is doing, changes become smaller and more focused, and code optimizations start to make sense.
That's not an argument not to optimize; it's an argument to consider the costs and implications of optimization. Failing to do that now can cost much more later on, when compute resource needs balloon and the product scope/codebase is much larger, making optimizations more labor-intensive.
There's no yes/no on this. It's all on a slider, a spectrum.
Optimizations that are not premature are a win though, and by making it standard to consider performance issues from the outset, you and your team get better at it, such that it doesn’t cost time to default to pretty efficient solutions.
Also, some basic tech choices can often get you large constant-factor wins without any downsides, like using a language that is at least JIT-compiled rather than interpreted.
It is a tautology because the word "premature" means "before you have to" in this context. It's like saying "You don't need to do things before you need to need to do them." It's true by definition.
The question is: which optimizations are premature and which aren't?
I'd say it's not only a tautology, it's plain false.
Cheap optimizations that are known to work (compiling in release mode, enabling HTTP cache in your web server, using a fast sorting algorithm on large arrays...) are good even if you don't need them right now.
But the question is still valid: which are and which aren't (premature)? For experts, avoiding N+1 queries, using a hash map rather than a loop match, caching in memory, and using smaller data types are all standard optimization techniques.
Now if a junior doesn't implement them, did they skip a required optimization? And if they did implement them, was that premature optimization?
"premature" is in the eye of the beholder... (or implementer). I've cleaned up a lot of slow code over the last few years, and in most cases the 'fixes' are pretty simple. The pushback was typically along the lines of "YAGNI" and "premature optimization is the root of all evil" quotes, yet... yeah, you know what, we ARE "gonna need it", and adding a table index on a column you're querying on in a where clause isn't "premature". It's industry standard practice.
I think what the others are trying to say is that 'premature' is generally a label applied in hindsight.
The preparations you made that turned out to make your life easier, the things you regret not doing, and the things that ended up being a total waste of time.
What about the cost of putting shitty, slow software into the world?
How about having respect for you user's time? Depending on how many users you have, shaving seconds or milliseconds off your response time will save humanity hundreds, thousands, or millions of hours waiting for your software to do something.
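Back-of-the-envelope, with made-up traffic numbers:

    requests_per_day = 10_000_000   # assumed traffic
    saved_per_request = 0.050       # 50 ms shaved off each response, in seconds

    hours_saved_per_day = requests_per_day * saved_per_request / 3600
    print(f"{hours_saved_per_day:.0f} hours of user waiting saved per day")
    # ~139 hours of user waiting saved per day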
It depends on the product and the user/customer impact; if it doesn’t matter that it takes 200ms or 300ms, why do you care? Also, if it matters and you can scale horizontally to improve performance, do that first instead of spending valuable developer hours.
I think it's difference in the hacker/product mindset and the engineer mindset.
For hackers or product focused devs, making something work is the most important aspect whereas for engineering focused devs, it hurts to see such large inefficiencies that are solvable.
I empathize with the engineer mindset, but definitely align more with the hacker/product mindset.
After your first sentence I thought you'd continue like this:
For a hacker, making something fast and cool is the most important aspect, whereas for engineers making careful tradeoffs between effort and customer impact are the main focus.
I think you could swap 'hacker' and 'engineer' in your sentence and it would still make sense.
I'm not saying, "engineer bad, hacker good", just that we tend to value "good" code, architecture, performance, etc highly and sometimes that is to our detriment.
I feel the latter is just not being able to see the big picture.
People in the latter mindset don't get that providing real value to users is what matters.
Literally everything boils down to it. Even the example of making a faster application, it's literally only useful in that it increases how quickly you generate user value.
At some point once you generate enough value for your end user, the utility will outweigh a given latency problem.
Even making your server costs cheaper with optimization really matters because you can transform the savings into user value that exceeds the cost of optimizing.
So it really doesn't make sense to thumb your nose down at people who are "just sticking stuff they don't understand together" in a vacuum, like they tend to do.
You don't know their runway.
You don't know how concrete their business case is.
You don't know their development budget.
The moment you're making a dichotomy between yourself and "those programmers who just put shiny legos they don't understand together", you're demonstrating a lack of understanding of the bigger picture that development fits in.
Because sometimes hiring someone who has little experience outside clicking those legos together is all that allows an idea to exist.
tl;dr: A service that loads with 100 requests instead of 1 because the developer doesn't know better still generates more value to the end user than one that doesn't exist.
The even bigger picture is that we (as people, who all use various services in our daily life) end up with all these services being slow and frustrating, although they do provide value, resulting in an overall frustrating daily experience.
The non-existence of a bad service allows a better service to come into existence. All too often a market is dominated by some company that got lucky and now has no incentive to improve.
I happen to have it on good authority that there is an endpoint lurking somewhere in our API that has a p99 latency of almost 3 full hours per request. We clearly don't respect our users time. Or our own. Or basic software engineering practices. Sigh.
Yeah, clients will time out for sure after a few minutes. The API doesn't know that though... it'll just sit there wasting an ungodly amount of resources and causing tonnes of noisy-neighbor problems. You look in the APM traces and see one with 8 million spans that enumerates the whole database. Just silly. It's easy to be a 10x engineer in an environment where there is a lot of 0.1x engineering :(
Yeah, I think the article makes a good point on the impact that optimization can make but I use the same strategy that you do at my current job. Early on it just makes a lot more sense to focus on what can generate revenue.
Depending on your field, making things work fast may be the difference between not working at all, merely working, and actually being adopted by other practitioners.
Completely right. You are correct in what you should be focusing on. Only very large systems will see a return on investment in optimizing a system for hardware costs (bad code and crummy implementations should be worked out as tech debt). Focus on product, features and serving your customers. Get your market fit and burn your investor capital. Then when you reach that next stage, either sell to a company that is good at management or start hiring for performance.
On the other end of the spectrum, really poorly performing code can affect development, testing and delivery times, which eats developer time as well. Some developers or testers might sit around "oh well"-ing, waiting for a 1-hour job to finish, when 1 day of development time could turn that job into a 5-minute run. That kind of optimization pays for itself before it even makes it to production.
It's not necessarily either/or. In my experience, the really crappy developers who can't write optimized code are also the ones who are really slow and expensive in developing even the most basic features.
Indeed, that’s basically the point as well. I have seen developers hunting for the „right“ way and best performance even to the microsecond level for no good business reason while ignoring the state of a project/product and the required value they should focus on.
Golden rule to follow: always optimize first before you attempt scaling.
Working for large banks, I see this error made every day. Nobody knows how to optimize, but everybody knows how to request a couple more servers.
Then there is talk of budgets, hiring more people to maintain the behemoth, then introducing some pretty exotic and expensive technology to "optimize operations".
Yesterday I saw a huge piece of infrastructure with literally terabytes of RAM and a number of GPUs, costing at least 250k a year, for a batch job that should fit on a single server with no problems.
Agreed; but to clarify, the first thing to do is to scale vertically (get bigger servers). When you're starting hit limits with that, then optimise, within reason. When you're starting to hit limits there, then scale horizontally (add more servers).
The point with optimizing before scaling horizontally is to make sure you understand what is going on. Scaling horizontally without a good understanding is going to be a fail-whale shitshow.
I'd also note that high availability needs some horizontal scaling.
HA and scaling are not the same thing. High availability represents architectural decisions to maintain operation in the face of subsystem failures. With respect to high availability within a data center, this can be achieved with a passive secondary system and a mechanism to failover to this secondary in the face of a primary failure.
Such an HA strategy is valid, but provides no horizontal scalability.
Scalability is the ability to serve more traffic. Scaling horizontally implies adding hardware.
Though the HA example given above includes additional hardware compared to a single primary node, it does not support any additional traffic. Scaling would imply that the addition of the secondary server allows the system to serve more traffic.
But how many engineering-hours would be needed to optimise it to fit into a single server? And what would the risk be that there is a screw up? Sure there might be a few low hanging things to clean up ( i.e. a few places where a data structure could be better picked), but if it needs that much iron then it is probably not a trivial refactor of a few hotspots, but likely a more extensive rewrite.
Probably cheaper to just pay 250k than for a couple of engineers taking risks modifying a battle tested bit of code... If it is important enough for the business to justify paying that much for then you'll need to "do things properly" and not do a hatchet-job of a cowboy refactor, but plan things out and test them and have rollout/rollback plans, project management etc.
Optimisation is typically a one-time cost. You build feature A and you spend a bit more time to ensure it runs a couple of times faster, or uses a couple of times fewer resources, than the initial/naive version.
Maintaining larger infrastructure is an ongoing cost as you need to provide x times more resources throughout the life of the application.
Not only that, but now future development has to be done in the context of a large application. Writing a process for a single node might have been easy, but now you have 100 nodes and it is complex to design your functionality to work correctly on multiple nodes, complex to deploy, and complex to resolve issues. Maybe you decide to employ new technology to solve some of those problems, but at the cost of creating other problems and also additional complexity.
Then maybe you need more people -- and this comes with its own inefficiency.
The point really is, most applications could be orders of magnitude more efficient if you just don't make stupid decisions.
It’s possibly much more. http://www.frankmcsherry.org/assets/COST.pdf demonstrates, and it is totally in line with my experience, that using common scaling mechanisms has an immediate 10x-100x cost before you start to realize the (almost linear) scaling in the number of machines.
Just adding Hadoop costs you 10x; to go 100 times faster (compared to one machine properly written without Hadoop) you need 1000 machines.
Common web app scaling is not quite as bad, but it is quite bad. Philip Greenspun had shown back in 1997 with the original ArsDigita system that one machine hosting db+web could do better than the then-prevalent multi-machine deployment. That’s still correct (and still mostly ignored in scalability discussions because it is boring and requires thinking ahead).
One rule that should be added here, but it might be a bit late:
* choose a language that supports async operations so that you aren't waiting for external web requests or database calls to get back to local code.
A synchronous language can turn your big, beefy machine into a single-concurrent-transaction machine, and _that_, beyond any other pain, can really, really, really reduce your total throughput.
Pick languages that can do other things while they're waiting for IO.
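A minimal asyncio sketch of the point; the one-second sleep stands in for a slow external web request or database call:

    import asyncio, time

    async def call_external_service(i):
        await asyncio.sleep(1)      # simulated 1-second external API / DB call
        return f"response {i}"

    async def main():
        start = time.perf_counter()
        # All 100 calls wait concurrently; the event loop is free to do other work meanwhile.
        results = await asyncio.gather(*(call_external_service(i) for i in range(100)))
        print(f"{len(results)} calls finished in {time.perf_counter() - start:.2f}s")
        # ~1s total, instead of ~100s if each call blocked the whole process

    asyncio.run(main())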
Are there good articles on the benefits of async over something like PHP-FPM?
At $WORK, we run a lot of PHP servers (1000s) with 10-20 processes each, using nginx. External calls we don’t need the result of immediately get kicked out to an external queue, but we haven’t seen any issues with DB calls blocking the whole server.
Long or slow DB connections are an issue for individual request time and DB load though (the biggest DBs we run handle 1000-2000 concurrent connections).
I think one of the forgotten blessings of PHP/CGI was that requests had to finish at some point in order to free the PHP process for the next request. This alone, I feel, made it easy to decide when to offload work: when slow requests started to keep index.php from rendering in a timely fashion. Now that's often masked (rightly) by caches and other tricks.
Right now in a microservice model often you have to remind developers that it just isn't ok to have a ~1ms response in a basic case and a 1 second to infinitely blocking case during errors or slow remote services.
At the level of a programming language, the win from async isn't one of performance (at any given time you're operating at most one process per core, and you'd generally get better performance if it were always the same process for each core doing exactly the right thing at the right time, including choosing when to interact with your asynchronous network card), but rather one of developer ergonomics; it can be a lot easier to interact with async things (those network events) by making everything that calls into it async as well and relying on some framework to turn your async code into something that actually matches the machine's execution model.
I think it's relevant. Let's say you write an API that only needs to call another API that takes 1 second to respond. A golang or nodejs server would have no problem to serve thousands of parallel requests. You can't really justify having 5000 ruby threads, they are heavy and they will use too much memory.
This is not really a new thing but maybe it's not well known.
Ruby's had async IO support for at least 10 years, I think.
Rails itself has been using Concurrent Ruby since at least Rails 5 but I think it had its own concurrency patterns even before that. e.g. ActiveRecord database drivers.
Jesse Storimer's Working with Ruby Threads book from 2013 is still a really good resource for concurrency in ruby.
It's very relevant if the external webservice calls take long enough.
Multithreading just raises your maximum number of concurrent requests to the size of your thread pool.
Async lets the computer do _anything else_ while it's waiting for the request.
If you're handling thousands of concurrent requests, you need thousands of threads available ... or, 10s or hundreds of threads with async available (your mileage may vary).
Multithreaded python sorta works like async in web situations - the GIL still limits to one thread actually interpreting python code, but a thread can run while another one is blocking waiting for IO.
I've still run into issues where an external service took approx 1 second per call. We had 20 threads on each docker container running Python. We had a lot of web requests.
It regularly took down the multithreaded python instance.
The solution we went with was to put all the different web requests into 2 Flask servers, and then take the output of all those different web requests and make one big request to the Django instance that had all the business logic.
If memory serves, we only had 2 Flask instances handling all the traffic that the old array of Django servers used to handle.
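For what it's worth, that matches how CPython threads behave: the GIL is released while a thread blocks on I/O, so a thread pool can overlap slow external calls even though only one thread runs Python bytecode at a time - until the pool is exhausted. A small sketch (the sleep stands in for the ~1-second external service):

    import time
    from concurrent.futures import ThreadPoolExecutor

    def call_external_service(i):
        time.sleep(1)        # blocking I/O; the GIL is released while waiting
        return i

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=20) as pool:   # like 20 worker threads per container
        results = list(pool.map(call_external_service, range(20)))
    print(f"{len(results)} blocking calls in {time.perf_counter() - start:.2f}s")
    # ~1s for 20 calls, but the 21st concurrent request has to wait for a free thread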
Is async Django completed? I thought the initial async functionality wasn't 100% of Django being async, only specific parts (Like views?). Maybe I'm remembering wrongly though.
I've since moved on to a different company that does not use Python for its main services. Django async came out after I left. I was very excited to see it arrive, though, as I expected it would improve the areas of concern I had and reduce the need for the extra layer.
I still don't know why people are afraid of multiprocess Python. Sure, it's a bit more complicated, but not that much more. Use those processor cores. I use it all day for doing things in parallel on testing hardware and deployments. Is it really that much harder for the web?
Even then, it's a bit misleading; make it so concurrency, in general, is easy and efficient. Async is immaterial if you have green threads or similar (a la Go or Erlang). Point is, you shouldn't risk a model that ends up unnecessarily blocking/synchronizing execution.
Do you have any examples of this being used with Rails? I'm having a bit of trouble finding examples of both Socketry and Rails being used concurrently; and, I will admit, my Rails knowledge is a bit .... intermediate or less.
That's the earliest I could find off the top of my head - But the other comment responding to you found an even earlier one where they mention the "select" feature that was added sometime around 1998 (22 years ago). The twisted async networking library seems to have been released/started at least 13-14 years ago as well.
I think we all (myself included) kinda forget or underestimate how old Python really is.
I don't think I have even seen memory use be a benchmark for server frameworks. When is that ever actually a problem? Wouldn't that mean hundreds of thousands of in flight requests?
Python 3.7 and friends do have async support; and, Django has been recently gaining it, finally.
Prior to the new async stuff, though, Python was a horrible language if you had internal microservices bouncing requests off each other - the edge layer would wait a very long time for all those secondary requests to finish before it could respond.
OK, but please don't post unsubstantive/flamebait comments to HN, and certainly not personal attacks.
If you know more, the thing to do is to share some of what you know, so the rest of us can learn. If you don't have time or don't want to do that, that's fine of course, but then please don't post.
Edit: it looks like you've unfortunately been posting a lot of unsubstantive/flamebait comments. We ban that sort of account because we're trying for curious conversation here. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and sticking to the rules when posting, we'd be grateful.
Optimizing an application is better than adding servers, because it's faster for any one user, not just scaling better.
> optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster.
Not if the request executes on one server. Because then it's like saying that optimizing baby gestation to take place in one month is like adding 8 uteri.
It would be like adding 8 uteri, if your requirements are “give me x babies per second”
In both cases you can get 9 babies in 9 months, and can support the same amount of baby throughput. But in the 1 month gestation case, you only need to feed one mother.
I think what kazinator is trying to say is that request latency is also important, in addition to throughput, because users will notice when requests are faster.
The mean number of in-flight requests is calculated off of the arrival rate and time per request. If you can retire requests faster, you're available for the next one.
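That's Little's Law: L = λ·W, the mean number of in-flight requests equals the arrival rate times the mean time per request. A quick check with assumed numbers (the 1.5s → 15ms pair mirrors the article's example quoted below):

    arrival_rate = 200        # requests per second (assumed)
    slow, fast = 1.5, 0.015   # seconds per request, before and after a 100x speedup

    in_flight_slow = arrival_rate * slow   # Little's Law: L = lambda * W
    in_flight_fast = arrival_rate * fast

    print(f"in flight: {in_flight_slow:.0f} before, {in_flight_fast:.0f} after")
    # in flight: 300 before, 3 after -- same throughput, far fewer busy workers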
I would argue that optimizing the server, in the uterus comparison, would translate to having twins or triplets (and so on). With one uterus you'd get more babies with no additional uteri.
In real life you would also be in for some real "fun" after nine months ;-)
> it may sound obvious, but - optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster. Optimizing to 1/100 the time (reducing requests from say 1.5 sec to 15ms) is like adding 99 servers
Who the hell is casually optimizing away 90% of the latency in the time it takes to spin up 9 more pods? That's insane. Like the latter is an operation on the scale of minutes.
Also, this is a throughput/latency conflation.
90% of the cost a startup incurs is in the things it could be doing instead, not in the infra. Opportunity cost dominates.
I've optimized software at startups for 10x to 100x performance improvements. It's not hard to find these areas when you have the fortune of hindsight. These past 2 weeks I took CI from ~40 min to ~2 min and not only improved everyone's lives at the company but also made it way cheaper to host beefier CI machines. If 1 build takes 20 minutes & 32 GB of RAM vs 1 minute & 32 GB of RAM, the costs are way different. At previous companies, re-architecting core systems to allow better binpacking saved similarly large amounts of money (15k/month -> 1k/month).
At way smaller companies (2 people), though, this is way harder unless someone has done something really wrong.
Dropping our Docker+Gradle builds for Bazel and using Bazel's caching and running a Bazel daemon on CI machines so I don't pay startup/analysis costs. On builds with no change CI time is like under 1 minute.
I could have done that with Gradle but we also want to support multiple languages (Java, Python, NodeJS & React, Golang).
Great answer. Honestly, while the theory is that you can Dockerize your build and do remote caching with Bazel, I've never seen anyone do it. It seems like a confluence of steps that just doesn't occur. I think I wouldn't use Docker for builds at all, just because of this performance regression.
You can do very fast builds with Docker, Gradle, and Bazel, all of which support caching. Unfortunately, for my use case, Docker and Gradle don't have the understanding of the source tree needed for effective caching. Docker's caching is built off of the docker context + previous build layer hashes; Gradle's caching is very, very poorly thought out, but - if you use no code generation (Lombok, AutoValue, Dagger) - it'll work.
Bazel's caching abilities are by far the best I've ever worked with because it understands the full source tree. It can also cache test executions. There are some tests in my code that make sure I'm calling out to crypto libraries correctly, and these tests take >30 seconds to execute but almost never change. With Bazel I can feel free to write as many of those integration tests as I want, since they will only ever be rerun when something affects them (i.e. I change the version of my crypto library).
> Honestly, while the theory is that you can Dockerize your build and you can do remote caching with Bazel I've never seen anyone do it
Yea, you likely don't want to run Bazel within a Docker container; you want to build a Docker container within Bazel [0]. The performance of this way of doing things is much better. My monorepo has >30 services and `docker-compose up --build` was becoming super slow. To address this I've written bazel_compose [1] to obtain the same workflow docker-compose offers you, with Bazel as your container build system. It also supports a gradual migration scheme and will build both the Dockerfile AND the Bazel version of your container to make sure they both start.
Unfortunately the Bazel community is mainly populated with companies who are 100x the average size, and as such they already can't run all of their services on their dev machines, so they don't see the value of something like this. This version of bazel_compose is out of sync with HEAD @ caper, but if you're adventurous I'd recommend checking it out. It has extra features to watch all of the source files using ibazel and will automatically build & restart containers (<<10 seconds in my experience) as you edit and save code.
Right behind the cost of your employees is the cost of your infrastructure. If your company is profitable, every dollar of infrastructure costs cut is a dollar more profit. For a company still working on a runway, every dollar cut from your infrastructure is more time you can spend working on a product.
It's also important to note that those efficiencies don't go away on their own. It's more profit/runway until you retire that code.
Opportunity costs exist, but so does spending your money wisely.
You can't look at it in dollars, you have to look at it in terms of what percentage of cost IT/cloud infra are vs the rest of the costs of the company. If it's not that much then from an economic point of view just scaling out may be the best choice.
I bet there is a non-insignificant number of existing endpoints out there that could easily be improved by a large percentage if someone actually took the time to profile them. Just from the shifting codebase, build-up of complexity, “legacy code”, etc. that no one is probably looking at.
If there was, for example, easy to use developer tooling that automatically identified bottlenecks, the opportunity cost scale could tip towards optimization vs having to pay for and manage more servers.
But the opportunity cost insight is great; optimization only wins if the cost to optimize is low and the amount that gets optimized is high.
The point is more that the time involved to set up the scalable architectures in the first place - load balancer, app cluster / containerization, deployment automation, database replication etc. - might be better spent just doing some simple optimizations on a single VM first. A lot of people jump straight to big distributed architectures with 1/1000th the traffic to justify them.
From a data analytics perspective, it is surprising to me if I don't find multiple opportunities for order of magnitude improvements in data processing infrastructure at my clients.
At one client where I was doing more IC dev work, it got to the point where they'd just chuck reports over the wall at me and ask me to find efficiencies in these. This was supporting >15K internal users in a massive shared reporting and BI infrastructure, so it was well worth it to have someone spending 10s of hours on optimization on a regular basis. I could routinely find order of magnitude improvements in reporting queries that were already running in production. If bringing things from POC/end user mockup to production, I could frequently find two orders of magnitude of improvement in query time, which reflected directly in latency to report rendering for end users.
That client was not special.
I wouldn't call this sort of work casual, and it's a very different domain than optimizing serving web apps. I would spend days to weeks on such optimization tasks at the client I mentioned above. It might take a few weeks to do serious optimization work on large ETL pipelines, but I routinely find big gains in performance on such projects - usually in the range of 1/2-1/10 run time compared to baseline.
They're kinda glossing over the difficulty in removing 99% of your runtime cost. Unless you've done an insanely bad job of that first draft, you won't be able to do it. Finding a 10% improvement in a system that has enjoyed even the slightest design work will be a challenge.
Experience has taught me no longer to be surprised when finding code that determines the number of items in a list by making a SELECT * call to the database and calling .size() on the resulting list and doing it inside a doubly nested loop. Sometimes finding improvements to the order of magnitude is less difficult than it ought to be.
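The fix for that particular horror is a one-liner; a self-contained sketch of both versions (hypothetical table, SQLite just for brevity):

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany("INSERT INTO items (payload) VALUES (?)",
                     [("x" * 100,) for _ in range(50_000)])

    # The anti-pattern: pull every row across the wire just to count them.
    count = len(conn.execute("SELECT * FROM items").fetchall())

    # Let the database do the counting; no rows get transferred at all.
    count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]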
But in my experience, that's usually only affecting a small part of a system. If you could save 90% by such optimizations, you're at a very early stage of your application. Otherwise, this would've been spotted earlier. But in a larger system, these things can be around for some time but it's unlikely that fixing them brings an overall performance boost of 90%. More like 3% here, 5% there and so on.
Edit: And I agree: I wouldn't be surprised to find a SELECT * and a call to size() in a moderately sized system.
Ten years working on a code base that serves near-StackOverflow levels of traffic and 5% would be a _huge_ win. I don't come across those, or even 3%, very often.
Our app has dozens of routes all seeing hundreds to thousands of requests a minute. To get a performance boost that big, it has to be in some foundational code that's used nearly everywhere, and that code's already been pored over six ways from Sunday.
Occasionally we'll run into a bit of code that sneakily becomes a significant drag on the system as traffic through that code grows slowly over time. A fix might result in a large % load drop, but only because there's some pathological problem, which I hesitate to call an "optimization" rather than a "bug fix".
We did also uncover many significant optimizations after migrating from dedicated hosted bare metal boxes to the cloud, when our network latency assumptions got thrown out the window - but the bulk of those optimizations were simply "cache it".
A lot of the improvements I've seen in systems have come from adding indexes. Sometimes, they're just missed over time. "Oh, we never thought that this database would be searched for this way".
Other times, it's batch calling the DB ahead of time, rather than calling it for each entry in an array.
You have 10,000 customers, and are getting 1000 a month, and each customer starts accumulating data. Your customers are starting to notice your app is slow, and your boss asks you to work out what it'll take to fix the problem, but the problem is you just don't scale.
You get a pretty clear idea that by the time you hit 50k customers, you're going to need 10x as many servers, and by 100k, you'll need 25 times as many (because your costs per customer keep going up, and lots of things scale logarithmically). Either you're smart enough to know your boss will laugh in your face, or you're not and they do.
So I guess it's time to start optimizing. You cut some n^2 behavior down to nlogn with a smaller C, and you make your servers 4 times faster, but much more importantly you decrease your run rate. Now you only need 2x as many machines for 5x the customers, instead of 10x. That's 80%, but you're getting diminishing returns. Eventually you have to be 99% better because you have a million users and you couldn't possibly return a profit if you needed that much hardware.
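The kind of fix being described is often as mundane as replacing a nested scan with a sort or a hash lookup; a toy sketch of the n² → roughly-linear version:

    import time

    customers = list(range(5_000))
    recent_order_ids = list(range(0, 10_000, 2))

    # O(n*m): for every customer, scan the whole order list.
    start = time.perf_counter()
    active = [c for c in customers if c in recent_order_ids]
    slow = time.perf_counter() - start

    # O(n + m): build a set once, then each membership test is O(1).
    start = time.perf_counter()
    order_set = set(recent_order_ids)
    active = [c for c in customers if c in order_set]
    fast = time.perf_counter() - start

    print(f"nested scan: {slow * 1000:.0f} ms, hash set: {fast * 1000:.0f} ms")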
“Stupid” is relative. Often, when designing a system, I did not have enough information about how it will actually be used (and in some cases - neither did anyone else; when you create a market all you have is guesses). Things I was sure weren’t important turned out to be important, and vice versa.
E.g. in one of the systems, prior business analysis showed that legally valid “chain of custody” would be an issue, which dictated very specific (and often costly, performance wise) decisions.
And then after the 3rd customer deployment, it turned out that they don’t care - they’d rather pay less for everything, and lose every 20th claim. And after the 20th customer it was proved beyond the shadow of a doubt.
It was stupid to believe what customers said ahead of time, yes. But I don’t think that’s the kind of stupid you are referring to.
Maybe. Building things to scale horizontally tends to have a lot of inherent overhead: a lot of fine locking, serialization, consensus, messaging, intermediate representations and storage, etc. Sure it increases the top-end of where you can go, but it can be a huge penalty.
It's not unusual to do an insanely bad job of that first draft. Arguably, it means that you did a good job for the primary task of a first draft, which in software is usually figuring out what you should do and whether you should do it without paying too much attention to how.
There's perhaps a mistake in shipping that first draft, but usually that lies with management.
There is no mistake in shipping the first draft - “no business plan survives first contact with the customer unchanged”. If you don’t ship it, you only learn some of the technical issues but not the customer facing ones.
The mistake is often keeping it live past the time it is clear what the mistakes you made are and that they will kill you in the long term.
> Consider a fast single-file database like SQLite
Sure, it's nice if your application is optimized enough that it could run on a single server. But it seems to me that actually tying it to a single server, with local storage on that same server, in production, is irresponsible. I sure wouldn't want to have to explain why the application went down, and will come back running a possibly out-of-date DB backup, if that single server suddenly disappears. SQLite may be faster and simpler, but to sleep well at night, there's no substitute for stateless application servers in front of a managed database.
Why though? SQLite is rock solid, has write ahead logs, transactions etc. You can presumably do backup at the file system layer via RAID/ZFS/whatever.
The main issue with SQLite is its insanely loose type checking. Column types are completely ignored, as are foreign key constraints by default. Not a good way to build a robust system. But if your schema is pretty simple and you don't anticipate much data, I don't see a problem with using SQLite.
I strongly disagree with this in practice. I understand the philosophical motivations to encourage deeper thought regarding the engineering, but we have been using SQLite as our only datastore for almost 5 years now without a single DB-related incident. We probably don't even spend 30 minutes per month screwing with SQL/database-related concerns. I will say that our strategy probably would not work at a larger technology organization. It is more advanced and requires disciplined, motivated engineers to keep it on rails. In retrospect, we were probably only able to get away with this because we were willing to accept a ridiculous amount of up-front risk. We got lucky that it played out so well. Now that we are on the other side of that journey, I can assure everyone that the new world is incredible (and much more stable).
To manage scalability and maintenance, we run multiple SQLite databases - one per logical type of persistent business entity (i.e. Users.db, Sessions.db, Customers.db, etc). This allows for us to manage schema versioning for each type independent of any others. We have ~25 types that each get their own separate DB. Our migrator is a simple for-loop, but somehow our approach seems even more elegant than Entity Framework because we don't need special unicorn tables to track migration metadata - see: pragma user_version. The part that requires discipline is that we have no hard referential integrity constraints. This is where developers have to make the right choices when designing related entities & data stores. We do not rely on the database/ORM to clean up our modeling for us.
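A minimal sketch of that migration loop using PRAGMA user_version (the schema steps here are made up):

    import sqlite3

    MIGRATIONS = [  # entry i upgrades the schema from version i to i + 1 (hypothetical steps)
        "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)",
        "ALTER TABLE users ADD COLUMN email TEXT",
    ]

    def migrate(path):
        conn = sqlite3.connect(path)
        version = conn.execute("PRAGMA user_version").fetchone()[0]
        for step in MIGRATIONS[version:]:
            conn.execute(step)
            version += 1
            conn.execute(f"PRAGMA user_version = {version}")  # no extra metadata table needed
        conn.commit()
        return conn

    migrate("Users.db")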
Our backup strategy is to snapshot the entire VM. The biggest motivation for having your application fit on a single box is that you can synchronously snapshot the whole system with a single click. This is far simpler than maintaining a completely separate SQL server instance and worrying about all of the added complexity of backing up 2 (or more) machines. We have yet to encounter a customer who did not have the ability and willingness to use this strategy. If your business application can run on 1 server (and is forecast to do so forever) and you have a RTO/RPO that permits using VM snapshots as backup/restore, then I would strongly recommend considering this type of approach from an engineering perspective (assuming you have the team/skills for it).
Thinking more broadly, since we have committed to this idea of the datastore living on the application server, we could hypothetically build up clustering at the application level by adding multiple application nodes. This would probably be better for us anyway, because we really only have 2-3 entities that we absolutely must have synchronously replicated across all nodes. A heavy-handed SQL Server cluster approach is way overkill when we can just swap to GUID keys on our sessions/state and pull consensus to update important settings, transactions or permissions.
And at the broadest scope, I still feel like most developers vastly underestimate just how fast a computer can do things in a well-optimized domain. SQLite is substantially faster than SQL Server in the single-node case. You will never get lower latency than by having your database running in the same process as your business application. Latency is the biggest devil when it comes to dealing with transaction throughput. If I can get a user request out the door in 100 microseconds vs 5 milliseconds, it makes a shitload of difference when I am pushing thousands of these per second.
Yes, that's true but beware of the fact it only works this way if your code is 0%-serial, see https://en.wikipedia.org/wiki/Amdahl%27s_law for the math. By the way, this law is why supercomputers like Fugaku https://www.youtube.com/watch?v=WVsFFojdq3c simply have no other way than optimizing their code. 0.1% serial code means that a code running on 160000 CPUs will be just 1000x faster than the code running on a single CPU.
Edit: bottom line is that the conclusions of the article are dangerous. Reducing the controller method exec time by 10x is not likely to allow your system to process 10x RPS by itself, the cost and product backlog delay caused by a perf optimisation project may not be acceptable etc. But beware that slow code in the critical path of your distributed/parallel system such as kernel code, event loop code, transaction commit code, distributed consensus protocol implementation can make your system unscalable beyond N factor (whether it is cores or servers). You can surely add 100xN nodes but it won't help.
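For anyone who wants to plug in their own numbers, the formula is a one-liner (using the 0.1%-serial / 160,000-CPU figures above):

    def amdahl_speedup(serial_fraction, n_processors):
        # Speedup of a fixed workload on n processors, per Amdahl's law.
        return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_processors)

    # 0.1% serial code on 160,000 CPUs: ~994x, i.e. roughly 1000x, not 160,000x.
    print(amdahl_speedup(0.001, 160_000))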
This is already a gross misunderstanding of Amdahl's Law and what it means pragmatically, but you have things completely reversed.
Optimizing something to run 100x faster avoids needing more servers, thus avoiding the synchronization from splitting things up to them.
That synchronization, by the way, is not actually done by the servers; it would be done by dedicated networking hardware. Even on a single computer, for web server programs the synchronization is not really happening in the user-space programs, it is happening in the kernel for the IO to and from disk and to/from the network.
We are actually talking about orthogonal aspects. You are right that being able to run the system on a smaller number of nodes reduces inefficiencies and reduces the "hit" of diminishing returns stemming from Amdahl's law. However, to do that, you need to speed up your code and I argue that unless you speed up the non-parallelizable part together with the parallelizable part, you won't be able to achieve that system speedup that allows you to remove a number of servers from your cluster in the first place.
However, I think you are right that on small number of servers speeding up parallel part can have a more immediate effect. I did a small calculation just to check the numbers. Assume you have a server that can do 100 RPS. You can either speed up the parallel code 2x (let's be realistic here and stop this 100x nonsense) and bring the RPS almost to 200 or you can take that serial code that is just a fraction and reduce it by half, so that f=0.995 and not 0.99 (and adding a marginal RPS increase to 102 RPS).
I think you are building on misunderstanding with assumptions that aren't backed up by anything.
> non-parallelizable part
Which part is that? Synchronization does not have to be expensive. The original paper had a theoretical 'serial' part of a program, but synchronization is different in that it can be very fast and doesn't tie up other resources. For a web server the synchronization is in IO, which is being handled by queues. It doesn't stall cores, processes or threads can put their data in a queue and another core can handle it.
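In other words, the handoff can look like an ordinary producer/consumer queue, where handlers enqueue and move on instead of stalling (a toy sketch):

    import queue
    import threading

    work = queue.Queue()

    def consumer():
        # Drains the queue on its own thread/core; producers never wait for this.
        while True:
            item = work.get()
            # ... write it out, batch it, ship it over the network, etc.
            work.task_done()

    threading.Thread(target=consumer, daemon=True).start()

    # "Request handlers" just enqueue and return immediately.
    for request_id in range(5):
        work.put({"request": request_id})

    work.join()  # the only place anything waits, and only if you actually need to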
> You can either speed up the parallel code 2x (let's be realistic here and stop this 100x nonsense)
Far from it. If Python is translated to C++ directly you would already have a massive speedup (I would guess at least 20x, maybe more). If memory allocations are minimized you get another huge (7x) speedup on top of that. If larger chunks are dealt with at one time you get another huge speedup. I think 100x would be common.
Amdahl's Law is really about diminishing returns when there are significant places that need to be serial. Any emulator can be a good example. There aren't nearly as many scenarios that have to be serial as most people think.
That's not how it works. Amdahl's law is about the execution speedup you get by splitting a single task across multiple cores; it is not about how many tasks you can run simultaneously on those cores. When we talk about horizontal scaling, what is usually meant is multiple servers executing tasks independently, not parallelizing the execution of individual tasks, which is a much harder problem with the issues you describe.
So what you are saying is that because Fugaku supercomputer has 158,976 nodes, it's "horizontally distributed" and Amdahl's law does not apply to it but only to the 48 cores on each node?
A task that we have (if we want to formalize it for analysis via Amdahl's law) is to process 1MM requests on 1000 web servers and a single SQL server behind them. What we are after is how much faster can this setup process 1MM requests compared to a setup with 1 web server and 1 SQL server (1 SQL server is used here to exemplify the sequential part of the request handling). Indirectly from this calculation you can derive the increase in overall system RPS after you run a benchmark actually pushing 1MM requests through 1000 servers vs 1 server.
If you have a line of code `sqlUpdateTransaction.commitBlockingWait()`, and it takes 1% of your request handler execution time, then even if you add 1000 web servers but have only 1 SQL server machine behind them (a little bit oversimplifying here as contention is likely to degrade performance further, assume we are able to scale the SQL vertically just so that the call above always takes 1% of the request handler time for the sake of simplicity), your system will not process 1MM requests more than 91x faster than a setup with a single web server and a single SQL server. See https://www.wolframalpha.com/input/?i=1%2F%28%281-0.99%29%2B.... And no amount of green threads will allow you to outsmart the Amdahl's law. The only thing you can do is to return 200 OK before the transaction is committed, which would bring sequential part of the handler code from 1% to (nearly) 0% (and arguably be an unacceptable method of performance optimisation).
> And no amount of green threads will allow you to outsmart the Amdahl's law. The only thing you can do is to return 200 OK before the transaction is committed,
Ignoring all the wild assumptions and made up numbers here, this again has nothing to do with Amdahl's law because the "serial" part is unnecessary synchronization and waiting, not unavoidable serial computation.
If you have a thousand web servers, having all of them wait to sync would be the exact thing anyone spending millions on servers would work to avoid. To start at the obvious bare basics, if only one computer can sync at one time, you might as well just use one computer.
Computers are queues and buffers all the way down and all the way up to the network level. A single computer or core merging data is not Amdahl's law. There is no reason computation resources have to wait on some other resource synchronizing, and this is not serial computation, which is what Amdahl's law is about.
The comments in this overall thread are pretty much echoing the types of discussions and lessons one learns in System Operations. I wonder if part of the challenge is that so many companies have decided to ditch "systems" people for developers, assuming that software developers have the same knowledge, interests, experiences, and perspectives as systems folks, and that systems folks are no longer needed.
The question isn't "optimize my code" or "add more servers", it's both, and determining when it's appropriate to do one or the other at a given point in time based on current load and expected future trends. But generally, "optimize my code" can be pretty low hanging fruit in complex systems, because if you lack systems people, you end up with everyone focused only on their specific component and nothing else.
Optimizing execution and designing for horizontal scalability are both cases of premature optimization. Scaling vertically via hardware as much as possible is IMO the most developer-time saving way. Once that's exhausted you'll have to decide which one is easier.
The biggest mistake I see is people spending a lot of time designing for horizontal scalability and then running a cluster of under-powered servers when a single powerful machine would do just fine.
There are very few things that are easier in a cluster (uptime perhaps?) and many that are easier on a single machine. Once you add the requirement of having to work in a cluster, every new feature will have to respect that. If your system is doing anything interesting you'll be hitting all the distributed computing dilemmas pretty fast.
I feel compelled to bring this article up, especially since last time it was brought up on HN people seemed sure in the comments that it was not a common sentiment.
I agree with the gist of this article, but I think if you are not a FAANG as the article suggests, then you probably also are not at a scale where your carbon impact is actually measurable...
> I agree with the gist of this article, but I think if you are not a FAANG as the article suggests, then you probably also are not at a scale where your carbon impact is actually measurable...
It's not hard to have a startup that does use 100kW of servers and maybe about 150kW total-- HVAC, networking, UPS inefficiencies, etc.
There's about 450 grams of CO2 emitted per kilowatt-hour from typical electrical generation, so that works out to about 600 tonnes of CO2 a year-- not counting other lifecycle costs.
On the one hand, this is a tiny smidgen of overall CO2 emissions. On the other hand, it is about 80 households' worth of CO2 emissions.
As a developer, making this kind of thing 1% more efficient is the same magnitude as completely eliminating your home carbon footprint, and is a whole hell of a lot more plausible to do.
I know offsets are "cheap" at $10 per tonne or whatever as the article says. On the other hand, please don't really assume that buying $6k of offsets really does as much good as eliminating 600 tonnes of emissions for realsies.
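The back-of-envelope arithmetic, for anyone who wants to check or substitute their own numbers (all figures as rough as the ones above, assuming a year of continuous draw):

    kw_total = 150                    # servers + HVAC + networking + UPS losses
    hours_per_year = 24 * 365
    kg_co2_per_kwh = 0.45             # typical grid mix

    kwh_per_year = kw_total * hours_per_year           # ~1,314,000 kWh
    tonnes_co2 = kwh_per_year * kg_co2_per_kwh / 1000  # ~591 tonnes, call it 600

    print(kwh_per_year, tonnes_co2)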
A startup that cares at all about energy efficiency of compute will use a public cloud service. They don't have inefficient (or any) UPS, some of them don't even have traditional HVAC, their power overheads are only about 10% compared to 100-300% for standard corporate datacenters, and whatever fixed overhead they suffer is amortized over all their customers.
While public clouds do a great job at optimized hardware designs, they do an abysmal job of keeping all the CPUs and disks utilised.
A typical CPU in a cloud datacenter is maybe only 10% utilised. That's mostly due to unsold capacity and poor bin packing on many levels (user has an oversized kube cluster, the cloud provider has spare machines, etc.). Many cloud machines just sit idle for years on end because someone has forgotten about them. In many cases, those idle resources are not used for lower priority jobs either, since they are being paid for by a customer.
Not sure I can entirely agree. If you use for example a cloud machine service for your application (e.g. GCE) but you refer to a hosted database (Cloud Bigtable) then you are exploiting a service that is packed into the rest of Google's junk with very high utilization.
You could also skip the machine reservation and get better utilization by using GAE or similar.
> packed into the rest of Google's junk with very high utilization.
Maybe packed in with other public cloud things? I thought the partitioning between public and private resources was done at a pretty coarse level, but might misremember.
IMO more relevant is that Google public cloud resources are carbon neutral, at least wrt electricity. (Through buying green power and emissions credits, to the degree you believe the prices and the models I guess.)
The proper comparison isn't to how much utilization they could get if they were theoretically perfect. You need to compare to the realistic alternatives. If you are running your own servers, you are probably utilizing an even lower percentage most of the time.
Doesn’t matter that clouds are efficient. By design they have to be massively over-provisioned so they promote waste more than a startup that carefully allocates compute and lives within its bounds.
Bursting to 100k servers for 24 hours rather than properly engineering the code is a trade-off that frequently gets suggested now that the cloud is an option.
Yeah. And even ignoring over-provisioning, I'm pretty sure people would think twice about spinning up as many servers as they currently do if they were forced to look at all of them sitting in front of them on a daily basis. The distance created by never actually coming across the hardware makes people feel a lot less guilty about running more servers.
A quick internet research reveals that the social cost per emitted ton of CO2 is most certainly much higher than $10.
Estimates are around $140-$180 with many being much higher.
I'm not sure this adds up. If it is a trivial change, why not do it? 100 kg is 100 kg. That's like seeing a dollar bill sitting on the ground and not picking it up. No effort. But it is something everyone can do as well.
As a parallel, let's look at diets[0] (the Independent gives higher numbers[1]). Lots of people reduce their meat intake to reduce their personal carbon footprint. But the average diet is about 2000 kgCO2eq/yr (3000 via the Independent). A vegetarian diet is about 1200 kg and a vegan diet is around 250 kg (a potentially dubious figure, but it is certainly less than the mean and less than a vegetarian diet). 10 websites changing to local fonts is like a single person changing to a vegetarian diet.
The logic here is that you could spend $20-$30/yr on offsets and completely eliminate your dietary footprint, which is much easier than changing diets. So is what these people are doing useless? No. It helps because of the aggregate. It helps because it makes people conscious of the problem and of how their choices affect the world. Yes, big corporations should be doing more, but their inaction doesn't mean you should give up and do nothing.
> Plugging the numbers in the Internet then emits 180 billion kilograms of CO2. So the author, by switching to local fonts, reduced the carbon emissions of the Internet by 0.000000056%.
There are ~7.8 billion people on the planet.
If you vote (let's say everyone votes for the sake of the argument), you're increasing the percentage of people voting by ~0.0000000128%.
This change is just as "meaningless" as that one, so... don't go vote?
There is no global election. But even if there was, the trade-off is do you do this stupid thing that gets a tiny vote or do you do something that’s 1000x more effective.
At some point you need to stop focusing on stupid shit when planting a tree does far more.
OP here, thanks for the link - interesting to see this stuff translated into real kilowatt terms. I haven't followed the discussions much around server carbon footprints but agreed, for a small app you're not gonna save the earth by shaving 2 watts a day down to 0.2 watts or whatever. There's definitely some scale at which optimization makes a meaningful dent, would be interesting to study that more.
Well, if it's easy to cut down a bit without incurring other costs that might have still other CO2 implications, then why not? If most people make the effort, then the aggregate will add up to more than a FAANG, so a social push for it isn't a waste.
Sure, don't expect to change the world on it alone. But if every industry takes on an awareness of this sort of thing, and each niche within an industry discovers some way to shave a fraction of a percent off its emissions, it will all add up significantly.
Doing this kind of thing is always going to be a game of inches, not miles. We can't say "It won't even get us a mile, why should we bother?" Be realistic about it by all means, but take the easy inches when they present themselves.
Our product isn't at that scale, but because of the way our hosting environment is set up we have typically 8 machines at 3% usage over a week... Some of this is pushing back against the desire by management or others in your group to seem larger than really necessary... I guess I'm saying part of pushing back against waste is taking care of both sides...
I know if I tried to get our people to shut down half of our current infrastructure people (who have no real knowledge of any actual load numbers) would never allow that...
So much political nonsense... It's been a long week :)
As the link argues, you have to look at the utility (value) of the website. When a phone call or checking the website stops me driving across town, that's maybe 100 Wh spent instead of half a gallon of gas plus wear and tear.
If a review stops me from buying the wrong product, what's the footprint of that?
It is very easily measurable. In fact many rack UPS units (such as the 2.2 kW APC ones I use in my garage) even compute the carbon footprint for you in their management web UI, in pounds of CO2. The numbers can be surprisingly high, actually, even if you just have a few servers.
I think stacks that run on Java / .NET provide the best value in terms of productivity and costs when chosen to develop large web apps.
Sure, Django might be easier to start off with, but if you want to optimize your app, there is only so much you can do. However, the non-optimized performance of an ASP.NET core web app (developed as per Microsoft recommendations) is orders of magnitude faster than any Django App, without any explicit optimizations.
It also helps if you can split your app into an API and a client app, as APIs consume a lot less resources than traditional web apps.
In a cloud-dominant world, it makes sense that the traditional web app development process, where a new set of HTML is passed over the wire for every page, has to die.
It's imperative to replace it with modern stacks, with the client separated from the API.
It doesn't take a lot of consideration to arrive at the conclusion "modern = good", but I assure you, it's not imperative nor plainly better to make everything into a bundled js webapp.
The boilerplate code for basic web apps often dwarfs the actual content of a basic web page. You could also make the argument that many things designed as web apps have no business running in a browser and should be written as standalone applications, but what's "best" in some sense isn't always the most pragmatic thing.
> It also helps if you can split your app into an API and a client app, as APIs consume a lot less resources than traditional web apps.
We shouldn't focus so much on what's best for us (at least in operating costs) that we sacrifice what's best for the end-user. In many cases, a conventional server-rendered web application is what's best for the end-user. If we develop one of those by combining an API backend with a separate server-rendered application consuming it, we've likely added extraneous complexity and resource consumption to get back to what we used to have in an integrated server-rendered web app.
Also, as DHH has written [1] [2], an integrated system, a.k.a. monolith, is best for programmer productivity and especially for getting the most out of a small team of generalists. I want to use a framework whose authors and community are opinionated about this. It seems to me that the Phoenix framework and the broader Elixir community fall into this category, while still delivering pretty good performance. I've also taken a hard look at ASP.NET Core, but I don't think it's as focused on the integrated approach. So, after being undecided for too long, I'm using Phoenix for my new project.
Just making one server run well won't do much to your resume. Creating some distributed micro service buzz word bingo monstrosity on AWS on the other hand looks impressive.
I'm not sure about that. I've got a pretty good track record in interviews where I get right into the technical nitty gritty of reducing operating costs by 90%.
You're splitting hairs. Yes, technically knowledgeable and capable people are still in demand but on average the trend that the OP is mentioning is quite true. And due to human limitations one cannot go both in depth and breadth. And with the proliferation of technologies and the rapid pace of cycles and recycles, the ones who keep up with the trend are more likely to be better rewarded.
> Just making one server run well won't do much to your resume.
That's pretty much true, because doing that and that only will also mostly get you hired at a place which has a single server (like say "use SQLite" instead of RDS).
On the other hand, if you can justify what you did to make a single request better (as in, "what I did meant that a single request came back in 40ms instead of 120"), like that mixed API example, or bypassing the standard N+1 query problems, then I'd be very keen to pick out what that person did.
A lot of the work involved in the latter is the same as for the former, but there is a difference in attitude towards user-relevant functionality which I've noticed in the best people I've worked with.
That is not to say don't use SQLite, but more along the lines of this is a simplicity hack, I can make a complex working system out of this simple system, because I used the right abstractions for that bit (like say, JDBC or PDO - maybe not a full on ORM).
> It may sound obvious, but - optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster. Optimizing to 1/100 the time (reducing requests from say 1.5 sec to 15ms) is like adding 99 servers.
This cannot be correct. If you optimize 1/10, and then optimize to 1/10 again, that's the same as optimizing to 1/100. Does this "add" 18 servers or 99?
If you optimize to 1/10, then your cluster can handle 10x the load, which is the same effect you would get by simply having 10x the number of servers (assuming your app is perfectly parallelizable).
The article's quote is only correct if the "cluster" starts as just a single machine.
If you keep track of how many "virtual servers" your cluster has, and take into account the fact that the text obviously implies "per server" (let's not play dumb simply to be able to spit on the article), the math works correctly:
- I have x servers. I optimize to 1/10th of the time. I still have x servers but now have (1+9)x = 10x virtual servers.
- Now, I optimize again to 1/10th of the time. I have 10x virtual servers, so (1+9)(10x) = (1+99)x = 100x virtual servers.
A more meaningful comparison is if you can optimize your code so that it runs 10x faster that means it will also run just as fast as it is now on something 10x slower.
That's like going from needing a small server to something you could wear as a watch. (So to speak.)
> A lot of tech discussion these days is focused around scaling web app infrastructures to handle huge traffic.
In most cases this is the wrong way of thinking. It isn't that it's bad or wrong, but rather that it's dated and expensive. This is the 2000-era dot-bomb logic of "get big fast" and "data is king". It's like thinking in terms of Facebook instead of BitTorrent. At this point, if you are not already established or working on something extremely original, you have probably missed your shot and been crowded out by the incumbents.
Current programming paradigms indicate two paths forward of which one is substantially less expensive than the other: distribution and concurrency.
In the distribution model most of your costs are up front, in the application. This is a service-oriented approach but without a central service; rather, a pool of nodes intelligently communicates task, file, and event queues. The cost to scale is decoupled from the cost of the application, which is the biggest difference between this approach and thinking in terms of huge traffic or server/data hoarding. Since the cost to scale is decoupled from the application, all that is required to compete with the large incumbents, at scale, is market penetration. Marketing is cheaper than a data center.
The concurrency model requires a central service but does not operate as a central server. Each connection/session is a parallel child processing unit, such as an event loop. This approach requires less cost up front since there is still a central connection point to manage, but there is still a cost to scale, even though much of that cost is shifted from data management to increased processing overhead.
If you optimize after the fact, you probably already made bad design decisions earlier. Those usually cannot be corrected by another optimization pass over the code.
It is much more effective and efficient to have a good plan before you start, and make sure not to write inefficient code in the beginning.
It is also much more effective to take the lessons you learned from the first round, throw the code from that round away, and write a new version. However, that is only an option if you modularized your project well, so that small modules are replaceable without endangering the rest of the project. Writing the new version will go much faster because you now understand the problem better, and it will produce better code because you get rid of questionable design decisions that hold you back in the long term.
Think about what your legacy in this world will be.
Leave the world with better software than you found it with!
You have probably never worked in game dev, where there is an art to optimizing to get every last fps out. I feel a lot of web apps could use a good optimization pass, because there is no reason that so many sites and web apps are so laggy/janky while doing incredibly simple tasks, compared to games running on the same system.
I didn't say optimizing code doesn't work or does not yield speed-ups.
In fact I am optimizing my own code, too.
But I only get ~10% speedup, tops. Because perf is decided at the design level, in the architecture, data structures and algorithms. In fact, if you get massive speedups in your optimization phase, I would view that as a sign that your process has issues. There should not be that much potential left after you wrote the code down.
Elite fit on one floppy disk. Do you think they got there by taking a slow implementation and optimizing it?
Have you ever used Turbo Pascal? You think you could get that kind of speed by taking an existing Pascal compiler and optimizing it a bit?
Ahhh, a Turbo Pascal background: being exposed to the excellent design of that language and its libraries helps build an understanding of writing runtime-efficient code! Anders is a guy who really got runtime performance combined with developer efficiency through good architecture.
I agree, I've found these are often good strategies.
As a subtext I think your general perspective is as an experienced developer and not all developers have the experience to write runtime efficient code or components with clean modularized interfaces, so it won't work for everyone.
I feel like AWS and the massive amounts of credit they give out just for asking has only made this worse.
Don't get me wrong - I have been a beneficiary of their startup programs, and it's amazing for companies not to worry about compute for a year or two (you're never really using 100K anyway).
That said, I have seen multiple companies get into the hole of not worrying about optimization at all (who needs to when you have a quarter of a million in credits?), and locking themselves into a massively complex microservice architecture by the time that first real bill rolls around. AWS knows what they're doing; do your best to work them rather than letting them work you.
Step one, if you haven't already, would be adding caching: something like Varnish in front for (if nothing else) guest traffic, and memcached or Redis for commonly fetched data.
For a lot of cases Varnish (or a different reverse-proxy cache) will shave off 50% of your traffic from ever hitting the servers. It depends on your percentage of guests vs. sessioned users and of reads vs. writes though.
After that, using a cache like Redis or memcached for things like user sessions/permissions and commonly fetched data significantly reduces the load on your database.
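On the application side this is usually the cache-aside pattern, roughly like the sketch below (Python with the redis-py client assumed; the key scheme and the load_permissions_from_db helper are made up):

    import json
    import redis  # redis-py client

    r = redis.Redis()

    def get_user_permissions(user_id, ttl_seconds=300):
        key = f"perms:{user_id}"        # hypothetical key scheme
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)   # cache hit: no database round trip
        perms = load_permissions_from_db(user_id)   # the expensive query, not shown
        r.setex(key, ttl_seconds, json.dumps(perms))
        return perms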
I don't really understand why people _want_ to deal with all the nitty gritty stuff when it's so cheap and easy to have everything work perfectly and be maintained out of the box.
I was under the impression a Web App runs on my browser, so it's more like edge computing. :p
To be fair, a 100x optimization is not something easy, and if it was, I would probably blame the development team.
Your web app is a whole lot of code and components in series. A 100x optimization would mean that every single part in there gets 100 times faster, or that there is a component that takes maybe 95% of the time and can be optimized 200x while the rest of the stack, taking 5% of the time, can be optimized 10x.
I do believe that often it's the other way around: instead of a very fast first iteration, people go and build the distributed version that can scale 100x but takes the performance hit that all distributed systems have for networking, syncing, etc.
Most of us are using off-the-shelf components, so the space for optimization in our code isn't that big (unless you are a novice developer, and even then compilers go a long way these days). Thus it may come down to choosing the correct off-the-shelf components, like maybe SQLite instead of a NoSQL external datastore.
I’d argue that in many cases a 100x optimization is much easier than a 10% speedup. Very often the 100x speedup isn’t because you’re now doing something particularly clever, but because you were doing something stupid before. It’s very easy to do something horribly inefficient if important parts are hidden behind some abstractions. An ORM is maybe the best example of this, if you’re not careful you can easily generate queries with it that are several orders of magnitude slower than the straightforward and efficient version.
Optimizing parts that are already reasonably efficient is much harder, but unless you spent a lot of time optimizing already there are likely plenty of low hanging fruit.
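The ORM case is usually some flavour of the classic N+1 query problem. A Django-flavoured sketch (the Customer/Order models are hypothetical):

    from django.db import models

    class Customer(models.Model):
        name = models.CharField(max_length=100)

    class Order(models.Model):
        customer = models.ForeignKey(Customer, on_delete=models.CASCADE)

    # Looks innocent, but issues 1 query for the orders plus 1 query per order
    # for its customer -- the classic N+1 round trips.
    for order in Order.objects.all():
        print(order.customer.name)

    # Same rows, fetched with a single JOINed query -- often orders of magnitude faster.
    for order in Order.objects.select_related("customer"):
        print(order.customer.name)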
Many apps/things can run with 100x overhead and nobody should care.
Look at your salary, add ~50% for corporate overhead, and compare this to the savings you'd generate using market pricing of cloud resources (even if you don't use cloud). Can you beat this with, say, a 3-year discounting plan for the savings? In a lot of cases the answer actually is "clearly not".
Sometimes you can eke out very sizeable gains, but always do the capitalist homework first (and the engineering homework of profiling second) before trying any fixes.
Also, if it's performance that you need look into "stupid" solutions first. Can you move the DB to flash with just a few config options? Can you mlock() it in RAM? You'd be surprised how many things fit in RAM, especially if your DB is a managed distributed solution, and still be very cheap compared to engineering salaries.
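A concrete version of that homework, with entirely made-up numbers just to show the shape of the comparison:

    # All figures are illustrative -- plug in your own.
    fully_loaded_engineer_cost = 150_000 * 1.5   # salary plus ~50% corporate overhead, per year
    optimization_effort_years = 0.25             # say, a 3-month optimization project

    servers_removed = 6
    cost_per_server_year = 3_000                 # rough on-demand cloud pricing
    horizon_years = 3

    cost = fully_loaded_engineer_cost * optimization_effort_years
    savings = servers_removed * cost_per_server_year * horizon_years

    print(f"cost {cost:,.0f} vs savings {savings:,.0f}")  # 56,250 vs 54,000: not worth it here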
I get that there are many out there that have fully bought into the idea that developer productivity is more valuable than efficiency. But I think it is many times a false and very misleading tradeoff, or at least they got their cost function drastically wrong.
1) There are plenty of fast languages out there that are suitable for high productivity environments. Go, Kotlin, Scala, F#, C#, even Java, are all great languages for developing web apps and services. And they'll all beat the pants off of your PHP, Ruby, or Python framework of choice.
2) Many of the frameworks that get you up and running extremely quickly really only save you a day or two at the very beginning before they level out and are comparable with other frameworks. Sure, `rails new myapp` and your first few migrations will save you hours over something like Play Framework, but after that, you don't really have any more jaw-dropping productivity tools left to use. I once had a die-hard Rails enthusiast tell me about how he decided he would never go back after using his first ActiveRecord migration and realizing how much code was generated for him. I asked him how much time he saved, and he said, with pride, "LIKE 4 HOURS!". Meanwhile, he probably spent 2-3 hours every day writing tests and debugging errors that C# or Scala would never even let compile in the first place.
3) It's not too long before you have to start thinking about performance anyway because your high latencies are a terrible user experience. This goes for client-side software as well. My 8 year old doesn't know shit about software, but he knows that he absolutely hates Microsoft Teams and would much rather his school use Zoom...because the bloat and terrible performance are a massive disappointment in user experience, especially on cheap school-provided laptops.
4) People really undervalue the quality of life and productivity improvement that comes from managing one server versus ten. Even with the cloud, which supposedly takes care of those headaches, you still have to worry about load balancing, health monitoring, auto scaling, complex caching, etc. And even if you could autoscale your web server worry-free, you're soon gonna have so many connections to your database open that you're gonna have to worry about setting up connection pooling, read replicas and logical replication, and configure all your servers to partition their queries into reads vs writes, etc. And 95% of stack overflow answers that you google when you hit a snag are now irrelevant to you because they don't take into account the complexity of your environment.
Performance is absolutely an improvement in productivity and quality of life. It's one that pays dividends every single day, as opposed to a handful of days at the very beginning of a project.
For a fault tolerant system, even if all your traffic could be handled by a single machine, you'd still want some amount of redundancy. So you'd at least have a second node.
>Setting up FAANG-level infrastructure won’t make your company into a FAANG, any more than a cargo cult can summon supply-bearing planes by building fake runways and wooden ATC towers.
This sentence is strange. It uses the term cargo cult as part of an analogy to cargo culting. Wtf?
Anyone who knows what “cargo cult” means doesn’t need the rest of the comparison. If they don’t, then the comparison is senseless.
I think it's a little clever actually. But maybe only because I knew "cargo cult" as typically used here in HN (blindly copy). Reading this made me look into the origination.
For lurkers, this is an excerpt from the wiki page [1]
"A cargo cult is a millenarian belief system in which adherents practice rituals which they believe will cause a more technologically advanced society to deliver goods."
Ironically, if the islanders had built good runways instead of mockups, they would get more airplane arrivals. :)
In Indonesia today, local tribes have a habit of making the jungle and hill runways for aid too short, soft and tree-ringed, resulting in perpetual accidents affecting the long-term viability of cargo and passenger flights.
(Their huts are often made of wrecked airplane wings and fuselages.)
In WW2, the Chinese government charged the US a fortune in gold to build the Chengdu B-29 runway using 70,000 laborers with hand tools, which had to be about twice as wide and twice as long as any previous runways. The Chinese runways turned out to be too far from Japan, so they were largely abandoned after nearer islands were captured.
Period Photo of Chengdu Runway Construction Using Hand Tools