Servers don't scale linearly. It's more likely you'll need, at a minimum, 107-110x as many servers. Between 100 + log(100) and 100 + sqrt(100). So making your code 100x faster saves you more than 100 servers.
I think the arguments never end because it's all fuzzy math. Get too attached to running the whole thing on one server, and your architecture starts making irreversible decisions about where the source of truth is, and you get locked into a single server.
Give up too early, and you spend a lot of energy herding cattle instead of building features. And speed is something you outsource to the guys who write the checks. That's fine if you lock in your vertical, but I've worked on projects that lost out to a more responsive or financially efficient competitor. It really, really sucks.
Yeah, but "making your code 100x faster" might cost significantly more than just adding 100x servers.
As with most things, the only absolute rule is "it depends".
We need a commitment from all the (supposedly) environmentally conscious SV companies to pick optimization if given a choice between making the software faster and just throwing more computers at it. That would be relevant and meaningful, not just the usual "our office AC isn't set as cold as possible, so we are green".
As a policy, though, it makes sense - tax the environmental externalities appropriately to reflect the actual shared cost. If it’s still economic to add the servers once those costs are factored in, then you wouldn’t have any grounds to complain.
Just design the tax regime so that it is high usage users who are paying the high costs, and exclude any usage levels that would impact homes, small businesses, and "normal" acceptable uses.
I agree that tax policy is more complex than just "tax the shit out of it", but every time a tax is proposed, there comes a group of people who argue that taxation is so complex that we can't possibly grok it. That is clearly not the case.
I guess only the very radical greens would want to stomach that backlash...
My comment was aimed more at the hypocrisy of claiming to be green and then skipping the obvious savings in optimizations.
Moreover, there are other considerations. If you're spending too much time on optimization, the company may not survive... leading all the work done to potentially go to waste. What about the environmental impact of waste there?
Cost? No. But it certainly requires more talent.
Unfortunately, programming talent doesn't scale with cost at all. In that it's like a creative endeavor: you can hire 10000 "professional writers" if you want, but that won't get you another Shakespeare.
Which is why I find the "software engineer" moniker silly. Software isn't at all like engineering.
> When I first came up with the term, no one had heard of it before, at least in our world. It was an ongoing joke for a long time. They liked to kid me about my radical ideas. It was a memorable day when one of the most respected hardware gurus explained to everyone in a meeting that he agreed with me that the process of building software should also be considered an engineering discipline, just like with hardware. Not because of his acceptance of the new 'term' per se, but because we had earned his and the acceptance of the others in the room as being in an engineering field in its own right.
There's a considerable debate about when and where this term was "invented", apparently with some oral histories placing it even in the (late?) 1950s.
As for grandparent's comment, I think it was referring to the fact that the nature of the job is completely different. For example, a large portion of a traditional engineer's job deals with fighting the natural world. A bolt may rust away and stop doing what it was doing until then. A source code line in the form of `x = x + 1` never rusts; it keeps doing exactly what it did the day it was written.
That's a forced metaphor as well, not actually an identical process. If "requirements change", that's not the same thing as a bolt rusting away. Surely the requirements for the rusting bolt didn't change, or the requirements for its "dependencies", i.e., the things it's attaching together.
> I've had to deal with some (functional) languages that don't allow you to change the variable where some would allow that sort of modification just fine for your simple example.
I completely fail to see the relevance of that. Or maybe it is somewhat relevant, but in the completely opposite way, i.e., that this phenomenon does not even have an analogy in the physical world, thus underlining the vast differences between software engineering and classical engineering disciplines. (Likewise the dichotomy between a computational procedure and the process generated by said procedure, the latter being constrained in the way you describe, doesn't exist in traditional engineering either.)
So in the end, it is more about attitude, not talent.
The bulk of the work, in my experience, is simpler than the programming part. It's just a culture/priority thing. Like how most of the industry used to not use source control. Now we do, we didn't need to add "talent" to get that done either.
Those computers were an order of magnitude slower. Yet many equivalent apps today run at about the same speed.
Performance is not being prioritized. And I suppose back then, devs had no choice given the resource constraints they had.
I've seen many times devs paying for larger database instances without first even taking a few hours to try and optimize their queries.
...is a scotsman that doesn't exist.
Talented people usually cost a lot to optimize your code to be faster. And the more time they spend optimizing existing code, the less time they spend writing new code.
And faster code in existing places can unlock new business opportunities which can unlock more money.
It doesn’t matter if they’re a 10xer or a 1xer, their time always has a price tag.
American salaries sound great.
Project management methodologies like scrum are implementation-level details, together with platform (user interface, programming language, database, etc.). Software Engineering starts at the feasibility study, followed by detailed design, then comes the implementation phase. Most Software Engineering job posts only list implementation-phase requirements, which I reckon has led to SE being misunderstood.
Servers can also go down, but the 100x faster code will remain the same.
But I agree with the sentiment. In a code base where performance matters, that level of optimization would often be impractical.
Otherwise we would all be writing assembler.
Humans are way, way more expensive than servers in almost every case. Exceptions might be small pieces of code run by millions of devices, but that's far from being the common case.
> Between 100 + log(100) and 100 + sqrt(100).
I bet it doesn't even make the top ten. You have so many other problems by the time you're scaling to hundreds and thousands of servers and beyond. A few percent difference in server count? Whatever. That's still almost perfect scaling, and not going to be your bottleneck or your huge cost center.
The broader point stands though. The more servers you have, the more money a 1% cpu optimization saves you in raw dollar amounts. At a certain scale it always makes sense to optimize your code.
But since so many sites have social aspects now, you need to be able to communicate between accounts, which means interprocess communication.
The Amdahl's law fraction is at least logarithmic to the number of users, but more likely to be proportional to the 'surface area' - sqrt(n). Every new server gets you a little bit less oomph than the one before.
First, even if everything needs to be synchronized, that doesn't mean synchronization is expensive or that it blocks computation from happening. Lots of synchronization isn't even dealt with explicitly by users; it happens in queues in the kernel, in networking hardware, or even in the memory controller when reading from memory.
The closest thing is the database becoming a bottleneck, but that is not because of Amdahl's law or even synchronization; it is mostly about reads, and those are extremely parallel, so when it is a problem it likely comes down to total CPU power. That's why query caching can be so effective: it isn't about synchronization and definitely not about serial parts of a program, it is just about usage.
For connected social media sites, not everything has to be done on a page request. Anything done ahead of time is also done in parallel.
It's true that every extra server adds a bit more overhead because of stuff like caching / load balancing / whatnot, but the extra amount can be tiny even at very large scale, at least for workloads that are parallelizable.
(It'd be fun to talk in details from personal experience instead of in general terms, but you know those pesky NDAs...)
Let's say you bought good hardware and the probability it will fail is x.
Let's say you buy a hundred of those. The probability that at least one of them fails is now roughly 100x. You just increased the probability of failure a hundredfold. You will need a load balancer that can detect failure and stop forwarding to that machine. The load balancer can fail, too. Now it's 101-fold. You also need a separate database if you share state. 102-fold. Make it redundant. 103-fold. Add routers / switches. 105-fold. Make those redundant. 107-fold.
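To put rough numbers on that hundredfold jump, here's a minimal sketch (the 0.1% per-machine failure probability is just an assumed figure for illustration):

    # P(at least one of N independent machines fails) = 1 - (1 - x)**N,
    # which is approximately N*x when x is small.
    def p_any_failure(x, n):
        return 1 - (1 - x) ** n

    x = 0.001  # assumed per-machine failure probability over some window
    print(p_any_failure(x, 1))    # 0.001
    print(p_any_failure(x, 100))  # ~0.095, i.e. roughly a hundredfold increase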
I have never used more than one crappy PC for any project of mine, and some of those have had/have pretty substantial load on them. When I say one server, I mean one PC but with two power supplies and RAID storage. I'm not trying to tempt fate here.
But I have no devops team, no admins juggling failing machines, I don't have racks. I have one slot in a rack somewhere. The machines have been running for years. Every few years, as I approach MTBF of the hardware, I move to a new server. That entails a small downtime.
Result: People use my servers in their "is the internet up" scripts. Because my failure rate is a fraction of what distributed systems have.
In fact, practically all of my downtimes have been software bugs in my code, or me fat-fingering input somewhere. I can't even remember the last time the hardware failed. Not even sure I ever had an actual hardware failure that caused downtime. I once had a raid hdd throwing errors and it had to be replaced. I think that's it.
Add tests to check that you have not inadvertently added a massive inefficiency with a simple change.
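A minimal sketch of what such a test could look like, assuming a made-up build_report function and an arbitrary 200 ms budget:

    import time
    import unittest

    def build_report(rows):
        # placeholder for the code path you're guarding against regressions
        return sorted(rows)

    class PerfRegressionTest(unittest.TestCase):
        def test_report_stays_under_budget(self):
            rows = list(range(100_000))[::-1]
            start = time.perf_counter()
            build_report(rows)
            elapsed = time.perf_counter() - start
            # generous budget so the test only trips on a massive regression,
            # not on ordinary CI noise
            self.assertLess(elapsed, 0.2)

    if __name__ == "__main__":
        unittest.main()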
If you already have a load balancer, you should be able to scale on the server part from 1-100 exactly linearly. Where are your extra 7-10 coming from?
And if you already have a database, you usually just upgrade its specs linearly too. And if god forbid you need to shard... that's perfectly linear too.
For pretty much all practical planning purposes, servers do scale linearly. The equations you're describing seem to imply a deeper hierarchy of server connections, but that's not usually how it works -- servers (whether load balancers, web servers, or databases) just come in bigger pools, not trees.
Where are you getting your log() or sqrt() terms from? Is it some kind of application-specific caching layer or something you've used in the past?
I think this only counts in the simplest of cases? Such as serving HTML straight from memory.
In other cases there are various micro- or macro-services to call, state to communicate over the enterprise service bus or something equivalent, separate database servers to run queries on, et cetera. A single request might hold up several other requests due to badly-thought-out dependencies between services and physical servers. Adding more servers wouldn't scale exactly, and it might be difficult to find out where the next bottleneck is.
And if some external bottleneck exists... that's a separate issue and adding more servers isn't going to solve any problems at all.
I guess I still just don't see it... I know servers don't scale "exactly" in the sense that one service will require another server before another service does, but that doesn't make it nonlinear. It just gives it variance (fluctuating both upwards and downwards) around a line. It doesn't turn it into a curve.
The first thing that came to mind is the following article from back in 2013: https://blog.iron.io/how-we-went-from-30-servers-to-2-go/
And moreover, it takes time to make code faster.
Very realistic scenario is that you have 2 months of runway in the bank and investors have committed to the next round but are holding back the money and still having 3-hour coffees and 2-hour lunches with you about how you can improve your go-to-market strategy instead of letting you get back to work on actually scaling your tech.
Of course, you might want to scale out earlier for other reasons e.g. fault tolerance, if you want to allow different kinds of failures.
There's no yes/no on this. It's all on a slider, a spectrum.
Optimizations that are not premature are a win though, and by making it standard to consider performance issues from the outset, you and your team get better at it, such that it doesn't cost time to default to pretty efficient solutions.
Also, some basic tech choices can often get you large constant-factor wins without any downsides, like using a language that is at least JIT-compiled rather than interpreted.
- "At the end there are a lot of changes all the time to functionality and implementation and optimizations early get wasted very quickly"
- "optimizations are opportunity losses at the beginning of a product"
The question is: which optimizations are premature and which aren't?
Cheap optimizations that are known to work (compiling in release mode, enabling HTTP cache in your web server, using a fast sorting algorithm on large arrays...) are good even if you don't need them right now.
And that, my friends, is a contradiction!
Now if a junior doesn't implement them, did they fail to do a required optimization? And if they did implement them, was that premature optimization?
The preparations you made that turned out to make your life easier, the things you regret not doing, and the things that ended up being a total waste of time.
How about having respect for you user's time? Depending on how many users you have, shaving seconds or milliseconds off your response time will save humanity hundreds, thousands, or millions of hours waiting for your software to do something.
For hackers or product focused devs, making something work is the most important aspect whereas for engineering focused devs, it hurts to see such large inefficiencies that are solvable.
I empathize with the engineer mindset, but definitely align more with the hacker/product mindset.
For a hacker, making something fast and cool is the most important aspect, whereas for engineers, making careful tradeoffs between effort and customer impact is the main focus.
I'm not saying, "engineer bad, hacker good", just that we tend to value "good" code, architecture, performance, etc highly and sometimes that is to our detriment.
People in the latter mindset don't get that providing real value to users is what matters.
Literally everything boils down to it. Even the example of making a faster application, it's literally only useful in that it increases how quickly you generate user value.
At some point once you generate enough value for your end user, the utility will outweigh a given latency problem.
Even making your server costs cheaper with optimization really matters because you can transform the savings into user value that exceeds the cost of optimizing.
So it really doesn't make sense to thumb your nose down at people who are "just sticking stuff they don't understand together" in a vacuum like they tend to do.
You don't know their runway.
You don't know how concrete their business case is.
You don't know their development budget.
The moment you're making a dichotomy between yourself and "those programmers who just put shiny legos they don't understand together", you're demonstrating a lack of understanding of the bigger picture that development fits in.
Because sometimes hiring someone who has little experience outside clicking those legos together is all that allows an idea to exist.
tl;dr: A service that loads with 100 requests instead of 1 because the developer doesn't know better still generates more value to the end user than one that doesn't exist.
But the simple fact is some of those services simply would not exist in another form.
Another way to look at it is if they could exist in a meaningfully faster form easily, competitors would pick up on that.
I think I read this here years ago as an order of development; it may even have been a quote from somebody else, I forget.
A little bit is ok. If you have to. But too much of it and there's no way back other than draining the pool and starting again.
(And the converse is also usually true.)
Working for large banks I see this error made every day. Nobody knows how to optimize, but everybody knows how to request a couple more servers.
Then there is talk of budgets, hiring more people to maintain the behemoth, then introducing some pretty exotic and expensive technology to "optimize operations".
Just yesterday I saw a huge piece of infrastructure, with literally terabytes of RAM and a number of GPUs, costing at least 250k a year, for a batch job that should fit on a single server with no problems.
The point with optimizing before scaling horizontally is to make sure you understand what is going on. Scaling horizontally without a good understanding is going to be a fail-whale shitshow.
I'd also note that high availability needs some horizontal scaling.
Such an HA strategy is valid, but provides no horizontal scalability.
Scalability is the ability to serve more traffic. Scaling horizontally implies adding hardware.
Though the HA example given above includes additional hardware compared to a single primary node, it does not support any additional traffic. Scaling would imply that the addition of the secondary server allows the system to serve more traffic.
Probably cheaper to just pay 250k than to have a couple of engineers take risks modifying a battle-tested bit of code... If it is important enough for the business to justify paying that much for, then you'll need to "do things properly" and not do a hatchet job of a cowboy refactor, but plan things out, test them, and have rollout/rollback plans, project management, etc.
Maintaining larger infrastructure is an ongoing cost as you need to provide x times more resources throughout the life of the application.
Not only that, but now future development will have to be done in the context of a large application. Maybe writing a process for a single node would have been easy, but now you have 100 nodes and it is complex to design your functionality to work correctly on multiple nodes, complex to deploy, complex to resolve issues. Maybe you decide to employ new technology to solve some of those problems, but at the cost of creating other problems and also additional complexity.
Then maybe you need more people -- and this comes with its own inefficiency.
The point really is, most applications could be orders of magnitude more efficient if you just don't make stupid decisions.
Just adding Hadoop costs you 10x; so to go 100 times faster (compared to one machine, properly written without Hadoop) you need 1000 machines.
Common web app scaling is not quite as bad, but it is quite bad. Philip Greenspun showed back in 1997 with the original ArsDigita system that one machine hosting db+web could do better than the then-prevalent multi-machine deployment. That's still correct (and still mostly ignored in scalability discussions because it is boring and requires thinking ahead).
* choose a language that supports async operations so that you aren't waiting for external webrequests or database calls to get back to local code.
A synchronous language can turn your big, beefy machine into one that handles a single concurrent transaction, and _that_, beyond any other pain, can really, really, really reduce your total throughput.
Pick languages that can do other things while they're waiting for IO.
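A toy sketch of why that matters, using Python's asyncio and two simulated 100 ms IO calls standing in for a DB query and an external request:

    import asyncio, time

    async def fake_io_call(name):
        await asyncio.sleep(0.1)  # pretend this is a DB or HTTP round trip
        return name

    async def handle_request():
        # both waits overlap instead of stacking up
        return await asyncio.gather(fake_io_call("db"), fake_io_call("api"))

    start = time.perf_counter()
    asyncio.run(handle_request())
    print(f"{time.perf_counter() - start:.2f}s")  # ~0.1s, not ~0.2s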
At $WORK, we run many thousands of PHP servers with 10-20 processes each using nginx. External calls we don't need the result of immediately get kicked out to an external queue, but we haven't seen any issues with DB calls blocking the whole server.
Long or slow DB connections are an issue for individual request time and DB load though (the biggest DBs we run handle 1000-2000 concurrent connections).
Right now in a microservice model often you have to remind developers that it just isn't ok to have a ~1ms response in a basic case and a 1 second to infinitely blocking case during errors or slow remote services.
People are saying that python and ruby can't do things in parallel in an easy way.
While technically true, it is also pretty much irrelevant, since we have multithreading.
If you prefer some other paradigm then Concurrent Ruby has a swiss army knife of them, including event loops and Go-style CSP.
I think things will improve in ruby 3 but even without ractors it's still possible to do either async IO in MRI or use JRuby for fully parallel ruby.
Ruby's had async IO support for at least 10 years, I think.
Rails itself has been using Concurrent Ruby since at least Rails 5 but I think it had its own concurrency patterns even before that. e.g. ActiveRecord database drivers.
Jesse Storimer's Working with Ruby Threads book from 2013 is still a really good resource for concurrency in ruby.
Multithreading just increases the maximum number of concurrent requests you can handle up to the number of threads you can run.
Async lets the computer do _anything else_ while it's waiting for the request.
If you're handling thousands of concurrent requests, you need thousands of threads available ... or tens or hundreds of threads with async available (your mileage may vary).
It regularly took down the multithreaded python instance.
The solution we went with was to put all the different web requests into 2 Flask servers and then take the output of all those different web requests and make one big request to the Django instance that had all the business logic.
If memory serves, we only had 2 Flask instances handling all the traffic that the old array of Django servers used to handle.
Async pays dividends.
Python, until recently, didn't.
Even then, it's a bit misleading; make it so concurrency, in general, is easy and efficient. Async is immaterial if you have green threads or similar (a la Go or Erlang). Point is, you shouldn't risk a model that ends up unnecessarily blocking/synchronizing execution.
Do you have any examples of this being used with Rails? I'm having a bit of trouble finding examples of both Socketry and Rails being used concurrently; and, I will admit, my Rails knowledge is a bit .... intermediate or less.
That's the earliest I could find off the top of my head - But the other comment responding to you found an even earlier one where they mention the "select" feature that was added sometime around 1998 (22 years ago). The twisted async networking library seems to have been released/started at least 13-14 years ago as well.
I think we all (myself included) kinda forget or underestimate how old Python really is.
If something is CPU bound it is likely to be memory latency unless it is very optimized and running on multiple cores.
If something is not CPU bound neither memory latency or memory bandwidth will be the current bottleneck because both show up as CPU time.
Prior to the new async stuff, though, Python was a horrible language if you had internal microservices bouncing requests off each other - the edge layer would wait a very long time for all those secondary requests to finish before it could respond.
* Don't pick languages with no concurrency support.
If you know more, the thing to do is to share some of what you know, so the rest of us can learn. If you don't have time or don't want to do that, that's fine of course, but then please don't post.
In case you're interested, a more in-depth explanation of this principle is https://news.ycombinator.com/item?id=25130956, from yesterday.
Edit: it looks like you've unfortunately been posting a lot of unsubstantive/flamebait comments. We ban that sort of account because we're trying for curious conversation here. If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and sticking to the rules when posting, we'd be grateful.
> optimizing your app to fulfill a request in 1/10 the time is like adding 9 servers to a cluster.
Not if the request executes on one server. Because then it's like saying that optimizing baby gestation to take place in one month is like adding 8 uteri.
In real life you would also be in for some real "fun" after nine months ;-)
Who the hell is casually optimizing away 90% of the latency in the time it takes to spin up 9 more pods? That's insane. Like the latter is an operation on the scale of minutes.
Also, this is a throughput/latency conflation.
90% of the cost a startup incurs is in the things it could be doing instead, not in the infra. Opportunity cost dominates.
At way smaller companies (2 people), though, this is way harder unless someone has done something really wrong.
I could have done that with Gradle but we also want to support multiple languages (Java, Python, NodeJS & React, Golang).
Bazel's caching abilities are by far the best I've ever worked with because it understands the full source tree. It can also cache test executions. There are some tests in my code that make sure I'm calling out to crypto libraries correctly, and these tests take >30 seconds to execute but almost never change. With bazel I can feel free to write as many of those integration tests as I want since they will only ever be rerun when something affects them (i.e. I change the version of my crypto library).
> Honestly, while the theory is that you can Dockerize your build and you can do remote caching with Bazel I've never seen anyone do it
Yea, you likely don't want to run bazel within a docker container, you want to build a Docker container within bazel. The performance of this way of doing things is much better. My monorepo has >30 services and `docker-compose up --build` was becoming super slow. To address this I've written bazel_compose to obtain the same workflow docker-compose offers you with bazel as your container build system. It also supports a gradual migration scheme and will build both the Dockerfile AND the bazel version of your container to make sure they both start.
Unfortunately the bazel community is mainly populated with companies 100x the size of the average, and as such they already can't run all of their services on their dev machines, so they don't see the value of something like this. This version of bazel_compose is out of sync with HEAD @ caper, but if you're adventurous I'd recommend checking it out. It has extra features to watch all of the source files using ibazel and will automatically build & restart containers (<<10 seconds in my experience) as you edit and save code.
 - https://github.com/bazelbuild/rules_docker
 - https://github.com/CaperAi/bazel_compose
It's also important to note that those efficiencies don't go away on their own. It's more profit/runway until you retire that code.
Opportunity costs exist, but so does spending your money wisely.
I bet there is a not-insignificant number of existing endpoints out there that could easily be improved by a large percentage if someone actually took the time to profile them. Just from the shifting codebase, build-up of complexity, “legacy code”, etc. that no one is probably looking at.
If there was, for example, easy to use developer tooling that automatically identified bottlenecks, the opportunity cost scale could tip towards optimization vs having to pay for and manage more servers.
But the opportunity cost insight is great; optimization only wins if the cost to optimize is low and the amount that gets optimized is high.
At one client where I was doing more IC dev work, it got to the point where they'd just chuck reports over the wall at me and ask me to find efficiencies in these. This was supporting >15K internal users in a massive shared reporting and BI infrastructure, so it was well worth it to have someone spending 10s of hours on optimization on a regular basis. I could routinely find order of magnitude improvements in reporting queries that were already running in production. If bringing things from POC/end user mockup to production, I could frequently find two orders of magnitude of improvement in query time, which reflected directly in latency to report rendering for end users.
That client was not special.
I wouldn't call this sort of work casual, and it's a very different domain than optimizing serving web apps. I would spend days to weeks on such optimization tasks at the client I mentioned above. It might take a few weeks to do serious optimization work on large ETL pipelines, but I routinely find big gains in performance on such projects - usually in the range of 1/2-1/10 run time compared to baseline.
Edit: And I agree: I wouldn't be surprised to find a SELECT * and a call to size() in a moderately sized system.
You would be very, very surprised.
Speaking from experience.
Ten years working on a code base that serves near-StackOverflow levels of traffic and 5% would be a _huge_ win. I don't come across those, or even 3%, very often.
Our app has dozens of routes all seeing hundreds to thousands of requests a minute. To get a performance boost that big, it has to be in some foundational code that's used nearly everywhere, and that code's already been pored over every which way from Sunday.
Occasionally we'll run into a bit of code that sneakily becomes a significant drag on the system as traffic through that code grows slowly over time. A fix might result in a large % load drop, but only because there's some pathological problem, which I hesitate to call an "optimization" rather than a "bug fix".
We did also uncover many significant optimizations after migrating from dedicated hosted bare metal boxes to the cloud, when our network latency assumptions got thrown out the window - but the bulk of those optimizations were simply "cache it".
Other times, it's batch calling the DB ahead of time, rather than calling it for each entry in an array.
edit: these are easy mistakes to make over time.
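For what it's worth, here's a minimal sketch of the per-entry vs. batched pattern, using sqlite3 and a made-up users table purely for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1000)])
    ids = list(range(100))

    # easy mistake: one round trip per entry in the array
    slow = [conn.execute("SELECT name FROM users WHERE id = ?", (i,)).fetchone()
            for i in ids]

    # batched: one query for the whole array
    placeholders = ",".join("?" * len(ids))
    fast = conn.execute(
        f"SELECT id, name FROM users WHERE id IN ({placeholders})", ids).fetchall()

Against a real network-attached database the gap is of course much bigger than against in-memory SQLite, since each per-entry call pays a full round trip.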
You have 10,000 customers, and are getting 1000 a month, and each customer starts accumulating data. Your customers are starting to notice your app is slow, and your boss asks you to work out what it'll take to fix the problem, but the problem is you just don't scale.
You get a pretty clear idea that by the time you hit 50k customers, you're going to need 10x as many servers, and by 100k, you'll need 25 times as many (because your costs per customer keep going up, and lots of things scale logarithmically). Either you're smart enough to know your boss will laugh in your face, or you're not and they do.
So I guess it's time to start optimizing. You cut some n^2 behavior down to nlogn with a smaller C, and you make your servers 4 times faster, but much more importantly you decrease your run rate. Now you only need 2x as many machines for 5x the customers, instead of 10x. That's 80%, but you're getting diminishing returns. Eventually you have to be 99% better because you have a million users and you couldn't possibly return a profit if you needed that much hardware.
E.g. in one of the systems, prior business analysis showed that legally valid “chain of custody” would be an issue, which dictated very specific (and often costly, performance wise) decisions.
And then after the 3rd customer deployment, it turned out that they don’t care - they’d rather pay less for everything, and lose every 20th claim. And after the 20th customer it was proved beyond the shadow of a doubt.
It was stupid to believe what customers said ahead of time, yes. But I don't think that's the kind of stupid you are referring to.
There's perhaps a mistake in shipping that first draft, but usually that lies with management.
The mistake is often keeping it live past the time it is clear what the mistakes you made are and that they will kill you in the long term.
> Consider a fast single-file database like SQLite
Sure, it's nice if your application is optimized enough that it could run on a single server. But it seems to me that actually tying it to a single server, with local storage on that same server, in production, is irresponsible. I sure wouldn't want to have to explain why the application went down, and will come back running a possibly out-of-date DB backup, if that single server suddenly disappears. SQLite may be faster and simpler, but to sleep well at night, there's no substitute for stateless application servers in front of a managed database.
The main issue with SQLite is its insanely loose type checking. Column types are completely ignored, as are foreign key constraints by default. Not a good way to build a robust system. But if your schema is pretty simple and you don't anticipate much data, I don't see a problem with using SQLite.
To manage scalability and maintenance, we run multiple SQLite databases, one per logical type of persistent business entity (e.g. Users.db, Sessions.db, Customers.db, etc.). This allows us to manage schema versioning for each type independent of any others. We have ~25 types that each get their own separate DB. Our migrator is a simple for-loop, but somehow our approach seems even more elegant than Entity Framework because we don't need special unicorn tables to track migration metadata - see: pragma user_version. The part that requires discipline is that we have no hard referential integrity constraints. This is where developers have to make the right choices when designing related entities & data stores. We do not rely on the database/ORM to clean up our modeling for us.
Our backup strategy is to snapshot the entire VM. The biggest motivation for having your application fit on a single box is that you can synchronously snapshot the whole system with a single click. This is far simpler than maintaining a completely separate SQL server instance and worrying about all of the added complexity of backing up 2 (or more) machines. We have yet to encounter a customer who did not have the ability and willingness to use this strategy. If your business application can run on 1 server (and is forecast to do so forever) and you have a RTO/RPO that permits using VM snapshots as backup/restore, then I would strongly recommend considering this type of approach from an engineering perspective (assuming you have the team/skills for it).
Thinking more broadly, since we have committed to this idea of the datastore living on the application server, we could hypothetically build up clustering at the application-level by adding multiple application nodes. This would probably be better for us anyways, because we really only have 2-3 entities that we absolutely must have synchronously replicated across all nodes. A heavy-handed SQL Server cluster approach is way overkill when we can just swap to GUID keys on our sessions/state and pull consensus to update important settings, transactions or permissions.
And at the broadest scope, I still feel like most developers vastly underestimate just how fast a computer can do things in a well-optimized domain. SQLite is substantially faster than SQL Server in the single-node case. You will never get lower latency than by having your database running in the same process as your business application. Latency is the biggest devil when it comes to dealing with transaction throughput. If I can get a user request out the door in 100 microseconds vs 5 milliseconds, it makes a shitload of difference when I am pushing thousands of these per second.
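For the curious, a minimal sketch of what a user_version-driven for-loop migrator can look like (the entity names and migration SQL are invented for illustration, not their actual code):

    import sqlite3

    MIGRATIONS = {
        "Users.db": [
            "CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)",
            "ALTER TABLE users ADD COLUMN email TEXT",
        ],
        "Sessions.db": [
            "CREATE TABLE sessions (id TEXT PRIMARY KEY, user_id TEXT)",
        ],
    }

    for db_file, migrations in MIGRATIONS.items():
        conn = sqlite3.connect(db_file)
        version = conn.execute("PRAGMA user_version").fetchone()[0]
        for i, sql in enumerate(migrations[version:], start=version + 1):
            conn.execute(sql)
            conn.execute(f"PRAGMA user_version = {i}")  # no extra metadata table needed
        conn.commit()
        conn.close()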
Further, https://en.wikipedia.org/wiki/Gustafson%27s_law may be more appropriate here as it is common for the app requirements grow with the performance increase of the compute hardware.
Edit: bottom line is that the conclusions of the article are dangerous. Reducing the controller method exec time by 10x is not likely to allow your system to process 10x RPS by itself, the cost and product backlog delay caused by a perf optimisation project may not be acceptable etc. But beware that slow code in the critical path of your distributed/parallel system such as kernel code, event loop code, transaction commit code, distributed consensus protocol implementation can make your system unscalable beyond N factor (whether it is cores or servers). You can surely add 100xN nodes but it won't help.
Optimizing something to run 100x faster avoids needing more servers, thus avoiding the synchronization from splitting things up to them.
That synchronization by the way is not actually done by the servers, it would be done by dedicated networking hardware. Even on a single computer, for web server programs the synchronization is not really happening in the user space programs, it is happening in the in kernel for the IO to and from disk and to/from the network.
However, I think you are right that on small number of servers speeding up parallel part can have a more immediate effect. I did a small calculation just to check the numbers. Assume you have a server that can do 100 RPS. You can either speed up the parallel code 2x (let's be realistic here and stop this 100x nonsense) and bring the RPS almost to 200 or you can take that serial code that is just a fraction and reduce it by half, so that f=0.995 and not 0.99 (and adding a marginal RPS increase to 102 RPS).
Here is the plot: https://imgur.com/a/7w1hms9
Edit: read http://www.frankmcsherry.org/assets/COST.pdf as recommended in the upvoted comment above!
> non-parallelizable part
Which part is that? Synchronization does not have to be expensive. The original paper had a theoretical 'serial' part of a program, but synchronization is different in that it can be very fast and doesn't tie up other resources. For a web server the synchronization is in IO, which is being handled by queues. It doesn't stall cores, processes or threads can put their data in a queue and another core can handle it.
> You can either speed up the parallel code 2x (let's be realistic here and stop this 100x nonsense)
Far from it. If python is translated to C++ directly you would already have a massive speedup (I would guess at least 20x, maybe more). If memory allocations are minimized you get another huge (7x) speedup on top of that. If larger chunks are dealt with at one time you get another huge speedup. I think 100x would be common.
Amdahl's Law is really about diminishing returns when there are significant places that need to be serial. Any emulator can be a good example. There aren't nearly as many scenarios that have to be serial as most people think.
A task that we have (if we want to formalize it for analysis via Amdahl's law) is to process 1MM requests on 1000 web servers and a single SQL server behind them. What we are after is how much faster can this setup process 1MM requests compared to a setup with 1 web server and 1 SQL server (1 SQL server is used here to exemplify the sequential part of the request handling). Indirectly from this calculation you can derive the increase in overall system RPS after you run a benchmark actually pushing 1MM requests through 1000 servers vs 1 server.
If you have a line of code `sqlUpdateTransaction.commitBlockingWait()`, and it takes 1% of your request handler execution time, then even if you add 1000 web servers but have only 1 SQL server machine behind them (a little bit oversimplifying here as contention is likely to degrade performance further, assume we are able to scale the SQL vertically just so that the call above always takes 1% of the request handler time for the sake of simplicity), your system will not process 1MM requests more than 91x faster than a setup with a single web server and a single SQL server. See https://www.wolframalpha.com/input/?i=1%2F%28%281-0.99%29%2B.... And no amount of green threads will allow you to outsmart the Amdahl's law. The only thing you can do is to return 200 OK before the transaction is committed, which would bring sequential part of the handler code from 1% to (nearly) 0% (and arguably be an unacceptable method of performance optimisation).
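Plugging those numbers straight into Amdahl's formula (p = parallelizable fraction, N = number of web servers):

    def amdahl_speedup(p, n):
        # Amdahl's law: speedup = 1 / ((1 - p) + p / n)
        return 1 / ((1 - p) + p / n)

    print(amdahl_speedup(0.99, 1000))   # ~91x, the cap mentioned above
    print(amdahl_speedup(0.995, 1000))  # ~167x once the serial part is halved

Halving the 1% serial part buys more than doubling the number of web servers would (~95x at N=2000 with p=0.99).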
Ignoring all the wild assumptions and made up numbers here, this again has nothing to do with Amdahl's law because the "serial" part is unnecessary synchronization and waiting, not unavoidable serial computation.
If you have a thousand web servers, having all of them wait to sync would be the exact thing anyone spending millions on servers would work to avoid. To start at the obvious bare basics, if only one computer can sync at one time, you might as well just use one computer.
Computers are queues and buffers all the way down and all the way up to the network level. A single computer or core merging data is not Amdahl's law. There is no reason computation resources have to wait on some other resource synchronizing, and this is not serial computation, which is what Amdahl's law is about.
The question isn't "optimize my code" or "add more servers", it's both, and determining when it's appropriate to do one or the other at a given point in time based on current load and expected future trends. But generally, "optimize my code" can be pretty low hanging fruit in complex systems, because if you lack systems people, you end up with everyone focused only on their specific component and nothing else.
The biggest mistake I see is people spending a lot of time designing for horizontal scalability and then running a cluster of under-powered servers when a single powerful machine would do just fine.
There are very few things that are easier in a cluster (uptime perhaps?) and many that are easier on a single machine. Once you add the requirement of having to work in a cluster, every new feature will have to respect that. If your system is doing anything interesting you'll be hitting all the distributed computing dilemmas pretty fast.
I feel compelled to bring this article up, especially since last time it was brought up on HN people seemed sure in the comments that it was not a common sentiment.
I agree with the gist of this article, but I think if you are not a FAANG as the article suggests, then you probably also are not at a scale where your carbon impact is actually measurable...
It's not hard to have a startup that does use 100kW of servers and maybe about 150kW total-- HVAC, networking, UPS inefficiencies, etc.
There's about 450 grams of CO2 emissions per kilowatt-hour from typical electrical generation. Therefore this is about 600 tonnes of CO2-- not counting other lifecycle costs.
On the one hand, this is a tiny smidgen of overall CO2 emissions. On the other hand, this is about 80 households' worth of CO2 emissions.
As a developer, making this kind of thing 1% more efficient is the same magnitude as completely eliminating your home carbon footprint, and is a whole hell of a lot more plausible to do.
I know offsets are "cheap" at $10 per tonne or whatever as the article says. On the other hand, please don't really assume that buying $6k of offsets really does as much good as eliminating 600 tonnes of emissions for realsies.
A typical CPU in a cloud datacenter is maybe only 10% utilised. That's mostly due to unsold capacity and poor bin packing on many levels (user has an oversized kube cluster, the cloud provider has spare machines, etc.). Many cloud machines just sit idle for years on end because someone has forgotten about them. In many cases, those idle resources are not used for lower priority jobs either, since they are being paid for by a customer.
You could also skip the machine reservation and get better utilization by using GAE or similar.
Maybe packed in with other public cloud things? I thought the partitioning between public and private resources was done at a pretty coarse level, but might misremember.
IMO more relevant is that Google public cloud resources are carbon neutral, at least wrt electricity. (Through buying green power and emissions credits, to the degree you believe the prices and the models I guess.)
Bursting to 100k servers for 24 hours rather than properly engineering the code is a trade-off that frequently gets suggested now that the cloud is an option.
Slice 25% off the numbers I said, which assumed an overhead of 50% instead of your suggested 10%. It doesn't change the point at all.
As a parallel, let's look at diets (the Independent gives higher numbers). Lots of people reduce their meat intake to reduce their personal carbon footprint. But the average diet is about 2000 kgCO2eq/yr (3000 via the Independent). A vegetarian diet is about 1200kg and a vegan one is around 250kg (potentially a dubious claim, but it is for sure less than the mean and less than a vegetarian diet). 10 websites changing to local fonts is like a single person changing to a vegetarian diet.
The logic here is that you could spend $20-$30/yr and completely eliminate your dietary footprint, which is much easier than changing diets. So is what these people are doing useless? No. It helps because of the aggregate. It helps because it makes people conscious of the problem and of how their choices affect the world. And yeah, big corporations should be doing more, but even if they don't, that doesn't mean you should give up and do nothing.
But I also probably wouldn’t pick it up on principle so the person who dropped it could come back and get it.
I don’t know how to extend your analogy to account for my process though.
There are ~7.8 billion people on the planet.
If you vote (let's say everyone votes for the sake of the argument), you're increasing the percentage of people voting by ~0.0000000128%.
This change is just as "meaningless" as that one, so... don't go vote?
At some point you need to stop focusing on stupid shit when planting a tree does far more.
Sure, don't expect to change the world on it alone. But if every industry takes on an awareness of this sort of thing, and each niche within an industry discovers some way to shave a fraction of % off it's emissions, it will all add up significantly.
Doing this kind of thing is always going to be a game of inches, not miles. We can't say "It won't even get us a mile, why should we bother?" Be realistic about it by all means, but take the easy inches when they present themselves.
I know if I tried to get our people to shut down half of our current infrastructure people (who have no real knowledge of any actual load numbers) would never allow that...
So much political nonsense... It's been a long week :)
If a review stops me from buying the wrong product, what's the footprint of that?
Sure, Django might be easier to start off with, but if you want to optimize your app, there is only so much you can do. However, the non-optimized performance of an ASP.NET core web app (developed as per Microsoft recommendations) is orders of magnitude faster than any Django App, without any explicit optimizations.
It also helps if you can split your app into an API and a client app, as APIs consume a lot less resources than traditional web apps.
It makes sense, going forward in a cloud dominant world, that the traditional web app development process, where HTML is passed around over the wire, a new set of HTML for every page, has to die.
It's imperative to replace it with modern stacks, with the client separated from the API.
The boilerplate code for basic web apps often dwarfs the actual content of a basic web page. You could also make the argument that many things designed as web apps have no business running in a browser and should be written as standalone applications, but what's "best" in some sense isn't always the most pragmatic thing.
We shouldn't focus so much on what's best for us (at least in operating costs) that we sacrifice what's best for the end-user. In many cases, a conventional server-rendered web application is what's best for the end-user. If we develop one of those by combining an API backend with a separate server-rendered application consuming it, we've likely added extraneous complexity and resource consumption to get back to what we used to have in an integrated server-rendered web app.
Also, as DHH has written, an integrated system, a.k.a. monolith, is best for programmer productivity and especially for getting the most out of a small team of generalists. I want to use a framework whose authors and community are opinionated about this. It seems to me that the Phoenix framework and the broader Elixir community fall into this category, while still delivering pretty good performance. I've also taken a hard look at ASP.NET Core, but I don't think it's as focused on the integrated approach. So, after being undecided for too long, I'm using Phoenix for my new project.
Just making one server run well won't do much to your resume. Creating some distributed micro service buzz word bingo monstrosity on AWS on the other hand looks impressive.
It's all a question of incentives.
That's pretty much true, because doing that and that only will also mostly get you hired at a place which has a single server (like say "use SQLite" instead of RDS).
On the other hand, if you can justify what you did to make a single request better (as in, "what I did meant that a single request came back in 40ms instead of 120", like that mixed API example - like if you bypassed the standard 1+N problems), then I'd be very keen to pick out what that person did.
A lot of work involved for the latter is also the same for the first, but it is a difference in attitude towards user relevant functionality which I've noticed in the best people I've worked with.
That is not to say don't use SQLite, but more along the lines of this is a simplicity hack, I can make a complex working system out of this simple system, because I used the right abstractions for that bit (like say, JDBC or PDO - maybe not a full on ORM).
Sounds impressive until you hear that it’s serving maybe 300k requests a day.
This cannot be correct. If you optimize 1/10, and then optimize to 1/10 again, that's the same as optimizing to 1/100. Does this "add" 18 servers or 99?
If you optimize to 1/10, then your cluster can handle 10x the load, which is the same effect you would get by simply having 10x the number of servers (assuming your app is perfectly parallelizable).
The article's quote is only correct if the "cluster" starts as just a single machine.
- I have x servers. I optimize to 1/10th of the time. I still have x servers but now have (1+9)x = 10x virtual servers.
- Now, I optimize again to 1/10th of the time. I have 10x virtual servers, so (1+9)(10x) = (1+99)x = 100x virtual servers.
That's like going from needing a small server to something you could wear as a watch. (So to speak.)
In most cases this is the wrong way of thinking. It isn't that it's bad or wrong, but rather that it's dated and expensive. This is the 2000-era dot-bomb logic of "get big fast" and "data is king". It's like thinking in terms of Facebook instead of BitTorrent. At this point, if you are not already established or working on something extremely original, you have probably missed your shot and will be crowded out by the incumbents.
Current programming paradigms indicate two paths forward of which one is substantially less expensive than the other: distribution and concurrency.
In the distribution model most of your costs are up front in the application. This is a service oriented approach but without a central service, rather a pool of nodes that intelligently communicate task, file, and event queues. The cost to scale is divested from the cost of application, which is the biggest difference between this approach and thinking in terms of huge traffic or server/data hoarding. Since the application is divested from cost to scale all that is required to compete with the large incumbents, at scale, is market penetration. Marketing is cheaper than a data center.
The concurrency model requires a central service but does not operate as a central server. Each connection/session is a parallel child processing unit, such as an event loop. This approach requires less costs up front since there is still a central connection point to manage, but there is still a cost to scale even though much of that cost is offloaded from data management to increased processing overhead.
If you optimize after the fact, you probably already made bad design decisions earlier. Those usually cannot be corrected by another optimization pass over the code.
It is much more effective and efficient to have a good plan before you start, and make sure not to write inefficient code in the beginning.
It is also much more effective to take the lessons you learned from the first round, throw the code from that round away, and write a new version. However that is only an option if you modularized your project well, so that small modules are replacable without endangering the rest of the project. Writing the new version will go much faster because you now understand the problem better, and it will produce better code because you get rid of questionable design decisions that hold you back in the long term.
Think about what your legacy in this world will be.
Leave the world with better software than you found it with!
In fact I am optimizing my own code, too.
But I only get ~10% speedup, tops. Because perf is decided at the design level, in the architecture, data structures and algorithms. In fact, if you get massive speedups in your optimization phase, I would view that as a sign that your process has issues. There should not be that much potential left after you wrote the code down.
Elite fit on one floppy disk. Do you think they got there by taking a slow implementation and optimizing it?
Have you ever used Turbo Pascal? You think you could get that kind of speed by taking an existing pascal compiler and optimizing it a bit?
As a subtext I think your general perspective is as an experienced developer and not all developers have the experience to write runtime efficient code or components with clean modularized interfaces, so it won't work for everyone.
Don't get me wrong - I have been a beneficiary of their startup programs, and it's amazing for companies not to worry about compute for a year or two (you're never really using 100K anyway).
That said, I have seen multiple companies get into the hole of not worrying about optimization (at all - who needs to when you have a quarter of a million in credits), and locking themselves into a massively complex microarchitecture by the time that first real bill rolls around. AWS knows what they're doing, do your best to work them rather than the other way around.
Anyone have a link to good write-up on tuning Apache/nginx and Postgres/MySQL for a web app running on a $10-20 typical VM?
For a lot of cases varnish (or a different reverse-proxy cache) will shave off 50% of your traffic from ever hitting the servers. It depends on your percentage of guests vs. sessioned users and reads vs. writes though.
After that using a cache like redis or memcached for things like user sessions/permissions and commonly fetched data significantly reduce the load on your database.
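The usual pattern, sketched with redis-py and a hypothetical permissions lookup (the function and key names are made up, and a local Redis instance is assumed):

    import json
    import redis

    r = redis.Redis()  # assumes Redis is running locally

    def load_permissions_from_db(user_id):
        # placeholder standing in for the real (expensive) database query
        return ["read", "write"]

    def get_permissions(user_id):
        cached = r.get(f"perms:{user_id}")
        if cached is not None:
            return json.loads(cached)  # served from cache, no DB hit
        perms = load_permissions_from_db(user_id)
        r.setex(f"perms:{user_id}", 300, json.dumps(perms))  # cache for 5 minutes
        return perms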
Focus on your app instead :)
To be fair, a 100x optimization is not something easy, and if it was, I would probably blame the development team.
Your web app is a whole lot of code and components in series. A 100x optimization would mean that every single part in there can get 100 times faster, or that there is a component that takes maybe 95% of the time which can be optimized 200x, while the rest of the stack takes 5% of the time and can be optimized 10x.
I do believe that often it's the other way around, instead of a very fast first iteration, people go and build the distributed version that can scale 100x but takes the performance hit that all distributed systems have for network, syncing, etc.
Most of us are using off-the-shelf components, so the space for optimization in our own code isn't that big (unless you are a novice developer, but compilers go a long way these days). Thus it may come down to choosing the correct off-the-shelf components, like maybe SQLite instead of a NoSQL external datastore.
Optimizing parts that are already reasonably efficient is much harder, but unless you spent a lot of time optimizing already there are likely plenty of low hanging fruit.
Look at your salary, add ~50% for corporate overhead, and compare this to the savings you'd generate using market pricing of cloud resources (even if you don't use cloud). Can you beat this with, say, a 3-year discounting plan for the savings? In a lot of cases the answer actually is "clearly not".
Sometimes you can eke out very sizeable gains, but always do the capitalist homework first (and the engineering homework of profiling second) before trying any fixes.
Also, if it's performance that you need look into "stupid" solutions first. Can you move the DB to flash with just a few config options? Can you mlock() it in RAM? You'd be surprised how many things fit in RAM, especially if your DB is a managed distributed solution, and still be very cheap compared to engineering salaries.
1) There are plenty of fast languages out there that are suitable for high productivity environments. Go, Kotlin, Scala, F#, C#, even Java, are all great languages for developing web apps and services. And they'll all beat the pants off of your PHP, Ruby, or Python framework of choice.
2) Many of the frameworks that get you up and running extremely quickly really only save you a day or two at the very beginning before they level out and are more comparable with other frameworks. Sure, `rails new myapp` and your first few migrations will save you hours over something like Play Framework, but after that, you don't really have any more jaw-dropping productivity tools left to use. I once had a conversation with a die-hard Rails enthusiast tell me about how he decided he would never go back after using his first ActiveRecord migration, and realizing how much code was generated for him. I asked him how much time he saved, and he said, with pride, "LIKE 4 HOURS!". Meanwhile, he probably spent 2-3 hours every day writing tests and debugging errors that C# or Scala would never even let compile in the first place.
3) It's not too long before you have to start thinking about performance anyway because your high latencies are a terrible user experience. This goes for client-side software as well. My 8 year old doesn't know shit about software, but he knows that he absolutely hates Microsoft Teams and would much rather his school use Zoom...because the bloat and terrible performance are a massive disappointment in user experience, especially on cheap school-provided laptops.
4) People really undervalue the quality of life and productivity improvement that comes from managing one server versus ten. Even with the cloud, which supposedly takes care of those headaches, you still have to worry about load balancing, health monitoring, auto scaling, complex caching, etc. And even if you could autoscale your web server worry-free, you're soon gonna have so many connections to your database open that you're gonna have to worry about setting up connection pooling, read replicas and logical replication, and configure all your servers to partition their queries into reads vs writes, etc. And 95% of stack overflow answers that you google when you hit a snag are now irrelevant to you because they don't take into account the complexity of your environment.
Performance is absolutely an improvement in productivity and quality of life. It's one that pays dividends every single day, as opposed to a handful of days at the very beginning of a project.
This sentence is strange. It uses the term cargo cult as part of an analogy to cargo culting. Wtf?
Anyone who knows what “cargo cult” means doesn’t need the rest of the comparison. If they don’t, then the comparison is senseless.
For lurkers, this is an excerpt from the wiki page 
"A cargo cult is a millenarian belief system in which adherents practice rituals which they believe will cause a more technologically advanced society to deliver goods."
In Indonesia today, local tribes have a habit of making the jungle and hill runways for aid too short, soft and tree-ringed, resulting in perpetual accidents affecting the long-term viability of cargo and passenger flights.
(Their huts are often made of wrecked airplane wings and fuselages.)
In WW2, the Chinese government charged the US a fortune in gold to build the Chengdu B-29 runway using 70,000 laborers with hand tools, which had to be about twice as wide and twice as long as any previous runways. The Chinese runways turned out to be too far from Japan, so they were largely abandoned after nearer islands were captured.
Period Photo of Chengdu Runway Construction Using Hand Tools
(The infamous Guadalcanal runway was built years earlier with modern equipment captured by surprise from the Japanese.)
Source: commercially-rated airplane pilot, WW2 student.