Does it scale? Who cares (2011) (jacquesmattheij.com)
431 points by ne01 197 days ago | 274 comments

Couldn't agree with this article more.

I built the biggest social network to come out of India from 2006-2009. It was like Twitter but over text messaging. At its peak it had 50M+ users and sent 1B+ text messages in a day.

When I started, the app was on a single machine. I didn't know a lot about databases and scaling. I didn't even know what database indexes were or what their benefits are.

Just built the basic product over a weekend and launched. Timeline after that whenever the web server exhausted all the JVM threads trying to serve requests:

1. 1 month - 20k users - learnt about indexes and created indexes.

2. 3 months - 500k users - Realized MyISAM is a bad fit for mutable tables. Converted the tables to InnoDB. Increased the number of JVM threads available to Tomcat.

3. 9 months - 5M users - Realized that the default MySQL config is for a desktop and allocates just 64MB RAM to the database. Set up the MySQL configs. 2 application servers now.

4. 18 months - 15M users - Tuned MySQL even more. Optimized JDBC connector to cache MySQL prepared statements.

5. 36 months - 45M users - Split database by having different tables on different machines.
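To make the first step of that timeline concrete, here is a minimal sketch of what adding an index changes. It uses Python's built-in SQLite as a stand-in (the service described above ran MySQL), and the `messages` table, its columns, and the index name are hypothetical, chosen only for illustration.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return the human-readable plan steps SQLite reports for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the plan description.
    return [row[-1] for row in rows]

conn = sqlite3.connect(":memory:")
# Hypothetical table: one row per text message, looked up by user.
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages (user_id, body) VALUES (?, ?)",
    [(i % 1000, "hello") for i in range(10000)],
)

lookup = "SELECT body FROM messages WHERE user_id = ?"

# Without an index the database has to scan every row.
plan_before = query_plan(conn, lookup, (42,))

# With an index on user_id it can jump straight to the matching rows.
conn.execute("CREATE INDEX idx_messages_user ON messages (user_id)")
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The exact wording of the plan differs between SQLite versions, but the first plan should report a full scan and the second a search using `idx_messages_user`. MySQL exposes the same information through its own `EXPLAIN` statement.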

I had no idea or previous experience about any of these issues. However I always had enough notice to fix issues. Worked really hard, learnt along the way and was always able to find a way to scale the service.

I know of absolutely no service which failed because it couldn't scale. First focus on building what people love. If people love your product, they will put up with the growing pains (e.g. Twitter used to be down a lot!).

Because of my previous experience, I can now build and launch a highly scalable service at launch. However the reason I do this is that it is faster for me to do it - not because I am building it for scale.

Launch as soon as you can. Iterate as fast as you can. Time is the only currency you have which can't be earned and only spent. Spend it wisely.

Edited: formatting

> I know of absolutely no service which failed because it couldn't scale. First focus on building what people love. If people love your product, they will put up with the growing pains (e.g. Twitter used to be down a lot!).

They haven't failed (yet), but I think gitlab.com would be a lot bigger if they scaled faster. Lots of people have rejected it because it was too slow.

Would it be as big if they had stopped to consider all scaling issues at the beginning?

In 2015 we only had the people to focus on one thing. We opted to focus on the downloadable product. This grew our revenue much faster than GitLab.com. We started to focus on .Com in 2016 but in hindsight not aggressively enough. Now it is one of our top three priorities.

You could argue it would be bigger. Instead of working around Travis to connect to repositories, the repositories are there. That's the biggest sell for me, anyway.

(Note: I agree w/ the OP + commenter whole-heartedly)

The other counter-example I can think of is Myspace who frequently had tons of server errors and page load errors alongside failing to iterate their product rapidly.

gitlab.com is not their core business. It's mostly a demo.

In 2015 we saw it as our demo. In 2017 we see it as one of our two main products and are improving performance accordingly. 95%+ of our revenue is still coming from licenses instead of GitLab.com. But over time this will change.

That's not how people perceive it, though.

How people perceive it is not relevant if it actually is a side business

How people perceive it may be impacting your main business.

Only they would have the data to be able to determine that, if anyone would have that data at all.

I built a service (~10mm users at peak) which was designed from the ground up to scale, around 2002. When the numbers did grow we just sat back and watched it pretty much.

Even given this, I completely agree with you. That's why I now develop in ruby. I'll take developer productivity over performance any day.

It's the old saying - 'nice problem to have'.

> That's why I now develop in ruby. I'll take developer productivity over performance any day.

In my experience, dynamically typed languages don't do well for developer productivity.

> In my experience, dynamically typed languages don't do well for developer productivity.

At least one empirical study suggests static typing does not improve programmer productivity. Source: http://courses.cs.washington.edu/courses/cse590n/10au/hanenb...

This is just one of the many common industry "wisdoms" that collapse under scientific scrutiny.

This study looks bad across the board. For one, they're writing a parser for a context-free grammar. That's not a good test for the benefits of static typing which often happen with very different types of data mixing, modifying large programs with less breakage, and so on. Contrary to your statement, there's a lot of reports and articles out there where switching to formal specs or a stronger type system reduced the number of defects. The first were in high-assurance where formal specs caught inconsistencies left and right. Later on, military and safety-critical firms looked at defect rates of C, C++, and Ada to find Ada's type system knocked them to half. This counter-example is about something so straight-forward there's many tools that can automate it with one formally-verified for correctness (w/ strongly typed language).

Next, I look at the languages themselves. Java's type system is so weak as to barely have any benefit. Terrible choice to test static typing. The control language is then a language based on Smalltalk... the original, great language for dynamic, OOP, and complex programs... that's even simpler than Java & could possibly boost productivity just due to easier review/revision. A proper comparison would be with something like Ocaml, Haskell, Ada/SPARK, or Rust that really uses a type-system with the high-level & composition benefits of something like Smalltalk. Makes me lean toward Ocaml or Haskell rather than the others, which slow productivity through fights with the compiler.

This paper doesn't prove anything for or against the topic. It's quite weak. The authors should do their next one on the kind of program that others in the field reported benefited from static, strong typing. Then, attempt an experiment on same type of software between a productive language with powerful, type system and a productive, dynamic language of similar complexity. Alternatively, try to modify similarly large applications in parts that touch a lot of the code base without breakage for both static and dynamic languages. Then, we might learn something.

That's a cool experiment, but from a quick read they are creating a compiler which is a bit different than web development. Looking at the related work all seem to contradict this finding and suggest preference to statically typed languages, such as one that looked at API development.

From my own experience with dynamic types it's not the dev time that increases but maintenance time. When you have a team working on an evolving codebase with data types distributed in caches, databases, and code it seems inevitable that you will have an error if you don't do any type checks. Even good linters can miss something that would have been easy to catch with static type checks.

"That's a cool experiment, but from a quick read they are creating a compiler which is a bit different than web development."

They're doing a lot of things wrong. See my reply.

Having worked on a corporate Python platform with tens of thousands of developers and hundreds of applications running hundreds of thousands of instances on a many-million-line common core code base, I beg to differ.

We were sometimes able to go from concept to deployed application in a single day.

Having worked on a corporate Python platform with tens of thousands of developers and thousands of applications running on thousands of instances on a common core code base of millions of lines, I beg to differ.

It is easy to ship any proof of concept. It is hard to maintain, it is riddled with performance issues, and it is almost impossible to refactor because Python has zero refactoring support and no compiler to help you catch any error.

What you save on the initial rollout, you lose tenfold in the long term.

But that is basically the point of the whole article and this thread, that it's great for proof of concept that morphs into production when needed. If you can deploy in one day that means you can test 5 ideas/week until one catches on (hopefully not that long). Once one catches on, you can actually put more effort into it and optimize it as needed and not the other way around.

You can't try 5 products or features a week on real users. They don't keep up.

If you're looking to Facebook for your user experience guidelines, you're going to end up angering a lot of actual users.

While making millions to billions dominating a market I own. So, sure. In other circumstances it would just teach you that you can experiment with more than a few things a week.

Facebook have thousands of products and billions of users. You do not ;)

Now that's a better point. Obviously, the number of changes is tied to a company's circumstances.

> We were sometimes able to go from concept to deployed application in a single day.

This just makes me shudder a little bit. I've seen such things before, even recently at my current job. These kinds of applications tend to be a nightmarish mix of unmaintainably overengineered and underdesigned terrible code.

It's not ideal certainly, most such cases would be either a variation of an existing application or something that isn't business facing such as a compliance or operational metrics report.

The key thing is that the testing, release and deployment tooling were not a significant bottleneck. You could develop the code, perform testing in a production like environment and sign it off with a minimum of down time in between waiting for compilation, package building, etc. So a lot of that was down to Python being interpreted.

They do well for single-person code bases, but velocity scales badly with the number of developers in my experience.

I have the opposite experience and I did a lot of back and forth between statically typed and dynamically typed languages over the past decade so I'm not biased in either direction.

And in my experience the exact opposite is true.

That's why it's called an anecdote; not data.

Exactly, I never understand the static vs dynamic flamewars. Most of the issues usually presented stem from poor developers or poor development processes/environments/tools.

Poor code written in JavaScript won't become pretty just by translating to Java, and vice versa. That I've seen anyway.

Many of the "poor development processes/environments/tools" issues are related to choosing dynamic languages in a "ship first, design later (maybe)" paradigm, though.

"Ship first, design later" is completely orthogonal to what language one's using. Using a language that forces one to write int or str before the variables doesn't prevent anyone from writing software without having a design.

It's not even part of the "static vs dynamic" argument; one can rush things in Java or in C++ just as easily. The only difference is that Java might allow doing it slightly faster.

As someone smarter than me put it: "Bad code is language agnostic"

Yes, catching type errors eliminates one category of error but, generally, the type of programmers that make that kind of error frequently will not constrain themselves to just one category of error.

In many situations these days, I'd argue you don't have to trade one for the other.

>I built a service (~10mm users at peak) which was designed from the ground up to scale, around 2002. When the numbers did grow we just sat back and watched it pretty much.

>Even given this, I completely agree with you. That's why I now develop in ruby.

Ouch that is a pretty big burn.

> I know of absolutely no service which failed because it couldn't scale.

I would say this is because of the simple fact of visibility and adoption. You've probably never heard of these services because they ground to a halt with a mere 1000 users, so they never got mainstream enough to be recognised as a viable service. It is a bit like how no one remembers the dozens of people who failed to achieve sustained powered flight before the Wright brothers. That doesn't mean there weren't any; scalability, technical or planning issues killed those efforts before anyone knew of them.

Some 'scalability' issues are inherent to your initial design, and not just the choice or configuration of your hardware/software platforms.

For instance, what if you were building a contact database of some sort. At first, you may have things like 'Phone Number' and 'Email Address' as part of the 'Person' database. Then, as your service gets popular, you notice people asking for extra contact like Twitter handles, LinkedIn pages etc. So you start adding those to your Person table as extra columns.

Eventually, you realise that you should have thought more about this at the outset and have contact details stored in another data table altogether, linked to a 'Contact Type' table and related back to the Person table. This would have been mitigated at the start via better database design and catering for eventualities that you might never have foreseen. Migrating the original database to normalise it is a massive effort in its own right, and probably will take more time, cause more outage time, and cause more bugs in existing code than designing for that eventuality in the first place.

Even if 99% of your users only ever enter Phone and Email contact details, the second option, designed for scalability, will still handle that without a sweat, and 'scaling' to meet additional demands later is merely a matter of adding new contact types in the 'Contact Type' data table so that they become an extra option for all your users.
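As a rough sketch of the normalised layout described above, again using Python's built-in SQLite: the table names follow the comment ('Person', 'Contact Type', plus a linking contact table), while the column names and the `add_contact` / `contacts_for` helpers are hypothetical and only there to illustrate the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE contact_type (
    id    INTEGER PRIMARY KEY,
    label TEXT NOT NULL UNIQUE
);
-- One row per contact detail, instead of one column per kind of detail.
CREATE TABLE contact (
    person_id INTEGER NOT NULL REFERENCES person(id),
    type_id   INTEGER NOT NULL REFERENCES contact_type(id),
    value     TEXT NOT NULL
);
""")

def add_contact(conn, person_id, label, value):
    """Attach a contact detail, creating the contact type on first use."""
    conn.execute(
        "INSERT OR IGNORE INTO contact_type (label) VALUES (?)", (label,)
    )
    type_id = conn.execute(
        "SELECT id FROM contact_type WHERE label = ?", (label,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO contact (person_id, type_id, value) VALUES (?, ?, ?)",
        (person_id, type_id, value),
    )

def contacts_for(conn, person_id):
    """Return {contact type label: value} for one person."""
    rows = conn.execute(
        "SELECT ct.label, c.value FROM contact c "
        "JOIN contact_type ct ON ct.id = c.type_id "
        "WHERE c.person_id = ?",
        (person_id,),
    ).fetchall()
    return dict(rows)

person_id = conn.execute(
    "INSERT INTO person (name) VALUES (?)", ("Ada",)
).lastrowid
add_contact(conn, person_id, "email", "ada@example.com")
add_contact(conn, person_id, "phone", "555-0100")
# A new kind of contact is just a new row, not a schema change.
add_contact(conn, person_id, "twitter", "@ada")
```

The trade-off is an extra join on every read, which is part of why other commenters treat this as a design decision rather than a scaling one.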

I am willing to bet that 9 out of 10 'weekend projects' have had to be thrown out completely and redeveloped from scratch when the number of users became significant. Of those rebuilds, I would be interested to see some research into how many users abandoned the said platform when (a) the original one started to grind to a halt or constantly fell over with errors and (b) the new platform came out with new features or a different UX that broke the 'look and feel' of the original.

I mentioned something similar in a separate comment.

My initial DB schema was pretty bad. We did 2 schema rewrites and migrations from the launch to 5M users. Each time it took 2 weeks of sleepless nights.

The machines today are really powerful. You can do a lot with 244 GB RAM machines backed by SSD.

Someone who doesn't have the skill set to be able to scale once they get traction - it's likely they will not have the skills to design for scale at start.

My recommendation to everyone would be pick a language and db you are most comfortable with and get started as soon as you can. You will fail on the product side a lot more times before you will fail on the technical side.

And if you are failing on the technical side, reach out to me. I will definitely be able to help you find a way out. I am not sure if there is any product guy in the world who can make a similar claim on the product front. However there are at least dozens of technical guys in the world who can make a claim like I did.

So focus on launching the product as soon as possible. Work hard, reach out for help if needed. You will eventually get success.

>> You can do a lot with 244 GB RAM machines

Is this a typo of "244 GB" instead of "24 GB"? Nearly any company that has a single machine provisioned with 244 GB of RAM is doing something severely wrong, likely putting the company's ability to grow at risk. Such a machine screams of trying to vertically scale a poorly performing legacy product instead of figuring out to horizontally scale out with 16-64 GB servers.

That much memory on a single server is a huge red flag for 95-99%+ of companies. It takes a very specialized system (ie: you probably don't fit the mold, no matter what your excuses are) to require such a server.

Pretty sure it's not a typo. Amazon i3.16xlarge is 488 GB of RAM, 64 vCPUs, and a 20 Gigabit connection. Postgres/MySQL can absolutely scream perf-wise on that.

AWS also has 2 TB instances: https://aws.amazon.com/ec2/instance-types/x1/

Outside AWS, you can put 3-6 TB in a normal-ish server from Dell or HP, or 64 TB in a big iron box from Fujitsu or IBM.

+1 - it wasn't a typo. DB machines benefit a lot from huge RAM.

I know Basecamp uses a server with 2TB of RAM for their single MySQL server. FYI CTO of Basecamp is the creator of Ruby on Rails -- smart people.

Check out the Stack Overflow stack too.

Their system is faster too.

That is actually insane. The day their MySQL cluster experiences a severe downtime incident, it is likely going to take an enormous amount of time to recover.

That is my intuition. If you know of a video or blog post (conference, talk, article, etc.) where someone explains the benefits of a 2 TB MySQL server and how it is not a crazy bad idea, I would love to see it. Because my 15 years of professional experience screams "NO WAY IN HELL" at that one.

Keeping all data cached in RAM is something that eats a bunch of memory, but is not usually wrong.

> I am willing to bet that 9 out of 10 'weekend projects' have had to be thrown out completely and redeveloped from scratch when the number of users became significant.

9 out of 10 weekend projects never get to a significant number of users.

Probably more like 99/100 or 999/1000 but yeah. Also, you can get pretty damn far with PHP and MySQL. Just look at Facebook. Just don't rely on Drupal or Wordpress to get you there and it'll be fine. KISS and solve the scaling issues as they come up. Far too many projects focus way too much on infrastructure and architecture and micro services instead of building something that solves a real problem for users. Focus on that first. It doesn't matter if the tech behind it is a bunch of bash scripts if it does something that is really useful to a lot of people.

An alternative view on history: Facebook succeeded because the competitors failed to scale.

Do you remember MySpace? The big social network with hundreds of millions of users that came before Facebook.

They had massive scaling and performance issues. At the peak when everyone was moving to social media (almost a decade ago) the site could take an entire minute to load (if loading at all). They lost a lot of users, who went straight to facebook and never recovered.

I was a big MySpace user and late FB user, and I can tell you that it had nothing to do with [tech] scale. Back then I didn't know about programming (my first lines of code were some CSS in MySpace!) and the real reason is that users were becoming more hostile in a site that was a niche on its own. You had to start filtering out and removing SPAM all the time from your public wall. The fact that people could personalize their site and the community (heavy on music and alternative people) put most people off.

You can say that it failed to scale, but I wouldn't say that it failed to scale technically; it failed to scale the community. Then at some point when it was already dying they started to change the layout big time I guess to attract more people, alienating the initial community.

MySpace and Friendster have been discussed at length in other comments in this discussion.

I agree, they were different from facebook enough, that it's not all about scaling. Yet, don't underestimate the impact of having your site unreachable, when the competitor is coming strong. Not a good position to be in.

Drupal is for scaling, learn the architecture and you see it.

indeed, I would go so far as to say 9/10 weekend projects never get finished, let alone released, let alone getting enough users

You're refusing to learn from someone who lived it. I also know of zero services which failed because they couldn't scale their technology. But I know of 100s that failed because they couldn't get enough users or usage.

Your example about the DB tables is exactly the trap to avoid while you are iterating rapidly to try and find product/market fit.

You are assuming that I haven't "lived it". Over the past 2 or 3 years, I have built around 8 web apps. Some which got virtually no traction at all, and some which have reached a happy medium of users and income. None which have reached mega scale or millions of users (yet).

Some of the (real world) feedback I got from the web apps that failed to get off the ground were due mainly to our customers complaining that:

* pages were taking too long to reload.

* not enough fields on a particular data table to store information in

* missing API

* pages that would refresh in entirety instead of just updating the changed portions

Most customers didn't stick around to wait for us to address or fix the issues - there were plenty of other competitive products that fitted them better that they could start using that same day.

A couple of those sites also got negative feedback on Reddit or HN because I hosted the front end website on a $5 VPS server and when the site suffered the inevitable 'hug of death' from posting to these sites, they immediately went down due to inability to scale under the sudden deluge of visitors, and I received uncomplimentary feedback on that (where people actually elected to post feedback rather than just close the browser tab and completely forget and move on from my app).

Yes, over thinking scaling and design is bad, but putting it aside as a 'totally not important now' factor is also just as bad, if not worse, in my 'lived it' experience.

It is great that you got a few which got traction out of 8 tries. This is an incredible success rate.

Most people see much less success. The original suggestion is geared towards first few tries that people make. Once they have made a few attempts, they learn from it and naturally build more scalable products without spending extra effort. You have already read my story about my first try which had a very poor start. My subsequent efforts have scaled in that order without needing a single rewrite.

However, if they spend too long procrastinating, worrying about and investing in scale - they are wasting precious time which would be better spent finding product market fit.

Thanks, and also thank you for not reading my posts the wrong way. I am in no way having a go at you or refuting what you say. You've built something and got it to a level that many of us can only dream of, and for that you have my admiration and respect.

Like you, each subsequent app that I built contained the lessons learned from the previous efforts - better front end frameworks, better data table normalising, better hosting infrastructure etc.

Part of me always thinks back to the ones that didn't work, and I always ask myself - would they have worked if they had faster page refreshes, or more flexible data entry? Perhaps if I had spent more time in the planning and design before I launched them, they may not have sunk as quickly? Those 1000 users that time who came from my HN post and saw an 'Error 500' page - could a fraction of them have been the ones who would have signed up and made us profitable if they hadn't seen the error page as their first introduction to my app?

For that reason alone, I am always skeptical of any post that promotes 'iterate often and fast' or 'fail quickly' rather than 'spend time on design and a unique experience'...

It's all in good spirit of sharing and learning.

When you are thinking back, also consider the alternative where could you have missed opportunities and product launches because of extra effort spent on scaling upfront.

It's all good learning in the end. It's great to be in your situation. Keep at it and you will see great success in future!

> It is great that you got a few which got traction out of 8 tries. This is an incredible success rate.

That seems fairly reasonable to me.

It all depends on what you call "success". Running a startup from zero to a billion dollar business is an incredible and rare achievement. Running a small site with a few hundreds/thousands users and a decent income is fairly reasonable.

If it was slow for a small number of users, that is a fundamental design or operations flaw, not a scaling problem.

Nobody is saying that performance doesn't matter. But if it performs well with 100 users, you can worry about the performance with 10k users later. And if it doesn't perform well with 100 users, it doesn't matter if the scaling is O(log n) or even O(1).

Performance under a sudden flood of users matters, but a lot less than day-to-day performance. And most of the criticism I see for sites dying is when they're serving static content with ideal O(1) scaling, but in a very unoptimized way.

None of those problems sound like scaling issues.

Absence of evidence is not evidence of absence.

I've been part of many projects that couldn't scale because lack of forethought meant there was a nasty implicit O(n^2) thing going on when it could have easily been designed to be O(n log n).

Almost all of these projects were killed with CYA hand waving about "just not fast enough".
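A toy illustration of the kind of implicit O(n^2) pattern meant here, next to an O(n log n) alternative. The duplicate-detection task and both function names are hypothetical, not taken from any of the projects mentioned.

```python
def has_duplicate_quadratic(items):
    """Compare every pair of items: about n * (n - 1) / 2 comparisons."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicate_sorted(items):
    """Sort once (O(n log n)), then compare only neighbouring items."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))

sample = [3, 1, 4, 1, 5]
print(has_duplicate_quadratic(sample), has_duplicate_sorted(sample))
```

Both give the same answer, and at 100 items nobody would notice the difference. At a million items the first does on the order of 10^11 comparisons and the second on the order of 10^7, which is the point where "just not fast enough" starts to appear.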

Can you please share some examples. I would love to learn from your experience.

Thanks for putting across your point in such a straight forward way.

I didn't directly call out the DB tables point. This kind of premature optimization is a huge red flag and I would tell anyone I am advising to avoid it.

Friendster and MySpace failed because of scaling issues - though the problem was not scaling per se but bad execution. They chose the wrong technology and brought in the wrong CTOs, who bought very expensive hardware (a few expensive high-end servers with expensive storage instead of a larger number of low-cost commodity servers) and software with very high license costs (ColdFusion on dotNet, which was experimental, on Windows, with high license costs). That's why both lost the race: rewrites stalled development for a year, which was too long.

Edit: you can read about both project histories, and learn from them. Of course one gets down-voted for mentioning it.

As a counterpoint stack overflow is hosted on windows and did just fine. If your service is printing money you can easily fix tech problems (e.g. Facebook effectively rewriting php), if it's not you may flail around trying rewrites etc but the real problem lies elsewhere.

Tech people are quick to find technical reasons for failure but the reasons are usually elsewhere.

1) Friendster had scaling issues. They insisted on keeping the x-degree-of-friendship calculation, which is quite resource-expensive and doesn't scale. Instead of making the software scalable, they brought in an expensive exec who decided to throw more and more very high-end servers and enterprise storage at it. It ate their startup money, and for at least one year they were constantly firefighting very high page load times (12+ sec) instead of advancing the site. MySpace took over Friendster because of this.

2) MySpace's culture and management were in trouble after Murdoch's News Corporation bought MySpace's parent company. Instead of investing, News Corp chose the wrong path. Instead of improving the site, they decided to do a (in the end) very costly deal with MS to switch their ColdFusion stack from a Java stack to a dotNet stack, incl. very expensive MSSQL licenses and various other license costs. It turned out ColdFusion on dotNet was still in a very experimental phase, and MySpace suffered a lot from the resulting trouble and bled money like never before (which made News Corp very unhappy). The MySpace website got few updates for at least one year while MySpace devs were busy firefighting and switching the backend. Facebook took over.

There are books worth reading.

3) Stackoverflow (like 2 years ago) ran on fewer than a dozen servers. It's several orders of magnitude smaller than social network services like (former) MySpace, Facebook, Twitter, etc. License costs often don't scale; you bleed through your startup money for few competitive advanced features or little return. That's why successful startups often choose an open source stack; look at Amazon, Google, (former) Yahoo, Facebook, Twitter, etc. - Perl, Python, PHP, Java, Ruby, MySQL, Postgres, Hadoop, etc. Stackoverflow shows it can be done with off-the-shelf COTS as well, if you keep the server license count low through wise decisions and performance tuning. But it's not like I could name dozens of successful startups that have a software stack like Stackoverflow's. And even Hotmail, Linkedin, Bing run or used to run for the majority of their service life on a mainly open source software stack.
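To give a feel for why the degree-of-friendship calculation in point 1 gets expensive, here is a generic breadth-first search over a friend graph. This is only a sketch under my own assumptions: how Friendster actually implemented the feature is not described here, and the `within_degrees` function and the sample graph are made up.

```python
from collections import deque

def within_degrees(graph, start, max_degree):
    """Return {user: degree} for everyone reachable within max_degree hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if seen[user] == max_degree:
            continue  # do not expand beyond the requested degree
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen[friend] = seen[user] + 1
                queue.append(friend)
    return seen

# Hypothetical friend graph as adjacency lists.
graph = {
    "ann": ["bob", "cat"],
    "bob": ["ann", "dan"],
    "cat": ["ann", "dan"],
    "dan": ["bob", "cat", "eve"],
    "eve": ["dan"],
}

print(within_degrees(graph, "ann", 2))
```

With an average of f friends per user, the number of users touched grows roughly like f^d for degree d until it saturates the network, so computing this per page view for every visitor gets costly very quickly.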

"MySpace took over Friendster because of this."

Friendster was mostly focused on Asia plus lost a ton of users around the same time Facebook gained a ton of users. I don't think it was scaling architecture that did them in. It was the market choosing the competition for the user experience plus what their friends were on. That's for most of the world. I have no idea what contributed to their failure in Asia since I don't study that market when it comes to social media.

EDIT to add: I recall the founder did say they had serious technology problems for a few years that affected them. I'm just thinking Facebook spreading through all the colleges & moving faster on features was their main advantage.

Asia? You came a few years too late... Friendster was US focused, at least until it lost. Then they probably pivoted, but I wrote about the first phase of Friendster (the one everyone in the US who was active online in the early 2000s remembers).

Hmm. It peaked in Asia with most users in Asia operating from Asia. You don't think the Asian focus could affect its marketing strategy for Americans? Facebook's college angle worked really well after MySpace's express-yourself-in-cluttered-pages approach worked before that. I'm not sure Friendster understood our userbase enough to keep it.

Stack Overflow seems very relevant to this conversation given how unremarkable their architecture is – their level of traffic is comfortably running on 4 MS SQL Server boxes:


I've seen too many breathless posts which would have you believe they'd need a clustered NoSQL database or it wouldn't scale.

>>> how unremarkable their architecture is – their level of traffic is comfortably running on 4 MS SQL Server boxes:

Wrong, and wrong.

First, their architecture is extremely remarkable. They are doing and mastering vertical scaling, down to every little detail. Terabytes of RAM, C# instead of Ruby/Python, FusionIO drives, MS SQL instead of MySQL/PostgreSQL, etc...

Second, that's at least 6 database servers, each one being more expensive than 10 usual commodity servers:

First cluster: Dell R720xd servers, 384GB of RAM, 4TB of PCIe SSD space, and 2x 12 cores. It hosts the Stack Overflow, Sites (bad name, I’ll explain later), PRIZM, and Mobile databases.

Second cluster: Dell R730xd servers, 768GB of RAM, 6TB of PCIe SSD space, 2x 8 cores. This cluster runs everything else. That list includes Careers, Open ID, Chat, our Exception log, and every other Q&A site (e.g. Super User, Server Fault, etc.).

I think you misunderstand my use of unremarkable. It's not wrong or bad but key parts are the same stack you'd have picked 1-2 decades ago – well implemented, values scaled up, yes, but something you could have shown people in 2000 and not had to explain more than that the hardware costs so much less.

I think that's good: SQL databases are very mature and you don't want to be exciting for your core business data if you don't get some major benefit to defray the cost. Boring is a delightful characteristic for data storage.

My bad then. I think that using a proven technology perfectly tuned and the way it's meant to be done, is remarkable. This should be promoted more, instead of the new and shiny.

I'm not sure you could get 1TB of memory and multi-TB SSD drives anywhere in the 2000s, even for a million dollars. That makes a major difference in the ability to scale up. Data didn't grow; storing 1M user accounts always took the same space.

Agreed that data sizes would have been far more of a challenge – far more people used clustered services just because a single box only supported so much RAM, RAID arrays for IOPs and size as well as redundancy, etc. – so it's arguably become much easier for a growing chunk of the industry.

>believe they'd need a clustered NoSQL database or it wouldn't scale.

While for many developers, NoSQL might be overkill - Stackoverflow is a bad example. If you were any fast growing startup in the cloud, and you wanted to go the SO route it would have meant going CoLo. SO has a set of machines with nearly a terabyte of RAM - GCE doesn't even offer cloud machines with the same specs.

And even then their setup is far more fiscally expensive than something you could get done with 20 cloud nodes on some NoSQL solution.

One: run the numbers on the dozens of cloud nodes you mentioned. Is that really cheaper than renting some servers in a colo?

Second: how much time would they have spent rolling all of the data integrity, reporting, etc. features they'd have needed to add? I'm inclined to take them at their word when they say this was safer and cheaper given their resources.

>I'm inclined to take them at their word when they say this was safer and cheaper given their resources.

I never meant to claim that cloud was safer and cheaper. What I meant was, for the majority of operations, staying in the cloud with some distributed setup is likely more feasible than moving to CoLo (see GitLab).

What you describe is more of a design flaw than a scaling issue.

Thinking ahead that custom fields could be needed in the future is a design choice and can apply to 5k users or 500 000k users. It's not the scaling itself that's causing you pain; it just exacerbates the difficulty of a bad design choice.

So the OP's advice still applies: at the start you should spend most of your time developing the best design for your application and less on scaling, because scaling a good design is way less painful.

Do you know of resources to learn more about good database design? I've been looking for ways to optimize my current db usage and we're absolutely adding a new field to the Person model every time we feel we need it...

myspace.com failed because it could not scale

I personally switched to facebook for two reasons:

got sick of repeated errors every time I browse myspace

facebook has better album permissions, (myspace has none)

as another comment said these look like bad design not bad scaling.

Many problems including not filtering user input and comments for CSS, JS, etc.

But main problem, they crumbled under heavy traffic

Since I am an Indian I know of a service which is hated by most people - IRCTC, because of its scaling issues. It's a lot better now but I remember those horrible days when you had to book a tatkal ticket.

I also know of some government services (my state has paperless administration so everything is digital) which are used on a day-to-day basis (but only by hundreds of users) by government employees and go down for almost 2 hours every day, which stops work for everyone (both the employees and the people who came for that service). These services are developed by companies like TCS and Infosys. If only they had thought of scaling before.

In case of government, it's often another issue. While the social network of the parent was probably a lean stack, the government one was probably over-engineered. Some webapps are front-ends that communicate with a back-end in XML, with massive flaws like serialization and lack of transactions, and the backend syncs with a business solution backed by dBase, all of that for 11 web pages (I know, I've been on govt apps before – France). Scaling was a major requirement from the beginning, but those engineers just don't know what they're doing and follow the XML frontend/ESB bus patterns, assuming that's what gives performance. And committee design (12 to 24 ppl) does the rest of the bloat. 11 screens, $2m, sluggish result.

Had they started with only $300 like the parent comment, their webapp would have often performed much better. Big projects are a difficult curse, our most difficult question in IT engineering.

> These services are developed by companies like TCS and Infosys

This is the real problem: lack of in-house expertise and ongoing incentives to maintain performance, while contractors are usually paid for features and, if they also do hosting, even have a financial disincentive for efficiency.

I would bet it's at least as likely that had someone thought of scaling it would have added a significant cost and delay to the actual project and the first major scalability bottleneck would still have been something unanticipated.

> I know of absolutely no service which failed because it couldn't scale.

If by scaling you mean increasing the number of page views that a given version of a web app can serve, then this is mostly true.

But things like conceptual simplicity, unit economics, codebase readability, strategy, etc. are also dimensions of scalability. E.g. the only reason that startups can even exist at all is because large companies have a lower marginal output per employee.

"scale" here, as with all conversations on HN about "scale", refers to the technology required to service a large number of users or usage.

I think you'll find there are plenty of conversations here where "scale" is used in reference to stuff like the business models of "gig economy" apps, or at what point your startup has enough scale to need an HR department, and so on and so forth.

> "I know of absolutely no service which failed because it couldn't scale."

Zooomr did. They came out of nowhere in 2006 and were a real threat to Flickr. They had AJAX-powered edits, geotagging, various other unique features and they were starting to pull in some highly followed photographers.

But, the site kept crashing as traffic grew and some scaling problems even led to data loss. I wanted to see them win but they just couldn't keep up with traffic and eventually Yahoo cloned their features and they became irrelevant.


The main question here is whether they would have done better if they had started with scale in mind. When you start you don't know what your product will look like when it has millions of users. You will likely make major product changes quite a few times.

If someone cannot scale your product which has adoption, they likely don't have insights, problem definition, capabilities, skills and resources to do that when they launch.

If I could edit my original post, I would add a rider that the service needs a capable team in order not to die.

> However I always had enough notice to fix issues.

Thank you. This is the one major point I try to get across to people when 'scaling' comes up. "Oh, that won't be scalable" or "yeah, but when we have 2 million users, XYZ won't work". I've been on projects where weeks and months (calendar time, part time efforts) were spent on things that were 'scalable' instead of just shipping something that worked earlier.

I keep trying to tell folks on these projects (have been involved in a couple now) - man, we're not going to go from 20 users to 2 million users overnight. We'll notice the problems and can adapt.

Trying to whet their appetite, I've even argued that a new 'flavor of the month' will be out before our real scaling needs hit, and we can then waste time chasing that fad, which will be even cooler than the current fad we're chasing (although... I try to be slightly more diplomatic than that).

> Time is the only currency you have which can't be earned and only spent.

I would argue that you can earn lifetime with a healthy lifestyle. I remember reading a study that put the average difference between healthy and unhealthy lifestyles at 14 years.

If you spend 1 day a week more for a healthy lifestyle (exercise, self cooking, enough sleep, little stress, enough recreation time) for 40 years you have earned ~9 years life time.

And I would argue that those 40 years are more enjoyable.

You make a great point. Health should be the topmost priority and I need to get better at it.

However to my original point - you can lose fitness (within reasonable limits) and then gain it back and not notice a difference later on. However time lost is lost forever.

Yes you can have a life style debt. But do not overload it or forget to pay it back.

You're omitting a lot of details. The number of users isn't as important, for example, as the number of users simultaneously using services and which services they were using concurrently. Also, no offense here, but the way you describe your experience makes it seem like you were very inexperienced in general. Your "not noticing" services that failed could mean, for example, that you weren't measuring properly and probably never learned to. In which case your anecdote isn't necessarily a useful one.

It might be a good idea to read my post again. When I talk about 'service which failed' it was other products (e.g. Facebook, Twitter etc) - not components of my product.

My post says 50M+ users and 1B+ messages in a day. Almost 50% daily active users. And this was early-2000s hardware. Last year WhatsApp was doing 60 billion messages per day (60x of what we did).

However our messages had a lot more logic, because we delivered on unreliable Telco texting pipes. Pipes had a set throughput. Some pipes delivered to just some geographies. They would randomly fail downstream so you had to do health monitoring and management. Some pipes cost per message and some cost based on throughput. So pick in real time for the lowest cost. Different messages had different priority. In summary - there was a lot more business logic per message sent.
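A rough sketch of the per-message routing described above, in Python for brevity (the original service was Java). The `Pipe` fields, the `pick_pipe` and `send_batch` helpers and the sample values are all hypothetical; a real system would need richer cost models (per-message versus throughput-based) and live health checks.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Pipe:
    """Hypothetical model of one telco SMS pipe."""
    name: str
    regions: frozenset        # geographies this pipe can deliver to
    cost_per_msg: float       # effective cost per message right now
    capacity_left: int        # messages it can still accept in this window
    healthy: bool = True      # would be set by a separate health monitor


def pick_pipe(pipes: Sequence[Pipe], region: str) -> Optional[Pipe]:
    """Return the cheapest healthy pipe with spare throughput for `region`."""
    candidates = [
        p for p in pipes
        if p.healthy and p.capacity_left > 0 and region in p.regions
    ]
    if not candidates:
        return None  # caller would queue the message and retry later
    return min(candidates, key=lambda p: p.cost_per_msg)


def send_batch(pipes, messages):
    """Route higher-priority messages first; return (msg_id, pipe name) pairs.

    `messages` is a list of (priority, msg_id, region) tuples, where a lower
    priority number means more urgent.
    """
    routed = []
    for priority, msg_id, region in sorted(messages):
        pipe = pick_pipe(pipes, region)
        if pipe is None:
            routed.append((msg_id, None))
            continue
        pipe.capacity_left -= 1  # consume one slot of the pipe's throughput
        routed.append((msg_id, pipe.name))
    return routed
```

Urgent messages get first pick of the cheap pipes, and anything that cannot be routed comes back with `None` so it can be retried once a pipe recovers or frees up capacity.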

I was of course very inexperienced when I started. I was just two years out of college. However in the end the product was 50+ services running on 200+ servers. This experience helped me scale user communication infra at Dropbox to 100x scale within 3 months.

Your message was flame-bait at best and didn't seem to contribute to the discussion. You were passing judgements based on incomplete information. You could have asked nicely and still gotten a response.

Edit: fixed typo

> It might be a good idea to read my post again. When I talk about 'service which failed' it was other products (e.g. Facebook, Twitter etc) - not components of my product.

This wasn't clear at all from your comment.

> Your message was flame-bait at best and didn't seem to contribute to the discussion. You were passing judgements based on incomplete information. You could have asked nicely and still gotten a response.

My message was not flame bait; that's your interpretation. I'm sorry you took offense, but please do consider that your original comment does omit lots of important details, as you apparently (should) know from your experience at Dropbox.

Also, per your quoted numbers WhatsApp was doing 60x what your app was, not 1/60. Typo?

This is a delightful retelling; thanks for sharing.

> I know of absolutely no service which failed because it couldn't scale.

Genuine question: what about Friendster?

Friendster definitely failed because they couldn't pull off scale. I was CTO of the Tagged social network at the time and the issue loomed large during our 2005 fundraising. Investors had seen $50m go down the drain when Friendster got slow. Despite network effects, users moved over to products that were up and running (mostly MySpace but also hi5, etc.).

I seem to recall that the Friendster team had more than one aborted rewrite before they got it right, and by then it was too late. Scaling social networks required a different bag of tricks than was popular at the time (e.g., database replication was well accepted as a best practice, but sharding worked much better). Talented people could still get it wrong.
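To make the replication-versus-sharding distinction concrete, here is a minimal sketch of hash-based sharding by user id. The shard host names and the `shard_for` and `writes_per_shard` helpers are made up for illustration; this is not a description of how Tagged or Friendster actually routed queries.

```python
import hashlib

# Hypothetical shard hosts; each holds a disjoint slice of the user data.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Map a user id to one shard using a stable hash.

    With replication every replica still has to apply every write, so write
    capacity stays flat. With sharding each host only sees its own slice.
    """
    digest = hashlib.md5(str(user_id).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def writes_per_shard(user_ids, shards=SHARDS):
    """Count how many writes each shard would receive for these users."""
    counts = {s: 0 for s in shards}
    for uid in user_ids:
        counts[shard_for(uid, shards)] += 1
    return counts
```

Plain modulo placement like this makes adding shards later painful, since most keys move; that trade-off is outside the scope of the sketch.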

Still, social networking illustrates how failing to scale is rare, and should practically never be the concern early on. Almost every social network that gained traction struggled to scale at times, but the vast majority of them overcame their challenges.

We poured sweat and tears and many sleepless nights into scaling the Tagged back-end from 10 machines to over 1,000, over time serving hundreds of millions of users. It was a tremendous amount of work, much more than building the initial product, and I don't think there's much we could have done early on to make later scaling easier.

More on the story: http://highscalability.com/blog/2011/8/8/tagged-architecture...

Thanks for sharing your experience.

There is always scope for a botched execution or just pure bad luck. I wish someone shared more candid details of what happened at Friendster.

The Startup podcast[1] has an upcoming series about Friendster

1 - https://gimletmedia.com/startup/

I remember reading that the Friendster founder was complaining that they were facing a lot of technical issues and investors were not letting him invest effort in fixing them.

IMHO, if your investors need to be involved in seeking approval for handling scaling pains and on top of that if they reject it - there are deeper issues within the company (management issues, politics, technical incompetence, leadership issues).

So to clarify my point - any team could still screw up technical execution. However there is no reason that given a good team and support from within the company, a product can't be scaled.

Friendster's problem was that instead of fixing its problems, it decided to re-write.

You started with Java... a language designed to handle credit card transactions world wide. Your story would be very different if you started with say Rails.

If you build stateless horizontally scalable service (and you should) - you are almost always going to be blocked on DB. On the app server front you just keep adding more machines behind load balancer.

As I look back, if we had used RoR or Python we might have moved faster with no negative impact on scale. We would have spent more money on extra servers though!

Awesome story and takeaways. What happened to your product and what did you learn you could not keep up with or grow into?

I built this product as a side project within another startup I was working in. We had a lot of money, had been around for 3 years and did not have a product.

Because of this I had multiple levels of bosses above me in the company. They all got really interested in starting to manage me and the product once we hit 5M users. They had a different vision, attachment and passion about the product.

Around 2008, we were spending a lot of money on text messaging. Most people were using the product to send messages for cheap to their friends and family (essentially like WhatsApp vs Twitter).

I wanted to slowly phase out text messaging in favor of Data (essentially become WhatsApp!). CEO didn't believe that Mobile Data will get adoption in India for a long time. He instead wanted to focus on monetizing - by sending ads on text messages, making people pay for premium content etc. We tried 8-10 things - nothing worked (this can be a really big post in itself).

Soon I quit and moved to US. It was like a bad breakup!

My biggest learnings were:

1. If you do not have ownership like a founder, do not take responsibility like a cofounder.

2. Personally for me: do not work where I don't have veto rights. But bills need to be paid and family has to be supported. Took me 5 years of really hard work to get there!

Thanks for sharing your experience. People are getting caught up in Ruby v Java v whatever and not seeing the forest for the trees - you have a valuable perspective and an effective way of communicating it. Appreciate it!

May I ask you about your first 'biggest learning': how do you eschew cofounder responsibilities in an early company? Someone who genuinely cares and/or is really interested in the core business (pretty common among first hires of startups) may have a hard time turning work away or willfully not participating in major discussions/planning when they believe their contributions could be valuable. In other words, situations when you know you can contribute but the company will get the 'better end of the deal' as you take on more responsibility without taking on more compensation.

It sounds difficult to ride that out - were you able to do that successfully or are you saying "This happened to me, don't let it happen to you?"

Just saw this comment. Not sure if you will see the response. You raise a really good point.

If someone is joining a startup at an early stage, work like it is your own baby. You are spending your most valuable currency by being here - which is your time!

With time, hopefully, you will get rewarded for your effort and results. You will likely not get a founder level say and stake but your role should gradually expand. In the end founders take almost the same risk as the first employee, but get a lot more equity because founder was there at the start. Join a startup as an early employee only if you are able to accept this fact - otherwise you are setting yourself up for a lot of resentment and pain.

The above advice applied to me too for the first two years of my employment, before I started working on the ultimate successful product. We were building an offline search engine and as a part of that I built distributed file systems, crawled a billion pages and I was appropriately rewarded with career growth.

However with the new social product that I built, the situation was dramatically different. For that product I was there from day 0. I was owning the whole engineering (sole engineer at start and head engineering till I left) and significant portion of product. Till we got 5M users, I had a co-founder level say (but not the stake of course) and I worked with a co-founder level of involvement and dedication till that time. However once we got 5M users, multiple levels above me started doing meetings and taking critical decisions without involving me. I still had a co-founder level passion and it hurt to see the product flounder and being unable to do anything about it.

So I was in a uniquely bad situation and it is not common to be in this situation. However if someone really is and there is no way to get a founder level control - move out as soon as you can. And this is what I did - albeit 2 years too late.

I can name one: Darcs. It was my favorite version control system, apart from the lack of scalability.

Darcs is a distributed version control system. It is inherently not a hosted solution that "needs to scale" the way the article is talking about.

I'd definitely say that Darcs failed to catch on because it didn't scale with the size of the repo, the larger your history was the more likely you were to run into performance issues, and there was little you could do about it because the underlying theory is flawed.

The scaling problem with Darcs wasn't about hosting, but in the implementation of their theory of patches and how it handled conflicts. See http://darcs.net/FAQ/ConflictsDarcs1#problems-with-conflicts for an overview of the issues in Darcs 1 and 2.

This seems to be also one of the driving forces behind the development of Pijul, which is also patch-based instead of snapshot-based, which makes it easier to understand and use, but all implementations so far had major performance issues once repositories grow. For more on that, see https://pijul.org/faq.html

I was one of the early users of Darcs 1, back before Git existed. I wanted to use version control, but the alternatives were pretty hard to understand and use. While Darcs was really nice to use, and fast on small repos, after a few years I had to convert everything to git because exponential times on most operations were just not sustainable, and fixing that required constant vigilance and altering history, which is not very friendly for new contributors.

Even GHC moved from Darcs to git in 2011 because of this: https://mail.haskell.org/pipermail/glasgow-haskell-users/201...

I love this example. Thanks for sharing! It's very encouraging to remind me to just make something and get it out there.

One prominent counterexample that comes to mind is Friendster which was a big social network before MySpace and Facebook. They had terrible performance issues but to be fair to your point, their decline was more mismanagement than a true inability to scale.



> I know of absolutely no service which failed because it couldn't scale.

Well, Friendster failed for exactly that reason. But, those were the early days.

p.s Completely agree with the article.

> I know of absolutely no service which failed because it couldn't scale.

You must not be looking around at all, right?

>I know of absolutely no service which failed because it couldn't scale

So what are some examples of services which failed because they tried too hard to scale too early?

For example, when your network could handle 20k users, was there another network that could have already handled 500k, but they failed because they started a month after yours?

>I know of absolutely no service which failed because it couldn't scale.

These days there would be hardly any way to fail to scale given a large budget. The technology to do so is all so good. However, back in the day it wasn't always so. I'd say Friendster was an example of a company that failed due to scaling issues.

Some things simply don't scale. You can't do some things in real-time. You have to do it in batch-jobs or avoid them. If you know about the O-notation, you will probably know certain algos don't scale. Friendster insisted on calculating the friends-of-friends-of-friends relationship in real-time and was not able to scale it. MySpace and Facebook did the same calc in the early days too, but both scrapped the feature as they were not able to scale it. Other examples are big data (e.g. log analysis): if you choose the wrong software stack (database) or wrong indexes you won't be able to scale it (especially if it's real time). You will bleed money in no time for cloud servers with TBs of RAM, when you could do it with a couple of servers with a normal sharded SQL database and one hand tuned index.
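For a feel of why the three-hop calculation gets expensive, here is a small sketch that walks a friend graph out to a given depth. The graph layout and the `within_hops` and `worst_case_reach` helpers are illustrative assumptions, not Friendster's actual algorithm.

```python
from collections import deque


def within_hops(graph, start, max_hops=3):
    """Return every user reachable from `start` in at most `max_hops` friend links.

    Breadth-first search: with an average of f friends per user, the frontier
    can grow roughly like f, f^2, f^3, so the work per page view explodes
    with each extra hop.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for friend in graph.get(user, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, depth + 1))
    seen.discard(start)
    return seen


def worst_case_reach(friends_per_user, hops=3):
    """Upper bound on users touched if no friend lists overlap."""
    return sum(friends_per_user ** h for h in range(1, hops + 1))
```

With 100 friends per user, `worst_case_reach(100)` comes to just over a million users for a single request, which is why this kind of query tends to get moved to batch jobs or dropped.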

> The technology to do so is all so good.

Mistake #1: Took the wrong technologies.

For every technology that can scale, there is a bunch of others that will give you a lot of trouble.

Depends on what you're doing, easy to fail when you do analytics or something like datadog.

How much money did Twitter waste in its migration to Java? I think it's an important factor to consider.

They have thousands of people sitting around doing nothing. They don't really optimize for expenses.

Can you share a link to the site?

I think in your particular case, you had a good intuition about performance/scalability and this is why it was not hard for you to scale. However this is not the case for everyone and I have seen many counterexamples in my career.

There have been a bunch of discussions about language choices and impact on productivity. Wanted to touch upon this here based on my experience.

TLDR: Depends a lot on individual situation. Pick the language you are fastest and most comfortable with.


SMS Gupshup - 6 years - Java and some C++ - Built distributed filesystems. Crawled 1B pages with early-2000s hardware. Built map-reduce framework before Hadoop came. Built a search engine on top of it. Built an infra which would send 1B+ text messages with prioritization and monitoring. Social network with 50M+ users.

LinkedIn main stack - ~ 1 year - Java

LinkedIn mobile stack - ~ 1 year - Javascript - Was one of the early engineers.

Founder Startup - 1.5 years - Clojure

Dropbox - ~ 1 year - Python

Head Technology Tech incubator - 1.5 years - Node.js

As you can see my experience has been all over the spectrum. While primarily using one language, I would keep experimenting with other languages like Scala, Go.

Following are my learnings and the reasonings:

I love Clojure. Love it so much that I feel like quitting everything, going to a mountain cabin and just coding Clojure. However I would most likely not use Clojure for my startup. The learning curve is really-really steep. It would make it very difficult for me to build a team. It's really fun if you understand it. But the first few weeks for a new person might be a nightmare and very emasculating. So even though I really love it, my current strategy is to bring Clojure patterns to other languages.

User facing logic

I would not use a statically typed Object Oriented language for any user facing logic. User facing logic and behavior tends to be very fluid and I feel that it gets strangled by the constraints of statically typed languages. Once you start building things with Class, Inheritance and Polymorphism etc - soon these patterns start driving business logic rather than the other way around. Because of this reason you will see most arcane and slow moving popular sites today are Java based - (LinkedIn, Yahoo, Amazon, Ebay) vs (Facebook, Twitter, Instagram, Pinterest). For this purpose I like using Dynamically typed languages building code in as functional a way as possible. Pick Node.js, Python, Ruby - whatever you are comfortable with.

In some cases people recommend statically typed languages for catching errors because of type safety etc. However I would solve that problem completely with a test suite rather than solving it part of the way using language features while being forced to pick a statically typed language.

Another big benefit of dynamically typed languages is the ability to just create a dictionary and start using it. Most CRUD apps work with JSON requests and responses and I would rather just dynamically pass objects, rather than build a class for the request and response objects for each and every route. This was a nightmare to create and maintain in Scala and Java.
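As a minimal illustration of the "just create a dictionary" style, here is a sketch of a route handler that takes a parsed JSON body and returns a plain dict. The field names and the `handle_create_user` and `handle_raw` functions are hypothetical, and the checks stand in for whatever validation and tests you would write instead of per-route request and response classes.

```python
import json


def handle_create_user(body: dict) -> dict:
    """Hypothetical CRUD handler: plain dict in, plain dict out."""
    name = str(body.get("name", "")).strip()
    if not name:
        return {"status": 400, "error": "name is required"}
    # Extra fields pass straight through without touching a class definition.
    extras = {k: v for k, v in body.items() if k != "name"}
    return {"status": 201, "user": {"name": name, **extras}}


def handle_raw(raw_json: str) -> str:
    """Decode a JSON request, dispatch it, and encode the JSON response."""
    try:
        body = json.loads(raw_json)
    except json.JSONDecodeError:
        return json.dumps({"status": 400, "error": "invalid JSON"})
    if not isinstance(body, dict):
        return json.dumps({"status": 400, "error": "expected a JSON object"})
    return json.dumps(handle_create_user(body))
```

Adding a new optional field to the request needs no code change here, which is the flexibility being described; the cost is that the test suite has to catch shape mistakes a compiler would otherwise flag.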

Fast developing Infrastructure

E.g. Hadoop, Hive, Distributed FS or Queue. 5 years back I would have built them in Java. However I would use Go for this today. However you couldn't go wrong either way. Pick what you are most comfortable with.

There is a decent amount of talent available. These languages are easy to learn. And you get close to C level performance if you build it right. Development is fast paced.

Mission critical performance - File System or Database

Most likely C, because you want to squeeze out last of the performance. However be ready for really slow development cycles. And most smart kids out of college wouldn't know C. So you will need to set aside some time till they are really productive.

This is probably because implicitly made available decisions from the get go. For example, if you had chosen MongoDB instead of MySQL, the story would be different. Ditto if say, Meteor instead of Java.

I am guessing your point is that if I had chosen Meteor and MongoDB - we wouldn't have been able to scale.

My initial DB schema was pretty bad. Had to do quite a few DB migrations - which took weeks of work to execute.

I have used MongoDB extensively since, and I am confident that it would have helped us scale to 5M users comfortably. I might have migrated to something else at that point.

There are very few products that reach even 5M users. Which is why developers should focus on launching fast.

Another thing is that the best and most successful products have a really simple core product (remember Facebook, Twitter, Instagram etc when they had 5M users). It is not that much work to migrate and rewrite. When products are not getting traction - is when they start getting overcomplicated.

Isn't Meteor a framework? I'm assuming you meant Javascript instead of Meteor

Yes, I should have said JavaEE vs JavaScript

Available --> scalable (typo)

> I know of absolutely no service which failed because it couldn't scale.

I know HN skews young when no one remembers Friendster.

Need a much longer story if you could share. I'm also Indian, and mine died at about 500k due to my focus on scalability.

Would love to know more about your story.

My biggest learning is Functional->Fast->Pretty from the product POV and Launch as soon as you can from an Engineer's POV.

Some more details on my blog: https://anandprakash.net/2016/10/28/how-to-build-a-business-...

Most of this was written in sleepless nights when we had a baby. So do not expect a super artistic flair ;)

At step 3, how many messages per day were you pumping with 64MB allocated to MySQL?

We were using MySQL like a persistent queue. As soon as a message was spawned, we would send it to downstream SMS gateways and delete them. So essentially that volume didn't use a lot of memory.

At step 3 we had 50% daily active users who would receive an average of 8 messages per day. So at step 3, we would send 20M messages per day.
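The arithmetic, spelled out, taking the 5M users from step 3 of the timeline at the top of the thread. The even spread across the day is a simplifying assumption; real traffic would peak well above the average. The function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_messages(total_users, daily_active_ratio, msgs_per_active_user):
    """Messages sent per day given user count, DAU ratio and per-user volume."""
    return int(total_users * daily_active_ratio * msgs_per_active_user)


def average_rate_per_second(messages_per_day):
    """Average send rate, assuming messages are spread evenly over the day."""
    return messages_per_day / SECONDS_PER_DAY


# Step 3: 5M users, 50% daily active, 8 messages each.
step3 = daily_messages(5_000_000, 0.5, 8)
print(step3)                                   # 20000000
print(round(average_rate_per_second(step3)))   # 231
```

An average in the low hundreds of inserts and deletes per second is consistent with a queue-style table staying small in memory, since rows are removed as soon as they are handed to the gateway.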

Arguably Friendster failed due to its interminable scaling issues

> I know of absolutely no service which failed because it couldn't scale.

Bullshit. What about the site (voat something?) that tried to be an alternative to reddit when there was some scandal there but couldn't keep users because they kept crashing?

Bullshit. You can go on voat right now just fine and there are people using it just fine. It hasn't failed, it's just not a big deal because it's an almost exact replica of a site that already exists and has traction. The difference being one allows you to hate on fat people all the time and the other only lets you hate on fat people most of the time. It's just too niche of a USP for most people.

I don't think that's a fair analysis of what happened. Though it seems obvious that they have lost some potential users by being down so many times, ultimately what has obstructed Voat’s success is its core community: the fringe of the fringe of reddit.

It's still going and they post about the scaling issues being due to lack of funding available that would allow them to continue their business model.


But is that a true scaling problem like "2x servers can't handle 2x users" or is it a business model problem like "every user means losing money and we can't afford more even if the cost was O(n^0.9)"?

I can't find the posts you're talking about.

What application did you build?

Obligatory "Mongo DB Is Web Scale" https://www.youtube.com/watch?v=b2F-DItXtZs

There is scaling and there is scaling.

A messaging service is the most trivial service one can make, it's a well known easy problem that's been solved for decades with current technologies. There is no challenge in that. For comparison, WhatsApp had people with experience and they could handle 1 billion users with 50 people.

The fact that you can handle 10M users with a single untuned MySQL database is not a demonstration that scaling is overrated. It's an expression that you are running a trivial service that doesn't do much.

Almost any problem will be more challenging than that. There are endless companies that have 1/100th the customer base and yet require 100 times the data volume and engineering.

I care.

It's easy to brush off scaling concerns as not important, but I've had personal experience where it's mattered, and if you want a high profile example, look at twitter.

Yes, premature optimization is a bad thing, and so is over engineering; but that's easy to say if you have the experience to make the right initial choices that mean you have a meaningful path forward to scale when you do need it.

For example, let's say you build a typical business app and push something out quickly that doesn't say, log when it fails, or provide an auto-update mechanism, or have any remote access. Now you have it deployed at 50 locations and it's 'not working' for some reason. Not only do you physically have to go out to see what's wrong, you have to organize a reinstall at 50 locations. Bad right? yes. It's very bad. (<---- Personal experience)

Or, you do a similar ruby or python app when your domain is something that involves bulk processing massive loads of data. It works fine and you have a great 'platform' until you have 3 users, and then it starts to slow down for everyone; and it turns out, you need a dedicated server for each customer because doing your business logic in a slow language works when you only need to do 10 items a second, not 10000. Bad right? yes. Very. Bad. (<---- Personal experience)

It's not premature optimization to not pick stupid technology choices for your domain, or ship prototypes.

...but sometimes you don't have someone on the team with the experience to realize that, and the push from management is to just get it out, and not worry about the details; but trust me, if you have someone who is sticking their neck out and going, hey wait, this isn't going to scale...

Maybe you should listen to what they have to say, not quote platitudes.

Ecommerce is probably one of those things where the domain is well enough known you can get away with it; heck, just throw away all your rubbish and use an off-the-shelf solution if you hit a problem; but I'm going to suggest that the majority of people aren't building that sort of platform, because it's largely a solved problem.

I don't like opinionated pieces like the one presented here because while they are right about some things they miss other things and present half-truths as full-truths.

In my experience (and I have little of that) it's important to know the upgrade path and adjust your planning accordingly.

How many users can you serve with your solution?

How big do you expect the market to be in that stage?

What technology would be the next step?

How do you get there? How much more work would it be?

Always be one step ahead with technology, but not two. Most markets are surprisingly small. Most use cases scale surprisingly well. You can probably push your solution by an order of one magnitude if you need it quickly.

In order to answer the questions you need people who know the product, the (potential) technologies and the market. When you start you probably won't know any of that. See the first prototype you deploy to the customers as a means of collecting data for the first production version. Your first product is not your first product. Your funding should respect that. Get it done quickly with the aim of answering the critical questions. Then go back and design the next version "good-enough" for the second scaling step with the upgrade path in mind.

> You can probably push your solution by an order of one magnitude if you need it quickly.

You can always push a python/ruby app an order of magnitude by putting an order of magnitude more AWS instances.

It will almost always bankrupt you in the medium term.

The only place I've seen it sustainable is a place that was generating $100 per user, and there weren't many active users either (thousands, not millions).

> You can always push a python/ruby app an order of magnitude by putting an order of magnitude more AWS instances.

Or a Java app or a Go app. Really, if one's working in a domain where the language would become the bottleneck, one deliberately screwed up by going against the grain because that language is little used in that domain.

For almost everything else or with very specific exceptions, something else is the bottleneck.

Everyone here is giving anecdotal "evidence" of their claims so I'll follow suit:

In the first company I worked in, the backend was entirely in Java and the application was internal (in-house CMS), meaning only ten users tops; everything about it was horribly slow. It was a sea of poor code, there was no such thing as a deployment pipeline and the servers it was hosted in were inadequate. There was also no relational schema to speak of in the database (basically MySQL used as a dumb document store).

The next place I worked at, my team's job was to build an actual customer-facing application. We did it in python and, while it only has a few hundred users so far, there haven't been complaints about poor performance that I know of.

Really, for every Twitter replacing Ruby, there's a Facebook written in PHP. Don't understand why so many people use one side to support their claim but forget the other.

Language is rarely the first bottleneck. This also doesn't apply in the same way for data stores and when your app is stateful. Scaling an RDBMS is not that simple. Query fine-tuning and performance optimisation for single slow queries cannot be solved by just adding more resources. Just from a database POV, handling replication, multi-master, caching, backup and failover especially can fuck you over. In B2B a single data loss event can kill your business; same with security.

Java or Go are easily 10 times faster than Python/Rails on the same hardware. (Potentially much more, especially when you have non-trivial logic to process in the code).

That's a recurrent issue with API code (that usually has to be somewhat fast). Not so much for frontend generation.

P.S. Sorry but hundreds of customers is nothing. That's served by a pair of boxes (DB + webserver) irrespective of the language.

> Java or Go are easily 10 times faster than Python/Rails on the same hardware. (Potentially much more, especially when you have non-trivial logic to process in the code).

... I know that, everyone here knows that; saying it is little more than stating the obvious. Evidently, you missed my point so I fold; but once more, JIC: if an application is slow, one of the last things one should look into to improve performance is the language; it rarely is the bottleneck.

P.S.: I know it's nothing, all anecdotal evidence is nothing. For every anecdote one could give, anyone could give counterexamples. That was another point I tried to make. Cheers.

> it rarely is the bottleneck.

If you're using python, it can be the bottleneck quite easily.

I don't endorse changing language as an optimisation; that's just ridiculous.

...but you might look at splitting your application up into parts, and doing some service in a more suitable language; or, up front realizing that you have a heavy data processing workload you need to do in parallel, and python isn't a good choice for it.

Maybe you're right; you can look at optimisations that patch over the problem with queries and so forth as a first pass; but in some cases your choice of language (specifically node and python in my experience) is actually fundamentally the performance problem (but to be fair, not always).

...but basically, if you don't address the root cause of your perf issues (whatever they are), you're going to be patching and firefighting forever.

> If you're using python, it can be the bottleneck

> data processing workload you need to do in parallel

It's funny how often those two go together, you're of course correct (for now). I never wrote that the language couldn't be the bottleneck, though.

If you are doing something that requires heavy parallelization then, by all means, don't use python or replace it if you're already using it... But that kind of workload isn't actually a common (i.e. 50% or more of all applications) scenario.

> ...but basically, if you don't address the root cause of your perf issues (whatever they are), you're going to be patching and firefighting forever.

Yes, that's what I'm saying: Find the bottleneck and solve that. I was only addressing the, in my opinion, undue focusing on languages of the comment I replied to.

> quite easily

Only if one is working with poor developers. And this is true for all languages.

> Only if one is working with poor developers

It's not about the developers, it's about the workload.

Objectively, you can't write high performance multi-threaded python. No one can; it's not possible; it's just slow.

If you're rewriting your code in C++ so it's not slow and pretending it's python, you should just rewrite your code in C++. That's not writing fast python, it's writing C++.


So python can easily be a bottleneck, regardless of how good your developers are, if you've picked it for a poor purpose:

That's my point: Don't pick the wrong language for your task in the first place. ...and specifically python is the wrong choice for certain types of heavy lifting.

> it's about the workload.

... Why did you repeat yourself? I already told you I agreed with you on that.

Is it something specific you want me to tell you? It's going to be easier if you tell me what you want to read, otherwise we'll keep going in circles.

> Quite easily [...] poor purpose

That's an immediate contradiction. Yes, if one picks the wrong tool, that tool often becomes the bottleneck; but for every other scenario in which the tool is alright, it takes poor developers for the tool to become a bottleneck.

In my first comment I wrote that the language isn't the bottleneck, except for specific circumstances... And you took one of those specific circumstances and keep running with it. It's not a counterargument to my original point, if that's what you're trying to do.

The article is about choosing a business idea where technical scale isn't important. Twitter would fall under a "media play" which is what the author is telling the reader not to do. They aren't saying "try to build twitter but skimp on the tech" they're saying "don't try to build twitter, build something where people pay you money to use it and you'll be raking in millions in profit by the time technical scaling is an issue". Terrible example imo.

Twitter is not exempt. They focused on the product before scale. They were regularly overloaded in the early days.

> The article is about choosing a business idea where technical scale isn't important.

Those sorts of business ideas don't exist, unless you don't have any customers. :)

Yeah, it's easy to apply platitudes to the wrong problem.

I've seen plenty of people sweat milliseconds in front-end javascript that, even as the user's data amassed, would never amount to anything more than the blink of an eye.

That's a very different animal from "a dozen concurrent users will crash the server" and "this is fragile and full of edge cases, it basically won't work in production."

Wouldn't Twitter be a counter-example? They had some issues but resolved them later when they had the resources to, and they are now a multi-billion dollar company. That may not have happened had they devoted resources to scaling from the outset.

Taking a completely blasé approach to efficiency is potentially as dangerous as becoming hyper-focused on it.

Not all businesses become roaring successes, and those who achieve moderate success often don't get the resources to fix deep-seated performance or architectural issues (either via engineering and/or throwing hardware at it.) Eventually these technical woes can completely halt momentum and I've seen it even drown some businesses who just aren't able to dig themselves out of the hole they find themselves in.

People always seem to be arguing for extremes, but the most sensible approach for most tends to be somewhere in the middle.

One approach that I like is to think about scaling in terms of epochs. Each epoch of your system should handle maybe 3 orders of magnitude (let's say users, epoch 1 is 1k - 100k, epoch 2 is 100k - 10m, etc — epoch 0 was your MVP/PoC)

When you implement each epoch, do a paper design for the next epoch. This helps you think about how you will get there, and can prevent you from writing yourself into a corner, without over-indexing on scaling issues that you don't have yet.
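A rough sketch of the epoch idea, using the user-count boundaries from the comment above (1k, 100k, 10m). The names `EPOCH_UPPER_BOUNDS` and `epoch_for` are made up for illustration, and user count is only one possible measure:

```python
# Upper bound (exclusive) of each epoch, taken from the example above:
# epoch 0 is the MVP/PoC, epoch 1 is 1k - 100k users, epoch 2 is 100k - 10m.
EPOCH_UPPER_BOUNDS = [1_000, 100_000, 10_000_000]


def epoch_for(users):
    """Return the index of the epoch that a given user count falls into."""
    for epoch, upper in enumerate(EPOCH_UPPER_BOUNDS):
        if users < upper:
            return epoch
    # Beyond the last listed bound: the next, not yet designed, epoch.
    return len(EPOCH_UPPER_BOUNDS)
```

The point is only that the system is built for the current epoch while the paper design covers `epoch_for(users) + 1`.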

Agreed that completely blasé is pretty bad.

However in this case being blasé would mean not working hard to scale when your users are getting a bad experience. A basic stack these days node.js+mongo, go+*sql can easily handle more than 100k users even with one of the worst implementations. Most products don't reach that point!

> A basic stack these days node.js+mongo, go+*sql can easily handle more than 100k users...

That is making some very large assumptions about application workload. For an application which is purely a CRUD interface to a database, yes.

Agreed on this point.

Many a time there is heavy lifting - recommendations, machine learning etc. However that is done using specialized technologies in a non-user-facing process and the results are then dumped into a DB available for a CRUD app.

I am hoping that products with such requirements will have some obvious tools for such tasks (Hadoop, Hive etc) and they would find a way to scale such processes with time.

So their stack might be go+*sql+Hadoop.

Can you please suggest some use-cases which don't fit the above pattern and maybe we can brainstorm. Seems like a fun exercise!

Most apps are mainly CRUD.

I agree with your comment.

BTW, the phrase you're looking for is "deep seated", not "deep seeded".

Updated my post, cheers.

No. The most sensible approach is optimizing for developer efficiency while you figure out how to get users and usage. If you are lucky enough to get product/market fit, scaling is easy.

> If you are lucky enough to get product/market fit, scaling is easy.

And, as in the case of Twitter, the technology stack is the least of your problems.

I agree with the general premise of avoiding premature optimizations, but designing systems that scale is important for several reasons:

- Startups grow exponentially, if you're playing catchup as you're growing you are focusing on keeping the lights on and hanging on for the ride. Important for a growing company to focus on vision.

- Software that scales in traffic is easier to scale in engineering effort. For example, it is harder for 100 engineers to work on a single monolith vs 10 services.

- Service infrastructure cost is high on the list of cash burn. Scalable systems are efficient, allow startups to live longer.

- If the product you are selling directly correlates to computing power, important to make sure you are selling something that can be done profitably. For example, if you are selling video processing as a service, you absolutely need to validate that you can do this at scale in a profitable manner.

I also don't agree with the premise that speed of development and scalable systems are always in contention. After a certain point, scalable systems go hand in hand with your ability to execute quickly.

In the projects that I was involved in in the past years, the ones that decided to build a scalable infrastructure before gaining traction were exactly those that were burning cash on infrastructure. There is just a reasonable minimum of servers you need to run such an infrastructure and then you need something similar as a staging environment to make sure it all works together correctly.

That being said, there are of course choices that can be made early on that will help you in case you need to scale at some time. Spending a bit of time thinking through what would happen if you do need to split up and scale out your system will help you avoid pitfalls such as relying too much on the local filesystem being shared across pageviews.

In general I believe there is more benefit in spending time on performance early on (not caching, that is just postponing the problem), as it benefits not just your ability to run on a single system for much longer, it will also make life better for your users.

Scalable system does not imply needing many machines. There is no reason you can't just deploy your entire system on a single host, but maintain the ability to reconfigure exactly when needed.
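One way to keep that option open, as a minimal sketch: every component address comes from configuration and defaults to the local machine. The variable names (`DB_HOST`, `CACHE_HOST`, `QUEUE_HOST`) and the `load_topology` helper are hypothetical, not taken from any particular framework.

```python
import os

# Hypothetical component addresses; on day one everything lives on one host.
DEFAULTS = {
    "DB_HOST": "127.0.0.1",
    "CACHE_HOST": "127.0.0.1",
    "QUEUE_HOST": "127.0.0.1",
}


def load_topology(env=None):
    """Read component addresses from the environment, defaulting to localhost."""
    env = os.environ if env is None else env
    return {name: env.get(name, default) for name, default in DEFAULTS.items()}
```

Moving the database to its own box later then means setting `DB_HOST` rather than touching application code, assuming nothing else quietly depends on everything being co-located.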

But yes - it's about finding the right balance.

I totally agree with you on all the points.

I think it takes experienced managers and developers with business-oriented mindset to achieve the best compromise. Saying "there is no need to overengineer now" can be as bad as months of overengineering. It all depends on the business.

I care because it usually goes like this: Product manager > "Niche SaaS app {x} will never need to support more than 10-20 users"

Two weeks after launch > "We have 10k users and counting, why didn't you architect this for scale?"

Always assume you underestimated the scope of the project.

Interesting. It might be a by-product of having worked in larger companies, but my experience has pretty much always been: Product manager > "We need this to work for 10k r/s at launch". Actual launch throughput (and all time peak): 100 r/s

Now I always mostly just assume (capacity asked / 10) is more than enough!

As an employee of a large company, I can relate to this experience. The manager has grandiose views about the product, but has no data to back his claims.

I've literally been waffling back and forth between UUID and BigInt for the last week. The reason for this is I need to handle eventually distributed systems. Do I need UUIDs? Maybe not, but after much debate, I've decided that the storage requirements of such an index in memory are worth the ability to move from Postgres to Postgresql-xl.

> Do I need UUIDs? Maybe not, but after much debate, I've decided that the storage requirements of such an index in memory are worth the ability to move from Postgres to Postgresql-xl.

There is a major benefit to UUIDs besides server-side scaling concerns: you can generate them offline in mobile apps and other client-side applications.

With serial primary keys, you are often stuck between waiting for a server round trip before saving data locally, or inventing complicated solutions for referring to not-yet-uploaded data locally.
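A minimal sketch of the client-side case, assuming records are plain dicts held locally until they can be uploaded; `new_local_record` and the field names are hypothetical:

```python
import uuid


def new_local_record(payload):
    """Create a record whose primary key is generated on the client."""
    return {"id": str(uuid.uuid4()), "payload": payload, "synced": False}


# Both records are created offline; the second refers to the first by id
# without waiting for a server to hand out a serial key.
note = new_local_record({"text": "draft"})
attachment = new_local_record({"note_id": note["id"], "name": "photo.jpg"})
```

With serial keys, `attachment` could not hold a final reference to `note` until the server had assigned one.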

The problem is not with memory. The problem is that after a checkpoint Postgres will do full-page writes. With serial keys you will be touching way fewer pages compared to UUID, so write amplification for UUID vs bigint can be 60X+.

Can you go into a bit more detail? I've seen amplification referenced before. My estimation is that the database will be pretty much read heavy. I will need to split a given database across multiple instances.

https://blog.2ndquadrant.com/on-the-impact-of-full-page-writ... this is a fairly detailed analysis of this issue

Thank you. I'm reading over it now and for the next few days (to really grok it). I do wonder how bad it will be for low write situations. At the same time I'm open to switching to BigInt and using Postgresql-xl.

How about simply using non-random UUIDs?
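For illustration, one common way to get "non-random" ids is to put a timestamp in the leading bits so new keys sort near each other. This is a sketch only: `time_ordered_uuid` is a made-up helper, it does not set the UUID version bits, and it is not a standards-compliant UUID version.

```python
import os
import time
import uuid


def time_ordered_uuid(now_ms=None):
    """Return a 128-bit id: 48-bit millisecond timestamp + 80 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    # The big-endian timestamp prefix makes later ids compare greater.
    prefix = (now_ms & ((1 << 48) - 1)).to_bytes(6, "big")
    return uuid.UUID(bytes=prefix + os.urandom(10))
```

Ids generated within the same millisecond are still randomly ordered among themselves, and whether this actually reduces the write amplification discussed above would need measuring on the real workload.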

That would def. help

luid, check it out

Ok what can you write it in that you can't scale to 10K users by just adding web servers? The hard to scale part is storage layer but even crappy options like RDS will happily support 10K users.

Microsoft Access.

Works fine for something one team uses, works... not fine for something the entire corporation uses.

Well there are tools to help easily migrate Access to SQL Server, but this is moving fairly far from the original premise of this topic.

Does Access actually work well for 20 users?

Last time I built something in Access was on Access 1997. I believe it had about 100 concurrent users and over a gig of data. Those both exceed the hard limits of access at the time so I had a single read/write database on a share and multiple read only databases on shares. I had to split tables among multiple database files as well.

My direct boss was the CFO and he was cheap as hell. It's amazing the things we can come up with when we have to.

Anecdotally it doesn't: had to use a time tracking program that used an Access database on a shared network folder and would frequently freeze up or crash due to locking issues (with around that many users). I imagine it could be done better though.


> but even crappy options like RDS

Curious as to why you see RDS as a 'crappy' option??

Given the choice of building and maintaining a MySQL box in the corner of your bedroom and hosting it on RDS - I know which way I would go (and I have done both over the years).

If you are talking 'scaling' in terms of the OP's article, then RDS is almost a no brainer, and you can scale your instances (and add replicated instances etc.), hide it behind a VPN, set up firewalls to prevent DDoS attacks all in a matter of minutes, with a few mouse clicks.

The highest IOPS you can get on RDS is about a single consumer grade SSD.

Fair enough. But if the alternative is setting up a consumer grade PC on a consumer grade internet connection in a (non environmentally or power controlled) corner of a room at home, then surely we are looking at a similar situation?

Given that spinning up an RDS instance is quicker and cheaper than buying all the hardware and spending time installing and configuring an SQL server, then if time to market is a critical factor, you would choose RDS over a home grown solution, wouldn't you?

[Edit: I just realised that this is actually the point you were making, and I got sidetracked by my above question]

Have you heard of hosting? No one is running servers in their house. Nobody.

No one hosts MySQL out of their bedroom. C'mon. The alternative is hosting the DB yourself at a hosting provider which is what most do and it works fine.

Everywhere I've worked that tried to architect for massive scales at the beginning has ended up with an architecture that inhibits scaling. Using correct indexes and transactions can go a long way.
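To make the index point concrete, here is a small sketch using SQLite from the Python standard library, chosen only because it runs anywhere; the `messages` table and index name are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, user_id INTEGER, body TEXT)"
)
conn.executemany(
    "INSERT INTO messages (user_id, body) VALUES (?, ?)",
    [(i % 100, "hi") for i in range(1000)],
)


def plan(sql):
    """Return the query planner's description of how it would run sql."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[3] for row in rows)  # 4th column is the detail text


query = "SELECT body FROM messages WHERE user_id = 42"
before = plan(query)  # no index yet: a scan of the whole table
conn.execute("CREATE INDEX idx_messages_user ON messages (user_id)")
after = plan(query)  # the planner can now search via the index
```

The exact plan wording varies between SQLite versions, but `before` describes a full scan and `after` mentions `idx_messages_user`. Other databases expose the same information through their own `EXPLAIN` output.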

Usually the opposite: business thinks it's going to get lots of users, engineers code for such, business way undershoots estimates, stuck with over-engineered app.

I've never heard of a PM saying such a thing.

It's a good thing to overestimate in a Big Corp, because the managers always penny pinch and will put the blame on you if things go to shit.

If you're a startup, go the other way.

I don't know how shitty a system must be to not be able to handle 10k users with vertical scaling.

I'm in the unfortunate position where this question actually matters from day 1. I learnt the hard way a few days ago when I hit the bottleneck (about 50-100 concurrent users) and I'm not sure how to proceed.

It's a multiplayer drawing site built with Node.js/socket.io. I'm already on the biggest Heroku dyno my budget can allow and it's too big of a task to rewrite the back end to support load balancing (and I wouldn't know where to start). Bear in mind that this is a side-project I'm not making any money off.

I had a lot of new features planned but now I've put development on hold. It's not fun to work on something you can't allow to get popular since it would kill it.

Socket.io can't handle many connections you're right. I would use (universal) Web Sockets (This is a good abstraction library, very easy to implement).


Maybe put it on a Digitalocean Server and just reference your Dyno's IP address if possible.

Thanks! Looks good so that's definitely something I should look more into.

What is it your processor spends its time on? Perhaps those parts could be rethought and redesigned? Perhaps heroku is the wrong solution for those tasks even if it is good for the rest of the app. Would it be possible to find something cheaper for just those parts?

To be honest I haven't really benchmarked it. Maybe that's something I should invest time in to see if there's some obvious bottleneck I can fix. Thanks for the suggestion.

Heroku is fairly expensive. Could you not port this to a more dedicated server? Yes, it likely complicates deployment, but for a few hours of work in initial deployment you can save hundreds of hours rewriting your codebase.

I'm thinking about it but I'm no server admin so it feels a bit scary. I could probably do it quite easily but it would be a security nightmare.

I think what the author of the original article would suggest is that you need to convert some of the concurrent users to paying customers and use the money to purchase additional Heroku dynos.

I don't think that's possible for me. I have some future monetization plans but that requires features I don't have yet. At the current state I have a hard time seeing anyone would pay for it. Also, due to how the back end is structured it's not possible to scale horizontally at the moment so all I could do with money is beef up the current dyno, not add more.

> At the current state I have a hard time seeing anyone would pay for it.

You'll never know for certain until you try. You need to put a dollar sign on it and see how current users respond. Of course only a handful of users will pay, after which you can slowly ramp up prices as you build out the features you have planned.

Hey, you might want to check out https://realm.io/ , the demo is exactly what you said. Good luck!

Looks cool but seems like it's aimed at mobile and not websites? Also, how do they differ from Firebase?

My previous experience with Firebase makes me hesitant to use a BaaS. It works great for simple stuff but as soon as you want more control it becomes a mess compared to just having your own server. It might just be me who hasn't wrapped my head around the mindset correctly though.

Maybe you could rely on Firebase for the real-time db stuff and do the drawing there.

I actually used Firebase for the first version but ran into a lot of different issues, mostly due to the lack of control. That's why I migrated to my own Node server where I could have my own back end logic to control things.

This was many years ago though so maybe I should have a look at it again.

Flexibility vs efficiency. Agile vs high momentum.

As a rule of thumb, start-ups need to be more agile as they are mostly exploring new territory, trying to create new value or re-scope valueless things into valuable things.

Larger companies operate at a scale where minor efficiency improvements can mean millions of dollars and thus require more people to do the same thing, but better. Individualistic thinking on new directions to go is not needed nor appreciated.

Of course there are exceptions. The question boils down to whether or not the ladder is on the right wall before charging up it.

In rare circumstances you can do both. Either the problem is trivial, or the problem becomes trivial because you have a super expert. 10x programmers who habitually write efficient code without needing to think too much have more bandwidth for things like strategy and direction. The car they drive is more agile, accelerates faster, has a higher max speed, etc...but even this can't move mountains. The problem an individual can solve, no matter the level of genius, is still small in scope compared to the power of movements and collective action and intention.

The most powerful skill is to seed these movements and direct them.

Abstractly, this is what VCs look for in founders and also a reason why very smart and technical people feel short-changed that they are not appreciated for their 10x skills. (Making 500k instead of millions/billions) They may have 10x skills, but there are whole orders of magnitude they can be blind to.

The reason scale isn't so important today is because most DBs can actually scale vertically to really high numbers. The tipping point is high enough that if you have this problem, you probably can also afford to fix it.

What matters though is performance and availability. No matter what scale you work at, you can't be slow, that will drive people away. You also can't be unavailable. This means that you might have to handle traffic spikes.

Depending on your offering, you probably also want to be secure and reliable. Losing customer data or leaking it will drive customers away too.

So, I'd mostly agree, in 2016, scale isn't a big problem. Better to focus on functionality, performance, security, reliability and availability. These things will impact all your customers, even when you only have one. They'll also be much harder to fix.

Where scale matters is at big companies. When you already have a lot of customers, your first version of any new feature or product must already be at scale. Amazon couldn't have launched a non scalable prime now, or echo. Google can't launch a non scalable chat service, etc.

> Where scale matters is at big companies. When you already have a lot of customers, your first version of any new feature or product must already be at scale. Amazon couldn't have launched a non scalable prime now, or echo. Google can't launch a non scalable chat service, etc.

This is the point I was going to make. When you are designing the very platforms on which all these "does it scale? who cares" startups are going to be built, you do not have the luxury of having that attitude. Definitely don't take that mindset into an interview at google or amazon. :)

I do agree with the OP's sentiment for startup projects in general. I have had the experience of worrying about scale too early and over-engineering a system which never had more than a few dozen users, and the opposite experience of tossing something together that wouldn't scale and then going through the hairy scrambling at every order of magnitude of scale through tens of millions of users. The latter was definitely a better strategy.

Healthcare.gov was a site that failed and suffered due to scaling issues.

However, it was an anomaly because unlike a product someone this article is intended towards would be building, that site had an immediate audience of millions of users from the get go.

Also, the fact that it took a few weeks to be rewritten to handle the load at which point it became extremely successful, strengthens the original article's point.

By the time scalability becomes a problem, you will have enough resources to tackle the scalability problem.

This is not an anomaly.

It's the norm for all government projects, and most projects started in established companies that already have millions of users.

When working in these domains, the "doesn't need to scale" mentality is very inappropriate and I find it does a lot of damage.

The guys at healthcare handled most of the scaling problems, accounting for the means at their disposal and the time frame they had, judging by what was published in the news. That it took only a few weeks to smooth out shows that they handled scaling beforehand. A launch from 10 users to 10 million is always a bit bumpy for the initial weeks.

No one else bothered by "End-to-end tracking of customers" as the primary concern? Ok then.

On the subject of scaling, I think it's good to have an idea in your head about a path to scalability. One server, using PHP and MySQL? Ok. Just be aware you might have to load balance either or both the server and DB in the future, and that's assuming you've gotten the low hanging fruit of making them faster on their own. But as this thread's top comment illustrates, learning that stuff on the fly isn't too hard. So maybe it's better to make sure you're going with technology you sort of know has had big successes elsewhere (like Java, PHP, or MySQL) and even if you're not quite sure how you might scale it beyond the defaults you know others have solved that problem and you can learn later if/when needed.

He makes the valid point that performance for each individual user, like page load time, does matter. Just that building for an audience size you don't yet have is mostly wasted time.

Seems reasonable. I wonder, though, if PHP feels like an anchor to the average Facebook developer. I realize they architected around it, but it must have some effect on recruiting, retention, etc. I use PHP myself, and don't hate it, but the stigma is there.

1) FB uses a lot of languages other than Hack 2) Hack is fairly reasonable language even has pipe operator :) 3) While it is prevailing sentiment that PHP sux I think PHP 7 is fairly reasonable language

Even ignoring the innumerable flaws PHP (yes including 7) has as a language, its standard library is among the worst standard libraries of any language.

Everything has flaws but PHP is a reasonable choice for many problems. Python pip is crap compared to composer for example. While my favorite language is Elixir I mostly do Node and Python at work can't say experience is significantly better with either of them compared to PHP 7.

>Everything has flaws

False equivalence. PHP has many, many more flaws to a much deeper and more serious level than any other mainstream programming language. It's insecure, buggy, full of broken behaviour for legacy systems, slow and easy to misuse. Its standard library is inconsistent, hard to learn, easy to misuse, full of legacy behaviour and slow.

>but PHP is a reasonable choice for many problems.

PHP is an unreasonable choice for every problem unless you have already solved your problem with PHP. I'm not saying that Facebook should rewrite in something else, obviously, but nobody should be starting new work in PHP. Nobody.

>Python pip is crap compared to composer for example.

There's nothing at all wrong with it.

>While my favorite language is Elixir I mostly do Node and Python at work can't say experience is significantly better with either of them compared to PHP 7.

Well you're quite objectively wrong. Yes Node.js is a pile of shit comparable to PHP, but it's Javascript, what do you expect? Javascript is basically the second worst mainstream programming language right behind PHP. Yes they've both had superficial changes that make them more enjoyable to write recently, but none of those changes fix the underlying inherent problems with those languages.

Python, on the other hand, is a well-built, well-designed, much more sane language.

OK here's a list of things I miss from PHP when doing large projects in Python. 1) Much better typing support 2) interfaces 3) visibility control 4) composer

As for nobody choosing it for a new project: I don't know if you've ever heard of this company called Slack, but they chose PHP fairly recently and seem to be doing OK.

Not trying to argue about languages, but Slack is not a good example of a great technical product - backend wise. It's bloated, slow, can't support cases with too many users, and makes way too many requests just to initialize.

When did you last write a project in PHP? Your thoughts might have been valid ten years ago, but they aren't now.

fwiw, I find the PHP stdlib more complete than many other languages.

When you have that sort of attitude about 2 of the most widely used languages/platforms in existence, I think it's more a reflection on you.

Well, he does like Python, which is above them both in popularity.

besides haystack, needle vs needle, haystack, what do you view as so bad about PHP's standard library? The main critique I've heard is that it's so big, but it's hard to see that as a downside unless you're a maintainer of the language itself. To me it's great to have all this stuff built in, you very rarely need to add in dependencies, and besides fewer dependencies being better in general, it also means you can get started faster, you don't have to spend so much time figuring out what's the popular library for solving x, and is it still maintained, and was it built with a bunch of other weird dependencies? You just search the docs, find what you need, make note of what you need, issues, examples, etc., and you're off to the races.

Any stigma should attach to prioritizing tech over the business. Your business exists to turn a profit, stack be damned.

That works to motivate some employees, assuming their comp is relative to your business success. Facebook has been able to do that for some time, but there's probably a plateau or two in their future.

The business isn't your concern if you don't work there yet.

i.e. it's not bad to avoid working somewhere because they use PHP. That's not 'prioritising tech over business', it's just choosing to avoid toxic crap technology.

Facebook doesn't want those developers anyway.

The overall premise of the blog is exactly correct; though I would say some areas you probably need to consider more than others.

Spending a lot of time figuring out what exact microservice/sharding/etc strategy you need to serve a zillion visits a day and building it before you've even got customer/visitor one is overkill out of the gate. But that shouldn't mean you shouldn't think about how you'll scale over the short term or medium term at all.

When I approach scaling, I tend to spend much more time on the data retention strategy than anything else: databases (or other stores) are stateful, which makes them a harder problem to fix later than the stateless parts of the system. Even so, I'm typically not developing the data services for the Unicorn I hope the client will become; I'm just putting a lot more thought into optimizing the data services I am building so they won't hit the breaking point as early as they might if I were designing for functionality alone. I do expect there to be a breaking point and a need to change direction at some point in these early stage designs. But in that short to medium term period, the simpler designs are regularly easier to maintain than the fully "scalable" approaches that might be tried otherwise, and rarely do those companies ever need anything more.

I like the overall idea here. Focus on building something quality first then worry about scaling later. Most servers can handle a decent amount of traffic. Seems like common sense to me. I guess some people can get too hung up on engineering to make their site scale before actually deploying or innovating on the product. Wonder if people have encountered this in the workplace before?

I think the point is to make something that makes money first, then use the money to make something quality.

I agree with this article; however, there are considerations that can be made early on, to ensure an easy path for future scalability, that don't waste any time or money during the project's nascency. For example, if I know I want my app to scale at some point in the future, I may opt to build it in Elixir, or I may choose to use a Riak cluster rather than MySQL.

Ok but please don't do things like using nested array searches with bad runtime when you can use hashmaps instead. I hate seeing code or using programs that are extremely unoptimized.
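To make that concrete, here's a minimal Python sketch of the difference. The function names and the "find the shared IDs" task are made up for illustration; the point is only that the first version rescans a list for every element, while the second builds a hash-based set once and does constant-time lookups on average.

```python
def common_ids_nested(a, b):
    """Return items of a that also appear in b.

    b is a list, so each `x in b` is a linear scan:
    roughly len(a) * len(b) comparisons in the worst case.
    """
    return [x for x in a if x in b]


def common_ids_hashed(a, b):
    """Same result, but b is turned into a set (a hash table) first.

    Building the set costs about len(b); each lookup is then O(1)
    on average, so the total is roughly len(a) + len(b).
    """
    lookup = set(b)
    return [x for x in a if x in lookup]


if __name__ == "__main__":
    a = list(range(0, 2000, 2))   # even numbers
    b = list(range(0, 2000, 3))   # multiples of three
    # Both versions must agree; only the running time differs.
    assert common_ids_nested(a, b) == common_ids_hashed(a, b)
    print(len(common_ids_hashed(a, b)), "shared ids")
```

With a few thousand items you won't notice the difference; with a few hundred thousand, the nested version is the one that makes the page hang.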

That seems to be the real story behind most "We switched from stack X to stack Y and gained a Z-fold increase in performance" stories. They tend to leave out the "and we also fixed some generically bad things we found when porting".

Hashes are more convenient too. It's just code smell by poor developers.

I once worked at a startup where we expected 'some activity' in our webshop at product launch, and didn't think about scaling for that. Well, some activity turned out to be 450 Mbit/s for five hours, which our unscalable application/webshop didn't handle very well. It became overloaded in the first minute, and it took us more than an hour to get remote access again. It's one of those things we did better for our next big event (major sharding: we basically replicated the application 32 times on the largest VM instance we could get. It was needed, and it survived).

I agree with this article only for launching new products. But if you already have a product which is serving millions of customers, you'd better worry about scale whenever you change anything.

While a point the article tries to make (fix your leaky funnel before acquiring users) is true, I disagree with the article. If your application is converting well, scalability problems are not acceptable.

I have seen applications that convert very well, but were limited by scalability problems. That meant that the business had to hold off on marketing and user acquisition and missed its financial targets, which cascaded into breached contracts. The phrase that nobody wants to hear in that situation is "who cares about scalability".

Now, if you did not have a lot of problems scaling in your particular case, that just means it was not an obstacle for you, e.g., you had good intuition around performance/scalability, or the problem was coincidentally a good fit for your technological choices.

Unfortunately not everyone has a good intuition about scalability, not everyone is risk averse and not everyone is good at picking a good technology for their use case. So I disagree with this article in the sense that it is not in the best interest of a random reader to not care about scalability.

