I built the biggest social network to come out of India from 2006-2009. It was like Twitter but over text messaging. At its peak it had 50M+ users and sent 1B+ text messages in a day.
When I started, the app was on a single machine. I didn't know a lot about databases and scaling. I didn't even know what database indexes were or what their benefits were.
I just built the basic product over a weekend and launched. Here is the timeline after that, with each step triggered by the web server exhausting all its JVM threads trying to serve requests:
1. 1 month - 20k users - learnt about indexes and created indexes.
2. 3 months - 500k users - Realized MyISAM is a bad fit for mutable tables. Converted the tables to InnoDB. Increased the number of JVM threads available to Tomcat.
3. 9 months - 5M users - Realized that the default MySQL config is for a desktop and allocates just 64MB RAM to the database. Set up the MySQL configs. 2 application servers now.
4. 18 months - 15M users - Tuned MySQL even more. Optimized JDBC connector to cache MySQL prepared statements.
5. 36 months - 45M users - Split database by having different tables on different machines.
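The first step in that timeline, adding indexes, is easy to see on a toy table. What follows is a minimal sketch, not the setup described above: it uses Python's built-in sqlite3 as a stand-in for MySQL, and the `users` table, `phone` column and index name are made up for illustration.

```python
import sqlite3

# Hypothetical schema for illustration only; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, phone TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO users (phone, name) VALUES (?, ?)",
    [(f"+91{n:010d}", f"user{n}") for n in range(10_000)],
)


def query_plan(connection, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is the human-readable plan detail.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT name FROM users WHERE phone = ?"
args = ("+910000000042",)

plan_before = query_plan(conn, lookup, args)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_users_phone ON users (phone)")
plan_after = query_plan(conn, lookup, args)   # planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should describe a scan of `users` and the second should mention `idx_users_phone`. MySQL's `EXPLAIN` gives the same kind of before/after picture.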
I had no idea or previous experience about any of these issues. However I always had enough notice to fix issues. Worked really hard, learnt along the way and was always able to find a way to scale the service.
I know of absolutely no service which failed because it couldn't scale. First focus on building what people love. If people love your product, they will put up with the growing pains (e.g. Twitter used to be down a lot!).
Because of my previous experience, I can now build a highly scalable service from launch. However, the reason I do this is that it is faster for me to do it - not because I am building it for scale.
Launch as soon as you can. Iterate as fast as you can. Time is the only currency you have which can't be earned, only spent. Spend it wisely.
They haven't failed (yet), but I think gitlab.com would be a lot bigger if they scaled faster. Lots of people have rejected it because it was too slow.
The other counter-example I can think of is Myspace who frequently had tons of server errors and page load errors alongside failing to iterate their product rapidly.
Even given this, I completely agree with you. That's why I now develop in ruby. I'll take developer productivity over performance any day.
It's the old saying - 'nice problem to have'.
In my experience, dynamically typed languages don't do well for developer productivity.
At least one empirical study suggests static typing does not improve programmer productivity. Source: http://courses.cs.washington.edu/courses/cse590n/10au/hanenb...
This is just one of the many common industry "wisdoms" that collapse under scientific scrutiny.
Next, I look at the languages themselves. Java's type system is so weak as to barely have any benefit. Terrible choice to test static typing. The control language is then a language based on Smalltalk... the original, great language for dynamic, OOP, and complex programs... that's even simpler than Java & could possibly boost productivity just due to easier review/revision. A proper comparison would be with something like OCaml, Haskell, Ada/SPARK, or Rust that really uses a type system with the high-level & composition benefits of something like Smalltalk. Makes me lean toward OCaml or Haskell rather than the others, which slow productivity by making you fight the compiler.
This paper doesn't prove anything for or against the topic. It's quite weak. The authors should do their next one on the kind of program that others in the field reported benefited from static, strong typing. Then, attempt an experiment on the same type of software between a productive language with a powerful type system and a productive, dynamic language of similar complexity. Alternatively, try to modify similarly large applications in parts that touch a lot of the code base without breakage for both static and dynamic languages. Then, we might learn something.
From my own experience with dynamic types it's not the dev time that increases but maintenance time. When you have a team working on an evolving codebase with data types distributed in caches, databases, and code it seems inevitable that you will have an error if you don't do any type checks. Even good linters can miss something that would have been easy to catch with static type checks.
They're doing a lot of things wrong. See my reply.
We were sometimes able to go from concept to deployed application in a single day.
It is easy to ship any proof of concept. It is hard to maintain, it is riddled with performance issues, and it is almost impossible to refactor because Python has zero refactoring support and no compiler to help you catch any errors.
What you save on the initial rollout, you lose tenfold in the long term.
This just makes me shudder a little bit. I've seen such things before, even recently at my current job. These kinds of applications tend to be a nightmarish mix of unmaintainably overengineered and underdesigned terrible code.
The key thing is that the testing, release and deployment tooling were not a significant bottleneck. You could develop the code, perform testing in a production like environment and sign it off with a minimum of down time in between waiting for compilation, package building, etc. So a lot of that was down to Python being interpreted.
That's why it's called an anecdote; not data.
It's not even part of the "static vs dynamic" argument; one can rush things in Java or in C++ just as easily. The only difference is that Java might allow doing it slightly faster.
Yes, catching type errors eliminates one category of error but, generally, the type of programmers that make that kind of error frequently will not constrain themselves to just one category of error.
>Even given this, I completely agree with you. That's why I now develop in ruby.
Ouch that is a pretty big burn.
I would say this is because of the simple fact of visibility and adoption. You've probably never heard of these services because they ground to a halt with a mere 1000 users, so they never got mainstream enough to be recognised as a viable service. It is a bit like how no one remembers the dozens of people who failed to achieve sustained powered flight before the Wright brothers. That doesn't mean there weren't any; scalability, technical or planning issues killed those efforts before anyone knew of them.
Some 'scalability' issues are inherent to your initial design, and not just the choice or configuration of your hardware/software platforms.
For instance, what if you were building a contact database of some sort? At first, you may have things like 'Phone Number' and 'Email Address' as columns in the 'Person' table. Then, as your service gets popular, you notice people asking for extra contact details like Twitter handles, LinkedIn pages etc. So you start adding those to your Person table as extra columns.
Eventually, you realise that you should have thought more about this at the outset and stored contact details in another data table altogether, linked to a 'Contact Type' table and related back to the Person table. This would have been mitigated at the start via better database design and catering for eventualities that you might never have foreseen. Migrating the original database to normalise it is a massive effort in its own right, and probably will take more time, cause more outage time, and cause more bugs in existing code than designing for that eventuality in the first place.
Even if 99% of your users only ever enter Phone and Email contact details, the second option, designed for scalability, will still handle that without a sweat, and 'scaling' to meet additional demands later is merely a matter of adding new contact types in the 'Contact Type' data table so that they become an extra option for all your users.
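To make the second option concrete, here is a rough sketch of that layout using Python's built-in sqlite3. The table and column names (`person`, `contact_type`, `contact_detail`) and the helper functions are my own illustration of the Person / Contact Type idea, not a schema from any real product.

```python
import sqlite3

# Hypothetical table and column names, following the Person / Contact Type
# layout described above.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE contact_type (id INTEGER PRIMARY KEY, label TEXT NOT NULL UNIQUE);
    CREATE TABLE contact_detail (
        person_id INTEGER NOT NULL REFERENCES person(id),
        type_id   INTEGER NOT NULL REFERENCES contact_type(id),
        value     TEXT NOT NULL
    );
    INSERT INTO contact_type (label) VALUES ('Phone Number'), ('Email Address');
    INSERT INTO person (name) VALUES ('Alice');
""")


def add_contact(connection, person_name, type_label, value):
    """Attach one contact detail to a person, looked up by name and type label."""
    connection.execute(
        """INSERT INTO contact_detail (person_id, type_id, value)
           SELECT p.id, t.id, ? FROM person p, contact_type t
           WHERE p.name = ? AND t.label = ?""",
        (value, person_name, type_label),
    )


def contacts_for(connection, person_name):
    """Return {type label: value} for one person."""
    rows = connection.execute(
        """SELECT t.label, d.value
           FROM contact_detail d
           JOIN person p ON p.id = d.person_id
           JOIN contact_type t ON t.id = d.type_id
           WHERE p.name = ?""",
        (person_name,),
    ).fetchall()
    return dict(rows)


add_contact(conn, "Alice", "Email Address", "alice@example.com")

# Supporting a new kind of contact is a row insert, not a schema change.
conn.execute("INSERT INTO contact_type (label) VALUES ('Twitter handle')")
add_contact(conn, "Alice", "Twitter handle", "@alice")

print(contacts_for(conn, "Alice"))
```

The trade-off is an extra join on every read, which is usually cheap next to the cost of an `ALTER TABLE` on a large Person table every time someone asks for a new field.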
I am willing to bet that 9 out of 10 'weekend projects' have had to be thrown out completely and redeveloped from scratch when the number of users became significant. Of those rebuilds, I would be interested to see some research into how many users abandoned the said platform when (a) the original one started to grind to a halt or constantly fell over with errors and (b) the new platform came out with new features or a different UX that broke the 'look and feel' of the original.
My initial DB schema was pretty bad. We did 2 schema rewrites and migrations from the launch to 5M users. Each time it took 2 weeks of sleepless nights.
The machines today are really powerful. You can do a lot with 244 GB RAM machines backed by SSD.
Someone who doesn't have the skill set to be able to scale once they get traction - it's likely they will not have the skills to design for scale at start.
My recommendation to everyone would be pick a language and db you are most comfortable with and get started as soon as you can. You will fail on the product side a lot more times before you will fail on the technical side.
And if you are failing on the technical side, reach out to me. I will definitely be able to help you find a way out. I am not sure if there is any product guy in the world who can make a similar claim on the product front. However there are at least dozens of technical guys in the world who can make a claim like I did.
So focus on launching the product as soon as possible. Work hard, reach out for help if needed. You will eventually get success.
Is this a typo of "244 GB" instead of "24 GB"? Nearly any company that has a single machine provisioned with 244 GB of RAM is doing something severely wrong, likely putting the company's ability to grow at risk. Such a machine screams of trying to vertically scale a poorly performing legacy product instead of figuring out how to horizontally scale out with 16-64 GB servers.
That much memory on a single server is a huge red flag for 95-99%+ of companies. It takes a very specialized system (ie: you probably don't fit the mold, no matter what your excuses are) to require such a server.
Outside AWS, you can put 3-6 TB in a normal-ish server from Dell or HP, or 64 TB in a big iron box from Fujitsu or IBM.
Check out Stack Overflow's stack too.
Their system is faster too.
That is my intuition. If you know of a video or blog post (conference, talk, article, etc.) where someone explains the benefits of a 2 TB MySQL server and how it is not a crazy bad idea, I would love to see it. Because my 15 years of professional experience screams "NO WAY IN HELL" at that one.
9 out of 10 weekend projects never get to a significant number of users.
Do you remember MySpace? The big social network with hundreds of millions of users that came before facebook.
They had massive scaling and performance issues. At the peak when everyone was moving to social media (almost a decade ago) the site could take an entire minute to load (if loading at all). They lost a lot of users, who went straight to facebook and never recovered.
You can say that it failed to scale, but I wouldn't say that it failed to scale technically; it failed to scale the community. Then at some point when it was already dying they started to change the layout big time I guess to attract more people, alienating the initial community.
I agree, they were different from facebook enough, that it's not all about scaling. Yet, don't underestimate the impact of having your site unreachable, when the competitor is coming strong. Not a good position to be in.
Your example about the DB tables is exactly the trap to avoid while you are iterating rapidly to try and find product/market fit.
Some of the (real world) feedback I got on the web apps that failed to get off the ground came mainly from customers complaining about:
* pages taking too long to reload
* not enough fields on a particular data table to store information in
* missing API
* pages that would refresh in entirety instead of just updating the changed portions
Most customers didn't stick around to wait for us to address or fix the issues - there were plenty of other competitive products that fitted them better that they could start using that same day.
A couple of those sites also got negative feedback on Reddit or HN because I hosted the front end website on a $5 VPS server and when the site suffered the inevitable 'hug of death' from posting to these sites, they immediately went down due to inability to scale under the sudden deluge of visitors, and I received uncomplimentary feedback on that (where people actually elected to post feedback rather than just close the browser tab and completely forget and move on from my app).
Yes, over thinking scaling and design is bad, but putting it aside as a 'totally not important now' factor is also just as bad, if not worse, in my 'lived it' experience.
Most people see much less success. The original suggestion is geared towards first few tries that people make. Once they have made a few attempts, they learn from it and naturally build more scalable products without spending extra effort. You have already read my story about my first try which had a very poor start. My subsequent efforts have scaled in that order without needing a single rewrite.
However, if they spend too long procrastinating, worrying about and investing in scale - they are wasting precious time which would be better spent finding product market fit.
Like you, each subsequent app that I built contained the lessons learned from the previous efforts - better front end frameworks, better data table normalising, better hosting infrastructure etc.
Part of me always thinks back to the ones that didn't work, and I always ask myself - would they have worked if they had faster page refreshes, or more flexible data entry? Perhaps if I had spent more time in the planning and design before I launched them, they may not have sunk as quickly? Those 1000 users that time who came from my HN post and saw an 'Error 500' page - could a fraction of them have been the ones who would have signed up and made us profitable if they hadn't seen the error page as their first introduction to my app?
For that reason alone, I am always skeptical of any post that promotes 'iterate often and fast' or 'fail quickly' rather than 'spend time on design and a unique experience'...
When you are thinking back, also consider the alternative, where you could have missed opportunities and product launches because of extra effort spent on scaling upfront.
It's all good learning in the end. It's great to be in your situation. Keep at it and you will see great success in future!
That seems fairly reasonable to me.
It all depends on what you call "success". Running a startup from zero to a billion dollar business is an incredible and rare achievement. Running a small site with a few hundred/thousand users and a decent income is fairly reasonable.
Nobody is saying that performance doesn't matter. But if it performs well with 100 users, you can worry about the performance with 10k users later. And if it doesn't perform well with 100 users, it doesn't matter if the scaling is O(log n) or even O(1).
Performance under a sudden flood of users matters, but a lot less than day-to-day performance. And most of the criticism I see for sites dying is when they're serving static content with ideal O(1) scaling, but in a very unoptimized way.
I've been part of many projects that couldn't scale because lack of forethought meant there was a nasty implicit O(n^2) thing going on when it could have easily been designed to be O(n log n).
Almost all of these projects were killed with CYA hand waving about "just not fast enough".
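As a toy illustration of the "implicit O(n^2)" kind of problem (my own example, not taken from any of those projects), compare two ways of checking a list for duplicates. Both give the same answer; only the growth rate differs.

```python
def has_duplicate_quadratic(items):
    """Compare every pair of items: O(n^2) comparisons in the worst case."""
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False


def has_duplicate_sorted(items):
    """Sort first, then compare neighbours: O(n log n) overall."""
    ordered = sorted(items)
    return any(a == b for a, b in zip(ordered, ordered[1:]))


ids = list(range(2_000)) + [1_234]  # one repeated value
print(has_duplicate_quadratic(ids), has_duplicate_sorted(ids))
```

With a few hundred items nobody notices the difference. At a few million, the nested loop is the thing that makes the feature "just not fast enough".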
I didn't directly call out the DB tables point. This kind of premature optimization is a huge red flag and I would tell anyone I am advising to avoid it.
Edit: you can read about both project histories, and learn about them. Of course one gets down-voted for mentioning it.
Tech people are quick to find technical reasons for failure but the reasons are usually elsewhere.
2) MySpace's culture and management were in trouble after Murdoch's News Corporation bought MySpace's parent company. Instead of investing, News Corp chose the wrong path. Instead of improving the site, they decided to do a (in the end) very costly deal with MS to switch their ColdFusion stack from a Java stack to a dotNet stack, incl. very expensive MSSQL licenses and various other license costs. It turned out ColdFusion on dotNet was still in a very experimental phase, and MySpace suffered a lot from the resulting troubles and bled money like never before (which made News Corp very unhappy). The MySpace website got few updates for at least one year while MySpace devs were busy with firefighting and switching the backend. Facebook took over.
There are books worth reading.
3) Stackoverflow (like 2 years ago) ran on fewer than a dozen servers. It's several orders of magnitude smaller than social network services like (former) MySpace, Facebook, Twitter, etc. License costs often don't scale; you bleed through your startup money for few competitive advanced features or little return. That's why successful startups often choose an open source stack; look at Amazon, Google, (former) Yahoo, Facebook, Twitter, etc. - Perl, Python, PHP, Java, Ruby, MySQL, Postgres, Hadoop, etc. Stackoverflow shows it can be done with off-the-shelf COTS as well, if you keep the server license count low through wise decisions and performance tuning. But it's not like I could name dozens of successful startups that have a software stack like Stackoverflow. And even Hotmail, Linkedin, Bing run or used to run for the majority of their service life on a mainly open source software stack.
Friendster was mostly focused on Asia plus lost a ton of users around the same time Facebook gained a ton of users. I don't think it was scaling architecture that did them in. It was the market choosing the competition for the user experience plus what their friends were on. That's for most of the world. I have no idea what contributed to their failure in Asia since I don't study that market when it comes to social media.
EDIT to add: I recall the founder did say they had serious technology problems for a few years that affected them. I'm just thinking Facebook spreading through all the colleges & moving faster on features was their main advantage.
I've seen too many breathless posts which would have you believe they'd need a clustered NoSQL database or it wouldn't scale.
Wrong, and wrong.
First, their architecture is extremely remarkable. They are doing and mastering vertical scaling, down to every little detail. Terabytes of RAM, C# instead of Ruby/Python, FusionIO drives, MS SQL instead of MySQL/Postgres, etc...
Second, that's at least 6 database servers, each one being more expensive than 10 usual commodity servers:
First cluster: Dell R720xd servers, 384GB of RAM, 4TB of PCIe SSD space, and 2x 12 cores. It hosts the Stack Overflow, Sites (bad name, I’ll explain later), PRIZM, and Mobile databases.
Second cluster: Dell R730xd servers, 768GB of RAM, 6TB of PCIe SSD space, 2x 8 cores. This cluster runs everything else. That list includes Careers, Open ID, Chat, our Exception log, and every other Q&A site (e.g. Super User, Server Fault, etc.).
I think that's good: SQL databases are very mature and you don't want to be exciting for your core business data if you don't get some major benefit to defray the cost. Boring is a delightful characteristic for data storage.
I'm not sure you could get 1TB of memory and multi-TB SSD drives anywhere in the 2000s, even for a million dollars. That makes a major difference in the ability to scale up. Data didn't grow; storing 1M user accounts always took the same space.
While for many developers NoSQL might be overkill, Stackoverflow is a bad example. If you were any fast growing startup in the cloud, and you wanted to go the SO route, it would have meant going CoLo. SO has a set of machines with nearly a terabyte of RAM - GCE doesn't even offer cloud machines with the same specs.
And even then their setup is far more fiscally expensive than something you could get done with 20 cloud nodes on some NoSQL solution.
Second: how much time would they have spent rolling all of the data integrity, reporting, etc. features they'd have needed to add? I'm inclined to take them at their word when they say this was safer and cheaper given their resources.
I never meant to claim that cloud was safer and cheaper. What I meant was, for the majority of operations, staying in the cloud with some distributed setup is likely more feasible than moving to CoLo (see GitLab).
Thinking ahead that custom fields could be needed in the future is a design choice and can apply to 5k users or 500 000k users. It's not the scaling itself that's causing you pain; it just exacerbates the difficulty of a bad design choice.
So the advice of OP still applies: at the start you should spend most of your time developing the best design for your application and less on scaling, because scaling a good design is way less painful.
I personally switched to facebook for two reasons:
* got sick of repeated errors every time I browse myspace
* facebook has better album permissions (myspace has none)
But the main problem: they crumbled under heavy traffic.
I also know of some government services (my state has paperless administration so everything is digital) which are used on a day-to-day basis (but only by hundreds of users) by government employees and go down for almost 2 hours every day, which stops work for everyone (both the employees and the people who came for that service). These services are developed by companies like TCS and Infosys. If only they had thought of scaling before.
Had they started with only $300 like the parent comment, their webapp would have often performed much better. Big projects are a difficult curse, our most difficult question in IT engineering.
This is the real problem: lack of in-house expertise and ongoing incentives to maintain performance, while contractors are usually paid for features and, if they also do hosting, even have a financial disincentive for efficiency.
I would bet it's at least as likely that had someone thought of scaling it would have added a significant cost and delay to the actual project and the first major scalability bottleneck would still have been something unanticipated.
If by scaling you mean increasing the number of page views that a given version of a web app can serve, then this is mostly true.
But things like conceptual simplicity, unit economics, codebase readability, strategy, etc. are also dimensions of scalability. E.g. the only reason that startups can even exist at all is because large companies have a lower marginal output per employee.
Zooomr did. They came out of nowhere in 2006 and were a real threat to Flickr. They had AJAX-powered edits, geotagging, various other unique features and they were starting to pull in some highly followed photographers.
But the site kept crashing as traffic grew and some scaling problems even led to data loss. I wanted to see them win but they just couldn't keep up with traffic and eventually Yahoo cloned their features and they became irrelevant.
If someone cannot scale their product once it has adoption, they likely don't have the insights, problem definition, capabilities, skills and resources to do that when they launch.
If I could edit my original post, I would put a rider related to capable team for service not dying.
Thank you. This is the one major point I try to get across to people when 'scaling' comes up. "Oh, that won't be scalable" or "yeah, but when we have 2 million users, XYZ won't work". I've been on projects where weeks and months (calendar time, part time efforts) were spent on things that were 'scalable' instead of just shipping something that worked earlier.
I keep trying to tell folks on these projects (have been involved in a couple now) - man, we're not going to go from 20 users to 2 million users overnight. We'll notice the problems and can adapt.
Trying to whet their appetite, I've even argued that a new 'flavor of the month' will be out before our real scaling needs hit, and we can then waste time chasing that fad, which will be even cooler than the current fad we're chasing (although... I try to be slightly more diplomatic than that).
I would argue that you can earn lifetime with a healthy life style.
I remember reading a study that put the average difference in lifespan between healthy and unhealthy lifestyles at 14 years.
If you spend 1 more day a week on a healthy lifestyle (exercise, cooking for yourself, enough sleep, little stress, enough recreation time) for 40 years, you have earned ~9 years of lifetime.
And I would argue that those 40 years are more enjoyable.
However to my original point - you can lose fitness (within reasonable limits) and then gain it back and not notice a difference later on. However time lost is lost forever.
My post says 50M+ users and 1B+ messages in a day. Almost 50% daily active users. And this was on early-2000s hardware. Last year WhatsApp was doing 60 billion messages per day (60x of what we did).
However our messages had a lot more logic, because we delivered on unreliable Telco texting pipes. Pipes had a set throughput. Some pipes delivered to just some geographies. They would randomly fail downstream so you had to do health monitoring and management. Some pipes cost per message and some cost based on throughput. So pick in real time for the lowest cost. Different messages had different priority. In summary - there was a lot more business logic per message sent.
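To give a feel for the per-message decision described above, here is a deliberately simplified sketch. It only covers health, geography and per-message cost; throughput limits, throughput-based pricing and message priority are left out. The `Pipe` class, its fields and `pick_pipe` are hypothetical names for illustration, not the original system's code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Pipe:
    """One telco route. All fields are hypothetical, for illustration."""
    name: str
    regions: FrozenSet[str]   # geographies this pipe can deliver to
    cost_per_message: float   # flat per-message price (a simplification)
    healthy: bool = True      # outcome of some external health check


def pick_pipe(pipes: List[Pipe], region: str) -> Optional[Pipe]:
    """Return the cheapest healthy pipe serving `region`, or None if there is none."""
    candidates = [p for p in pipes if p.healthy and region in p.regions]
    return min(candidates, key=lambda p: p.cost_per_message, default=None)


pipes = [
    Pipe("pipe-a", frozenset({"north", "south"}), 0.010),
    Pipe("pipe-b", frozenset({"north"}), 0.006, healthy=False),
    Pipe("pipe-c", frozenset({"north"}), 0.008),
]

chosen = pick_pipe(pipes, "north")
print(chosen.name if chosen else "no route")
```

Here `pipe-b` is the cheapest route to "north" but is marked unhealthy, so the cheapest healthy one is picked instead. A real router would refresh `healthy` continuously and also track how much capacity each pipe has left.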
I was of course very inexperienced when I started. I was just two years out of college. However in the end the product was 50+ services running on 200+ servers. This experience helped me scale user communication infra at Dropbox to 100x scale within 3 months.
Your message was flame-bait at best and didn't seem to contribute to the discussion. You were passing judgements based on incomplete information. You could have asked nicely and still gotten a response.
Edit: fixed typo
This wasn't clear at all from your comment.
> Your message was flame-bait at best and didn't seem to contribute to the discussion. You were passing judgements based on incomplete information. You could have asked nicely and still gotten a response.
My message was not flame bait; that's your interpretation. I'm sorry you took offense, but please do consider that your original comment does omit lots of important details, as you apparently (should) know from your experience at Dropbox.
Also, per your quoted numbers WhatsApp was doing 60x what your app was, not 1/60. Typo?
> I know of absolutely no service which failed because it couldn't scale.
Genuine question: what about Friendster?
I seem to recall that the Friendster team had more than one aborted rewrite before they got it right, and by then it was too late. Scaling social networks required a different bag of tricks than was popular at the time (e.g., database replication was well accepted as a best practice, but sharding worked much better). Talented people could still get it wrong.
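For readers who haven't met the distinction: replication copies the whole dataset to every database server, while sharding gives each server a slice of it. A bare-bones sketch of hash-based shard routing is below. It is a generic illustration, not a description of what Friendster or anyone else in this thread ran, and the shard names are invented.

```python
import hashlib

# Hypothetical shard names; in practice these would be database hosts.
SHARDS = ["db-0", "db-1", "db-2", "db-3"]


def shard_for(user_id: int, shards=SHARDS) -> str:
    """Map a user id to one shard, the same one every time."""
    digest = hashlib.sha256(str(user_id).encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then pick a bucket.
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


print(shard_for(42), shard_for(43))
```

Each server then holds roughly a quarter of the users and takes a quarter of the writes, which replication alone cannot give you. The catch with plain modulo hashing is that changing the number of shards remaps most keys, so real deployments tend to use a lookup table or consistent hashing instead.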
Still, social networking illustrates how failing to scale is rare, and should practically never be the concern early on. Almost every social network that gained traction struggled to scale at times, but the vast majority of them overcame their challenges.
We poured sweat and tears and many sleepless nights into scaling the Tagged back-end from 10 machines to over 1,000, over time serving hundreds of millions of users. It was a tremendous amount of work, much more than building the initial product, and I don't think there's much we could have done early on to make later scaling easier.
More on the story:
There is always scope for a botched execution or just pure bad luck. I wish someone shared more candid details of what happened at Friendster.
1 - https://gimletmedia.com/startup/
IMHO, if your investors need to be involved in seeking approval for handling scaling pains and on top of that if they reject it - there are deeper issues within the company (management issues, politics, technical incompetence, leadership issues).
So to clarify my point - any team could still screw up technical execution. However there is no reason that given a good team and support from within the company, a product can't be scaled.
As I look back, if we had used RoR or Python we might have moved faster with no negative impact on scale. We would have spent more money on extra servers though!
Because of this I had multiple levels of bosses above me in the company. They all got really interested in starting to manage me and the product once we hit 5M users. They had a different vision, attachment and passion about the product.
Around 2008, we were spending a lot of money on text messaging. Most people were using the product to send messages for cheap to their friends and family (essentially like WhatsApp vs Twitter).
I wanted to slowly phase out text messaging in favor of Data (essentially become WhatsApp!). CEO didn't believe that Mobile Data will get adoption in India for a long time. He instead wanted to focus on monetizing - by sending ads on text messages, making people pay for premium content etc. We tried 8-10 things - nothing worked (this can be a really big post in itself).
Soon I quit and moved to US. It was like a bad breakup!
My biggest learnings were:
1. If you do not have ownership like a founder, do not take responsibility like a cofounder.
2. Personally for me: do not work where I don't have veto rights. But bills need to be paid and family has to be supported. Took me 5 years of really hard work to get there!
May I ask you about your first 'biggest learning': how do you eschew cofounder responsibilities in an early company? Someone who genuinely cares and/or is really interested in the core business (pretty common among first hires of startups) may have a hard time turning work away or willfully not participating in major discussions/planning when they believe their contributions could be valuable. In other words, situations when you know you can contribute but the company will get the 'better end of the deal' as you take on more responsibility without taking on more compensation.
It sounds difficult to ride that out - were you able to do that successfully or are you saying "This happened to me, don't let it happen to you?"
If someone is joining a startup at an early stage, work like it is your own baby. You are spending your most valuable currency by being here - which is your time!
With time, hopefully, you will get rewarded for your effort and results. You will likely not get a founder level say and stake but your role should gradually expand. In the end founders take almost the same risk as the first employee, but get a lot more equity because founder was there at the start. Join a startup as an early employee only if you are able to accept this fact - otherwise you are setting yourself up for a lot of resentment and pain.
The above advice applied to me too for the first two years of my employment, before I started working on the ultimate successful product. We were building an offline search engine and as a part of that I built distributed file systems, crawled a billion pages and I was appropriately rewarded with career growth.
However with the new social product that I built, the situation was dramatically different. For that product I was there from day 0. I owned the whole of engineering (sole engineer at the start and head of engineering till I left) and a significant portion of product. Till we got 5M users, I had a co-founder level say (but not the stake of course) and I worked with a co-founder level of involvement and dedication till that time. However once we got 5M users, multiple levels above me started doing meetings and taking critical decisions without involving me. I still had a co-founder level passion and it hurt to see the product flounder and be unable to do anything about it.
So I was in a uniquely bad situation and it is not common to be in this situation. However if someone really is and there is no way to get a founder level control - move out as soon as you can. And this is what I did - albeit 2 years too late.
The scaling problem with Darcs wasn't about hosting, but in the implementation of their theory of patches and how it handled conflicts. See http://darcs.net/FAQ/ConflictsDarcs1#problems-with-conflicts for an overview of the issues in Darcs 1 and 2.
This also seems to be one of the driving forces behind the development of Pijul, which is also patch-based instead of snapshot-based. Being patch-based makes a tool easier to understand and use, but all implementations so far have had major performance issues once repositories grow.
For more on that, see https://pijul.org/faq.html
I was one of the early users of Darcs 1, back before Git existed. I wanted to use version control, but the alternatives were pretty hard to understand and use. While Darcs was really nice to use, and fast on small repos, after a few years I had to convert everything to git because exponential times on most operations were just not sustainable, and fixing that required constant vigilance and altering history, which is not very friendly for new contributors.
Even GHC moved from Darcs to git in 2011 because of this: https://mail.haskell.org/pipermail/glasgow-haskell-users/201...
One prominent counterexample that comes to mind is Friendster which was a big social network before MySpace and Facebook. They had terrible performance issues but to be fair to your point, their decline was more mismanagement than a true inability to scale.
Well, Friendster failed for exactly that reason. But, those were the early days.
p.s Completely agree with the article.
You must not be looking around at all, right?
So what are some examples of services which failed because they tried too hard to scale too early?
For example, when your network could handle 20k users, was there another network that could have already handled 500k, but they failed because they started a month after yours?
These days there would be hardly any way to fail to scale given a large budget. The technology to do so is all so good. However, back in the day it wasn't always so. I'd say Friendster was an example of a company that failed due to scaling issues.
Mistake #1: Took the wrong technologies.
For every technology that can scale, there are a bunch of others that will give you a lot of trouble.
TLDR: Depends a lot on the individual situation. Pick the language you are fastest and most comfortable with.
SMS Gupshup - 6 years - Java and some C++ - Built distributed filesystems. Crawled 1B pages with early-2000s hardware. Built a map-reduce framework before Hadoop came. Built a search engine on top of it. Built an infra which would send 1B+ text messages with prioritization and monitoring. Social network with 50M+ users.
LinkedIn main stack - ~ 1 year - Java
Founder Startup - 1.5 years - Clojure
Dropbox - ~ 1 year - Python
Head Technology Tech incubator - 1.5 years - Node.js
As you can see my experience has been all over the spectrum. While primarily using one language, I would keep experimenting with other languages like Scala, Go.
Following are my learnings and the reasonings:
I love Clojure. Love it so much that I feel like quitting everything, going to a mountain cabin and just coding Clojure. However, I would most likely not use Clojure for my startup. The learning curve is really, really steep. It would make it very difficult for me to build a team. It's really fun if you understand it. But the first few weeks for a new person might be a nightmare and very emasculating. So even though I really love it, my current strategy is to bring Clojure patterns to other languages.
User facing logic
I would not use a statically typed object-oriented language for any user facing logic. User facing logic and behavior tends to be very fluid, and I feel that it gets strangled by the constraints of statically typed languages. Once you start building things with classes, inheritance, polymorphism etc., these patterns soon start driving the business logic rather than the other way around. In my view this is why the most arcane and slow-moving popular sites today are Java based (LinkedIn, Yahoo, Amazon, Ebay), compared with (Facebook, Twitter, Instagram, Pinterest). For this purpose I like using dynamically typed languages and building code in as functional a way as possible. Pick Node.js, Python, Ruby - whatever you are comfortable with.
In some cases people recommend statically typed languages for catching errors through type safety etc. However, I would rather solve that problem completely with a test suite than solve it part of the way using language features while being forced to pick a statically typed language.
Another big benefit of dynamically typed languages is the ability to just create a dictionary and start using it. Most CRUD apps work with JSON requests and responses, and I would rather just pass objects around dynamically than build a class for the request and response objects of each and every route. This was a nightmare to create and maintain in Scala and Java.
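To make the dictionary point concrete, here is a minimal Python sketch. The route, the field names and the `handle_follow` function are all made up for illustration and not taken from any real codebase. The JSON body is parsed straight into a dict and the response is built as a dict, with no per-route request or response classes.

```python
import json


def handle_follow(raw_body: str) -> str:
    """Hypothetical handler for a 'follow user' route (field names are assumptions)."""
    # The parsed JSON body is already a plain dict; no request class needed.
    request = json.loads(raw_body)

    # The response is just another dict, serialized back to JSON.
    response = {
        "ok": True,
        "follower": request["user_id"],
        "following": request["target_id"],
    }
    return json.dumps(response)


print(handle_follow('{"user_id": 1, "target_id": 2}'))
```

The tradeoff is that a missing key only shows up at runtime as a `KeyError`, which is the kind of thing the test suite mentioned above would have to cover.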
Fast developing Infrastructure
E.g. Hadoop, Hive, Distributed FS or Queue. 5 years back I would have built them in Java, but I would use Go for this today. You couldn't go wrong either way. Pick what you are most comfortable with.
There is a decent amount of talent available. These languages are easy to learn. And you get close to C level performance if you build it right. Development is fast paced.
Mission critical performance - File System or Database
Most likely C, because you want to squeeze out the last bit of performance. However, be ready for really slow development cycles. And most smart kids out of college won't know C, so you will need to set aside some time till they are really productive.
My initial DB schema was pretty bad. Had to do quite a few DB migrations - which took weeks of work to execute.
I have used MongoDB extensively since, and I am confident that it would have helped us scale to 5M users comfortably. I might have migrated to something else at that point.
There are very few products that reach even 5M users. Which is why developers should focus on launching fast.
Another thing is that the best and most successful products have a really simple core product (remember Facebook, Twitter, Instagram etc. when they had 5M users). It is not that much work to migrate and rewrite. It is when products are not getting traction that they start getting overcomplicated.
I know HN skews young when no one remembers Friendster.
My biggest learning is Functional->Fast->Pretty from the product POV and Launch as soon as you can from an Engineer's POV.
Some more details on my blog: https://anandprakash.net/2016/10/28/how-to-build-a-business-...
Most of this was written in sleepless nights when we had a baby. So do not expect a super artistic flair ;)
At stage 3, 50% of our users were active daily, and each of them would receive an average of 8 messages per day. So at stage 3, we would send 20M messages per day.
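The arithmetic behind that number, as a quick Python check. It assumes the 5M users listed for stage 3 in the timeline, and the variable names are just for illustration.

```python
# Back-of-the-envelope check of the stage 3 message volume.
users = 5_000_000              # assumed: stage 3 user count from the timeline
daily_active_fraction = 0.5    # 50% daily active users
messages_per_active_user = 8   # average messages received per active user per day

messages_per_day = int(users * daily_active_fraction * messages_per_active_user)
print(messages_per_day)  # 20000000
```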
Bullshit. What about the site (voat something?) that tried to be an alternative to reddit when there was some scandal there but couldn't keep users because they kept crashing?
I can't find the posts you're talking about.
A messaging service is the most trivial service one can make, it's a well known easy problem that's been solved for decades with current technologies. There is no challenge in that. For comparison, WhatsApp had people with experience and they could handle 1 billion users with 50 people.
The fact that you can handle 10M users with a single untuned MySQL database is not a demonstration that scaling is overrated. It's an indication that you are running a trivial service that doesn't do much.
Almost any other problem will be more challenging than that. There are endless companies that have 1/100th the customer base and yet require 100 times the data volume and engineering.