I built the biggest social network to come out of India from 2006-2009. It was like Twitter, but over text messaging. At its peak it had 50M+ users and sent 1B+ text messages in a day.
When I started, the app was on a single machine. I didn't know a lot about databases and scaling. Didn't even know what database indexes were or what benefits they provided.
Just built the basic product over a weekend and launched. Here's the timeline after that, with each step marking a point where the web server exhausted all its JVM threads trying to serve requests:
1. 1 month - 20k users - Learnt about indexes and created them.
2. 3 months - 500k users - Realized MyISAM is a bad fit for mutable tables. Converted the tables to InnoDB. Increased the number of JVM threads for Tomcat.
3. 9 months - 5M users - Realized the default MySQL config is meant for a desktop and allocates just 64MB of RAM to the database. Tuned the MySQL configs. 2 application servers now.
4. 18 months - 15M users - Tuned MySQL even more. Optimized JDBC connector to cache MySQL prepared statements.
5. 36 months - 45M users - Split database by having different tables on different machines.
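To make the first fix on that timeline concrete: the whole benefit of an index is turning a full table scan into a keyed lookup. A rough Python analogy (the table and column names here are invented for illustration, not from the actual schema):

```python
# Toy model of why an index matters: an unindexed lookup scans every row,
# while an index (here, a dict) jumps straight to the match.
users = [{"id": i, "handle": "user%d" % i} for i in range(100_000)]

def find_without_index(handle):
    # Full table scan: O(n) per query, what MySQL does on an unindexed column.
    for row in users:
        if row["handle"] == handle:
            return row
    return None

# Build the index once (O(n))...
handle_index = {row["handle"]: row for row in users}

def find_with_index(handle):
    # ...then every lookup is a single hash probe, roughly O(1).
    return handle_index.get(handle)
```

The same query goes from touching 100k rows to touching one, which is roughly what `CREATE INDEX` buys on a real table.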
I had no prior experience with any of these issues. However, I always had enough notice to fix them. Worked really hard, learnt along the way, and was always able to find a way to scale the service.
I know of absolutely no service which failed because it couldn't scale. First focus on building what people love. If people love your product, they will put up with the growing pains (e.g. Twitter used to be down a lot!).
Because of my previous experience, I can now build and launch a highly scalable service at launch. However the reason I do this is that it is faster for me to do it - not because I am building it for scale.
Launch as soon as you can. Iterate as fast as you can. Time is the only currency you have which can't be earned and only spent. Spend it wisely.
They haven't failed (yet), but I think gitlab.com would be a lot bigger if they had scaled faster. Lots of people rejected it because it was too slow.
The other counter-example I can think of is MySpace, which frequently had tons of server errors and page-load failures while also failing to iterate its product rapidly.
Even given this, I completely agree with you. That's why I now develop in ruby. I'll take developer productivity over performance any day.
It's the old saying - 'nice problem to have'.
In my experience, dynamically typed languages don't do well for developer productivity.
At least one empirical study suggests static typing does not improve programmer productivity. Source: http://courses.cs.washington.edu/courses/cse590n/10au/hanenb...
This is just one of the many common industry "wisdoms" that collapse under scientific scrutiny.
Next, I look at the languages themselves. Java's type system is so weak that it barely has any benefit, which makes it a terrible choice for testing static typing. The control language is then based on Smalltalk, the original great language for dynamic OOP and complex programs, and one that's even simpler than Java; it could boost productivity just through easier review and revision. A proper comparison would be with something like OCaml, Haskell, Ada/SPARK, or Rust, languages whose type systems actually deliver the high-level, compositional benefits of something like Smalltalk. That makes me lean toward OCaml or Haskell over the others, which slow productivity by making you fight the compiler.
This paper doesn't prove anything for or against the topic; it's quite weak. The authors should do their next one on the kind of program that others in the field report benefits from static, strong typing. Then, attempt an experiment on the same type of software, comparing a productive language with a powerful type system against a productive dynamic language of similar complexity. Alternatively, try to modify similarly large applications, in parts that touch a lot of the code base, without breakage, in both static and dynamic languages. Then we might learn something.
From my own experience with dynamic types it's not the dev time that increases but maintenance time. When you have a team working on an evolving codebase with data types distributed in caches, databases, and code it seems inevitable that you will have an error if you don't do any type checks. Even good linters can miss something that would have been easy to catch with static type checks.
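A tiny sketch of the failure mode described above, with invented names: a value that comes back from a cache as a string instead of a number doesn't crash, it just silently computes the wrong thing, which is exactly what a static type check (or Python type hints plus a checker like mypy) would catch:

```python
# The kind of bug dynamic typing lets through silently: a value that comes
# back from a cache or API as a string concatenates instead of adding.
def subtotal(a, b):
    return a + b

ok = subtotal(19.99, 5.00)       # ~24.99, as intended
bad = subtotal("19.99", "5.00")  # "19.995.00": silently wrong, no exception

# With type hints, a static checker such as mypy flags the bad call site
# before it ships; at runtime nothing complains.
def subtotal_typed(a: float, b: float) -> float:
    return a + b
```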
They're doing a lot of things wrong. See my reply.
We were sometimes able to go from concept to deployed application in a single day.
It is easy to ship any proof of concept. It is hard to maintain, it is riddled with performance issues, and it is almost impossible to refactor, because Python has zero refactoring support and no compiler to help you catch errors.
What you save on the initial rollout, you lose tenfold in the long term.
This just makes me shudder a little bit. I've seen such things before, even recently at my current job. These kinds of applications tend to be a nightmarish mix of unmaintainably overengineered and underdesigned terrible code.
The key thing is that the testing, release and deployment tooling were not a significant bottleneck. You could develop the code, perform testing in a production-like environment and sign it off with a minimum of downtime in between, with no waiting for compilation, package building, etc. A lot of that was down to Python being interpreted.
That's why it's called an anecdote; not data.
It's not even part of the "static vs dynamic" argument; one can rush things in Java or in C++ just as easily. The only difference is that Java might allow doing it slightly faster.
Yes, catching type errors eliminates one category of error but, generally, the type of programmers that make that kind of error frequently will not constrain themselves to just one category of error.
>Even given this, I completely agree with you. That's why I now develop in ruby.
Ouch that is a pretty big burn.
I would say this is because of the simple fact of visibility and adoption. You've probably never heard of these services because they ground to a halt with a mere 1000 users, so they never got mainstream enough to be recognised as a viable service. It is a bit like how no one remembers the dozens of people who failed to achieve sustained powered flight before the Wright brothers. That doesn't mean there weren't any; scalability, technical or planning issues killed those efforts before anyone knew of them.
Some 'scalability' issues are inherent to your initial design, and not just the choice or configuration of your hardware/software platforms.
For instance, say you were building a contact database of some sort. At first, you may have things like 'Phone Number' and 'Email Address' as columns in the 'Person' table. Then, as your service gets popular, you notice people asking for extra contact methods like Twitter handles, LinkedIn pages, etc. So you start adding those to your Person table as extra columns.
Eventually, you realise that you should have thought more about this at the outset and stored contact details in a separate table altogether, linked to a 'Contact Type' table and related back to the Person table. This could have been mitigated at the start via better database design that catered for eventualities you might never have foreseen. Migrating the original database to normalise it is a massive effort in its own right, and will probably take more time, cause more outage time, and cause more bugs in existing code than designing for that eventuality in the first place.
Even if 99% of your users only ever enter Phone and Email contact details, the second option, designed for scalability, will still handle that without a sweat, and 'scaling' to meet additional demands later is merely a matter of adding new contact types in the 'Contact Type' data table so that they become an extra option for all your users.
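A minimal sketch of that normalized design, using Python dicts to stand in for the three tables (names and data are illustrative):

```python
# 'Contact Type' table: adding a new contact method is just a new row here,
# not a schema change on Person.
contact_types = {1: "phone", 2: "email", 3: "twitter"}

# 'Person' table.
people = {10: {"name": "Asha"}}

# Contact table: (person_id, contact_type_id, value) rows.
contacts = [
    (10, 1, "+1-555-0100"),
    (10, 2, "asha@example.com"),
]

def contacts_for(person_id):
    # Join contacts back to their type names for one person.
    return {contact_types[t]: v for pid, t, v in contacts if pid == person_id}
```

Supporting Twitter handles later just means appending rows to `contacts` with type 3; no migration of the Person table is needed.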
I am willing to bet that 9 out of 10 'weekend projects' have had to be thrown out completely and redeveloped from scratch when the number of users became significant. Of those rebuilds, I would be interested to see some research into how many users abandoned the said platform when (a) the original one started to grind to a halt or constantly fell over with errors and (b) the new platform came out with new features or a different UX that broke the 'look and feel' of the original.
My initial DB schema was pretty bad. We did 2 schema rewrites and migrations between launch and 5M users. Each one took 2 weeks of sleepless nights.
The machines today are really powerful. You can do a lot with 244 GB RAM machines backed by SSD.
Someone who doesn't have the skill set to scale once they get traction likely won't have the skills to design for scale at the start either.
My recommendation to everyone would be pick a language and db you are most comfortable with and get started as soon as you can. You will fail on the product side a lot more times before you will fail on the technical side.
And if you are failing on the technical side, reach out to me. I will definitely be able to help you find a way out. I am not sure there is any product guy in the world who can make a similar claim on the product front. However, there are at least dozens of technical guys in the world who can make a claim like mine.
So focus on launching the product as soon as possible. Work hard, reach out for help if needed. You will eventually get success.
Is this a typo of "244 GB" instead of "24 GB"? Nearly any company that has a single machine provisioned with 244 GB of RAM is doing something severely wrong, likely putting the company's ability to grow at risk. Such a machine screams of trying to vertically scale a poorly performing legacy product instead of figuring out how to horizontally scale out with 16-64 GB servers.
That much memory on a single server is a huge red flag for 95-99%+ of companies. It takes a very specialized system (ie: you probably don't fit the mold, no matter what your excuses are) to require such a server.
Outside AWS, you can put 3-6 TB in a normal-ish server from Dell or HP, or 64 TB in a big iron box from Fujitsu or IBM.
Check out the Stack Overflow stack too.
Their system is faster too.
That is my intuition. If you know of a video or blog post (conference, talk, article, etc.) where someone explains the benefits of a 2 TB MySQL server and how it is not a crazy bad idea, I would love to see it. Because my 15 years of professional experience screams "NO WAY IN HELL" at that one.
9 out of 10 weekend projects never get to a significant number of users.
Do you remember MySpace? The big social network with hundreds of millions of users that came before Facebook.
They had massive scaling and performance issues. At the peak, when everyone was moving to social media (almost a decade ago), the site could take an entire minute to load (if it loaded at all). They lost a lot of users, who went straight to Facebook, and they never recovered.
You can say that it failed to scale, but I wouldn't say that it failed to scale technically; it failed to scale the community. Then at some point when it was already dying they started to change the layout big time I guess to attract more people, alienating the initial community.
I agree, they were different enough from Facebook that it's not all about scaling. Still, don't underestimate the impact of having your site unreachable when a competitor is coming on strong. Not a good position to be in.
Your example about the DB tables is exactly the trap to avoid while you are iterating rapidly to try and find product/market fit.
Some of the (real world) feedback I got from the web apps that failed to get off the ground were due mainly to our customers complaining that:
* pages were taking too long to reload.
* not enough fields on a particular data table to store information in
* missing API
* pages that would refresh in entirety instead of just updating the changed portions
Most customers didn't stick around to wait for us to address or fix the issues - there were plenty of other competitive products that fitted them better that they could start using that same day.
A couple of those sites also got negative feedback on Reddit or HN because I hosted the front-end website on a $5 VPS. When the site suffered the inevitable 'hug of death' from posting to those sites, it immediately went down, unable to scale under the sudden deluge of visitors, and I received uncomplimentary feedback (from the people who actually elected to post feedback rather than just close the browser tab and completely forget about my app).
Yes, overthinking scaling and design is bad, but putting it aside as a 'totally not important now' factor is just as bad, if not worse, in my 'lived it' experience.
Most people see much less success. The original suggestion is geared towards first few tries that people make. Once they have made a few attempts, they learn from it and naturally build more scalable products without spending extra effort. You have already read my story about my first try which had a very poor start. My subsequent efforts have scaled in that order without needing a single rewrite.
However, if they spend too long procrastinating, worrying about and investing in scale - they are wasting precious time which would be better spent finding product market fit.
Like you, each subsequent app that I built contained the lessons learned from the previous efforts - better front end frameworks, better data table normalising, better hosting infrastructure etc.
Part of me always thinks back to the ones that didn't work, and I always ask myself - would they have worked if they had faster page refreshes, or more flexible data entry? Perhaps if I had spent more time in the planning and design before I launched them, they may not have sunk as quickly? Those 1000 users that time who came from my HN post and saw an 'Error 500' page - could a fraction of them been the ones who would have signed up and made us profitable if they didn't see the error page as their first introduction to my app??
For that reason alone, I am always skeptical of any post that promotes 'iterate often and fast' or 'fail quickly' rather than 'spend time on design and a unique experience'...
When you are thinking back, also consider the alternative: could you have missed opportunities and product launches because of extra effort spent on scaling upfront?
It's all good learning in the end. It's great to be in your situation. Keep at it and you will see great success in future!
That seems fairly reasonable to me.
It all depends on what you call "success". Running a startup from zero to a billion dollar business is an incredible and rare achievement. Running a small site with a few hundred or thousand users and a decent income is fairly reasonable.
Nobody is saying that performance doesn't matter. But if it performs well with 100 users, you can worry about the performance with 10k users later. And if it doesn't perform well with 100 users, it doesn't matter whether the scaling is O(log n) or even O(1).
Performance under a sudden flood of users matters, but a lot less than day-to-day performance. And most of the criticism I see for sites dying is when they're serving static content with ideal O(1) scaling, but in a very unoptimized way.
I've been part of many projects that couldn't scale because lack of forethought meant there was a nasty implicit O(n^2) thing going on when it could have easily been designed to be O(n log n).
Almost all of these projects were killed with CYA hand waving about "just not fast enough".
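A hedged sketch of the shape of that mistake: finding duplicates by comparing every pair is O(n^2), while sorting first makes duplicates adjacent and the whole thing O(n log n):

```python
def has_duplicates_quadratic(items):
    # Compares every pair: roughly n^2/2 comparisons. Fine for 100 items,
    # deadly for a million.
    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if items[i] == items[j]:
                return True
    return False

def has_duplicates_sorted(items):
    # Sort once (O(n log n)); any duplicates end up adjacent.
    s = sorted(items)
    return any(a == b for a, b in zip(s, s[1:]))
```

Both return the same answers; only the growth rate differs, which is exactly the kind of thing that is cheap to get right up front and expensive to retrofit.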
I didn't directly call out the DB tables point. This kind of premature optimization is a huge red flag, and I would tell anyone I am advising to avoid it.
Edit: you can read about both project histories and learn from them. Of course one gets down-voted for mentioning it.
Tech people are quick to find technical reasons for failure but the reasons are usually elsewhere.
2) MySpace culture and management were in trouble after Murdoch's News Corporation bought MySpace's parent company. Instead of investing in improving the site, News Corp chose the wrong path: a (in the end) very costly deal with MS to switch MySpace's ColdFusion stack from a Java backend to a .NET backend, including very expensive MSSQL licenses and various other license costs. It turned out ColdFusion on .NET was still in a very experimental phase, and MySpace suffered a lot from the resulting troubles and bled money like never before (which made News Corp very unhappy). The MySpace website got few updates for at least a year while MySpace devs were busy firefighting and switching the backend. Facebook took over.
There are books worth reading.
3) Stack Overflow (as of ~2 years ago) ran on less than a dozen servers. It's several orders of magnitude smaller than social network services like (former) MySpace, Facebook, Twitter, etc. License costs often don't scale; you bleed through your startup money for little competitive advantage or return. That's why successful startups often choose an open source stack; look at Amazon, Google, (former) Yahoo, Facebook, Twitter, etc.: Perl, Python, PHP, Java, Ruby, MySQL, Postgres, Hadoop, etc. Stack Overflow shows it can be done with off-the-shelf COTS as well, if you keep the server license count low through wise decisions and performance tuning. But it's not like I could name dozens of successful startups with a software stack like Stack Overflow's. And even Hotmail, LinkedIn and Bing run, or used to run for the majority of their service life, mainly on open source software stacks.
Friendster was mostly focused on Asia plus lost a ton of users around the same time Facebook gained a ton of users. I don't think it was scaling architecture that did them in. It was the market choosing the competition for the user experience plus what their friends were on. That's for most of the world. I have no idea what contributed to their failure in Asia since I don't study that market when it comes to social media.
EDIT to add: I recall the founder did say they had serious technology problems for a few years that affected them. I'm just thinking Facebook spreading through all the colleges & moving faster on features was their main advantage.
I've seen too many breathless posts which would have you believe they'd need a clustered NoSQL database or it wouldn't scale.
Wrong, and wrong.
First, their architecture is extremely remarkable. They are doing and mastering vertical scaling, down to every little detail: terabytes of RAM, C# instead of Ruby/Python, FusionIO drives, MS SQL instead of MySQL/Postgres, etc.
Second, that's at least 6 database servers, each one being more expensive than 10 usual commodity servers:
> First cluster: Dell R720xd servers, 384GB of RAM, 4TB of PCIe SSD space, and 2x 12 cores. It hosts the Stack Overflow, Sites (bad name, I'll explain later), PRIZM, and Mobile databases.
> Second cluster: Dell R730xd servers, 768GB of RAM, 6TB of PCIe SSD space, 2x 8 cores. This cluster runs everything else. That list includes Careers, Open ID, Chat, our Exception log, and every other Q&A site (e.g. Super User, Server Fault, etc.).
I think that's good: SQL databases are very mature and you don't want to be exciting for your core business data if you don't get some major benefit to defray the cost. Boring is a delightful characteristic for data storage.
I'm not sure you could get 1TB of memory and multi-TB SSD drives anywhere in the 2000s, even for a million dollars. That makes a major difference in the ability to scale up. Per-account data didn't grow: storing 1M user accounts always took the same space.
While for many developers NoSQL might be overkill, Stack Overflow is a bad example. If you were any fast-growing startup in the cloud and you wanted to go the SO route, it would have meant going CoLo. SO has a set of machines with nearly a terabyte of RAM; GCE doesn't even offer cloud machines with those specs.
And even then their setup is far more fiscally expensive than something you could get done with 20 cloud nodes on some NoSQL solution.
Second: how much time would they have spent rolling all of the data integrity, reporting, etc. features they'd have needed to add. I'm inclined to take them at their word when they say this was safer and cheaper given their resources.
I never meant to claim that cloud was safer and cheaper. What I meant was, for the majority of operations, staying in the cloud with some distributed setup is likely more feasible than moving to CoLo (see GitLab).
Thinking ahead that custom fields could be needed in the future is a design choice and can apply to 5k users or 500k users. It's not the scaling itself that causes you pain; it just exacerbates the difficulty of a bad design choice.
So the OP's advice still applies: at the start you should spend most of your time developing the best design for your application and less on scaling, because scaling a good design is far less painful.
I personally switched to Facebook for two reasons:
* got sick of repeated errors every time I browsed MySpace
* Facebook had better album permissions (MySpace had none)
But the main problem: they crumbled under heavy traffic.
I also know of some government services (my state has paperless administration, so everything is digital) which are used on a day-to-day basis by government employees (but only hundreds of users) and go down for almost 2 hours per day, every day, stopping work for everyone (both the employees and the people who came for the service). These services are developed by companies like TCS and Infosys. If only they had thought of scaling beforehand.
Had they started with only $300 like the parent comment, their webapp would often have performed much better. Big projects are a difficult curse, one of the most difficult problems in IT engineering.
This is the real problem: lack of in-house expertise and ongoing incentives to maintain performance, while contractors are usually paid for features and, if they also do hosting, even have a financial disincentive for efficiency.
I would bet it's at least as likely that had someone thought of scaling it would have added a significant cost and delay to the actual project and the first major scalability bottleneck would still have been something unanticipated.
If by scaling you mean increasing the number of page views that a given version of a web app can serve, then this is mostly true.
But things like conceptual simplicity, unit economics, codebase readability, strategy, etc. are also dimensions of scalability. E.g. the only reason that startups can even exist at all is because large companies have a lower marginal output per employee.
Zooomr did. They came out of nowhere in 2006 and were a real threat to Flickr. They had AJAX-powered edits, geotagging, and various other unique features, and they were starting to pull in some highly followed photographers.
But the site kept crashing as traffic grew, and some scaling problems even led to data loss. I wanted to see them win, but they just couldn't keep up with traffic; eventually Yahoo cloned their features and they became irrelevant.
If someone cannot scale their product once it has adoption, they likely don't have the insight, problem definition, capabilities, skills and resources to design for scale at launch either.
If I could edit my original post, I would put a rider related to capable team for service not dying.
Thank you. This is the one major point I try to get across to people when 'scaling' comes up. "Oh, that won't be scalable" or "yeah, but when we have 2 million users, XYZ won't work". I've been on projects where weeks and months (calendar time, part time efforts) were spent on things that were 'scalable' instead of just shipping something that worked earlier.
I keep trying to tell folks on these projects (have been involved in a couple now) - man, we're not going to go from 20 users to 2 million users overnight. We'll notice the problems and can adapt.
Trying to whet their appetite, I've even argued that a new 'flavor of the month' will be out before our real scaling needs hit, and we can then waste time chasing that fad, which will be even cooler than the current fad we're chasing (although... I try to be slightly more diplomatic than that).
I would argue that you can earn lifetime with a healthy lifestyle.
I remember reading a study that measured the average difference between healthy and unhealthy lifestyles at 14 years.
If you spend 1 extra day a week on a healthy lifestyle (exercise, cooking for yourself, enough sleep, little stress, enough recreation time) for 40 years, you have still earned ~9 years of lifetime.
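The rough arithmetic behind that estimate (my back-of-the-envelope numbers, not the study's):

```python
# One extra day per week spent on a healthy lifestyle, for 40 years.
years = 40
extra_days = years * 52               # ~2080 days invested
years_invested = extra_days / 365.25  # ~5.7 years of time spent

# The study's ~14-year lifespan difference, minus the time invested:
net_gain = 14 - years_invested        # roughly 8-9 years gained on net
```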
And I would argue that those 40 years are more enjoyable.
However, to my original point: you can lose fitness (within reasonable limits) and then gain it back without noticing a difference later on. But time lost is lost forever.
My post says 50M+ users and 1B+ messages in a day. Almost 50% daily active users. And this was early-2000s hardware. Last year WhatsApp was doing 60 billion messages per day (60x what we did).
However, our messages carried a lot more logic, because we delivered over unreliable Telco texting pipes. Pipes had a set throughput. Some pipes delivered only to certain geographies. They would randomly fail downstream, so you had to do health monitoring and management. Some pipes charged per message and some charged based on throughput, so you had to pick the lowest-cost one in real time. Different messages had different priorities. In summary, there was a lot more business logic per message sent.
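The per-message routing described above can be sketched like this (pipe names, costs, and the exact fields are invented for illustration; the real system also handled throughput-priced pipes, priorities, and retries):

```python
# Toy router: filter pipes by health and geography, then pick the cheapest.
pipes = [
    {"name": "telco_a", "healthy": True,  "geos": {"north", "south"}, "cost_per_msg": 0.010},
    {"name": "telco_b", "healthy": True,  "geos": {"south"},          "cost_per_msg": 0.007},
    {"name": "telco_c", "healthy": False, "geos": {"north", "south"}, "cost_per_msg": 0.005},
]

def pick_pipe(geo):
    # Health checks keep us off pipes that are silently failing downstream.
    candidates = [p for p in pipes if p["healthy"] and geo in p["geos"]]
    if not candidates:
        return None  # queue the message and retry when a pipe recovers
    return min(candidates, key=lambda p: p["cost_per_msg"])
```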
I was of course very inexperienced when I started. I was just two years out of college. However in the end the product was 50+ services running on 200+ servers. This experience helped me scale user communication infra at Dropbox to 100x scale within 3 months.
Your message was flame-bait at best and didn't seem to contribute to the discussion. You were passing judgements based on incomplete information. You could have asked nicely and still gotten a response.
Edit: fixed typo
This wasn't clear at all from your comment.
> Your message was flame-bait at best and didn't seem to contribute to the discussion. You were passing judgements based on incomplete information. You could have asked nicely and still gotten a response.
My message was not flame bait; that's your interpretation. I'm sorry you took offense, but please do consider that your original comment does omit lots of important details, as you apparently (should) know from your experience at Dropbox.
Also, per your quoted numbers WhatsApp was going 60x what your app was, not 1/60. Typo?
> I know of absolutely no service which failed because it couldn't scale.
Genuine question: what about Friendster?
I seem to recall that the Friendster team had more than one aborted rewrite before they got it right, and by then it was too late. Scaling social networks required a different bag of tricks than was popular at the time (e.g., database replication was well accepted as a best practice, but sharding worked much better). Talented people could still get it wrong.
Still, social networking illustrates how failing to scale is rare, and should practically never be the concern early on. Almost every social network that gained traction struggled to scale at times, but the vast majority of them overcame their challenges.
We poured sweat and tears and many sleepless nights into scaling the Tagged back-end from 10 machines to over 1,000, over time serving hundreds of millions of users. It was a tremendous amount of work, much more than building the initial product, and I don't think there's much we could have done early on to make later scaling easier.
More on the story:
There is always scope for a botched execution or just pure bad luck. I wish someone shared more candid details of what happened at Friendster.
1 - https://gimletmedia.com/startup/
IMHO, if your investors need to be involved in seeking approval for handling scaling pains and on top of that if they reject it - there are deeper issues within the company (management issues, politics, technical incompetence, leadership issues).
So to clarify my point - any team could still screw up technical execution. However there is no reason that given a good team and support from within the company, a product can't be scaled.
As I look back, if we had used RoR or Python we might have moved faster with no negative impact on scale. We would have spent more money on extra servers though!
Because of this I had multiple levels of bosses above me in the company. They all got really interested in starting to manage me and the product once we hit 5M users. They had a different vision, attachment and passion about the product.
Around 2008, we were spending a lot of money on text messaging. Most people were using the product to send messages cheaply to their friends and family (essentially like WhatsApp rather than Twitter).
I wanted to slowly phase out text messaging in favor of data (essentially, become WhatsApp!). The CEO didn't believe that mobile data would get adoption in India for a long time. He instead wanted to focus on monetizing: sending ads in text messages, making people pay for premium content, etc. We tried 8-10 things; nothing worked (this could be a really big post in itself).
Soon I quit and moved to US. It was like a bad breakup!
My biggest learnings were:
1. If you do not have ownership like a founder, do not take on responsibility like a cofounder.
2. Personally, for me: do not work where I don't have veto rights. But bills need to be paid and family has to be supported. It took me 5 years of really hard work to get there!
May I ask you about your first 'biggest learning': how do you eschew cofounder responsibilities in an early company? Someone who genuinely cares and/or is really interested in the core business (pretty common among first hires of startups) may have a hard time turning work away or willfully not participating in major discussions/planning when they believe their contributions could be valuable. In other words, situations when you know you can contribute but the company will get the 'better end of the deal' as you take on more responsibility without taking on more compensation.
It sounds difficult to ride that out - were you able to do that successfully or are you saying "This happened to me, don't let it happen to you?"
If someone is joining a startup at an early stage, work like it is your own baby. You are spending your most valuable currency by being here - which is your time!
With time, hopefully, you will get rewarded for your effort and results. You will likely not get a founder level say and stake but your role should gradually expand. In the end founders take almost the same risk as the first employee, but get a lot more equity because founder was there at the start. Join a startup as an early employee only if you are able to accept this fact - otherwise you are setting yourself up for a lot of resentment and pain.
The above advice applied to me too for the first two years of my employment, before I started working on the ultimately successful product. We were building an offline search engine, and as part of that I built distributed file systems, crawled a billion pages, and was appropriately rewarded with career growth.
However, with the new social product that I built, the situation was dramatically different. For that product I was there from day 0. I owned the whole of engineering (sole engineer at the start, head of engineering till I left) and a significant portion of product. Till we got to 5M users, I had a co-founder level say (but not the stake, of course), and I worked with a co-founder's level of involvement and dedication during that time. However, once we hit 5M users, multiple levels above me started holding meetings and taking critical decisions without involving me. I still had co-founder level passion, and it hurt to see the product flounder and be unable to do anything about it.
So I was in a uniquely bad situation, and it is not common to be in it. However, if someone really is, and there is no way to get founder-level control, move out as soon as you can. And this is what I did, albeit 2 years too late.
The scaling problem with Darcs wasn't about hosting, but in the implementation of their theory of patches and how it handled conflicts. See http://darcs.net/FAQ/ConflictsDarcs1#problems-with-conflicts for an overview of the issues in Darcs 1 and 2.
This also seems to be one of the driving forces behind the development of Pijul, which is likewise patch-based instead of snapshot-based. That makes it easier to understand and use, but all implementations so far have had major performance issues once repositories grow.
For more on that, see https://pijul.org/faq.html
I was one of the early users of Darcs 1, back before Git existed. I wanted to use version control, but the alternatives were pretty hard to understand and use. While Darcs was really nice to use, and fast on small repos, after a few years I had to convert everything to git because exponential time on most operations was just not sustainable, and fixing that required constant vigilance and altering history - not very friendly for new contributors.
Even GHC moved from Darcs to git in 2011 because of this: https://mail.haskell.org/pipermail/glasgow-haskell-users/201...
One prominent counterexample that comes to mind is Friendster which was a big social network before MySpace and Facebook. They had terrible performance issues but to be fair to your point, their decline was more mismanagement than a true inability to scale.
Well, Friendster failed for exactly that reason. But, those were the early days.
P.S. Completely agree with the article.
You must not be looking around at all, right?
So what are some examples of services which failed because they tried too hard to scale too early?
For example, when your network could handle 20k users, was there another network that could have already handled 500k, but they failed because they started a month after yours?
These days there would be hardly any way to fail to scale given a large budget. The technology to do so is all so good. However, back in the day it wasn't always so. I'd say Friendster was an example of a company that failed due to scaling issues.
Mistake #1: Picked the wrong technologies.
For every technology that can scale, there are a bunch of others that will give you a lot of trouble.
TLDR: Depends a lot on individual situation. Pick the language you are fastest and most comfortable with.
SMS Gupshup - 6 years - Java and some C++ - Built distributed filesystems. Crawled 1B pages with early 2000 hardware. Built map-reduce framework before Hadoop came. Built a search engine on top of it. Built an infra which would send 1B+ text messages with prioritization and monitoring. Social network with 50M+ users.
LinkedIn main stack - ~ 1 year - Java
Founder Startup - 1.5 years - Clojure
Dropbox - ~ 1 year - Python
Head of Technology, tech incubator - 1.5 years - Node.js
As you can see my experience has been all over the spectrum. While primarily using one language, I would keep experimenting with other languages like Scala, Go.
Following are my learnings and the reasoning behind them:
I love Clojure. I love it so much that I feel like quitting everything, going to a mountain cabin, and just coding Clojure. However, I would most likely not use Clojure for my startup. The learning curve is really, really steep. It would make it very difficult for me to build a team. It's really fun once you understand it, but the first few weeks for a new person can be a nightmare and very demoralizing. So even though I really love it, my current strategy is to bring Clojure patterns to other languages.
User facing logic
I would not use a statically typed, object-oriented language for any user-facing logic. User-facing logic and behavior tends to be very fluid, and I feel that it gets strangled by the constraints of statically typed languages. Once you start building things with classes, inheritance, polymorphism, etc., those patterns soon start driving the business logic rather than the other way around. For this reason you will see that the most arcane and slow-moving popular sites today are Java-based - (LinkedIn, Yahoo, Amazon, eBay) vs (Facebook, Twitter, Instagram, Pinterest). For this purpose I like using dynamically typed languages, building code in as functional a way as possible. Pick Node.js, Python, Ruby - whatever you are comfortable with.
In some cases people recommend statically typed languages for catching errors through type safety, etc. However, I would solve that problem completely with a test suite rather than solving it part of the way using language features while being forced into a statically typed language.
Another big benefit of dynamically typed languages is the ability to just create a dictionary and start using it. Most CRUD apps work with JSON requests and responses, and I would rather just pass objects around dynamically than build a class for the request and response objects of each and every route. This was a nightmare to create and maintain in Scala and Java.
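To make the dict-vs-class point concrete, here is a minimal sketch in Python (the handler and field names are hypothetical, not from any real codebase): the JSON request body is parsed straight into a dict and the response is just another dict serialized back, with no per-route request/response classes.

```python
import json

# A dynamically typed handler: no CreateUserRequest / CreateUserResponse
# classes, just dicts in and dicts out.
def create_user(request_body: str) -> str:
    req = json.loads(request_body)       # request is a plain dict
    resp = {
        "name": req["name"],
        "status": "created",
    }
    return json.dumps(resp)              # response is a plain dict, serialized
```

In Java or Scala, each of these shapes would typically need its own class (or case class) plus serialization glue for every route, which is the maintenance burden described above.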
Fast developing Infrastructure
E.g. Hadoop, Hive, a distributed FS or queue. 5 years back I would have built these in Java. Today I would use Go. However, you couldn't go wrong either way - pick what you are most comfortable with.
There is a decent amount of talent available, these languages are easy to learn, and you get close to C-level performance if you build it right. Development is fast-paced.
Mission critical performance - File System or Database
Most likely C, because you want to squeeze out the last bit of performance. However, be ready for really slow development cycles. And most smart kids out of college won't know C, so you will need to set aside some time before they are really productive.
My initial DB schema was pretty bad. Had to do quite a few DB migrations - which took weeks of work to execute.
I have used MongoDB extensively since, and I am confident that it would have helped us scale to 5M users comfortably. I might have migrated to something else at that point.
There are very few products that reach even 5M users, which is why developers should focus on launching fast.
Another thing is that the best and most successful products have a really simple core product (remember Facebook, Twitter, Instagram, etc. when they had 5M users). It is not that much work to migrate and rewrite. It is when products are not getting traction that they start getting overcomplicated.
I know HN skews young when no one remembers Friendster.
My biggest learning is Functional->Fast->Pretty from the product POV and Launch as soon as you can from an Engineer's POV.
Some more details on my blog: https://anandprakash.net/2016/10/28/how-to-build-a-business-...
Most of this was written in sleepless nights when we had a baby. So do not expect a super artistic flair ;)
At stage 3 we had 50% daily active users, who would receive an average of 8 messages per day. So at stage 3 we would send 20M messages per day.
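As a sanity check on that arithmetic, using only the figures stated above (5M users at stage 3, 50% daily active, 8 messages per active user per day):

```python
users = 5_000_000            # user count at stage 3
dau_fraction = 0.5           # 50% daily active users
msgs_per_active_user = 8     # average messages received per day

daily_messages = int(users * dau_fraction * msgs_per_active_user)
print(f"{daily_messages:,} messages/day")  # 20,000,000 messages/day
```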
Bullshit. What about the site (Voat?) that tried to be an alternative to Reddit when there was some scandal there, but couldn't keep users because they kept crashing?
I can't find the posts you're talking about.
A messaging service is the most trivial service one can make; it's a well-known, easy problem that's been solved for decades with current technologies. There is no challenge in that. For comparison, WhatsApp had people with experience, and they could handle 1 billion users with 50 people.
The fact that you can handle 10M users with a single untuned MySQL database is not a demonstration that scaling is overrated. It's an indication that you are running a trivial service that doesn't do much.
Almost any problem will be more challenging than that. There are endless companies that have 1/100th the customer base and yet require 100 times the data volume and engineering.
It's easy to brush off scaling concerns as not important, but I've had personal experience where it's mattered, and if you want a high profile example, look at twitter.
Yes, premature optimization is a bad thing, and so is over engineering; but that's easy to say if you have the experience to make the right initial choices that mean you have a meaningful path forward to scale when you do need it.
For example, let's say you build a typical business app and push something out quickly that doesn't, say, log when it fails, provide an auto-update mechanism, or have any remote access. Now you have it deployed at 50 locations and it's 'not working' for some reason. Not only do you physically have to go out to see what's wrong, you have to organize a reinstall at 50 locations. Bad, right? Yes. It's very bad. (<---- Personal experience)
Or, you build a similar Ruby or Python app when your domain involves bulk-processing massive loads of data. It works fine and you have a great 'platform' until you have 3 users, and then it starts to slow down for everyone; and it turns out you need a dedicated server for each customer, because doing your business logic in a slow language works when you only need to do 10 items a second, not 10000. Bad, right? Yes. Very. Bad. (<---- Personal experience)
It's not premature optimization to avoid stupid technology choices for your domain, or to avoid shipping prototypes.
...but sometimes you don't have someone on the team with the experience to realize that, and the push from management is to just get it out and not worry about the details; but trust me, if you have someone who is sticking their neck out and saying, hey wait, this isn't going to scale...
Maybe you should listen to what they have to say, not quote platitudes.
Ecommerce is probably one of those things where the domain is well enough known that you can get away with it; heck, just throw away all your rubbish and use an off-the-shelf solution if you hit a problem. But I'm going to suggest that the majority of people aren't building that sort of platform, because it's largely a solved problem.
In my experience (and I have little of that) it's important to know the upgrade path and adjust your planning accordingly.
How many users can you serve with your solution?
How big do you expect the market to be in that stage?
What technology would be the next step?
How do you get there? How much more work would it be?
Always be one step ahead with technology, but not two. Most markets are surprisingly small. Most use cases scale surprisingly well. You can probably push your solution by one order of magnitude if you need to quickly.
In order to answer the questions you need people who know the product, the (potential) technologies and the market. When you start you probably won't know any of that. See the first prototype you deploy to the customers as a means of collecting data for the first production version. Your first product is not your first product. Your funding should respect that. Get it done quickly with the aim of answering the critical questions. Then go back and design the next version "good-enough" for the second scaling step with the upgrade path in mind.
You can always push a Python/Ruby app an order of magnitude by throwing an order of magnitude more AWS instances at it.
It will almost always bankrupt you in the medium term.
The only place I've seen it sustainable was a place that was generating $100 per user, and there weren't many active users either (thousands, not millions).
Or a Java app or a Go app. Really, if one's working in a domain where the language would become the bottleneck, one deliberately screwed up by going against the grain because that language is little used in that domain.
For almost everything else or with very specific exceptions, something else is the bottleneck.
Everyone here is giving anecdotal "evidence" of their claims so I'll follow suit:
In the first company I worked at, the backend was entirely in Java and the application was internal (an in-house CMS), meaning only ten users tops; everything about it was horribly slow. It was a sea of poor code, there was no such thing as a deployment pipeline, and the servers it was hosted on were inadequate. There was also no relational schema to speak of in the database (basically MySQL used as a dumb document store).
The next place I worked at, my team's job was to build an actual customer-facing application. We did it in Python and, while it only has a few hundred users so far, there haven't been complaints about poor performance that I know of.
Really, for every Twitter replacing Ruby, there's a Facebook written in PHP. I don't understand why so many people use one side to support their claim but forget the other.
That's a recurrent issue with API code (that usually has to be somewhat fast). Not so much for frontend generation.
P.S. Sorry, but hundreds of customers is nothing. That's served by a pair of boxes (DB + webserver) regardless of the language.
... I know that; everyone here knows that. Saying it is little more than stating the obvious. Evidently, you missed my point, so I fold; but once more, JIC: if an application is slow, one of the last things one should look at to improve performance is the language; it rarely is the bottleneck.
P.S.: I know it's nothing, all anecdotal evidence is nothing. For every anecdote one could give, anyone could give counterexamples. That was another point I tried to make. Cheers.
If you're using Python, it can be the bottleneck quite easily.
I don't endorse changing language as an optimisation; that's just ridiculous.
...but you might look at splitting your application up into parts and doing some services in a more suitable language; or, up front, realizing that you have a heavy data-processing workload you need to do in parallel and that Python isn't a good choice for it.
Maybe you're right; you can look at optimisations that patch over the problem with queries and so forth as a first pass; but in some cases your choice of language (specifically Node and Python, in my experience) is actually fundamentally the performance problem (though, to be fair, not always).
...but basically, if you don't address the root cause of your perf issues (whatever they are), you're going to be patching and firefighting forever.
> data processing workload you need to do in parallel
It's funny how often those two go together, you're of course correct (for now). I never wrote that the language couldn't be the bottleneck, though.
If you are doing something that requires heavy parallelization then, by all means, don't use Python, or replace it if you're already using it... But that kind of workload isn't actually a common (i.e. 50% or more of all applications) scenario.
> ...but basically, if you don't address the root cause of your perf issues (whatever they are), you're going to be patching and firefighting forever.
Yes, that's what I'm saying: Find the bottleneck and solve that. I was only addressing the, in my opinion, undue focusing on languages of the comment I replied to.
> quite easily
Only if one is working with poor developers. And this is true for all languages.
It's not about the developers, it's about the workload.
Objectively, you can't write high-performance multi-threaded Python. No one can; it's not possible; it's just slow.
If you're rewriting your code in C++ so it's not slow and pretending it's Python, you should just rewrite your code in C++. That's not writing fast Python; it's writing C++.
So Python can easily be a bottleneck, regardless of how good your developers are, if you've picked it for a poor purpose:
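A sketch of the point being made here. The workload below is CPU-bound; under CPython's GIL, running it on several threads gives no speedup, so the usual escape hatches are separate processes (each with its own interpreter) or a C extension. The prime-counting task is just an illustrative stand-in for "heavy lifting":

```python
from multiprocessing import Pool

def count_primes(limit: int) -> int:
    """Deliberately naive CPU-bound work: count primes below `limit`."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

if __name__ == "__main__":
    # Threads would serialize on the GIL for this loop; separate
    # processes actually run the four chunks in parallel.
    with Pool(4) as pool:
        results = pool.map(count_primes, [50_000] * 4)
    print(results)
```

For I/O-bound work (waiting on sockets, databases), threads in Python are fine; it is exactly this kind of pure-computation loop where the language itself becomes the bottleneck.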
That's my point: Don't pick the wrong language for your task in the first place. ...and specifically python is the wrong choice for certain types of heavy lifting.
... Why did you repeat yourself? I already told you I agreed with you on that.
Is it something specific you want me to tell you? It's going to be easier if you tell me what you want to read, otherwise we'll keep going in circles.
> Quite easily [...] poor purpose
That's an immediate contradiction. Yes, if one picks the wrong tool, that tool often becomes the bottleneck; but in every other scenario, where the tool is alright, it takes poor developers for the tool to become a bottleneck.
In my first comment I wrote that the language isn't the bottleneck, except for specific circumstances... And you took one of those specific circumstances and keep running with it. It's not a counterargument to my original point, if that's what you're trying to do.
The article is about choosing a business idea where technical scale isn't important.
That's a very different animal from "a dozen concurrent users will crash the server" and "this is fragile and full of edge cases, it basically won't work in production."
Not all businesses become roaring successes, and those who achieve moderate success often don't get the resources to fix deep-seated performance or architectural issues (either via engineering and/or throwing hardware at it). Eventually these technical woes can completely halt momentum, and I've seen them even drown some businesses that just aren't able to dig themselves out of the hole they find themselves in.
People always seem to be arguing for extremes, but the most sensible approach for most tends to be somewhere in the middle.
When you implement each epoch, do a paper design for the next epoch. This helps you think about how you will get there and can prevent you from writing yourself into a corner, without over-indexing on scaling issues that you don't have yet.
However, in this case being blasé would mean not working hard to scale when your users are getting a bad experience. A basic stack these days (node.js+mongo, go+*sql) can easily handle more than 100k users even with one of the worst implementations. Most products don't reach that point!
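A quick back-of-envelope check on why 100k users is modest load for such a stack. The per-user request rate and the peak-to-average ratio below are assumptions for illustration, not figures from the comment:

```python
users = 100_000
requests_per_user_per_day = 50   # assumption: a fairly engaged user
seconds_per_day = 86_400

avg_rps = users * requests_per_user_per_day / seconds_per_day
peak_rps = avg_rps * 5           # assume peaks run ~5x the daily average

print(f"average ~{avg_rps:.0f} req/s, peak ~{peak_rps:.0f} req/s")
```

A few hundred requests per second at peak is well within reach of a single decent application server and database, which is the point being made.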
That is making some very large assumptions about application workload. For an application which is purely a CRUD interface to a database, yes.
Many times there is heavy lifting - recommendations, machine learning, etc. However, that is done using specialized technologies in a non-user-facing process, and the results are then dumped into a DB available to a CRUD app.
I am hoping that products with such requirements will have some obvious tools for such tasks (Hadoop, Hive etc) and they would find a way to scale such processes with time.
So their stack might be go+*sql+Hadoop.
Can you please suggest some use-cases which don't fit the above pattern and maybe we can brainstorm. Seems like a fun exercise!
BTW, the phrase you're looking for is "deep seated", not "deep seeded".
And, as in the case of Twitter, the technology stack is the least of your problems.
- Startups grow exponentially, if you're playing catchup as you're growing you are focusing on keeping the lights on and hanging on for the ride. Important for a growing company to focus on vision.
- Software that scales in traffic is easier to scale in engineering effort. For example, it's harder for 100 engineers to work on a single monolith than on 10 services.
- Service infrastructure cost is high on the list of cash burn. Scalable systems are efficient, allow startups to live longer.
- If the product you are selling directly correlates to computing power, important to make sure you are selling something that can be done profitably. For example, if you are selling video processing as a service, you absolutely need to validate that you can do this at scale in a profitable manner.
I also don't agree with the premise that speed of development and scalable systems are always in contention. After a certain point, scalable systems go hand in hand with your ability to execute quickly.
That being said, there are of course choices that can be made early on that will help you in case you need to scale at some time. Spending a bit of time thinking through what would happen if you do need to split up and scale out your system will help you avoid pitfalls such as relying too much on the local filesystem being shared across pageviews.
In general I believe there is more benefit in spending time on performance early on (not caching - that is just postponing the problem), as it not only extends your ability to run on a single system for much longer, it also makes life better for your users.
But yes - it's about finding the right balance.
I think it takes experienced managers and developers with business-oriented mindset to achieve the best compromise. Saying "there is no need to overengineer now" can be as bad as months of overengineering. It all depends on the business.
Two weeks after launch > "We have 10k users and counting, why didn't you architect this for scale?"
Always assume you underestimated the scope of the project.
Now I mostly just assume (capacity asked / 10) is more than enough!
There is a major benefit to UUIDs besides server-side scaling concerns: you can generate them offline in mobile apps and other client-side applications.
With serial primary keys, you are often stuck between waiting for a server round trip before saving data locally, or inventing complicated solutions for referring to not-yet-uploaded data locally.
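A minimal sketch of the pattern (the function and field names are illustrative, not from any particular library): the client mints a UUIDv4 primary key at creation time, so the record can be saved and referenced locally before any server round trip.

```python
import uuid

def new_local_record(payload: dict) -> dict:
    """Create a record that is immediately usable offline."""
    record = dict(payload)
    record["id"] = str(uuid.uuid4())  # globally unique, minted client-side
    record["synced"] = False          # flip to True once the upload succeeds
    return record

note = new_local_record({"text": "drafted on the train"})
# note["id"] is already a stable key that other local records can point at,
# and the server can accept it as-is when the device comes back online.
```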
Works fine for something one team uses, works... not fine for something the entire corporation uses.
My direct boss was the CFO and he was cheap as hell. It's amazing the things we can come up with when we have to.
Curious as to why you see RDS as a 'crappy' option??
Given the choice of building and maintaining a MySQL box in the corner of your bedroom and hosting it on RDS - I know which way I would go (and I have done both over the years).
If you are talking 'scaling' in terms of the OP's article, then RDS is almost a no brainer, and you can scale your instances (and add replicated instances etc.), hide it behind a VPN, set up firewalls to prevent DDoS attacks all in a matter of minutes, with a few mouse clicks.
Given that spooling up a RDS instance is quicker and cheaper than buying all the hardware and spending time installing and configuring an SQL server, then if time to market is a critical factor, you would choose RDS over a home grown solution, wouldn't you?
[Edit: I just realised that this is actually the point you were making, and I got sidetracked by my above question]
I've never heard of a PM saying such a thing.
If you're a startup, go the other way.
It's a multiplayer drawing site built with Node.js/socket.io. I'm already on the biggest Heroku dyno my budget can allow, and it's too big of a task to rewrite the back end to support load balancing (and I wouldn't know where to start). Bear in mind that this is a side project I'm not making any money off.
I had a lot of new features planned but now I've put development on hold. It's not fun to work on something you can't allow to get popular since it would kill it.
Maybe put it on a Digitalocean Server and just reference your Dyno's IP address if possible.
You'll never know for certain until you try. You need to put a dollar sign on it and see how current users respond. Of course only a handful of users will pay, after which you can slowly ramp up prices as you build out the features you have planned.
My previous experience with Firebase makes me hesitant to use a BaaS. It works great for simple stuff, but as soon as you want more control it becomes a mess compared to just having your own server. It might just be me who hasn't wrapped my head around the mindset correctly, though.
This was many years ago though so maybe I should have a look at it again.
As a rule of thumb, start-ups need to be more agile as they are mostly exploring new territory, trying to create new value or re-scope valueless things into valuable things.
Larger companies operate at a scale where minor efficiency improvements can mean millions of dollars, and thus require more people to do the same thing, but better. Individualistic thinking on new directions to go is neither needed nor appreciated.
Of course there are exceptions. The question boils down to whether or not the ladder is leaning against the right wall before charging up it.
In rare circumstances you can do both. Either the problem is trivial, or the problem becomes trivial because you have a super expert. 10x programmers who habitually write efficient code without needing to think too much have more bandwidth for things like strategy and direction. The car they drive is more agile, accelerates faster, has a higher top speed, etc. But even this can't move mountains. The problem an individual can solve, no matter the level of genius, is still small in scope compared to the power of movements and collective action and intention.
The most powerful skill is to seed these movements and direct them.
Abstractly, this is what VCs look for in founders and also a reason why very smart and technical people feel short-changed that they are not appreciated for their 10x skills. (Making 500k instead of millions/billions) They may have 10x skills, but there are whole orders of magnitude they can be blind to.
What matters though is performance and availability. No matter what scale you work at, you can't be slow, that will drive people away. You also can't be unavailable. This means that you might have to handle traffic spikes.
Depending on your offering, you probably also want to be secure and reliable. Losing customer data or leaking it will drive customers away too.
So, I'd mostly agree, in 2016, scale isn't a big problem. Better to focus on functionality, performance, security, reliability and availability. These things will impact all your customers, even when you only have one. They'll also be much harder to fix.
Where scale matters is at big companies. When you already have a lot of customers, your first version of any new feature or product must already be at scale. Amazon couldn't have launched a non-scalable Prime Now or Echo. Google can't launch a non-scalable chat service, etc.
This is the point I was going to make. When you are designing the very platforms on which all these "does it scale? who cares" startups are going to be built, you do not have the luxury of having that attitude. Definitely don't take that mindset into an interview at google or amazon. :)
I do agree with the OP's sentiment for startup projects in general. I have had the experience of worrying about scale too early and over-engineering a system which never had more than a few dozen users, and the opposite experience of tossing something together that wouldn't scale and then going through the hairy scrambling at every order of magnitude of scale through tens of millions of users. The latter was definitely a better strategy.
However, it was an anomaly because unlike a product someone this article is intended towards would be building, that site had an immediate audience of millions of users from the get go.
Also, the fact that it took a few weeks to be rewritten to handle the load at which point it became extremely successful, strengthens the original article's point.
By the time scalability becomes a problem, you will have enough resources to tackle the scalability problem.
It's the norm for all government projects, and most projects started in established companies that already have millions of users.
For anyone working in these domains, the "doesn't need to scale" mentality is very inappropriate, and I find it to be doing a lot of damage.
The guys at the healthcare site handled most of the scaling problems, accounting for the means at their disposal and the time frame they had, judging by what was published in the news. That it took only a few weeks to smooth out shows that they had handled scaling beforehand. A launch from 10 users to 10 million is always a bit bumpy for the initial weeks.
On the subject of scaling, I think it's good to have an idea in your head about a path to scalability. One server, using PHP and MySQL? Ok. Just be aware you might have to load balance either or both the server and DB in the future, and that's assuming you've gotten the low hanging fruit of making them faster on their own. But as this thread's top comment illustrates, learning that stuff on the fly isn't too hard. So maybe it's better to make sure you're going with technology you sort of know has had big successes elsewhere (like Java, PHP, or MySQL) and even if you're not quite sure how you might scale it beyond the defaults you know others have solved that problem and you can learn later if/when needed.
Seems reasonable. I wonder, though, if PHP feels like an anchor to the average Facebook developer. I realize they architected around it, but it must have some effect on recruiting, retention, etc. I use PHP myself, and don't hate it, but the stigma is there.
False equivalence. PHP has many, many more flaws to a much deeper and more serious level than any other mainstream programming language. It's insecure, buggy, full of broken behaviour for legacy systems, slow and easy to misuse. Its standard library is inconsistent, hard to learn, easy to misuse, full of legacy behaviour and slow.
>but PHP is a reasonable choice for many problems.
PHP is an unreasonable choice for every problem unless you have already solved your problem with PHP. I'm not saying that Facebook should rewrite in something else, obviously, but nobody should be starting new work in PHP. Nobody.
>Python pip is crap compared to composer for example.
There's nothing at all wrong with it.
>While my favorite language is Elixir I mostly do Node and Python at work can't say experience is significantly better with either of them compared to PHP 7.
Python, on the other hand, is a well-built, well-designed, much more sane language.
As far as "no one would choose it for a new project" goes - I don't know if you've ever heard of this company called Slack. They chose PHP fairly recently and seem to be doing OK.
fwiw, I find the PHP stdlib more complete than many other languages.
i.e. it's not bad to avoid working somewhere because they use PHP. That's not 'prioritising tech over business', it's just choosing to avoid toxic crap technology.
Spending a lot of time figuring out what exact microservice/sharding/etc strategy you need to serve a zillion visits a day and building it before you've even got customer/visitor one is overkill out of the gate. But that shouldn't mean you shouldn't think about how you'll scale over the short term or medium term at all.
When I approach scaling, I'll tend to spend much more time on the data retention strategy than anything else: databases (or other stores), being stateful, means that it's a harder problem to deal with later than earlier as compared to the stateless parts of the system. Even so, I'm typically not developing the data services for the Unicorn I wish the client will become, I'm just putting a lot more thought into optimizing the data services I am building so it won't hit the breaking point as early as it might if I were designing for functionality alone. I do expect there to be a breaking point and a need to change direction at some point in these early stage designs. But in that short to medium term period, the simpler designs are regularly easier to maintain than the fully "scalable" approaches that might be tried otherwise, and rarely do those companies ever need anything more.
I have seen applications that convert very well but were limited by scalability problems. That meant the business had to hold off on marketing and user acquisition, missed their financial targets, and that cascaded into breaching contracts. The phrase that nobody wants to hear in that situation is "who cares about scalability".
Now, if you did not have a lot of problems scaling in your particular case, that just means it was not an obstacle for you; e.g. you had good intuition around performance/scalability, or the problem was coincidentally a good fit for your technological choices.
Unfortunately, not everyone has good intuition about scalability, not everyone is risk-averse, and not everyone is good at picking a good technology for their use case. So I disagree with this article in the sense that it is not in the best interest of a random reader to not care about scalability.