
Scaling to 100k Users - sckwishy
https://alexpareto.com/scalability/systems/2020/02/03/scaling-100k.html
======
majkinetor
This is relevant only for multimedia apps.

I have fintech systems in production with 100k+ users, including a complex government app for an entire country, that run on commodity hardware (the majority of the work is done by 1 backend server and 1 database server, with all reporting handled by 1 reporting backend server using the same db). Based on our Grafana metrics it could survive 10x the number of users without an upgrade of any kind. It runs on Linux, .NET Core, and SQL Server.

Most of the software is not multimedia in nature and those numbers are off the
charts for such systems.

~~~
bob1029
Thank you for this post. I read "10 Users: Split out the Database Layer" and
about had an aneurysm.

I also work with fintech systems built upon .NET Core and have similar
experiences regarding scaling of these solutions. You can get an incredible
amount of throughput from a single box if you are careful with the
technologies you use.

A single .NET Core web API process using Kestrel and whatever RDBMS (even SQLite) can absolutely devour requests in the <1 megabyte range. I would feel confident putting 10x the largest customer we can imagine on a single 16-32 core server for the solution we provide today. Obviously, if you are pushing any form of multimedia this starts to break down rapidly.

~~~
thinkmassive
It seems like they recommend splitting out the database at the start because
using a managed service is much easier than properly managing your own
production database.

~~~
throwaway5752
I can't speak for people running high-performance or near/realtime systems, but I would not trust a managed database service for those needs. My experience is that the managed offerings lag substantially behind upstream versions and are usually economical because they are multitenant. So you have a bit less predictability in io wait / cpu queue, lose host kernel level tunings (page sizes or hugepages, shared memory allocation, etc), and - not naming names - some managed db services are so far behind they lack critical query planner profiling features. That's not even going into application-workload-specific tuning for various nosql stores. This is a nice article, but its audience is people who haven't scaled up a system and are trying to cope with success. It's not great generalized scaling advice.

~~~
bob1029
All I will say is that our latency getting a business entity in or out of a
SQLite database (running on top of NVMe flash) is on the order of tens to
hundreds of microseconds. There will never be a hosted/cloud offering that can
even remotely approach this without installing some "bring the cloud to you"
appliance in your datacenter.
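
To put a claim like that in context, here's roughly the kind of measurement I mean - a minimal sketch using Python's built-in sqlite3 rather than our .NET stack, with a made-up table and payload:

```python
import sqlite3, time

# Hypothetical single-table "business entity" store on local NVMe storage.
conn = sqlite3.connect("entities.db")
conn.execute("CREATE TABLE IF NOT EXISTS entity (id INTEGER PRIMARY KEY, payload TEXT)")
conn.execute("INSERT OR REPLACE INTO entity VALUES (1, 'example payload')")
conn.commit()

start = time.perf_counter_ns()
row = conn.execute("SELECT payload FROM entity WHERE id = ?", (1,)).fetchone()
elapsed_us = (time.perf_counter_ns() - start) / 1_000
print(f"read took {elapsed_us:.1f} microseconds")  # typically tens of microseconds on local flash
```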

~~~
throwaway5752
It almost kills me - just 10 years ago it was nearly six figures to get a RamSan that had 2TB storage and did 100k/25k r/w iops ([https://www.networkworld.com/article/2268291/less-expensive-...](https://www.networkworld.com/article/2268291/less-expensive--faster-flash-array-served-up-by-texas-memory-systems.html))

Now a WD Blue NVMe does 95k/84k r/w iops at $215 and that's just off their website ([https://shop.westerndigital.com/products/internal-drives/wd-...](https://shop.westerndigital.com/products/internal-drives/wd-blue-3d-nand-sata-ssd), may be more depending on shipping method...)

That said, it's not a fair comparison and I wouldn't want to run a big service
on a single sqlite/nvme setup for more reasons than are worth mentioning, but
not prematurely optimizing can take you really far - scale _and_ money - with
good design.

------
jrvarela56
An hour of work put in by any of you reading this is worth several months of hosting for a starter project on an expensive provider like Heroku.

Do not invest time making sure your service runs for $6 a month if it can run
for $50 with 0 hours invested. Invest that time talking to customers and
measuring what they do with your service.

Most times a few customers pay for the servers.

This is just a friendly reminder. I see a lot of comments talking about
running backends for cheap.

~~~
herval
A friend of mine recently launched a side-project that does heavy processing
of audio. He decided to invest ~2-5 hours properly setting up auto-scaling, a
job queue, etc, before releasing v1.

Fast-forward two days: his service and a competitor were both featured on Product Hunt. He's now making a profit on the service, as he managed to scale it up very fast, while the competitor buckled and completely lost momentum.

If you're talking about spending _a long time_ preparing a perfect infra, then
your argument makes sense. Spending a few hours? It's both a great learning
exercise and can literally save your project, so why not?

~~~
fogetti
> completely lost momentum

I am sorry, but you are really talking about a two-day time interval and projecting predictions based on that? Unbelievable...

So what's to stop the competitors from doing the same thing your friend did - investing 2-5 hours - and catching up on the third day???

I wish I was good at ascii art, then I would draw a nice facepalm here.

~~~
yardstick
My take: depending on the nature of the business, and how the publicity was
done, they may only have had one shot at gaining the customers. In 2-3 days
time you might have fixed things, but by then the prospects moved on to the
site that worked.

I’m not convinced that you need to superscale your infrastructure first. I think it’s normally a waste of time and money. But for the example listed, this was a likely benefit.

------
munns
As the original creator of the presentation referenced by the blog author (later re-delivered by Joel in the linked post), I am super excited to see this still have an impact on people, but I'd say today in 2019 you'd probably do things very differently (as others call out).

Tech has progressed really far and there are tools like Netlify for hosting that would replace 90% of the non-DB parts of this. Cloud providers have also grown drastically, and so again a lot of this would/could look a lot different.

Fwiw, the original deck is from spring of 2013; it was delivered at a VC event and then went on to be the most viewed/shared deck on SlideShare for a bit: [https://www.slideshare.net/AmazonWebServices/scaling-on-aws-...](https://www.slideshare.net/AmazonWebServices/scaling-on-aws-for-the-first-10-million-users/)

thanks, - munns@AWS

~~~
debaserab2
Does it look that much different if you exclude solutions that increase vendor
lock-in?

~~~
Swizec
You always pay the vendor. Whether that’s in sweat and tears or in dollars is
up to you.

Fwiw, you are almost certainly shooting yourself in the foot by avoiding vendor lock-in at any stage before 8 figures of revenue per year. Your engineering takes longer, is more brittle, and because you're only using 1 vendor actively, your solution is still vendor locked-in.

Love, ~ Guy who learned his lesson many times

~~~
debaserab2
I think it depends on the type of vendor lock-in -- sure, the trade off of
having a managed Postgres instance is obvious, but it becomes less obvious to
me when you're using things like a proprietary queueing or deployment service.

Writing code against a service's API, instead of code that interfaces directly with the underlying technology, makes the code quite brittle. If/when the vendor deprecates the service, introduces backwards-incompatible changes, or abandons development of the product, you're left on the hook to engineer your way out of that problem. Oftentimes that effort is equal to or greater than the effort of an in-house solution in the first place.

I had the same mentality as you until this happened, for a few different services, to the SaaS product I work on. Now, at the very least, I try to make sure solutions are cloud agnostic.
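
As an illustration of the kind of seam I mean (a rough sketch; the queue classes and names here are hypothetical), the vendor SDK only ever appears inside one adapter module:

```python
from abc import ABC, abstractmethod

class TaskQueue(ABC):
    """The only queue interface the rest of the application sees."""
    @abstractmethod
    def enqueue(self, body: str) -> None: ...

class SQSQueue(TaskQueue):
    """Vendor-specific adapter; the boto3 import stays in this one place."""
    def __init__(self, queue_url: str):
        import boto3
        self._sqs = boto3.client("sqs")
        self._url = queue_url

    def enqueue(self, body: str) -> None:
        self._sqs.send_message(QueueUrl=self._url, MessageBody=body)

class InMemoryQueue(TaskQueue):
    """Drop-in replacement for tests or a later migration off the vendor."""
    def __init__(self):
        self.items = []

    def enqueue(self, body: str) -> None:
        self.items.append(body)
```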

~~~
Swizec
My use case is that for the past 5 years I’ve built several API integrations
in a vendor agnostic way. We never changed vendors.

Actually, we did once and we found that our abstraction was so tightly coupled
to the underlying API that we had to remake it anyway. The core concepts
between those APIs were just too different.

And I’ve had at least 2 cases where our attempt at being vendor agnostic made
the integration completely fail and never work right. To the point the vendor
told us “You’re holding it wrong, please stop”

------
jedberg
Heh, this is one of the questions I liked to use for interviews.

"Let's work together and design a system that scales appropriately but isn't overbuilt. Let's start with 10 users".

Then we talk about what we need and go from there. The end result looks a lot
like this blog post, for those who are qualified.

~~~
ignoramous
/offtopic

Heh, you're being modest. I'm sure you've dealt with far more complex
distributed systems than the hypothetical one in the blog post.

~~~
jedberg
Sure, but most of the people I was interviewing hadn't, so it was a good way
to test their knowledge. :)

If you can scale to 100K users, you can probably learn the rest to scale to
100M users.

------
gfodor
It's probably a bad idea to switch to read-only replicas for reads pre-emptively, vs vertically scaling up the database. Doing so adds a lot of incidental complexity, since you have to avoid read-after-write inconsistencies or ensure those reads come from the master.

The reason punting on this is a good idea is that you can get pretty far with vertical scaling, database optimization, and caching. And when push comes to shove, you are going to need to shard the data anyway to scale writes, reduce index depths, etc. A re-architecture of your data layer will need to happen eventually, so it may turn out that you can skip the intermediate "read from replica" overhaul entirely by just punting the ball until sharding becomes necessary.
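
To make the incidental complexity concrete, this is roughly the read-routing logic your app layer ends up carrying once replicas are in play (a sketch only; the connection objects and the lag budget are assumptions):

```python
import time

REPLICATION_LAG_BUDGET = 2.0  # seconds; assumed worst-case replica lag

class RoutingSession:
    """Pins a session's reads to the primary for a short window after any write,
    to avoid read-after-write anomalies on a lagging replica."""

    def __init__(self, primary, replica):
        self.primary = primary
        self.replica = replica
        self.last_write_at = float("-inf")

    def execute_write(self, sql, params=()):
        self.last_write_at = time.monotonic()
        return self.primary.execute(sql, params)

    def execute_read(self, sql, params=()):
        recently_wrote = time.monotonic() - self.last_write_at < REPLICATION_LAG_BUDGET
        conn = self.primary if recently_wrote else self.replica
        return conn.execute(sql, params)
```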

~~~
jedberg
The problem with going to the "top of the vertical" in scaling, so to speak, is that one day, if you're lucky, you'll have enough traffic that you'll reach the limit. And it will be like hitting a wall.

And then you have to rearchitect your data layer under extreme duress as your
databases are constantly on fire.

So you really need to find the balance point and start doing it _before_ your
databases are on fire all the time.

~~~
toast0
Assuming you have a relatively stable growth curve, you should have some
ability to predict how long your hardware upgrades will last.

With that, you can start planning your rearchitecture if you're running out of
upgrades, and start implementing when your servers aren't yet on fire, but are
likely to be.

Today's server hardware ecosystem isn't advancing as reliably as it was 8
years ago, but we're still seeing significant capacity upgrades every couple
years. If you're CPU bound, the new Zen2 Epyc processors are pretty exciting; I think they also increased the amount of accessible RAM, which is another potential scaling bottleneck.

~~~
jedberg
> Assuming you have a relatively stable growth curve, you should have some
> ability to predict how long your hardware upgrades will last.

But that's not how the real world works. The databases don't just slowly get
bad. They hit a wall, and when they do it is pretty unpredictable. Unless you
have your scaling story set ahead of time, you're gonna have a bad day (or
week).

~~~
toast0
If you're lucky, the wall is at 95-100% cpu. Oftentimes we're not that lucky, and when you approach 60% everything gets clogged up; I've even worked on systems where it was closer to 30%.

Usually, databases are pretty good at running up to 100%, though. And if you
started with small hardware, and have upgraded a few times already, you should
have a pretty good idea of where your wall is going to hit. Some systems won't
work much better on a two socket system than a one socket system, because the
work isn't open to concurrency, but again, we're talking about scaling
databases, and database authors spend a lot of time working on scaling, and do
a pretty good job. Going vertically up to a two socket system makes a lot of
sense on a database; four and eight socket systems could work too, but get a
lot more expensive pretty fast.

Sometimes the wall on a database is from bad queries or bad tuning; sharding can help with that, because maybe you isolate the bad queries and they don't affect everyone at once, but fixing those queries would help you stay on a single-database design.

~~~
bcrosby95
The minute your RDBMS' hot dataset doesn't fit into memory, it's going to shit itself. I've seen it happen anywhere from 90% CPU down to around 10%. Queries that were instant can start to take 50ms.

It can be an easy fix (buy more memory), but the first time it happens it can
be pretty mysterious.

------
charlesju
These posts are great and there is always great information in them. But to nitpick, it would be a lot easier to digest at face value if you led with concurrency rather than raw total users, as that's the true gauge of what your server infrastructure looks like.

~~~
stingraycharles
Yeah, I still don't understand the need to split servers at 10 users. Even if those are 10 concurrent users, it must still mean there is well-beyond-average resource consumption per user.

~~~
cwingrav
Probably so when your 10 users grow to 1000, your efforts at 10x are good for
1000x, and you’re working on 100000x.

------
bcrosby95
Conversely, rent or buy 1 bare metal server. That's how we went until we hit
around 300k users. Back in 2008.

~~~
brokencode
I think it’s kind of crazy that we have 64 core processors available, but
still need so many servers to handle only a hundred thousand users. That’s
what, a few thousand requests per second max?

Having many servers gives you redundancy and horizontal scalability, but also
comes at a high complexity and maintenance cost. Also, with many machines
communicating over the network, latency and reliability can become much harder
to manage.

Most smaller companies can probably get away with having a single powerful server with one extra server for failover, and probably two more for the database with failover as well. I think this would also result in better performance and reliability. I'm curious to know whether the author tried vertical scaling first or went straight to horizontal scaling.

~~~
neurostimulant
The bottleneck on a single big server setup is the available network bandwidth to serve all those 100k users. If you run a simple site, you can probably slap a CDN in front to serve all your static assets so they won't clog your network. But if your app uses more bandwidth per user than a typical website and that traffic can't be offloaded to a CDN, then your single server might not have enough available bandwidth to serve all those 100k users, and you'll be forced to scale horizontally even though your server still has plenty of cpu and i/o capacity. You might be able to increase your bandwidth, but your mileage may vary, as dedicated server vendors usually cap their offering at 1-3gbps per server.

~~~
jjeaff
That would have to be 100k concurrent users all streaming data at 100kbps to saturate a 10gbps connection - and those connections are not hard to find these days. At least, I came across several offerings when browsing bare metal server options recently, and they were not that expensive either.

And as a side note, anyone who is using up that kind of data is not going to
be able to afford cloud egress prices unless they are making a mint on those
users. Saturating a 10gbps connection would cost you around $450 an hour at
AWS rates.
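
The back-of-the-envelope math, for reference (AWS egress assumed at roughly $0.09/GB):

```python
users = 100_000
per_user_bps = 100_000                   # 100 kbps each
total_gbps = users * per_user_bps / 1e9  # = 10 Gbps, i.e. a saturated 10gbps link

gb_per_hour = users * per_user_bps / 8 * 3600 / 1e9  # ~4500 GB transferred per hour
cost_per_hour = gb_per_hour * 0.09                   # ~$405/hour at ~$0.09/GB egress
print(total_gbps, gb_per_hour, round(cost_per_hour))
```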

~~~
neurostimulant
European providers seem to be more generous with bandwidth. In other locations, if you want 10gbps per server you probably need to talk to someone first, and there is a nonzero chance that they can't fulfill that if their datacenter is not that big.

------
thaniri
This blog post is almost entirely a re-hash of
[http://highscalability.com/blog/2016/1/11/a-beginners-guide-...](http://highscalability.com/blog/2016/1/11/a-beginners-guide-to-scaling-to-11-million-users-on-amazons.html)

The primary difference is that this post tries to be more generic, whereas the
original is specific to AWS.

The original, for what it is worth, is far more detailed than this one.

~~~
lixtra
That’s on purpose:

>> This post was inspired by one of my favorite posts on High Scalability. I
wanted to flesh the article out a bit more for the early stages and make it a
bit more cloud agnostic. Definitely check it out if you’re interested in these
kind of things.

------
k2xl
A little bit of overkill recommendations here.

With 10 users you don't "need" to separate out the database layer. Heck, you don't need to do that with 100 users. A website I ran back in 2007-2010 had tens of thousands of users on a single machine running the app, database, and caching just fine.

Users are actually a really poor metric to use for scalability planning. What's more relevant is queries/data transmission per interval, and also the distribution of the types of data transfers.

I'd say replace the "Users" in this post with "queries per second" and then I think it's a better general guide.

~~~
AlchemistCamp
Even in that case it's overkill. My site load tested at ~1k requests per
second two years ago, when it was entirely on a $5/month DO droplet.

------
marcinzm
That seems pretty aggressive for just 100k users, unless they mean concurrent users (in which case they should say so).

Let's say that maybe 10% of your users are on at any given time and they each make 1 request a minute. That's under 200 QPS, which a single server running a half-decent stack should be able to handle fine.
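
Spelled out, with the 10% concurrency and one-request-per-minute figures above as the assumptions:

```python
total_users = 100_000
concurrent = total_users * 0.10        # 10,000 users online at once
requests_per_minute = concurrent * 1   # one request per active user per minute
qps = requests_per_minute / 60         # ~167 QPS
print(qps)
```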

------
erkken
We now use a DigitalOcean managed database with 0 standby nodes, coupled with another instance running Django. It is working well.

We are, however, actually thinking about switching to a new dedicated server at another provider (Hetzner), where we are looking at having the web server and the DB on the same server; the new server will have hugely improved performance (which is sometimes needed), still at a reduced cost compared to the DigitalOcean setup.

What we are unsure about is whether having a managed db is worth it. The selling point is that everything is, of course, managed. But what does this mean in reality? Updating packages is easy, backups as well (to some extent), and we still do not use any standby nodes and doubt we will need any replication. So far we have never had the need to recover anything (about 5 years). Before we got the managed db we had it on the same machine (as we are now looking at going back to) and never had any issues.

Any input?

~~~
ryanar
I thought part of their managed service was that they optimized / tuned your
postgres db based on how you were using it. If that is true, then moving off
of the managed service means you are tuning postgres yourself now.

Also want to throw in there that it is important to not only compare specs,
but to also compare hardware. If DO has newer chips and faster RAM, then you
will take a performance hit moving to the new provider even if the machine is
beefier.

~~~
adventured
Pretty certain that DO tunes for broad usage performance optimization (all the easy, obvious performance wins), not dynamically per client based on each client's usage.

Here's their pitch: easy setup & maintenance, scaling, daily backups, optional
standby nodes & automated failover, fast reliable performance including SSDs,
can run on the private network at DO and encrypts data at rest & in transit.

~~~
ryanar
huh, when I used them in the past we had someone specifically look at our RDS
and data usage and tune it.

------
marknadal
Eh, this article was disappointing.

It teaches good _business_ practice of tackling 1 thing at a time & not over-
engineering.

But simply using read replicas, caches, CDNs, etc. does not mean things will
scale.

Actually, often this will break your app unless you write concurrency-safe code. Learning and writing concurrent code is how you scale; the "adding more boxes" part is just the after effect.

For instance, we have about 10M+ monthly users running on GUN
([https://github.com/amark/gun](https://github.com/amark/gun)), this is
because it makes it easy/fun to work/play with concurrent data structures, so
you get high scalability for free without having to re-architect.

But learning something new is never an excuse for not shipping stuff today. Ship stuff today; you can always learn when you need it.

------
AlchemistCamp
I find this deeply unconvincing as I've scaled multiple apps to the 10k range
while on low-tier hardware using the setup Alex suggests is only appropriate
for _one_ user.

The old Stack Overflow podcast was also very instructive. They went a _very_
long way on a single server and had the Reddit founders on the show to talk
about their scaling during their process of adding a second box. This was on
servers of the mid-aughts, running ASP.NET.

------
dillonmckay
So, I just finished reading the article, and the last paragraph quickly
mentions logging.

So, in the ‘spirit’ of this article, would that not be one of the first things to implement in the system?

Wait until you have added a caching layer and sharded the DB to begin implementing logging?

I may not be reading this correctly.

I could see the case being made for distributed tracing, but having a logging
strategy that can also scale and be flexible seems really important, to me at
least.

------
simplecto
I am glad to see so many HN'ers here who run very successful projects on bare
metal or simple VMs. We should do more to talk about those business verticals,
use cases, and how we solve them practically from a tech point of view.

------
ReverseCold
I'm running ~1k DAU on a $6/mo Vultr VPS - just a Phoenix web app, no special optimizations. If I cache a little more aggressively on the frontend, I should be able to handle even 10k. As always, the advice in the article depends on what you're doing.

------
huzaif
We can now achieve pretty high scalability from day 1 with a tiny bit of
"engineering cost" up front. Serverless on AWS is pretty cheap and can scale
quickly.

App load: |User| <-> |Cloudfront| <-> |S3 hosted React/Vue app|

App operations: |App| <-> |Api Gateway| <-> |Lambda| <-> |Dynamo DB|

Add in Route53 for DNS, ACM to manage certs, Secrets Manager to store secrets,
SES for Email and Cognito for users.

All this will not cost a whole lot until you grow. At that point, you can make
additional engineering decisions to manage costs.
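
For a sense of how little code the app-operations path needs, here's a minimal sketch of a Lambda handler behind API Gateway writing to DynamoDB (the table name and request shape are assumptions for illustration):

```python
import json
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("users")  # hypothetical table name

def handler(event, context):
    # With API Gateway proxy integration, the request body arrives as a JSON string.
    body = json.loads(event.get("body") or "{}")
    table.put_item(Item={"id": body["id"], "name": body.get("name", "")})
    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```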

~~~
aratakareigen
Great, but this reads like a particularly blunt Amazon ad. Is there a way to
achieve "high scalability" without selling my soul to Amazon?

~~~
huzaif
Yes, it does read like that.

In the context of a start-up, cost is a big factor, and then perhaps (hopefully) handling growth. You could start small and refactor apps/infrastructure as you grow, but I am unsure how one could afford to do that efficiently while also managing a growing startup.

On the 'selling your soul to a cloud provider' point, I don't see it like that. I have a start-up to bootstrap, and I want to see it grow before making altruistic decisions that would sustain the business model.

Once you are past the initial growth stage, there are many options for serverless, gateways, caches, and proxies that can be orchestrated in K8s on commodity VMs in the datacenter. Though this is where you would need some decent financial backing.

(I am not associated with Amazon, Google or Azure. I do run my start-up on
Azure.)

~~~
ignoramous
I'm down a similar route, but I must point out that beyond a certain number of users / scale, Serverless becomes cost-prohibitive. For instance, per a back-of-the-napkin calculation, the Serverless load I run right now, though very cost-effective for the smaller userbase I've got, would quickly spiral out of control once I cross a threshold (which is at 40k users). At 5M users, I'd be paying an astonishing 100x the cost compared to hosting the services on a VPS. That said, Serverless does reduce DevOps to an extent, though it introduces different (but fewer) complications of its own.

As patio11 would like to remind us all, _we've got a revenue problem, not a cost problem._ [0]

[0]
[https://news.ycombinator.com/item?id=22202301](https://news.ycombinator.com/item?id=22202301)

------
flukus
Users has to be the worst unit you can possibly use; there's just too much variance between projects. If it's 100k users for a timesheet app people use once a week, you've got very different scalability requirements than a 100-user app that people constantly interact with day in, day out. Even then there are big differences depending on the domain: 100k users inserting data into a single table looks very different from the same number entering information across 30 tables. You can't just pretend there are magic numbers that apply to everyone.

Many of those steps actually reduce scalability if they're applied prematurely; splitting out the API and database layer at 1000 "users" is going to use more resources serializing things across the network than keeping it in process would. Same for separating out the database: it's great if you need it, but there's a cost if you don't. I worked on one system where we pulled out the API layer after realizing that this was where ~50% of our CPU time was being spent.

It also seems to focus on vertical layering more than horizontal splitting. Being a photo sharing website, I would have thought there was a lot of CPU-intensive photo manipulation or something similar that they could split off to services on the side, since it doesn't need to be done in real time.

~~~
highhedgehog
Of course the article doesn't mean to give a solution to every possible scenario. It has to be taken as a general guideline to the necessary steps you should take in scaling an application, but obviously every scenario is unique and needs to be analyzed before taking action.

------
segmondy
IMHO, discussions about scale should begin with at least 1 million users these days. 100k has been old news for more than a decade.

~~~
lbriner
As stated above, the number of users is not a good measure of a system; what matters is the concurrency multiplied by the typical system load per action.

Clearly a million users on Facebook is a much heavier load than a million registered with online banking who only use it once a month.

------
highhedgehog
I have a few questions:

1) I don't really get the "100 Users: Split Out the Clients".

It definitely helps in terms of understanding your customer profiles (whether they prefer the mobile app or the web interface, for instance), and it might help from a usability point of view, but how does this help scalability per se if the API layer stays the same?

Splitting the client happens, obviously, at the client level...

2) Also, I don't understand "This is why I like to think of the client as
separate from the API."

Who considers the client and the API as the same thing?

You can consider the API as the client of the DB, sure, but why would you mix
the user client and API together?

3) Caching: here I lack some knowledge. "We’ll cache the result from the database in Redis under the key user:id with an expiration time of 30 seconds". I assume that every access to Redis _will not_ refresh the cache (i.e. reset the expiration), otherwise you could potentially never get updated data, right?
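
(My mental model is the usual cache-aside pattern, roughly this sketch with redis-py - key and function names are just illustrative - where the TTL is only set when the value is stored, so plain reads don't reset it:)

```python
import json
import redis

r = redis.Redis()

def get_user(user_id, load_from_db):
    key = f"user:{user_id}"
    cached = r.get(key)  # a plain GET does not touch the 30-second expiration
    if cached is not None:
        return json.loads(cached)
    user = load_from_db(user_id)
    r.set(key, json.dumps(user), ex=30)  # TTL is (re)set only when repopulating
    return user
```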

------
arwhatever
Couple of questions for the author as well as the HN crowd:

1. Should I interpret each X number of users as stated in the article to be "simultaneous" or "generally active over the past Y amount of time" or even "total user count ever?" "Unique per day?"

2. What do you think in general about skipping the step "100 Users: Split Out the Clients" if one is reasonably certain to not want or need multiple clients? It would seem as though this could keep the deployments and testing simplified until later in the growth stage, as more code can be deployed/tested as a single bundle. But also, I want to be sure I'm not missing something by just trying to justify my own interests.

------
kashprime
Early stage social media/news feedy startup here, wondering if a strategy of
just starting off with Firebase and praying will work.

I figure if we did get to a thousand or ten thousand users, we could recruit
the DevOps talent to pull this off before the bills kill us!

~~~
tqkxzugoaupvwqr
Start with PostgreSQL as the database and PHP/NodeJS/Java/<popular language> for the web server. It doesn't sound sexy, but it leaves enough room to grow your app if the need arises, and it's much easier to find people who know the stack. Firebase is a mistake in my eyes because when the high bills come in, you are locked in. Either the bills or the migration to another stack will kill you.

~~~
kashprime
I think you're right... We blew through the free tier's 50k database reads just on testing, and that's just 3 users!

------
ablekh
Nice, but very simplistic (on purpose, it seems) write-up on the topic. For a much more comprehensive and excellent coverage of designing & implementing large-scale systems, see [https://github.com/donnemartin/system-design-primer](https://github.com/donnemartin/system-design-primer). Also, I want to mention that an important - and relevant to scaling - aspect, _multi-tenancy_, is very often (as it is in Alex's post) not addressed. Most large-scale software-intensive systems are SaaS, hence the importance and relevance of multi-tenancy.

------
katzgrau
Good post, two thoughts:

* I'd imagine the website layer is frequently static (html/js) and could just be hosted on s3/cdn. One part of scaling avoided.

> This is when we are going to want to start looking into partitioning and
> sharding the database.

You have to be at pretty huge scale before you really need to consider this. A
giant RDS instance, some read replicas, and occasional fixing of bottlenecks
will go a lonnng way. And scaling RDS is a few clicks. By the time you need to
start sharding, you can probably afford a dedicated database engineer, or at
least I'd hope.

------
iamaelephant
Jesus. If you're having to split your application across multiple machines at
just 1,000 users you are poked by the time you have to properly scale.

------
echelon
Questions for y'all. (Rather, I'm soliciting broad technical and business
advice.)

I have built a very fast and efficient CPU-only neural TTS engine in
Rust/Torch JIT that is the synthesis of three different models. I've got a
bunch of celebrity and cartoon voices I've trained. The selling point is that
this runs on cheap, commodity hardware and doesn't require GPUs. I can easily
horizontally scale it as a service.

I've currently got it running in a Kubernetes autoscaling group on
DigitalOcean, but I'm worried about the bandwidth costs of serving up
potentially thousands of hours of generated audio. I haven't thrown any real
traffic at it beyond load testing, but I think it can survive heavy traffic.
The thing that worries me is the bandwidth bill.

Does anyone have experience with other hosts that are cheap for bandwidth
intensive apps? Are there hosts that provide egress bandwidth on the cheap for
dynamically generated (non-CDN) content?

Subsequent to this, I would really like to sell or monetize this app so I can
fund the R&D / CapEx intensive startup I really want to undertake.

Who might be the market to buy a TTS system like this?

I was thinking Cartoon Network might want "Rick and Morty" TTS, but despite my
engineering to scale this and make it sound really good, I doubt they'd pay me
much for the product. I suppose $2M would give me runway to hire a few
engineers and buy a lot of the equipment I need, but I have no idea who would
pay for this.

Glass for optics is surprisingly expensive, and beyond that I have other
extremely high R&D costs.

Alternatively, I also have a "real time" (~800ms delay) neural voice
conversion system. I thought about running a Kickstarter campaign and selling
it to gamers / the discord demographic. It's relatively high fidelity with no
spectral distortion, and I have a bunch of hypothetical mechanisms to make it
an even better fit.

I've also thought about slapping a cute animation system on top of my TTS service to let people animate characters interacting. (Value add?) An earlier
non-neural TTS system I built before the last presidential election cycle had
something like this, but more primitive:
[http://trumped.com](http://trumped.com) (The audio quality of this
concatenative system is absolute garbage. The new thing I've built is
unrelated.)

~~~
ronsor
If you're worried about bandwidth, you could just let people download an
offline app/sdk under a proprietary license.

------
agumonkey
odd, that was my google query of yesterday..

I'm curious what kind of hardware can sustain 100k concurrent connections
these days.

~~~
lbriner
We were running a speed test with node vs dotnet core and even on a small
Linux box (4GB, 1 core), we could reach nearly 10K concurrent requests for a
basic HTTP response but the exact nature of the system will affect that
massively.

Add large request/response sizes or CPU/RAM bound operations and your servers
can very quickly reach their limits with far fewer concurrent requests.

Architecture is a big-picture task, since you have to consider the whole system before implementing part of it; otherwise you end up having to start again.

~~~
agumonkey
thanks that's already a lower bound point of reference

------
superphil0
Use Firebase or any other serverless architecture and forget about scaling and devops. Not only will you save development time but also money, because you need fewer developers. Yes, I understand that at some point it will get expensive, but you can still optimize later and move expensive parts to your own infrastructure if needed.

------
abinaya_rl
I'm running a $20 Linode instance and I won't be upgrading it any time soon :)

