I'm the CTO of a moderately sized gaming community, Hypixel Minecraft, which operates about 700 rented dedicated machines to service 70k-100k concurrent players. We push about 4 PB/mo in egress bandwidth, something along the lines of 32 Gbps at the 95th percentile. The big cloud providers have repeatedly quoted us an order of magnitude more than our entire fleet's cost... JUST in bandwidth. Even if we bring our own ISPs and cross-connect to use only the cloud's compute capacity, they still charge stupidly high rates to egress to our carriers.
Even if bandwidth were completely free, at any timescale above 1-2 years, purchasing your own hardware, leasing-to-own, or even just renting will be cheaper.
Cloud is great if your workload is variable and erratic and you're unable to reasonably commit to year-plus terms, or if your team is so small that you don't have the resources to manage infrastructure yourself. But at a team size of >10, your sysadmins running on bare metal will pay their own salaries in cloud savings.
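To make that breakeven concrete, here's a minimal sketch; every price in it is an illustrative assumption, not a quote:

    # Rough owned-vs-cloud breakeven sketch. All prices are illustrative
    # assumptions; substitute your own quotes.
    HW_CAPEX_PER_SERVER = 8_000    # one-time purchase price (assumed)
    COLO_OPEX_PER_SERVER = 250     # $/mo colo + power + remote hands (assumed)
    CLOUD_OPEX_PER_SERVER = 1_200  # $/mo for a comparable cloud instance (assumed)

    for months in (6, 12, 24, 36):
        owned = HW_CAPEX_PER_SERVER + COLO_OPEX_PER_SERVER * months
        cloud = CLOUD_OPEX_PER_SERVER * months
        print(f"{months:>2} mo: owned ${owned:,} vs cloud ${cloud:,}")
    # With these assumptions, owned hardware crosses below cloud before
    # month 9, and the gap only widens from there.

Breakeven here is capex / (cloud opex - colo opex) = 8000 / 950, about 8.4 months; the exact month moves with the assumptions, but the shape of the curve is why year-plus commitments favor owning.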
Later I worked at a FAANG, and I remember when Snap filed their S-1 to go public: they disclosed that they were paying Google $5B, and we were totally shocked at the cost compared to our own spend on significantly larger infra.
I think people don’t realize this is doable and it’s great to hear stories like yours showing the possibilities.
> paying Google $5B
You were off by 10x :). The annual commitment was $400M/yr on average. Snap's S-1 said:
> We have committed to spend $2 billion with Google Cloud over the next five years
I agree with everything you say. I'm convinced that a huge part of the cloud's financial success is due to how it allows CTOs/CIOs to indulge their fantasies about having a mega-scalable app, even if their workloads are very regular and predictable. It's along the lines of buying an expensive sports car but never driving it fast: you're just paying for the kudos it brings you in the eyes of other people.
Having said that, we are happily using the cloud for our small app because it makes no sense to build out our own infrastructure for a single VPS and database.
It's a sane choice given incentives, too. Cloud bullshit features prominently on an awful lot of technical job postings these days.
But yeah, I've definitely seen some heavy, expensive cloud setups that could have run on a toaster. Smaller-scale B2B stuff seems especially prone to this: like, what's your maximum reasonable traffic if you took off like crazy? What's the size of your market? Come on. Throw in really half-assed, inconsistent use of automation tools and lots of reliance on shitty cloud web dashboards and hope, and often the toaster (or, more seriously, a couple of low-end co-located servers) would be easier and safer to manage, too.
These things don't matter for large, established companies because they already have DevOps, SysAdmin, and Development teams. But for smaller dev shops, it absolutely makes a difference when you can generate a good bit more efficiency from your development staff.
To give you an idea: if you want to run a data center or your own servers in some countries, you need a standby generator (because electricity is not a given). The diesel used to run these generators is imported, and since the economies of these countries are shaky, the exchange rate fluctuates. Suddenly the cost of keeping your site up becomes a variable, subject to government announcements (not even in an evil authoritarian way), policies, and import taxes. In the face of all of this, a steady AWS bill with reliable infrastructure becomes priceless to these companies.
I ran data centers for a living in Northern VA and had all sorts of international clients: Egyptian schools who rented servers, Brazilian Protestant ministries who shipped servers to us, etc. There were some decent data centers in Mumbai we had to get VPNs built for, and we had at least one legit client in Lagos, Nigeria.
You'd want to co-locate with an ISP who already has the infrastructure for continuous service through blackouts and the like, or you could run your own datacenter if it's small-ish, since you already have some infrastructure to keep operating during blackouts.
Even then, you can colocate in a datacenter anywhere you want, have equipment delivered there and pay remote hands to install it for you for a very reasonable fee.
Of course this doesn't make sense if you just want a small webserver, but that's not who we are talking about here.
And you cannot have a website that doesn't cater to US and EU users, unless the website only solves local problems.
The detail to look at is that flexibility is a feature of the cloud offering, and an expensive one at that. If you don't need it, you need to find a way to not pay for it.
My guess is they did it for cost reasons.
Isn't that true in all cases?
There is no doubt that rolling and maintaining your own infrastructure can be and is better than dumping cash on the AWSs of this world. The only question is what size marks the breakeven point.
Looking at the comments here, I think it's clear that there are a relatively small number of use cases where rolling your own is a better idea, primarily where you have a huge number of servers basically all doing the same thing with lots of data transfer (which is comparatively expensive in the cloud). This may be cheaper to manage with a small team of people if you're essentially cloning similar setups.
 - https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-f...
 - https://blog.cloudera.com/introducing-s3guard-s3-consistency...
If you want to use this to create new objects all the time, rather than update ones you already have, you now have to keep track of which objects in your bucket are "old" and should no longer be there. But yes, totally doable.
Sometimes old versions seem to take precedence over newer ones, for some reason.
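For the object-tracking case mentioned above, a minimal sketch of the cleanup bookkeeping, assuming boto3 and a purely hypothetical bucket name and age cutoff:

    from datetime import datetime, timedelta, timezone

    import boto3

    s3 = boto3.client("s3")
    cutoff = datetime.now(timezone.utc) - timedelta(days=7)  # assumed policy

    # Walk the bucket and collect objects older than the cutoff.
    stale = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket="example-bucket"):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                stale.append({"Key": obj["Key"]})

    # delete_objects accepts at most 1000 keys per request.
    for i in range(0, len(stale), 1000):
        s3.delete_objects(Bucket="example-bucket",
                          Delete={"Objects": stale[i:i + 1000]})

In practice, an S3 lifecycle rule that expires objects by age does the same thing with no code at all, which is usually the better tool for a simple age-based policy.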
Then they get to a certain inflection point on the growth curve, and splurging on unpredictable capacity is slowly replaced with reasonable costs for known capacity.
An average company does not provide IT infrastructure (like storage in this case) to a massive number of clients.
Personally I don't think there's a one-size-fits-all solution; you will have to do the math (like I'm sure Snap, Netflix, and others have done) to see if cloud is worth it. However, I agree: for most teams the default should be cloud.
200 million a year, huh.
The cost of commercial office space in the U.S. can range from $6 per square foot in low cost regions to over $12 per square foot in New York City. On average, a 50-cabinet data center will occupy about 1,700 square feet. At a median cost of $8 per square foot, the space alone would cost about $13,600 per month. 
Are Uber renting on the order of two million square feet of data centre? Do they have sixty thousand cabinets of hardware?
If they do, I would absolutely love to see a quote for how much it would cost them to run in the cloud.
I think it's far more likely that number is bullshit.
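As a sanity check on the arithmetic, a sketch using only the figures quoted above ($8/sqft/mo, 50 cabinets per ~1,700 sqft):

    annual_real_estate = 200e6   # claimed $/yr
    rate = 8.0                   # $/sqft/mo, median figure from above
    sqft = annual_real_estate / 12 / rate
    cabinets = sqft / 1_700 * 50

    print(f"{sqft / 1e6:.1f}M sqft, ~{cabinets:,.0f} cabinets")
    # -> about 2.1M sqft and ~61,000 cabinets, matching the "two million
    #    square feet / sixty thousand cabinets" reading of that number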
A pretty dense cabinet should only cost ~$1400/mo at wholesale (1MW+ rooms) rates, and $200M is 143,000 cabinets.
And it's public that Uber uses multiple clouds as well.
Disclaimer: I haven't reviewed public Uber filings, would be very interested if there's any data that indicates they're really spending $200M on opex for real estate (which would be equivalent to $400M/year on cloud, which is either opex or potentially mix if there is some reserved instance-type cloud spend).
But servers can't be included in an "Uber spends nearly $200M/yr on real estate for their datacenters alone" figure anyway.
Add $200 per server per month to have a gigabit uplink.
Add $4000 per server per lifetime for VMware licenses.
These are reasonable estimates of course. Could be multiple of that depending on what hardware is used and what colo. Physical hardware easily gets as expensive as any cloud.
I know that hardware varies a lot for sure, but for context: I put together 3 racks at 50% density with an 800 Gbit backplane for around 18k EUR/mo.
I spared no expense, official juniper QSFPs (which are egregiously overpriced) and top of the line Dell servers with full out of band licenses.
And once there, interconnected bandwidth and IOPS became “free” (or, at no extra charge).
We put the same application in the cloud and it costs us 40k EUR/mo, even with a heavy amount of optimisation: half-sized instances and aggressive optimisation of bandwidth/IOPS.
The cloud's “sticker price” to us is 4x that of physical. You can buy a lot of human time for that price.
If anything, that's an argument against physical infra, not in favor of it. Although it's fine if one wants a lot of the same big servers (a Hadoop cluster, or a video computing cluster, or a CDN), which are the few use cases where physical can make sense (and hybrid cloud probably makes even more sense).
With AWS, you'd never spin half of that infra upfront. You'd spin a few VMs and start running stuff. If the project goes well, spin more or bigger VMs, otherwise spin down. Cost is very dynamic and the company doesn't have to spend half a million upfront, which is a big financial problem for many companies.
Anyway, we can both agree on AWS being overpriced. The reference on costs should be Google Cloud if one is cost-conscious, not AWS. Google Cloud is often half the cost of AWS for the same thing.
AWS vs Google comparison, a bit old and instance types have changed, but relative pricing has not moved: https://thehftguy.com/2018/11/13/2018-cloud-pricing-aws-now-...
And this one more recent since they've released high memory instances to compete with SoftLayer: https://thehftguy.com/2018/11/13/2018-cloud-pricing-aws-now-...
And, I would really agree with you if not for 2 things:
1) "wasted" cycles are not wasted, the CPU will clock down.
2) Kubernetes was designed specifically for this; the idea is you slice the CPU up so much that you don't waste many resources.
It's astonishingly capable of consuming all resources.
By the by, RAM is never "wasted"; it's used for various caches.
The 200M is for a year - at $1400/mo, that's 11905 cabinets.
We ran the numbers continually and, like others mentioned, it was a no-brainer to build on-prem. That said, there are a few use cases where AWS/GCP are a great fit. People selling cloud to enterprises without putting in real engineering work generally make my life harder than it already is.
Also, Uber seems to be much less compute/bandwidth-demanding than Snap.
>> Your infra would be better tailored to your workloads
How do you scale elastically with an on-prem infra? What part do you tune more to your own workload, when an average company cannot even tune the Java GC for their own workload?
Your claims do not reflect reality; they are rather your imagination. I have migrated countless companies to the cloud, and almost every single migration was driven by a single factor: cost. Everything else was an added bonus: increased security, availability, and elasticity.
This is the antithesis of any cloud. You only pay for what you use.
> At $5B it would not cost anywhere near that much to replicate.
If you can recreate GCP for $5B in capex, there are likely some VCs lurking here who would like a word with you.
If your business is something that is fundamentally not “pushing bytes”, you really, really don’t want to know about routers, firewalls, the OSI layer, SHA256, RAID arrays, all the way to this week’s JS framework. All of that is a big annoyance, and paying AWS to “take care of it” makes sense, even if it comes at a higher price: more often than not, the difference wouldn’t be offset by the time, effort, and risk exposure that you would have to allocate when building your own.
This calculation is different if your business is primarily digital. The gentleman upthread making a game, for example, is perfectly right: his company naturally developed a culture that can evaluate and manage every aspect of its digital operations, because it’s part of its core business, so it makes sense to put that knowledge to good use and save money.
Firstly, if you are big enough, you can manage/architect your data centre better than a cloud provider. You can afford to hire staff to do it, and they can build something specific to your needs. Companies like this should be on-prem.
(It doesn't matter if the company is "digital" or not. It could be a huge retail chain, or a government agency, whatever. The only requirement is to be big enough that you have big needs and can afford a big spend on staff.)
Secondly, if you are small enough, you really, really don’t want to know about routers, firewalls, the OSI layer, SHA256, RAID arrays, etc. Dealing with that would mean another couple of full-time employees, and you can't afford that. Companies like this should be on a SaaS (ideally one less full of gotchas than Heroku).
Thirdly, if you are in between, you have the capacity to deal with system administration, there are some advantages to being able to shape your infrastructure to your needs, but you really don't want to get into real estate and millions of dollars of CAPEX. Companies like this should be on rented physical hardware.
What doesn't make sense, at any scale, is renting VMs.
It's interesting that competent engineers are only a question of cost to you. Where I am (in the north of Europe), it's really hard to find good engineers. And even though it doesn't say much about competence, it's not mandatory here for people in the IT industry to have a CS degree, even if they're managing cloud infrastructure worth millions every year.
I've been working in the industry at different companies for a bit over 10 years, and I would say that the huge majority are just average people who know basic concepts about infrastructure but wouldn't be able to design and implement any of it on their own. This includes network infrastructure for the biggest ISPs in the country.
I work as a systems engineer, and the teams I've been working with over the last 5 years, regardless of company, have been looking for engineers constantly. They pay well and have great benefits, but there just aren't enough people out there to take these jobs and manage a full on-prem infrastructure.
So part of the money you're talking about, which should be used to hire all these really competent people, is instead used to pay for cloud services, where we know that professionals with far more hardware (and software) knowledge are managing the infrastructure. And this is really convenient for a lot of companies.
I’m from a small province in Canada and we have similar problems here sometimes. One of the questions I sometimes have to ask clients when they make a statement like that is: “do you pay well for the area? Or do you pay well enough to attract talent from out-of-province?”
This often leads to them arguing that “but the cost of living is low here!” And inevitably I have to mention my friends who left the province to go to the Bay Area for 10 years and came back with $500k USD in stocks they collected, on top of what they had left over after paying their high cost-of-living rent.
I feel your pain. Companies running in less desirable areas have to somehow pay a premium for top tier talent. One way, as you mention, is to outsource, whether to cloud providers or part time contractors.
This isn’t meant as a brag at all, but I do the independent contractor thing here and make “pretty darned good for the area” money. Inevitably clients will ask me to come work full-time for them, and we have a painful conversation where I tell them what my taxable income was the year before, as well as the investments I made into equipment and licenses I use to provide the services I provide. The result so far has always been to continue being happy with me as a part-time contractor!
Edit: sorry for the typos, written on mobile with the swipe text thing
p.s hope you're staying safe from COVID
But to give some kind of estimate, just so you know where I'm coming from when I say they pay well: I would say somewhere between $60-75k.
I disagree. The big cloud providers will let you rent VMs across multiple availability zones in a single geographic region. That gives you better redundancy than you could get by renting one or more dedicated servers in a single data center. Yes, those same cloud providers offer bare-metal instances, but those are absurdly expensive for a company that only needs small-scale computing power but still cares about uptime.
I wish I had a good answer about PaaS. There are hosted Cloud Foundry services; I think CF is more sensible than Heroku, myself. The various serverless platforms are PaaS of a specific kind. I think Salesforce counts. Is OpenShift still a thing?
Even if all it does is suck data in, perform some computation, and send it off somewhere else - are you saying there are SAAS providers that are better equipped for this?
That and very preferential pricing.
I recently talked with someone who said their costs dropped in half from switching off a major cloud provider to OVH's dedicated servers. Performance went way up too.
For $200 / month you can get a machine with a high end Xeon 8 core (16 threads) CPU, 128 GB of memory, 4.5 TB of SSD space across multiple disks with unlimited incoming / outgoing traffic and anti-DDoS protection.
No reputable cloud provider is going to come close to that price with their listed prices.
Of course there are trade-offs, like not being able to click a button and get it within 2 minutes for most server types, but if you have consistent and predictable traffic, going with an over-provisioned dedicated server seems like a very viable option.
Because they can calculate TCO correctly.
So assume you're doing a 3-year or 5-year TCO -- you build in credits, discounts, and bundled options. Execs are looking for a 3-year apples-to-apples spend, and you make your TCO look fucking amazing. They see the low price and decent technical options and they bite.
3 years later those credits vanish and they're paying full OpEx costs. And after 3 years they're now invested -- stuck -- in their space/circuit/whatever. You can start raising the price or negotiating new contracts.
Same thing with the cloud, for that matter. Cut a glorious bulk deal with Microsoft for Azure space, and then after you've moved everything to MS they can start nickel-and-diming you, cuz the cost of moving that load to AWS or GCP isn't cheap, and you're not going out and buying more hardware and going back to colo, are you?
After all, the people who make the decision on the customer side are in the same trap: in 3-5 years, if not earlier, they won't be at the company anymore and it won't be their problem.
Our IT infra costs are 1/10th the cost of cloud, simply because I happen to be comfortable having on-premise machines and working on them (sometimes myself).
We have two dozen servers in two locations. They take more time to set up, but maintenance is actually quite low.
Unless you are really small or have variable workloads, the cloud is maybe not for you. Or unless the cost is a small part of the total cost of the platform, i.e. not really related to how many users/sales you have.
We could spend time making sure we use the smallest Aurora instances possible in AWS, or we could standardize on a version to support and use the smallest size of that version that's available. We spend a lot on this extra capacity in some places, but it greatly increases our speed in other places.
It's all about trade-offs and what your organization values.
More often than not, my fellow devs are happy throwing extra servers at a problem instead of tackling the problem :(
I'd rather not get put in that position. Pre-pandemic, business was pretty good, but that seems like an unwise assumption as we slide into a recession.
Maintenance burden that's avoided has two inflection points rather than one. In a large enough data center, there's enough work simply replacing hard drives that justifies having a team that manages it. From the other end, that's just not true if there's a small number of hard drives (like O(1)). There's a middle area though, where there's enough work for more people but the overhead of there being a team is too expensive.
The real cost is IT talent and time. I happen to have homelab experience with HP servers, so (for us) those costs are extremely low.
It's honestly not as complex or costly as cloud providers make it sound.
Replication offsite is easy via site-to-site VPN and Veeam, as is spinning up new VMs for dev or testing. The building already has good A/C and backup power, and hardware/drive failures are rare.
Are you set up with one or two racks in each location and maybe one full-time IT/sysadmin at each location?
So if you use a lot of bandwidth on top of it (which can be another 25-50k at AWS), this could reach levels where it's worth it to hire two devops guys to run your own, in combination with some SLA/management agreements with a colo.
I do hope you regularly patch and reboot your systems.
(Not picking on CentOS, just using it as a placeholder for $REQUIRED_OS_BECAUSE_OF_CUSTOM_SOFTWARE.)
Cloud? Automatically updating operating systems. On-prem? Must be stuck on an old version.
Microservices? You can release to production in minutes. Monolith? Must take you weeks to get a release out.
Go? Simple self-contained deployables. Java? Must need a JDK and Tomcat installed on the server.
It always seems to come up when the leading term is something dubious (cloud, microservices, Go), but the following term is something unequivocally good. Or maybe that's just when I notice it.
I've looked several times and found different technologies over time (kexec, (Oracle's) ksplice, kGraft, kpatch, livepatch). They do appear to have some use cases, e.g. delaying the need for a reboot by installing a critical vulnerability fix/workaround so that the reboot can be done at a more convenient time. Because many of the patch mechanisms are function-based, they don't appear to solve the general problem in such a way that reboots can be avoided altogether for arbitrarily large kernel changes. From my reading of the solutions, none are at the level of unattended upgrades using apt/yum-cron or similar, in a way that "most" can benefit from without worrying too much about it (ksplice might do it, but I'm not sure how much you need to pay for it for server use and therefore how accessible it is). kexec helps with skipping the bootloader/BIOS, but I'm not sure if it ends up restarting all the systemd services or going up/down the runlevels; some places suggest it reduces downtime but doesn't eliminate it. I've not experimented with any of these myself yet, so I'd be happy to be proven wrong and in any case learn more!
EDIT: forgot to mention livepatch
Awesome seeing you here! I used to be super active on Hypixel when I was in high school (I was actually #1 on the leaderboards for one of the games for over a year). One of my first large scale programming projects ever was creating a Hypixel mod for my guild. Years later I am now a software engineer working at Google on the YouTube algorithm!
Have lots of fond memories of those early years, especially Minecon 2013.
The downside is that this competes with school (thank god he is very good, so that's fine-ish).
The upside is that it is quarantine in France so I can manage this.
Thanks for all of that!
Everything is deployed with Ansible and monitored with Monit. The servers have redundant PSUs and hot-swappable SATA HDDs, so you can fix minor hardware issues without having to reboot. As a result, we have more than a year of uptime on a typical server.
On the other hand, we've been hit with several S3 outages. My feeling is that the cloud needs fully automatic failover because it fails much more often than all our bare metal servers combined.
As I said, we are considering it, but the analysis hasn't been done yet. If you have insight, please share.
It's also for speed to market... our infra setup where I work is extremely slow, and at times it takes over 1 month to set up a database, and about the same for a VM.
There is no way you need that many for maintaining a fleet of 50 CPUs (chances are that in OP's case they're dual-socket servers as well).
Or if you're just constantly iterating on a large product with many engineers. Those engineers' salaries almost always outweigh all of your cloud costs and so making them productive is cost effective. Things like SNS/SQS/S3/VPC/ELB/etc. save you countless hours and often make up for the increased cloud costs with increased developer productivity.
I think there is a divide here. Most people in this thread that mention cost as an issue are running some serious gear, and that does not come cheap. Your parent is running 70 servers plus some serious networking equipment, which is easily a couple million dollars. And he said that a cloud provider would 10x that cost.
If all you need is a little VM to run a website, cloud hosting is cheap. If you are running real infrastructure, a couple of hours of developer time will never outweigh the astronomical costs of cloud hosting.
If we'd started in the beginning (11 or 12 years ago) with our own infra, I imagine our recurring infra costs would be lower right now, but I also suspect that our 3-person founding team would have failed to produce an MVP before their initial time and money ran out.
If I were starting a company now, I'd probably do things in a more "platform agnostic" way such that a cloud->on-prem migration might be easier. But I still never expect it'd be easy.
That's the whole reason we haven't seen a competitive reduction in prices between cloud providers yet.
Over time, buying things yourself will always win out over cloud architecture, or the cloud business would be bankrupt.
Not saying this is what happens in practice.
AWS's margin is 20-25%. Is AWS's cost of goods sold really 25% lower?
Admittedly this calculation is tricky, because it's not clear what to include. Does that figure for AWS's margin include R&D? Should it?
I'd wager a guess that a properly-configured server won't actually cost that much to maintain. Unless you're frequently updating the OS and other stuff - but that's a software issue. I don't expect the hardware will fail that frequently at all. If there's a HDD failure, a good RAID system (with tolerance for 3 or 4 disks failing simultaneously) will alert you to it, and you'd grab the spare HDD (you should have a few spares), and pop it in, and the RAID would recover from the failure. What other hardware components frequently fail? RAM? CPU? Not really.
The engineer who sets up the server should be a generalist and a full-time employee who, when not tending to the server, works on other stuff (like the product). Then you have someone on staff who is familiar with the system and can fix something that goes wrong without needing to ramp up on it first. I know a lot of programmers would love to set up a server. Anyone who enjoys building PCs (which many generalist programmers do) would probably love to have a one-month project where they pick parts and set up a powerful server.
In my day job and my side projects, I’m 100% paying for AWS for 99% of our workloads. I choose to let Amazon deal with all the staff issues and all the other, to use their words, undifferentiated heavy lifting.
There’s some scale point at which you have to ask the question whether you ought to be in the cloud, but IMO if you think a cloud provider is only 50% more than you could DIY, you should probably be in the cloud.
Most people running serious gear have serious engineering payroll overhead.
Sure, some folks are managing something fairly static like game servers, but many of us work for bigco with two dozen products and massive teams.
Even with thousands of ec2 instances we continue to move our on-prem infrastructure to the cloud because developer productivity saves us big money in the long run.
I've heard stories of companies having each PR spin up something like 20 EC2 instances to stand up a whole deployment. A CI/CD design like that used to be a fireable offense. Now people see that and assume it's saving them money because it would have bottlenecked before.
20 t2.mediums cost under a dollar an hour at list rates, and if you're using a lot of AWS infrastructure it can cost a lot less.
How much is the confidence that that PR will not break master worth to you? How many hours of engineer code review time would it take to achieve that same level of confidence in your build that spinning up 20 ec2 instances and running regression tests wins? How much would a failed deployment into production cost you, per minute?
Before you assume that it’s wasteful for any organization to throw cloud at a problem, realize there are many orgs for which a few hundred dollars of cloud compute spend per production release is an entirely reasonable trade off, not a firing offense.
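For context on that claim, a sketch of the per-release math, where the t2.medium rate is an assumption based on public on-demand list pricing at the time:

    T2_MEDIUM_HOURLY = 0.0464  # assumed us-east-1 on-demand list rate, $/hr
    instances = 20
    env_hours = 1.0            # assumed lifetime of the ephemeral environment

    per_release = T2_MEDIUM_HOURLY * instances * env_hours
    print(f"~${per_release:.2f} per release at list rates")
    # -> ~$0.93/hr for all 20 instances, i.e. "under a dollar an hour"

The numbers change if each PR environment mirrors production-sized instances, which is exactly the objection raised below.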
That’s exactly the mental trap I’m talking about. Requiring 20 instances to try to mirror production instead of getting better testing in place is a sign of testing immaturity.
I have significantly less confidence in the products that use this testing strategy because it really means they don’t have much in place for testing infrastructure to stub things out, inject failures, etc.
And it’s not t2 mediums. It’s whatever is specced for production because we’re such good engineers that we want to test in a production like env, right?
> Before you assume that it’s wasteful for any organization to throw cloud at a problem, realize there are many orgs for which a few hundred dollars of cloud compute spend per production release
That’s not even close. It’s hundreds of dollars a day leading up to thousands per release with larger teams.
Though I did get an understanding of the basics of AWS out of it, it was probably not the best use of the shareholders' money.
 Especially if you're already in production with the thing you want to replace, and want to transition without downtime for your customers.
What a joke.
They think AWS will be cheaper. I hope they are right, but have doubts. Fortunately, they're doing this slowly and carefully, and if it turns out that AWS is too expensive, they should be able to move back or to elsewhere. Since what they're doing isn't even close to what AWS normally does, they're not that tied to AWS features.
Well that solves the age old question about trees falling in forests :-)
You could always fast-forward the current state when the first external observer arrives. But I get that there are time constraints: you can't just stop the real universe until it converges to a consensus state.
So if a tree falls in a forest and no one is around to hear it, it won't hit the sound code or the display code.
It's very weird that we haven't fully arbitraged $/instruction to a single (low) price yet (or storage/hosting, whichever).
If only there were an Uber for unused cycles or storage... let everyone turn their unused capacity into mini AWSs with a common interface and safety and reliability guarantees.
Maybe there are too many barriers, like security.
This is exactly what Sia.tech is building for storage. A competitive marketplace for storage providers where anyone can set up a host and start selling their spare storage space. It's not completely finished yet, but it's pretty close.
A bunch of companies are already building products on top of it (including me)
The team is currently working hard on performance upgrades. On regular consumer hardware you can currently upload / download data at a rate of about 500 Mbps. The next release is expected to improve this significantly.
Here's an introduction article which explains how Sia works with a bit more depth: https://support.sia.tech/article/dk91b0eibc-welcome-to-sia
My own product is a file sharing website: https://pixeldrain.com. It uses Sia to store large files because it's cheaper than conventional cloud storage. I plan to make it possible to download directly from Sia hosts as well so I can save on bandwidth costs too.
How does Sia prevent hosts from precomputing the checksums to fake that they are behaving while actually erasing the data? Does it checksum over random ranges of data?
Which source does it use for entropy so that the network remains distributed but nodes can't predict the ranges? Does it use the last block nonce?
Which checksum algorithm does it use? Is care taken as to not be vulnerable to prepend or append attacks from hosts who intend to host data partially whilst pretending they are hosting full data?
We do probabilistic proofs, so we have the host provide us a small random sampling of actual data (so the host can't rely on precomputing), plus a proof that this actual data is what the contract says the host should be storing.
See chapter 5: https://sia.tech/sia.pdf
When uploading data, the renter (that's what we call the node which pays for storage) computes a merkle tree of the data which the host should be storing. When a contract is nearing its end, the host enters a proof window of 144 blocks (1 full day) in which it needs to prove that it is storing the renter's data. The proof is probably based on the block hash of the block where the window started. The host stores the proof in the blockchain, and the renter will be able to see the transaction. If the proof matches the merkle tree (which the renter has stored), the contract ends and the host receives the payment and its collateral back. If the proof is invalid or was not submitted at all, the renter can cancel the contract, which destroys the funds in it. The host won't get paid and loses its collateral, but the renter also won't get their money back (to discourage the renter from playing foul).
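A toy sketch of that merkle-proof mechanism. This is not Sia's actual implementation (segment size, hash choice, and naming here are all illustrative); it just shows the core idea of proving one randomly challenged segment against a stored root:

    import hashlib
    import os

    def h(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_levels(leaves: list[bytes]) -> list[list[bytes]]:
        # Build every level of the tree, duplicating the last node on odd levels.
        levels = [[h(leaf) for leaf in leaves]]
        while len(levels[-1]) > 1:
            level = levels[-1][:]
            if len(level) % 2:
                level.append(level[-1])
            levels.append([h(level[i] + level[i + 1])
                           for i in range(0, len(level), 2)])
        return levels

    def prove(leaves: list[bytes], index: int) -> list[bytes]:
        # Collect sibling hashes from the challenged leaf up to the root.
        path = []
        for level in merkle_levels(leaves)[:-1]:
            if len(level) % 2:
                level = level + [level[-1]]
            path.append(level[index ^ 1])
            index //= 2
        return path

    def verify(root: bytes, leaf: bytes, index: int, path: list[bytes]) -> bool:
        node = h(leaf)
        for sibling in path:
            node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
            index //= 2
        return node == root

    # The renter keeps only the root; the host must later reveal a randomly
    # challenged segment plus its merkle path to get paid.
    segments = [os.urandom(64) for _ in range(8)]
    root = merkle_levels(segments)[-1][0]
    challenge = 5  # on Sia, derived from a block hash so hosts can't predict it
    assert verify(root, segments[challenge], challenge, prove(segments, challenge))

The point is that the host can't answer a fresh random challenge without actually holding the data, and precomputed checksums don't help because the challenged index isn't known in advance.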
There is some more info on this on the wiki: https://siawiki.tech/about/trustlessness and the website: https://sia.tech/technology. And here is some incomplete technical documentation: https://gitlab.com/NebulousLabs/Sia/-/blob/master/doc/Resour...
If you want to go more in-depth you can go on our Discord where lots of developers hang out, eager to help others to get started with the network :) https://discordapp.com/invite/sia
EDIT: The whitepaper is of course the best source of knowledge. It's quite old at this point but the core principles still apply https://sia.tech/sia.pdf
im sure you hear this a lot but... has anyone done a heads up comparison with filecoin?
Comparing with Filecoin is hard because there's not much information available about it. The rollout keeps getting delayed too. I know that the founder of Sia has criticized Filecoin's whitepaper a few times because it contains unsolved problems which could cause significant issues during the rollout of the network. Sia took a more conservative approach and worked out all the math before the development of the network started in 2015. Now, 5 years later, Sia has solved all the fundamental issues with the protocols and such and are working on upgrading the performance and building apps on top of Sia's core protocols. In terms of development Sia is about 3 years ahead of Filecoin.
I'll spread the word about Sia and Pixeldrain when the topic comes up :)
- how much data do you need to move to actually perform the computation
- whether the computation performance is expected to be reliable or if best-effort is acceptable
- whether there are confidentiality and accuracy requirements on the input and output of the computation.
Most software engineering teams are not working at a granular enough level to properly describe which parts of their computations and data are expected to be reliable or not (in availability, confidentiality and integrity). However this can impact the cost per instruction of a computation by multiple orders of magnitude.
Funny, this was our idea with our first startup 18 years ago: federation of unused storage and redistribution. There were no takers, as no one wanted to contribute their unused storage but everyone wanted others'. It was much easier to centrally locate our own storage and allocate it to users who required it.
I would also say the cloud is cheap if you can shut down or scale down services when you need to.
IMHO, if you own your software stack, you can use the cloud to rationalize your spending (for instance, Netflix changed their default codec some weeks ago, which saved them a lot of money in egress bandwidth)... but not everybody can do this.
If your company runs prepackaged software, like SAP or Manhattan, you don't have the margin to shut down services when they're not needed (at least for now).
Also, after a long time working in a datacenter, I am still amazed at how powerful bare metal has become: for less than 30K you can get a server with 1 TB of RAM (without storage) and at least 40 cores.
Some vendors are even offering hardware as opex in a pay-as-you-go setting to compete against the cloud.
(disclaimer I work for a Google Cloud partner)
Netflix's content serving is done from physical hardware they own and place in datacentres. I don't understand your point.
Isn't that how IBM ran their mainframe business 50 years ago? The more things change...
And because of this singular fact, startups cannot move to on-premise. You really don't want to manage snapshots, restores, and backups.
Anyone can manage application servers.
Slightly off-topic, but it sounds like a single machine can't handle more than 150 concurrent players. How is this so?
Is Hypixel Minecraft that resource intensive?
What's the bottleneck — it is the CPU, or RAM, or the network latency(?) per machine, or is it something else?
> We push about 4PB/mo in egress bandwidth
With 70k players, 4PB / 70k is approximately ~57 GB per player per month. That divided by the number of seconds in a month (2.628e+6) is: 57 x 10^9 / 2.628e+6 = ~21.7 kB.
Over a month, on average, a single player consumes a bandwidth of ~21.7 kB/sec, or about 174 kbps. This is an extraordinarily low per-player bandwidth consumption. At 150 players, each machine would average 26 Mbps of network traffic. This is fairly low as well. Of course, the machines have to be capable of handling possibly much higher peak usage, but even an order of magnitude more, 260 Mbps, is something a Raspberry Pi 4 (which supports full-throughput Gigabit Ethernet) can do.
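The same arithmetic as a compact sketch, using the figures from the comment above:

    egress_bytes = 4e15          # 4 PB/mo
    players = 70_000
    secs_per_month = 2.628e6

    per_player = egress_bytes / players           # ~57 GB/player/mo
    kbps = per_player / secs_per_month * 8 / 1e3  # average bitrate
    print(f"{per_player / 1e9:.0f} GB/player/mo, ~{kbps:.0f} kbps")
    # -> ~57 GB and ~174 kbps per player, as computed above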
You're pretty accurate with the per player bandwidth. We measure it at an average of 200kbps/online player. We are incredibly compute-intensive, though, and these machines are all operating with E3-1271v3 CPUs, 32GB of RAM, and ~100GB SSD.
I saw you mentioned in another thread that Minecraft performs better when the CPU has better single-threaded performance. I'm guessing, going forward, you'd probably want to build machines with Zen 2 AMD Ryzen CPUs (or future Zen 3 Ryzens), like the Threadripper 3960X (which has a base clock of 3.8 GHz, 24 physical CPU cores, and costs ~$1200, so $50 for each of those cores). Or the AMD Ryzen 9 3950X, which is about $40 per core (when you get it on sale) and, despite a lower nominal base clock, actually performs better on single-threaded benchmarks (e.g. https://www.cpubenchmark.net/singleThread.html).
I'd like to chat with you about this; we charge a flat rate for dedicated connections, about 95-97% less than AWS on egress, and are just starting to talk to people about it. It's how we picked up Zoom and 8x8 (bi-directional video has huge egress charges). Let me know if you are open to chatting; there wasn't an "exec team" link on your webpage to reach you directly.
There are many variations of this scheme but that's more or less the idea.
DigitalOcean gives 3TB bandwidth on a $15 2CPU/2GBMEM/60GBSSD instance. If you ran 1350 of them it'd cost you ~$20k/month and get you your 4PB egress within bandwidth allowance.
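Checking that arithmetic with the advertised plan numbers above:

    plan_cost = 15         # $/mo per droplet
    plan_transfer = 3      # TB of included egress per droplet
    target_tb = 4000       # 4 PB/mo

    droplets = -(-target_tb // plan_transfer)  # ceiling division
    print(f"{droplets} droplets, ~${droplets * plan_cost:,}/mo")
    # -> 1,334 droplets at ~$20,010/mo, in line with "~1350 for ~$20k"

Whether pooled transfer allowances actually work that way at that scale is a separate question, but the list-price math checks out.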
Your point does stand, though, that not all cloud providers have the bandwidth price gouging. I was mainly referring to GCP/AWS/Azure, who set the trend for the rest of the major providers.
DigitalOcean is a VPS provider, not a cloud provider, and that's pretty similar to pricing on the nearest comparable AWS service (Amazon LightSail), not something showing DO as notably better (1Core/2GBRAM/60GBSSD/3TB transfer @ $10/mo or 2Core/4GBRAM/80GBSSD/4TB transfer @ $20/mo.)
Though if you are optimizing price per TB of transfer quota, LightSail does best with the 1Core/1GB/40GBSSD/2TB transfer @ $5 instance size.
From the first paragraph in https://en.wikipedia.org/wiki/DigitalOcean
> DigitalOcean, Inc. is an American cloud infrastructure provider headquartered in New York City with data centers worldwide. DigitalOcean provides developers cloud services that help to deploy and scale applications that run simultaneously on multiple computers.
You might have your own cute definition of “cloud”, but it doesn’t match the industry so get over it and stop correcting people.
One of the challenges for us is, to be honest, we're rather spoiled having run exclusively off-the-shelf open-source tech on our own hardware. It's difficult to start paying per million DB queries, for example, when we've been paying a flat rate of $X/mo for the past 7 years. On top of that, our team is very comfortable operating and managing these services in-house, and while it would free them up to focus more on dev tasks, it's a tiny gain compared to the cost increase of moving to SaaS model.
Having said that, though, I'm definitely going to look into Outposts more since it seems much more usable than full-cloud, so thanks for bringing it to my attention!
(Interesting that there's no official word on the website, but that is what Wikipedia shows.)
I find this hard to square with the evidence that the Netflixes of the world still find it worthwhile to pay AWS. There's no way that they don't have the same problems you do. Care to speculate on why they prefer to pay a vendor? (It seems the Dropbox story is the exception that proves the rule.)