- Static frontend hosted on Netlify (free unlimited scale)
- Backend server on Google App Engine (connecting to Gcloud storage and managed DB via magic)
I realize I'm opening myself up to vendor lock-in and increased costs down the road (if I even get that far), but I've wrangled enough Docker/k8s/Ingress setups in the past to know it's just not worth the time and effort for a non-master.
In my experience, the issue isn't that Google will jack up the costs but that they'll deprecate their infrastructure and push the migration work onto you, often forcing you to reimplement major features.[0]
One notable example is how their NDB client library used to automatically handle memcache for you, but they got rid of that with Cloud NDB Library and forced clients to implement their own caching.
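For anyone who hasn't made that migration: with Cloud NDB you now wire up the cache yourself. A rough sketch, assuming the library's RedisCache helper and a Redis instance you operate (not code from the original comment):

```python
# Sketch only: wiring Cloud NDB's global cache to a Redis instance you run
# yourself, versus legacy NDB which used App Engine memcache automatically.
import redis
from google.cloud import ndb


class Visit(ndb.Model):
    path = ndb.StringProperty()
    count = ndb.IntegerProperty(default=0)


client = ndb.Client()
global_cache = ndb.RedisCache(redis.Redis(host="10.0.0.5", port=6379))

with client.context(global_cache=global_cache):
    visit = Visit(path="/home", count=1)
    visit.put()      # written to Datastore/Firestore
    visit.key.get()  # subsequent reads can be served from the Redis cache
```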
The sequence of datastore APIs I've seen during my experience with AppEngine is:
* Python DB Client Library for Datastore[1], deprecated in favor of...
* Python NDB Client Library[2], deprecated in favor of...
* Cloud NDB Library[3], still supported, but they ominously warn new apps to use...
If you're using the App Engine Flexible editions, it's really easy to not worry about vendor lock-in or really even deprecation much at all. E.g. it's easy to run a basic Node, Python or Java backend in App Engine Flexible, making use of a MySQL or Postgres DB in Cloud SQL, so you don't have to worry about managing servers at all and you get all the benefits of automatic scaling without the semi-nightmare of running your own Kubernetes cluster. Then even if App Engine totally went away, you'd just have a normal Node, Python or Java app running against a MySQL or Postgres DB that is pretty trivial to migrate to another platform.
I still use GCP, but I avoid locking myself into their proprietary infrastructure when I'm writing new stuff. I feel like Google is far too cavalier about deprecating services and forcing their customers to do migration work.
It is hard to replace GCP's managed datastores because I really don't want to maintain my own database server (even if it's a managed service that someone else upgrades for me). So I've stuck to Google Cloud Datastore / Firestore, but I've been experimenting a lot with Litestream[0], and I think that might be my go-to choice in the future instead of proprietary managed datastores.
Litestream continuously streams data from a SQLite database to an S3 backend. It means that you can design your app to use SQLite and then sync the database to any S3 provider. I designed a simple pastebin clone on top of Litestream, and I use it in production for my open source KVM over IP. It's worked great so far, though I'm admittedly putting a pretty gentle workload on it (a handful of requests per day).
You don’t want to maintain your own database server, even managed by GCP, but with SQLite you have to maintain state on GCP Persistent Disks and backups to S3 using Litestream. Why do you think this is easier?
I don't have to maintain state on GCP persistent disks. I can blow away a server without warning, and I'll only lose a few seconds of data.
True, I have to maintain state on S3, but there's not much work involved in that.
If I was maintaining my own database server, I have to manage upgrades, backups, and the complexity of running an additional server. With Litestream, I don't have to manage upgrades because nothing bad happens if I don't upgrade, whereas there are security risks running an unpatched MySQL/Postgres server in production. Litestream has built-in snapshots and can replicate to multiple S3 backends, so I'm not too worried about backups. And there's no server to maintain.
What operational complexity do you see in Litestream?
SQLite is really great. By using it, you don't have to install and maintain another service, and you don't have to think about things like network security. From that point of view, that's clearly simpler.
But it also introduces a few challenges. It's not as easy to connect to your database remotely to inspect it, the way you can with something like Sequel Pro for MySQL. It's not possible to create an index or drop a column without blocking all writes, which can be annoying if your database is large. Database migrations in general are harder with SQLite because ALTER TABLE is limited. [1]
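For example, a change that SQLite's ALTER TABLE can't express ends up as the classic table-rebuild dance, roughly like this sketch (table and column names are made up):

```python
# Sketch of the classic SQLite "rebuild the table" migration, needed when
# ALTER TABLE can't express the change (table/column names are made up).
import sqlite3

conn = sqlite3.connect("app.db")
with conn:  # one transaction; SQLite allows a single writer, so other writes wait
    conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
    conn.execute("INSERT INTO users_new (id, email) SELECT id, email FROM users")
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")
    conn.execute("CREATE INDEX idx_users_email ON users (email)")
conn.close()
```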
One last thing regarding losing the few seconds of data. If you use something like Google Cloud Regional Persistent Disk, then your data are replicated synchronously in two different data centers, which means you can lose your server, restart another one, and not lose any data. Can still be combined with Litestream for backup to S3 with point-in-time restores.
yeah, this is the saner approach. Just use Google's replication/durability, and export to S3 when you want/need to change vendors. In this case, you wouldn't even need Litestream. Just SQLite.
If you can lose the last few seconds then yes that's fine. But for most applications I've been working on, we didn't have that flexibility (committed means durable).
I don't see any operational complexity with Litestream.io. I think that's an awesome tool. But it's not that different from managing PostgreSQL backups with something like WAL-E.
The complexity of managing your own database server only exists if you don't use a managed service. Then there is no server to maintain and they do all the things you mentioned for you.
I agree with you in terms of using what you already know best.
> If you're not already familiar with these tools consider using a managed platform first, for example Render or DigitalOcean's App Platform (not affiliated, just heard great things about both). They will help you focus on your product, and still gain many of the benefits I talk about here.
And:
> I use Kubernetes on AWS, but don’t fall into the trap of thinking you need this. I learned these tools over several years mentored by a very patient team. I'm productive because this is what I know best, and I can focus on shipping stuff instead. Your mileage may vary.
I actually spend very little time on infrastructure after the initial setup (a week of part time work, since then a couple of hours per month tops).
For comparison, this post describing what I did took nearly a month of on-and-off work. But I might just be slow at writing :)
Makes sense, didn't mean my comment as a criticism of your setup Anthony. The product and infra look very cool! Just highlighting that things can be a lot simpler for those of us with more mundane requirements.
cloud vendor lock-in fears are overblown. pricing and features will always be competitive between the big vendors. I suspect people waste a lot of time/money trying to be cloud agnostic.
Real vendor lock-in is when you have decades of code written against an Oracle DB and you're getting charged outrageous Oracle rates and it would also cost a fortune to migrate.
Real cloud vendor lock-in is when you have decades of code written against a [cloud vendor] and you're getting charged outrageous [cloud] rates and it would also cost a fortune to migrate.
A decade has to pass first. Most startups don't last 5 years. Statistically speaking he's right, and if he's not, well, a project that lasted 10 years ought to be profitable, so pay up. Not profitable? Then who cares that cloud lock-in broke the camel's back. If it wasn't profitable enough to justify the investment needed to switch to another vendor then it wasn't profitable enough to begin with.
The thing I've learned is that a lot of people have both a vested interest and a sort of Stockholm syndrome with vendors (cloud or otherwise). If you spent tons of time learning AWS's special tooling, you are going to see everything as a nail, if you catch my drift. I've seen a few particular users here spend many threads defending their choices despite the often very logical criticisms levied against the "cloud everything" approach.
One thing I like to talk about with C-levels is their strategy on capex vs opex, because honestly that determines quite a lot, but it's often something engineers don't think about.
The ultimate “vendor independence” is racking your own servers in your own on-prem data centre with multiple internet connections. Very high capex, potentially low opex depending on scale. In the middle would be racking your own servers at multiple DCs. Less capex (you’re still buying servers, but not air handlers and power distribution), higher monthly opex. On the other end are things like GCP and AWS, where you have virtually no capex but relatively high opex.
And in the end, it really depends on how much you trust different vendors and how you want to manage cash flows. Racking your own servers reduces some risks (Google deciding to terminate your account on a whim, Azure pushing wild updates, Amazon jacking prices wildly) while increasing other risks (only your own staff are watching your hardware).
You are painting an incomplete picture. Between the high (racking your own servers at multiple DCs) and very-high (your own DCs) CapEx options and the low CapEx options (IaaS and PaaS), there is a middle ground - unless you need specific managed services, the larger PaaS ecosystem and/or extreme scalability - which is to use bare-metal cloud providers. This approach combines multiple benefits, including bare metal's maximum performance, full isolation / no "noisy neighbors", pretty much total control of the equipment that you rent, cloud-like elasticity, a flexible and usually globally distributed network architecture, and reasonable pricing.
Yes. This becomes clear when the cloud costs rise to be the largest burn in your budget and the runway keeps getting shorter and you can't migrate away because your code has tendrils deep into every AWS crevice...
Any company after a decade is going to have growing pains.
Spend your early time working on your core business. If your core business isn't cloud agnosticism then you shouldn't be investing your resources there.
Vendor lock-in depends heavily on exactly what vendor you're using, and especially on whether it's an OSS API hosted by the vendor or a proprietary vendor API.
If you use something like AppEngine to run a Flask or Django app, you will not be locked in much because those are open source libraries with well known runtime options elsewhere.
Same to some extent with any sort of managed OSS database.
If you use something like Cloud Datastore or Firestore or DynamoDB, you are using a proprietary API and will have to rewrite all your client calls, or write an extensive shim, and probably significantly re-architect to port.
Even in the “hosted OSS” option there are usually some vendor specific stuff but it can vary a lot. Something like AppEngine specifically used to be an absurd amount of API lock-in but has evolved over the years to be more of a general container runtime.
Cost involved really depends upon how you did it and the differences between what you're migrating to/from.
If all database access is compartmentalized and the two datastores are fairly similar then it can be pretty cheap. If you didn't compartmentalize it will be expensive. If their characteristics are different enough then your compartmentalization will probably fall down in some cases and it will probably be expensive, although not as expensive if it weren't compartmentalized.
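To make "compartmentalized" concrete, here's a rough sketch (names are illustrative, not from any of the codebases discussed): the rest of the app only talks to a small interface, and only the adapters know which datastore sits behind it.

```python
# Sketch of compartmentalized datastore access: the app only sees UserStore;
# swapping Firestore for Postgres means writing one new adapter instead of
# touching every call site. Names and schemas are illustrative.
from abc import ABC, abstractmethod
from typing import Optional


class UserStore(ABC):
    @abstractmethod
    def get(self, user_id: str) -> Optional[dict]: ...

    @abstractmethod
    def save(self, user: dict) -> None: ...


class FirestoreUserStore(UserStore):
    def __init__(self, client):  # e.g. google.cloud.firestore.Client()
        self.client = client

    def get(self, user_id):
        doc = self.client.collection("users").document(user_id).get()
        return doc.to_dict() if doc.exists else None

    def save(self, user):
        self.client.collection("users").document(user["id"]).set(user)


class PostgresUserStore(UserStore):
    def __init__(self, conn):  # e.g. a psycopg2 connection
        self.conn = conn

    def get(self, user_id):
        with self.conn.cursor() as cur:
            cur.execute("SELECT id, email FROM users WHERE id = %s", (user_id,))
            row = cur.fetchone()
        return {"id": row[0], "email": row[1]} if row else None

    def save(self, user):
        with self.conn, self.conn.cursor() as cur:
            cur.execute(
                "INSERT INTO users (id, email) VALUES (%s, %s) "
                "ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email",
                (user["id"], user["email"]),
            )
```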
I love this post. I'm a big believer that one and two man startups will continue to build more and more impressive products. My one man startup 42papers.com (A community for top trending papers in CS/DL/ML) has the following stack.
1. Firebase Hosting for the React frontend
2. GraphJin (Automatic GraphQL to SQL Engine) on App Engine for the backend
3. Cloud SQL Postgres for DB
Another way to do something similar would be to use Cloud Run (https://cloud.google.com/run). That way you can avoid vendor lock-in, since you can move your manifests to another Knative hosting provider or spin up your own K8s cluster and deploy Knative yourself.
Interesting, thanks. I used to use Google AppEngine a lot and very much liked it, but haven't touched it for years. Now I like the idea of using Heroku better, and just paying a little more.
Heroku feels cheaper to me when you think about how long you can punt on hiring a proper ops person (or people) and how much time you save by not rolling your own everything.
My experience of Heroku has mostly been the pain of migrating to a different platform once you grow to the point that their pricing (and abstraction) starts to act against your growth.
Heroku is great for general applications, but if you're trying to do something that isn't a standard CRUD app, it can really start to bite you in the arse.
Their DB pricing in particular is incredibly inflexible compared to AWS RDS. Among the issues we had with Heroku at my old job was a DB that was hitting its storage limits but was miles away from hitting its memory or connection limits. There was no option but to upgrade to the next tier, with additional memory etc., even though all we needed was additional disk.
That's not to say that Heroku is bad, but like any tool, you need to be aware of the long-term costs that are often associated with short-term convenience.
I used them both in the same time period. I liked GAE because it was basically free to use for low use web apps, but has scalability built in. I liked Heroku because it was just so easy to develop and deploy with.
If you haven't checked out App Engine in a while, you really should. Especially check out the App Engine "Flexible" editions, which make it really easy to run on App Engine withOUT getting locked in.
I run a NodeJS GraphQL server in App Engine Flexible, and it is basically just like running it in a Docker container. It's also pretty trivial to run in Google Cloud Run if I so desired, there is even a tool to assist: https://github.com/GoogleCloudPlatform/app-engine-cloud-run-...
If you're just now looking in to GAE, you should likely be using Cloud Run instead. My company is busily migrating everything there and reaping the benefits.
Converting (it's more of a conversion than a migration) from flexible GAE to cloud run is super easy, check out the conversion tool I posted in my previous comment.
Basically, your code shouldn't really need to change at all, it's really just your deployment scripts and configs that need to be updated. At their heart flexible GAE and cloud run are both just running Docker containers.
GAE Flex is super old at this point and I've never personally met someone who migrated between the two (they're pretty different offerings imho). Moving from either GAE flavor to Cloud Run has been pretty seamless though.
Agreed, would have gone with their managed app platform if I was using one of the supported techs. For search I use a $5/mo meilisearch DO droplet that took almost no time to set up and I never have to pay attention to.
Price and functionality. It’s incredibly easy to use, unlike AWS and Google Cloud. The downfall is that you have a bit less control, but that’s never been an issue for me. Their servers have been incredibly reliable, they offer managed databases now, load balancer, S3 compatible Spaces. Everything I’ve needed so far, predictable and affordable pricing, and none of the complexity.
App engine (and Google's cloud in general) is pretty fantastic. I find it much easier to navigate and use than AWS (as someone whose day job isn't running infra on clouds), and I would have gladly put my side projects in there and recommended it to my clients... if only it wasn't Google and its history of randomly locking people out of their Google account, thus the entire Google ecosystem, without appeal.
Some would argue that identity management is the real lock-in anyway, and while a business may mostly be abstracted from their cloud via Kubes, any internal IT systems may be such a kludge that moving away is an absolute nightmare.
First of all nothing important, mostly stuff that's a distraction unless it becomes a need.
That said, using a static frontend cached on a CDN in general improves initial pageload and cuts down on traffic to your server by a lot. Netlify makes this easy if you want to use React on the client (with NextJS).
With AppEngine you get direct access in one console to all the bells and whistles of Google Cloud, basically the same as the other infra giants. AWS has even more bells and whistles but I find its console more annoying.
You can always add Cloudflare to the mix to cache static assets. This change is additive, meaning you can start with a single Heroku deployment and, if static asset traffic becomes an issue, create a Cloudflare account, configure DNS, and be done.
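For example (a sketch, assuming a Flask origin; Cloudflare simply honors whatever Cache-Control headers the origin sends):

```python
# Sketch: have the origin (a Flask app here) send long-lived Cache-Control
# headers on static assets so a CDN like Cloudflare can serve them from its
# edge caches, while HTML stays revalidated on every request.
from flask import Flask, request

app = Flask(__name__)


@app.route("/")
def index():
    return "<html>hello</html>"


@app.after_request
def add_cache_headers(response):
    if request.path.startswith("/static/"):
        # Fingerprinted assets can be cached essentially forever.
        response.headers["Cache-Control"] = "public, max-age=31536000, immutable"
    else:
        response.headers["Cache-Control"] = "no-cache"
    return response
```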
Well if you're deploying a static site they are the same, but that's still not the whole picture. They have support for lambda style "serverless" functions and Fauna DB[1], and can bundle functions with apps automatically for some tools like Next.js to do server side rendering for dynamic routes[2]. So while they don't support quite the same level of custom stacks, backends and DBs, they do provide tools that enable full stack applications.
That's right, I'm exaggerating. At current rates I'll hit that limit at 7.5MM pageviews/month.
I've also paid for extra builds once or twice in the past (automatically charges a few dollars when you cross the build time limit), and I pay them $9/mo for analytics.
Are you happy with their analytics? I have no experience with website analytics but I find their offering a bit too minimalistic. I wish for the following features:
- Break down page views into unique visitors for all views (per site, per country, etc.), or some other comparison between those.
Agreed, they're extremely mediocre, but worth $9 to me. Seems like they have better analytics available at a "custom" price, which I assume would be quite expensive. For my use case, minimal analytics at a minimal price works fine.
It would be if it were interpreted as an actual Roman numeral. But in this case it's treated as M x M.
The actual Roman version of a million is an M with a bar over it, where the bar means x1000. But that's not an ordinary character, so wouldn't work for this purpose.
I oversimplified a bit. I have a low-traffic "admin" interface that's rendered server-side. The people using that are my direct customers and are the only authenticated users (they auth in a traditional in-app way).
I also have a high(er)-traffic frontend on a CDN which is used by their customers. User writes there are purchases/payments handled by third(fourth?)-party SaaS.
Many sites have low write:read ratios and don’t leverage that fact in their architectural choices. Availability for maintainers is often less critical than for consumers, and your life is better if you build that in.
My current employers still haven’t learned this lesson and think caching fixes everything.
Tons of reasons, but the main one is that cache is shared mutable state, pretending not to be. It has all of the ugly attributes of global variables, especially where knowledge transfer and reliability are concerned.
In a read-mostly environment you can often more easily afford to update the state all at once. It's clear what the effects are because they happen sequentially. The cost of an update isn't fanned out and obscured across the codebase, where you or your team can delude yourselves about the true system cost of a suspect feature.
I agree that caching is mostly a bandaid fix. But IMO if it's used judiciously -- namely in response to a demand for a quick fix of a performance problem -- it can be OK mid-term.
As for shared mutable state, yes, that's true, but what are the alternatives? Whether it's memcached or Redis or an in-process cache (like Erlang/Elixir have), the tradeoffs seem mostly the same.
> namely in response of a demand for a quick fix of a performance problem
Caches are addictive. The first one is 'free' (easy) and people start wanting to use that solution for all their problems, especially social problems (we can't convince team A to get their average response time to match our SLA, so we'll just cache them to 'fix' it)
They defer thinking about architectural problems until later, when they are so opaque that "nobody could blame you" for having trouble sorting them out. But I do. Blame them, that is.
I'm almost inclined to believe that the relationship is inverted from what many assume.
Amazon will bend over backwards to accommodate a company spending $500 mil a year on hosting (apparently what Snap spends). Sure, it's only a fraction of Amazon's overall revenue (~$386 billion), but half a billion is half a billion.
Google Cloud SQL. I say magic because locally I need a service key to connect to the proxy, but the production app doesn't seem to need anything but the internal google address.
The service credentials are supplied via an env variable that points to their location. Locally, you can provide the location directly or set the env variable yourself. When deployed, most GCP service environments supply default credentials automatically, so you don't have to think about it and it feels a bit like magic. Same thing underneath the hood though.
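Roughly how that plays out with Application Default Credentials (the key path below is a placeholder for local development):

```python
# Sketch of Application Default Credentials: google.auth.default() checks
# GOOGLE_APPLICATION_CREDENTIALS first, then falls back to the credentials
# the GCP runtime provides (e.g. the App Engine service account).
import google.auth
from google.cloud import storage

# Locally: export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# before running; when deployed to GCP, no env var is needed.

credentials, project = google.auth.default()
client = storage.Client(credentials=credentials, project=project)
print([bucket.name for bucket in client.list_buckets()])
```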
Anthony - if you're reading this, thank you!! To arrive at this architecture takes 100s if not 1000s of hours, and to share it with the community is dang inspiring.
I was feeling a bit down on my projects, but this has me amped up seeing how the ultimate goal of working on features rather than deployment is possible, and very real!
I really enjoyed your post too! I would be interested in more details around the "100s of hours". I want to try a k8s setup like yours, but after investing those 100s of hours into my Flask setup it's hard to justify spending that time again for something else when this already works.
Also interested in the costs for your setup. My costs are in my other comment [1].
You should be able to run a Flask app pretty easily in kube. Basically you would build a Docker image containing the app, then deploy it with k8s, I believe.
If they have the drive to create an entire SaaS app, how is following a few tutorials on deploying it to a container in k8s too difficult? It only takes 20-30 minutes to set up, and there are hundreds of videos and step-by-step walkthroughs that hold their hand through it start to finish. Maybe I am overestimating how difficult it is to build an app in Flask then.
Building, deploying and getting something that works fine isn't that complicated, but in my experience, without a strong background in the tech (the 100s of hours required), you will lose a significant amount of time, compounded by a high amount of stress and probably money / customer dissatisfaction, when a problem arises (even a trivial one), and that always happens.
Maybe helpful to you or others: I have a similar startup architecture, Django apps on K8s, and found that AWS Fargate abstracted a lot of the madness away. Still not a walk in the park, but it does a lot of crap for you. It's been about $300/month serving light traffic.
The author almost seems to apologize for having a django monolith.
But it's worth realising that one purpose of code organisation in larger companies is to mirror the team organisation. That's a constraint on code that can interfere with the best technical architecture.
You can do better with a monolith in a one-man team!
> one purpose of code organisation in larger companies is to mirror the team organisation.
That's one of the weirdest reasonings I've ever heard. What happens when you have to downsize that team? But yeah, shoehorn each individual contributor into, say, a single microservice out of a hundred and you'll wonder why your software doesn't develop fast - everyone's too tired from trying to understand what each abstraction meaningfully does, so they spend less time understanding how the pipeline works and how to integrate into it.
> one purpose of code organisation in larger companies is to mirror the team organisation.
Sometimes, the organisational structure drives the code structure (Conway's law [0]). I've seen real world consequences of this, where the disconnected system stovepipes in a large organisation reflected the team structure of the organisation's purchasing function. The purchasing teams didn't speak to each other, so neither did the systems they purchased. The systems had separate support contracts, incompatible upgrades, and each one was a wholly distinct integration target, if you were a third party.
How do you start learning this breadth of software engineering? I consider myself good in the python / django space, but where do I start with learning these infrastructure technologies? I find that I use them once or twice periodically, and then don't touch them for so long, so I forget much of what I have learned.
When you are working on a project, if you hit the edge of your current knowledge / skills, push just a little bit further when it’s something that interests you instead of just aiming to hit the basic requirements / lean on other people.
This minor effort compounds over time; do it for twenty years and you’ll be an expert in multiple disciplines and also an expert in how to tie them all together into one cohesive whole. Aim to be a “T-shaped” person, and just expand over time.
At least in my experience I can read about different architectures all day and sort of understand them, but I only really "get" it once I find a non-toy problem I need to solve and attempt to apply the knowledge. Then you see how it really works and form hard skills which stay with you.
Absolutely this. Wanna learn French? Go to France and live there for a year. Wanna be good at spinning up infrastructure with Terraform? Take an infrastructure job at a company that uses terraform - a start-up if possible, so you get to solve all of the problems. I wanted to learn Terraform and Kubernetes for years, and no amount of books or online courses really helped. Taking a job at a start-up fixed it. In fact, our stack is spookily similar to the one OP posted, Which is validation and also admiration because this person did it solo.
I find it is best to do the tutorials. The really basic ones. First one tutorial or article, then another and another. Don't get distracted by using it on your own project yet. Do more and more tutorials. Read the docs. Not just the getting started guide. Read the docs for like 2 days. Then get a book and read that.
It may also be helpful to share some details on effective ways to be curious. I’m a curious person too, but in the early days I just didn’t know where to start.
My advice:
- There is no defined learning path yet (to my knowledge).
- Start by reading the GitHub readme of technology in these articles (ex: nginx or kubernetes).
- If interested, try to spin up a tutorial app.
- Try to make something useful. Maybe this is a spin on a tutorial, or something novel. This is the hardest but best way to learn.
Finally, I’ll add that many folks learn these skills on the job either directly or having worked in proximity to new tech. It does seem this was how the author learned.
Hope that helps! I’m sure others will have great advice, too!
Document everything in excruciating detail - I go so far as to record all the commands I run; and when complete I destroy the machine and start again (or use a separate system) and verify that I accurately recorded every step.
You can add additional text about why you did certain things - and then store the data in a wiki or checked into git or similar so you can find it when you need it.
That's all great advice. What I find though is I don't do it enough for it not to change under me. Example:
1. Did a project on digital ocean, just ubuntu and node
2. Year later, Did a project using Meteor, spent way too much time trying to get it all installed with Vagrant (so all info from 1 was not useful)
3. Year later, Changed meteor setup to use docker ... so had to learn docker (so all info from 2 was not useful)
4. 2 Years later, Tried to do something with AWS lambda (so all info from 3 was not useful)
5. 1 Year later, Tried something with Apollo (so info from 4 was not useful)
And to be honest, none of the projects' various needs are all that different. I feel like one "good" solution could have, should have, should now exist ... but I haven't found it.
I guess I kind of feel like people who learned Rails back in the day found it met all their needs and they were able to do 50 projects on it. What is that thing today that if I learn today won't be out of date in 1-2yrs?
Not OP, but I've been keeping a "journal" repo for the last 4 months. There's a single dev.md file in there, where I separate each entry with "---". Whenever I encounter an issue or learn something new, I document it for later reference.
It honestly depends on what it is - sometimes in the repository, for a wiki it’s stored in the wiki itself, otherwise it might be as simple as a text file in a web directory.
What has helped me most over the years is working in smaller companies, where you necessarily need to take on more responsibilities.
My first job ~2005 was at a small shop with like 4-5 people and around 20 physical servers under our control and the same amount on-premise with clients (a mix of Windows Servers, Linux distros and BSDs). We did have a sysadmin person, but he was only responsible for the servers themselves and the base configuration. Everything application-related running on them was our responsibility as developers.
And after that, in the following jobs and as a freelancer, there was a wide variety of things I had to ramp up on quickly. Different build processes, application monitoring, backups, different cloud providers, hidden costs, etc.
Also I have been keeping a "Today I Learned" journal, where I just put small comments and snippets. It is hardly ever any deep insight, but for the most part "to do x in framework y solution z worked". It is also mostly a write-only journal. Just writing things down helps a lot with memory.
The most natural way is to join a startup that is scaling. You can of course learn by doing it yourself on the side, but in practice "learning by doing" on the job is by far the most effective in my experience.
I also hope we don't need to know all this stuff in the future. It's really pretty low-level, and it's much better if we can focus more on creating differentiation and building the actual product. (Full disclosure: I've founded a startup that's trying to do exactly that, so I guess I'm biased!)
Addressing the implicit "am I doing career wrong?": there's folks that find working broadly across the whole stack really compelling (I am one of them) that will visibly have exposure to a _lot_ of stuff. Those folks (I am again one of them) are likely looking at your work and worriedly feeling as though they are Doing Career Wrong because they don't have your depth. Apologies if I'm merely projecting; my hope is that this is supportive.
For me the magic trick has always been side-projects. I have a lot of them, and each one is an opportunity to learn new tricks.
(Over time I've learned that it's best to avoid side-projects which have user accounts and store data on behalf of other people, because that's not a side-project: it's an unpaid job.)
Work somewhere where you're the big fish in a small pond. You're forced to wear a bunch of different hats and learn multiple tools across multiple business functions.
Definitely. There's a sweet spot, where you're the go-to guy, and you leverage that for work-life balance (because no one else can do what you do), but it also means you're leaned on a lot to move things along. It definitely requires some fortitude and ability to manage time and expectations (and stick up for yourself).
For me, it's mostly learning by doing. At my day job as well as with my hobby projects. I initially started learning about the containerization and Kubernetes stuff mostly out of interest before I realized that there is a massive benefit to them, even during the development phase. I guess that's the reason why they are parts of most DevOps toolkits.
While hobby projects can be a great start, the best way to learn is in a team of experienced coworkers. The basic concepts of something like Kubernetes are very easy to grasp, leading people to believe Kubernetes is easy and completely missing the giant complexity the system introduces (that's why many people on here say it's overkill for 99% of projects, which I tend to agree with). Even with seemingly simple things like Docker, there is a massive amount of depth that's in my experience very hard to find in blog articles or YouTube tutorials.
That being said, even if you have the chance to learn about such things from your coworkers by applying them at your day job, I think the best choice is still to have hobby/testing projects and combine the learning-by-doing aspect with some good books. I also recently learned about two YouTube channels that do a pretty good job of explaining such tools and applying them to the real world in a beginner-friendly way. [1][2]
> I find that I use them once or twice periodically, and then don't touch them for so long, so I forget much of what I have learned.
This is why I "write". I started a decade ago capturing short notes for myself about the technologies I use. Writing it down helps me remember it in two ways. First, the act of writing (primarily by pen) is proven to increase your memory of a thing. Second, I can open my notes for step-by-step reminders.
You don't have to blog publicly. Check out the Zettelkasten method if you want to use index cards. Keep a set of Markdown files in a private repo. Whatever floats your boat.
If you keep notes in a notebook, I found that labeling mine as "Stray Thoughts" was one of the best things for me. That prevents me from moving away from that notebook trying to categorize my thoughts. If they are just stray thoughts, I can put any random thought in that same notebook. The same thing works in a set of text files or a Zettel.
Nothing like bringing down the entire production cluster and all services with it on a Friday afternoon due to a seemingly innocent "hotfix". Big learnings at the time, but now these make for good stories.
I learned most of these tools at my day job through some catastrophic failures. From my experience, failure has always been the best teacher.
Kubernetes just happens to be a great sandbox for failing hard :) Lots of stories here: https://k8s.af/
However, I wouldn't reach for tools that didn't solve a problem I truly have, be it cost-effective scaling (my day job), or reusing what I already know best even if unconventional (my SaaS).
I guess what I'm trying to say is: focus on solving your immediate problems first with the tools you already know. Your toolbelt will expand without you realizing it.
I don't have a complete answer, but so far documenting things as I go about doing them helps, especially if I write down what I tried, what went wrong, what worked and why. It's a lot while starting off, but over time, as the concepts sink in and become habits, my docs move to higher abstractions automatically and then it is mostly clear. The key words for me are 'train of thought'. The solution (the how) is important obviously, and always useful for quick reference, but when making bigger changes it is more important to remember the why.
It is hella time consuming, needs dedication and practice and good tools (I couldn't start without org-mode myself)
Work at a company that does DevOps well; this is a pretty common deployment pattern because it’s easy to test and gives a lot of flexibility. Many of the things OP is describing are only things you learn in a role with a production support component, which is where you find the tiny details like your health checks were inadequate and driving error rate spikes in certain scenarios without a specific config option, etc. Many of them are things that only pop up when you have some scale to deal with over a period of time, which makes it hard for hobbyists to pick up if they’re not on a team with prod support responsibilities.
Give yourself an ambitious pet project that requires you to learn and practice new things. Add to it over time so you continue to revisit the project with new requirements. And read all the docs, not just the parts you need to know.
My strategy over the years has been to build a whole lot of otherwise useless side projects with incrementally different stacks, optimizing for doing it well with different technology instead of cobbling them together quickly.
My one-person SaaS architecture with over 250k users:
* Flask + Flask-Login + Flask-SQLAlchemy [1]
* uWSGI app servers [2]
* Nginx web servers [3]
* Dramatiq/Celery with RabbitMQ for background tasks
* Combination of Postgres, S3, and DigitalOcean Spaces for storing customer data [4]
* SSDB (disk-based Redis) for caching, global locks, rate limiting, queues and counters used in application logic, etc. (see the sketch below) [5]
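A rough sketch of how that SSDB usage looks, relying on its Redis-protocol compatibility (host and keys below are placeholders; exact command coverage depends on the SSDB version):

```python
# Sketch: SSDB speaks the Redis wire protocol for many common commands, so a
# stock redis-py client can be pointed at it (8888 is SSDB's default port).
import redis

ssdb = redis.Redis(host="10.0.0.10", port=8888)

# caching with a TTL
ssdb.setex("cache:user:42", 300, '{"plan": "pro"}')

# application counters
pageviews = ssdb.incr("counter:pageviews")

# a naive global lock: setnx + expire (illustrative only, not a robust lock)
if ssdb.setnx("lock:nightly-report", "1"):
    ssdb.expire("lock:nightly-report", 60)
    try:
        pass  # ... do the exclusive work ...
    finally:
        ssdb.delete("lock:nightly-report")
```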
I like how OP shows the service providers he uses, and why he decides not to self-host those parts of his infra. Also, there's a large up front cost involved for any stack (Rails, Django, k8s). I'd be interested in a more detailed writeup with configs, to try out OP's auto-scaling setup. My configs are linked in the gist below [2] for my non-auto-scaling Flask setup.
I spend about $4,000/mo on infra costs. S3 is $400/mo, Mailgun $600/mo, and DigitalOcean is $3,000/mo. Our scale/server load might be different, but I'm still interested in what the costs would be with your setup.
I'd argue that just about every infrastructure that looks like this benefits from Kubernetes (that you're not setting up and managing), and that's a lot of them. The biggest problem is that not enough people have boiled down Kubernetes enough to look like heroku yet. Google Cloud Run is possibly the best example of what Kubernetes can look like/run like -- it runs on (probably a relatively heavily modified) KNative, a project that runs on top of kubernetes.
The "point" of Kubernetes is to drop the difficulty of building a service like Cloud Run to zero. It drops the cost of building a Heroku down to zero. I'd bet my bottom dollar that fly.io and render are running on Kubernetes (maybe they mentioned it somewhere already and I just missed it). With the right cluster set up, building one of those platforms (or others that I won't mention) is almost as simple as setting up stripe checkout and writing a web interface to turn form fields into JSON fields and send them to a kubernetes cluster (possibly with hard multi-tenancy! not to get too into it, but you can literally provision kubernetes clusters from kubernetes clusters, ephemeral or otherwise).
No other tool in the devops world except for maybe the initial orchestrator wave (ansible/puppet/salt/chef) has been this much of a force multiplier. Ok, maybe that's hyperbole, but if adhoc-bash-scripts->ansible is 1->2, Ansible->Kubernetes is similarly 1->2, especially if you consider baked in cloud provider support/innovation.
But here's the secret -- perversely, I'm happy deep down that everyone thinks k8s is too complicated/is a buzzword/isn't worth the effort. All it means to me is that I'm still ahead of the curve.
I think time will tell that ANY cluster setup is a high risk for smaller businesses. The amount of knowledge needed to run it in production is much higher than to set it up.
I have set up Kubernetes but never run it myself in production. But I work with a Hashicorp-equivalent setup with Docker, Nomad and Consul. I also have several Service Fabric clusters. I think it all is just a complete waste of money. Buying services/metal in the cloud or going serverless or whatever is cheaper and with much lower risks for most smaller businesses.
Ah sorry I think you may have misread the claim. K8s is my secret weapon, it’s useful for most teams when you don’t set it up (managed offering).
It really depends on what you do with that cluster; if all you do is run deployments with services and ingress (the equivalent of ECS + ELB), it's easier than doing the Terraform thing IMO. It's certainly easier than CloudFormation and building AMIs.
I completely agree that buying metal in the cloud is cheaper (that’s part of my secret, shhhh).
I disagree on serverless because I think it's only a matter of time before it becomes a frog-boil scenario. Bit of a tinfoil-hat theory, but I think there's a reason companies want you to move to serverless — the complex flows you build make it sticky, the hidden costs are everywhere, they can simply raise the price at any time, and they scale cost with your business. I think we'll see more and more of the "I got a thousand hits in a second and my bill was crazy because X" once this deluge of free credits runs out. Also, I'm definitely not sure about serverless for small businesses; it's such a new paradigm. Maybe if you get prebuilt flows, but it's definitely simpler to set up Dokku/CapRover on a droplet.
Can you point out any tutorials or guides on how to set up kubernetes simply? I'm wondering what the best way to deploy my app with minimal effort is - you make it sound like kubernetes is the answer.
If you're developing apps in containers, there are platforms that make it really simple to deploy. For example, Cloud Run (Google), Fargate (AWS), or Heroku. What the parent comment is suggesting is that building a platform like Cloud Run, Fargate, or Heroku is much easier on Kubernetes. Kelsey Hightower (principal engineer for Google Cloud) put it well when he stated, 'Kubernetes is a platform for building platforms.'¹
You can still deploy apps directly onto Kubernetes and it works very well for this purpose, but it will require a lot more learning than one of the platforms listed above. If you enjoy learning, Kubernetes is an incredibly powerful and satisfying tool to have in your kit, and the initial learning curve isn't as steep as some make it out to be. If your goal is to deploy apps as quickly and simply as possible however, go with one of the pre-existing platforms.
If you still want to learn Kubernetes then a really great book is Kubernetes Up and Running. It goes into just enough detail at the right point in time to make it simple while still being useful. If you do a bit of Googling, you might find a free copy of the book that used to be offered by Microsoft to promote their Azure Kubernetes Service. Otherwise there's Kubernetes the Hard Way² but that's more focused on administering the Kubernetes cluster itself, rather than how to use the cluster to deploy apps. You'd need a pretty convincing reason to administer your own cluster rather than spinning up a managed cluster on GKE or EKS.
My advice:
- Grab a copy of Kubernetes Up and Running
- Install minikube on your local PC
- Experiment and have fun learning
If you want to go from machine to cluster in no time, use the following:
- kubeadm (read the logs)
- k0s
- k3s
If you want to understand everything though, the way I started was:
- read the kubernetes documentation front to back
- go through setting up a cluster the hard way (look up the kubernetes the hard way guide)
- set up ingress on that cluster (nginx ingress), and make sure you understand the interplay between kube-proxy, ingress, services, and your deployment. The actual flow is more like kube-proxy->iptables/lvs->containerd but you want to be able to “think in k8s” (i.e know where to look and have an idea what to check when something goes wrong).
- install cert manager for free https certs
- tear that cluster down, and set a cluster up with kubeadm
(This will be much easier, and you’ll know what it’s doing because the logs are great and you’ve done it before)
- tear that down and make a cluster with k0s/k3s
I want to point out that it really depends on what your goals are. Kubernetes is useful to me because it’s a one stop shop for a wide range of possibilities.
If you just need to get an app up as fast as possible, install caprover/dokku on a DO droplet and git push to deploy.
> The problem is mitigated somewhat by our orchestration system. The control plane for Fly.io is Hashicorp Nomad, about which we will be writing more in the future.
It's clean and simple (to me). Billing is in one place, nicely separated by projects. Monitoring & Logging is already built in. No need to span multiple dev SaaS tools. So far managed to avoid Redis caching because Golang + Postgres is fast enough, so far. But if you need Redis you can DIY on Compute Engine or try Cloud Memorystore (configure the memory to a low amount for cost savings).
Google Cloud drawbacks: Additional charges necessary to connect Cloud Run to VPC (via proxy instances). Load balancing on GCP ain't cheap ($18/month, though to a larger enterprise that is a rounding error). But in my setup I didn't need these things.
As shown above, I have heavily optimized for cost and simplicity in my setup.
I find the UI to be too slow for the purpose it serves. I'm fine with a slow-ish app sometimes but not when I have to use it often and during incidents.
I also had a few instances over the course of several years where policies seemed to have silently broken because a system metric name changed. It's possible the issues were of my own doing, but I don't think they were.
Lastly Monitoring, Tracing and Error Reporting are too disjointed. I wanted a solution that created a more holistic view of what's going on.
Pretty happy for my use cases. Stackdriver is no more and now fully integrated into Cloud Console. Error Reporting is useful for production errors, even on frontend. Monitoring uptime is quick and easy. Metrics alerting has been okay. Mobile app alerts via GCP app.
Cool. Like you said, having everything under one umbrella is very nice. It's a big reason why I stuck with it for so long. Ultimately, it just didn't fit well enough for my use cases.
Cloud Run supports websockets now, fyi. This is the approach, more or less, that I use for my personal projects. Cloud Run and Firestore are literally free for my use cases. I only pay a few cents to host some static assets in GCS.
Good article. My comment might be off topic in which case, please ignore.
If you have a one-person SaaS company, how do you get past customers’ resistance to a single point of failure, namely you?
Do you pretend you’re not just one person? Do you only have customers who could handle losing the service when you, say, run away to meditate on the mountaintop? (Or get run over by a beer truck, or whatever.) Is there some non-obvious solution?
And — back on topic — is the architecture part of that sales pitch? “I’m just one dude, but look how simple this is, it can run itself if I am devoured by mice!”
This is a great question. It never ever comes up during the sales process, I don't go out of the way to show them my (lack of) org chart.
If you do customization for larger customers (and you should), like boiling a frog, one day you become mission critical to their business. Once they recognize that, then they will start asking questions. Now they're kinda stuck with you. You did charge enough money, right?
At that point you must appease them with a plan. Have their code and database on a dedicated server instance. Have them pay for it and own the account (you just saved money). Make sure you're using their domain they control. Give them access to the source code. It's on the server, so that's easy. Write up a doc with how to access everything and all the frameworks and tools you use. After this, they will never bring it up again.
Worst case scenario, sell them the software outright. Price it much higher than you think they will pay. Then double that. Trust me, I've done this a few times.
I've made a living running one-person SaaS sites for over 20 years, many of them in the same space as OP (analytics stuff). I can't recall a customer ever asking how many employees there are. It just doesn't come up. I don't think small business customers care. Maybe it matters for more enterprise salesy type businesses.
I'm in a very similar situation - I've run a one-person, B2B SaaS for 20 years, and not once has anyone asked how many employees we are. I sell a lot in the EU defence sector, and even there it has never come up. Sometimes customers ask about product end-of-life policies, but they are always satisfied with my responses.
I run a b2b saas that is used daily by some fortune500s. I have been going through rigorous vetting processes but have only once been asked about this. I lied and told them it is no problem. I usually pretend my company is larger than it is (me).
It gives me a headache now and then, and I'm now in the process of getting someone else onboard.
I've had some cases of customers asking "what happens if your company dies" for cloud services, even knowing the company I work for has 30+ employees, so it's not just a "one-person company" concern; some people do seem to care about what happens to their data / service. I'm just wondering whether that concern comes from conferences or talks that highlight it.
A suggestion, hopefully helpful: a better approach to securing your admin console than simply layering 2FA onto it would be to expose it to a private WireGuard network. One very easy way to do that is with Tailscale, which will hook up to your GSuite authentication --- Google's 2FA stack will be far better than anything you'd likely build on your own.
Tailscale is disgustingly simple to set up. If you're a product person, it's actually upsetting how easy they've made it to get set up.
I set up Tailscale for my home server the other day, can vouch for how disgusting it is. I thought it would take at least an hour, but no, it took about 5 minutes. Was, literally, physically taken aback (in a good way). Highly recommend it.
An important distinction here is that PanelBear (OP's One Man SAAS) is something I would define as an "analytics" SAAS and as such has requirements that are way above what a typical CRUD SaaS might have.
That's not to take anything away from the excellent writeup, but more so that someone who is thinking about starting a SaaS maybe doesn't jump to the conclusion of "I should go learn Kubernetes".
Yep, and their recommendation on Render is a good one. I used it for a SaaS / fintech app I built and it couldn't be easier to work with. Great support too if you need it.
This is probably not the best place to ask this question, but as a solo founder, or just to reduce costs/time, are there some standard free software packages that are used when creating sites? For example, most sites need a user sign-up mechanism, and an authN and authZ mechanism to gate access to different pages. Are there open source projects that provide this? Or do site owners develop these from scratch every time?
Different framework implementations of a CRUD website with authentication.
A lot of popular web frameworks have basic authentication out of the box & easily allow you to tie free authentication with accounts like Google, Microsoft, and many others. There are also paid alternatives that may save you more money than the free ones if you need advanced authorization controls or other features.
Most devs probably have a collection of ways they've done it in the past that they pull from when needing to adjust from the default framework's methods.
Depending on your software stack, there are several open source projects that can provide functionality out of the box.
If you don't mind paying just a little bit of money, you can get even more out of the box SaaS like functionality with tools like Jumpstart (if you're using Rails).
Depends on the framework you use. Rails and Django have a lot of this built in or as widely used plug-ins. In node.js you have to do a lot from scratch every time, or you can use a service that provides this like Auth0
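For example, a rough sketch of Django's built-in auth (URL paths and template name are illustrative):

```python
# Sketch of Django's built-in auth: login/logout views ship with
# django.contrib.auth, and gating a page is a one-line decorator.
from django.contrib.auth import views as auth_views
from django.contrib.auth.decorators import login_required
from django.http import HttpResponse
from django.urls import path


@login_required  # anonymous users get redirected to settings.LOGIN_URL
def dashboard(request):
    return HttpResponse(f"Hello, {request.user.username}")


urlpatterns = [
    path("login/", auth_views.LoginView.as_view(template_name="login.html")),
    path("logout/", auth_views.LogoutView.as_view()),
    path("dashboard/", dashboard),
]
```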
Very interesting. Do you mind sharing the hardware specifications of your servers? Are you confident that FreeBSD is a secure OS to face the internet, say, as compared to OpenBSD?
Ilja van Sprundel answers your question by comparing the number of kernel vulnerabilities since 1999 of the BSDs and Linux. [1]
I don't think FreeBSD, even well hardened [2], is as secure as OpenBSD. After all, OpenBSD's main focus is security. I use OpenBSD for orchestration and monitoring, and I have an experimental setup of OpenBSD with VMM, but the VMs crash sporadically, so I'll wait a bit.
At any rate, my goal is to have two heterogeneous paths, maybe (OpenBSD, FreeBSD) or (Solaris, Linux). This way I could simply shut off the vulnerable path when there's an unfixed vulnerability.
BTW, I have the FreeBSD hardening and setup scripted, which you could add into the ISO in `/etc/installerconfig`, or download from the orchestration host and manually run with `bsdinstall script myinstallerconfig.sh` if you wish.
I'll keep the hardening script in mind. I have a strong interest in spending more time on servers, but at the moment it is difficult to find time.
If vmm(4) is stable on OpenBSD, it can be used as an alternative to jails. Because OpenBSD has a small footprint, a virtual machine of OpenBSD through vmm(4) probably will not require many more resources than a jail instance, I guess.
I have been bitten by OpenBSD once, though. I was traveling with a laptop where OpenBSD was the only OS and the filesystem was encrypted. However, there was a hardware failure and the data on the hard disk was corrupted. I lost some work and some files, and managed to recover the rest of the files before the hard disk died.
At the moment OpenBSD still does not support a filesystem that implements file checksums. I think that can be a limitation.
Good on them. I wish I could use K8s as effectively as the author; it is an incredibly overwhelming list and an impressive range of knowledge.
In my situation I am finding the lack of a consistent environment a recurring issue; the developer environment does not match production. However, I kept it simple with Google App Engine Standard and Flex environments; I found the deployment process simple and it was enough for me (at the time). Now I am finding we are going to step into dockerland, and I feel like it is very over my head!
I've done a few of these for people at home (albeit not quite so complex) and for myself. I built the application/infrastructure monitoring systems where I work as well. As one poster said above, document everything, even the commands. It works, although it is tedious. But there is a certain joy using something you created, even if it is something of a "labor of love" to maintain it.
I want to get out of IT after 20 years, but there is no way I will stop tinkering with OSs, Raspberry Pi IoT devices, SoC, light coding, etc. It's different when it's a hobby than when you're faced with time constraints, budgets, and nagging bosses.
A project I'm about to start at home is taking an existing 1080P dash cam (front and rear) that features great night vision and hacking it with a Raspberry Pi that handles motion detection, sends stills, and uploads to the cloud. Sure, I could go buy an extant system that just works, but what's the fun in that? It's like Legos. I could go buy my kid a fully-assembled car or spaceship, but I'd rather he learn how to follow instructions, see cause and effect, and experience the pride of a job well done. YMMV. There is something really uplifting in seeing "complex" technical stuff working that you yourself built. It doesn't even have to be as good as existing tech.
It's probably a function of to whom a one man band can effectively market. Consumers don't buy SaaS and a single person couldn't afford a consumer level brand ad campaign anyway. Big companies mean big company sales cycles and demands - procurement departments, compliance etc. Other SaaS companies are easier to find in places like HN, and they're maybe more predisposed towards buying from lone hackers like this one.
In the beginning my startup only had 2 people. A designer (my friend) and me (a developer).
For our frontend we used Webflow. My friend was able to create the entire marketing site and all the app UIs without needing help from me. Webflow is an awesome tool for that sort of thing.
For the backend, I built a simple Node/Express API and hosted via Heroku.
To this day, everything is still running fine and the API is processing roughly 200 million requests a month. The total cost to host that on heroku is $50/mo.
You can definitely have a simple stack but have it be highly scalable!
Super interesting! Definitely feels like a lot of fairly low-level tech to have to deal with for a one-person company, but I guess that doesn't surprise me any more :)
In the ideal case it should be constant and as close to zero as possible! Of course we don't live in that world for arbitrary scale, but surely "a one person SaaS" should be able to do without so much low-level tech and infrastructure work.
It seems to me that even when you outsource your infrastructure to a major cloud provider, you're still spending a lot of time yourself setting everything up.
I'm certainly not criticising Anthony here – what he's done, especially in terms of product development, is remarkable – but just thinking about the industry at large.
I also tested Firebase but quickly ran into issues.
Firebase's database is a NoSQL database, whereas almost all the data for the apps and (micro-)SaaS I was building was relational.
Their frontend data fetching felt clunky and did not fit my requirements.
Also, the fact that Firebase is a closed-source backend felt scary in the hands of Google (https://killedbygoogle.com/).
Firebase's problems and my desire to have the perfect backend made me build an open-source alternative to fix all the shortcomings. PostgreSQL instead of NoSQL. GraphQL instead of REST. 100% open source. That is now https://nhost.io.
Thanks for sharing. On first impressions, I found Nhost's product to be really cool, so much so that I actually applied for the open position of Product Designer [Tahmid].
I was under the impression that Kubernetes was a complicated beast not meant for small teams / startups. What is the value of it in this monolith environment? Is the key to using it in a startup context to use it as a basic monolith auto-scaling orchestrator but no more than that? If you or anyone else here can comment about how to use Kubernetes strategically without falling into an unnecessary over-engineering rabbit hole, I'm willing to learn from you.
Regarding the rate limiting, you're load balancing into nginx services that you've configured to limit requests. Are they synchronizing rate limiting state? I can't seem to find nginx documentation supporting this. What value is there in this style of rate limiting, considering User X can send a sequence of requests into a load balancer that routes them to nginx boxes A, B, and C? The big picture that 3 requests were processed for user X gets lost. Your endpoint-level rate limiting, however, may potentially be achieving the synchronized rates if the redis servers in a cluster are synchronizing. I guess I'm asking about the strategy of using multiple lines of rate limiting defense. Is nginx-level rate limiting primarily for denial of service?
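For what it's worth, nginx's limit_req counters are kept per instance, so three nginx replicas each apply their own limit independently; cluster-wide limits usually need a shared store such as Redis. A rough fixed-window sketch of that idea, assuming redis-py and a Redis instance reachable by every replica (the key names and limits are invented for illustration, not taken from the article):

    import time
    import redis  # assumes redis-py is installed and a shared Redis is reachable

    r = redis.Redis(host="redis", port=6379)

    def allow_request(user_id: str, limit: int = 100, window_s: int = 60) -> bool:
        # Fixed-window counter shared by every replica via Redis.
        key = f"ratelimit:{user_id}:{int(time.time()) // window_s}"
        count = r.incr(key)          # atomic increment across all replicas
        if count == 1:
            r.expire(key, window_s)  # first hit in the window sets the TTL
        return count <= limit

Per-instance nginx limiting still has value against floods hammering a single box, which is presumably why it's used as a first line of defense, but a shared-store layer like this is what gives you the "3 requests from user X" big picture.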
Shouldn't the horizontal autoscaler be based on throughput rather than hardware consumption? If requests per second per instance go above a threshold, spawn another replica. Can anyone comment?
> From a technical point of view, this SaaS processes a large amount of requests per second from anywhere in the world, and stores the data in an efficient format for real time querying.
That is the closest thing to a number of requests I could find. So this architecture, no matter how solid, is somewhere between "way too large" and "matches perfectly".
It seems like a solid breakdown of how to deploy your services to k8s and how to properly do CD deployments. But it never mentions whether it actually makes sense at the scale he has.
> I use Kubernetes on AWS, but don’t fall into the trap of thinking you need this. I learned these tools over several years mentored by a very patient team. I'm productive because this is what I know best, and I can focus on shipping stuff instead. Your mileage may vary.
This is a key point. I don't know Kubernetes, and for this kind of scale I'd probably use, say, Heroku. But if I did know Kubernetes, I'd probably use it as it would be one less thing I'd have to worry about if I had to scale up quickly: you never know if that little side project with a dozen users is going to become an overnight success.
> But it never mentions whether it actually makes sense at the scale he has.
What does "make sense" in this context mean? It sounds like you're assuming he chose K8s for the scalability, but scalability isn't the only consideration here. Familiarity of the tooling is the biggest one that he mentions in the post. He even goes so far as to say that k8s probably isn't right for everyone, it's just what he knows.
It's efficiently supporting a profitable application and requires minimal maintenance. That seems to accomplish the goals of "infrastructure", broadly speaking.
Depends on how much you make from a client per month. If your product is $19-99 or whatever price you can stick to, an extra $2.99 or $5 really doesn't mean much if it keeps things simple. We do the same for our projects: droplets with a custom ISO (created from the original project) take about 5 minutes to deploy and serve to new users with new credentials. The infrastructures I see in these posts really don't align with most single full-stack devs out there.
A lot of people are going to jump on the "he used k8s and he doesn't even work at Google scale!" part of this writeup, but I think it's a perfect demonstration of the concept of innovation tokens [1]. He admits in TFA that clickhouse was the only new piece of tech in his stack, and he was already familiar with k8s et al - so he's able to focus on actually building the products he wants. I could see somebody unfamiliar with k8s (but very familiar with all other pieces of tech in the system they want to build) being able to learn it as part of a side project, if it's the only new thing. Where the wheels come off is when you've never touched k8s, postgres, aws, rust, graphQL or vue - and you try to mash them all together in one ambitious project.
> Where the wheels come off is when you've never touched k8s, postgres, aws, rust, graphQL or vue - and you try to mash them all together in one ambitious project.
In my experience (both myself and observing others) this is the cause of lots of side project (sometimes even startup) failure. Lots of people choose a tech stack that's far away from what they've worked with, so they never get past the "read the docs and try to get anything working" stage. For a real chance at completion it seems like the recipe for success is choosing a stack that's 1ish derivative away from a dev's competencies so they have a new and exciting thing to learn, but are able to continue progressing and adding value.
I am also a person that, prior to using Azure, was an absolute "Kubernetes is a big waste of my time and I'll just skip it" person. I wrote it off as predominantly "resume-driven". Now, having used Azure for about a year, I'm rewriting all my Azure infra to use AKS to better insulate me from the inevitable issues that come up when I GTFO of the Azure sphere as soon as our credits run dry. And, what I'm learning, is Kubernetes is a just-fine deployment/orchestration/management tool for containerized infrastructure that is _not_ a massively complex microservices infra. It's just a more streamlined approach to scaling and managing cloud-agnostic tooling/containers.
Innovation tokens is such a fantastic concept. When I brainstorm products I like to make a simple plan, and then map out "what's new to me" to help me decide what I am really trying to accomplish. Prototyping a new idea is very hard and requires a lot of iteration, and using a new tech complicates that even if it's better. Alternatively, if the prototype is purely for fun, then new tech can be a great value add: even if the prototype takes a turn for the worse (i.e. too hard to finish), you get satisfaction from learning some new tech. I've noticed that since being very intentional about it, side project work has become considerably more enjoyable.
Picked up k8s as part of a side project that became my startup. I would say if you know Docker, it's not hard, especially when using a managed offering, Google-style. Setting up your own k8s cluster is a whole other thing.
That said, I agree with the innovation token concept. None of this junk makes you money; solve a problem first.
As a one person company I find it not just helpful but a core principle to minimize the number of stacks/tools/services being used. Overhead of task switching and learning curves.
This goes against the HN trope that "you don't need Kubernetes unless you are Google-size".
It turns out Kubernetes is actually perfect for small teams as it solves many hard operational issues, allowing you to focus on the important part of the stack: the application.
The key is to stick to a simple setup (try not to mess with networking config) and use a managed offering such as GKE. We may need a Kubernetes, The Good Parts guide.
> It turns out Kubernetes is actually perfect for small teams
As long as at least one of them is an expert on kubernetes. In this case, the one person in the team is that person, and as he points out in the article, he's using it because it's what he knows.
That should be the takeaway, I think. The "trope" remains pretty sensible IMO; I've seen it first-hand: jumping on Kubernetes without the know-how is a foot-gun factory, and that team ultimately gave up on trying to implement it.
I use DigitalOcean's managed kubernetes for one of my side projects that I did with a friend. Really happy with it. And it's actually cost-neutral: all you do is pay for the $10 droplet it runs on and you get the managed k8s at no additional cost.
I've done a complete 180 on this too: I realised I was reacting from my default position of hostility to new concepts rather than making an honest appraisal. I'm writing it up at the moment, but I've been working on a one-person SaaS MVP tutorial [0], and though I've definitely misconfigured something, the ability to go from git push to deployed in production with zero downtime inside of five minutes, with no manual steps, is such a nice flow versus my previous attempts of SCP and faffing around with services.
> Kubernetes aims to provide all the features needed to run Docker-based applications including cluster management, scheduling, service discovery, monitoring, secrets management and more. Nomad only aims to focus on cluster management and scheduling and is designed with the Unix philosophy of having a small scope while composing with tools like [Hashicorp] Consul for service discovery/service mesh and [Hashicorp] Vault for secret management.
> Nomad is architecturally much simpler. Nomad is a single binary, both for clients and servers, and requires no external services for coordination or storage. Nomad combines a lightweight resource manager and a sophisticated scheduler into a single system. By default, Nomad is distributed, highly available, and operationally simple.
Like all of Hashicorp's tools, they are more complicated and error-prone than they first appear, because they stuff too much functionality in one binary. But it does let you implement one piece at a time, so you can make incremental improvements as you need them.
What do you think is "too much functionality in one binary"? With Nomad I feel like the opposite is true: Nomad is just a workload scheduler. If I need service discovery I can add Consul, if I need secrets management I can add vault. Honestly curious by what you meant exactly and how Kubernetes does it better / easier.
Kelsey Hightower's 'Kubernetes The Hard Way' [0] "is optimized for learning, which means taking the long route to ensure you understand each task required to bootstrap a Kubernetes cluster."
My one man SaaS setup: t4g.micro (Free Trial) on AWS Ec2 - one mod_perl module + a bunch of python/perl scripts. ( https://poidata.xyz ). Startup costs so far=$1 (domain registration).
I believe Plenty of Fish (a dating site) was in that category for some time, though the founder did wind up hiring people before eventually selling it for $575m.
(Render founder) This is incredible work, and underscores the reason Render exists and is recommended by OP. Everything mentioned in the post is baked into Render already:
* Automatic DNS, SSL, and Load Balancing
* Automated rollouts and rollbacks
* Health checks and zero downtime deploys (let it crash)
* Horizontal autoscaling (in early access!)
* Application data caching (one-click ClickHouse and Redis)
* Built-in cron jobs
* Zero-config secrets and environment variable management
* Managed PostgreSQL
* DNS-based service discovery with private networking
* Infrastructure-as-Code
* Native logging and monitoring and 3rd-party integrations (LogDNA, Datadog, more coming this month!)
Cool write-up. I am a K8s hater, but I can see how this can work well for small projects with 1 developer. EKS definitely takes a lot of the maintenance headache, but there'll still be some down the line.
Interesting post. I would advise people against running a Kubernetes/Docker setup if you don't know it well. It's quite complicated and most small companies don't really need it.
As the author says, he already had a lot of experience with it, so it worked out great for him, but for a small company it is probably easier just to install the tech you actually need.
Unless you have something very special going on, the dependencies (like databases) are probably not going to be that many.
I was just reading the beginning of Arvid Kahl's Zero to Sold. He recommends using a tech stack that you already know and have lots of muscle memory with. I couldn't agree with him more. [1]
This tech stack looks over-engineered upon first glance, but I don't know much about the author or his product.
I use Kubernetes a fair bit whilst developing OpenFaaS and teaching people about K3s, but there is a whole world of development teams who aren't prepared to consider it as an option. One of the reasons we created "faasd" [2] (single-node OpenFaaS) was to help people who just wanted to run some code, but didn't want to take "Kubernetes mastery 101"
For a small app, using a managed service like Cloud Run plus some cloud storage should get you very far. I saw that Heroku is still popular with the indie community, with the author of Bannerbear getting a lot of value from the managed platform.
I’d rather have the time to build new features for my user base than spend it learning how to use k8s or wrangling AWS through its abysmal console website.
Just that a simple stack that's boring can work well for a SaaS whose success is not contingent on huge traffic.
For many successful SaaS companies, if their model is not monetizing huge traffic through ads, traffic is really not a metric of success. And they can be financially successful handling only paltry traffic.
Lots of business-problem-solving apps fall into this space. Sure, maybe you only have 60 customers using your app. But if each one is paying you $1,000 a month...
Nope, just different. I'm a big proponent of the notion of 'innovation tokens,' which has come up elsewhere in this discussion[1].
If the project that I was referring to had been a 'just for me, just for learning' side project, I would have chosen one new-to-me technology. It might have been some JavaScript SPA framework. It might have been Kubernetes. Or maybe GraphQL. It might have been something completely different.
But, this project was absolutely critical for its users, and it had to be delivered on an incredibly tight timeline. I didn't have the luxury of playing around with anything new. And this is still the case. In fact, the project has grown in importance for its users and I have even less flexibility to play around with new-to-me technologies right now.
---
[1] Here's an example of me discussing this six years ago, although I was unfamiliar with the term at the time: https://news.ycombinator.com/item?id=9291437 ... Re-reading this comment, I'm actually kind of surprised how little I disagree with here. The only thing that has actually changed for me is that I use Swift instead of Objective-C for my iOS apps.
Starting with an easily scalable infrastructure, if the need for a scalable infrastructure is either unproven or distant, is not the best use of current resources. It's more cost-effective and beneficial to use an out-of-the-box tool like Heroku and spend our time building the product that people will actually be paying for.
There might come a time when the hosting costs on Heroku get frighteningly high, but even then it's something that might not be high priority. If the profit impact of building and maintaining a K8s infrastructure is less than the profit impact of building a new feature and securing more users, then by all means, let's continue paying Heroku and focus on what makes the business more sustainable.
The time to convert from a simple setup like Heroku to a scalable system like K8s is when that change has the biggest net impact on the company's operation.
One thing I've noticed a lot with indie or "one-man" startups is that they make ample use of other SaaS tooling, often lesser-known ones as well.
I am not sure what the right answer is, but I at least appreciate that there are founders out there willing to give the smaller shops a chance. A healthy ecosystem with competition is good for the greatest number of people.
I run a one-person SaaS company supporting three products. One is an iOS and Android all-local storage app so that costs me nothing to run. I have two projects running on Django sharing the same RDS DB. I can support two apps with just a single EC2 each. One runs docker containers. The other I did not dockerize yet. For me, the total costs are about $40/month. I have looked at Netlify and other “easy options” but they double or more my costs due to their costly basic tiers.
Thanks for a great post! It was super detailed and I loved reading it. I had a quick question about your pg setup. You mentioned that you use EBS for your persistent storage, which is locked to a zone: you can't have an EC2 instance in Zone 1 mount a volume in Zone 3. Does this cause issues with your db? Especially as you have HPA and ClusterAutoscaler, your k8s nodes could be spun up in Zone 1 for pg autoscaling while your data is in Zone 3.
It makes sense to leave managed services outside. Do you really want to be responsible for maintaining your Postgres database? Dealing with upgrades, backups, replication etc...
Much better to leave that to the cloud provider to manage.
From the article:
> However, as a project grows, like Panelbear, I move the database out of the cluster into RDS, and let AWS take care of encrypted backups, security updates and all the other stuff that’s no fun to mess up.
"From the small stuff to the big picture, Panelbear gives you the insights you need while respecting the privacy of your visitors. It's simple, and fast."
Price is based on client websites' page views per month, with a free tier up to 5K page views.
I have done this before: I ran a one-man B2B SaaS platform with 30 clients from around the world. Infrastructure was the easiest part. We were processing roughly 100 million messages a day on about 5 nodes. Monitoring was good, application performance tracking was good. The business ran for close to 7 years, making about $1.3 million in an average year.
Yes, I no longer have the business. It got harder and harder to compete with larger players as the market began to attract more entrants. I sold off the business to a larger company (just contracts, not tech).
The hard part was support. Because my clients were from around the world, they sometimes needed help and expected a certain level of service, such as responses to support questions within 24 hours or sooner. About 90% of the time this required me to forward the inquiry to an upstream vendor, and it took time to collate the right data for the vendor from the client's inquiry.
I did end up hiring a support person to give me some breathing room on the weekends.
I don't have the startup part yet but here's my one-person stack with Postgres, Node and React deployed on AWS with CDK using RDS, Lambda, S3 and Cloudfront. It's 100% in the free tier.
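For readers who haven't seen CDK, here's a minimal sketch of what defining a stack like that can look like, written in Python for brevity even though the commenter's app is Node; every resource name here is invented and this is not the commenter's actual code:

    from aws_cdk import App, Stack, aws_s3 as s3, aws_lambda as _lambda
    from constructs import Construct

    class SideProjectStack(Stack):
        def __init__(self, scope: Construct, id: str, **kwargs) -> None:
            super().__init__(scope, id, **kwargs)
            # Bucket for the static React build (served via CloudFront in a full setup)
            s3.Bucket(self, "FrontendBucket")
            # API handler on Lambda; RDS and CloudFront would be added the same way
            _lambda.Function(
                self, "ApiFunction",
                runtime=_lambda.Runtime.PYTHON_3_9,
                handler="index.handler",
                code=_lambda.Code.from_asset("lambda"),
            )

    app = App()
    SideProjectStack(app, "SideProject")
    app.synth()

The appeal is that `cdk deploy` turns this into CloudFormation and provisions everything, so the whole stack lives in version control.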
Yeah, it's basically this. I'm running this as an initContainer for my K8s-based deployments. Took me a bit to get everything going, but my stack is pretty much similar to OP's article, although not quite as advanced in the automated-deployment of containers and monitoring. I'm not at a position where usage needs heavy monitoring because I'm still in the pre-launch phase of things and I'm using this side project to learn stuff I've yet to get experience with at multiple companies.
You can use any normal DB migration tool. For k8s, I put the app's readiness probe to false, run the migrations and then toggle the probe back to true.
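One hedged sketch of how that toggle can work, assuming an httpGet readiness probe pointed at a health endpoint; the flag-file path and view name are invented for illustration, not necessarily how the parent commenter does it:

    # Django view backing a Kubernetes readinessProbe (httpGet on /readyz).
    # While the flag file exists, the pod reports not-ready, so the Service
    # stops routing traffic to it until the migration step removes the flag.
    import os
    from django.http import JsonResponse

    MIGRATION_FLAG = "/tmp/migration-in-progress"  # assumed path, touched by the migration step

    def readyz(request):
        if os.path.exists(MIGRATION_FLAG):
            return JsonResponse({"status": "migrating"}, status=503)
        return JsonResponse({"status": "ok"})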
I wonder what happens during a blue/green or canary deployment if your migration changes the database schema in a way that affects the previous version negatively. Is it even possible to do a blue/green deploy if your schema changes radically?
It's fine if you make the changes backwards compatible for at least one version. And in general, any change can be done in a backwards-compatible way (although it can be a PITA).
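The usual name for this is the expand/contract pattern: add the new column as nullable first, deploy, backfill, and only tighten or remove things once no old pods are running. A hedged Django-flavoured sketch (the app, model, and field names are invented):

    # Release 1: the "expand" migration, safe while the old version is still serving.
    from django.db import migrations, models

    class Migration(migrations.Migration):
        dependencies = [("billing", "0007_previous")]
        operations = [
            # Nullable, so rows written by the old code remain valid.
            migrations.AddField(
                model_name="invoice",
                name="currency",
                field=models.CharField(max_length=3, null=True),
            ),
        ]

    # Release 2, after a backfill: make the field non-null and drop the old code
    # paths (the "contract" step), once every running replica understands the new schema.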
He's using Django, so most likely Django Migrations which is built into that framework. If you're using Flask, you're probably using Alembic with SQLAlchemy. Those are the two main ways to handle schema migrations in Python.
Nice read. I haven't seen any references to Ansible or similar tools. For the ones who know: given the architecture described in the article, does Ansible fit in the picture? I don't know a lot of k8s but I wonder how VMs are provisioned (e.g., how docker is installed?)
In the author's case, Terraform creates the EKS (Kubernetes) cluster, which is then responsible for creating the EC2 instances. The actual application containers are then scheduled by EKS.
Hey great post! Thanks for sharing so many details. Just one question: how do you approach profiling in production? Specifically in those cases where copying whatever slice of data from the prod DB would be too much to handle.
Digital Ocean / Vultr + Ubuntu droplets solve everything for us. We slap Sucuri in front of it, or for cheaper projects Cloudflare. I can't understand the complexity people come up with in setups like this.
I'm currently using zappa and lambda for supporting about 25 b2b users. It's a django / react application and I use cloudwatch for scheduled cron jobs. My overall cost is <$20 a month.
It's probably fine, but reading about a single Postgres container in a Kubernetes cluster with backups to S3 gives me sweaty palms. I hope the author has fully tested their disaster recovery plan.
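For context, a bare-bones version of such a backup job might be little more than a pg_dump shipped to S3 on a schedule; the bucket name and DSN below are placeholders, not the author's setup, and the scary part is less the script than regularly test-restoring what it produces:

    # Minimal pg_dump -> S3 sketch, e.g. run as a Kubernetes CronJob.
    import datetime
    import os
    import subprocess
    import boto3

    def backup():
        stamp = datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S")
        dump_path = f"/tmp/pg-{stamp}.dump"
        # -Fc writes a compressed custom-format archive restorable with pg_restore.
        subprocess.run(
            ["pg_dump", "-Fc", "-f", dump_path, os.environ["DATABASE_URL"]],
            check=True,
        )
        boto3.client("s3").upload_file(
            dump_path, "example-backup-bucket", f"postgres/{os.path.basename(dump_path)}"
        )

    if __name__ == "__main__":
        backup()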
If you're using Stripe for billing how do you handle tax compliance, specifically related to the VAT requirements in each EU country as well as sales tax requirements in each US state?
I suspect an IPO would be impossible with one 'person'. You'd have other people on the payroll: lawyers and accountants at the very least. If you're paying millions of dollars to professionals anyway, at a certain point it would make more sense to hire people directly, right?
It's a great writeup, I just find it weird that the author runs his "privacy-focused" analytics service on AWS and Cloudflare. From a GDPR perspective it's not even clear if this is lawful (Schrems II), and there are some good alternative cloud services available in the EU (e.g. Hetzner or OVH). Also, Cloudflare still sets the __cf* cookie on every request, so it's not really cookieless tracking (I'm aware that Cloudflare is planning to get rid of this cookie, though).
Edit: Maybe the downvoters can explain what they're disagreeing with?
Sure, I still find it a weird choice for a privacy-focused service. Also, it's not clear that the Cloudflare cookie is really required to ensure the functionality of a web service (it probably isn't, as they're planning to get rid of it), so pretending that you don't need to inform customers about this cookie set by your data processor is mere wishful thinking. Again, no one cares much about this and it's not enforced, but if you're building a privacy-focused service you should give it some thought instead of just saying "Cloudflare is cool because it gives me free TLS and load balancing". Just my 2c.
No it's not; you can check e.g. the EDPB's recommendation [1] on this. At the very least you'd need to use data mapping and ensure EU citizens' data stays within the EU. The author's service advertises "200 edge locations around the world", so I'm skeptical that data won't leave the EU.
Not many companies care about this and there's little enforcement so far, I think it's fair to think about this though if you're running a privacy-focused web service from Germany.
It's pretty clear that using AWS is lawful. What you're questioning is if AWS is being used in a compliant manner, which is an entirely different thing. It is possible to do so, so there's nothing odd with choosing AWS.
Personally I find it odd to choose AWS (and Cloudflare) for running a privacy-focused service out of Germany. But again, that's just my personal opinion, I guess most people here are fine with this setup. And I'm also at least a bit doubtful that a one-person startup can get all compliance aspects of running services in a global AWS and Cloudflare-based setup right, so I'd recommend using infrastructure that by default will be hosted in the EU so you don't have to worry about this.
You don't have to obey the GDPR for users outside the EU, so as long as the central storage is located in the EU (and only replicated across EU countries, which is easily configurable), the author is most likely absolutely fine.
By the edge locations, I'd assume he's serving cached static files, such as his blog or tracking scripts from there using CloudFlare. Assuming CloudFlare is not falsely advertising their GDPR compliance, the author is also fine.
As far as I'm aware, the GDPR imposes no requirement that data stay within the EU as long as you have DPAs with Cloudflare, AWS, and any other data processors.
DPAs are very easy to sign with AWS and Cloudflare.
I also don't understand your complaint about "200 edge locations". Are you expecting him not to use a CDN?
I'm not very proud of this in the age of the cloud, but I can't deal with all the complexities. The command line scares me (and it seems to be a requirement these days for any development). Now I have a simple FTP folder mapped directly in VS Code.