
One-Process Programming Notes - jaffee
https://crawshaw.io/blog/one-process-programming-notes
======
kodablah
From experience, before you just sprinkle big-ass string literals that talk to
SQLite throughout your Go code, toss all of your persistence into a separate
package if you can. I'm not saying some crazy abstraction; just put it
someplace common if you don't have a lot of advanced SQL needs, and put in the
minor extra work of serializing to and from structs. You'll be happy when you
want to add query logging, find certain usages, or reuse code, and if you ever
want to change your persistence approach.

Also, devs should be aware of some of the concurrency limitations of SQLite:
you essentially need to share a single connection across the app and serialize
the writes.

Also, side note: I've found one really neat use case for SQLite over other
DBs: simple disk encryption. I wrote a Go wrapper for one of the popular ones
[0], but in general, if you distribute your app in other ways, you may want to
offer your users encryption that you can't easily provide with other DBs.
Sure, you lose some performance, but sometimes it's worth it (again, less so
in the server case, where you control the machine and/or can do fs-level
encryption, but definitely in the distributed app case).

0 - [https://github.com/cretz/go-sqleet](https://github.com/cretz/go-sqleet)

~~~
marktangotango
The author talks about using multiple readers and serializing all writes to a
single connection, FYI. This is how you get really stellar update/write
performance out of SQLite. In my tests I saw 3x the inserts per second
compared to Postgres, for example.

------
pavlov
_" Most problems do fit in one computer, up to a point. Spend some time
determining where that point is. If it is years away there is a good chance
one computer will do."_

I've been following this principle for a decade, and 98% of the services I've
built have never needed to grow beyond one instance. The ones that did were
solved by breaking out specific functions into their own managed services, not
by applying a wide horizontal scaling solution.

If you build for scaling before you have anything running that can be tested
against, it's highly likely that you're actually scaling the wrong thing. It's
like optimisation: to get any meaningful benefit, you'll want to surgically
solve the actual bottleneck rather than sprinkle hope-based gestures all over
the place. (I'm reminded of the article that was posted the other day, where
the writer experienced a Firebase slowdown and tried to solve it by upgrading
hundreds of npm packages to their latest versions. That's what I mean by hope-
based gestures. Premature "web scale" architectural tripping is essentially
the same kind of shotgun approach.)

~~~
grey-area
Sure, I'd say this applies to the vast majority of websites; technologies
like Docker, Kubernetes, or multi-region deploys simply are not required for
most projects. There is no need for massive scale and multiple nodes for most
services/websites, and if you find there is, well, you'll be rewriting a bunch
of things anyway unless you've done a lot of work on very large websites
before.

I do think there's something to be said for habitually separating the database
though - if you store data elsewhere, you can have 1 instance or 2 or 3 pretty
easily, and sometimes having more than one is nice for redundancy, handling
big spikes or seasonal traffic etc. It is pretty easy to scale horizontally
nowadays on cloud services where they take care of load balancing and
sometimes the data store too. You also don't have to worry about wiping a
server and starting again, as your data is elsewhere.

Embedding a db like sqlite introduces other problems (the cgo requirement,
for one). It is really handy for things like mobile development though, as
then you don't depend on a separate api or db for data. It's also handy if you
don't want potential users to have to worry about setting up a db when trying
out your project, so an embedded db is really nice for that - I wish golang
had a pure-Go clone of sqlite, as that would make it even more attractive for
a broader range of uses.

------
pjc50
"One process programming" is a really under-rated philosophy. Communication
overhead can be substantial and concurrency issues are harder to debug. A
single process that you can just stop and inspect in the debugger is easier to
work on. Especially if you're only one developer as well.

Two big pieces of software I worked on used this philosophy. One was
[https://en.wikipedia.org/wiki/Zeus_Web_Server](https://en.wikipedia.org/wiki/Zeus_Web_Server)
\- nowadays its niche is occupied by nginx, but back then its advantage was
being a one-process-per-core select()-based solution, compared to Apache's
process-per-concurrent-request solution. It was also absurdly, unnecessarily
portable, being written in vanilla POSIX. Ran on a dozen different flavours
and architectures of UNIX almost all of which are now dead.

Another was chip design software for a startup. Eventually it would have been
useful to parallelise computation, but in the meantime it was _so_ much faster
to develop it and just buy big machines and leave it running overnight.

See also [https://adamdrake.com/command-line-tools-can-be-235x-faster-...](https://adamdrake.com/command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html)

------
interfixus
Unsurprisingly, comments start pointing the way to added complexity like
Postgres and containers and whatnot, but this is _exactly_ the right approach
for probably the large majority of webapps out there. One executable, one db-
file, up and running anywhere within a very few minutes. With a modicum of
intelligence thrown into the design, you're never ever going to see SQLite
acting as a bottleneck. Few sites have request rates on the scale of tens of
thousands per second.

Developers of would-be federated services to decentralize the web, please take
note. This is how you go about ensuring wide deployment. Simplicity, if anyone
remembers the word.

------
oavdeev
If you're using AWS anyway, I think using their managed Postgres / RDS could
make this a lot easier operationally. No need for custom syncing/backup
scripts, easier to scale the storage up.

To the larger point, while I totally agree with the premise -- most apps will
never need to scale beyond one large instance -- I'm not exactly sure what
the actual tradeoff is. If you're writing a simple CRUD app, it's not really
controversial that it shouldn't need any complex distributed stuff anyway; it
is just a simple Python/Go app and a data store.

Most "fancy" things outside that single process paradigm, such as avoiding
using local disk and using S3/RDS/document stores, using containers for
deployment, etc. usually have more to do with making the app operationally
simpler and easy to recover from instance failures than scaling per se.

~~~
chii
> operationally simpler and easy to recover from instance failures than
> scaling per se.

I'd say it's about architecting the app itself to make this easy, rather than
relying on infrastructure to do it for you (and hoping that doing so will make
the app "easier" to code).

For example, use an append only data structure (like event sourcing), with
snapshotting. Then your app recovery process is just "restart".

------
erdaniels
" _Typical concerns about the impact on software quality of adding C code to
Go do not apply to SQLite as it has an extraordinary degree of testing. The
quality of the code is exceptional._ "

It could be worth noting the thread concurrency limit imposed by cgo calls
locking the OS thread to the calling goroutine. For example, 128 concurrent
requests running long statements would take up 128 OS threads.

Of course, any language not using a lightweight threading model has this issue
anyway if each request is mapped to one thread, but it seems worth pointing
out that you may want to limit your own concurrency at the request level, or
by properly sizing a connection pool (like the one present in the sqlite
driver).

Edit: whatever the solution to concurrency is, benchmarking it is a good idea
for any sized product going into a production environment. There's a good
article about this by the people at Cockroach Labs:
[https://www.cockroachlabs.com/blog/the-cost-and-complexity-o...](https://www.cockroachlabs.com/blog/the-cost-and-complexity-of-cgo/) (see Cgoroutines != Goroutines)

~~~
le-mark
_like the one present in the sqlite driver_

Last time I checked mattn's lib, there was no connection pool, and my tests
supported this. Do you have other information?

~~~
erdaniels
David created his own at
[https://godoc.org/crawshaw.io/sqlite](https://godoc.org/crawshaw.io/sqlite)

------
hathawsh
This is a good article, but I feel like using Postgres instead of SQLite on
the server side would achieve his goals even better:

\- No need to link with C code. There's a pure go driver for Postgres.

\- You get the 'psql' command to inspect and change the database, even while
the web app is running.

\- You get a lot more database features.

\- You can keep the deployment simple by running Postgres on the web app
server until you need to scale up.

\- Conversion to a full blown cloud database and horizontal scaling is just a
matter of configuration.

I understand the desire to keep the stack as simple as possible, but it looks
to me like Postgres would actually be simpler than SQLite for the server side.

~~~
qznc
Can you deploy Postgres by simply linking it into your executable?

~~~
lmm
Yes, though it's a less common configuration and so a bit fiddly and
underdocumented.

~~~
ngrilly
As far as I know, there is no way to link PostgreSQL as a library in another
executable. At least, nothing supported by the official PostgreSQL project.
Even MySQL, which used to be available as an embedded library, got rid of this
feature a long time ago. Do you have any link to this "underdocumented"
feature?

------
chj
While reading this, please keep in mind it is aimed at indie developers, who
need to do all kinds of jobs. Chances are SQLite is the one thing everyone
must learn to use if your business involves building apps. So if you can use
it on the server side too, it saves a lot of needless learning. Same reason
Node.js got started.

Some may say, well, SQL servers are not that hard to learn. Well, that is
because you live with them day and night. Try going back after a year's
break.

------
cabaalis
> I claim typical services do not hit this scaling limit.

This seems to be the main push of the article, and I for one agree with that
statement.

------
justinholmes
I've done something similar to keep it small, with the potential to go
multi-region, for one of my new projects using Bitbucket and Google Cloud.

Bitbucket pipelines create an image of the app, create an instance template
with a container ("gcloud beta compute instance-templates create-with-
container"), create the instance group, DNS, a global load balancer, and SSL
certs, and perform a rolling update of the instance groups with a new version
(if applicable).

Using preemptible g1-small to keep costs down but deployed in at least two
zones per region.

~~~
ngrilly
Where do you store your data? Since you use at least two zones, how do you
keep your data synchronized between zones?

Do you use SQLite?

What do you mean by an "image of the app"? A Docker image or something else?

How does your rolling deploy work?

------
bvinc
Just today, I finished writing my own low level go sqlite driver. I haven't
published it yet. I really thought there was a need for it. I hate the
database/sql integration that the other drivers have. Now I'll have to see if
my driver has any need to exist.

This always seems to happen when I write something.

~~~
iovrthoughtthis
There is nothing wrong with implementing something someone else has already
done!

Usage of a library has more to do with its community than its codebase.

~~~
ckastner
> _There is nothing wrong with implementing something someone else has already
> done!_

Well, the phrase "re-inventing the wheel" exists for a reason.

> _Usage of a library has more to do with its community than its codebase._

The purpose of a library is code re-use.

For comparison, the Wikipedia page [1] for a software library does not even
mention communities, or the social aspect in general.

I'd say the community is a factor in the decision to use a particular library,
but it's not the driving factor.

[1]
[https://en.wikipedia.org/wiki/Library_(computing)](https://en.wikipedia.org/wiki/Library_\(computing\))

------
mikhailfranco
_Scalability! But at what COST?_

Single process performance and the COST principle, from the iconoclastic (some
might say bombastic) McSherry:

[1] blog
[http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...](http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html)

[2] pdf
[https://www.usenix.org/system/files/conference/hotos15/hotos...](https://www.usenix.org/system/files/conference/hotos15/hotos15-paper-mcsherry.pdf)

------
TooBrokeToBeg
Why is half the blog post about SQLite? Use SQLite, because everyone has or
will. Yea, got it.

------
gf263
I really enjoy the minimalism.

