
An Architecture for Millions of Things - jswny
http://blog.cityboundsim.com/an-architecture-for-millions-of-things/
======
wyager
Interesting project. A few pedantic points:

> let's have a look at the very few CPU cache prediction rules:

Modern high-end CPUs have incredibly powerful branch and cache prediction
machines that use _much_ more complicated rules than the 3 listed here.

Additionally, the actor model is _horrible_ for cache locality. Actors only
run for short bursts, so both the instruction and data caches are constantly
getting blown out by context switches (even lightweight context switches that
don't involve the OS). By their very nature, actors are focused on different
things. The "loop over lots of objects" paradigm would almost certainly
deliver higher cache performance, at the very least because you're running the
same code over and over again.

Edit: that's not to say that the actor model gives worse performance in
general - just for certain workloads.
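To illustrate the "loop over lots of objects" paradigm being contrasted here, a minimal sketch (my own, not from the article; Python won't exhibit the actual cache effect, the point is only the data layout and loop shape): one homogeneous array per field, with the same tight code run over every element, giving the predictable access pattern that caches and prefetchers reward.

```python
# Struct-of-arrays layout: one flat homogeneous array per field,
# instead of one heap object per simulated thing.
N = 100_000
positions = [0.0] * N
velocities = [1.0] * N

def step(dt):
    # Same code over every element, sequential access - the loop shape
    # that lets branch predictors and hardware prefetchers do their job.
    for i in range(N):
        positions[i] += velocities[i] * dt

step(0.016)
print(positions[0])  # -> 0.016
```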

~~~
jdmichal
The author clarified on Reddit [0] that the lists are separated per type, so a
list will only contain data for a single type of actor. The actors are likely
looping over those lists and processing each entry, which should look to the
CPU like any other array-processing code and optimize pretty well. Calling it
the actor model is a bit deceptive, since actors would typically be isolated.
Instead, it seems like the "big idea" here is simply to separate different
data types into different lists and process them with different threads, using
message queues as the concurrency mechanism.

[0]
[https://www.reddit.com/r/programming/comments/4sp25q/xpost_r...](https://www.reddit.com/r/programming/comments/4sp25q/xpost_rcitybound_an_architecture_for_millions_of/d5b8gsk)

~~~
anselm_eickhoff
Exactly, and messages are divided into buckets, each of which contains only
one (message type, receiver type) combination. If you then iterate over such a
bucket, you are going to execute very similar code repeatedly.

I guess my epiphany was that exactly because actors are isolated, I could
manage them like this and do some very primitive fast array iteration, while
still using a nice high-level pattern to implement the actual game logic.
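A hedged sketch of the bucketing idea described above (the names and shapes are my own illustration, not Citybound's actual Rust implementation): messages are grouped into buckets keyed by (message type, receiver type), so draining one bucket runs the same handler over a homogeneous batch.

```python
from collections import defaultdict

# (message type, receiver type) -> list of pending message payloads
buckets = defaultdict(list)

def send(msg_type, recv_type, payload):
    buckets[(msg_type, recv_type)].append(payload)

def drain(handlers):
    # One tight loop per bucket: the same handler code executed
    # repeatedly over a homogeneous batch of messages.
    for key, msgs in buckets.items():
        handler = handlers[key]
        for payload in msgs:
            handler(payload)
        msgs.clear()

# Usage: all "Tick" messages addressed to "Car" actors land in one
# bucket and are processed in one batch.
ticks_seen = []
send("Tick", "Car", {"car_id": 1, "dt": 0.1})
send("Tick", "Car", {"car_id": 2, "dt": 0.1})
drain({("Tick", "Car"): ticks_seen.append})
print(len(ticks_seen))  # -> 2
```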

------
Animats
OK, so someone is writing a city simulator/game, with millions of actors.
Fine. They're not very far along yet; there don't seem to be any demo images.
Yet they're obsessing on CPU cache management. This seems to be premature
optimization.

Their big problem is making a procedural city generator that produces a
convincing city at street level. There are lots of floorplan generators and
things which produce vaguely reasonable buildings, but nothing good enough to
make a video-game-grade drivable or walkable city at street level.
Unless they crack that problem, their game is going to suck.

~~~
yetihehe
If you need to do something that was never done before because of efficiency
problems, you need to think about how to do it before you start. That's called
planning, not premature optimization. As the author said at the beginning,
just optimizing existing approaches won't accomplish anything here.

~~~
ggambetta
It is a bit premature in a way; paraphrasing here, but _"a million is not
cool. You know what's cool? A billion."_ No matter how much you optimise CPU
caches and memory allocators, at some point your application won't fit on a
single machine, and you'll have to go distributed. Datacenters, not mainframes
:)

~~~
anselm_eickhoff
But if you're already using message passing, this transition will be much more
natural!

------
josephg
The author describes performance problems with existing heap (malloc)
implementations, then describes their own memory allocation scheme like this:

- things of roughly similar total chunk sizes (constant + dynamic part) are
stored in one "bucket", laid out contiguously in fixed-size slots, with a
little leeway for growth inside

- there are buckets for each existing size of a thing

- if a thing outgrows the slot size of its bucket, it simply migrates to the
bucket with the next bigger slot size

Isn't this exactly how many (most?) modern malloc implementations work? ... A
series of buckets for different sizes of objects, combined with bitfields or
free lists to keep track of which slots in each bucket are free?
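For concreteness, a minimal sketch of the size-class scheme josephg describes (typical of modern mallocs in spirit; the details here are illustrative, not those of any specific allocator): each bucket holds fixed-size slots plus a free list of vacant slot indices, and an allocation rounds up to the smallest size class that fits.

```python
SLOT_SIZES = [16, 32, 64, 128]  # size classes, smallest to largest

class Bucket:
    def __init__(self, slot_size):
        self.slot_size = slot_size
        self.slots = []   # slot storage (index stands in for an address)
        self.free = []    # free list of vacant slot indices

    def alloc(self):
        if self.free:
            return self.free.pop()   # reuse a vacated slot
        self.slots.append(None)      # otherwise grow the bucket
        return len(self.slots) - 1

    def dealloc(self, idx):
        self.slots[idx] = None
        self.free.append(idx)

def bucket_for(size, buckets):
    # Round up to the smallest size class that fits the request.
    for b in buckets:
        if size <= b.slot_size:
            return b
    raise ValueError("size too large for any bucket")

buckets = [Bucket(s) for s in SLOT_SIZES]
b = bucket_for(40, buckets)
print(b.slot_size)       # -> 64
i = b.alloc()
b.dealloc(i)
print(b.alloc() == i)    # -> True (slot reused via the free list)
```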

~~~
anselm_eickhoff
(author) I made the mistake of writing a blog post targeted at non-
programmers, leaving out a lot of essential detail - for example, that buckets
only contain one kind of actor or message, and that because of indirect
references they can be deallocated with a simple swap-with-last, so no free
lists are needed.
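A sketch of the swap-with-last trick mentioned above (my own illustrative version, not the actual implementation): actors are addressed through an indirection table of stable IDs, so removing one can move the last element into the hole and patch the table - the bucket stays densely packed and no free list is needed.

```python
class Bucket:
    def __init__(self):
        self.data = []       # densely packed actor states
        self.ids = []        # ids[i] = stable id of data[i]
        self.index_of = {}   # stable id -> current slot (the indirection)

    def add(self, actor_id, state):
        self.index_of[actor_id] = len(self.data)
        self.data.append(state)
        self.ids.append(actor_id)

    def remove(self, actor_id):
        i = self.index_of.pop(actor_id)
        last = len(self.data) - 1
        if i != last:
            # Move the last element into the hole and fix its index,
            # so the array stays contiguous with no free list.
            self.data[i] = self.data[last]
            moved_id = self.ids[last]
            self.ids[i] = moved_id
            self.index_of[moved_id] = i
        self.data.pop()
        self.ids.pop()

b = Bucket()
b.add("a", 1); b.add("b", 2); b.add("c", 3)
b.remove("a")             # "c" is swapped into slot 0
print(b.data)             # -> [3, 2]
print(b.index_of["c"])    # -> 0
```

Since other actors hold only the stable ID, the move is invisible to them: lookups go through `index_of` and keep working after the swap.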

------
dharma1
This is a bit further along: [https://improbable.io/](https://improbable.io/)

~~~
ggambetta
And specifically about city sim: [https://improbable.io/2016/03/17/disrupting-cities-through-t...](https://improbable.io/2016/03/17/disrupting-cities-through-technology-a-new-event-with-wilton-park)

Disclaimer: I work at Improbable

~~~
Dzugaru
That's an impressive thing you're building.

I can't find whether you're using GPGPU for (maybe some simple, but not
complex?) workers or not.

~~~
ggambetta
We offer a C++ SDK that would allow anyone to integrate anything as a worker,
as long as it speaks a simple protocol - essentially, receiving updates from
the platform, and sending updates to the platform. It would be completely
viable to use a GPU to do some intensive computations and send the results
back as updates.
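A hypothetical sketch of the worker-protocol shape described above (this is not the actual SpatialOS SDK; every name here is made up for illustration): a worker repeatedly receives state updates from the platform, computes on them, and sends updates back.

```python
def run_worker(receive_update, send_update, compute):
    # receive_update() -> dict or None when the stream ends;
    # compute(dict) -> dict; send_update(dict) pushes a result back.
    while True:
        update = receive_update()
        if update is None:        # platform closed the stream
            break
        send_update(compute(update))

# Usage with in-memory stand-ins for the platform connection;
# compute() could just as well dispatch work to a GPU.
inbox = [{"entity": 1, "x": 0.0}, {"entity": 2, "x": 1.0}, None]
outbox = []
run_worker(lambda: inbox.pop(0),
           outbox.append,
           lambda u: {**u, "x": u["x"] + 0.5})
print(outbox)  # -> [{'entity': 1, 'x': 0.5}, {'entity': 2, 'x': 1.5}]
```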

Here's an example of a custom worker:
[https://improbable.io/2016/04/21/create-custom-flocking-work...](https://improbable.io/2016/04/21/create-custom-flocking-worker-spatialos)

~~~
Dzugaru
Awesome, thanks!

