
Storing C++ Objects in Distributed Memory - xiii1408
https://people.eecs.berkeley.edu/~brock/blog/storing_cpp_objects.php
======
maxxxxx
I think stuff like this is why C/C++ are so popular. You can push the language
really far to do interesting things. Obviously you can also shoot yourself in
the foot but you can also pretty much do whatever you come up with.

~~~
lostmsu
Actually, most of the stuff described here is implementable in any language, since
its public interface is just a container.

~~~
xiii1408
Dynamic languages, for sure: you just dynamically dispatch your method call
based on the type of the object in the container.

But what C++ is doing (and static languages with metaprogramming in general)
is specializing and statically inlining everything at compile time so the
assembly just takes the fastest path every time; no runtime cost.

I'm unsure which static languages do and do not support partial
specialization. My understanding is that the current iteration of Rust doesn't
have partial specialization, but maybe it has other features that could do
what this post is doing?
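For concreteness, here's a minimal sketch of the pattern in question, with made-up names (`serializer` here is illustrative, not the post's actual interface): the primary template handles trivially copyable types by byte copy, and a partial specialization takes over for every `std::vector<T>` while leaving `T` itself open, all resolved at compile time.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Primary template: byte-copy any trivially copyable type.
template <typename T>
struct serializer {
  static std::size_t size(const T&) { return sizeof(T); }
  static void write(const T& v, char* out) { std::memcpy(out, &v, sizeof(T)); }
};

// Partial specialization: matches std::vector<T> for *any* T, while
// leaving T open. The compiler selects this at compile time, so there
// is no runtime dispatch.
template <typename T>
struct serializer<std::vector<T>> {
  static std::size_t size(const std::vector<T>& v) {
    std::size_t n = sizeof(std::size_t);  // length prefix
    for (const T& e : v) n += serializer<T>::size(e);
    return n;
  }
  static void write(const std::vector<T>& v, char* out) {
    std::size_t len = v.size();
    std::memcpy(out, &len, sizeof(len));
    out += sizeof(len);
    for (const T& e : v) {
      serializer<T>::write(e, out);
      out += serializer<T>::size(e);
    }
  }
};
```

Rust's trait system covers part of this with blanket impls (e.g. `impl<T: Serialize> Serialize for Vec<T>`), but as of today there's no stable way to later override such an impl for a narrower family of types, which is roughly what the unstable `specialization` feature would add.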

I'd be interested in hearing people's perspectives on the metaprogramming
power of different statically typed languages - people have expressed interest
in me building libraries like this in Rust and a few other languages.

~~~
steveklabnik
I don't fully grok all of the details in the post, but yes, Rust does not have
specialization. However, given the way traits work, I don't think it would be
needed in Rust. Serde is the most popular serialization/deserialization
framework. You get a Serialize trait that you implement for a given type. It
already calls that specific implementation for that type.

Buuuut I'm probably missing something.

~~~
stochastic_monk
He specifically asked about partial specialization, which is akin to currying
for templates.

~~~
steveklabnik
Yeah, this is what I mean by that I don't 100% understand all the nuances. I
mean, I understand what that feature does, but given that Rust's system works
quite differently, I'm not 100% sure what an exact translation would be.

------
htfy96
This was already in broad use at least 10 years ago. Interestingly, modern C++
techniques tend to be ignored by HN folks, while old material [0][1][2][3]
generally gains more points.

[0] this thread

[1]
[https://news.ycombinator.com/item?id=18650902](https://news.ycombinator.com/item?id=18650902)

[2]
[https://news.ycombinator.com/item?id=16257216](https://news.ycombinator.com/item?id=16257216)

[3]
[https://news.ycombinator.com/item?id=18281574](https://news.ycombinator.com/item?id=18281574)

~~~
vortico
Perhaps this is because modern language designers gradually solve more niche
problems as time progresses, while the rest of the world is still catching up
to pre-2000 concepts. The '90s were a great time for new strides in language
design and memory management, but now we're only making marginal improvements.
People will vote for things that teach them something new.

------
fredsanford
We were doing this stuff in 1993 with OS/2, Rogue Wave libs and SmartHeap. It
was a LOT of work and exposed many a buggy motherboard.

We had to build everything ourselves except for the STL-like features of RW
Tools.h++ and a few of their other libs.

1990s C++ compilers were ... interesting ... - especially for templates.

~~~
pjmlp
Yeah, writing portable C++ code across UNIX flavours, OS/2, and Windows was
indeed interesting, even when using plain old C.

Have you used SOM as well?

~~~
fredsanford
>> Have you used SOM as well?

We tried a few things with it. It was a good idea that took too much manual
work. Much like CraptiveX/COM but without the toolworks.

------
slaymaker1907
Distributed data structures is an interesting idea, but I’m not sure this
particular implementation is the right abstraction level for many problems.

These sorts of things form a spectrum, from low-level message passing all the
way to a full-fledged database. Some things in between are abstractions like
MapReduce, which still gives a lot of control but also gives you fault
tolerance and requires less configuration of your cluster to run different
applications.

Other abstractions in this space, just a bit higher than ordinary message
passing, are systems like Kafka and RabbitMQ, which give additional guarantees
over plain message passing using sockets, as well as requiring less
configuration to set up/remove machines.

One thing a lot of the more successful abstractions in this space seem to have
is a pretty clear line between which parts of the system are local and which
are distributed. It seems like this system doesn’t have as clear a
demarcation, which would make understanding performance much more difficult.
Databases can also have this problem since they do a lot of low level
performance tuning automatically, but at least they provide a very high level
of abstraction for applications to work with. This seems like it could end up
being confusing to tune without providing a nice high-level interface like
databases.

~~~
xiii1408
This post is really just about an abstraction for storing objects and doesn't
explain how you'd make data structures that distribute objects across nodes
using remote pointers.

Remote pointers end up being pretty nice for locality, since you can
explicitly see what process a remote pointer is pointing to.

None of the backends that my remote pointer implementations depend on really
offer fault tolerance, though. So you're right: if you're in a situation where
you have nodes entering or leaving your cluster before your program ends, this
model needs extending before it works for you. That's true for MPI-style
programs, generally, though.

------
slaymaker1907
Another option for variable size types not explored in the article is to set
aside some contiguous part of memory for these objects. The main idea is to
store pointers as relative addresses from the start of the memory region.

You can then avoid doing any copying or serialization by directly sending the
entire memory region as a unit.

The main difficulties with this approach are deciding how large to make these
regions and handling frees within a region. The first part is often not too
bad, depending on the application, but handling deallocation and
expanding/shrinking memory regions can be tricky. The allocation problem is
harder to dismiss: it has to be solved to avoid making the user decide up
front how much memory to allocate, which is impossible for many applications.
For instance, when dealing with strings, it may be impossible to get a good
bound on their size.
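A toy version of this idea, under the assumption of a single bump-allocated region (all names made up): internal "pointers" are byte offsets from the region base, so the buffer can be copied to another node and every link still resolves.

```cpp
#include <cstddef>
#include <vector>

// A bump-allocated region whose internal links are offsets from the
// region base rather than absolute addresses.
struct Region {
  std::vector<char> buf;

  Region() { buf.resize(8); }  // burn offset 0 so it can mean "null"

  // Allocate n bytes (8-byte aligned), returning the allocation's offset.
  std::size_t alloc(std::size_t n) {
    std::size_t off = (buf.size() + 7) & ~std::size_t(7);
    buf.resize(off + n);
    return off;
  }

  // Turn an offset back into a real pointer. Only valid until the next
  // alloc(), since the underlying vector may reallocate.
  template <typename T>
  T* resolve(std::size_t off) {
    return reinterpret_cast<T*>(buf.data() + off);
  }
};

// A linked-list node stored entirely inside a Region.
struct Node {
  long value;
  std::size_t next;  // offset of the next node; 0 means "end of list"
};
```

Because `buf` contains no absolute addresses, it can be sent over the wire as a unit and the receiver resolves the same offsets against its own copy of the bytes. A production version would still need to handle deallocation and region growth, which is exactly where it gets tricky.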

~~~
amelius
> The main idea is to store pointers as relative addresses from the start of
> the memory region.

One problem is that languages don't support the use of standard container
types inside a region with a variable base address (variable because it will
change the next time you call mmap).

I think modern languages should support this concept, especially the ones that
aim to be "systems" programming languages.

~~~
Koshkin
> _mmap_

Funny you should mention that. Something keeps me thinking that modern
operating systems should have built-in support for cross-node sharing of
virtual memory via mmap.

~~~
xiii1408
Some cool people at VMware wrote some software that lets you manage RDMA
segments as files in the file system [1]. And you can mmap those RDMA-resident
files if you wish.

[1] [https://256fd102-a-62cb3a1a-s-sites.googlegroups.com/site/mk...](https://256fd102-a-62cb3a1a-s-sites.googlegroups.com/site/mkaguilera/remote-regions-atc2018.pdf)

------
lostmsu
It's a weird article in that it does not mention at all that for remote
containers it's more beneficial to expose and use range queries. E.g., you
don't ever want to access a remote list element by element, due to latency. You
will be much better off with chunked enumeration.
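To put a rough shape on that, a toy model (all names invented) where each fetch counts as one network round trip; pulling a chunk per trip divides the trip count by the chunk size.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy stand-in for a remote container: each fetch* call models one
// network round trip, counted in `trips`.
struct RemoteVector {
  std::vector<int> data;  // pretend this lives on another node
  mutable std::size_t trips = 0;

  std::size_t size() const { return data.size(); }

  int fetch(std::size_t i) const {  // one element per round trip
    ++trips;
    return data[i];
  }
  // Many elements per round trip.
  std::vector<int> fetch_range(std::size_t lo, std::size_t hi) const {
    ++trips;
    return std::vector<int>(data.begin() + lo, data.begin() + hi);
  }
};

// Chunked enumeration: ceil(size()/chunk) round trips instead of size().
long long sum_chunked(const RemoteVector& v, std::size_t chunk) {
  long long total = 0;
  for (std::size_t lo = 0; lo < v.size(); lo += chunk) {
    for (int x : v.fetch_range(lo, std::min(lo + chunk, v.size())))
      total += x;
  }
  return total;
}
```

At a microsecond or two per RDMA round trip, enumerating a million-element remote list one element at a time costs on the order of a second, while 1024-element chunks bring the round-trip cost down to roughly a millisecond.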

~~~
loeg
Sure. Most of the time you'd be better off with an explicit endpoint and
protocol, using something like protobufs or Cap'n Proto (which provide both
serialization of objects and a relatively easy-to-use RPC interface for remote
calls).

Making remote accesses explicit rather than implicit makes it a little more
obvious to code readers how grossly expensive they will be.

~~~
lostmsu
To be fair, the collection can internally do prefetching and write
accumulation.
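A sketch of what that write accumulation could look like (interface invented for illustration): assignments are buffered locally and shipped as a batch, so the round-trip count scales with the number of flushes rather than the number of writes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Buffers remote writes locally and flushes them as one batch.
struct BatchingWriter {
  std::vector<std::pair<std::size_t, int>> pending;  // (index, value)
  std::size_t flushes = 0;       // models round trips
  std::vector<int> remote;       // stand-in for the remote memory

  explicit BatchingWriter(std::size_t n) : remote(n) {}

  void put(std::size_t i, int v) {
    pending.emplace_back(i, v);
    if (pending.size() >= 256) flush();  // threshold is arbitrary
  }

  void flush() {
    if (pending.empty()) return;
    ++flushes;                   // one round trip for the whole batch
    for (auto& [i, v] : pending) remote[i] = v;
    pending.clear();
  }
};
```

The trade-off is that a reader won't observe a write until it has been flushed, which is exactly the predictability question raised in the replies.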

~~~
loeg
I'm not sure adding black boxes really helps design reliable/performant
distributed systems :-).

~~~
lostmsu
Everyone uses the semi-transparent caches in CPUs. If the behavior is
predictable, I don't see a problem.

~~~
loeg
CPU caches are good about correct cache invalidation (and must be). Software
caches aren't always. Software caches don't always do a great job of bounding
cache size, either.

Today's multi-core systems are really distributed systems, and the CPU cache
does not hide that well enough that OS software doesn't have to work around
it.

For example, kernels have to send IPIs (interprocessor interrupts) to other
CPU cores to get them to perform some actions; cores sharing a specific cache
line, or a memory address that happens to alias to the same line in the cache,
can create a 'ping-pong' effect that can severely degrade performance; and L3
cache latency can be non-uniform depending on how far away the cache is from a
particular core. The black box is breaking down as core counts increase.
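That ping-pong effect is easy to reproduce. A hypothetical micro-benchmark (not from the thread): two threads each increment their own counter, but in `Shared` both counters occupy the same 64-byte cache line, so the line bounces between cores; `Padded` gives each counter a line to itself.

```cpp
#include <atomic>
#include <thread>

struct Shared {  // a and b land in the same 64-byte cache line
  std::atomic<long> a{0};
  std::atomic<long> b{0};
};

struct Padded {  // each counter gets a cache line to itself
  alignas(64) std::atomic<long> a{0};
  alignas(64) std::atomic<long> b{0};
};

// Two threads hammer logically independent counters. With Shared, the
// line holding both counters ping-pongs between cores; with Padded it
// doesn't, and the loop typically runs several times faster.
template <typename T>
void hammer(T& s, long iters) {
  std::thread t1([&] {
    for (long i = 0; i < iters; ++i) s.a.fetch_add(1, std::memory_order_relaxed);
  });
  std::thread t2([&] {
    for (long i = 0; i < iters; ++i) s.b.fetch_add(1, std::memory_order_relaxed);
  });
  t1.join();
  t2.join();
}
```

Both versions compute the same result; only the timing differs, which is the point about black boxes: the program is correct either way, but the performance cliff is invisible in the source.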

------
Koshkin

      template <>
      struct serialize<int> {
        int serialize(const int& value) {
          return value;
        }

        int deserialize(const int& value) {
          return value;
        }
      };

      error: return type specification for constructor invalid

~~~
xiii1408
Yep, you caught my typo. Notice the GodBolt version has serialize_().

------
j4ah4n
Well off topic, but does anyone know what blog engine/style this is done in?

~~~
xiii1408
It's just CSS based on the LaTeX defaults, written by Andrew Belt as a style
for Wikipedia [1,2].

[1] [https://github.com/AndrewBelt/WiTeX](https://github.com/AndrewBelt/WiTeX)

[2] [https://andrewbelt.name](https://andrewbelt.name)

