
TidalScale, a hypervisor that allows a single OS to run across multiple servers - corv
https://www.tidalscale.com/
======
withjive
Not a fan of "magic" scaling across multiple servers. I'm not sure I would get
the amount of control needed to squeeze the most performance out of this "OS"
(edit: in terms of data locality for "big computation").

I am using Terracotta for a high performance JVM memory heap across a cluster.
I need control and performance.

I wonder how this compares...

Personally, I'd rather stitch my cluster together myself than rely on a
monolithic "OS" hypervisor.

------
2trill2spill
This video from the Bay Area FreeBSD User Group, June 2015, does a decent job
of explaining how TidalScale implements their hypervisor; spoiler: it's based
on FreeBSD and bhyve.

[https://www.youtube.com/watch?v=f-ug6B6ycng](https://www.youtube.com/watch?v=f-ug6B6ycng)

------
nl
So.

It's easy to dismiss this: the fallacies of distributed computing, etc. etc.

But something happened in the last 10 years that many people haven't really
thought through: the network became faster than (most spinning) disks. Reading
1 MB from a spinning disk is roughly as slow as reading 1 MB from the network
in the same data center.
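
A rough back-of-envelope sketch of that comparison (the seek, round-trip and
throughput figures below are ballpark assumptions for a 7200 RPM disk and a
1 GbE link, not measurements):

    # Assumed ballpark figures: ~10 ms seek + ~150 MB/s streaming for a
    # spinning disk, ~0.5 ms in-DC round trip + ~125 MB/s for gigabit Ethernet.
    MB = 1_000_000

    disk_seek_s, disk_bw = 10e-3, 150 * MB   # spinning disk
    net_rtt_s, net_bw = 0.5e-3, 125 * MB     # 1 GbE, same data center

    disk_read_s = disk_seek_s + MB / disk_bw   # ~17 ms
    net_read_s = net_rtt_s + MB / net_bw       # ~8.5 ms

    print(f"1 MB from disk:    {disk_read_s * 1000:.1f} ms")
    print(f"1 MB from network: {net_read_s * 1000:.1f} ms")

Same order of magnitude, which is the point.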

That makes you consider exactly what a "computer" is. A "supercomputer" is
little more than multiple traditional computers with very heavily optimized
network interconnects. But now that everyone's interconnects are just as fast,
why should we consider a cluster of computers any different?

Yes, SSDs change this (especially M.2!). But view the SSD -> Network
relationship as we used to view the RAM -> Disk relationship.

I think it's worth rethinking some of the traditional boundaries of what "a
computer" is.

~~~
danpalmer
True; however, this is competing with supercomputers, which are quite a
special case.

It's difficult to define a supercomputer, but essentially a defining factor is
being able to read data from the _memory_ (not spinning/SSD disk) across the
network at the same speed as if it were local. This is typically done with
specialised networking (like InfiniBand) that can reach nearly 100 Gb/s,
something that we are still quite far away from on 'commodity hardware'.

I think this company must be doing something very different, or have made some
sort of breakthrough, that allows them to do what they're claiming. I don't
think the network becoming faster and having SSDs is what has enabled this, at
least not for the most part.

------
dkarapetyan
The About page is pretty impressive. I understand how memory and storage can
be clustered this way; I guess it's just like virtual memory, with another
layer of indirection via their hypervisor. What I don't understand is how you
can partition computation the same way and not have performance tank.
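
For the memory half, here is a toy sketch of that extra layer of indirection,
in the spirit of demand paging (purely illustrative; the class names and the
dict-backed "remote node" are made up, and a real hypervisor would do this
transparently at the hardware page level):

    PAGE_SIZE = 4096

    class RemoteNode:
        """Stands in for another server that owns part of the address space."""
        def __init__(self):
            self.pages = {}

        def fetch(self, page_no):
            # In reality this would be a network round trip, not a dict lookup.
            return self.pages.get(page_no, bytearray(PAGE_SIZE))

    class DistributedMemory:
        def __init__(self, remote):
            self.local = {}      # locally cached pages
            self.remote = remote

        def read(self, addr):
            page_no, offset = divmod(addr, PAGE_SIZE)
            if page_no not in self.local:            # the "page fault"
                self.local[page_no] = self.remote.fetch(page_no)
            return self.local[page_no][offset]

    mem = DistributedMemory(RemoteNode())
    print(mem.read(123456))  # first touch pulls the whole page "over the wire"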

~~~
lotyrin
Answer: you can't.

Something like MOSIX (an OS-level ability to ship processes, their I/O, etc.
across the network), being less indirect, should theoretically outperform this
sort of thing, but people don't even deploy those very often.

It's almost always harder to write apps that decompose explicitly into
distributed pieces, but still not enough harder to justify avoiding it.

Done implicitly, you take on risks: either your whole VM dies if any physical
member dies, or you eat the performance penalty of some kind of application-
naive consensus, or you accept some other hideous compromise. There are no
good answers, and you get what you pay for.

~~~
dh997
Computing's RAID0 with worse performance and worse reliability.

With traditional enterprise hypervisors, boxes are sized for least TCO and to
be large enough to comfortably hold the largest VM... and then the farm is
usually sized to have at least N+k (k >= 1) capacity, to avoid having to kill
off VMs in order to perform maintenance or being trapped without spare
capacity for new VMs. Critical VMs use lock-stepped mirroring on multiple
physical boxes to avoid downtime from hardware failure. Furthermore, most
enterprise hypervisors allow migrating both storage (disk images) and compute
(the running VM's state) to different hardware (hardware failure
notwithstanding).

If folks are trying to vertically scale one app or one system to become a
giant box without using explicit parallelism, it's going to be slow and
painful.

(As a teenager, I once ran an actual Fortran nuclear reactor simulator on a
Windows beige box, and later played with making code performant over
InfiniBand using OFED... don't ever inherit two brands of gear and expect them
to work together.)

------
winestock
How does this compare to the clustering capabilities of VMS? If I remember
correctly, a VMS cluster can make several physical computers behave like one
logical computer with multiple CPUs and disk packs.

~~~
drudru11
VMS did not share memory across the cluster.

------
ris
Hmm. Proprietary "magic".

"Come, base your entire infrastructure on our hypervisor..."

------
poelzi
So, it is basically for people who don't want to, or can't, distribute their
work using proper data modeling and multiple distributed processes.

Don't get me wrong, there may be use-cases where this is actually the best
approach. But having one OS with many, many NUMA nodes (which is how I guess
the OS sees the cluster) is also very hard to get right. You need to program
NUMA-aware and use local memory where possible; all kinds of locks will become
very, very slow, so you need to program lock-free where possible or have some
sort of transactional memory. Anyway, I would say the gain you get from using
threads will most likely be outweighed by the complexity of the memory
management and data structures needed to reach good performance.

Message passing is much easier to program and allows easy encapsulation of
modules and easier testing.
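
For what it's worth, a minimal message-passing sketch using only the Python
standard library (nothing TidalScale-specific; the queue names and the
squaring workload are just for illustration): each worker only sees the
messages sent to it, so there is no shared memory to keep coherent.

    from multiprocessing import Process, Queue

    def worker(inbox, outbox):
        # Receive work items until the None sentinel arrives.
        for item in iter(inbox.get, None):
            outbox.put(item * item)

    if __name__ == "__main__":
        inbox, outbox = Queue(), Queue()
        p = Process(target=worker, args=(inbox, outbox))
        p.start()
        for i in range(5):
            inbox.put(i)
        inbox.put(None)                      # tell the worker to stop
        results = [outbox.get() for _ in range(5)]
        p.join()
        print(sorted(results))               # [0, 1, 4, 9, 16]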

Cool project nonetheless :)

------
rdl
I don't think I'd use this for my core application (where it's worth building
application-specific distribution), but it would be nice to have an option in
the toolbox for other/internal/non-core apps which have grown beyond a single
machine.

(I met the founder at a conference last year where we were both speaking; I
was suspicious of the whole model (because it sounds like magic), but talked
with them a bit more and it seems legit for a lot of workloads.)

------
dh997
Having been an HPC sysadmin pre- and post-cloud, I've seen this unicorn come
around about every ten years... Many previous startups failed because they
tried to push expensive custom hardware with fancy interconnects, but most
importantly, there isn't a need, because scaling apps horizontally instead of
vertically is usually more powerful, cheaper and more reliable, as evidenced
by Google's dominance. Plus, Rocks Cluster, Hadoop, Kafka, Kinesis, iPython,
Erlang/OTP, Akka and more already make this sort of thing moot, because
deploying a resilient cluster hasn't been as impressive as summiting Mt.
Everest for a decade now. It feels like trying to sell a magical product
(cluster resource management) to get a major advance for free (a single
system image). Maybe they can make the file system, RAM, network devices,
packets, firewall rules and everything else magically work, but fault handling
and latency are big concerns that rapidly increase in complexity, orthogonal
to anticipating how to most efficiently schedule what a process wants done,
when it would be simpler to code for parallelism (MapReduce, actors) from the
beginning.
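
To make that last point concrete, a tiny MapReduce-shaped word count in plain
Python (the chunking and pool size are arbitrary illustrations): the parallel
structure is explicit, so nothing has to guess at data locality behind the
scenes.

    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_count(chunk):
        # Map phase: count words within one independent chunk.
        return Counter(chunk.split())

    def merge(a, b):
        # Reduce phase: combine partial counts.
        a.update(b)
        return a

    if __name__ == "__main__":
        chunks = ["the quick brown fox", "the lazy dog", "the fox again"]
        with Pool(3) as pool:
            partials = pool.map(map_count, chunks)
        totals = reduce(merge, partials, Counter())
        print(totals.most_common(2))   # [('the', 3), ('fox', 2)]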

~~~
sz4kerto
> there isn't a need because scaling apps horizontally instead of vertically
> is usually more powerful

No, it isn't more powerful per se. There are many problems that simply cannot
be scaled horizontally at all because of the data exchange pattern between
nodes. However, the infrastructure/hardware behind horizontal scaling is
simple, and it's basically unlimited.

~~~
taneq
If these problems can't explicitly be scaled horizontally by the developer,
why would this system be able to scale them horizontally by doing it behind
the scenes? Any limitations of data exchange between nodes would still apply.

~~~
dh997
Most of these single-system-image clusters have used custom, fast
interconnects akin to InfiniBand or HyperTransport to ameliorate locality
issues (latency and bandwidth)... Without such interconnects, it's nigh
impossible unless you pin all resources for a "process" to one physical box.

Good luck ever getting more than one box's worth of performance or resources
out of a single application instance.

Hence MapReduce, actors, promises, message-passing, etc.

------
ctstover
Why does this make me think of RAM doublers from the win 3.1 era?

~~~
slang800
A link for those who aren't familiar:
[http://www.ambrosiasw.com/Ambrosia_Times/January_96/3.1HowTo.html](http://www.ambrosiasw.com/Ambrosia_Times/January_96/3.1HowTo.html)

------
cdvonstinkpot
ScaleMP does this, too. Not for the faint of wallet, even for the free edition
considering you have to use approved hardware...

[http://www.scalemp.com/products/vsmp-foundation-free/](http://www.scalemp.com/products/vsmp-foundation-free/)

~~~
alilee
ScaleMP has been around for a long time. It would be good to get an update or
understand the distinction between the two products.

------
gaze
MOSIX is still a terrible idea.

------
jjoe
Is this NUMA orchestration?

~~~
seabrookmx
That was my thought as well. Like NUMA but over the network.

------
jupp0r
I don't see how this is relevant when you can buy servers with 3TB of RAM off
the shelf.

