I worked on a tool that processes a large-ish dataset using only data structures in memory, since this is much faster and simpler than using a system like Spark, for example. Not only that, the nature of the processing algorithm (a reduce, in effect) makes it kind of pointless to run on a cluster of nodes.
The dataset is 2-3B records with 5-12 64-bit values each, stored in a few dozen files in the Apache Arrow format. Taking the midpoint of both ranges, that's about 170 GB of raw data alone. With the overhead of data structures, I ran the process with ~400 GiB of RAM and could have used more on a beefier machine.
It took about 20-30 minutes to run the full algorithm on these tens of billions of data points and this approach was perfect for this use case. No overhead of Spark and all of its dependencies, just one program, a bunch of input files, and it's done when I get back from lunch.
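The pattern described above (one program, a streaming reduce over record batches) can be sketched as follows. This is only an illustration, not the author's code: the batches here are plain Python lists of tuples, whereas the real tool would read record batches from Arrow files (e.g. with pyarrow).

```python
from collections import defaultdict

def reduce_records(batches):
    """Accumulate a per-key sum and count across all record batches,
    then do one final pass to compute the per-key mean.

    Each batch is an iterable of (key, value) pairs; in a real tool
    the batches would come from reading the Arrow files. This is a
    hypothetical sketch of the reduce shape, not the actual algorithm.
    """
    acc = defaultdict(lambda: [0, 0])  # key -> [running sum, count]
    for batch in batches:              # one batch per chunk of a file
        for key, value in batch:
            slot = acc[key]
            slot[0] += value
            slot[1] += 1
    return {k: s / n for k, (s, n) in acc.items()}

# Tiny usage example: two "files" worth of records
result = reduce_records([[(1, 10), (2, 4)], [(1, 20)]])
```

Because the accumulator is a single in-process hash map, there is no serialization, shuffling, or cluster coordination anywhere, which is where the speed and simplicity relative to Spark come from.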
Curious how it would have performed if you loaded it all into a SQLite database and ran a SQL query instead?
If the B-tree structures used by SQLite were small enough to fit in memory, it should still be fast, I assume.
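For what it's worth, the experiment is easy to sketch with Python's built-in sqlite3 module and an in-memory database, where the whole B-tree lives in RAM. The table name and columns below are made up for illustration:

```python
import sqlite3

# ":memory:" keeps the entire database (pages, B-trees) in RAM,
# so a full-scan aggregate is CPU-bound rather than disk-bound.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE records (key INTEGER, v1 INTEGER, v2 INTEGER)")
con.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(1, 10, 100), (1, 20, 200), (2, 4, 40)],  # stand-in for the real data
)

# The reduce expressed as a GROUP BY aggregate
rows = con.execute(
    "SELECT key, SUM(v1), AVG(v2) FROM records GROUP BY key ORDER BY key"
).fetchall()
```

Whether it beats the hand-rolled in-memory reduce at billions of rows is an open question: SQLite would add per-row interpretation overhead in its bytecode VM, and just inserting 2-3B rows is itself a substantial cost.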
This is just hitting the market, so it will be interesting to see where it goes.
If the vendors were ready for something like this on the software side, it would be great for edge compute where low-latency response is required: a remote utility substation handling and reacting to a large array of sensors, each feeding 60 data points per second. In some use cases a round trip to the control centre and back would be too slow to be useful. Basic grid control is well handled already, but I could see optimizations benefiting from this. Vendors and utilities are way behind on this, though.
That will only help if every NIC has about 20 TX queues. If the NIC can't spread work across the cores, or the driver or app can't, then all those cores won't help.
Oracle Cloud provides 4 Altra cores, 24GB of RAM, and 200GB of storage for free (supposedly indefinitely). I use it for a Minecraft server. It handles ~15 players with a half-dozen plugins without players complaining about any lag. I only use 4GB of the RAM because of Java's Garbage Collector, and Minecraft is heavily single-threaded so I'm probably not using all the cores very effectively, but it's free and it works.
EDIT: And encoding is limited to ~16 cores in practice. Beyond that, the communication between threads seems to cost more than the extra cores gain. Unless you plan to run five simultaneous encodes at a time, you're going to have to find something else to do with all those cores.
Is there a reason that 175W of processing power on 80 small cores at 2 GHz would be faster than, for example, an AMD EPYC 7F32, which has a similar TDP of 180W and 8 cores with 2 threads each that run at ~4 GHz?
Naively, assuming identical instruction sets (I know they're not), 16 threads at 4 GHz is less than half as good as 80 at 2 GHz. But that can't be the whole story.
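The naive comparison can be made explicit. This is a back-of-the-envelope aggregate-clock figure only; it deliberately ignores IPC, SIMD width, cache, and memory bandwidth, which is why it "can't be the whole story":

```python
# Aggregate clock: hardware threads x clock speed (GHz-threads)
epyc_threads, epyc_ghz = 16, 4.0    # AMD EPYC 7F32: 8 cores x 2 threads, ~4 GHz
altra_threads, altra_ghz = 80, 2.0  # 80 small cores at 2 GHz

epyc_total = epyc_threads * epyc_ghz     # 64 GHz-threads
altra_total = altra_threads * altra_ghz  # 160 GHz-threads
ratio = epyc_total / altra_total         # 0.4, i.e. "less than half as good"
```

In practice the EPYC's wider SIMD units and higher per-thread IPC claw back much of that 2.5x gap, and SMT threads share execution resources, so 16 threads on 8 cores is not 16 full cores' worth of throughput either.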
AVX2 (256-bit SIMD instructions) is huge in the encoding world. A lot of these encoding algorithms operate over fixed block sizes (e.g. 8x8 macroblocks), which ensures SIMD instruction sets benefit greatly.
ARM only has 128-bit SIMD through NEON. It's reasonably well designed, but nothing beats the brute force of simply doing 256 bits at a time (or 512 bits in the case of Intel's AVX-512).