I worked on this movie; I was at DNEG at the time. One of the standout things I remember is that this particular simulation was toxic to the fileserver it was being stored on.
From what I recall, I don't think it was running on that many machines at once, mainly because it required the high-memory nodes, which were expensive. I think it was only running on ~10, possibly 50, machines concurrently. But I could be wrong.
What it did have was at least one dedicated fileserver, though. Each of the file servers at the time was some dual-proc Dell 1U thing with as much RAM as you could stuff into it at the time (384 gigs, I think). They were attached by SAS to a single 60-drive 4U RAID array (a Dell PowerVault MD3460 or something along those lines; they're rebadged by Dell and were the first practical hotswap enclosures that took normal 3.5" SAS drives and didn't cost the earth).
The array was formatted into 4 RAID6 groups and LVM'd together on the server. It was then shared out over NFS on bonded 10-gig links.
Anyway. That simulation totally fucked the disks in the array. By the time it finished (I think it was a two-week run time) it had eaten something like 14 hard drives. Every time a new disk was inserted, another would start to fail. The whole time, it was that close to fucking up completely.
I had thought that the simulation was a plugin for Houdini, or one of the other fluid-simulation engines we had kicking around, rather than a custom 40k-line C++ program.
The paper mentions that the accretion disc was rendered in Houdini.
It also mentions that ‘several hundred’ of their 1633 10-core, 156GB (weird number?) blade servers were used, but it didn't seem to go into detail on data storage.
Is it possible you were working at the compositing phase, which would have been very heavy on random read and writes, resulting in more wear on the disks?
I was a systems engineer at the time. Compositing is actually really quite ideal linear streaming: you read the images serially and tend to write them back serially, which for spinny hard disks is about as optimal as possible. Each frame tends to be between 20 and 150 megs (depending on resolution, layers, bit depth), but they are rarely read in a random-IO way; you start at frame 1 and go to the last frame.
There were 32 primary fileservers at the time, two of which were spares, and, including the nearlines (which were _n_ Lustre machines, each with 4 RAID arrays attached by SAS rather than one), we'd normally expect to replace two-ish disks a week.
They ran particle sims all the time. Water, smoke, explosions are all staples of VFX. It was just something weird about this particular sim.
My understanding was that the actual simulation was causing the disks to die, rather than the render. A render can be restarted; a sim, less so. Well, this sim at least.
Is it just so much read/write? I'm having a hard time understanding why a simulation would need so much read/write to disk. Wouldn't the CPU and RAM be more important for the calculations, which then get written to disk when done?
TLDR: the number of particles used, and the memory required, was far too big to fit on one machine. Moreover, the machines needed to pass data to each other, and they chose to use files for that.
Long story:
Forgive me if you already know this, I'm going to start with a toy example and then ramp up the scale.
Imagine that you are doing a "normal" simulation of something like a ball rolling down a ramp into a pot. That's fairly simple, but you need to know the position of the ball, its force vector and the location of any nearby object that might collide with it.
It gets a bit harder when there are two balls, as they might also interact, so you need to store the state of two balls.
At a thousand balls, you start to need to think about scaling/threading/parallelising (I mean you probably don't, you'd just use PhysX from Nvidia and make it the library's problem, but bear with me). One way is to divide up the simulation area into a bunch of voxels and treat them as individual processing areas. When an object enters/exits a simulation area, you pass the object over with its force vector and mass, and let that simulator deal with it.
Now, I'm not a physics-engine person. The above may or may not happen as I describe, but the key thing is that there are sub-simulations to allow you to run in parallel. You need to make sure that each processing voxel has completed its processing for that time step before you can move on.
Once you have calculated the position of each object, you can save that to disk and begin to calculate the next frame. Note, this isn't rendering; this is just working out the "pose" (position and rotation) of each active object.
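To make that concrete, here's a minimal Python sketch of the idea (my toy code, nothing to do with DNEG's actual solvers): each voxel steps its own particles, and anything that crosses a boundary gets handed to the neighbouring voxel before the next time step.

```python
# Toy sketch of voxel-partitioned particle stepping (illustrative only).
# Each voxel owns its particles; after a step, particles that left the voxel
# are handed over to whichever voxel now contains them.
from dataclasses import dataclass, field

VOXEL_SIZE = 10.0
DT = 0.1
GRAVITY = (0.0, -9.8, 0.0)

@dataclass
class Particle:
    pid: int
    pos: list   # [x, y, z]
    vel: list   # [vx, vy, vz]

@dataclass
class Voxel:
    particles: list = field(default_factory=list)

def voxel_key(pos):
    """Which voxel a position falls into."""
    return tuple(int(c // VOXEL_SIZE) for c in pos)

def step(voxels):
    """Advance every voxel one time step, then migrate escapees."""
    migrants = []
    for key, voxel in voxels.items():
        kept = []
        for p in voxel.particles:
            # Integrate (a real solver also handles collisions, pressure, etc.)
            p.vel = [v + g * DT for v, g in zip(p.vel, GRAVITY)]
            p.pos = [x + v * DT for x, v in zip(p.pos, p.vel)]
            (kept if voxel_key(p.pos) == key else migrants).append(p)
        voxel.particles = kept
    # Hand migrating particles to the voxel they now occupy.
    for p in migrants:
        voxels.setdefault(voxel_key(p.pos), Voxel()).particles.append(p)

voxels = {(0, 0, 0): Voxel([Particle(0, [5.0, 5.0, 5.0], [0.0, 0.0, 0.0])])}
for _ in range(100):
    step(voxels)
```

In the real thing, the "hand over" step is where inter-machine communication comes in, which (as below) was done through files.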
The simulation of the black hole was effectively a massive particle simulation. I don't recall how many particles, but something like 14 billion sticks in my mind. It might have only been a billion. So they needed to calculate and store the id, position, rotation, velocity and probably other details for every single particle. Even if every value is a single float, for a billion particles that's 28 gigabytes for position and rotation alone (xyz for position, a quaternion for rotation). That's without any other state info like heat, force vectors, the weird quantum stuff or weight.
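Quick sanity check of that figure (my arithmetic, single-precision floats assumed):

```python
# Back-of-the-envelope check of the 28 GB figure above.
particles = 1_000_000_000
floats_per_particle = 3 + 4          # xyz position + quaternion rotation
bytes_per_float = 4                  # single precision
total = particles * floats_per_particle * bytes_per_float
print(total / 1e9)                   # 28.0 GB, before velocity, heat, etc.
```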
Once you have the position of all the particles, you then need to fire photons through them to work out what colour each pixel should be. That involves loading in the positions of the particles, firing rays through them all and recording what each one hits and why.
But why so much file IO? Didn't the machines talk to each other directly?
DNEG at the time was as close to being a "perfect" unix/linux shop as you could get. Everything was ephemeral, even the file servers. All the nodes in the render farm could configure themselves from scratch (pretty much) from power-on: you'd plug one in, switch it on, and it would work out its hostname, image itself and join the rendering system autonomously. Most binaries that you used were stored on an NFS share somewhere. This meant that you had more or less complete control over the data flow. For example, `rm` wasn't actually rm; it was a wrapper that moved files into a 'deleted' area, rather than binning them.
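Just to illustrate the idea (this isn't the actual wrapper, and the paths are made up), a safe `rm` is only a few lines:

```python
#!/usr/bin/env python3
# Illustrative sketch of an "rm" that moves files into a 'deleted' area instead
# of unlinking them. Not the actual DNEG wrapper; paths are hypothetical.
import os, shutil, sys, time

TRASH_ROOT = "/net/deleted"   # hypothetical holding area, purged later by policy

def safe_rm(path):
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = os.path.join(TRASH_ROOT, stamp, path.lstrip("/"))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    shutil.move(path, dest)

if __name__ == "__main__":
    for target in sys.argv[1:]:
        safe_rm(target)
```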
There were many layers of backup, all the way to tape.
Your home directory followed you everywhere, which meant so did your environment. This meant that if you wanted a specific version of a program, we had a wrapper that set that up for you. So typing "maya" would spin up the right version of Maya, use the correct plugins, and save your files to the correct fileserver.
Everything was divided into shows, so you'd `cd /shows/$showname/$department/$shotnumber/$version` and you'd magically be on the right fileserver. This was done via the magic of symlinks.
So everything was files, and if you wanted to exchange large amounts of data, you'd use files.
If anyone is interested I'll talk about the system they used to coordinate all the rendering (no it wasn't k8s)
So, the "farm" (the name given to all the machines that render everything) had 36k CPUs. I can't remember what the specs of the machines were but I think they were either 8 or ten core CPUs. Most of them were blade units, because that was the densest way to fit in that many CPUs into that sized space (The farm lived in the basement and consumed something like half a megawatt, I can't remember if that included aircon or not.)
Now, each machine on the farm was split into slots. From memory, the biggest slot was 8 cores, but you could request fewer.
The farm ran "jobs", which were lots of commands strung together into a "directed acyclic graph" (DAG for short). A node in the job could be as simple as "cd /show && mkdir dave/", or it could be a render. Each stage in the job could have a dependency: either on a physical property, like the amount of RAM or the machine class (some CPUs were newer than others), or on a license to run RenderMan (a renderer from Pixar) or some other expensive bit of software. It could also be dependent on a previous stage completing (so frame 44 can't render before frame 20, because it needs to reference something that frame 20 generates).
All these commands are parcelled up into a single lump, using a "job description language", and sent to the scheduler.
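To give a feel for the shape of it (this is a made-up sketch, not Alfred's actual job description language): a job is basically a set of tasks with resource requirements and dependency edges, and the scheduler repeatedly dispatches whatever has all of its dependencies satisfied.

```python
# Made-up sketch of what a job graph looks like conceptually.
# Not Alfred's actual job description language, just the shape of it.

job = {
    "title": "shot_042_sim_and_render",
    "tasks": {
        "mkdirs":   {"cmd": "cd /show && mkdir dave/", "needs": []},
        "frame_20": {"cmd": "render --frame 20", "needs": ["mkdirs"],
                     "requires": {"ram_gb": 64, "licenses": ["renderman"]}},
        "frame_44": {"cmd": "render --frame 44", "needs": ["frame_20"],
                     "requires": {"ram_gb": 64, "licenses": ["renderman"]}},
    },
}

def runnable(tasks, done):
    """Tasks whose dependencies have all completed."""
    return [name for name, t in tasks.items()
            if name not in done and all(d in done for d in t["needs"])]

done = set()
while len(done) < len(job["tasks"]):
    for name in runnable(job["tasks"], done):
        # A real scheduler would also match 'requires' against a machine's slots,
        # capture logs and return codes, pre-empt, reap, and so on.
        print("dispatch", name, job["tasks"][name]["cmd"])
        done.add(name)
```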
It's the scheduler that works out where and when to place a command, and on which machine. Now, the system that they used at the time was called Alfred. The thing you need to know about Alfred is that its interface was written in something that looks like the Athena widget set: http://appartager.free.fr/renderman/prman%2012.5/programming...
Alfred is old. As in single-threaded, older-than-SSH old. The man page dates from 1995, and I suspect it's probably older still by a good 5 years.
However, despite being old, it's still fast. It can dispatch jobs way quicker than k8s, even on an old shitty machine. But we were pushing it a bit. I think we were sending something like 30k commands an hour through the thing (i.e. telling a machine to run a command, store the logs, capture the return code, pre-empt, reap, all that kind of jazz). We did have to run it on an overclocked workstation, as the main VM cluster wasn't quite fast enough in single-threaded performance to keep up with demand.
We had something like 800 artists in the building, all using the quaint Athena interface.
There was a cgroups wrapper that was written to make sure that people couldn't take more RAM than they were allotted. We oversubscribed CPU by something like 10-20%. If you went over your RAM allocation, you'd get OOM'd. Swapping RAM between processes is expensive; swapping CPU is pretty much free (it's not, but the penalty for running at 110% CPU is way less than paying the electricity for having more machines and undersubscribing).
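Roughly the idea of the wrapper, sketched against the modern cgroups v2 interface (which postdates the wrapper we actually ran, so treat this as illustrative):

```python
# Sketch of a memory-limiting job wrapper using the cgroups v2 interface.
# Illustrative only; the original wrapper predates v2. Requires root.
import os, subprocess, sys

def run_with_memory_limit(cmd, limit_bytes, name="render-slot"):
    cg = f"/sys/fs/cgroup/{name}-{os.getpid()}"
    os.makedirs(cg, exist_ok=True)                 # creating the dir creates the cgroup
    with open(os.path.join(cg, "memory.max"), "w") as f:
        f.write(str(limit_bytes))                  # exceed this and the kernel OOM-kills you
    proc = subprocess.Popen(cmd)
    with open(os.path.join(cg, "cgroup.procs"), "w") as f:
        f.write(str(proc.pid))                     # move the child into the cgroup
    # Note: a production wrapper would place the pid in the cgroup before exec,
    # so there's no window where the limit isn't applied yet.
    return proc.wait()

if __name__ == "__main__":
    sys.exit(run_with_memory_limit(sys.argv[1:], limit_bytes=8 * 1024**3))
```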
So why not K8s?
I'm sure some people do use it. But it's not practical for batch processing like this, for a number of reasons:
1) scaling past 500 nodes means that you lose a lot of network capacity to message passing and state transfer
2) the scheduler isn't designed to have complex dependency trees (by default you can have a sidecar and that's about it, really. You can create a service, but that's not really designed for ephemeral tasks)
3) the networking is batshit. (virtual networking is really not great for low latency high throughput stuff like NFS or some other file protocol)
What can you use?
If you're on AWS, Batch is good enough. It's not as fast, but it'll do. You'll need to write an interface to build complex job graphs, though.
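For example, with boto3 you can chain Batch jobs using dependsOn, but anything fancier than a simple chain is your problem to build (the queue and job definition names below are placeholders):

```python
# Minimal sketch of chaining AWS Batch jobs with dependencies via boto3.
# Queue and job definition names are placeholders.
import boto3

batch = boto3.client("batch")

sim = batch.submit_job(
    jobName="blackhole-sim",
    jobQueue="highmem-queue",
    jobDefinition="sim-jobdef",
)

render = batch.submit_job(
    jobName="blackhole-render",
    jobQueue="render-queue",
    jobDefinition="render-jobdef",
    dependsOn=[{"jobId": sim["jobId"]}],   # render only starts once the sim finishes
)
```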
Implemented using semi-realistic Einstein equations. They do talk about implementing a fully realistic version in the paper, but the director deemed it visually too confusing for the audience, so you only get to see a dumbed-down version in the movie.
The geodesic equations are not Einstein's equations.
The code for Interstellar only had to ray trace on a fixed analytical background spacetime (the Kerr metric), rather than solve partial differential equations for the metric itself.
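To make the distinction concrete: Einstein's field equations determine the metric itself, whereas a renderer working on a fixed Kerr background only has to integrate the geodesic equation through a metric that is already known.

```latex
% Einstein's field equations: solve for the metric g_{\mu\nu} itself
G_{\mu\nu} = \frac{8\pi G}{c^4} T_{\mu\nu}

% Geodesic equation: trace rays through a metric that is already known (Kerr)
\frac{d^2 x^\mu}{d\lambda^2}
  + \Gamma^\mu_{\alpha\beta}\,
    \frac{dx^\alpha}{d\lambda}\frac{dx^\beta}{d\lambda} = 0
```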
Since the digital movie camera used to film the actors relies on the photoelectric effect to turn light into pixel data, you could say Einstein’s equations were used for the live-action portion too!
Yes, it’s an even larger format than traditional 70mm. The cells are physically larger. Anecdote: Nolan worked with Kodak to invent black & white IMAX film for Oppenheimer.
There was a pretty good talk[1] on this at SIGGRAPH 2015. Speakers included later Nobel prize winner Kip Thorne.
I tried to find any recordings but had no luck. (Presumably it's on the conference DVD)
New Fermi estimation interview question just dropped: are there more movies where Matt Damon is stuck on a planet, or movies where Tom Hanks plays a captain?
TLDR
“A typical IMAX image has 23 million pixels, and for Interstellar we had to generate many thousand images, so DNGR had to be very efficient. It has 40,000 lines of C++ code and runs across Double Negative’s Linux-based render-farm. Depending on the degree of gravitational lensing in an image, it typically takes from 30 minutes to several hours running on 10 CPU cores to create a single IMAX image. Our London render-farm comprises 1633 Dell-M620 blade servers; each blade has two 10-core E5-2680 Intel Xeon CPUs with 156GB RAM. During production of Interstellar, several hundred of these were typically being used by our DNGR code.“
GPUs are almost irrelevant for VFX work at this scale due to the memory requirements. The render nodes used for Interstellar had 156GB of RAM, and a decade later the biggest GPUs still don't have that much memory (unless you count Macs I suppose but the existing software ecosystem is very CUDA-centric).
Small VFX shops do typically render on GPUs nowadays, but the high end is still dominated by CPU rendering.
That goes for all the biggest players - Pixar, WDAS, Dreamworks, ILM, WETA, Framestore... all do their final rendering on CPUs. Some of them have adopted a hybrid workflow where artists can use GPU rendering to quickly iterate on small slices of data though.
I was thinking of their graphics cards which currently max out at 48GB, but true, their HPC accelerators do have a lot more memory now. In the case of Nvidia they would be a trade-off for rendering though since their HPC chips lack the raytracing acceleration hardware which their graphics chips have.
Besides, in the current climate I don't think VFX companies would be eager to bid against AI companies for access to H100/H200s, the VFX companies don't have infinite venture capital goldrush money to burn...
It's a particle simulation, so a lot of it is memory management.
I can't remember the actual size of the simulation in terms of the number of particles, but it was using something like two 150TB fileservers to store it.
With NVMe and cheap RAM, it would probably be at the point where it'd be useful on a GPU.
It is a pity they didn't spend comparable effort on the story. I thought it was a really disappointing film, especially after they banged on so much about how good the physics in it was (apart from the black hole rendering, it wasn't).
Did we watch the same movie? They got a bunch of physics stuff right, like the water planet, the relative aging due to being near the massive black hole, etc. The only thing that wasn't "correct" was the wormhole, and especially the tesseract, but that was some creative writing which allowed for a nice twist. I enjoyed it; a nice scifi.
Cooper sobbing as he listened to years of messages from his family? I wish I could've written that. The twist where Mann activated his beacon because he didn't want to die alone? Solid.
We need the gravity equation to get off Earth because the crops are dying? Love can escape a black hole through the tesseract? This is just patent technobabble nonsense.
Where _Interstellar_ fails is in exposition. Love is the most powerful force in the universe? The Fifth Element pulled that off by not pretending to be serious. Anything is permitted in science fiction, but once you commit to explaining it, you need to make sense.
The love angle actually makes sense here. There's no technological way to do what needed to be done: find a willing recipient, amongst strangers, for a message from the impossible. Only because Murph shared a bond was she able to correctly bet that there was a message and to find out what it was, hidden in something everyone else would not even consider.
Nothing is forbidden, until you try to explain it. I come from the perspective that genre is a contract with the audience. Here _Interstellar_ is making a genre error.
Take _Star Wars_ for example. You don't need to know how the Force works. Suspension of disbelief isn't a problem because you've bought into the genre and its tropes. The Force is everywhere, Luke is attuned to it. We don't need to know anything else mechanically. Then they rolled up with midichlorians and reduced the whole thing to gibberish.
_Interstellar_ sets a hard sci-fi expectation with a visually accurate black hole before losing the train of thought. Imagine if _Lord of the Rings_ spent 20 minutes discussing how the plate tectonics of Middle Earth gave rise to Mount Doom before veering off into the history of the eagles. It gives your audience whiplash.
I'd argue that Interstellar is defying your expectations, but it's hard sci-fi all the way. At no point does something happen that's been proven to be impossible, and the future is, well, ever unknown to us, and to the viewer.
Given enough time and resources, you'd be right, this would be sufficient. But remember the beginning of the movie - they explicitly stated that every year fewer crops could be harvested, which meant that most people had to work in food production, no matter what they did before. It was also stated that the population wouldn't accept spending going to interplanetary exploration instead of food production - they'd rather survive a little longer than to try and fulfill an impossible goal, or just to have a couple of people survive on ships/stations for a couple years longer, without any hope for long-term survival.
Under these circumstances (only a couple of years left until starvation begins & no existing industry) it seems absolutely realistic that the government is able to throw together one hail mary in secret, without also being able to scale production up to save most people before they starve. Remember, the goal of the mission wasn't to measure the data they needed to finish their equations; it was to send out one single ship that could continue human civilization on an inhabitable planet.
Had the future humans not intervened (by placing the black hole near inhabitable planets), humanity on earth would have died.
> Had the future humans not intervened (by placing the black hole near inhabitable planets), humanity on earth would have died.
Not only a black hole, a supermassive Kerr black hole with relativistic rotation at its equator.
They could have done a better job at the planets themselves though - Most of them were pretty useless for terraforming.
While I'm sure the US government would trade immediate comfort for long-term survival of the species, I don't think other governments would have the same priorities. I'd assume China would be one of the first to build multiple O'Neill cylinders with carefully controlled environments and as little Earth-originated biomass as possible (to contain the blight).
Their starship was bullshit. On Earth they needed a big Saturn V-like rocket to get it into space, and later they fly into orbit like it's nothing.
If the planets they visited had much lower gravity than Earth this might be possible, but it wasn't noticeable or talked about. And even on Mars you need much more fuel to get into orbit than could be stored in their tiny ship.
To expand on this: if they tried to land on the water planet, where time is so dilated that one hour equals 7 months… what would their velocity have been when they contacted the surface? And how much energy would their spaceship need to reach escape velocity from there?
Well, if you look for this, many if not most movies will have some inaccuracy or holes in the explanation like this. In the end, I watch movies for entertainment. If I'm looking for scientific precision, I watch a documentary instead.
Eh, this isn't actually that bad. It's quite easy, using a jet aircraft, to (via a ballistic trajectory) get into space. Surrounded by reaction mass as you are, the rocket equation isn't nearly so bad. You could conceivably dock with something in orbit, if you had elastic tethers or something to make the acceleration survivable. If you're much lighter than what you're docking with, you won't knock the satellite out of orbit.
Heck, you could even ignore the jet aspect, and go for full rocket. (An ICBM only weighs around 50 tonnes, after all.) Getting the heavy thing into orbit, though, requires proper, multi-stage rockets.
That's the point at which the jet stops working, but by that point you can be going quite fast. If there isn't enough air to run the jet engine, there also isn't enough air to slow you down (much).
Assuming no air resistance and 10 m/s^2 gravity, I calculate that you would have to be doing ~1,100 m/s (~Mach 3.2 at sea level) straight up at 37km to reach 100km, or ~1,600 m/s (~Mach 4.7 at sea level) at 45 degrees from vertical.
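For anyone who wants to check the arithmetic, it's just the vertical kinematics under the same assumptions:

```latex
% Vertical speed needed to coast from 37 km up to 100 km with g = 10 m/s^2:
v_{\uparrow} = \sqrt{2 g \,\Delta h} = \sqrt{2 \cdot 10 \cdot 63\,000}
             \approx 1120\ \text{m/s}

% At 45 degrees from vertical, only the vertical component counts,
% so the total speed must be larger by a factor of \sqrt{2}:
v_{45^\circ} = \sqrt{2}\, v_{\uparrow} \approx 1590\ \text{m/s}
```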
Yeah, they could've used something simpler and less hand-wavy: irreversible climate change, or a freaking asteroid on a collision course with Earth where they needed the black hole data to build something to deflect it. Literally anything else would've worked better.
I think the blight was chosen as a metaphor for the virus of defeatism and anti-scientific attitude. To which the movie proposes (and arguably wants to be) a cure, made of optimistic sci-fi, cool science and, of course, love.
The issue on the literal level of the narrative is that if you have no way to control the infection on earth, there's no reason to suppose you won't bring it with you to the next planet.
We're talking about a movie in which a space agency builds an enormous rocket entirely in secret and without any suppliers; then enrolls the astronaut at the head of the mission on the same day of the departure; and finally launches the rocket from inside its office halls. ¯\_(ツ)_/¯
It's also hard to believe that being close enough to a giant black hole that it causes massive time dilation doesn't seem to cause any other issues beyond large waves.
100% agree. Like you, I was hyped after the pre-release teaser PR stuff came out about the realistic black hole rendering etc., and was hoping for an actually decent sci-fi
and then just a lame story culminating in "space ghosts"
Writing numerical solvers isn't that inspiring by itself, to be honest. But I agree that the subject is fascinating. I suppose the code isn't public so we could see for ourselves, is it?