My impression is that supercomputers exist mainly for incredibly intensive, large-scale simulation/calculation that can't be subdivided into independent parts -- e.g. weather or nuclear explosion simulation.
3D map processing feels like the literal opposite of that -- trivially parallelizable, with a much higher ratio of data/IO to calculation.
Are they running out of tasks for Blue Waters to do, or trying to find a high-profile project to justify it politically or something? I really can't imagine for the life of me why you wouldn't just run this in any enterprise-grade cloud.
I suspect that the mismatch is worse nowadays. Although software and interconnects have improved, core counts and node counts have gone up even faster.
IMO simulation-guided research would probably have gone faster at the lab if the money for the top-10 system had been spent on a bunch of smaller clusters with less exotic hardware, divvied up according to the actual lines of research scientists were pursuing. But there's prestige and budgetary room for a new Grand Challenge system that may not be there for a bunch of more affordable, less exotic systems. And once in a while somebody does have a job that only runs well on the big machine.
This is also why I don't much worry about China building systems that rank higher on the Top 500 than American systems. Until Chinese research groups start churning out Gordon Bell Prize-winning software to go with the giant systems, they're probably just misallocating even more money than American labs.
EDIT: well that was arrogant and foolish of me to dismiss Chinese HPC. I looked up recent Gordon Bell Prize winners and Chinese researchers won in 2016 and 2017. It looks like they're making good progress in using those really big systems.
A supercomputer typically means that those thousands of cores are connected with fast and expensive interconnects so that the cores can communicate with low latency. A large portion of the budget is usually spent on this interconnect. If you have an embarrassingly parallel problem and you run it on a supercomputer then that expensive interconnect is sitting idle - you would get the same performance on AWS or a more standard compute cluster.
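To make that concrete, here is a minimal sketch (hypothetical file names, stubbed per-tile function) of why an embarrassingly parallel job never exercises the interconnect: every work item is independent, so a plain work queue on commodity nodes does just as well.

    from multiprocessing import Pool

    def build_dem_tile(stereo_pair_path: str) -> str:
        """Stub for the per-tile work (e.g. stereo matching on one image pair).
        All the heavy computation is purely local to this worker."""
        return stereo_pair_path.replace(".tif", "_dem.tif")

    if __name__ == "__main__":
        tiles = [f"pair_{i:05d}.tif" for i in range(10_000)]  # independent inputs
        with Pool(processes=32) as pool:
            # Workers never talk to each other; the same pattern scales out to
            # thousands of cloud VMs with a job queue instead of MPI.
            results = pool.map(build_dem_tile, tiles)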
Today what is called a supercomputer is usually just a cluster (i.e. multiple connected normal-spec computers). It is normally connected with a high-speed interconnect, though (100 Gbit/sec and more), which is its most defining capability.
Why are they using this cluster? My speculation is that it's because it is available and no longer has much use for real scientific computing (because it is old: https://bluewaters.ncsa.illinois.edu/hardware-summary), and the intelligence agency would rather support academia than feed some commercial entity.
That's his point. The high-speed interconnect, the defining capability of the modern supercomputer, is unnecessary for the problem they are using a supercomputer to solve.
They could have equally well used BOINC or some other distributed, loosely coupled technology for this.
I agree with you regarding "it's available." It sounds like a press person got ahold of it and stopped paying attention once they saw the word "supercomputer".
That would be intense visual processing, stitching together photos from various angles to work out terrain elevation.
They are not dealing with a 3D point cloud of elevation data.
Will HPC migrate towards the cloud? Maybe yes, but we need several major overhauls to tooling before that is anywhere close to happening.
Just think about how much work it would be today to configure a Packer image that has several MPI libraries, a scheduler like SLURM, various Python versions plus required packages, C and Fortran compilers, BLAS/LAPACK/etc., VCS systems, and integration with some sort of user authentication system (including SSH login to each node and per-user linkage to the scheduler's accounting), and to have confidence that it will be highly performant for the application you work with on the AWS allocation you have requested. Not many people could pull that off in a reasonable amount of time, if at all.
I'm not sure how weather simulations work, but I've always wondered why they aren't performed cellular-automaton style. Bottom-up rather than top-down: at each time step, a given cell's state is computed from the state of its neighbors at t-1. This should be parallelizable. I feel it's also how it works in the real world anyway.
The thing is: you need the state at t-1 of all your neighbors. Then you can do a small timestep to get from t-1 to time t. And then you need the NEW state of your neighbors. That requires a fast interconnect. Which HPC machines have, unlike most clouds or commodity clusters.
In other words, yes it is parallelizable, but not trivially so, because the different grid points are coupled.
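A tiny sketch (plain NumPy, purely illustrative and nothing like a real weather model) of why that coupling matters: each cell's update needs its neighbours' values from the previous step, so once the grid is split across nodes, every node has to swap boundary rows with its neighbours before every single step.

    import numpy as np

    def step(state: np.ndarray, alpha: float = 0.1) -> np.ndarray:
        """One explicit diffusion-style update: every cell needs its
        four neighbours' values from the *previous* timestep."""
        new = state.copy()
        new[1:-1, 1:-1] = state[1:-1, 1:-1] + alpha * (
            state[:-2, 1:-1] + state[2:, 1:-1] +
            state[1:-1, :-2] + state[1:-1, 2:] -
            4.0 * state[1:-1, 1:-1]
        )
        return new

    state = np.random.rand(512, 512)
    for _ in range(100):
        # On a single node this is trivial. Split the grid across nodes and
        # each node must receive its neighbours' boundary rows ("halo
        # exchange") before *every* step -- that is what the fast
        # interconnect is for.
        state = step(state)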
I did my PhD in physics simulations (molecular dynamics) and had the same problem there. I tried running these simulations in Google Cloud without any good performance results due to high latency (compared to HPC). I'm no GC expert though, so it should be possible to improve on what I did.
Their referenced ArcticDEM gives 2m resolution; however, the accuracy notes state 'Without ground control points absolute accuracy is approximately 4 meters in horizontal and vertical planes'.
This is much better than most global data (SRTM and ASTER are both at 30m resolution).
However it is not as high resolution as many existing free models for individual parts of the globe. As an example here in Australia I can get free 1m resolution DEMs of cities with accuracy noted at "0.3m (95% Confidence Interval) vertical and 0.8m (95% Confidence Interval) horizontal".
Basically fixes the problems with SRTM, but you're still left with 30m resolution.
You can read more about ArcticDEM here: https://www.pgc.umn.edu/data/arcticdem/
Seems like EarthDEM is the same. So all the advantages and disadvantages that come with stereo photogrammetry. Very high resolution, but potential for inaccuracy and quite difficult to check. Probably they used SRTM or something similar to validate their models.
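For what it's worth, a minimal sketch of the kind of cross-check I mean (hypothetical function name; not the project's actual QA pipeline), comparing DEM elevations against a trusted reference sampled at the same points:

    import numpy as np

    def vertical_error_stats(dem_heights: np.ndarray, ref_heights: np.ndarray) -> dict:
        """Compare DEM elevations against reference heights (e.g. ICESat
        footprints, surveyed GPS points, or a coarser validated DEM)
        sampled at the same locations."""
        diff = dem_heights - ref_heights
        return {
            "bias_m": float(np.mean(diff)),                 # systematic vertical offset
            "rmse_m": float(np.sqrt(np.mean(diff ** 2))),   # overall vertical error
            "p95_abs_m": float(np.percentile(np.abs(diff), 95)),
        }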
>IceSAT altimetry data points are used to improve the vertical accuracy of both the DEM strips and mosaic files. IceSAT data points are filtered to exclude points in areas of high relief and over hydrographic features. Additional filtering is applied to remove altimetry points collected outside the temporal window of the source imagery acquisition date.
>An xyz translation is calculated for each strip and the offset is added to the metadata file. The individual DEM strips are not translated before distribution. Users can apply their own corrections to the strip if they do not agree with the one originally provided.
>Where available, additional control information such as LiDAR or surveyed GPS points have been applied.
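If you want to apply that published offset yourself, a rough sketch (assuming rasterio; hypothetical function name, and check the sign conventions in the PGC docs before trusting it) might look like this:

    import rasterio
    from rasterio.transform import Affine

    def apply_strip_offset(src_path: str, dst_path: str,
                           dx: float, dy: float, dz: float) -> None:
        """Apply a per-strip xyz translation (the offsets published in the
        strip's metadata) to a DEM GeoTIFF: shift the georeferencing by
        (dx, dy) in map units and the elevations by dz."""
        with rasterio.open(src_path) as src:
            data = src.read(1).astype("float32")
            profile = src.profile.copy()
            if src.nodata is not None:
                data[data != src.nodata] += dz   # vertical correction
            else:
                data += dz
            # Horizontal correction: translate the geotransform in CRS units.
            profile.update(dtype="float32",
                           transform=Affine.translation(dx, dy) * src.transform)
        with rasterio.open(dst_path, "w", **profile) as dst:
            dst.write(data, 1)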
They've changed it quite a bit since I last used it for my CS senior project a few years ago, but I figured out you can see DEM segments for download by clicking the icon with the popup name "Layer List", then "3DEP Elevation - Index", then "DEM Product Index". You will need a GIS program like QGIS or ArcGIS to view the raw DEM data.
However, it clearly doesn't do well with water, lots of "fake" terrain going on where the ocean meets the land.
To prove my point, here is some art I made from open data (I think NASA) with Blender and QGIS, meaning a free software stack on very much a normal computer. My model was the state of Schleswig-Holstein in Germany, but as I said, you can find data for pretty much everything. The resolution is not astonishing, but it is enough to spot Germany's only high-sea island, the "Wattenmeer" (where the tide leaves some spots below mean water level), the mouth of the Elbe, and more cool stuff when you know where to look.
My point isn't to downplay the effort of EarthDEM; I just want to make more people aware of what is already very much out there :)
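If anyone wants to try something similar, here is roughly the kind of conversion step involved (a sketch assuming rasterio; not the exact pipeline I used, which went through QGIS): turn a DEM GeoTIFF into a 16-bit grayscale heightmap that Blender can use as a displacement texture.

    import numpy as np
    import rasterio

    def dem_to_heightmap(dem_path: str, png_path: str) -> None:
        """Normalise a DEM GeoTIFF and write it as a 16-bit grayscale PNG,
        usable as a displacement map in Blender (or a game heightmap)."""
        with rasterio.open(dem_path) as src:
            z = src.read(1).astype("float64")
            if src.nodata is not None:
                z[z == src.nodata] = np.nan
        lo, hi = np.nanmin(z), np.nanmax(z)
        scaled = (np.nan_to_num(z, nan=lo) - lo) / max(hi - lo, 1e-9)
        img = (scaled * 65535).astype("uint16")
        profile = {"driver": "PNG", "dtype": "uint16", "count": 1,
                   "width": img.shape[1], "height": img.shape[0]}
        with rasterio.open(png_path, "w", **profile) as dst:
            dst.write(img, 1)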
Nope, resolution is the problem, not scale. You need low-altitude stereo photos, like from an airplane. Without clouds. With satellites you only get 30m; with airplanes you get under 1m. Cities usually rent a plane once a year to update their maps. Countries don't; they usually just rely on the cheap satellite photos.
Depends on the country. There are places with "mountains next to open grasslands next to coastline next to forests within a few miles of each other" (where "a few" might mean 100 miles). The world is not all like US-50 through Utah or some other huge expanse.
But even so,
(1) there's something exciting in its own right about driving on real-mapped terrain as opposed to some fake, designed terrain
(2) for certain games, it can be the main point (e.g. I can imagine a "Route 66 hot rod race" or a "drive the Monaco Rally", and of course things like MS Flight Simulator, combat games, etc.)
(3) there's absolutely no reason why a game couldn't use real-world data and cherry-pick different terrains from it for variability...
Self-driving cars can't (and don't) rely on maps to prevent injury or death. They only use them to know how to get from A to B in the most general way -- and they'd still need to check for bypasses, closed roads, etc. (on top of the real-time processing of obstacles, vehicles, lanes, traffic lights, weather, and so on).
Humans are creators. Humans build things larger than themselves.
Nothing is perfect. Choose what to focus on.
Allows you to fetch real world height data for use in Cities:Skylines (and any other game that uses heightmaps, pretty much).
edit - Can build scenery off of OpenStreetMap data http://wiki.flightgear.org/Osm2city.py
But the question is: which heavy-processing computer doesn't have IO issues? Also, Blue Waters isn't really a GPU supercomputer the way Summit and Sierra are (or the upcoming Aurora and Frontier). It has 4228 nodes (out of ~27000) that have GPUs on them, each with only one, and they are Keplers. Those aren't great GPUs and aren't going to do very well in parallel either. There's a big bottleneck in GPU IO. I think this program will not be utilizing GPUs very heavily. Worse, they don't have many CPUs per node: it's 8-16 cores and 32-64 GB of memory per node. There's going to be a lot of time spent in communication.
I'll admit that BW doesn't seem like the best computer for the job, but you use what you've got. I'll buy the argument that this is the wrong computer for this specific job, but what would you use besides a supercomputer? (I think Summit would be a good computer for this job.)
You may find especially interesting an article I published that used an embarrassingly parallel computing system I built, which ran on Google's internal infrastructure (not a supercomputer), in response to my codes not running well on supercomputers.
If you wish to have a substantive discussion about my statements (rather than flinging insults), I'd be happy to.
Getting chunk edges to align might be tricky, but the usual solution to that is to stagger the chunks and throw away the edges.
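Something like this sketch (hypothetical per-chunk function `fn`; just to illustrate the stagger-and-discard idea):

    import numpy as np

    def process_with_overlap(grid: np.ndarray, chunk: int, halo: int, fn) -> np.ndarray:
        """Run `fn` on overlapping chunks of a raster and keep only each
        chunk's interior, so seams from edge effects get thrown away."""
        out = np.empty_like(grid)
        h, w = grid.shape
        for r in range(0, h, chunk):
            for c in range(0, w, chunk):
                r0, r1 = max(r - halo, 0), min(r + chunk + halo, h)
                c0, c1 = max(c - halo, 0), min(c + chunk + halo, w)
                result = fn(grid[r0:r1, c0:c1])   # purely local work on one chunk
                rows, cols = min(chunk, h - r), min(chunk, w - c)
                out[r:r + rows, c:c + cols] = result[r - r0:r - r0 + rows,
                                                     c - c0:c - c0 + cols]
        return out

Here `fn` stands in for whatever per-chunk step you're doing (smoothing, matching, etc.); the point is that no chunk ever needs to ask a neighbour for data at runtime.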
I strongly suspect that they don't need a supercomputer, but they had one available and it sounds cool. I sit one building over from the ~13th largest supercomputer in the world and I've definitely used it when I needed a bit more computing power than my laptop could provide. I used the supercomputer because I could, not because I needed to.
I've worked on supercomputers for years, and generally one could not run code at scale unless it scaled (in terms of parallel performance) and also used the network in a non-trivial way. Supercomputers aren't just speedups for normal codes (I'm pretty sure you know this).
The real issue with modern supercomputers is that basically none of them have decent disk IO. That's the main difference between modern Internet/cloud clusters and supercomputers: cloud clusters emphasize very high connectivity between durable storage and worker nodes. None of the supercomputers have a decent disk IO stack (mostly GPFS and Lustre), and this ends up being no end of pain for application developers. The only recent improvement in this area is "burst buffers", but that's really just accelerated data staging.
It's useful as additional information for building a map, but it's not a map.
So I would guess that DMA had a 3D terrain map around 1989.
I know it was used for cruise missile flight planning.
No idea what the resolution was.
Even with our very detailed underwater mapping the Navy still has areas that are not well covered.
For example: Go to Google Maps, satellite view. Then zoom in and rotate with Ctrl + click & drag.
Microsoft and Mapbox also have an experimental project where computer vision on cars maps the location of recognized objects. This is probably true of every company researching self-driving cars.
Tech: We need these maps for our cruise missiles.
For any place in at least the western world there almost certainly exists better (and sometimes open) data than this project will provide. The big win for this project isn't that the data will always be the best available, but that it will be a single source of open data of consistent quality and format for the entire earth.