Instead, and to my joy, it was a well-reasoned essay with good, solid points.
My only quibble is that Charm++ is not "a framework for particle simulation methods". While the molecular dynamics program NAMD has been using it for 20 years, which is why I know of Charm++, it wasn't designed specifically for particle simulation methods, nor is it restricted to that topic. Quoting from http://charm.cs.illinois.edu/newPapers/08-09/paper.pdf :
> NAMD, from our oldest collaboration, is a program for biomolecular modeling; OpenAtom is a Car-Parrinello MD program used for simulation of electronic structure (in nanomaterials, as well as biophysics); ChaNGa, an astronomy code; and RocStar, a code for simulating solid-propellant rockets, such as those in the space shuttle solid rocket booster
Great writeup though. I do think we need to get more people into the mindset that MPI won't be the standard in 10 years, otherwise it will still be the standard in 10 years.
The broader article deserves further thought, and I hope to find time to respond. But it is clear that raising the level of abstraction beyond MPI is necessary.
Its YARN container system is flexible enough to run any JVM application. For example, we use it to run an autoscaling Elasticsearch cluster alongside our Hadoop workloads, and we are actively investigating using it to run our Scala microservices.
I've just looked at YARN now; I hadn't heard of it before. It doesn't look like it has anything to do with the topic at hand. How would one build an explicit solver for a 1D diffusion equation, corresponding to the examples given in the "HPC is dying, ..." article, using YARN?
How do you do checkpointing so you can restart your 10 million atom simulation should there be a system fault after 2 weeks of run-time? (Checkpoints need about 220 MB; each atom has an x,y,z position as well as a vx,vy,vz velocity vector. Also, it needs to be at the same timestep across the entire distributed machine.)
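The back-of-the-envelope arithmetic behind that checkpoint size, assuming single-precision (4-byte) floats, works out like this (illustrative numbers only, not taken from any particular code):

```python
# Rough checkpoint size for a 10-million-atom MD simulation.
# Each atom carries a position (x, y, z) and a velocity (vx, vy, vz),
# so 6 components per atom; single precision assumed.
n_atoms = 10_000_000
components_per_atom = 6   # x, y, z, vx, vy, vz
bytes_per_float = 4

checkpoint_bytes = n_atoms * components_per_atom * bytes_per_float
print(checkpoint_bytes / 2**20)  # roughly 229 MiB, i.e. "about 220 MB"
```

Double precision would double that, which is why checkpoint bandwidth matters at scale.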
Instead, it looks like YARN is designed for service-based components, where the components are relatively independent from each other, and where failure recovery is mostly a matter of starting a new service and resending the request.
If my understanding is correct, then it's certainly more capable than map-reduce. But not in a direction that's relevant for most current HPC.
It is analogous to a set of Docker containers distributed across nodes. The same methods you would use to synchronize state in that situation could be used with YARN: for example, using a persistent distributed system such as Hazelcast to handle system failures and checkpointing.
I am not saying this is some amazing solution to every HPC problem, only that Hadoop is far, far more flexible than many people give it credit for.
Parts of my simulation are out of phase. I need some gather step to collect the data from individual nodes, when a given timestep is reached, and save the state. A simple solution is to do a barrier every ~30 minutes, send to the master node, and have it save the data.
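That "barrier, gather, save" idea can be sketched serially, with no real message passing; the names here (`checkpoint`, `node_states`) are illustrative stand-ins for per-node state, not anything from an actual MPI code:

```python
import io
import pickle

def checkpoint(node_states, timestep):
    """Serial sketch of gather-and-save: every node must have reached
    the same timestep (the barrier) before the master serializes the
    combined state."""
    assert all(s["timestep"] == timestep for s in node_states)
    full_state = {
        "timestep": timestep,
        "atoms": [atom for s in node_states for atom in s["atoms"]],
    }
    buf = io.BytesIO()
    pickle.dump(full_state, buf)   # in practice: write to stable storage
    return buf.getvalue()

# Four pretend nodes, each holding one atom's (x, y, z), all at step 42.
nodes = [{"timestep": 42, "atoms": [(float(i), 0.0, 0.0)]} for i in range(4)]
blob = checkpoint(nodes, 42)
restored = pickle.loads(blob)
print(restored["timestep"], len(restored["atoms"]))  # 42 4
```

The real version replaces the list of dicts with an MPI gather (or a staged write to a parallel filesystem), but the invariant is the same: a checkpoint is only valid if every node contributed state from the same timestep.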
When I look at Hazelcast I see what looks to be a different sort of clustering - using clusters for redundancy, and not for CPU power. E.g., I see "Hazelcast keeps the backup of each data entry on multiple nodes", and I think "I don't care." If a node goes down, the system goes down, and I restart from a checkpoint. It's much more likely that one of the 512 compute nodes will go down than some database node.
I'll withdraw my original statement about "a map-reduce system like Hadoop" and say simply that "a system like Hadoop isn't a good fit for HPC problems".
Here's a lovely essay which agrees with me ;) http://glennklockwood.blogspot.com.au/2014/05/hadoops-uncomf... . It considers the question:
> Why does Hadoop remain at the fringe of high-performance computing, and what will it take for it to be a serious solution in HPC?
Sorry, I'm not involved in HPC at all. I know a little bit about Hadoop. I'm mostly interested in building online message processing and blended real-time/historical analytics. Our problem domain wouldn't want to lose all capacity if part of the system became unavailable.
First, the simulation can be set up to match the hardware. One simulation program I used expected that the nodes would be set up in a ring, so that messages between i and (i+1)%N were cheap. It ran on hardware with two network ports, one forwards and one backwards in the ring. In fact, the only way to talk between non-neighbors was to forward through the neighbors.
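In a ring like that, the cost of talking to a non-neighbor grows with distance around the ring. A small sketch of the hop count (illustrative, not from the program in question):

```python
def ring_hops(i, j, n):
    """Minimum number of forwarding hops between nodes i and j on an
    n-node bidirectional ring (one port forwards, one backwards)."""
    d = abs(i - j) % n
    return min(d, n - d)

print(ring_hops(0, 1, 8))  # 1: neighbors talk directly, so they're cheap
print(ring_hops(0, 4, 8))  # 4: the far side of the ring costs 4 forwards
```

This is also why one dead node breaks everything: every route between the two halves of the ring passes through it.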
If a node goes down, then the ring is broken, and the entire system goes down.
This is very different from a cluster with point-to-point communications, where a router can redirect a message to a backup node should one of the main nodes go down.
The reason for this architecture is that there's a lot of inter-node traffic. When I was working on this topic back in the 1990s, we were network limited until we switched to fiber optic/ATM. When you read about HPC you'll hear a lot about high-speed interconnects, and using DMA-based communication instead of TCP for higher performance. All of this is to reduce network contention.
Suppose there's 1GB/s of network traffic for each node. (High-end clusters use InfiniBand to get this performance.) In order to have a backup handy, all of that data for each node needs to be replicated somewhere. That's more network traffic. Presumably there are many fewer spare nodes than real nodes, since otherwise that's a lot of expensive hardware that's only rarely used. If there are 512 real nodes and 1 backup node, then that backup node has to handle 512GB/second. Of course, the backup node can die, so you really want to have several nodes, each with a huge amount of bandwidth.
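The aggregate-bandwidth problem is just multiplication, but it's worth making explicit (numbers are the illustrative ones above, not measurements):

```python
# If every compute node streams its state at full link rate, a single
# backup node must absorb the sum of all those streams.
n_nodes = 512
per_node_gb_s = 1                      # ~InfiniBand-class link, GB/s
backup_ingest_gb_s = n_nodes * per_node_gb_s
print(backup_ingest_gb_s)              # 512 GB/s into one backup node
```

No single NIC, bus, or memory system in a commodity node comes close to that ingest rate, which is the core objection to live replication for this workload.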
Even then, the messages only exchange part of the state data. For example, in a spatial decomposition, each node might handle (say) 1,000 cells of a larger grid. The contents of a cell can interact with each other, and with the contents of its neighbor cells, up to some small radius away. (For simplicity, assume the radius is only one cell away, so there are 26 neighbors for each cell.)
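The 26 comes from the 3x3x3 block of cells around a given cell, minus the cell itself; enumerating the offsets makes that concrete:

```python
from itertools import product

# Neighbor offsets for one cell in a 3D grid, interaction radius of
# one cell: every (dx, dy, dz) in {-1, 0, 1}^3 except staying put.
offsets = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]
print(len(offsets))  # 26
```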
If one node hosts one cell and another node hosts another then at each step they will have to exchange cell contents, in order to compute the interactions. This requires network overhead.
On the other hand, a good spatial decomposition will minimize the amount of network traffic by putting most neighbors on the same machine. After all, memory bandwidth is higher than network, and doesn't have the same contention issues.
But this means that the node has mutating state which isn't easily observed by recording and replaying the network. Instead, the backup node needs to get a complete state update of the entire system.
This is a checkpoint. But notice that I used a spatial decomposition to minimize network usage by not sending all of the data all of the time? I've thrown that out of the window. Now I need to checkpoint all of the time, and have the ability to replay the network requests that the node is involved in, should it go down.
This is complicated, and will likely exceed what the hardware can do, given that it's already using high-end hardware for the normal operations.
For our domain, we'll gladly accept the increased network cost and node redundancy for durability, because most of the work ends up not involving other nodes (most of our computations can occur wherever the data is stored, and mutations, aside from appends, are infrequent).
Thank you for giving me some context.
The capitalization and apposition makes it pretty explicit.
I was at a talk of his last year, and there are a number of fault-tolerant MP algorithms being drawn in. MPI hasn't been updated in ages; I don't think that necessarily means we need to ditch it, just that the standard needs to be modernized. I don't feel very strongly about this, since working with MPI is a huge pain in the ass and the challenge of modernizing it seems gargantuan.
Also, I'm not familiar with Spark, but isn't Chapel a decade old at this point and barely works at all? I tried their compiler last summer and it took 5 minutes to compile hello world; hopefully it's improving.
For Chapel, it depends on what you count; it very heavily borrows from ZPL, which is much older, but Chapel itself was only released in 2009. It is already competitive with MPI in performance in simple cases, while operating at a much higher level of abstraction. Whether Chapel, or Spark, are the right answers in the long term, I don't know; but there's a tonne of other options out there that are worth exploring.
I think a large part of the inertia behind MPI is legacy code. Often the most complex part of HPC scientific codes is the parallel portion and the abstractions required to implement it (halo decomposition, etc.). I can't imagine there are too many grad students out there who are eager to re-write a scientific code in a new language that is unproven and requires developing a skill set that is not yet useful in industry (who in industry has ever heard of Chapel or Spark??). Not to mention that re-writing legacy codes means you're delayed in getting results. It's just a terrible situation to be in.
I think Spark will totally displace map-reduce in the next 12 months (because it's got map reduce in it, but in memory).
Chapel's made by Cray. If what you're saying is true then Cray's not done a very good job of advertising Chapel. God knows they have the capability to advertise properly.
It's also pretty easy to see how UPC or co-array fortran (which is part of the standard now, so isn't going anywhere any time soon) would work. They'd fall closer to MPI in complexity and performance.
You couldn't plausibly do big 3d simulations in Spark today; that's way outside of what it was designed for. Now analysing the results, esp of a suite of runs, that might be interesting.
Scala is 12 years old, Go is 6 years old and Clojure is 8 years old.
Chapel may not be an appropriate replacement for all MPI programs, but it can be used for some programs today.
I really wish people would give Fortran a second chance. It has come a long way from the ancient, all-caps days.
Even in F08 there's a lot of backwards-compatibility cruft still left in the language, too. The IO model still provides very little abstraction and is based on tape drives. You can/have to "rewind" files. There are obscure "unit descriptors" that manifest themselves as integer literals in most code posted online, which makes it a chore to learn from. As far as I can tell there is no functionality that approximates the behaviour of C++'s streams.
It's fast as hell, and the GNU compiler is mature and well-developed, but Fortran remains a horrid language for doing any sort of interactive programming. It's best used if you just give it some arguments, let it run free, and then have it return some object or value that a more sane language can then interpret and present to the user for a decision.
There is little reason to learn a language where the only sane choice for doing input/output involves calling your Fortran module from a python script and letting the python handle i/o.
This isn't necessarily a Fortran-specific thing. The standard C library includes a rewind(stream) function, equivalent to fseek(stream, 0L, SEEK_SET) plus clearing the stream's error indicator.
Fortran might be used more widely, like C++ is, if it wasn't so awful for doing things other than shitting out numbers at insane speeds.
Well, you can think of a "unit descriptor" (or somewhat more Fortranny, "file unit number") as something roughly equivalent to a POSIX file descriptor, which is also an integer. The problem, as you allude to, is that classically unit numbers were assigned by the programmer rather than the OS or runtime library, so you could end up with clashes e.g. if you used two libraries which both wanted to do I/O on, say, unit=10. Modern Fortran has a solution to this, though, in the NEWUNIT= specifier, where the runtime library assigns a unique unit number.
> As far as I can tell there is no functionality that approximates the behaviour of C++'s streams.
As of Fortran 2003, there is ACCESS="stream", which is a record-less file similar to what common operating systems and programming languages nowadays provide.
> It's fast as hell, and the GNU compiler is mature and well-developed, but Fortran remains a horrid language for doing any sort of interactive programming.
Personally, I'm hoping for Julia to succeed, but we'll see..
Modern Perl is pretty nice. 90s Perl still not so good.
This is how the modern Fortran Hello World looks:

program hello
  print *, "Hello World!"
end program hello
CPU clock speeds maxed out at 3-4 GHz a decade ago. Nobody develops special supercomputing CPUs any more. The market is tiny. Old supercomputer guys reminisce about the glory days when IBM, Cray, Control Data, and UNIVAC devoted their best R&D efforts to supercomputers. That ended 30 years ago.
Supercomputers have poor price-performance. Grosch's Law stopped working a long time ago. Maximum price/performance today is achieved with racks of midrange CPUs, which is why that's what every commercial data center has. Now everybody has to deal with clusters of machines. So cluster interconnection has become mainstream, not the province of supercomputing.
It is instead arguing that traditional HPC is being made irrelevant because traditional HPC uses MPI (the first successful distributed/parallel computing library), which is increasingly irrelevant in favor of newer libraries for the same task.
"MPI is a language-independent communications protocol used to program parallel computers."
Runs fine on commodity clusters.
Kind of.... For simple, low communication jobs this is true. But when you start trying to find the eigenvectors of a large sparse matrix, communication becomes your bottleneck, at which point MPI on commodity clusters (those without a really fancy interconnect) "works", but not fast enough to be useful.
That is a commodity cluster.
Its title literally is: "HPC is dying, and MPI is killing it".
His comment shows more understanding of the article's main point (about the demise of HPC) than your "it's about MPI"...
CPUs, err, schmee-PUs. It's all about the interconnect and people can and do make special interconnects.
If so, that could be interpreted as simply a 4D square grid which wraps around the edges, right? (just as a 3D torus is a 2D grid which wraps)
Also, here's an image... which I admit is not terribly useful, but it's what the national lab people put out.
Even for x86 this isn't true (4+GHz is at least possible), let alone platforms like POWER which have already pushed beyond 5GHz. Fancier things like vacuum-channel transistors, graphene transistors, etc. could push that even further once they break into commercial viability.
Not that clock speed alone really matters all that much compared to the other performance benefits of high-performance RISC architectures like POWER and SPARC...
> Nobody develops special supercomputing CPUs any more.
Today I learned that Blue Gene was a figment of my imagination :)
Special supercomputing CPUs are still being developed. The reason why they seem insignificant is because their market size has remained relatively constant, while the markets for general-purpose, non-supercomputing-specific platforms have grown much more rapidly. This doesn't mean supercomputing is dead necessarily, just like how the invention of the microwave oven doesn't mean that ordinary ovens are suddenly dead. Rather, it's just an indicator of different use cases, and the different markets thereof.
> The top 10 are all Government operations.
It's a bit misleading (though I suppose technically accurate) to list academic institutions (like the University of Texas, which holds the #7 spot) as "Government operations"; they're government-funded, yes, but there's a big difference between that and, say, an actual government agency directly managing such an installation. I also fail to see how even a majority of those being government installations has anything to do with anything; governments typically have much greater capital to spend on such things - and greater need for such things - than all but the most massive commercial entities.
HPC was never really the purview of commercial enterprises anyway (unless they had extreme computational requirements). The uptick in the use of COTS products for high-performance computing among enterprises (particularly big Internet-reliant ones like Google) wasn't really at the expense of the HPC crowd losing potential users; it's rather just something that formed very recently alongside HPC already being a niche topic.
Basically, by your arguments, "high-performance computing" has been dying for as long as it's existed.
> Grosch's Law  stopped working a long time ago.
Only because the world switched to clustering, where Grosch's Law doesn't quite apply, and hasn't addressed the limitations of current transistor technology (like the above-mentioned vacuum-channel and graphene transistor technologies, among many others).
> Maximum price/performance today is achieved with racks of midrange CPUs, which is why that's what every commercial data center has.
That's what "every commercial data center has" (this isn't exactly true, but we'll go with it for now) more because of price alone than because of an actually-calculated price/performance ratio. Businesses tend to think in terms of short-term investments much easier than they tend to think in terms of long-term investments (in contrast with academic and often government institutions, which tend to think in the opposite direction, and therefore have entirely different sets of problems in many cases).
Meanwhile, the big businesses that really do actively calculate an optimal price/performance ratio (like Google) aren't the ones using COTS solutions; they usually have the financial capability to invest in homegrown solutions and cut out any unnecessary expense, and are certainly not just buying a bunch of prebuilt servers from Dell. Google in particular has started to invest heavily in IBM's Open POWER initiative, probably due to a perception that POWER will offer a better price/performance ratio than x86 in their already-very-customized hardware stack.
Some of the sparse matrix computations in structural mechanics and in some machine learning algorithms have some overlap. But mostly, group 2 has little reason to be interested in what group 1 is doing.
Now, group 2 obviously has more modern tools than the 3d-simulation community, because machine learning came into common use much later than numerical fluid mechanics.
But do 3d-simulation people also have much reason to be interested in what the machine learning people are doing?
The "machine learning / big data" people are probably not doing anything that makes a weather prediction model to run faster? Or are they?
In terms of absolute performance, HPC is absolutely faster. In terms of bang for buck, Big Data is hands down faster. Also, in terms of accessibility, Big Data is hugely easier: I can build you a 100 core big data system for $300k
My point is, the big data and the physics simulation people probably do not have a lot of common interests - besides using large amounts of computing power.
Also I think the dichotomy you're looking for is IO bound vs CPU bound problems. Although certainly there are a plethora of different kinds of IO bound problems (asynch vs synch or disk bound vs memory bound vs cache bound).
The hardware is different in terms of the layout. Aggregations of small cores on boards (gpus) vs. very high speed large cores with lots of local memory. Highly localised connections vs. an interconnect fabric.
And it is more accessible because it's affordable, and you can get at it in the cloud; this means that skills building is easier for more people and it also means that a wider user base is possible.
Anyway, for a typical HPC cluster, it's bog standard x86 hardware, the only remotely exotic thing is the Infiniband network. Common wisdom says that since Infiniband is a niche technology, it's hugely expensive, but strangely(?) it seems to have (MUCH!) better bang per buck than ethernet. A 36-port FDR IB (56 Gb/s) switch has a list price of around $10k, whereas a quick search seems to suggest a 48-port 10GbE switch has a list price of around $15k. So the per-port price is roughly in the same ballpark, but IB gives you >5 times better bandwidth and 2 orders of magnitude lower MPI latency. Another advantage is that IB supports multipathing, so you can build high bisection bandwidth networks (all the way to fully non-blocking) without needing $$$ uber-switches on the spine.
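Working the quoted list prices through to per-port numbers makes the comparison concrete (prices and port counts are the ones quoted above, which may have moved since):

```python
# Per-port cost from the quoted list prices.
ib_port = 10_000 / 36    # 36-port FDR InfiniBand switch, 56 Gb/s per port
eth_port = 15_000 / 48   # 48-port 10GbE switch, 10 Gb/s per port

print(round(ib_port), round(eth_port))  # roughly $278 vs $312 per port

# Bandwidth per dollar: IB comes out ahead by a wide margin.
ib_gbps_per_dollar = 56 / ib_port
eth_gbps_per_dollar = 10 / eth_port
print(round(ib_gbps_per_dollar / eth_gbps_per_dollar, 1))  # ~6x
```

So the "same ballpark per port, >5x the bandwidth" claim holds up on these numbers, before even counting the latency and multipathing advantages.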
The GPU thing seems to have fallen out of my original comment; I meant to write "I can build you a 100,000 core system for $300k" but somehow the decimal point jumped left three times! To do that I would definitely have to use GPUs...
I am seriously lusting after such a device, I feel that there is much to be done.
If you use processes such as in the Erlang VM, they're doing calculations, sure, but they're also sending messages back and forth, and they're acting as supervisors, and they're being shuffled around by the VM. There's a lot going on. And that extra stuff that's going on takes away from the time you could be multiplying stuff. And even then, there's been no optimization done for this sort of calculation. There are a lot of tricks you can do. Heck, the better matrix multiplication libraries have individual optimizations for CPUs.
While I have been using C++/Rcpp to extend R as an occasional time saver (for analysis rather than simulations), it's only been little snippets written badly.
Anyway, since it's so easy to come up with a too-long list of technologies to learn and then only scratch the surface, hearing that Erlang/Elixir isn't the completely wrong tool is helpful.
How much better that is than a C++ version, I don't know. But if you want to get better at C++ and give it a try, forget what you know and read this book: http://www.amazon.com/Programming-Principles-Practice-Using-...
The modern Hadoop ecosystem is designed for a different workload from MPI's. It emphasizes co-localizing data and computation and seamless robustness, and trades off raw power for simple programming models. MapReduce turned out to be too simple, so Spark implements graph execution, which is nothing new to HPC. As far as I know, Spark's authors don't believe it is ready for distributed numerical linear algebra yet. But a counterpoint is that I am seeing machine learning libraries using Spark, so perhaps things are improving.
One thing I have learnt today is that MPI isn't gaining popularity. I just have a hard time picturing a JVM language in overall control in HPC, where precise control of memory is paramount to performance.
The beauty of mpi is:
* its definition is completely open
* it segregates the high-level message-passing interface from the low-level stuff
I don't know if Spark itself is the right way forward; but it's an example of a very productive high-level language for certain forms of distributed memory computing. And some of these issues - like the JVM - aren't fundamental to Spark's approach; there's no inherent reason why something similar couldn't be built based on C++ or the like.
In particular, a decent garbage collector will give you performance dependent on the number of live objects (typically low) and not on the number of allocations and deallocations, as you might see in a non-garbage collected language. This gives great allocation performance and reduces overheads.
The disadvantages can be (potentially long) GC pauses and higher overall memory requirements, but in practice this isn't usually a problem for non-interactive systems.
Of course, if you do have a device with low memory and low tolerance for GC pauses (e.g. mobile gaming) there might be a problem.
The main disadvantage seems to be less predictable performance; which could be a problem in domains which require good timing performance, but that's not really Spark's problem.
A GC'd language is also generally easier to program in, since one doesn't have to (in general) worry about memory management, so it's generally a lot easier to program very large systems with lots of moving parts.
BTW, thanks for a thought provoking article. You have given me a lot to ponder.
I'm not familiar with the HPC space but I thought a lot of new work, at least in machine learning, was migrating to GPGPU instead of traditional CPUs. The compute per $ or per watt payoff is too large to ignore.
But it was a pain, especially since our code was a mix of C (which was easy to MPI) and Ada (not so easy). It's pretty low-level stuff (I think we used Open MPI). All the nodes need to have MPI set up and configured; fine if you have a team willing to do it, but these days....
mpiexec -n 10 myprocess
I think we liked it because the processes would be put to sleep by the MPI daemon until a message arrived. You can sleep and wait for a message with sockets now, I think. It's been a while since I've used Unix IPC (interprocess communication).
I don't think I'll miss it.
I built a tiny cluster in my basement (a prototype) and looked at MPI and decided that it was way too complicated, so I just built something that pushes the essential bits, pretty much without abstraction, to the nodes and was done. The cluster is a very specific solution, so I felt justified in not looking at MPI. And now the decision feels even more justified.
Fortran, MPI, even C to an extent -- can we please move on? I don't understand why the scientific community is so reluctant to embrace change. It seriously doesn't take that long to learn a new language or a platform like GitHub (yeah, that's still considered "new" in the scientific community), and the time investment more than pays itself back many times over.
Let's assume the opposite were true, and it was fast to embrace change. How much time would be spent on this change -- relearning, rewriting, refighting old bugs -- vs. actual work done?
Change is overhead. You do as little of it as necessary, and only when not changing starts costing a lot. Which means you change, but slowly.
As to Fortran, it will go away when something better comes along, and then it will do so slowly, for the aforementioned reasons.
Four out of the ten computers are owned by DOE. That's a pretty significant investment, so they're going to be reluctant to change over to a different system. And, to be clear, a different software setup could be used on these systems, but they were almost certainly purchased with the idea that their existing MPI codes would work well on them. Hell, MPICH was partially authored by Argonne:
so they've a vested interest in seeing this community stay consistent.
Now, on the technical merits, is it possible to do better? Of course. That being said, part of the reason that DOE invested so heavily in this infrastructure is that they often solve physics problems based on PDE formulations. Here, we're basically using either a finite element, finite difference, or finite volume based method, and it turns out that there's quite a bit of experience writing these codes with MPI. Certainly, GPUs have made a big impact on things like finite difference codes, but you still have to distribute data for these problems across a cluster of computers because they require too much memory to store locally. Right now, this can be done in a moderately straightforward way with MPI. Well, more specifically, people end up using DOE libraries like PETSc or Trilinos to do this for them, and they're based on MPI. It's not perfect, but it works and scales well. Thus far, I've not seen anything that improves upon this enough to convince these teams to abandon their MPI infrastructure.
Again, this is not to say that this setup is perfect. I also believe that this setup has caused a certain amount of stagnation (read: huge amount) in the HPC community, and that's bad. However, in order to convince DOE that there's something better than MPI, someone has to put together some scalable codes that vastly outperform (or are vastly easier to use, code, or maintain) on the problems that they care about. Very specifically, these are PDE discretizations of continuum mechanics based problems using either finite difference, finite element, or finite volume methods in 3D. The 1-D diffusion problem in the article is nice, but 3-D is a pain in the ass, everyone knows it, and you cannot get even a casual glance with anything shy of 3-D problems. That sucks and is not fair, but that's the reality of the community.
By the way, the oil industry basically mirrors the sentiment of DOE as well. They're huge consumers of the same technology and the same sort of problems. If someone is curious, check out reverse time migration or full waveform inversion. There are billions of dollars tied up in these two problems and they have a huge amount of MPI code. If someone can solve these problems better using a new technology, there's a huge amount of money in it. So far, no one has done it, because that's a huge investment and hard.
I was a Smalltalk coder. I thought it was the best thing since sliced bread. It has always been clear to me that Smalltalk is far superior to Java.
I left the company after a little while, to do C++ graphics. I later heard that my former employer rewrote their Smalltalk application in Java.
Now no one uses Smalltalk anymore. While Objective-C is based on Smalltalk, and Smalltalk was far easier to use, lots of people use Objective-C and no one uses Smalltalk.
How could it have been different? My friend Kurt Thames once said that "Smalltalk is the way object-oriented programming SHOULD be done." I have always agreed with that.
But when new methods (!) of OOP arose, all the Smalltalk crowd did was gripe about how Smalltalk was far better than Java or Objective-C.
What let Java get ahead of Smalltalk for me personally, as someone getting into programming in 1996, was that I could write it in the text editor I already had, compile it with a compiler I could get for free, and then post the source code on Geocities (actually, Xoom - remember that?) to share with others.
Whereas when I tried to get into Smalltalk, the first thing I had to do was learn my way around this wacky environment with its strange class browser and ultra-retro window manager, and get my head around the fact that my source code wasn't anywhere in particular, and yet was everywhere, and that if I wanted to share my code, I had to somehow "file out", and then hope that my internet friends could successfully "file in" to their own potentially modified images. Once I'd got hold of the tools at all, that is.
Which is not to say that the Smalltalk environment was not better than Notepad/DOS box/javac, because of course it was. It just didn't lend itself to adoption and spread nearly as well. It was a tool for masters, with affordance for apprentices.
Also, Java had pretty good networking right in the standard library, and networking was really exciting in 1996.
We used Visual Smalltalk Enterprise. It had the cool feature that, at the end of the workday, I could make what amounted to a core dump, then the following morning I would load my core dump into a running program, and there would be all my open windows with the cursors in the right places in the source documents and so on.
That was quite cool and I really enjoyed it; however, that environment was profoundly non-portable. I expect that much of the success of Java as opposed to Smalltalk came from the simple ability one had to post a tarball full of source code on one's FTP site.
You wrote "your ability to write Java source in any text editor at all was, all by itself, a new method of doing Object-Oriented Programming".
Simula, considered the first OO language, is text based in the same way that Java is, and not image based like Smalltalk. Since Smalltalk is inspired by Simula, I do not believe one can say using any text editor is a new method of doing OO programming.
I think that Smalltalk has the right level of abstraction and a lot of very good other things about it.
MPI was, as the article points out, the wrong abstraction for the problem. If MPI dies, I am ok with that.
I am sad that Smalltalk is not more widely used.
Smalltalk's demise no doubt had a lot to do with Sun's marketing people convincing a bunch of Pointy-Haired Bosses that garbage collection means that you have no memory leaks, as well as that Java was the only way to do cross-platform development.
IMHO Java is one of the very worst ways to do cross-platform development. However, when I ported a Mac OS Pascal program to Java so that it could be run on both Windows and Mac, the client was completely convinced that Java was the only way that could possibly be done - this despite my loud and frequent protests that the state of Java at the time was quite poor, that the Java interpreter was dog-slow, that Java sucked the memory dry, and that I knew a whole bunch of ways to write cross-platform native code that would be far faster and use far less memory.
The reason that Smalltalk specifically suffered from this, is that Sun made most of its money by selling servers to the enterprise. Sun Workstations were favored by scientists and engineers, however Sun's real money came from enterprise servers.
Right around that time, Smalltalk was largely used for enterprise applications. Enterprise application developers loved Smalltalk absolutely to death, however the bean counters and the PHBs were more inclined to listen to marketdroids' promises about garbage collection being immune to memory leaks.
Garbage collection and memory leaks are orthogonal.