A Note on Distributed Computing (1994) [pdf] (scholar.harvard.edu)
88 points by vaughan on Jan 4, 2023 | 42 comments



Waldo went on to be the chief architect of Jini at Sun, which IMO is a very elegant solution to the problems of distributed computing. Unfortunately Jini never found success in the real world, but there are a lot of great ideas contained within, like resource leasing, tuplespaces, and interface-based proxying. Here’s a later interview with Waldo that’s a bit of a retrospective [0]. Twenty years on, I don’t think we have yet found a satisfactory approach to distributed systems, so maybe it’s time to revisit these older ones.

0. https://www.artima.com/articles/jim-waldo-on-distributed-com...


Interactive distributed systems are only slow if you command them to do things sequentially and wait for a round trip each time.

If you send the computation plan with the initial input request, they can be fast.

We can also program distributed systems to run a switch statement after each response: a state machine for every response. What if you could coordinate a computer to do things depending on each result? You can do this with something similar to Cap'n Proto's promise pipelining.

With promise pipelining, you specify up front the operations you want performed on the output of the server's response as soon as the operation completes.

One of my ideas is a thought puzzle: an extremely high-performance computer with high latency.

You design sessions that decide what to do, and what to do given each output. Then the computer automatically follows the sequence of commands depending on the output of the remote system.

I see it as chaining together promises that are typed. Imagine an advanced Option monad that captures every possible output state and lets you pattern match on it.
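
Here's a minimal sketch in TypeScript of what I mean (the Outcome type and the match helper are invented for illustration; this is not Cap'n Proto's actual API):

    // Sketch only: a typed result that captures every outcome, plus a
    // helper to "pattern match" on it. Invented API, not Cap'n Proto's.
    type Outcome<T> =
        | { kind: "ok"; value: T }
        | { kind: "retryable"; reason: string }
        | { kind: "fatal"; error: Error };

    function match<T, R>(
        o: Outcome<T>,
        arms: { ok: (v: T) => R; retryable: (r: string) => R; fatal: (e: Error) => R }
    ): R {
        switch (o.kind) {
            case "ok": return arms.ok(o.value);
            case "retryable": return arms.retryable(o.reason);
            case "fatal": return arms.fatal(o.error);
        }
    }

    // The "session": the reaction to every possible outcome is declared
    // up front. (Here the match still runs on the client after one round
    // trip; real pipelining would ship this plan to the remote side.)
    async function plan(remote: { build(): Promise<Outcome<string>> }): Promise<string> {
        const image = await remote.build();
        return match(image, {
            ok: (id) => `push image ${id}`,
            retryable: (why) => `re-run build: ${why}`,
            fatal: (e) => `abort pipeline: ${e.message}`,
        });
    }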

The next time you create a change for your CI system that needs to build a Docker image, upload it, prepare a Kubernetes manifest, run a data migration, and run a redacted data import, you could use this architecture.

Admittedly we need an elegant GUI for queuing responses from servers.

My idea is GUI thunking, inspired by Haskell's thunks:

https://github.com/samsquire/gui-thunks


Depends on how embarrassingly parallel (https://en.wikipedia.org/wiki/Embarrassingly_parallel) your problem is. If many individual operations depend on the results of previous operations, it'll be slow.


You are of course completely right.

How do you parallelise building a Docker image, compiling, uploading to a server, and deploying to Kubernetes?

I was trying to remove all round trips from the equation.


If they depend on each other, you don't. The best you can do is get all the data in the right place(s) to reduce latency and use caches to reduce duplication of effort.


The more complex the requests you can make, the larger the DoS attack surface. It’s a tough problem. We need some more little languages, like eBPF, to send just enough conditional logic across.



I had not recalled that was Liskov's work. I do have a vague recollection of basically the same thing coming out of Microsoft Research in the late 90's, but I can't recall if that was a paper or a patent application.

Promise chaining is handy, but unless the response to the first query is a naked value like an ID or a user name, you need logic to extract the 2nd question from the 1st answer, which is going to require running logic on the server while not DDoSing yourself. There's a lot of relatively cool stuff that was done in the 80's and 90's that had not really absorbed the import of the Morris Worm.


Hard to believe this paper is 28 years old now! I still forward it to folks who tell me excitedly about how they've found some new system that makes remote resources look just like local resources. Fortunately that fantasy is less common now: with so much being remote, folks are more used to experiencing dropouts, jitter, and delay when interacting with things online.


"The hard problems in distributed computing are not the problems of how to get things on and off the wire. The hard problems in distributed computing concern dealing with partial failure and the lack of a central resource manager. The hard problems in distributed computing concern insuring adequate performance and dealing with problems of concurrency. The hard problems have to do with differences in memory access paradigms between local and distributed entities. People attempting to write distributed applications quickly discover that they are spending all of their efforts in these areas and not on the communications protocol programming interface."

They got that right. Most of the value of Java, and the market value of Google and Amazon, come from solving these problems.

The viability of JavaScript in the browser has depended entirely on the communications protocol not being that important.

One thing the paper misses is the value of encryption. Distributed computing is severely limited if it can't be encrypted. Indeed, the rate-limiting factor to uptake might be not protocols or distributed algorithms, but confidence in encryption and distributed security -- and that's as much a social problem as a technological one.


At the time this was written (IIRC), public-key encryption was still under the DSA/RSA patents and therefore not commonly available. And (possibly) the machines didn't have the horsepower to do a lot of cryptography.


Encryption was export-restricted until 1996.

Actually that isn't really relevant - most of the systems the authors are talking about (and that they worked on) would have been running on enterprise or campus LANs, and it wasn't totally clear (at least to the sysadmins at the place where I worked) whether it was even legal to connect corporate networks to the internet at that point in time if you weren't a govt contractor.

Based on what I recall, most of the security threats we were worried about had to do with corporate espionage and information theft, and would have required physical access to the network. In that environment, encryption isn't important in the same way as it is on today's Internet.


Original title: A Note on Distributed Computing

> A better approach is to accept that there are irreconcilable differences between local and distributed computing, and to be conscious of those differences at all stages of the design and implementation of distributed applications. Rather than trying to merge local and remote objects, engineers need to be constantly reminded of the differences between the two, and know when it is appropriate to use each kind of object.

A.k.a. Transparent remoting.


We changed the title back to the original, in keeping with the site guidelines (see https://news.ycombinator.com/newsguidelines.html). Among other things, it's much better when searching for previous threads on the same article. (I found several previous submissions, but they didn't have comments).

We also changed the URL from https://scholar.harvard.edu/waldo/publications/note-distribu... - might as well go straight to the paper.

It's a good submission—thanks!


> Transparent remoting

Presumably you mean "Transparent remoting isn't a thing"?

Paper authors were at Sun, so had experience with NFS (the design of which assumed that transparent remoting was a thing).


NFS is my favorite example of how to do distributed computing incorrectly. Take a filesystem API that assumes fast, local access, and that individual I/O operations are cheap and immediate (like stat()’ing a file or resolving a symlink), and suddenly make it so all these operations may have to block for hundreds/thousands of milliseconds while accessing something over a remote server potentially on the other side of the world. It causes so many issues: syscalls that can’t be interrupted (and thus you can’t ctrl+c a process), stuck mounts that can’t be unmounted, hanging for ages on `ls`’ing a folder with a lot of symlinks…

NFS locking up is the most common answer to the typical interview question “what could cause a load average in the hundreds on a machine with zero CPU usage?”: because there’s a stuck NFS server and the pending IO operations put the processes in the run queue.

I’m glad there’s a paper that articulates my thoughts about this subject better than I can.


> It causes so many issues: syscalls that can’t be interrupted (and thus you can’t ctrl+c a process), stuck mounts that can’t be unmounted, hanging for ages on `ls`’ing a folder with a lot of symlinks…

These things have nothing to do with distributed systems, they can happen on any machine. That's what the "Not ready reading drive A. Abort, Retry, Fail?_" prompt is for.


umount -l -f will lazily unmount (logically detach) a stuck mount and let you mount something else there.


And before that there was /etc/fastboot or the L1-A two finger salute!


GFS at Google was created for this very reason. In the early days, lots of disk servers exported NFS and lots of hosts mounted lots of NFS. This caused things like index building to break.

What's interesting about GFS is that it's purely an RPC protocol with no kernel component; the access library is linked into the application. That allowed a lot of freedom for both the server developers and the clients. It also forced the client developers to explicitly acknowledge they were working with a remote filesystem.

(BTW, I work for an enterprise that uses NFS; we have 20PB in there and lots of HPC jobs and interactive users pounding on it. But we spent $$$ for a very good NFS server)


I have some experience in running pretty large compute clusters with NFS as the way everything is accessed, so I feel your pain. We had an environment where all the binaries themselves were in NFS, as well as the data being processed by the jobs (on different servers at least… the ones hosting the binaries were the “tools” servers), and so we had thousands of machines hitting the tools NFS mounts thousands of times per second.

A typical issue was jobs which ran a bash script which calls perl a bunch. Perl is on NFS. Oh and perl isn’t just on NFS, but the path to it is a symlink, which resolved to another symlink, which resolved to another symlink, which finally resolved to the perl binary. So that’s a half-dozen stat() calls per each call to `perl`. (None of these are cached, because metadata wasn’t cached, only the file contents were…) Said bash script would use ` | perl -pe` instead of grep, etc…

Oh and typical perl programs called by these scripts had a bunch of modules they needed. Guess where the modules are? NFS! Not just NFS, but behind their own sets of symlinks (gotta be able to version your perl code! Why not use symlinks!)

Some of these symlinks would trigger an automount. So now your bash script’s call to `perl -pe ‘s/foo/bar’` had to go mount something from another tools server. Yay.

Needless to say we paid the storage vendor a lot for support.


Yeah, college students routinely bring down clusters by submitting large jobs with naive IO patterns. A common strategy at supercomputer centers is to not mount the home dir at all, and force jobs to stage every single bit they need into the work node's filesystem.

We put tons of artifacts and modules on NFS but our filers are designed to scale. Once, I even had to evaluate a product (Gear 6) which was a read-only cache of another NFS server, just to offload the heavy reads caused by cluster scripts.

With all that said, the best cluster I ever ran was diskless- all the worker nodes mounted / and /home from a master node, booting straight into a root NFS filesystem.


AIX 3.2.5 (IIRC) had a bug in NFS where you could corrupt a filesystem's lock with the result that any process that touched the filesystem in any way would block.

IBM's automountd was single threaded.

Someone's code (specifically, a systems programming class assignment) would lock the FS with their home directory. After a period of inactivity the automounter would try to unmount the directory, blocking. The machine is now nearly useless.


Oh, Netapp had a similar bug in ONTAP. If you had a nearly full (98%) filesystem, snapshots enabled, and somebody did an rm -rf <MYDIROFLOTSOFDATA>, the whole filer would stop responding to all users for half an hour or more.

Eventually we got a patch (custom for our enterprise) that limited the outage to anybody who had a current working directory that was the parent of the rm -rf'd dir. And we dropped NetApp shortly after.


I hope I didn’t build that system. Sorry. The design is giving me flashbacks to the mid-90s in oil and gas.


I hit that a couple of weeks ago after some 30+ years. I had forgotten how it was. And the per-client, per-directory file entry cache: it doesn't play nice with systems where producers create files and consumers look for them. By the way, Amazon's EFS is based on NFS.


Ok, so be constructive: what should an interface for remote file access look like?


The most important thing is that it should be asynchronous, and only asynchronous. A syscall like read() blocks by default (suspending the whole thread) until an NFS server comes back with a response. A more appropriate interface should not allow callers to operate in a blocking way.

Blocking incurs a lot of problems, because the kernel ends up waiting a ridiculously long default timeout for the server to respond, and the calling thread is blocked the whole time. Making a remote-access API asynchronous (callback-based), or at least making O_NONBLOCK the only allowed option, would help a lot here.

But really, the other problem is that filesystems carry so much baggage around assumptions of speed/locality/reliability that blindly treating a remote server as if it were a local filesystem is a bad idea in the first place. Too much client software is written on the assumption that filesystems are fast (i.e. you don't need a cache, stat()s are cheap, etc.), so it's better not to conflate the two.
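
A rough sketch of what an async-only remote file API could look like (TypeScript, hypothetical names; not any real kernel or NFS interface):

    // Hypothetical async-only remote file API (invented names): every
    // call returns a promise and takes an explicit deadline, so there is
    // no mode in which a slow server suspends the calling thread.
    interface RemoteFile {
        read(offset: number, length: number, deadlineMs: number): Promise<Uint8Array>;
        close(): Promise<void>;
    }
    interface RemoteFs {
        open(path: string, deadlineMs: number): Promise<RemoteFile>;
    }

    async function readConfig(fs: RemoteFs): Promise<Uint8Array> {
        // The caller is forced to confront timeouts and failure here,
        // instead of discovering them as an uninterruptible hang.
        const f = await fs.open("/etc/app.conf", 500);
        try {
            return await f.read(0, 4096, 500);
        } finally {
            await f.close();
        }
    }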


Open(path) -> handle, Read(handle, offset, size), Write(handle, size), SnapshotForAppend(handle, newpath), Close

And write is write-once, from the start of the file.
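
Rendered as a typed sketch (TypeScript; the Handle type and exact signatures are my guesses at the semantics above):

    // A typed rendering of the above. "Write-once, from the start of the
    // file" is modeled by a write call that only appends in order and
    // can never seek backwards.
    type Handle = { id: string };

    interface FileService {
        open(path: string): Promise<Handle>;
        read(h: Handle, offset: number, size: number): Promise<Uint8Array>;
        write(h: Handle, data: Uint8Array): Promise<void>;  // sequential, write-once
        snapshotForAppend(h: Handle, newPath: string): Promise<Handle>;
        close(h: Handle): Promise<void>;
    }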


Was this design issue addressed in the subsequent revisions of the protocol?


TBH I’m not sure. The run queue (high load) issue seems the kind of thing that could be an issue with the client kernel, not necessarily the protocol, although I’m not sure of the details. It’s possible it’s fixed now, I haven’t been an NFS admin in 10 years :)

But the real point is that the filesystem, as an abstraction, has too much baggage from clients that make local-first assumptions: that stat()s are cheap (let's use lots of symlinks! Hey, why is the NFS server CPU usage so high from stat() calls?), or that the filesystem can be used as an easy cache (which on NFS is kind of the opposite of what you want), etc. Silently and transparently making your filesystem an NFS mount is a bad idea unless the software using that mount is aware that the other end is a (potentially slow) NFS server and not local disk. The two really shouldn't be conflated.


> The run queue (high load) issue seems the kind of thing that could be an issue with the client kernel

Under Linux, a high load roughly tells you that the system is bottlenecked on something, where that something isn't necessarily CPU cycles. If your system is bottlenecked on pending NFS requests, the load will be high, and that's not the result of any kernel shortcomings (unless one blames blocking file I/O on the difficulty of correct use of exposed async/non-blocking file I/O syscalls).

The high load average is a matter of accounting[0]. Under Linux, tasks (threads) in I/O wait are considered runnable, although they're not scheduled for CPU time. Other OSes (e.g. most Unix descendants, probably WinNT) make other accounting decisions, but either way, tasks blocked on I/O don't affect CPU-bound tasks. The Linux manner of load accounting isn't what I would have implemented at first, but if you're going to collapse system load into a single metric, then it seems best to include blocked tasks. If you've got high load, then it's time to dig deeper into more nuanced metrics anyway, so in retrospect, I agree with Linux's more holistic load accounting.

[0] https://en.wikipedia.org/wiki/Load_(computing)#Unix-style_lo...
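
For the curious, a simplified sketch of that Unix-style accounting in TypeScript (the kernel uses fixed-point math; this is just the shape of the calculation):

    // Simplified sketch of Unix-style load accounting: an exponentially
    // damped moving average, sampled every 5 seconds. Linux counts tasks
    // in uninterruptible (I/O) sleep as active, which is why a stuck NFS
    // server produces huge load numbers with zero CPU use.
    const SAMPLE_SECS = 5;
    const decay = (windowSecs: number) => Math.exp(-SAMPLE_SECS / windowSecs);

    function nextLoad(prev: number, activeTasks: number, windowSecs: number): number {
        const e = decay(windowSecs);
        return prev * e + activeTasks * (1 - e);
    }

    // e.g. the 1-minute average, updated each sample:
    //   load1 = nextLoad(load1, running + uninterruptible, 60);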


Oh yeah, don't get me started about NFS! (Oops, too late.) I'll just link with short summaries:

https://news.ycombinator.com/item?id=20007875

>There used to be a bug in the GatorBox Mac Localtalk-to-Ethernet NFS bridge that could somehow trick Unix into putting slashes into file names via NFS, which appeared to work fine, but then down the line Unix "restore" would totally shit itself.

And here's another cool party trick you can baffle people with (not related to NFS, just an obscure MacOS quirk):

>I just tried to create a file name on the Mac in Finder with a slash in it, and it actually let me! But Emacs dired says it actually ended up with a ":" in it. So then I tried to create a file name with a colon in it, and Finder said: "Try using a name with fewer characters or with no punctuation marks." Must be backwards compatibility for all those old Mac files with slashes in their name. Go figure!

https://news.ycombinator.com/item?id=31820504

>NFS originally stood for "No File Security".

>The NFS protocol wasn't just stateless, but also securityless!

>Stewart, remember the open secret that almost everybody at Sun knew about, in which you could tftp a host's /etc/exports (because tftp was set up by default in a way that left it wide open to anyone from anywhere reading files in /etc) to learn the name of all the servers a host allowed to mount its file system, and then in a root shell simply go "hostname foo ; mount remote:/dir /mnt ; hostname `hostname`" to temporarily change the CLIENT's hostname to the name of a host that the SERVER allowed to mount the directory, then mount it (claiming to be an allowed client), then switch it back?

>That's right, the server didn't bother checking the client's IP address against the host name it claimed to be in the NFS mountd request. That's right: the protocol itself let the client tell the server what its host name was, and the server implementation didn't check that against the client's ip address. Nice professional protocol design and implementation, huh?

>Yes, that actually worked, because the NFS protocol laughably trusted the CLIENT to identify its host name for security purposes. That level of "trust" was built into the original NFS protocol and implementation from day one, by the geniuses at Sun who originally designed it. The network is the computer is insecure, indeed.

>And most engineers at Sun knew that (and many often took advantage of it). NFS security was a running joke, thus the moniker "No File Security". But Sun proudly shipped it to customers anyway, configured with terribly insecure defaults that let anybody on the internet mount your file system. (That "feature" was undocumented, of course.)

https://news.ycombinator.com/item?id=31820891

>Stewart, I think "sucks" is a pretty fair description of a protocol that actually trusted the client to tell the server what its host name is, before the server checked that the host name appears in /etc/exports, without verifying the client's ip address. On a system that makes /etc/exports easily publicly readable via tftp by default.

https://news.ycombinator.com/item?id=31822138

>Speaking of YP (which I always thought sounded like a brand of moist baby poop towelettes), BSD, wildcard groups, SunRPC, and Sun's ingenuous networking and security and remote procedure call infrastructure, who remembers Jordan Hubbard's infamous rwall incident on March 31, 1987?

https://news.ycombinator.com/item?id=31821646

>Another reason that NFS sucks: Anyone remember the Gator Box? It enabled you to trick NFS into putting slashes into the names of files and directories, which seemed to work at the time, but came back to totally fuck you later when you tried to restore a dump of your file system.


Remote pretending to be local is asking for failure. But local can trivially conform to abstractions designed for the remote case. Languages like Erlang and E showed that programming that way doesn't have to suck.

When I read this paper (it's been a long time) it seemed to overlook this angle on the problem.
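
For example, a toy sketch of that discipline (TypeScript; the Counter example and endpoint are invented): every reference is asynchronous, so a local object and a network proxy satisfy the same contract.

    // Toy sketch: the only way to talk to anything is an async call, so
    // a local counter and a network-backed one present the same contract.
    interface Counter {
        increment(by: number): Promise<number>; // may be slow, may fail
    }

    class LocalCounter implements Counter {
        private n = 0;
        async increment(by: number): Promise<number> { return (this.n += by); }
    }

    class RemoteCounter implements Counter {
        constructor(private url: string) {}
        async increment(by: number): Promise<number> {
            const res = await fetch(`${this.url}/increment?by=${by}`); // invented endpoint
            if (!res.ok) throw new Error(`remote failure: ${res.status}`);
            return (await res.json()).value;
        }
    }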


The paper specifically discusses that option (making local conform to remote abstractions) but finds it inadequate for various reasons. Stated succinctly: "Rather than encouraging the production of distributed applications, such a model will discourage its own adoption by making all object-based computing more difficult."


Thanks, you're right and my memory was fuzzy. But the strong claims like that in the paper are about trying to completely erase the local/remote difference, which neither E nor Erlang did. Instead they reduced the pain, in a way that this paper leaves the impression is not worth pursuing:

> Rather than using those resources in attempts to paper over the differences between the two kinds of computing, resources can be directed at improving the performance and reliability of each.

> ... it is a mistake to attempt to construct a system that is “objects all the way down” if one understands the goal as a distributed system constructed of the same kind of objects all the way down.

(E.g. in E the objects are the same kind; it's the references to objects that have two kinds.)


"Local conforming to the remote case" will be slow and have a lot of overhead, unless the "remote" abstractions are carefully designed to scale down efficiently to a single node. You can view the SSI (single system image) approach as trying to do exactly that.


Hmm, maybe? The repeated references to object-oriented structures mean, in part, that 'code + data + an interface' exists systematically as the basis for interaction. What you do with that to expose to the human mind is quite flexible (with costs).


Foundational paper IMHO that all computer programmers should read.


It was nice to see the later Java people at Sun (and even later the XMLHTTP/AJAX people at Microsoft) finally pick up some of the ideas the earlier NeWS people were talking about until we were blue in the face, and implementing in the 80's with PostScript, long before JavaScript or even Java.

In 1990 Owen Densmore rightfully complained, "This has been particularly difficult for me to get across to others here at Sun."

James Gosling designed NeWS long before he designed Java. This is his 1986 paper about it, when it was originally called "SunDew".

http://www.chilton-computing.org.uk/inf/literature/books/wm/...

NFS Version 3 (NeFS) was all about applying NeWS's distributed computing architecture (sending PostScript programs instead of using a fixed protocol) to the file system as well as the window system.

In fact John Warnock originally intended PostScript to be a "linguistic motherboard" that supported network services like file storage and other features as well as graphics, which Owen Densmore recounted in his "Swiss Army NeWS" paper.

https://news.ycombinator.com/item?id=20891291

DonHopkins on Sept 5, 2019, on: Samsung Announces Key-Value SSD Prototype:

It's like NFS 3 (NeFS) in hardware! (NeWS for disks: A PostScript interpreter in the kernel as a file system API.)

https://www.donhopkins.com/home/nfs3_0.pdf

Network Extensible File System Protocol Specification (2/12/90)

Comments to: sun!nfs3 nfs3@SUN.COM

Sun Microsystems, Inc. 2550 Garcia Ave. Mountain View, CA 94043

1.0 Introduction

The Network Extensible File System protocol (NeFS) provides transparent remote access to shared file systems over networks. The NeFS protocol is designed to be machine, operating system, network architecture, and transport protocol independent. This document is the draft specification for the protocol. It will remain in draft form during a period of public review. Italicized comments in the document are intended to present the rationale behind elements of the design and to raise questions where there are doubts. Comments and suggestions on this draft specification are most welcome.

[...]

Although it has features in common with NFS, NeFS is a radical departure from NFS. The NFS protocol is built according to a Remote Procedure Call model (RPC) where filesystem operations are mapped across the network as remote procedure calls. The NeFS protocol abandons this model in favor of an interpretive model in which the filesystem operations become operators in an interpreted language. Clients send their requests to the server as programs to be interpreted. Execution of the request by the server’s interpreter results in the filesystem operations being invoked and results returned to the client. Using the interpretive model, filesystem operations can be defined more simply. Clients can build arbitrarily complex requests from these simple operations.
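
To make the contrast concrete, here's a toy rendering (TypeScript, with invented server calls; not actual NeFS or PostScript) of per-operation RPC versus shipping a small program:

    // RPC style: resolving a symlink chain costs one round trip per link.
    async function resolveRpc(
        rpc: { readlink(p: string): Promise<string | null> },
        path: string
    ): Promise<string> {
        let p = path;
        while (true) {
            const next = await rpc.readlink(p); // network round trip each time
            if (next === null) return p;        // not a symlink: done
            p = next;
        }
    }

    // Interpretive style: ship the loop itself to the server, so the
    // whole chain resolves in one round trip. The eval entry point and
    // the little program are invented stand-ins for NeFS's PostScript.
    async function resolveInterpreted(
        server: { eval(program: string, arg: string): Promise<string> },
        path: string
    ): Promise<string> {
        return server.eval(
            "while target is a symlink { target = readlink(target) }; return target",
            path
        );
    }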

https://news.ycombinator.com/item?id=22456710

DonHopkins on March 1, 2020, on: Sun's NeWS was a mistake, as are all toolkit-in-se...

Owen Densmore recounted John Warnock's idea that PostScript was actually a "linguistic motherboard". (This was part of a discussion with Owen about NeFS, which was a proposal for the next version of NFS to run a PostScript interpreter in the kernel. More about that here:)

https://news.ycombinator.com/item?id=17077721

Owen Densmore's discussion of John Warnock's "Linguistic Motherboard" idea for PostScript:

https://donhopkins.com/home/archive/NeWS/linguistic-motherbo...

    Date: Tue, 20 Feb 90 15:20:52 PST
    From: owen@Sun.COM (Owen Densmore)
    To: don@cs.UMD.EDU
    Subject: Re:  NeFS

    > They changed the meaning of some of the standard PostScript operators,
    > like read and write, which I don't think was a good idea, for several
    > reasons... They should have used different names, or at least made ..

    Agreed.  And I DO see reasons for the old operators.  They could
    be optimized as a local cache for NeFS to use in its own calcs.

    > Basically, NeFS is a *particular* application of an abstraction of
    > NeWS. The abstract idea is that of having a server with a dynamically
    > extensible interpreter as an interface to whatever library or resource
    > you want to make available over the network (let's call it a generic
    > Ne* server).

    Very true.  This has been particularly difficult for me to get across
    to others here at Sun.  I recently wrote it up for Steve MacKay and
    include it at the end of the message.

    > It's not clear to me if NeFS supports multiple light weight PostScript
    > processes like NeWS.

    I asked Brent about this, and he agreed that it's an issue.  Brent
    has been talking to a guy here who's interested in re-writing the
    NeWS interpreter to be much easier to program and debug.  I'd love
    to see them come up with a NeWS Core that could be used as a generic
    NetWare core.

    I think you should send  your comments off to nfs3 & see what happens!
    I agree with most of your points.

    Owen

    Here's the memo I consed up for MacKay:

Window System? ..NeWS ain' no stinkin' Window System!

-or-

Swiss Army NeWS: A Programmable Network Facility

Introduction

NeWS is difficult to understand simply because it is not just a window system. It is a "Swiss Army Knife" containing several components, some of which contribute to its use as a window system, others which provide the networking facilities for implementing the client-server model, all embedded in a programmable substrate allowing extremely flexible and creative combination of these elements.

During the initial implementation phase of the Macintosh LaserWriter software, I temporarily transfered from Apple to Adobe working closely with John Warnock and other Adobe engineers. At lunch one day, I asked: "John, what do you plan to do after LaserWriter?" His answer was interesting:

        PostScript is a linguistic "mother board", which has "slots"
        for several "cards".  The first card we (Adobe) built was a
        graphics card.  We're considering other cards.  In particular,
        we've thought about other network services, such as a file
        server card.

He went on to say how a programmable network was really his goal, and that the printing work was just the first component. His mentioning using PostScript for a file server is particularly interesting: Sun's next version of NFS is going to use PostScript with file extentions as the client-server protocol!

This paper explores NeWS in this light: as a Programmable Network Facility, a major part of Sun's future networking strategy.

[...]


This is an old paper that is not super meaningful these days. Distributed computing has transformed radically in the past three decades. In the 80s and 90s, there were few useful distributed systems, and we only had a rudimentary idea as to how to build them. Now we run planet-wide distributed systems for billions of users. The lessons from the 90s are no longer very useful.


The vast majority of modern distributed systems are (possibly stacked) client/server architectures, possibly with a significant minority of publish/subscribe systems. And even so, latency, partial failure, and concurrency are still major issues.



