My first job was at an ISP I co-founded. In retrospect, the biggest mistake we made was over-investing in hardware, buying two 120MHz Pentium servers when we could have gotten away with 486's and a leased Cisco access server or two.
But those two 120MHz Pentium servers, with 128MB apiece, between them served up dozens of corporate websites, handled 32 dialup connections (we had a horrific setup with Cyclades serial port multipliers wired up to a large wooden board of wall-mounted US Robotics Sportster modems - the Sportsters were cheap consumer-grade modems prone to overheating, so stacking them in any way was generally a bad idea), ran an NNTP server with 30,000 USENET groups, handled mail for all our customers, and provided shell access.
We got five or six 25MHz or 33MHz 486's with 16MB RAM to use as X servers, and ran all the clients (Emacs, Netscape and a bunch of terminals was a typical session) on the two Pentiums.
Funny, my 2nd job was as employee #1 for a local ISP. We had a very similar arrangement (USR modems and all) for dial-in until we got financing and bought proper hardware; I wish I could remember the details. We hosted something like 3000 users, all their websites, shell access, Usenet, everything, off of just 2 or 3 Pentiums with even lower specs than what you describe. Everything ran quite well. We even ran a couple of very high volume corporate websites for some rather famous television shows of the time off of this arrangement.
I'd point out that it appears even a high volume site like HN hasn't knocked this ancient system offline.
It drives me nuts when I read about figuring out how to handle some couple hundred requests a second on racks full of modern hardware with Gigabytes of RAM. Any clearance bin laptop you can buy at Costco should be able to handle thousands to tens-of-thousands of requests per second. We're clearly doing something wrong these days.
The thing is, so much was static. It's easy to shuffle bytes.
There are a few problems: people don't know or care about profiling the basic stuff like context switches and data copies; people don't have a baseline idea of what should be possible; and abstractions get piled upon abstractions even when they complicate the code.
One of my biggest pet peeves in that respect is that people tend to not even know how many objects they create in many dynamic languages. Many people don't even know of any simple ways of finding out.
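In Perl, for instance, one rough way to find out (a sketch, assuming the CPAN module Devel::Gladiator is installed; expect a little noise from the bookkeeping itself) is to diff the count of live values by type before and after building a structure:

  use strict;
  use warnings;
  use Devel::Gladiator qw(walk_arena);

  # Count every live value in the interpreter, grouped by type.
  sub live_counts {
      my %count;
      $count{ ref $_ }++ for @{ walk_arena() };   # SCALAR, ARRAY, HASH, CODE, ...
      return \%count;
  }

  my $before = live_counts();

  # Something deliberately allocation-heavy: 10,000 small hashes.
  my %data;
  $data{$_} = { id => $_, name => "item$_" } for 1 .. 10_000;

  my $after = live_counts();
  for my $type (sort keys %$after) {
      my $delta = ($after->{$type} // 0) - ($before->{$type} // 0);
      printf "%-8s %+d\n", $type, $delta if $delta;
  }

Ten lines of code, and you can see exactly how many scalars, arrays and hashes that one innocent-looking loop actually created.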
My most successful application speedup (EDIT: measured in speedup per hour expended - there are plenty of examples of much greater speedups, but they are rarely as quick) was spending an afternoon eliminating string copies in a late-90's CMS written in C++, cutting page generation time by 30% in a system that was already heavily optimized (it provided a statically typed, C++-like scripting language that was bytecode compiled, at a time when most competing solutions were using horribly inefficient interpreters, and were often themselves written in slow interpreted languages).
My second biggest pet peeve is when people think system calls are cheap because they look like function calls. Sometimes you can increase throughput by an order of magnitude just by eliminating small read()'s (fill a userspace buffer in large chunks instead, and "read" from that). When I see slow throughput, the first thing I do is break out "strace" to look for unnecessary system calls...
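To make it concrete, here's a minimal Perl sketch (the file path is just a placeholder) that reads the same file with one read(2) per byte versus one read(2) per 64KB block; run each under "strace -c" and compare the read() counts and the wall-clock time:

  use strict;
  use warnings;

  my $file = shift @ARGV // '/usr/share/dict/words';   # placeholder; any big file

  # One read(2) per byte - the pathological case.
  sub read_tiny {
      open my $fh, '<:raw', $file or die "open: $!";
      my ($bytes, $b) = (0, '');
      $bytes++ while sysread($fh, $b, 1);
      return $bytes;
  }

  # One read(2) per 64KB, then "read" out of the userspace buffer.
  sub read_buffered {
      open my $fh, '<:raw', $file or die "open: $!";
      my ($bytes, $buf) = (0, '');
      while (my $n = sysread($fh, $buf, 65536)) { $bytes += $n }
      return $bytes;
  }

  printf "%d bytes, one byte per syscall\n", read_tiny();
  printf "%d bytes, 64KB per syscall\n",     read_buffered();

Same bytes delivered either way; the difference is tens of thousands of trips into the kernel versus a handful.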
But these problems are often a direct result of abstractions that mean huge numbers of developers no longer know or understand the cost of much of the functionality they depend on. They can get away with that "too often" because they work on hardware so fast it is extremely forgiving (my new "weird unknown Chinese brand" 8-core phone does about 10 times as many instructions per second as my ISP's total computing capacity in 1995, and it's nowhere near the peak performance of today's flagship phones). But at "web scale" things are not as forgiving any more: server costs quickly start adding up.
But of course, most of us most of the time work on stuff that needs to handle little enough traffic that paying an extra $20k for hardware is cheaper than spending the developer time to optimize this stuff. Ironically this makes the optimization more expensive: Fewer developers ever get to obtain the experience to optimize this stuff at scale.
>My most successful application speedup ever...
> people tend to not even know how many objects they create in many dynamic languages
One of my biggest speedup success stories was an internal inventory reporting tool for a large-ish online retailer. The code was written in Perl, with huge amounts of data aggregation happening in memory, using loads of Perl hashes pointing to other Perl hashes pointing to others (and turtles all the way down). On small subsets of the data it would run for 4 or 5 hours and then spit out some graphs - acceptable performance for daily reports. But as the site grew, it started taking 8 hours, then 10 hours, and then finally crashing when the system simply ran out of memory.
So I did a simple conversion of all the hashes-pointing-to-hashes etc. into arrays pointing to arrays. Pretty simple in the code: only a few lines changed here and there, plus a couple of functions to map hash keys to array indexes.
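The shape of the change was roughly this (a simplified sketch, not the actual code - the field names and constants here are made up for illustration):

  use strict;
  use warnings;

  # Before: each record was a hash, so every row carried its own key strings:
  #   my %item = (sku => 'A100', on_hand => 42, reserved => 3);

  # After: map each field name to a fixed index once...
  use constant { SKU => 0, ON_HAND => 1, RESERVED => 2 };

  # ...and store each record as a small array instead.
  my @item;
  $item[SKU]      = 'A100';
  $item[ON_HAND]  = 42;
  $item[RESERVED] = 3;

  # Reads look almost the same as before, but no per-record keys are stored.
  printf "%s: %d on hand, %d reserved\n", $item[SKU], $item[ON_HAND], $item[RESERVED];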
The next day the entire inventory system did a run (with this one change) and finished in 10 minutes, and AFAIK they still use that same system even though the site and user activity have grown a hundredfold.
The developer (self-taught) of the original code asked why this worked. I walked him through the big-O complexity of arrays vs. hashes, and more importantly the amount of stuff the hash type was dragging around through memory. There used to be a fantastic site with visualizations of all the Perl datatypes, and we sat down and started counting bytes until he understood the difference.
It's excellent that you'd do that for him, and he was curious/receptive enough to ask and step through it all. So many times I've seen people stay ignorant to save face.
I catch myself just throwing everything into inappropriate data structures these days, because the hardware lets me get away with it, and many recent languages make hashes so ridiculously simple to work with while not providing much more than hashes and arrays.
I guess that's kind of the problem though, right? The hardware lets you get away with it, only until it doesn't. Then if you don't know the fundamentals, you're in trouble, or you end up spending an inordinate amount of time and money trying to spread the task across dozens of systems in a couple of racks in some data warehouse somewhere.
And then the blame often gets shifted to things like the language or framework, and rewrites start being bandied about instead of analysing algorithm and data structure usage, because the problem often isn't apparent in profiles unless you're used to looking for exactly this class of problem.
And people start talking about "Big Data" and Hadoop and the like when their data is really middle of the road and could easily fit on a single server...
I'm at a company now where our default deployment is something like 6 fairly heavy VMs. You can jam it all into 4 lighter VMs and service a single user (which is what we do for testing on our laptops) and it runs..."ok".
Looking at what they're actually doing - basically a few Postgres or Solr requests, file serving, and generating some web pages - the data for one of our larger deployments barely breaks 1 GB when dumped out of the databases (and it doesn't handle more than 100 users at a time)... it should be running cleanly off a $699 mid-range all-in-one desktop from Costco and have enough headroom for 5,000 more users.
Yet somehow, between the framework, the JVM and various other nonsense, there are murmurs about moving the system to a more web-scale infrastructure because of all the performance issues we're seeing. Every time somebody uploads some content to the server, 4 of the VMs peg their cores for several seconds. As I look around in engineering meetings at all the fresh young engineers working on this stuff, nobody seems to think this is a problem.
On the client side the problem is even worse, constrained to the 1990s era performance we're stuck with inside of browsers.
Machines capable of crunching billions of calculations per second, and putting tens of billions of bytes into main memory are brought to their knees dragging and dropping a half-dozen items in a GUI (something my Commodore 64 could actually handle) or running two programs at once, something my Amiga 500 could do.
It's bad and it makes me feel bad. I think growing up when computers were basically terrible has broken me, to where things that young engineers think are cool and amazing I find more and more disappointing... it's not that old computers were great, it's that modern computers aren't, but they should be.
I'm every now and again hacking on AROS - a reimplementation (and in some ways extension) of the AmigaOS API's.
The tragicomic part of it is that "booting" the Linux-hosted version of AROS (there's also a native version) on my laptop, with a custom Startup-Sequence that starts FrexxEd (a scriptable "emacs like" editor, but without - by default at least - Emacs keybindings), lets me "boot" AROS straight into FrexxEd in a fraction of the time it takes Emacs to start on my laptop...
Of course Emacs is kinda famous for being a resource hog, but anyway - FrexxEd can run comfortably in 1MB of RAM, and that's including a bytecode compiled C-like extension language and full AREXX integration.
I'm wondering what I can do to either get a smoother integration between AROS and Linux, or run it "full time". There are a few missing pieces still holding me back.
But it still irks me that mainstream OS's lack so much that AmigaOS has (e.g. FrexxEd exposes all open buffers as files, so you can e.g. use diff and patch between your last saved version of a file and the in-memory buffer, or run gcc on the in-memory buffer to get an error report into another buffer. Of course you can do the same with tmp files, but why should you need to?)
Hash lookups end up O(n), not O(1), in worst-case scenarios; array indexing is guaranteed O(1). But that's also a simplistic view of how they work. Hashes are, by necessity, far more complex than arrays. As you use them, there's a lot more overhead involved that can cause things like cache misses, memory doubling when growing, rehashing, etc.
More importantly, hashes need to store both the key and the value. Arrays just need to allocate memory and use pointer arithmetic to offset from the array's start location to arrive at each index.
If you think of an array as just a hash with numbers for the keys, arrays "store" the keys basically for free, while even a hash with integer keys has to store 4 bytes for each key (on 32-bit implementations).
When hashes hit some utilization percentage, Perl (and any sane language) constructs a new, larger hash, then moves all the keys and values over (requiring all the keys to be rehashed). When Perl arrays need to grow, a new array is constructed and all the pointers from the old one are copied over to the same indexes in the new one; no rehashing is needed and no values need to be copied. Destroying the old array and returning its memory is also easier and faster than destroying the old hash. Hashes are also much more aggressive about resizing. I believe Perl arrays resize only once they're 100% consumed, but hashes, in order to prevent collisions, have to resize once they hit something like 50% utilization, and they grow by doubling their space, leaving lots of unused entries in the hash (I could be wrong on the exact details, but the idea is the same).
Here are the illustrations of Perl's datatypes. It really clears up a lot of mysteries about how Perl works internally.
The relevant portions are "AV" and "HV". I haven't read this in years, but at one point I had large chunks of it memorized and could hand count the number of bytes in memory I expected a Perl datastructure to take up and be fairly accurate with it.
For the sake of comparison, it's useful to fill a million-entry array and a million-entry hash and check the memory usage for both.
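Something along these lines (a minimal sketch; run it once with "array" and once with "hash" as the argument, and read the process's memory usage off top while it waits):

  use strict;
  use warnings;

  my $which = shift @ARGV // 'array';
  my (@a, %h);
  if ($which eq 'array') {
      $a[$_] = $_ for 0 .. 999_999;     # a million-entry array
  } else {
      $h{$_} = $_ for 0 .. 999_999;     # a million-entry hash
  }
  print "done ($which), pid $$ - check RSS in top now\n";
  <STDIN>;                              # pause so the memory usage can be read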
On my system, the first one (the array) consumes about 35MB of memory. The second one (the hash) takes about 216MB and is a hair slower.
Now suppose each entry in each structure pointed to another array or hash respectively each with a million entries, and each of those in turn did the same. The memory growth of the array version is going to be much slower than the hash version.
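Scaled way down (a sketch assuming the CPAN module Devel::Size is available, using 1,000 x 1,000 entries rather than a million of millions), you can measure the difference directly:

  use strict;
  use warnings;
  use Devel::Size qw(total_size);

  # 1,000 rows of 1,000 entries each, stored both ways.
  my (@aoa, %hoh);
  for my $i (1 .. 1000) {
      push @aoa, [ (0) x 1000 ];          # row as an array of 1,000 entries
      my %row;
      @row{ 1 .. 1000 } = (0) x 1000;     # same data as a hash of 1,000 entries
      $hoh{$i} = \%row;
  }

  printf "array of arrays: %d bytes\n", total_size(\@aoa);
  printf "hash of hashes:  %d bytes\n", total_size(\%hoh);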
This holds true for most languages: they implement their arrays, lists and hashes similarly and have similar issues with all of them.
Thanks for the clarification. It's pretty appalling for Perl, though, and I don't agree with your claim that this holds true for most languages.
A lot of languages have specialized structures for arrays and on the JVM, there is even close to no penalty for using ArrayList over a regular array (it's actually recommended for a bunch of reasons).
There is really no excuse for taking so long to populate a hash table (as long as the hashcode implementation is reasonable), especially if the indices are integers.
If I understand how Java implements the ArrayList, yeah, there's not much more overhead vs. an Array.
I don't have the time right now, but I'd be interested if you could whip up a quick simple profile like I did above for various generic Java collections like Map<T,T> and HashMap<T,T> and observe the time and memory consumption for each. I'd guess that the various Hash-like collections in Java will display a similar memory consumption to Perl's.
And a regular old int[] consumes a little over 12MB.
  int[] a = new int[1000000];
  for (int i = 0; i < 1000000; i++) { a[i] = i; }
An Integer array of a million entries like this consumes a little over 42MB.
  Integer[] a = new Integer[1000000];
  for (int i = 0; i < 1000000; i++) { a[i] = i; }
And an ArrayList<Integer> consumes around 35MB (a bit surprising).
  ArrayList<Integer> a = new ArrayList<>();
  for (int i = 0; i < 1000000; i++) { a.add(i); }
A HashMap consumes about 77MB.
  HashMap<Integer, Integer> a = new HashMap<>();
  for (int i = 0; i < 1000000; i++) { a.put(i, i); }
A Java process on my machine, doing nothing at all except spinning a loop consumes about 8.5MB.
- Java ints are 32 bits, or 4 bytes. Theoretically then, an array of a million ints is just 4 x 1,000,000 = 4,000,000 bytes, or just under 4MB. As we can see in my test, 4MB plus the size of the running JVM is just about 12MB, pretty much what's expected.
- An array of a million Integers is actually an array of a million 64-bit references, each pointing to an Integer object, which I've read consumes about 16 bytes. So a million of those should be roughly 8MB of references + 15.6MB of Integers + 8.5MB of JVM = ~32MB. The actual memory consumption is 42MB, so I'm still off by a chunk - there's probably some extra per-object overhead or JVM housekeeping I'm not accounting for.
- An ArrayList appears to be a nice abstraction over an array (O(1) adds, gets, etc.). It probably does the Perl thing and allocates a bigger backing array as more stuff is added, so there's presumably some internal cost each time the current capacity is exceeded.
But at any rate, I wouldn't expect the memory consumption to be all that different from an array of Integers. And the test shows it to actually be a little tiny bit less at 35MB.
- Now we come to a HashMap. A HashMap is basically a list of key->value pairs. This means that for an entry in the HashMap we need something like (I'm probably wrong in the specific details, but this is a useful thought experiment)
key: an 8-byte pointer to a 16-byte Integer, plus an 8-byte pointer to the value = 32 bytes
value: an 8-byte pointer to a 16-byte Integer = 24 bytes
So a k->v pair is about 56 bytes, and a million of those is about 55MB + 8.5MB of JVM = 63.5MB.
BUT, hashes don't consume space linearly as they grow. Many hash implementations will double the size of the table once it passes some utilization threshold, maybe 50%.
Thus we potentially have 1,000,000 completely empty HashMap slots, each of which looks like this:
empty key (8-byte pointer to NULL)
empty value (8-byte pointer to NULL)
That's 16 bytes doing nothing in particular, times 1,000,000 empty slots (50% of the total table) ~ 16MB,
for a total of 63.5MB + 16MB = 79.5MB. Our actual is 77MB so we're within spitting distance.
So if you consider that Perl's array version of this test came in at 35MB, which is on the order of what Java was doing, Perl is pretty reasonable there. The Perl hash version, though, took 216MB vs. Java's 77MB - which is not too surprising, as Perl's dynamically typed values drag a lot of memory around under the hood.
It's interesting that all your calculations are about the memory footprint of these data structures, which, in my experience, has very, very little impact on the performance of a big application. Need more memory? Just buy it or allocate it at startup. Problem solved.
I really wish more of the scaling problems I encounter on a daily basis were memory footprint related, but the reality is that they never are.
The real problem that was plaguing Perl in the example posted above was the runtime cost of operations which, in any reasonable library or language, should be constant but is apparently linear in Perl.
Well, strangely enough memory allocation/deallocation takes time. The fewer bytes you shuffle around, the less time memory operations consume. On some systems, once you start exceeding physical memory, you end up in swap space, which slows things down even further. Once you run out of every possible place you can stash something, your run-time effectively grows to infinity because your process has crashed.
So say your data is consuming 32GB of memory, and all 32GB have to meander their way into a register and back out, and assume a mov operation takes 1 CPU cycle. You've effectively had to move 64GB of data.
On a machine with a 64-bit word length, that's ~4 billion cycles to move it in and ~4 billion cycles to mov it back out, or about 8 billion CPU cycles.
That may not sound like a lot, but nobody just moves things into and out of a register. Let's say you're doing some kind of multiplication. At 3 cycles per mul, that's another 3 * 4 billion = 12 billion CPU cycles.
And now you're at roughly 20 billion CPU cycles just to do something like "multiply each of 4 billion numbers by some number" (assuming all the data fits in memory). Again, on a GHz+ machine that may not sound like a huge number, but hopefully you can see that the numbers start adding up quickly, and I'm only costing out a trivial operation of effectively 3 ASM instructions.
One big mistake people make is assuming pointer dereferencing is "free". But all that dereferencing takes time as well: the pointer has to be moved into a register and read, then the value at that memory location has to be loaded, and only then can you do something with it.
Or take something like the example given in Perl. In theory a hash lookup is O(1) if you're lucky. But that discounts the time it takes to run the hash function on the key, mov pointers around, look up values, move more memory addresses around, search memory for a free spot to malloc (and if you're out and have to swap to disk, there go a couple billion cycles waiting for that to happen)... etc. etc. etc. Oh, and your hash is 50% utilized? Let's wait while a new hash twice the size of your current one is malloc'd, every k->v pair is rehashed and inserted into the new hash, and then the old hash is destroyed.
In the middle of all this, your OS is moving the entire execution stack all over the place as it context switches through the hundred processes you have running, so each can get a chance to do what it does. So while your process may only "need" 20 or 30 billion CPU cycles, it takes 2,000 or 3,000 billion CPU cycles for it to get all of those.
I'm oversimplifying - modern CPUs are much smarter these days and make some things like context switching less expensive - but the point still stands. If you're trying to compute something over lots of memory, it will take lots of time. Trying to fit your problem data into a more compact form might take less time, if you run the numbers and figure out the cycle cost... which practically nobody who graduated school in the last 10 years knows how to do.
The next biggest thing you can do is algorithmic. You can often literally take a problem that will run until the universe grows cold, and turn it into a problem that can finish in a few minutes with a solid understanding of algorithms.
I'm continuously amazed at how relatively small datasets (small by way of the power of the machines we have today), and problems that should be quick single machine problems, get turned into entire racks full of machines running night and day. We've gotten really good at interconnecting machines to do useful things, but really stupid about how to get even a fraction of the performance potential out of an individual machine.
We used a single 500 MHz web server at the ReactOS.org project and it handled the slashdot/reddit effect until ~2008. Slashdot.org was the 8th biggest website in the world back in 2005 (Alexa ranking). I coded a novel CMS in PHP that generated hundreds of static pages in 2-3 seconds. Our open source project team, as well as dozens of translators (27 languages), used the RosCMS ajax-based interface to edit the content. Back then ajax was little known and GMail used iframes instead of ajax. The PHP-based MediaWiki, phpBB and Bugzilla also ran on the same server, and all used our central logon system. That's the old version of the website: http://old.reactos.org
On a recent project, my social website served thousands of concurrent requests on notebook-class hardware with a single CPU. It uses nginx and HHVM (Facebook's PHP JIT VM).
And @vidarth is right that people usually don't know how many objects they create. If you care about speed, you'd better be careful. The same goes for database access: raw, optimised SQL with good indexes flies.
> I coded a novel CMS in PHP that generated hundreds of static pages in 2-3 seconds.
heh... that's fast enough to basically generate an entire site on demand (per user click) on modern hardware. Why bother dynamically generating a single page when you can dynamically generate an entire site for a user? Now instead of serving thousands of requests per second, you can serve in the low hundreds and be just like any other modern framework!
Next step! Hadoop clusters and web-scale infrastructure consuming hundreds of kilowatts so you can serve a few hundred concurrent requests per second.
But now you have a scalability problem. Better bring in an entire team to solve this.
I owned an ISP in upstate NY for a few years, basically hosting websites for clients. I bought a used APC NetShelter cabinet and put 2 large UPS's in the bottom along with an air-conditioner cooling fan in the top. I had a 1U Sun Fire server running Apache, a few Sun Blade 100 workstations handling mail, and a few more handling DNS, NTP, etc. All in, I remember spending at least $25K on this.
One interesting thing I remember was trying to build a drive array back then - trying to figure out how many IDE drives I could stuff into a 4U case to use as what we'd call a NAS nowadays...
I did this from my house, and had to upgrade to 200-amp electrical service and have dedicated circuits run to the house. In order to do this, the phone company had to dig up part of the road, and it took them a week. Not convenient for all the traffic down my street.
The phone company gave up on the idea of digging in our case.
We had stupidly not thought the phone situation through, so we got offices first and investigated the phone line availability second. Luckily this was before de-regulation so the incumbent telco was legally obliged to provide service no matter what at a fixed price per line.
Unluckily, we'd gotten offices in the embassy area in Oslo. Unlucky because, thanks to the embassies, this area had been one of the first in Norway to get an automated exchange, and said exchange had last seen a major upgrade in the 1930's or thereabouts (this was before commonplace internet access spurred a flurry of exchange upgrades) and was near capacity. Until we arrived that wasn't seen as a problem: the area was fully built up, and any modest growth in the number of lines could be handled just fine by line splitting (multiplexing two lines onto one copper connection to the exchange), which worked fine for voice calls but not so well for modems...
Their solution turned out to be to run a massive cable, largely hanging in the trees along a major road, from the nearest newer exchange with decent capacity a few hundred meters away, through a hole cut in one of our windows, to a multiplexer cabinet they installed in our office.
They did tell us they were going to get it sorted out "properly" but by the time we moved to new offices they still hadn't been able to.
wow, running a cable through trees into a hole in a window. That sounds crazy, but a conversation piece for sure!
They ran my circuit to the outside of my house, but would not drill a hole for me to connect it to the inside wiring I already had in place.
I tried to do this myself, measured wrong, and ended up tapping into my main power wire with a 1/2-inch, 2-foot-long drill bit and setting my house on fire. Thankfully the damage was minimal and I was up and running a few days later once power was restored to the house.
Exactly. The 120 MHz Pentium servers were state of the art when we bought them - some of the fastest we could get our hands on. And way too expensive (I seem to remember we spent the equivalent of about $16k apiece on them, driven up by the absurdly large 128MB of RAM and the huge disk space of a few GB - but I may be mistaken).
I was primarily an Amiga user still at that time - happily using an Amiga 3000 quite similar in hw specs to the Mac in the linked article as my primary desktop.
IIRC, "Turbo" was actually the normal clock frequency. Not having turbo-button enabled was mostly for things like games where the game speed was tied to the number of CPU ticks instead of absolute time. Turbo button sounds better than "slow-down-my-PC button".
We sold off the dialup part to the Norwegian subsidiary of International Data Group for a pittance, after we in our naivety (we were mainly CS students with no business experience at the time) started a price war with the largest ISPs.
Biggest lesson apart from being careful about large capital investments: If you think you can do things _far_ cheaper than an incumbent in an underdeveloped market, chances are they too can do things far cheaper, but have so far opted not to; forcing them into a price war without large backing is not a good idea.
Then we continued for a few more years doing consulting and business hosting as well as handling the backend support for IDG's Norwegian ISP, before the lawyer of our original investor hooked us up with much better paying jobs for the Norwegian office of a California based startup.
It was an interesting experience. Learned a lot. Didn't make a lot of money. But it defined my career - 6 out of my 8 subsequent jobs have a "lineage" going back to that company.
Is this a "load test"? It's certainly handling it quite well! Performance of the CPU is ~9 Dhrystone MIPS - comparable to a 33MHz 386 (9.9 DMIPS). Serving static content is really not CPU-intensive; most of the computation would probably be in TCP/IP processing. I think this, and other examples of old hardware commonly thought to be too slow for anything but still doing something useful, shows that much of the software we use today is really vastly less efficient than what's theoretically possible. I.e. how to make use of limited computing power efficiently can be more important than how much of it there is.
This thread is going to contain a lot of people complaining that even though today's hardware is orders of magnitude more powerful than twenty years ago, our computers seem to be just as slow.
I worked in a college lab full of Mac IIcis, and they felt ridiculously slow even back then. :) I think people's memories get funny and they forget that it took minutes to launch programs and you could even watch the windows redraw.
I remember setting up a IIci with a fast scsi drive and system 6. It booted in about 5 seconds. I don't think anything I've seen has come close to that fast until the era of SSDs.
Later, when I got my first DSL line, I had a different cast off IIci running netbsd doing NAT for my little network. That would have been 1997 or so, 640k symmetric, I think the IIci had 24 megs of memory or so. Old and slow at the time, but less powerful than this one.
IIcis had some sort of onboard video which made them feel sluggish compared to similar Macs. We retrofitted them with cache cards (?) which helped a little. But, yep, very slow disks, System 7, and they IIRC had 4MB of memory. (24MB would have been an unimaginable luxury.)
Yep, ci's had onboard video, the cx didn't. I think that they were the first color/030 machines with onboard video.
They always seemed fast to me, because I bought a Mac SE just before the classic/lc/si came out. That was when Apple did a massive sales push on campus in late August/early Sept, and announced new machines in October.
It was an era where you could "feel" things like a few CPU MHz or a 32KB cache upgrade. And you had to pay hundreds or thousands for those little upgrades, so they had to be worth it.
I stuck with a IIfx until Windows 95 came out, and learned that the little things (e.g. disk and video) to some extent matter more than the big things.
Not for everyone. In my case, I have to use browser extensions to force Youtube to use the Flash player because the HTML5 one doesn't do hardware accelerated playback for some reason and drains my laptop's battery like crazy (not to mention the fan noise).
The first Unix machine I used had 2 megabytes of RAM and was connected to two dozen serial terminals. It ran on a 68020. It probably wouldn't be able to run Apache, but it ran all our accounting and managed production for a large-ish corrugated paper factory.
A Mac IIci is ridiculously powerful in comparison.
The first Unix machine I used had 768k or less and was a 6 or 8 MHz 68000. It's still bizarre to think that we had 12 users on a TRS-80 at once, via serial terminals. Even more bizarre, my first *nix was Microsoft Xenix.
A few years later, we had a lab running Apple A/UX on either the Mac II or Mac IIx. They were barely usable as small XTerminals at 640×480 8-bit color.
You can use virtual memory on the 68020, but you need a 68851 between the 68020 and memory. The 68851 is an MMU coprocessor, like the 68881/68882 is a floating-point coprocessor.
There used to be a quite persistent website hosted from a C64 running Contiki. The pages were probably cached in RAM, but I still like to imagine that a beige 1541 disk drive spun up every time someone made a request.
The thing preventing that is that the C64 "Datasette" couldn't trigger play/rewind/fast-forward from software, so it would have to show a prompt to the operator for each file it needed to load... Someone should hack together an interface - the C64 has usable extra IO pins on the cartridge port that surely could be hooked up to those controls.
From a very cursory look at a diagram, it might actually be possible (I don't know about safe) to hook the input that normally comes from the play/rewind etc. switch straight to suitable pins on the user port. Or you might fry something - but the C64 is remarkably resilient to crazy people like me soldering stuff straight onto the PCB without quite knowing what we're doing, or replacing ICs with "mostly the same but not 100% compatible" chips just to see what would happen... it's a wonder I never broke my C64's.
A single page load with no AJAX roundtrips, and no client-side scripts running helps a lot. Was reminded of that when I clicked on a search result to a mailing list archive today that happened to be hosted at Google Groups (vs the usual Mailman-style archives), and it took something absurd like ~4-5 seconds to load.
The Macintosh IIci came out in 1989. Pretty good for a 24 year old machine! I pull mine out every once in a while to remind myself that we haven't advanced that much in a quarter of a century.
I ran my first webserver on a Mac IIci at work, circa '94. We didn't have a firewall and were right on the internet. I was soon able to AppleScript (I think) up a page with a live QuickCam image, a text box, cgi-bin, and the speech synth, and let my friends have at it!
I received nothing but innuendo and sheep noises, but it was a blast. This was a year or so before I started administering the definitive Quake server on the corporate network, and dare I say... internet. Up to twenty players at a time, some on dialup.