
The entire reason for the popularity of distributed systems is that application developers in general are very bad at managing I/O load. Most developers only think about CPU/memory constraints, but usually not about disk I/O. There's nothing wrong with that, because if your services are stateless, the only I/O you should have is logging.

In a stateless microservice architecture, disk I/O is only an issue on your database servers. That is why database servers are often still run on bare metal: it gives you better control over disk I/O, which you can usually saturate on a database server anyway.

In most advanced organizations, those database servers are managed by a specialized team. Application servers are CPU/memory bound and can be located pretty much anywhere and managed with a DevOps model. DBAs have to worry about many more things, and there is a deeper reliance on hardware as well. And it doesn't matter which database you use; NoSQL is just as finicky, as a few of my developers recently learned when they tried to deploy a large Couchbase cluster on SAN-backed virtual machines.




For web services, distributed usually makes sense, because the resource usage of a single connection is rarely high. It's largely uninteresting to consider huge machines for that use case for the reasons you outline.

But that's not really what the discussion is about. In the web services case, your dataset generally fits in memory because it is tiny. Most of the time you don't need large servers for that. Even most databases people have to work with are relatively small or easily sharded.
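To make "easily sharded" concrete, here is a minimal Python sketch of hash-based sharding. It assumes records are keyed by a string ID and the shard count is fixed; `shard_for` is a hypothetical helper, not any particular database's API.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index in [0, num_shards).

    Uses a stable hash (SHA-256) rather than Python's built-in hash(),
    which is randomized per process for str and would send the same key
    to different shards after a restart.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Treat the first 8 bytes as an unsigned integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    for user_id in ["alice", "bob", "carol"]:
        print(user_id, "->", shard_for(user_id, 4))
```

Plain modulo hashing moves most keys when `num_shards` changes; consistent hashing is the usual alternative if shards are added or removed often.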

In the context of this discussion, consider that what matters is the size of the dataset for an individual "job". If you are processing many small jobs, then the memory size to consider is that of an individual job, not the total memory required for all the jobs you'd like to run in parallel. In that case, many small servers are often cost-effective.
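The sizing argument above reduces to a bit of arithmetic, sketched here in Python. It assumes jobs are independent, each needs a fixed amount of memory, and memory is the only constraint; `servers_needed` and its parameters are hypothetical names for illustration.

```python
import math


def servers_needed(job_mem_gb: float, num_jobs: int, server_mem_gb: float) -> int:
    """Number of identical servers needed to run num_jobs independent jobs at once.

    Only the per-job footprint has to fit on one machine; the total
    (job_mem_gb * num_jobs) is spread across as many servers as needed.
    """
    if job_mem_gb <= 0:
        raise ValueError("job_mem_gb must be positive")
    if job_mem_gb > server_mem_gb:
        # A single job does not fit: adding more small servers will not help,
        # you need a bigger machine (or a way to split the job).
        raise ValueError("a single job does not fit on one server")
    jobs_per_server = math.floor(server_mem_gb / job_mem_gb)
    return math.ceil(num_jobs / jobs_per_server)


if __name__ == "__main__":
    # 100 jobs of 2 GB each on 16 GB servers: 8 jobs per server, so 13 servers.
    print(servers_needed(2, 100, 16))
```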

If you are processing large jobs, on the other hand, you should seriously consider whether there are data dependencies between different parts of your problem, in which case you can very easily become I/O bound.


I own an ASUS gaming laptop...

It has two flaws. One, it has nVidia Optimus (which just sucks; whoever implemented it should be shot).

Two, the I/O is not that good, even with a 7200 RPM disk, and Windows 8.1 makes it much worse (for some reason Windows keeps running its antivirus, Superfetch, and other disk-intensive stuff ALL THE TIME).

This is noticeable when playing emulated games. On the real hardware, games read from the disc or cartridge; under an emulator, even cartridge-based games have to read from disk quite frequently, and the resulting slowdowns, especially during cutscenes or map changes, are very noticeable.

The funny thing is, I've had older computers with weaker CPUs (this one has an i7) that could run those games much better. It seems even hardware manufacturers forget about I/O.


I'd bet that in your case it's not actually disk I/O: unless you have some very strange emulator, file access should still go through the OS disk cache. VMware, for example, certainly benefits from it. How many GB of RAM do you have on the machine? How much is still free when you run the emulator and the other software you need?
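The disk-cache point can be checked with a rough Python sketch that reads the same file twice and times each pass. It assumes a local file path given on the command line; on a file that has not been read recently, the first pass may hit the disk while later passes are typically served from the OS cache, but actual timings depend on the OS, file size and memory pressure. `timed_read` is a hypothetical helper.

```python
import sys
import time


def timed_read(path: str, chunk_size: int = 1 << 20) -> tuple[int, float]:
    """Read the whole file in chunks; return (bytes_read, seconds_elapsed)."""
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python timed_read.py /path/to/game.iso
    for attempt in (1, 2):
        size, secs = timed_read(sys.argv[1])
        print(f"pass {attempt}: {size} bytes in {secs:.4f} s")
```

If the second pass is not much faster than the first, the file may be larger than the memory available for caching, which would point back at RAM rather than the disk itself.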


I identified it as disk I/O because I would leave Task Manager running on a second screen: every time the game lagged, memory and CPU use were below 30% and disk was at 100%. When I sorted by disk usage, the top entries were the Windows processes, followed by the emulator.




