
> that many programmers reach for it first when they should be reaching for it last.

What's the go-to solution for keeping a UI from blocking while it runs a computationally expensive task that takes a long time to finish?




I don't claim these are "go-to" solutions, only that there are multiple solutions to pick from.

One solution is processes (mentioned in the post). Fork a process that does your computationally expensive thing and collect the result when it is done. For the security minded, this has made a bit of a comeback, because separate processes can be run with more restrictions and can crash without corrupting the caller. We see this in things like Chrome, where the browser, renderers, and plugins are split into separate processes. And many of Apple's frameworks have been refactored under the hood to use separate processes to further fortify the OS against exploits.
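
A minimal sketch of that in Python, using the multiprocessing module (expensive_work and its input are placeholders for the real computation):

    from multiprocessing import Process, Queue

    def expensive_work(data, results):
        # Runs in a separate process; a crash here can't corrupt the caller.
        results.put(sum(x * x for x in data))

    if __name__ == "__main__":
        results = Queue()
        worker = Process(target=expensive_work, args=(range(10_000_000), results))
        worker.start()
        # ... the UI event loop keeps running here ...
        print(results.get())  # collect the result once it's ready
        worker.join()

In a real UI you would poll the queue (results.empty()) from the event loop instead of blocking on get().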

Another solution is to break the work up and process it in increments. For example, rather than trying to load a data file in one shot, read a fraction of the bytes, then on the next event loop iteration read some more, repeating until done. This works with async code (as in JavaScript) or with a poll model. Additionally, if you have coroutines (as in Lua), they are great for this, because each coroutine encapsulates its own state, so you don't have to manually track how far along you are in your execution.
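
For instance, a sketch of the incremental approach with a Python generator ("data.bin" and handle are placeholders):

    def load_in_chunks(path, chunk_size=64 * 1024):
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                yield chunk  # suspend here; resume on the next tick

    loader = load_in_chunks("data.bin")

    def on_idle():  # called once per event-loop iteration by the UI toolkit
        chunk = next(loader, None)
        if chunk is not None:
            handle(chunk)  # placeholder for the per-chunk processing

The generator keeps its own read position, so there is no progress flag to track by hand; that is the same benefit the Lua coroutines give you.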


> One solution is processes

More expensive to start than threads, and far more expensive, complex, and restrictive when it comes to moving data around. It sounds like, with the exception of some specific corner cases, threads are the better solution.

> Another solution is break up the work and processing in increments

Either the tasks are broken into ridiculously fine-grained bits that are hard to make sense of or keep track of, or you still get a blocking UI. Furthermore, the solution is computationally more expensive.


Fork/exec time for extra processes is usually unimportant. If data transfer is truly a bottleneck, shared memory is as fast as threading.
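
To illustrate, with Python's multiprocessing.shared_memory (available since Python 3.8; the size and index here are arbitrary):

    from multiprocessing import Process, shared_memory

    def worker(name):
        shm = shared_memory.SharedMemory(name=name)  # attach to the existing block
        shm.buf[0] = 42                              # write in place, no copying
        shm.close()

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=1024)
        p = Process(target=worker, args=(shm.name,))
        p.start()
        p.join()
        print(shm.buf[0])  # 42: both processes touched the same memory
        shm.close()
        shm.unlink()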

These costs, though, are generally trivial compared to the lifecycle costs of dealing with multithreaded code. Isolation in processes greatly enhances debuggability, and it's almost impossible to produce a truly bug-free threaded program. Even a heavily tested threaded program will often break mysteriously when compiled with a different compiler/libraries, or even when seemingly irrelevant code changes are made. It's a tar pit.


> More expensive to start than threads,

Maybe, but on Linux processes and threads are almost the same thing; both are created with clone(), just with different sharing flags.

Additionally, even where a process is a bit more expensive to create, the cost is not enough to stop the UI thread from being responsive. I have first-hand experience with this on different operating systems, including Windows, and process creation is more than fast enough to keep the UI completely responsive.

> and far more expensive and complex and restrictive to move data around.

Not necessarily. For threading, the synchronization patterns are not simple either. (This is why computer science courses spend time on these principles.)

Furthermore, some languages and frameworks provide really nice IPC mechanisms. Apple's XPC framework, for example, is quite nice and makes this easy to do.

> Either the tasks are broken into ridiculously fine-grained bits that are hard to make sense of or keep track of, or you still get a blocking UI.

As I mentioned, coroutines make this dirt easy. In principle, this doesn't have to be hard.

> Furthermore, the solution is computationally more expensive.

That doesn't really follow. The underlying task is where the computation is. You are just moving it, whether to a process, to a thread, by dividing it up, or somewhere else entirely (e.g., sending it to a server to process). At the end of the day, it is the same work, just moved.

Yes, you might need some state flags for breaking up the work, but threading also requires resources: creating and running the thread, locks for protecting your shared data, and so forth. There is no free lunch whichever way you do this.


Processes might be more expensive, but they do have advantages.

If the task uses a lot of CPU time anyway, spawning a process instead of a thread might not have any noticeable impact at all.

Additionally, a separate process is isolated, meaning it can be more resistant to hostile takeover (if you drop privileges correctly), and you avoid any and all shared state that could possibly result in unforeseen bugs.


What's the big-O of starting a pool? It's O(1) either way, right?

Presumably the work processing time overwhelms the IPC time.


You will want to have two (conceptually) independent entities; it doesn't matter whether they are processes or threads. Depending on the architecture, they may not even live on the same machine. One entity deals with user input, which causes work to be requested. The other entity performs the work and reports results. You pass messages between them.
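
A minimal sketch of that pattern in Python, with queues as the message channel (the squaring job is a stand-in for real work):

    from multiprocessing import Process, Queue

    def worker(requests, results):
        for job in iter(requests.get, None):  # None is the shutdown message
            results.put(job * job)            # stand-in for the real work

    if __name__ == "__main__":
        requests, results = Queue(), Queue()
        Process(target=worker, args=(requests, results), daemon=True).start()
        for n in (2, 3, 4):       # "user input" producing work requests
            requests.put(n)
        for _ in range(3):
            print(results.get())  # the input side collects and reports results
        requests.put(None)        # tell the worker to stop

Swap the queues for sockets or D-Bus and the two entities no longer need to share a machine.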

The exact architecture will vary according to your needs. There was one project I was involved with which, contrary to what Joel Spolsky would say, we recommended be entirely rewritten. The biggest problem? Spaghetti code and threads. Or rather, the way threads were misused. You see, there was no logical module separation; they had global variables all over the place, with many threads accessing them. There were even multiple threads writing to the same file (and of course, file corruption was one of the issues). To try to contain the madness, there was a ridiculous amount of locking going on. They really only needed one thread, files, and cron jobs...

For the rewrite, since we were a temporary team and could not trust whoever picked up maintenance of the code to do the right thing, we split it into not only different modules, but entirely different services. Since the only supported platform was Linux (and Ubuntu at that), we used D-Bus for messaging.

This had the not entirely unexpected side effect of allowing completely independent development and "deployment", way before microservices became a buzzword. You could also restart services independently and the UI would update accordingly when they were down.

Even then, at least one of these services used threads (as tasks). Threads are great when used as tasks, with well-defined inputs, outputs, and lifecycles.

At another project, I had to call a library that did not have a thread-safe version. A group at another branch was using Java, and they argued that it would be "impossible" to use that library without threads. The main problem was, as expected, that the library used some shared state. We would just fork(), call the library, and let the OS handle the isolation.
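
A sketch of that fork-per-call trick in Python (POSIX only; whatever function you pass in stands in for the problematic library call):

    import os
    import pickle

    def call_isolated(func, *args):
        r, w = os.pipe()
        pid = os.fork()
        if pid == 0:                 # child: gets a fresh copy of library state
            os.close(r)
            with os.fdopen(w, "wb") as out:
                pickle.dump(func(*args), out)
            os._exit(0)
        os.close(w)                  # parent: read the result back over the pipe
        with os.fdopen(r, "rb") as inp:
            result = pickle.load(inp)
        os.waitpid(pid, 0)
        return result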

Threads are a nice tool, but they are only one of the tools in your toolbox. Carpenters don't reach for a circular saw unless there is no other way, because it is a dangerous, messy, and unwieldy tool.


Put the computationally expensive task in its own process, and have the UI monitor it as needed as part of its event loop.

(That won't work in every case, but it should be thoroughly considered first.)
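
As a Python sketch (subprocess-based; "crunch.py" and show_result are placeholders):

    import subprocess

    worker = subprocess.Popen(["python3", "crunch.py"], stdout=subprocess.PIPE)
    done = False

    def on_tick():  # called on every event-loop iteration
        global done
        if done or worker.poll() is None:
            return                        # still running; the UI stays responsive
        output, _ = worker.communicate()  # finished: collect its output
        show_result(output)               # placeholder for the UI update
        done = True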


I just rewrite an expensive task so that it explicitly processes a chunk that takes a limited amount of time, which also helps avoid running out of resources in many cases.
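
Something like this, as a Python sketch (work_items and handle are placeholders):

    import time

    def process_some(work_items, budget=0.01):  # roughly 10 ms of work per call
        deadline = time.monotonic() + budget
        while work_items and time.monotonic() < deadline:
            handle(work_items.pop())            # placeholder per-item handler
        return bool(work_items)                 # True if there is more to do

Call it from the event loop until it returns False.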



