
Which isn't necessarily a problem. Just put shard X on the same machine. If the largest city can be handled by a single server, I generally agree with your parent.

I've thought about this problem before, both for a related problem space and with friends working in this specific space. The short version is to create a grid where each square holds car data (id, status, type, x, y, ...) in-memory. Any write-lock of concern is only needed when a car crosses into another grid square, and then only on the two squares in question. This can be layered across multiple levels, and your final car-holding structure could be an r-tree or something.

The grid for a city can be sharded across multiple servers. And if you told me that was necessary, fine... but as-is, I'm suspicious that a pretty basic server can't handle tens of thousands of cars sending updates every second.
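
Something like this as a rough Elixir sketch - module names and Grid.Registry are made up, and it assumes a Registry started with keys: :unique. One process per grid square, and a boundary crossing only touches the two squares involved:

    defmodule GridCell do
      use GenServer

      # One process per grid square, holding car_id => data for the cars in it.
      def start_link(cell_id),
        do: GenServer.start_link(__MODULE__, %{}, name: via(cell_id))

      defp via(cell_id), do: {:via, Registry, {Grid.Registry, cell_id}}

      # Ordinary position updates stay inside one square.
      def update(cell_id, car_id, data),
        do: GenServer.cast(via(cell_id), {:update, car_id, data})

      # Crossing a boundary only involves the two squares in question.
      # (Assumes the car really is in from_cell; a sketch, not production code.)
      def handoff(from_cell, to_cell, car_id) do
        {:ok, data} = GenServer.call(via(from_cell), {:take, car_id})
        GenServer.call(via(to_cell), {:put, car_id, data})
      end

      @impl true
      def init(cars), do: {:ok, cars}

      @impl true
      def handle_cast({:update, car_id, data}, cars),
        do: {:noreply, Map.put(cars, car_id, data)}

      @impl true
      def handle_call({:take, car_id}, _from, cars),
        do: {:reply, {:ok, Map.fetch!(cars, car_id)}, Map.delete(cars, car_id)}

      def handle_call({:put, car_id, data}, _from, cars),
        do: {:reply, :ok, Map.put(cars, car_id, data)}
    end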

Friends tell me the heavy processing is in routing / map stuff, but this is relatively stateless and can be sent off to a pool of workers to handle.
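
If it really is stateless, farming it out is simple enough; a sketch assuming a Task.Supervisor named Routing.TaskSup and a hypothetical Routing.compute/2:

    defmodule Dispatch do
      # Each routing request runs on a supervised task, isolated from the caller.
      def route_async(pickup, dropoff) do
        Task.Supervisor.async_nolink(Routing.TaskSup, fn ->
          Routing.compute(pickup, dropoff)  # hypothetical routing call
        end)
      end
    end

    # Caller collects the result without holding any car state:
    # Dispatch.route_async(pickup, dropoff) |> Task.yield(5_000)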




I always wonder in these cases about giving each car an actor in Erlang/Elixir and having complete network transparency, message handling and crashes handled for free.
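
A bare-bones version of that per-car actor in Elixir (Cars.Registry / Cars.Supervisor are placeholder names; a unique-keyed Registry and a DynamicSupervisor are assumed to be running):

    defmodule Car do
      use GenServer, restart: :transient

      def start_link(car_id),
        do: GenServer.start_link(__MODULE__, car_id, name: via(car_id))

      defp via(car_id), do: {:via, Registry, {Cars.Registry, car_id}}

      # Location pings from the phone become plain messages to this car's process.
      def locate(car_id, lat, lng),
        do: GenServer.cast(via(car_id), {:locate, lat, lng})

      @impl true
      def init(car_id), do: {:ok, %{id: car_id, pos: nil, status: :idle}}

      @impl true
      def handle_cast({:locate, lat, lng}, state),
        do: {:noreply, %{state | pos: {lat, lng}}}
    end

    # One child per car; a crash takes out only that car and the supervisor restarts it:
    # DynamicSupervisor.start_child(Cars.Supervisor, {Car, "car-123"})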

The routing is very complex too but as you note scales well, until you want to start routing/pickups based on the realtime location of other cars.


I've given that kind of model a lot of thought too, but I did not find very satisfying solutions to these problems:

- How do you distribute the car actors across nodes, assuming the number of nodes is variable? (I think riak_core looks interesting, but it does not seem to have a way to guarantee the uniqueness of something - it's rather built to replicate data across multiple nodes for redundancy.) There's a toy hash-placement sketch at the end of this comment.

- What happens if a node fails? What mechanism is going to respawn the car actors on a different node? How do you ensure minimal downtime for the cars involved?

- What happens if there's a netsplit, e.g. how do you ensure no split brain where two nodes think they're responsible for a car?

It feels to me like the traditional erlang-process-per-request architecture coupled with a distributed store (riak or w/e) makes it possible to avoid the very difficult problem of ensuring one and only one actor per car in a distributed environment.
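
For the first question, the naive version of placement is just hashing the car id over the current node list - a toy sketch only, since this is exactly what riak_core-style consistent hashing with vnodes and handoff improves on:

    defmodule Placement do
      # Pick an owner node for a car by hashing its id over the sorted node list.
      # Membership changes reshuffle lots of cars here, and nothing below answers
      # failover or split brain - that's the hard part the questions above get at.
      def node_for(car_id) do
        nodes = Enum.sort([Node.self() | Node.list()])
        Enum.at(nodes, :erlang.phash2(car_id, length(nodes)))
      end
    end

    # Start the car's actor on whichever node owns it (placeholder supervisor/module names):
    # :rpc.call(Placement.node_for(car_id), DynamicSupervisor, :start_child,
    #           [Cars.Supervisor, {Car, car_id}])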


For the netsplit stuff I'd look at how Phoenix.Presence works and see how that handles it with a CRDT. There are various ways of distributing things across nodes in Erlang/Elixir, but maybe you'd need to build something.

I think you are right about the riak_core stuff - you could probably keep track of cars using some sort of distributed hash and kill duplicate car processes if they were ever to spawn.

In fact a way of instantiating processes and finding them based on a CRDT is probably a pretty cool little project...
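
On the "kill duplicates" idea: Erlang's built-in :global registry already does a version of this (not CRDT-based, so only an analogue of what a Presence-style tracker would give you) - when a netsplit heals and two nodes both registered the same name, a resolve function picks a survivor and exits the other. A tiny sketch with made-up module/name choices:

    defmodule CarName do
      # Claim the cluster-wide name for this car: :yes if this process won it,
      # :no if a process on some other node already holds it.
      def claim(car_id),
        do: :global.register_name({:car, car_id}, self(), &:global.random_exit_name/3)

      # Find the car's process anywhere in the cluster (pid or :undefined).
      def whereis(car_id),
        do: :global.whereis_name({:car, car_id})
    end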


I think that's a cool idea. It would have the downside that error recovery could take a while though, depending on the permdown period, so during that time a driver would be stuck; while in a request-based system they can immediately try again and it would work (hit a different instance). But it might not be too bad.


I've had Uber crash on drivers while on a ride; even 30 seconds of downtime is better than complete failure! I think it would make Uber more resilient...


> having complete network transparency, message handling and crashes handled for free

That describes no system, ever, including Erlang. TANSTAAFL


Care to explain which bits Erlang/Elixir doesn't handle, or at least provide a solution for? As the sibling comments show, you may need to be smart about certain things, but OTP definitely provides ways of solving the above.


You're saying you want to extend "just let it crash" to the individual car level? :)


Depends on how smart your supervisor gets, doesn't it? But sure, why not? Many times processes can be restarted in microseconds...


When the Facebook IPO crashed the NASDAQ, I suspect it was because NASDAQ was sharding by ticker symbol. That was rational until one ticker became 1/3 of the trading volume.


No, that had absolutely nothing to do with it. It was based on a bad assumption about volume of trade entries/cancellations during the initial price calculation.


Sounds like you know more. Care to explain what the assumption was and how it caused the malfunction?



That doesn't contradict my conjecture that the issue was improper sharding. The jump in duration of cancellation detection from 2ms to 20ms could have been because they were running that calculation on a single machine.

Although... Ethernet latency would probably make it tough to stick to 2ms.



