I see this and I immediately think of "trash sorting" at ultra high speed. If one could combine this with a bunch of accurate (laser-precision) air guns to shoot and move individual pieces of trash, you could sort through a truckload of trash in a matter of seconds, perhaps in the air while it is being dumped! Compare this approach with how we are currently doing it [0]. Somebody should get Elon Musk on this project right away!
Sorting by optical recognition and air guns to separate a falling curtain of product into two output streams is already a product. The development of these machines is the reason that 10 or 15 years ago you stopped seeing bad beans in bulk bean bags. I am involved in the tea industry, where they are used to sort tea by grade: stems, bad leaves, broken leaves, full leaves.
Yes, but when I went to the recycling (sorting) facility in San Mateo, they remarked that their plastic sorting systems work by infrared reflection and so cannot see black plastic. They said that because of this they are unable to process black plastic at all. I got the sense that there's room for improvement.
Reminds me of the library modernization drive in 'Rainbows End'. The book digitizer is basically a wood chipper with lights and high speed cameras in the debris chute.
Not related to Elon, but there is a company called Pellenc ST in the south of France that works on exactly this kind of problem. You can see a video of one of their machines here [0].
I work at an AI consultancy [1] that helps them use deep neural nets in these high-throughput, low-latency conditions. It's an interesting challenge, and the performance that can be squeezed out of modern hardware is indeed impressive.
This is a thing already. In my understanding, it's a staple in several kinds of recycling processes. Random sample of related links (there's a seemingly infinite amount of these though):
Trash sorting is probably a better application than self-driving cars. I only see talk about speed on this page and nothing about accuracy.
Musk needs something like 99.9999% accuracy at near-zero latency over several hours of operation. I think Tesla is currently at maybe 99.995%, judging from driving my car. The last 0.005% results in phantom braking etc. It's actually a very hard nut to crack, and I don't expect them to achieve full self-driving in all conditions for another 10-15 years, maybe. The edge cases are just too many.
I like the trash idea though (or a Q/A robot at a factory etc).
Out of curiosity, what are the possible use cases for object detection at >100 fps? I assume it would have to be objects that move very fast, i.e. nothing ordinary that I can think of.
[edit] actually stupid question. I assume it's more about throughput than fps, i.e. be able to process lots of streams on the same machine, for instance for doing mass analysis of CCTV streams.
While I'm not into object detection such as this, I can easily imagine this being part of a system where you want the rest of the system to have time to act on the information.
As such the point isn't that you can detect objects >N fps, but rather that the object detection shouldn't take more than X% of the time per cycle so that the overall cycle time can run at a given rate.
If your pipeline depends on running inference on a single frame at a time, for example some kind of control loop, then you need to be a bit careful about how you measure speed; you have to use the effective time per batch (ie batch size 1), not the amortised frames per second using as big a batch as will fit. You can still interleave processing though.
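To make that concrete, here is a rough sketch of the two measurements side by side; the run_inference function is just a placeholder for whatever actually runs the network (a TensorRT engine, an ONNX Runtime session, etc.), not anything from the article:

    import time
    import numpy as np

    # Placeholder for whatever actually runs the network.
    def run_inference(batch):
        return batch * 0.5

    frames = [np.random.rand(1, 3, 480, 640).astype(np.float32) for _ in range(64)]

    # Latency view: batch size 1, time per frame end to end.
    t0 = time.perf_counter()
    for f in frames:
        run_inference(f)
    per_frame_ms = (time.perf_counter() - t0) / len(frames) * 1000

    # Throughput view: one big batch, amortised frames per second.
    big_batch = np.concatenate(frames, axis=0)
    t0 = time.perf_counter()
    run_inference(big_batch)
    amortised_fps = len(frames) / (time.perf_counter() - t0)

    print("batch-1 latency: %.2f ms/frame, batched throughput: %.0f fps"
          % (per_frame_ms, amortised_fps))

For a control loop, the first number is the one that matters, even if the second one looks far more impressive.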
Very practical question :)
Exactly as you say, multi-stream throughput. Also for faster than realtime offline processing of video.
Check the caveats section at the end of the post - DeepStream is probably not well suited to high throughput single-stream inference.
I don't think you'll find a 1000 fps camera on a "standard" AV platform. And if you did, I imagine it would be too noisy to be useful without a ton of illumination.
But machines aren't (yet) as capable as humans at driving-situation-recognition and driving-decision-making. One way they can compensate for those shortcomings is to be superior in other ways: 100% vigilance and super-fast reaction times/decision-making.
No, that's only if you want the human to be able to make the reaction. If the application was self-driving, you'd prefer the car to react faster than a human. For a military application like projectile detection to avoid or destroy the object, you'd want something even faster.
I'm not sure what your argument is. Any self-driving system should strive to be much better than human drivers. It makes complete sense to have reaction time much better than human.
For reference, I am making a farming robot that goes at 1 meter per second, and I run the main control loop at 10 Hz. I would absolutely run a car that goes at freeway speeds at 100 Hz or more.
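The loop itself is trivial; the point is that detection plus planning has to fit inside the cycle budget. A minimal sketch of what I mean (the 100 Hz rate and the sense_plan_act stub are purely illustrative):

    import time

    CONTROL_HZ = 100          # illustrative target rate
    PERIOD = 1.0 / CONTROL_HZ

    def sense_plan_act():
        # detection + planning + actuation for one cycle; must fit well inside PERIOD
        pass

    next_tick = time.monotonic()
    while True:
        sense_plan_act()
        next_tick += PERIOD
        sleep_for = next_tick - time.monotonic()
        if sleep_for > 0:
            time.sleep(sleep_for)         # detection left headroom this cycle
        else:
            next_tick = time.monotonic()  # overran the budget; resync rather than spiral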
Cloud services that need to serve multiple requests or process many video streams in parallel. (Faster performance = less hardware required, bigger scale, and potentially a better end-user experience - aside from their data being on the cloud, of course.)
On-device (e.g. mobile phone) processing with battery usage that respects the user. It also helps include older hardware/models.
Of course the above aren't cases where the stream itself is 100+ fps; they're more about broader, general benefits. For a 100+ fps stream... well, there are many things that go fast. Imagine you wanted a robot that tracks or catches a fly before it takes off. Flies have a reaction time of 5 ms (200 fps); that's why they're hard for us to catch! Now expand and apply the same concept to other things that are fast or happen very quickly.
Plenty of robotics tasks could benefit from high FPS tracking of single streams. Generally process tasks where faster=better. But yes, tracking many streams at once is useful too!
> Could >100 FPS be realistically achieved today using only CPUs or mobile phones?
Not yet.
Google's MediaPipe object detector (which is one of the most optimised mobile solutions around) can do "26fps on an Adreno 650 mobile GPU"[1].
The Adreno 650 is the GPU in the Snapdragon 865, i.e. the current high-end SoC used by most non-Apple phones. This gives roughly the same performance as an iPhone 11.
A weird question, but since there's another article on HN right now about programming-language energy efficiency (https://news.ycombinator.com/item?id=24816733): any idea whether going from 9 fps to 1840 fps consumes the same power, 200x the power, or somewhere in between?
Great question, now I wish I'd recorded power consumption for all these experiments.
Judging from cumulative hours of watching the output of nvidia-smi, I've definitely seen a roughly linear relationship between utilization and power draw (with a non-zero floor of 30-40 W).
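If anyone wants to actually measure it, the quick-and-dirty approach would be to poll nvidia-smi alongside the benchmark. A rough sketch - the log path and the sleep are placeholders for the real run:

    import subprocess, time

    # Log GPU power draw and utilization once per second to a CSV while the workload runs.
    log = open("power_log.csv", "w")
    logger = subprocess.Popen(
        ["nvidia-smi",
         "--query-gpu=timestamp,power.draw,utilization.gpu",
         "--format=csv,noheader",
         "-l", "1"],
        stdout=log,
    )
    try:
        time.sleep(60)   # placeholder: run the actual fps benchmark here instead
    finally:
        logger.terminate()
        log.close()

    # Energy per run is roughly the sum of the per-second power samples (W*s = J);
    # divide by frames processed to compare joules per frame at 9 fps vs 1840 fps.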
I see Rust is almost equal to C, if not better, in the graph. However, I think equally skilled programmers in either language would show the Rust programmer spending more 'energy' programming and iterating than the C programmer - but then one could argue that the C program will use more 'energy' downstream if bugs slip in. In any case, an eye-opening metric on something that I, and I am sure many, take for granted. Cool.
Good work getting TensorRT running. We had a real pain in the butt with it recently and just opted to go with ONNX Runtime, its graph optimizer, and its TensorRT backend - it may not be as fast as straight TensorRT from comparisons I've seen, but it got us to competitive inference latency, so we're happy with it.
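For anyone curious, the wiring is only a few lines. A rough sketch of the setup, assuming you already have a model exported to ONNX (the model path and input shape here are made up):

    import numpy as np
    import onnxruntime as ort

    # Graph optimizations are on by default; pinning the level just makes it explicit.
    opts = ort.SessionOptions()
    opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL

    # Provider order is a priority list: ORT falls back to CUDA/CPU for any
    # nodes the TensorRT execution provider can't handle.
    sess = ort.InferenceSession(
        "detector.onnx",                  # placeholder model path
        sess_options=opts,
        providers=["TensorrtExecutionProvider",
                   "CUDAExecutionProvider",
                   "CPUExecutionProvider"],
    )

    x = np.random.rand(1, 3, 480, 640).astype(np.float32)   # example input shape
    outputs = sess.run(None, {sess.get_inputs()[0].name: x})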
Any word on latency? I didn't see anything in the article. I guess, since this is a synthetic test just pumping a single image file through repeatedly instead of an actual video stream, then it wouldn't realistically be measurable. But if latency is particularly low, this would be a boon for AR systems.
BTW, this is pumping the same video file through the network - not just a single file.
I don't measure latency, but this is not a deep pipeline so it's easy to calculate.
Latency is fundamentally limited by the model processing a single frame; all in all, probably somewhere around 10 to 15 ms, depending on your input size (assuming VGA-type input). This is a great article about system engineering for the vision pipeline, but to solve the latency issue you need either a beefier processor (or a more specialized one) or a better-tuned algorithm.
> There is evidence (measured using gil_load) that we were throttled by a fundamental Python limitation with multiple threads fighting over the Global Interpreter Lock (GIL).
Can anyone comment on how often this is a problem and if this problem is truly fundamental to Python? Could it be solved in a Python 3.x release?
Yes, this is a fundamental part of Python. For CPU-bound work, a single Python process effectively behaves as if it were single-threaded: Python threads (i.e. the threading module) are real OS threads, but the Global Interpreter Lock lets only one of them execute Python bytecode at a time, so in practice they behave more like cooperative fibers in some other language. So, if you're not waiting on I/O, then yes, the threads will fight over the GIL and performance will suffer. This is inherent to CPython and is not going to change any time soon.
But there are a few more things that can be said about this. Python threads are largely a mental construct for designing programs. The selling point is that you can share variables and data between threads without the interpreter's own internals getting corrupted; it mostly just works (though you still need locks for your own logical invariants). But even with that advantage, you're relying on Python to switch between threads on its own, and that can easily slow things down. If you're willing to drop the mental construct and go for better performance while still using a single process and sharing variables, the asyncio module lets you control exactly when the main Python process moves between points in the code flow.
However, if you really want traditional parallelism, just use the multiprocessing module. It actually launches multiple Python processes and links them together. It's called in a similar fashion to threading, so there isn't much code change for that part. But because it's no longer a single process - and each process has its own GIL - you can't share data between them as easily. With multiprocessing, you'll need slightly more complex data structures (like a multiprocessing manager namespace) to share that data. It's not that hard, but it requires a bit of planning ahead of time.
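A rough sketch of what that looks like - the shared counter is just a stand-in for whatever state you'd actually pass between workers:

    from multiprocessing import Manager, Process

    def worker(ns, lock):
        # Each worker is a separate Python process with its own interpreter and GIL.
        for _ in range(1000):
            with lock:
                ns.frames_done += 1

    if __name__ == "__main__":
        mgr = Manager()
        ns = mgr.Namespace()      # proxy object shared between processes
        ns.frames_done = 0
        lock = mgr.Lock()

        procs = [Process(target=worker, args=(ns, lock)) for _ in range(4)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()

        print(ns.frames_done)     # 4000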
[0] - https://www.youtube.com/watch?v=QbKA9uNgzYQ