I built this for my CS bachelor's thesis. The goal was to develop a system capable of detecting violent actions and specific weapons (knives) in video streams in real time.
The system combines object detection, pose estimation, and temporal analysis to recognize violent actions in video. It uses YOLO11n-detect to identify knives and YOLO11n-pose to extract 17 skeletal keypoints from people in the scene. Individuals are tracked across frames using BoT-SORT so their movements can be analyzed over time. These sequences of normalized skeletal keypoints are then processed by a Bidirectional LSTM, which classifies the action as violent or non-violent.
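To make the pipeline concrete, here is a minimal sketch of the keypoint-normalization and per-track windowing steps between the pose/tracking stage and the BiLSTM. This is my own illustration, not code from the repo: the window length, the hip/torso normalization scheme, and the helper names (`normalize_keypoints`, `push_frame`) are assumptions, though the keypoint indices follow the standard 17-point COCO layout that YOLO11n-pose outputs.

```python
from collections import defaultdict, deque
import numpy as np

SEQ_LEN = 30  # frames per classifier window (assumed value, not from the repo)

def normalize_keypoints(kpts):
    """Normalize 17 (x, y) COCO keypoints: center on the hip midpoint and
    scale by torso length (mid-shoulder to mid-hip), making the features
    invariant to where the person is in the frame and to their scale."""
    kpts = np.asarray(kpts, dtype=np.float64)   # shape (17, 2)
    hips = (kpts[11] + kpts[12]) / 2.0          # COCO: 11/12 = left/right hip
    shoulders = (kpts[5] + kpts[6]) / 2.0       # COCO: 5/6 = left/right shoulder
    torso = np.linalg.norm(shoulders - hips)
    if torso < 1e-6:
        torso = 1.0                             # degenerate pose: skip scaling
    return (kpts - hips) / torso

# One sliding window of normalized poses per BoT-SORT track ID.
buffers = defaultdict(lambda: deque(maxlen=SEQ_LEN))

def push_frame(track_id, kpts):
    """Append one frame for a tracked person; once the window is full,
    return a (SEQ_LEN, 34) array ready to feed to the BiLSTM."""
    buffers[track_id].append(normalize_keypoints(kpts).ravel())
    if len(buffers[track_id]) == SEQ_LEN:
        return np.stack(buffers[track_id])
    return None
```

The deque gives a sliding window per person, so the classifier can re-score every frame once warmed up instead of waiting for disjoint clips.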
I've also included a YouTube playlist showing the system in action on various real-world and simulated scenarios.
I’m looking for feedback on the architecture, particularly on how to better handle complex group interactions or reduce ambiguity in high-speed non-violent movements.
Yes, you're not the first to tell me this. At first I wanted to make a fully custom PCB with SMD components, but it cost me too much to make just one, and some components were difficult to find. So I opted to make it more like a shield: the components are easier to find and easier to solder than SMD for those who have never done it before. The same reasoning applies to why it was written in Arduino rather than plain C, so that even people who aren't very familiar with it can modify it.
I’m not sure I’ll ever understand HN's aggression with certain downvotes. This is the OP, who is from Italy, and English is likely not their first language. Why is this comment so bad?
I added it in the first version (you can probably see it in the initial commit), but it had a problem with the microcontroller I use, so for now I've removed it.
And that's very fair, but an OOK replayer is not a clone. The hardware doesn't have nearly the same capability. That's fine; it's just a different product entirely.
Exactly. The Flipper has an enormous community that helps, beyond all the people who work for Flipper. So in this case, no, the software support is not even remotely comparable.
Repo: https://github.com/lraton/real-time-violent-action-detection