Hacker News
Pip3 install videoflow – new Python library to do computer vision on video (github.com/videoflow)
151 points by jadielam on May 24, 2019 | 16 comments


I don't know about you guys, but I find it really cool that pre-trained models will be part of open source software from now on.


Definitely. Last weekend, I had a simple project: lock my computer screen when I'm away, using the webcam. I know this should be trivial with current state-of-the-art pretrained neural-net models, but I gave up.

I couldn't find an easy way to fetch pretrained models (from where? which one?), feed in the webcam stream, and then run inference to answer "is there a face in the image?".

It would probably be trivial for someone who knows CV libraries or TensorFlow, but I really miss the good old days when five obvious lines of Python did the job.

Maybe a new Python standard library with an API for AI would be useful here.


I think you're onto something. A "pip install" for models. Perhaps "pip install" is sufficient and we just need to start using it.


Back in the day, when I started with Python, I didn't use external libraries. The included batteries and docs were all I had, and it was a great way to start.

I was thinking of a new high-level module in the standard library; you would just write (pseudo-Python):

  import ai.vision
  guesses = ai.vision.guess(open('myimage.png'))
  print(guesses)
  # ["cat","person","boat"]
Three lines; everything else is abstracted. You would have ai.vision and ai.text for a start, with some accessors to other state-of-the-art models (like ai.vision.imagenet, if you know better what you are doing).

The underlying models would be shipped and updated with newer versions of Python.


Not exactly. The problem is that different models are tuned for different use cases. So I'd expect something like this:

   $ pip install pymodel-ResNet-1337

   #! /usr/bin/env python3
   import ai
   model = ai.loadModel("ResNet-1337", input=ai.input.Image, output=ai.output.Category)
   model.guess(open('myimage.png'))
Problems abound; for example, how would you resample/rescale the image?
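The resampling question at least has a well-known mechanical answer. Here is a toy sketch in plain NumPy (nearest-neighbour; my own example, not part of any proposed ai module API):

```python
import numpy as np

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of an (H, W[, C]) array -- the simplest
    answer to 'how do you rescale the image for the model'."""
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return img[rows][:, cols]

img = np.arange(4 * 6 * 3, dtype=np.uint8).reshape(4, 6, 3)
small = resize_nearest(img, 2, 3)
print(small.shape)  # (2, 3, 3)
```

A real model wrapper would also have to pick an interpolation method, decide whether to crop or pad to preserve aspect ratio, and normalize pixel values, which is exactly why "just pip install a model" is harder than it sounds.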

To start, I'd want a standard model format that can be serialized/deserialized in any language and then (for example) pip-installed and loaded. People seem to use HDF5, but I don't think there is any real "standard".

So I'd expect the first incarnations of this idea to look like this:

   $ pip install tfmodel-ResNet-1337
or

   $ pip install kerasmodel-ResNet-1337
With some hooks for loading models:

   #!/usr/bin/env python3

   # whereas before you'd build a network, train it, and then use it, here you get the whole shebang in one go
   model = keras.loadInstalledModel("ResNet-1337")
The rest is up to the user, as usual.


You could just use OpenCV and check for your face.


Try YOLO.


I kind of like that the title is basically a call-to-action button in the form of the command to get the product.


The repo is trending on Github today: https://github.com/trending/python?since=daily


I've implemented something similar with GStreamer for my own work. GStreamer has some of the most disorganized documentation known to man, though, and if something doesn't work, you have to do elaborate googling first to figure out what's wrong (error messages aren't generally helpful) and then experiment with how to fix it. There are also two distinct, incompatible versions of it in the wild: 0.10 and 1.0.

Needless to say, I strongly suspect very few researchers have enough dev chops to meaningfully use GStreamer. When it does work, though, it's pretty great, and it offers reasonable integration with GTK and can be made to work with Cairo. I got it to the point where it works for what I need to do, but I wouldn't want to release the code on GitHub because it's relatively high-maintenance.

Interestingly, I ended up with a similar layout: a set of frame producers (video file, webcam, sequence of JPEGs), two video sinks (GTK and MP4), and code to draw stuff in between (using PyCairo for vector and PIL for raster).
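That producer/processor/sink layout maps naturally onto Python generators. A toy sketch (my own illustration of the shape, not the actual code; plain ints stand in for frames, which would really be numpy arrays):

```python
def webcam_producer(n_frames):
    """Stand-in for a webcam or video-file source: yields frames."""
    for i in range(n_frames):
        yield i

def draw_overlay(frames):
    """Stand-in for the 'draw stuff in between' stage."""
    for frame in frames:
        yield frame * 10  # pretend we drew on the frame

def collect_sink(frames):
    """Stand-in for a GTK window or MP4 writer: consumes frames."""
    return list(frames)

result = collect_sink(draw_overlay(webcam_producer(3)))
print(result)  # [0, 10, 20]
```

Because each stage only pulls one frame at a time, the pipeline stays lazy and memory-bounded, which is the same property the GStreamer and videoflow designs are after.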


I am glad to receive some validation of my approach. The way you did it and the way I am doing it both seem very natural to me. I explored the GStreamer route at the beginning, but I wanted to build something that would work right off the bat.


I think you should release the code. It might be helpful in its own ways.


I hate how publishing on GitHub makes people assume they have _any_ obligations toward it. Just open an issue titled "Use it as YOU see fit; I don't maintain this; if you pay, I will probably look at your request" (the last part is optional).

Its sheer existence helps others, and you don't pay for hosting.


Wow, if you figured out how to get GStreamer working, congrats. I'd love a Python gist explaining how to set it up; the docs are so hard to understand.


How is it different from cv2 (OpenCV)?


Hey looks promising! I’m slowly adding vision to my robot project, and I’ve got a couple of stereo cameras I need to process.



