
Pip3 install videoflow – new Python library to do computer vision on video - jadielam
https://github.com/videoflow/videoflow
======
cheez
I don't know about you guys, but I find it really cool that pre-trained models
will be part of open source software from now on.

~~~
antpls
Definitely. Last weekend I had a simple project: lock my computer screen when
I'm away, using the webcam. I know this should be trivial with current
state-of-the-art pretrained neural net models, but I gave up.

I couldn't find an easy way to fetch pretrained models (from where? which
one?), feed in the webcam stream, and then run inference: "is there a face in
the image?".

It would probably be trivial for someone who knows CV libraries or TensorFlow,
but I really miss the good old days when 5 obvious lines of Python did the
job.

Maybe a new Python standard library with an API for AI would be useful here.

~~~
cheez
I think you're onto something. A "pip install" for models. Perhaps "pip
install" is sufficient and we just need to start using it.

~~~
antpls
Back in the day, when I started Python, I didn't use external libraries. It
was all the included batteries and docs, and it was a great way to start.

I was thinking about a new high-level module in the standard library, where
you would just write (pseudo-Python code):

    
    
      import ai.vision  # hypothetical stdlib module, does not exist
      guesses = ai.vision.guess(open('myimage.png', 'rb'))  # binary mode for image data
      print(guesses)
      # ["cat", "person", "boat"]
    

Three lines, everything else is abstracted. You would have ai.vision and
ai.text for a start, with some accessors to other state-of-the-art models
(like ai.vision.imagenet if you know what you are doing).

The underlying models would be shipped and updated with newer versions of
Python.

~~~
cheez
Not exactly. The problem is that different models are tuned for different use
cases. So I'd expect something like this:

    
    
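       $ pip install pymodel-ResNet-1337
    
       #!/usr/bin/env python3
       # Pseudocode: the `ai` module and `pymodel-*` packages are hypothetical.
       import ai
       model = ai.loadModel("ResNet-1337", input=ai.input.Image, output=ai.output.Category)
       model.guess(open('myimage.png', 'rb'))  # binary mode for image data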
    

Problems abound: for example, how would you resample/rescale the image?

I'd want, to start, a standard model format that can be
serialized/deserialized in any language and that can be (for example) pip
installed and loaded. People seem to use HDF5, but I don't think there is any
sort of "standard".

So I'd expect the first incarnations of this idea to look like this:

    
    
       $ pip install tfmodel-ResNet-1337
    

or

    
    
       $ pip install kerasmodel-ResNet-1337
    

With some hooks for loading models:

    
    
       #!/usr/bin/env python3
       import keras
    
       # Whereas before you'd build a network, train it, and then use it,
       # here you get the whole shebang in one go.
       # (loadInstalledModel is the proposed hook, not an existing Keras function.)
       model = keras.loadInstalledModel("ResNet-1337")
    

The rest is up to the user, as usual.
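
For what it's worth, the "hook for loading installed models" idea could be sketched on top of packaging entry points, which the standard library can already enumerate. This is only a rough sketch under assumptions: the group name `pymodel.models`, the function names, and the convention that each entry point resolves to a zero-argument factory are all made up for illustration; no such packaging convention exists today.

```python
from importlib import metadata

# Hypothetical entry-point group; model packages would have to agree on it.
MODEL_GROUP = "pymodel.models"


def installed_models(group=MODEL_GROUP):
    """Return {name: entry point} for every model advertised under `group`."""
    eps = metadata.entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ exposes a selectable collection.
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9 return a plain dict of lists keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_installed_model(name, group=MODEL_GROUP):
    """Import and build the named model, or raise LookupError if it is absent."""
    models = installed_models(group)
    if name not in models:
        raise LookupError(f"no installed model named {name!r} in group {group!r}")
    # ep.load() imports the advertised object; we assume (by convention only)
    # that it is a zero-argument factory returning a ready-to-use model.
    factory = models[name].load()
    return factory()
```

A package like the imagined `kerasmodel-ResNet-1337` would then declare an entry point in that group in its own packaging metadata, and `load_installed_model("ResNet-1337")` would find it after a plain `pip install`. Preprocessing (the resample/rescale question) is still left open here.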

------
Waterluvian
I kinda like that the title is basically a call to action button in the form
of how to get the product.

------
jadielam
The repo is trending on GitHub today:
https://github.com/trending/python?since=daily

------
m0zg
I've implemented something similar with GStreamer for my own work. GStreamer
has some of the most disorganized documentation known to man, though, and if
something doesn't work, you have to do elaborate googling first to figure out
what's wrong (error messages aren't generally helpful) and then experiment
with how to fix it. There are also two distinct, incompatible versions of it
in the wild: 0.10 and 1.0.

Needless to say, I strongly suspect very few researchers have enough dev chops
to meaningfully use GStreamer. When it does work, though, it's pretty great,
and it offers reasonable integration with GTK and can be made to work with
Cairo. I got it to the point where it works for what I need to do, but I
wouldn't want to release the code on GitHub because it's relatively high
maintenance.

Interestingly, I ended up with a similar layout: a set of frame producers
(video file, webcam, sequence of JPEGs), two video sinks (GTK and MP4), and
the code to draw stuff in between (using PyCairo for vector and PIL for
raster).
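
The producer / drawing stage / sink layout described above can be sketched with plain Python generators. This is a toy illustration only, not the GStreamer code and not videoflow's API: frames are nested lists of integers, and `synthetic_source`, `draw_border`, `ListSink`, and `run` are hypothetical names standing in for a real video source, a Cairo/PIL drawing step, and a GTK or MP4 sink.

```python
from typing import Callable, Iterable, Iterator, List

Frame = List[List[int]]  # toy stand-in for an image: rows of grayscale pixels


def synthetic_source(count: int, width: int = 4, height: int = 3) -> Iterator[Frame]:
    """Producer: yields `count` flat frames (stand-in for a file or webcam)."""
    for index in range(count):
        yield [[index] * width for _ in range(height)]


def draw_border(frames: Iterable[Frame], value: int = 255) -> Iterator[Frame]:
    """Processor: paints the outer pixels of every frame with `value`."""
    for frame in frames:
        out = [row[:] for row in frame]  # copy so the source frame is untouched
        out[0] = [value] * len(out[0])
        out[-1] = [value] * len(out[-1])
        for row in out:
            row[0] = row[-1] = value
        yield out


class ListSink:
    """Consumer: collects frames in memory (stand-in for a GTK or MP4 sink)."""

    def __init__(self) -> None:
        self.frames: List[Frame] = []

    def write(self, frame: Frame) -> None:
        self.frames.append(frame)


def run(source: Iterable[Frame], sink: ListSink,
        *stages: Callable[[Iterable[Frame]], Iterator[Frame]]) -> int:
    """Chains the stages lazily and pushes each resulting frame into the sink."""
    stream = source
    for stage in stages:
        stream = stage(stream)
    written = 0
    for frame in stream:
        sink.write(frame)
        written += 1
    return written


sink = ListSink()
frames_written = run(synthetic_source(3), sink, draw_border)
```

Because every stage is a generator, only one frame is in flight at a time, and swapping the producer or the sink does not touch the drawing code in between.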

~~~
jadielam
I think you should release the code. It might be helpful in its own ways.

~~~
namibj
I hate how publishing on GitHub makes people assume they have _any_
obligations to/for the code. Just open an issue with the title "Use it as YOU
see fit; I don't maintain this; if you pay I will probably look at your
request" (the last part is optional).

The sheer existence of the code helps. You don't pay for hosting.

------
MauiWarrior
How is it different from OpenCV (cv2)?

------
TaylorAlexander
Hey, looks promising! I’m slowly adding vision to my robot project, and I’ve
got a couple of stereo cameras I need to process.

