Definitely. Last weekend I had a simple project: lock my computer screen when I am away, using the webcam. I know this is trivial with current state-of-the-art pretrained neural net models, but I gave up.
I couldn't find an easy way to fetch a pretrained model (from where? which one?), feed it the webcam stream, and then run inference to answer "is there a face in the image?".
It would probably be trivial for someone who knows CV libraries or TensorFlow, but we really miss the good old days when 5 obvious lines of Python did the job.
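For what it's worth, here is a rough sketch of that webcam idea using OpenCV's bundled Haar cascade. Note the swap: this is a classic detector, not a pretrained neural net, chosen because it ships with the `opencv-python` package and needs no separate download. `lock_screen` is a hypothetical callback you would supply yourself (the actual lock command is platform-specific), and the five-miss threshold is an arbitrary assumption.

```python
import time


def face_present(frame, detector):
    """Return True if the Haar cascade finds at least one face in a BGR frame."""
    import cv2  # third-party: pip install opencv-python

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return len(faces) > 0


def should_lock(recent, min_misses=5):
    """Lock only after `min_misses` consecutive frames without a face."""
    return len(recent) >= min_misses and not any(recent[-min_misses:])


def watch(lock_screen, camera_index=0, interval=1.0, min_misses=5):
    """Poll the webcam and call `lock_screen()` once nobody has been seen for a while."""
    import cv2

    # The cascade file is bundled with opencv-python under cv2.data.haarcascades.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    capture = cv2.VideoCapture(camera_index)
    recent = []
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break  # camera unavailable or stream ended
            recent.append(face_present(frame, detector))
            recent = recent[-min_misses:]
            if should_lock(recent, min_misses):
                lock_screen()  # hypothetical, caller-supplied
                recent = []
            time.sleep(interval)
    finally:
        capture.release()
```

Requiring several consecutive misses keeps a single bad frame (you looked down, the light changed) from locking the screen.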
Maybe a new Python standard library with an API for AI would be useful here.
Three lines, everything else is abstracted. You would have ai.vision and ai.text for a start, with some accessors to other state-of-the-art models (like ai.vision.imagenet if you know better what you are doing).
The underlying models would be shipped and updated with newer versions of Python.
Not exactly. The problem is that different models are tuned for different use cases. So I'd expect something like this:
$ pip install pymodel-ResNet-1337
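#!/usr/bin/env python3
# `ai` is the hypothetical module proposed above
import ai
model = ai.loadModel("ResNet-1337", input=ai.input.Image, output=ai.output.Category)
# open in binary mode: image files are not text
model.guess(open('myimage.png', 'rb'))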
Problems abound. For example, how would you resample or rescale the image to the size the model expects?
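To make the rescaling question concrete, here is a minimal sketch of one possible answer: nearest-neighbour resampling on a plain grid of pixels. It is only an illustration. A real model would have to declare which method and target size it expects, and the function name here is made up.

```python
def resize_nearest(pixels, new_width, new_height):
    """Rescale a row-major grid of pixels using nearest-neighbour sampling."""
    old_height = len(pixels)
    old_width = len(pixels[0])
    return [
        [
            # Map each output coordinate back to the source grid with integer math.
            pixels[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]
```

Bilinear or bicubic resampling would produce different pixel values from the same input, which is exactly why the choice matters for a packaged model.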
I'd want, to start, a standard model format that can be serialized and deserialized in any language, and that can be (for example) pip-installed and loaded. People seem to use HDF5, but I don't think there is any sort of "standard".
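As a toy illustration of what "language-neutral" could mean, here is a sketch of a made-up container: a JSON header for metadata followed by raw little-endian float32 weights. The `PYMD` signature, the layout, and the function names are all assumptions for the example, not an existing format.

```python
import json
import struct

MAGIC = b"PYMD"  # hypothetical 4-byte file signature


def dump_model(metadata, weights):
    """Pack metadata as JSON and weights as little-endian float32 values."""
    header = json.dumps(metadata, sort_keys=True).encode("utf-8")
    body = struct.pack(f"<{len(weights)}f", *weights)
    # Layout: signature, 4-byte header length, JSON header, raw weights.
    return MAGIC + struct.pack("<I", len(header)) + header + body


def load_model(blob):
    """Inverse of dump_model: return (metadata, weights)."""
    if blob[:4] != MAGIC:
        raise ValueError("not a model file")
    (header_len,) = struct.unpack_from("<I", blob, 4)
    header_end = 8 + header_len
    metadata = json.loads(blob[8:header_end].decode("utf-8"))
    count = (len(blob) - header_end) // 4  # 4 bytes per float32
    weights = list(struct.unpack_from(f"<{count}f", blob, header_end))
    return metadata, weights
```

Anything that can parse JSON and read fixed-width floats could load this, which is the property I'd want from a real standard.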
So I'd expect the first incarnations of this idea to look like this:
$ pip install tfmodel-ResNet-1337
or
$ pip install kerasmodel-ResNet-1337
With some hooks for loading models:
#!/usr/bin/env python3
import keras
# whereas before you'd build a network, train it, and then use it, here you get the whole shebang in one go
# loadInstalledModel is the hypothetical hook, not an existing Keras function
model = keras.loadInstalledModel("ResNet-1337")
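One way such a hook could work today, sketched under assumptions: pip-installed model packages advertise themselves through Python's entry-point mechanism, and a loader looks them up by name. The group name `pymodel.models` and both function names are invented for this example; only `importlib.metadata.entry_points` is real.

```python
from importlib.metadata import entry_points

GROUP = "pymodel.models"  # hypothetical entry-point group for model packages


def installed_models(group=GROUP):
    """Map model names to the entry points that installed packages registered."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10 and later
        selected = eps.select(group=group)
    else:  # Python 3.8 / 3.9 return a plain dict keyed by group
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}


def load_installed_model(name, group=GROUP):
    """Import the registered factory for `name` and call it to build the model."""
    models = installed_models(group)
    if name not in models:
        raise LookupError(f"no installed model named {name!r}")
    factory = models[name].load()
    return factory()
```

A package like the hypothetical `kerasmodel-ResNet-1337` would then only need to declare an entry point in that group pointing at a function that returns the loaded network.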
I've implemented something similar with GStreamer for my own work. GStreamer has some of the most disorganized documentation known to man, though, and if something doesn't work, you first have to do elaborate googling to figure out what's wrong (error messages generally aren't helpful) and then experiment with how to fix it. There are also 2 distinct, incompatible versions of it in the wild: 0.10 and 1.0.
Needless to say, I strongly suspect very few researchers have enough dev chops to meaningfully use GStreamer. When it does work, though, it's pretty great, and it offers reasonable integration with GTK and can be made to work with Cairo. I got it to the point where it works for what I need to do, but I wouldn't want to release the code on GitHub because it's relatively high maintenance.
Interestingly, I ended up with a similar layout: a set of frame producers (video file, webcam, sequence of JPEGs), 2 video sinks (GTK and MP4), and the code to draw stuff in between (using PyCairo for vector and PIL for raster).
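The producer / drawing / sink layout can be sketched with plain generators. This is a stand-in, not the code described above: frames are nested lists rather than real images, and every name here is invented for the illustration.

```python
def synthetic_frames(count, width, height):
    """Frame producer: yields blank greyscale frames as nested lists."""
    for _ in range(count):
        yield [[0] * width for _ in range(height)]


def draw_box(frames, left, top, right, bottom, value=255):
    """Drawing stage: outline a rectangle on every frame that passes through."""
    for frame in frames:
        for x in range(left, right + 1):
            frame[top][x] = value
            frame[bottom][x] = value
        for y in range(top, bottom + 1):
            frame[y][left] = value
            frame[y][right] = value
        yield frame


class ListSink:
    """Video sink stand-in: collects frames instead of showing or encoding them."""

    def __init__(self):
        self.frames = []

    def consume(self, frames):
        """Drain the iterator and return how many frames are stored."""
        for frame in frames:
            self.frames.append(frame)
        return len(self.frames)
```

Because each stage only sees an iterator of frames, swapping the webcam for a video file, or the GTK sink for an MP4 writer, would not touch the drawing code.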
I am glad to get some validation of my approach. The way you did it and the way I am doing it seem very natural to me. I explored the GStreamer route at the beginning, but I wanted to build something that would work out of the box.
I hate how publishing on GitHub makes people assume they have _any_ obligations to/for it.
Just make an issue with the title "Use it as YOU see fit; I don't maintain this; if you pay I will probably look at your request" (the last part is optional).
The sheer existence of this helps. You don't pay for hosting.
Wow, if you figured out how to get GStreamer working, congrats. I'd love a gist in Python explaining how to set it up; the docs are so hard to understand.
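Not the setup described above, but as a starting point, here is a minimal sketch for GStreamer 1.0 from Python. It assumes PyGObject and the GStreamer 1.0 introspection bindings are installed. `describe_pipeline` and `run_pipeline` are names made up for this example; the `Gst` calls follow the pattern used in the official basic tutorial.

```python
def describe_pipeline(*elements):
    """Join element descriptions with GStreamer's ' ! ' link syntax."""
    return " ! ".join(elements)


def run_pipeline(description):
    """Play a pipeline until it ends or fails; needs PyGObject and GStreamer 1.0."""
    import gi

    gi.require_version("Gst", "1.0")  # pick the 1.0 API, not the old 0.10 one
    from gi.repository import Gst

    Gst.init(None)
    pipeline = Gst.parse_launch(description)
    pipeline.set_state(Gst.State.PLAYING)
    bus = pipeline.get_bus()
    # Block until the stream finishes or an error is posted on the bus.
    message = bus.timed_pop_filtered(
        Gst.CLOCK_TIME_NONE, Gst.MessageType.ERROR | Gst.MessageType.EOS
    )
    try:
        if message is not None and message.type == Gst.MessageType.ERROR:
            error, debug = message.parse_error()
            raise RuntimeError(f"{error.message} ({debug})")
    finally:
        pipeline.set_state(Gst.State.NULL)  # always release the pipeline
```

Calling `run_pipeline(describe_pipeline("videotestsrc num-buffers=100", "videoconvert", "autovideosink"))` should show a test pattern briefly. Replacing `videotestsrc` with a camera source element is the platform-dependent part.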