Token costs are not zero when you’re running local models, because you paid for the hardware, and you can’t scale inference indefinitely without paying for more hardware.
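To make that concrete, here's a rough back-of-envelope sketch of how hardware amortization plus electricity turns into an effective per-token cost. Every number below is a made-up assumption for illustration, not a measurement:

    # Back-of-envelope: amortized cost per token for a local inference box.
    # All figures are hypothetical assumptions, not benchmarks.
    hardware_cost = 2000.0     # USD for a GPU workstation (assumption)
    lifetime_years = 3         # useful life before it's obsolete (assumption)
    power_draw_kw = 0.4        # average draw under load, in kW (assumption)
    electricity_rate = 0.30    # USD per kWh (assumption)
    tokens_per_second = 30     # throughput of a small local model (assumption)
    utilization = 0.10         # fraction of time actually generating (assumption)

    seconds = lifetime_years * 365 * 24 * 3600
    total_tokens = tokens_per_second * seconds * utilization
    energy_cost = power_draw_kw * (seconds / 3600) * utilization * electricity_rate
    cost_per_million = (hardware_cost + energy_cost) / total_tokens * 1e6

    print(f"~${cost_per_million:.2f} per million tokens")

With these assumed numbers it works out to roughly $8 per million tokens, and the point stands either way: the marginal cost looks like zero only if you ignore the hardware you already bought.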
Ok, but running an 11B model gets things right maybe 60% of the time while maxing out your machine's power draw. Not sure that makes your product the best. Video generation in particular is very compute intensive. I guess prices will decrease over time, but the technical edge will always belong to the smarter model.