Show HN: Shelly: Write Terminal Commands in English (github.com/paletov)
35 points by paletov on Jan 16, 2024 | 54 comments
Shelly is a powerful tool that translates English into commands that can be seamlessly executed in your terminal. You won't have to remember obscure commands anymore.



For macOS users, iTerm2 has offered the exact same feature [1][2] since v3.5.0beta9 (Dec 19, 2022).

Both Shelly [3] and iTerm use OpenAI under the hood. Here are several alternatives [4].

[1] https://github.com/gnachman/iTerm2/commit/7bcc4e0bedb22c4fd9...

[2] https://cixtor.com/blog/iterm2-openai

[3] https://github.com/paletov/shelly/blob/main/src/llm_service....

[4] https://github.com/search?q=terminal+chatgpt&type=repositori...


I looked at the iTerm2 example at [2]. The result was:

  find ~/Library -name '*.binarycookies' -exec binarycookies {} ; |
  grep -E "GitHub|Slack" |
  awk '{print $NF "\t" $1}'
Isn't that wrong? Setting aside that the search pattern also matches "Slackware", my reading of the binarycookies command at https://github.com/cixtor/binarycookies/blob/master/cmd/bina... says the output columns are:

  expires domain path name value "Secure"? "HttpOnly"? comment?
so the awk command will print the cookie value (at $NF for most cookies) and the expiry ($1), when the request asked for the cookie name ($4) and value ($5).

The following (assuming no typo) looks like it should actually do the request:

  find ~/Library -name '*.binarycookies' -exec binarycookies {} \; |
  awk '$4 == "Google" || $4 == "Slack" {print $4 "\t" $5}'
OTOH, I expect the author of the tool to know best how to parse its output, so perhaps I'm looking at the wrong tool?


Not so long ago I used ChatGPT to help me come up with a command to delete files that were older than 7 days.

Fortunately I checked the result before copy-pasting what GPT told me, as it would have deleted every file older than 7 days anywhere on the machine. I know it was "my fault" to begin with, as I didn't explicitly specify "in the current directory" or "in a given directory". But I can see how making such a (very nice nonetheless) tool part of one's daily routine could quickly lead to damaging outcomes.
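For reference, a correctly scoped version would be something like this (a sketch; the path is a placeholder, and note that find counts -mtime in 24-hour periods):

  find /path/to/dir -maxdepth 1 -type f -mtime +7 -delete

Presumably what GPT gave me was effectively the same thing rooted at "/" with no -maxdepth, which is exactly the kind of scoping detail the model won't infer unless you spell it out.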


For any shell command where a minor mistake has the potential to ruin your day, take a look at "try", which will allow you to inspect the effects before running it against a live system: https://github.com/binpash/try
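A minimal example of the flow (assuming try's default interactive behavior):

  try rm -rf ./build
  # try runs the command against an overlay filesystem, shows a
  # summary of what would change, and asks before committing the
  # changes to the real system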


> Please note that try is a prototype and not a full sandbox, and should not be used to execute commands that you don't already trust on your system (i.e. network calls are all allowed)


True, ‘try’ won’t make reading and understanding a command obsolete, but it’s a great tool to prevent accidental deletion of important files, like OP described.


I was taught that 'try' was for making sure your command is well formed, if it's a complex command with some regex, for example...

Though I have not used it much over the years, as I haven't had to do much personal regex in a long while.


That's a great tool! Hope OP sees this. If the default were to use "try" when running shelly, and I just had to "commit [Y/n]" at the end, then I would seriously consider using it. One could do something like `shelly "my description" --force` to bypass the "try" usage.


No, but I did something more stupid. I had it help me write a command to delete WooCommerce/WordPress users from my site whose accounts were older than 1 year and who had never ordered before.

It deleted tens of thousands of active users who had placed orders. I had a backup which was only a couple of hours old, thank the lord.

Lesson learned.


I feel really excited for this era of software development.

As someone who actually reads manuals instead of just running output from language models, I'm excited for the work I'll get to come in and fix the problems created by folks who just run output from language models.


This is very cool. I wonder if we are going to have to get used to a new paradigm in software, where you have tools that are incredibly powerful and that you just accept that sometimes it 'gets it wrong'. There's no debugging, no root cause analysis, just a shrug of the shoulders and 'sometimes it gets it wrong mate, what're you gonna do?'. This is probably the mental model most laypeople have of software already I suspect but for software engineers it's somewhat of a shift. Bit of a deal with the devil, perhaps.


> tools that are incredibly powerful and that you just accept that sometimes it 'gets it wrong'. There's no debugging, no root cause analysis, just a shrug of the shoulders and 'sometimes it gets it wrong mate, what're you gonna do?'.

So, pretty much what we have now with the vast majority of mega-tech companies with zero customer service. Plus all the growth-hack startups playing "monkey see, monkey do."


> you have tools that are incredibly powerful and that you just accept that sometimes it 'gets it wrong'

I think this is already the case with existing software tools and developers. That's how code ships with bugs both known and unknown.


The difference is that you can throw a senior engineer at a bug and know that the issue can be root caused and fixed because the behavior is "fundamentally deterministic", whereas with AI for the foreseeable future all you can do is maybe tweak the model and pray.


> There's no debugging, no root cause analysis, just a shrug of the shoulders and 'sometimes it gets it wrong mate, what're you gonna do?'

This has been the case for as long as I can remember; it seems to have more to do with individual developers' typical methodology than with the tools available.

I remember a bunch of issues with early npm versions that were resolved by deleting the node_modules directory and running `npm install` again. Sometimes it borked the directory, sometimes it didn't; deleting everything and starting from scratch resolved many of those issues.
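The canonical fix, for anyone who missed that era:

  rm -rf node_modules && npm install   # nuke local state, rebuild from package.json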


You are absolutely right, of course: for most day-to-day bugs we don't have the inclination, time, or knowledge to root-cause them, much less fix them. But I feel it's a comfort to know that you (or someone) could.


Well, as in many cases, I think it depends. There are some situations where you can accept errors, but in others definitely not: imagine you're trying to delete a set of files from a directory that contains other files you're interested in. If the AI makes a mistake and deletes some of the other files, you will be disappointed. Now, you should have a backup. But what if you had the AI assistant come up with the backup command for you, and by mistake it didn't include that directory?


I wonder how we might design for such a system. I'd say as a starting point any action should be undoable. Then you give it a go to see if it works, and if it doesn't you can always get back. I've read this is good practice in any system, as a user can get inured to 'are you sure' dialogs and just click through them reflexively.


This makes me think of an Asimov story in which people could no longer do math because they hadn't needed to for centuries, until they did and couldn't.

I can see the benefit for speed, the same way I would use a calculator to multiply six 4-digit numbers, but not because it's "obscure."

If the command is "obscure" one probably doesn't need it often (e.g. me with any dd command), in which case asking in English might cause unforeseen problems.


That story is likely "The Feeling of Power".

See: https://en.m.wikipedia.org/wiki/The_Feeling_of_Power

And for the full story: https://archive.org/details/1958-02_IF/page/n4/mode/1up?view...


Ok so this uses an LLM to put together a command for you.

The LLM doesn't actually understand the command; it's just "guessing" each 'word' in the response as the most likely one, based on the data it was trained on.

LLMs in general are trained on internet-published bodies of text, for obvious reasons.

One of the largest bodies of text that would inform something like this will be sites like the Super User, Ask Different & Unix Stack Exchange sites, and possibly the litany of gists out there.

I'm curious to see how the results would fare vs the top-ranked answer to the top question matching the input query from each of the three stack exchange sites.

Either way it sounds like a sub-par solution compared to just reading a couple of man pages.


Makes you wonder how much broken code LLMs have been trained on via StackOverflow questions.


I guess LLMs in general could be thought of as just a faster way to copy and paste bad S/O code, so perhaps it's just the next logical step of "fail fast" for people who need to fail faster than they can copy and paste.

/s

On a serious note: the one thing I think an LLM would be perfectly suited for, which I haven't seen anyone make yet, is producing real-enough-sounding copy text as a better replacement for Lorem Ipsum (aka Lipsum) text.

The point of Lipsum is that the text is gibberish and thus should be ignored, so the focus is on the important factors: layout, colours, design, functionality.

But in 20 years I've never once seen a non-technical person who could grasp that concept, so it defeats the whole purpose of using it. Of course it's also inevitable that some Lipsum will end up in the final product sometimes too, which confuses non-technical people and highlights an obvious failure to technical people.

LLM-powered text waffling on about the topic would be perfect. It doesn't matter if it hallucinates shit; it just needs to seem correct at a casual glance, and that's already a massive improvement.
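Something like this would already do it (a sketch against the OpenAI chat completions API; the model and prompt are placeholders):

  curl -s https://api.openai.com/v1/chat/completions \
    -H "Authorization: Bearer $OPENAI_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "Write two paragraphs of plausible but meaningless placeholder copy for a dental clinic homepage."}]}'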

Essentially Turbo/Retro/Hyper Encabulator[1], in text.

1: https://www.youtube.com/watch?v=5nKk_-Lvhzo


> Of course it's also inevitable that some Lipsum will end up in the final product sometimes too

I'd think this would happen much more often if, instead of Lorem Ipsum, you switch to something that looks more realistic at a passing glance.


Possibly, but it's also much more passable to non-technical users, I think. "The text is wrong" is a scenario anyone can understand. "The text looks like Latin" doesn't make sense to anyone who doesn't understand what Lipsum is, so they just think it's completely broken.


I developed `kel`. It does a similar thing with more features, such as chatting with docs, and supports more LLMs. Please check https://kel.qainsights.com


This looks really cool.

I am fairly new to working with LLMs (have been doing local versions of stable diffusion, basically) -- but interested in kel with Ollama...

I want to run a local one, but a newb Q:

If I run Ollama and kel on my machine, can I ask about stuff going on ON my machine? I don't need to CLI calc, or ask for populations...

I want to know

>"How much space are all the PDFs on the machine taking up?"

>"Move all the PDFs to /home/PDFs, but organize them based on what they are related to"

>"write me an ffmpeg command to convert directory /pics so all images are 1024x1024"

>"Which config files in this directory have [text]"

That sort of thing.

I want to ask about MY machine's state.

How do?


Yes, it should be possible. This is something I am working on.


How is your wife's AWS EC2 automation script working now that it's a year old?


I'm pretty sure OpenAI's data collection is going to be on par with Google's and Facebook's soon.


I've been working on implementing this same thing but using gpt4all rather than requiring an API key. I got a bit discouraged when my first few test examples didn't work. One was like, "show me all files less than 3 weeks old ending in .txt". It got the shell date manipulation stuff completely wrong, and no amount of begging and hinting seemed to get it to work. It also seemed to prefer "ugly" things like piping through sed and another shell rather than using xargs.
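For comparison, the answer I was hoping for is a one-liner with no date manipulation at all (find's -mtime counts 24-hour periods, so "less than 3 weeks old" is -mtime -21):

  find . -name '*.txt' -mtime -21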


But what if shelly hallucinates with "rm -rf /"?


Yeah, that would be bad. That's why I added an intermediate step.

It opens the command in an editor like when you run "git commit", so you can edit it. Saving executes the final version.


zsh can replace special variables like `!!` and `!:1` inline when you hit Tab. Or, if you hit Enter, it won't execute the command itself but will prepare a new prompt pre-filled with the same command and all special variables substituted with their values. I don't really know how it works in zsh (bash for example doesn't warn you and just executes the command right away with the actual values in place), but anyway it may be more seamless than opening an editor.
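For what it's worth, bash can be configured to get similar behavior (both are standard bash/readline options, just not defaults):

  shopt -s histverify      # put the expanded command on the prompt instead of running it
  bind Space:magic-space   # expand !! and friends in place as you type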


Is there a cowboy mode, where I can skip this step and 'just run it'? (excuse me if this is mentioned in the docs)


It's too early for cowboy mode. The LLM isn't reliable enough to let it run commands without approving them first.

As others noted, "try" could be used to make the commands testable, so I'll look into it this weekend.


Doesn't look like it.


I had success using a curl alias to query GPT for any command.

  # as a shell function, since plain aliases can't take arguments
  gpt() {
    curl -s https://api.openai.com/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer <YOUR_API_KEY>" \
      -d "{\"model\": \"gpt-3.5-turbo\", \"messages\": [{\"role\": \"system\", \"content\": \"Give short single line answer.\"}, {\"role\": \"user\", \"content\": \"$*\"}]}" \
      --insecure | rg -o 'content":(.*)' -r '$1'
  }
I updated it to use your prompt instead, and it should give much nicer results now.
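Usage then looks like, e.g.:

  gpt "show me which process is listening on port 8080"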


I'm curious how the example

  shelly 'find "class" in files in this directory without the .venv dir'
is interpreted. I'm assuming something like:

  grep -l -R --exclude-dir .venv class
and possibly with the "--null" option, but I can also see how using "find" suggests the user is thinking of the find command, so wants filenames containing the substring "class" but not in the .venv directory. My first attempt at that is:

  find . -name '*class*' | grep -Fv '/.venv/'


Currently it returns:

  grep -r --exclude-dir=.venv "class" .


Ahh, so it also reports the line where it's found, and not just the file. Also reasonable.

I see the documentation now shows the expansion, which is nice.

It's interesting how 'check size of files in this directory but format it with gb/mb, etc' is converted to "du -sh *" while my Mac's "man du" has this:

     Show disk usage for all files in the current directory. 
     Output is in human-readable form:

           # du -ah
I point it out because the author of the man page intends for "all files in the current directory" to include files in subdirectories, and all dot files, while the OpenAI GPT version only shows info for non-dot files, and only at the top level. (There will also be an issue if there are too many files for the "*" expansion.)
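Concretely:

  du -sh *   # one summary line per non-dot, top-level entry; can hit "argument list too long"
  du -ah     # every file, dot files and subdirectories included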

OTOH, "'find "class" in files in this directory without the .venv dir'" uses a recursive grep, which means that "in this directory" here is taken to include subdirectories ... which the poser of the question clearly intended as otherwise "without the .venv dir" does not make sense.

Still, "files in this directory" includes dot files in one case but not in the other, which is a subtle inconsistency.


If I can’t remember the obscure command in the first place, how will I know that what the AI is suggesting is correct? I’ll need to look it up anyway.

I’m assuming the commands aren’t automatically executed, as that sounds absurd.


> I’m assuming the commands aren’t automatically executed, as that sounds absurd.

Seeing how much some people trust LLMs, I wouldn't be surprised. We need to come up with an equivalent of the Darwin Awards for computer systems.


It opens the command in nano/vim in the terminal and lets you edit it. Saving executes the final version.

Currently, the user has to be experienced enough to know if a command is safe to execute. However, you are right, and I'll look into adding a safety feature that indicates whether a command is potentially dangerous.


> Saving executes the final version

That is quite dangerous, and I assume that it is actually "on exit" and not "on save", as presumably it is implemented by creating a temp file and opening it with the editor. There are various ways that an editor can crash or abort which would automatically execute the command.
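A sketch of a safer flow (hypothetical wrapper; the point is to check the editor's exit status and require explicit confirmation, rather than executing whatever is on disk when the editor closes):

  tmp=$(mktemp)
  printf '%s\n' "$generated_command" > "$tmp"
  "${EDITOR:-vi}" "$tmp" || { rm -f "$tmp"; exit 1; }   # bail if the editor crashed/aborted
  printf 'Run: %s\nProceed? [y/N] ' "$(cat "$tmp")"
  read -r answer
  [ "$answer" = "y" ] && sh "$tmp"
  rm -f "$tmp"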


Looking up a given command is at least possible. Remembering a forgotten command doesn't work at all.


Nice! How does it compare with https://github.com/TheR1D/shell_gpt ?


My understanding is that ShellGPT aims to be a complete OS assistant. It's similar to Open Interpreter (https://github.com/KillianLucas/open-interpreter).

Shelly is a mini tool at the moment that only generates and executes commands for you.


The system prompt seems very simple and short. Is it enough? Does it work well? (I don't have an OpenAI key...)


Too bad you need a subscription to try it out.


Being open source, it shouldn't be too hard to modify it to use gpt4all locally, which presents the same API as OpenAI.
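Roughly, it should just be a matter of pointing the client at the local endpoint (assuming GPT4All's local API server is enabled; it listens on port 4891 by default):

  curl -s http://localhost:4891/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "<your local model>", "messages": [{"role": "user", "content": "list files modified today"}]}'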


I love the name :)


They might want to change it, though; it's already used elsewhere:

https://apps.apple.com/us/app/shelly-ssh-client/id989642999




