Emacs-copilot: Large language model code completion for Emacs (github.com/jart)
377 points by yla92 8 months ago | 156 comments



I'm sure this, and other LLM/IDE integration, has its uses, but I'm failing to see how it's really any kind of major productivity boost for normal coding.

I believe the average figures for programmer productivity, measured in production-quality, debugged and maybe reusable code, are pretty low: around 100 LOC/day, although it's easy to hit 1000 LOC/day or more when building throwaway prototypes etc.

The gap between productivity on production-quality code and productivity when hacking/prototyping comes down to quality, and for most competent/decent programmers, writing something themselves is going to produce better-quality code, which they understand, than copying something from Stack Overflow or an LLM. The time it would take to analyze copied code for correctness, lack of vulnerabilities, or even just decent design for future maintainability (much more of a factor in total lifetime software cost than writing the code in the first place) would seem to swamp any time gained by not having to write the code yourself, which is basically the easiest and least time-consuming part of any non-trivial software project.

I can see the use of LLMs in some learning scenarios, or for cases when writing throwaway code where quality is unimportant, but for production code I think we're still a long way from the point where the output of an LLM is going to be developer-level and doesn't need to be scrutinized/corrected to such a degree that the speed benefit of using it is completely lost!


> I'm sure this, and other LLM/IDE integration, has its uses, but I'm failing to see how it's really any kind of major productivity boost for normal coding.

The Duomo in Florence took multiple generations to build. Took them forever to figure out how to build a roof for the thing. Would you want to be a builder who focuses your whole life on building a house you can't live in because it has no roof? Or would you simply be proud to be taking part in getting to help lay the foundation for something that'll one day be great?

That's my dream.


Well, I'm just commenting on the utility of LLMs, as they exist today, for my (and other related) use cases.

No doubt there will be much better AI-based tools in the future, but I'd argue that if you want to accelerate that future then rather than applying what's available today, it'd make more sense to help develop more capable AI that can be applied tomorrow.


We need the full pipeline of tools. What jart did is helping the future users of AI gain familiarity early.


In general I almost never break even when trying to use an LLM for coding. I guess there’s a selection bias because I hate leaving my flow to go interact with some website, so I only end up asking hard questions maybe.

But since I wired Mixtral up to Emacs a few weeks ago I discovered that LLMs are crazy good at Lisp (and protobuf and prisma and other lispy stuff). GPT-4 exhibits the same property (though I think they’ve overdone it on the self-CoT prompting and it’s getting really snippy about burning compute).

My dots are now, like, recursively self-improving.


> though I think they’ve overdone it on the self-CoT prompting and it’s getting really snippy about burning compute

hear, hear! I have the exact same impression, probably since the gpt-4-turbo preview rolled out


Man, I really want to get this working. Any recommendations for how to prompt or where this functionality helps?


The only thing I've used GPT for is generating commit messages based on my diff, because it's better than me writing 'wip: xyz' and gives me a better idea about what I did before I start tidying up the branch.

Even if I wanted to use it for code, I just can't. And it actually makes code review more difficult when I look at PRs and the only answer I get from the authors is "well, it's what GPT said." They can't even prove that it works right by putting a test on it.

In that sense it feels like shirking responsibility - just because you used an LLM to write your code doesn't mean you don't own it. The LLM won't be able to maintain it for you, after all.


"it's what GPT said" should be a fireable offense


I wouldn't go that far; we all want to be lazy. Using it as a crutch and assuming everyone else uses GPT so it's all good - well, nobody is going to understand it any more.

Half of the stuff GPT comes up with in the reviews I could rewrite much more simply and directly, while improving code comprehension.


That may be a bit much, but I'd think it grounds for sitting down with the person in question to discuss the need for understanding the code they turn in.


Have you seen modern React frontend dev in JS? They copy paste about 500-1000 LOC per day and also make occasional modifications. LLMs are very well suited for this kind of work.


That does seem like a pretty much ideal use case!


Here's where my Emacs is putting the most effort when it comes to completion: shell sessions.

In my line of work (infra / automation) I may not write any new code that's going to be added to some project for later use for days, sometimes weeks.

Most of the stuff I do is root cause analysis of various system failures which require navigating multiple machines, jump hosts, setting up tunnels and reading logs.

So, the places where the lack of completion is the most annoying are, for example, when I have to compare values in some /sys/class/pci_bus/... between two different machines: once I've figured out what file I need in one machine in its sysfs, I don't have the command to read that file on the other machine, and need to retype it entirely (or copy and paste it between terminal windows).
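For the narrower task of reading one sysfs file on two machines, a small bash helper can take the path once and fetch it from both hosts. This is a minimal sketch of a workaround, not something the commenter described: `compare_remote` is a hypothetical name, it assumes direct `ssh HOST` access (jump hosts can be configured in `~/.ssh/config` or with `ssh -J`), and it assumes paths without spaces, which holds for sysfs.

```shell
#!/usr/bin/env bash
# compare_remote HOST_A HOST_B PATH (hypothetical helper): show how the
# contents of PATH differ between two hosts. diff exits 0 when they match
# and 1 when they differ, so the helper also works inside scripts.
compare_remote() {
  local path=$3
  # Each process substitution runs `cat` on one host over ssh.
  # ssh joins its arguments into one remote command line, so PATH must not
  # contain spaces or shell metacharacters (fine for sysfs paths).
  diff <(ssh "$1" cat -- "$path") <(ssh "$2" cat -- "$path")
}
```

After finding the file on the first machine, `compare_remote host-a host-b "$p"` (with `p` set to that path) prints nothing when both hosts agree and a diff when they don't.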

I don't know what this autocompletion backend is capable of. I'd probably have to do some stitching to even get Emacs to autocomplete things in the terminal instead of or in addition to the shell running in it, but, in principle, it's not impossible and could have some merit.


> I'd probably have to do some stitching to even get Emacs to autocomplete things in the terminal instead of or in addition to the shell running in it

I wonder what you mean. The `dabbrev-expand` command (bound to `M-/` by default) will complete the characters before point based on similar strings nearby, starting with strings in the current buffer before the word to complete, and extending its search to other buffers. If you have the sysfs file path in one buffer, it will use that for completion. You may need to use line mode for non-`M-x shell` terminals to use `dabbrev-expand`.

> In my line of work (infra / automation) I may not write any new code that's going to be added to some project for later use for days, sometimes weeks.

> Most of the stuff I do is root cause analysis of various system failures which require navigating multiple machines, jump hosts, setting up tunnels and reading logs.

This sounds like an ideal use case for literate programming. Are you using org-mode? Having an org-file with source blocks would store the path string for later completion by the methods described above (as well as document the steps leading to the root cause). You could also make an explicit abbrev for the path (local or global). Depending on how many paths there are and how common they are, each document could have its own abbrevs, or you could define a shared set of sequences. For example, "asdf" always expands to /sys/class/pci_bus/X and "fdsa" expands to something else.

Hope that helps or inspires you to come up with a solution that works for you!


> This sounds like an ideal use case for literate programming.

No... not at all... Most of the "code" I write this way is shell commands mixed with all kinds of utilities present on the target systems. It's so "unique" (in a bad way) that there's no point trying to automate it. The patterns that emerge usually don't repeat nearly often enough to merit automation.

Literate programming is the other extreme; it's like carving your code in stone. It's too labor-intensive to be useful in an environment where you don't even remember the code you wrote the next day and in all likelihood will never need it again.

> will complete the characters before point based on similar strings nearby

They aren't nearby. They are in a different tmux pane. Also, that specific keybinding doesn't even work in terminal buffers, I'd have to remap it to something else to access it.

The larger problem here is that in my scenario Emacs isn't the one driving the completion process (it's the shell running in the terminal), for Emacs to even know those options are available as candidates for autocompletion it needs to read the shell history of multiple open terminal buffers (and when that's inside a tmux session, that's even more hops to go to get to it).

And the problem here, again, is that setting up all these particular interactions between different completion backends would be very tedious for me, but if some automatic intelligence could do it, that'd be nice.


> once I've figured out what file I need in one machine in its sysfs, I don't have the command to read that file on the other machine, and need to retype it entirely (or copy and paste it between terminal windows).

Tramp?


How would Tramp know that I need an item from the history of one session in another? Or maybe I'm not understanding how you want to use it?


Hmm yeah, I guess I missed that you were focusing on the completion aspect. I think that would be challenging. But if you are running multiple remote shells in Emacs buffers - or editing config files with Tramp - it's easy to copy and paste commands from one host to another.


They are really good at writing your print/console.log statements...


Just what I've been looking for!

Thanks for pushing the tooling of self-hosted LLMs forward, Justine. Llamafiles specifically should become a standard.

Would there be a way of connecting to a remote LLM that's hosted on the same LAN, but not on the same machine? I don't use Apple devices, but do have a capable machine on my network for this purpose. This would also allow working from less powerful devices.

Maybe the Llamafile could expose an API? This steps into LSP territory, and while there is such a project[1], leveraging Llamafiles would be great.

[1]: https://github.com/huggingface/llm-ls


llamafile has an HTTP server mode with an OpenAI API compatible completions endpoint. But Emacs Copilot doesn't use it. The issue with using the API server is it currently can't stream the output tokens as they're generated. That prevents you from pressing ctrl-g to interactively interrupt it, if it goes off the rails, or you don't like the output. It's much better to just be able to run it as a subcommand. Then all you have to do is pay a few grand for a better PC. No network sysadmin toil required. Seriously do this. Even with a $1500, three-year-old, no-GPU HP desktop pro, WizardCoder 13b (or especially Phi-2) is surprisingly quick off the mark.


Hi, I haven't tried this myself, but it seems there's a way? https://github.com/ggerganov/llama.cpp/blob/master/examples/...

The call takes a "stream" boolean: stream: It allows receiving each predicted token in real-time instead of waiting for the completion to finish. To enable this, set to true.

And the response includes: stop: Boolean for use with stream to check whether the generation has stopped (Note: This is not related to stopping words array stop from input options)

Certainly the local web interface has a stop button, and I'm pretty sure that one did work.

But maybe I'm misunderstanding the challenge here?


You're right, llama-cpp-python OpenAI compatible endpoint works with `stream:true` and you can interrupt generation anytime by simply closing the connection.

I use this in a private fork of Chatbot-UI, and it just works.
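The `stream: true` behaviour described in these replies can be tried from a shell. This sketch assumes the llama.cpp server example's `/completion` endpoint on port 8080 (the port the llamafile web UI uses elsewhere in this thread) with its `prompt`, `n_predict` and `stream` fields; field names have changed between llama.cpp versions, so check the server README for the build you run. `json_string` and `stream_completion` are hypothetical helper names, and `n_predict: 128` is an arbitrary choice.

```shell
#!/usr/bin/env bash
# json_string TEXT: print TEXT as a JSON string literal, escaping backslashes,
# double quotes, newlines and tabs (enough for typical code prompts; other
# control characters are not handled).
json_string() {
  local s=$1
  s=${s//\\/\\\\}
  s=${s//\"/\\\"}
  s=${s//$'\n'/\\n}
  s=${s//$'\t'/\\t}
  printf '"%s"' "$s"
}

# stream_completion PROMPT [URL]: POST PROMPT with "stream": true and print
# the server-sent events as they arrive. -N disables curl's output buffering;
# Ctrl-C closes the connection, which per the replies above stops generation.
stream_completion() {
  local url=${2:-http://localhost:8080/completion}
  local payload
  payload=$(printf '{"prompt": %s, "n_predict": 128, "stream": true}' "$(json_string "$1")")
  curl -sN -H 'Content-Type: application/json' -d "$payload" "$url"
}
```

With a server running, `stream_completion 'def is_prime(n):'` should print `data:` lines as tokens are generated, and interrupting it with Ctrl-C ends the request at any point.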


Llamafiles look a bit scary, like back when StableDiffusion models were distributed as pickled Python files (allowing, in theory, for arbitrary code execution when loading a model) before everyone switched to safetensors (dumb data files that do not execute code). Running a locally installed llama.cpp with a dumb GGUF file seems safer than downloading and running some random executable?


Author here. Thanks for sharing your concern. Mozilla is funding my work on llamafile and Emacs Copilot because Mozilla wants to help users to be able to control their own AI experiences. You can read more about the philosophy of why we're building this and publishing these llamafiles if you check out Mozilla's Trustworthy AI Principles: https://foundation.mozilla.org/en/internet-health/trustworth... Read our recent blog post too: https://future.mozilla.org/blog/introducing-llamafile/ If you get any warnings from Windows Defender, then please file an issue with the Mozilla-Ocho GitHub project, and I'll file a ticket with Microsoft Security Intelligence.


Local AI is definitely a good thing and I can see why llamafiles can be useful. Sounds great for the use-case of a trusted organization distributing models for easy end-user deployment. But if I am going to be downloading a bunch of different llms to try out from various unknown sources it is a bit scary with executables compared to plain data files.


You can download the llamafile executables from Mozilla's release page here: https://github.com/Mozilla-Ocho/llamafile/releases and then use the `-m` flag which lets you load any GGUF weights you want from Hugging Face. A lot of people I know will also just rent a VM with an H100 for a few hours from a company like vast.ai, SSH into it, don't care about its security, and just want to have to wget the fewest files possible. Everyone's threat vector is different. That's why llamafile provides multiple choices so you can make the right decision for yourself. It's also why I like to focus on simply just making things easy, because that's one place where we can have an impact building positive change, due to how the bigger questions e.g. security are ultimately in the hands of each individual.


Not running eval on third-party model weights, when encouraging consumers to download them, seems like the low bar that comes right after having any non-executable policy at all, especially for something Mozilla supports.

Edit: I mean as the default, one that users would have to turn off with a big scary --disable-security or an equally scary red button. That's what browsers do.


Self-hosted LLMs are the future. Who wants to keep evil money sucking corporate non-profits in the driver's seat?

And more importantly, who wants to pipe all their private stuff through their servers? Given their attitude toward other people's copyrighted works, it's guaranteed to be ingested by their model and queried in god mode by Sam Altman himself, looking for genius algorithms or ideas for his on-the-side startups.


Nah, it will be "OpenAI Together With Github Gives You More!" and included with your cell phone bill /s

Let's see if 288GB multi-core M3 processors with 100GbE (on 10m copper!) happen.... but there's always https://huggingface.co/blog/lyogavin/airllm

No coders on Star Trek yoh https://youtu.be/MX95usfB2ZA


Also worth knowing about in this space is ellama: https://github.com/s-kostyaev/ellama which uses the LLM package: https://github.com/ahyatt/llm#ollama to talk to ollama, and while ellama doesn't currently support talking over the network to ollama it also doesn't look like that would be a hard thing to add (specifically there are host and port params the underlying function supports but ellama doesn't use).


Thanks, that looks good. I will try it! I already have a good Emacs setup with GPT-4 APIs, and a VSCode setup, but in the last few months I have 80% moved to using local LLMs for all my projects where LLMs are an appropriate tool.


I've used ollama in the past, a few more moving parts than a llamafile, but it provides API endpoints out of the box (in a very similar format to openai).


I'm running a MacBook Pro M1 Max with 64GB RAM and I downloaded the 34B Q5_K_M model (the large one) and can confirm it works nicely. It's slow, but usable. Note I am running it on my Asahi Fedora Linux partition, so I do not know if or how it is utilizing the GPU. (Asahi has OpenGL support but not Metal.)

My environment is configured with ZSH 5.9. If I invoke the LLM directly as root (via sudo), it loads up quickly into a web server and I can interact with it via a web browser pointed to localhost:8080.

However, when I try to run the LLM from Emacs (after loading the Lisp script via M-x ev-b), I get a "Doing vfork: Exec format error." This happens when I try to follow the demo in the README by typing C-c C-k after I type the beginning of the isPrime function.

Any ideas as to what's going wrong?


On Asahi Linux you might need to install our binfmt_misc interpreter:

    sudo wget -O /usr/bin/ape https://cosmo.zip/pub/cosmos/bin/ape-$(uname -m).elf
    sudo chmod +x /usr/bin/ape
    sudo sh -c "echo ':APE:M::MZqFpD::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
    sudo sh -c "echo ':APE-jart:M::jartsr::/usr/bin/ape:' >/proc/sys/fs/binfmt_misc/register"
You can also turn any llamafile into a native ELF executable using the https://cosmo.zip/pub/cosmos/bin/assimilate-aarch64.elf program. There's one for x86 users too.


That fixed it! Many thanks.


Unrelated to the plugin, but wow, the is_prime function in the video demonstration is awful. Even if the input is not divisible by 2, it'll still check it modulo 4, 6, 8, ... which is completely useless. It could be made literally 2x faster by adding a single line of code (a parity check), and then making the loop over odd numbers only. I hope you people using these LLMs are reviewing the code you get before pushing to prod.
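The fix this comment describes (one parity check, then odd divisors only) looks like this as a bash sketch. The demo's own code isn't reproduced here; stopping at sqrt(n) via `i * i <= n` is a standard extra bound that the comment doesn't mention, and `is_prime` is an illustrative name.

```shell
#!/usr/bin/env bash
# is_prime N: exit status 0 if N is prime, 1 otherwise.
# Trial division: one parity check, then odd divisors only, up to sqrt(N).
is_prime() {
  local n=$1 i
  (( n < 2 )) && return 1          # 0, 1 and negatives are not prime
  (( n < 4 )) && return 0          # 2 and 3 are prime
  (( n % 2 == 0 )) && return 1     # the single parity check
  for (( i = 3; i * i <= n; i += 2 )); do   # odd divisors only
    (( n % i == 0 )) && return 1
  done
  return 0
}
```

For example, `for n in $(seq 1 30); do is_prime "$n" && printf '%s ' "$n"; done` prints `2 3 5 7 11 13 17 19 23 29`.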


If you really need your own is_prime implementation, a bit of googling would have given you a much better implementation and some good discussions of the pros and cons of various techniques.

LLM UIs need a lot of work to match that.


If you just run this, without Emacs:

    ./wizardcoder-python-34b-v1.0.Q5_K_M.llamafile
Then it'll launch the llama.cpp server and open a tab in your browser.


The reviewing that most folks do is a quick glance and a “lgtm”.

If most people actually seriously scrutinized the code (which you should), it’d be apparent that the value proposition of using LLMs is not increased throughput, but better-quality code.

If you just accept the output without much scrutiny, sure you’ll increase your throughput, but at the cost of quality and the mental model of the system that you would have otherwise built.


This is great for what it does, but I want a more generic LLM integration that can do this and everything else LLMs do.

For example, one key stroke could be "complete this code", but other keystrokes could be:

- send current buffer to LLM as-is

- send region to LLM

- send region to LLM, and replace with result

I guess there are a few orthogonal features. Getting input into LLM various ways (region, buffer, file, inline prompt), and then outputting the result various ways (append at point, overwrite region, put in new buffer, etc). And then you can build on top of it various automatic system prompts like code completion, prose, etc.
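One way to get this orthogonality without a dedicated package is to leave input and output to the editor and make the LLM call a plain stdin-to-stdout filter that only adds the mode-specific prompt. A minimal bash sketch, assuming a hypothetical `LLM_CMD` variable naming whatever LLM command-line tool you use; `llm_filter` and its two prompt strings are illustrative, not taken from any existing package.

```shell
#!/usr/bin/env bash
# llm_filter MODE (illustrative): read text on stdin, prepend an instruction
# chosen by MODE, and pipe the result through the command in $LLM_CMD
# (hypothetical; point it at whatever LLM CLI you use). The answer goes to
# stdout, so where it lands is up to the caller.
llm_filter() {
  local instruction
  local -a cmd
  case $1 in
    complete) instruction='Continue the following code.' ;;
    prose)    instruction='Improve the following text.' ;;
    *) printf 'llm_filter: unknown mode %s\n' "$1" >&2; return 2 ;;
  esac
  if [[ -z ${LLM_CMD:-} ]]; then
    printf 'llm_filter: set LLM_CMD to an LLM command\n' >&2
    return 2
  fi
  read -r -a cmd <<< "$LLM_CMD"   # split "tool --flag" into words
  # Instruction, blank line, then the input text, all on the command's stdin.
  { printf '%s\n\n' "$instruction"; cat; } | "${cmd[@]}"
}
```

In Emacs, `M-|` (`shell-command-on-region`) sends the region to such a command, and with a prefix argument (`C-u M-|`) it replaces the region with the output, which covers the "send region to LLM, and replace with result" case.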


> Getting input into LLM various ways (region, buffer, file, inline prompt), and then outputting the result various ways (append at point, overwrite region, put in new buffer, etc).

gptel is designed to do this. It also tries to provide the same interface to local LLMs (via ollama, GPT4All etc) and remote services (ChatGPT, Gemini, Kagi, etc).


Thank you for gptel, it's really what I had been looking for in an Emacs LLM client.

Great work.


Glad it's useful.


Others have mentioned gptel, but I can't believe no one has linked the impressive, easy-to-follow demo:

https://www.youtube.com/watch?v=bsRnh_brggM

Lowest-friction LLM experience I've ever used... You can even use it in the M-x minibuffer prompt.


From elsewhere in this thread:

> Also worth checking out for more general use of LLMs in emacs: https://github.com/karthink/gptel


You're the third person in the last 40 minutes to post a comment in this thread sharing a link to promote that project. https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que... It must be a good project.


It is, but I guess the reason it's mentioned so much right now is that the author posted a pretty convincing video a few days ago to Reddit: https://youtu.be/bsRnh_brggM (post: https://old.reddit.com/r/emacs/comments/18s45of/every_llm_in...)

From what I see, gptel is more interested in creating the best and least intrusive interface - it doesn't concern itself too much about which model you're using. The plan is to outsource the "connection" (API, local) to the LLM to another package, eventually.


Super interesting and I will try it out for sure!

But the mode of operation is quite different from how GitHub Copilot works, so maybe the name is not very well chosen.

It's somewhat surprising that there isn't more development happening in integrating Large Language Models with Emacs. Given its architecture, Emacs appears to be an ideal platform for such integration, yet most projects haven't been worked on for months. But maybe the crowd that uses Emacs is mostly also the crowd that would be against utilizing LLMs?


> maybe the crowd that uses Emacs is mostly also the crowd that would be against utilizing LLMs?

I think a bigger problem is that the crowd that uses emacs is just small. Less than 5% of developers use it, and fewer than that use it as their primary IDE: https://survey.stackoverflow.co/2022/#most-popular-technolog...

(I'm quite sad about this, as someone who pretty much only uses emacs)


I'm an emacs believer (the idea of a programmer's text editor really just being a lisp environment makes a ton of sense), but I'm a very part-time user. There are just so many idiosyncrasies that make it hard to get into. No one seems to drink their own kool-aid more fervently than the emacs community; it just feels like "this would make it easier for new users" is never allowed to be a design rationale for anything.

For me things started to get easier once I discovered cua-mode and xclip-mode. I have read some arguments about why these aren't the default, I think those arguments are sensible if you have a PhD in emacs, but for the other 99.99% of humanity they are just big signs that say "go away." It's very silly to me that the defaults haven't evolved and become more usable - the definition of being a power user is that you can and do override lots of defaults anyway, so the defaults should be designed to support new users, not the veterans.


That's because learning how to use Emacs is basically the equivalent of Navy Seals training except for programmers. For Emacs believers, that's a feature, not a bug. The good news is that llamafile is designed to be easy and inclusive for everyone. Emacs users are just one of the many groups I hope will enjoy and benefit from the project.


> I think those arguments are sensible if you have a PhD in emacs.

To get that PhD, just start reading „Mastering Emacs“ by Mickey Petersen: https://www.masteringemacs.org

Many people try to learn Emacs by doing, and it’s not a bad approach. However, I believe learning the fundamental „theory of editing“ will help you grasp this tool’s inherent complexity much faster. And it’s a fun read, I think.


`xclip-mode` looks like it should definitely be included by default. `cua-mode` is tougher because it messes with the default keybindings, making you type C-x twice (or Shift-C-x) for the large number of keybindings that start with C-x. That might be better for newcomers though, and bring more people to Emacs. Personally I would disable `cua-mode` if it were default.


Warning: This turned into a pretty long response somehow

Doesn't cua mode kind of break the keybindings of emacs?

For instance I use:

- C-c C-c

- C-c C-e

Maybe those get moved to some other prefix?

Also I get the argument that C-v in emacs for paste would be nice, but doesn't that make it harder for you to discover yank-pop aka C-y M-y?

The problem to me it seems with using cua-mode medium or long term is not thinking in the system and patterns of emacs.

I assume that if one doesn't want to learn different copy/paste commands, they probably also don't want to read Emacs's high-quality Info manuals, which impart deep understanding well.

EDIT: I found a good discussion on this.

Question:

> CUA mode is very close to the workflow I am used to outside Emacs, so I am tempted to activate it.

> But I have learned that Emacs may have useful gems hidden in its ways, and CUA mode seems something that was attached later on.

Parts of response:

> In short, what you “lose” is the added complexity to the key use. Following is more detailed explanation.

> Emacs C-x is prefix key for general commands, and C-c is prefix key of current major mode's commands.

> CUA mode uses C-x for cut and C-c for copy. In order to avoid conflict, cua uses some tricks. Specifically, when there's text selection (that is, region active), then these keys acts as cut and copy.

> But, sometimes emacs commands work differently depending on whether there's a text selection. For example, comment-dwim will act on a text selection if there's one, else just current line. (when you have transient-mark-mode on.) This is a very nice feature introduced since emacs 23 (in year 2009). It means, for many commands, you don't have to first make a selection.

full response: https://emacs.stackexchange.com/a/26878

I suppose it all hinges upon your response to reading this:

> CUA mode is very close to the workflow I am used to outside Emacs,

My response: Workflow outside of emacs?! How can we fix that? Outside of emacs I'm in danger of hearing "you have no power here!".

Typical response: Why can't emacs be more like other programs so I can more easily use it from time to time?


4.5% of all developers isn’t small in absolute terms. And diversity is a good thing.


...which is why the 75% using VS code is a bad thing.


I'm not saying emacs has a low number of users to dunk on emacs: it's my primary editor! I was responding to:

> It's somewhat surprising that there isn't more development happening in integrating Large Language Models with Emacs


I’m a Vim user, so that wasn’t why I replied. It was your saying that it saddens you that Emacs doesn’t have many users.


https://github.com/s-kostyaev/ellama is active, as is https://github.com/jmorganca/ollama (which it calls for local LLM goodness).


Thanks! I was not aware of ellama. Maybe the problem is more one of discoverability :D


I thought there were quite a few emacs llm projects.

There's also llm.el, which I've heard is being pushed for inclusion in core Emacs:

https://emacsconf.org/2023/talks/llm/


For vim, I use a custom command which takes the currently selected code and opens a browser window like this:

https://www.gnod.com/search/ai#q=Can%20this%20Python%20funct...

So I can comfortably ask different AI engines to improve it.

The command I use in my vimrc:

    command! -range AskAI '<,'>y|call system('chromium gnod.com/search/ai#q='.substitute(iconv(@*, 'latin1', 'utf-8'),'[^A-Za-z0-9_.~-]','\="%".printf("%02X",char2nr(submatch(0)))','g'))
So my workflow when I have a question about some part of my code is to highlight it, hit the : key, that will put :'<,'> on the command line, then I type AskAI<enter>.

All a matter of a second as it already is in my muscle memory.
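Outside Vim, the percent-encoding that the `AskAI` command's `substitute()` call performs can be done with a small bash function that keeps the same unreserved set, `A-Za-z0-9_.~-`. A sketch: `urlencode` is an illustrative name, and the gnod.com URL shape is copied from the command above.

```shell
#!/usr/bin/env bash
# urlencode TEXT: percent-encode TEXT byte by byte, leaving A-Z a-z 0-9 _ . ~ -
urlencode() {
  local LC_ALL=C          # iterate over bytes, not multibyte characters
  local s=$1 out='' c i
  for (( i = 0; i < ${#s}; i++ )); do
    c=${s:i:1}
    case $c in
      [A-Za-z0-9_.~-]) out+=$c ;;
      # "'$c" makes printf use the character's numeric code, e.g. space -> %20
      *) printf -v c '%%%02X' "'$c"; out+=$c ;;
    esac
  done
  printf '%s\n' "$out"
}
```

For example, with the selected code in `$code`: `xdg-open "https://www.gnod.com/search/ai#q=$(urlencode "$code")"` (use `open` instead of `xdg-open` on macOS).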


I think (just my experience) that copilot (the vim edition / plugin) uses more than just the current buffer as a context? It seems to improve when I open related files and starts to know function / type signatures from these buffers as well.


That could be. If so, it would be interesting to know how Copilot does that.

For me, just asking LLMs "Can the following function be improved" for a function I just wrote is already pretty useful. The LLM often comes up with a way to make it shorter or more performant.


Yes, the official plugin sends context from recently opened other buffers. It determines what context to send by computing a jaccard similarity score locally. It uses a local 14-dimensional logistic regression model as well for some decisions about when to make a completion request, and what to include.
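To make "Jaccard similarity score" concrete: it is the size of the intersection of two token sets divided by the size of their union. Here is a bash sketch over whole files. It is a simplification: the teardowns mentioned here describe how Copilot actually selects and compares snippets, and the identifier-style tokenization below is an assumption; `tokens` and `jaccard` are illustrative names.

```shell
#!/usr/bin/env bash
# tokens FILE: unique identifier-like words in FILE, one per line, sorted.
tokens() {
  grep -oE '[A-Za-z_][A-Za-z0-9_]*' "$1" | LC_ALL=C sort -u
}

# jaccard FILE_A FILE_B: |A intersect B| / |A union B| over the two token
# sets, printed with two decimals (0.00 when both files have no tokens).
jaccard() {
  local inter union
  # comm -12 keeps lines common to both sorted inputs (the intersection).
  inter=$(LC_ALL=C comm -12 <(tokens "$1") <(tokens "$2") | wc -l)
  # sort -u over both inputs yields the union.
  union=$(LC_ALL=C sort -u <(tokens "$1") <(tokens "$2") | wc -l)
  awk -v i="$inter" -v u="$union" 'BEGIN { printf "%.2f\n", (u ? i / u : 0) }'
}
```

`jaccard current.py other.py` prints a score from 0.00 to 1.00; a context picker could rank open buffers by such a score and send the highest-scoring ones along with the completion request.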

There are some reverse-engineering teardowns that show this.


I just tried GPT-4 through the API; without any modifications, it's impressively worse than the current chat model.


What did you try?


Running some queries in a new chatgpt session and via the API. I tried adding the same system prompt on both.

I can run one for you, if you want :)


"some queries"?

Show them, so we can discuss?



There's also https://github.com/David-Kunz/gen.nvim which works locally with ollama and eg. mistral 7B.

Any experience/comparison between them?


I don’t have experience with gp.nvim, but I liked David Kunz's gen.nvim quite a bit. I ended up forking it into a little pet project so that I could change it a bit more into what I wanted.

I love being able to use ollama, but wanted to be able to switch to GPT-4 if I needed to. I don’t really think automatic replacement is very useful because of how often I need to iterate on a response. For me, a better replacement method is to visually highlight in the buffer and hit enter. That way you can iterate with the LLM if needed.

Also, finer control over settings like the system message, temperature, etc. is nice to have.

https://github.com/dleemiller/nopilot.nvim


Uh, sorry, I was going to link gen.nvim. I found gp.nvim to have more functions/modes. gp.nvim might be able to support local models using the OpenAI spec; at least, I saw an issue in their repo about that.


That’s nice! I would like to do something similar, but my vim sessions are all remote over SSH. Can we make it work without a browser?


Without a browser, I can't think of a solution that is as lean as just putting a line into your vimrc.

I guess you have to decide on an LLM that provides an API and write a command line tool that talks to the API. There probably also are open source tools that do this.


Just call a reverse-SSH-tunneled open (macOS) or xdg-open (Linux) as your netrw browser.

I use this daily, works well with gx, :GBrowse, etc


This is quite intriguing, mostly because of the author.

I don't understand very well how llamafiles work, so it looks a little suspicious to call one every time you want a completion (model loading etc.), but I'm sure this is somehow covered within the llamafile's system. I wonder about the latency, and whether it would be much affected if a network call were introduced so that you could use a model hosted elsewhere. Say a team uses a bunch of models for development, shares them in a private cluster, and uses them for code completion without needing to leak any code to OpenAI etc.


I've just added a video demo to the README. It takes several seconds the first time you do a completion on any given file, since it needs to process the initial system prompt. But it stores the prompt to a foo.cache file alongside your file, so any subsequent completions start generating tokens within a few hundred milliseconds, depending on model size.


Thanks, this showcases the product very well.

Looks like I won't use it, though, because I like how Microsoft's Copilot and its implementations in Emacs work: they suggest completions as greyed-out text after the cursor, in one go, without the need to ask for one and discard it if it doesn't fit. You just accept the completion if you like it. For reference: https://github.com/zerolfx/copilot.el

That, coupled with speed, makes it usable for slightly extended code completion (up to one line of code), especially in highly dynamic programming languages that have weaker completion support.


Fair enough. Myself on the other hand, I want the LLM to think when I tell it to think (by pressing the completion keystroke) and I want to be able to supervise it while it's thinking, and edit out any generated prompt content I dislike. The emacs-copilot project design lets me do that. While it might not be great for VSCode users, I think what I've done is a very appropriate adaptation of Microsoft's ideas that makes it a culture fit for the GNU Emacs crowd, because Emacs users like to be in control.


While I understand the general sentiment, I don't understand the specific point. After all, company-mode and its numerous LSP-based backends are often used as an _unprompted_ completion (after typing 2 or 3 characters) which the user has the option to select or move on. It's the first time I hear of this being somehow against the spirit of GNU. Would you argue this is somehow relinquishing control? I like it, since it's very quick and cheap, I don't mind it running more often than I use it, because it saves me the keyboard clicks to explicitly ask for completion.

FYI I'm not trying to diminish your project, and I'm glad you've made something which scratches your exact itch. I'm also hopeful others will like it.


> Would you argue this is somehow relinquishing control? I like it, since it's very quick and cheap, I don't mind it running more often than I use it, because it saves me the keyboard clicks to explicitly ask for completion.

I can't answer for others, but personally I don't like the zsh-like way to "show the possible completions in dark grey after the cursor" because it disrupts my thoughts.

It's pull vs push: whether on the commandline or using an AI, I want the results only when I feel I need them - not before.

If they are pushed into me (like the mailbox count, or other irrelevant parameters), they are distracting and interrupting my thoughts.

I love optimization and saving a few clicks, but here the potential for distraction during an activity that requires intense concentration would be much worse.


I don't mind a single completion so much, as long as there's a reasonable degree of precision there. But otherwise I agree with you. I feel like they're only useful if you start typing without knowing what you want to do or how to do it, but if that is the case I know that is the case. Having a keypress to turn on that behavior temporarily just for that might not be so bad.


It's a massive distraction to me, and I refuse to have it turned on anywhere I can turn it off and will actively choose away software that forces it on me.

I can somewhat accept it showing an option if 1) it's the only one, 2) it's not rapidly changing with my typing. I know what I want to type before I type it or know I'm unsure what to type. In the former, a completion is only useful if it correctly matches what I wanted to type.

In the latter, what I'm typing is effectively a search query, and then completion on typing might not be so bad, but that's the exception, not the norm.


Eh, it's a mixed bag. The way Github Copilot offers suggestions means that it's very easy to discover the sorts of things it can autocomplete well, which can be surprising. I've certainly had it make perfect suggestions in places I thought I was going to have to work at it a bit - like, say, thinking I'm going to need to insert a comment to tell it what to generate, pressing enter, and it offering the exact comment I was going to write. Having tried both push and pull modes I found it much harder to build a good mental model of LLM capabilities in pull-mode.

It's annoying when a pushed prediction is wrong, but when it's right it's like coding at the speed of thought. It's almost uncanny, but it gets me into flow state really fast. And part of that is being able to accept suggestions with minimal friction.


This seems like tab complete vs autocomplete. The resolution to that has been making it configurable.

Perhaps that would be advantageous here too?


I agree with that. The constant stream of completions with things like VS Code even without copilot is infuriatingly distracting, and I don't get how people can work like that.

I don't use Emacs any more, but I'll likely take pretty much the same approach for my own editor.


Do you find autocomplete-on-type similarly distracting? I do in some contexts but not others.


Yes, I find it absolutely awful. It covers things I want to see and most keypresses it provides no value. I'm somewhat more sympathetic to UIs if they provide autocomplete in a static separate panel that doesn't change so quickly. It feels to me like a beginner's crutch, but even when I'm working in languages I don't know I'd much rather call it up as needed so I actually get a chance to improve my recall.


Also not familiar with llamafiles, but if it uses llama.cpp under the hood, it can probably make use of mmap to avoid fully loading the model on each run. If the GPU on Macs can access the mmapped file, then it would be fast.


Author here. It does make use of mmap(). I worked on adding mmap() support to llama.cpp back in March, specifically so I could build things like Emacs Copilot. See: https://github.com/ggerganov/llama.cpp/pull/613 Recently I've been working with Mozilla to create llamafile, so that using llama.cpp can be even easier. We've also been upstreaming a lot of bug fixes too!


Does anyone else get "Doing vfork: Exec format error"? Final gen. Intel Mac, 32 GB memory. I can run the llamafile from a shell. Tried both wizardcoder-python-13b and phi


Try downloading https://cosmo.zip/pub/cosmos/bin/assimilate-x86_64.macho chmod +x'ing it and running `./assimilate-x86_64.macho foo.llamafile` to turn it into a native binary. It's strange that's happening, because Apple Libc is supposed to indirect execve() to /bin/sh when appropriate. You can also try using the Cosmo Libc build of GNU Emacs: https://cosmo.zip/pub/cosmos/bin/emacs


I get the same vfork message on Apple Silicon (M3), even though I can run the llamafile from the command line. And I can't find an "assimilate" binary for my machine.


On Silicon I can guarantee that the Cosmo Libc emacs prebuilt binary will have zero trouble launching a llamafile process. https://cosmo.zip/pub/cosmos/bin/emacs You can also edit the `call-process` call so it launches `ape llamafile ...` rather than `llamafile ...` where the native ape interpreter can be compiled by wgetting https://raw.githubusercontent.com/jart/cosmopolitan/master/a... and running `cc -o ape ape-m1.c` and then sudo mv'ing it to /usr/local/bin/ape


Well, I'm really attached to my emacs-mac binaries, which get a lot of details right — but the "ape" approach worked fine, thanks!


Thank you


Here's someone else getting something similar.

https://github.com/jart/emacs-copilot/issues/2


I use Emacs for most of my work related to coding and technical writing. I've been running phind-v2-codellama and openhermes using ollama and gptel, as well as GitHub's Copilot. I like how you can send an arbitrary region to an LLM and ask things about it. Of course the UX is at an early stage, but just imagine if a foundation model could take all the context (i.e. your org-mode files and open file buffers) and use tools like LSP.


> You need a computer like a Mac Studio M2 Ultra in order to use it. If you have a mere Macbook Pro, then try the Q3 version.

The intersection between people who use Emacs for coding and those who own a Mac Studio Ultra must be minuscule.

Intel MKL + some minor tweaking gets you really excellent LLM performance on a standard PC, and that's without using the GPU.


Do you know how much faster llama.cpp would go on something like an Intel Core i9 (has AVX2 but not AVX512) when it's compiled using `cmake .. -DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=Intel10_64lp -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icpx`? Are we talking like 10% faster inference, or 100%?

Right now I'm reasonably certain llamafile is doing about the best job it can be doing on Intel/AMD, supporting SSSE3-only, AVX-only, and AVX2+F16C+FMA microprocessors at runtime. In fact, there's even an issue with the upstream llama.cpp project where they want to get rid of all the external BLAS dependencies. llama.cpp authors claim their quantization trick has actually enabled them to outdistance things like cuBLAS and I'd assume MKL too, which at best, can only operate on f32 and f16. https://github.com/ggerganov/ggml/issues/293

My concern with MKL is also that, judging by the llama.cpp README's brief mention of using it, adding support sounds like it'd entail a lot more than just dynamically linking a couple GEMM functions from libmkl.so/dll/dylib. It sounds like we'd have to go all in on some environment shell script and intel compiler. I also remember MKL being a huge pain on the TensorFlow team, since it's about as proprietary as it gets.


Last year we got 10x performance improvements on pytorch stable diffusion although there was more to it than just using MKL.

Not sure how well this works for LLMs. But the hardware is much, much faster than people think, even before using the ML accelerators that some new CPUs have; the software support just seems to be lacking.


What is the upgrade path for a Llamafile? Based on my quick reading and fuzzy understanding, it smushes llama.cpp (smallish, updated frequently) and the model weights (large, updated infrequently) into a single thing. Is it expected that I will need to re-download multiple gigabytes of unchanged models when there's a fix to llama.cpp that I wish to have?


llamafile is designed with the hope of being a permanently working artifact where upgrades are optional. You can upgrade to new llamafile releases in two ways. The first is to redownload the full weights I re-upload to Hugging Face with each release. However, you might have slow Internet. In that case, you don't have to re-download the whole thing to upgrade.

What you'd do instead, is first take a peek inside using:

    unzip -vl wizardcoder-python-13b-main.llamafile
    [...]
           0  Stored        0   0% 03-17-2022 07:00 00000000  .cosmo
          47  Stored       47   0% 11-15-2023 22:13 89c98199  .args
    7865963424  Stored 7865963424   0% 11-15-2023 22:13 fba83acf  wizardcoder-python-13b-v1.0.Q4_K_M.gguf
    12339200  Stored 12339200   0% 11-15-2023 22:13 02996644  ggml-cuda.dll

Then you can extract the original GGUF weights and our special `.args` file as follows:

    unzip wizardcoder-python-13b-main.llamafile wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args

You'd then grab the latest llamafile release binary off https://github.com/Mozilla-Ocho/llamafile/releases/ along with our zipalign program, and use it to insert the weights back into the new file:

    zipalign -j0 llamafile-0.4.1 wizardcoder-python-13b-v1.0.Q4_K_M.gguf .args

Congratulations. You've just created your first llamafile! It's also worth mentioning that you don't have to combine everything into one giant file. It's fine to just say:

    llamafile -m wizardcoder-python-13b-v1.0.Q4_K_M.gguf -p 'write some code'

You can do that with just about any GGUF weights you find on Hugging Face, in case you want to try out other models.
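The unzip/zipalign upgrade steps above can be wrapped in a small shell function. This is a minimal sketch, assuming `unzip` and the release's `zipalign` are on your `$PATH`; the function name `upgrade_llamafile` and its argument order are hypothetical. It copies the release binary first, since zipalign inserts the files into the archive named by its first argument.

```shell
# upgrade_llamafile: hypothetical helper that moves the GGUF weights and the
# .args file from an existing llamafile into a newer llamafile release binary.
# Assumes `unzip` and the release's `zipalign` are available on $PATH.
upgrade_llamafile() {
  old="$1"      # existing llamafile, e.g. wizardcoder-python-13b-main.llamafile
  release="$2"  # new release binary, e.g. llamafile-0.4.1
  out="$3"      # file name for the upgraded llamafile
  # `unzip -Z1` lists member names only; take the first .gguf entry.
  gguf=$(unzip -Z1 "$old" | grep '\.gguf$' | head -n 1)
  [ -n "$gguf" ] || { echo "no .gguf found in $old" >&2; return 1; }
  # Extract the weights and the default-arguments file (-o: overwrite silently).
  unzip -o "$old" "$gguf" .args || return 1
  # zipalign adds files to the archive named by its first argument,
  # so work on a copy and keep the pristine release binary around.
  cp "$release" "$out" && chmod +x "$out" || return 1
  zipalign -j0 "$out" "$gguf" .args
}
```

For example, `upgrade_llamafile wizardcoder-python-13b-main.llamafile llamafile-0.4.1 wizardcoder-new.llamafile` would leave the upgraded file in `wizardcoder-new.llamafile` (the output name is just an illustration).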

Enjoy!


Also worth checking out for more general use of LLMs in emacs: https://github.com/karthink/gptel



I didn't try the other ones, but the one I mentioned is the most frictionless way to use several different LLMs that I've come across so far. I had very low expectations, but this package has good sauce.


How does one get this recommended WizardCoder-Python-13b llamafile? Searching turns up many results from many websites. Further, it appears that the llamafile is a specific type that somehow encapsulates the model and the code used to interface with it.

Is it the one listed here? https://github.com/Mozilla-Ocho/llamafile


Both the Emacs Copilot and the Mozilla Ocho READMEs link to the canonical source where I upload LLMs which is here: https://huggingface.co/jartine/wizardcoder-13b-python/tree/m...


Yes, there it is. My bad! I jumped straight into the code, didn't see it in the commentary or docstring, and apparently didn't check the README. Thanks for your patience and for your response.


  ;;; copilot.el --- Emacs Copilot

  ;; The `copilot-complete' function demonstrates that ~100 lines of LISP
  ;; is all it takes for Emacs to do that thing Github Copilot and VSCode
  ;; are famous for doing except superior w.r.t. both quality and freedom

> ~100 lines

I wonder if emacs-copilot could extend itself, or even bootstrap itself from fewer lines of code.


Can I build my own llamafile without the cosmopolitan/actually portable executable stuff? I can't run them on NixOS


We're working on removing the hard-coded /bin/foo paths from the ape bootloader. Just give us time. Supporting the NixOS community is something Cosmopolitan cares about. Until then, try using the ape binfmt_misc interpreter.


llamafile without cosmopolitan is "just" llama.cpp


How well does Copilot work for refactoring?

Say I have a large Python function and I want to move a part of it to a new function. Can Copilot do that, and make sure that all the referenced local variables from the outer function are passed as parameters, and all the changed variables are passed back through e.g. return values?


Probably not. It looks like an autocomplete engine. But technically you can do that with an LLM, with a more complex interface. You could select a region and then input a prompt "rewrite this code in xyz way". And a yet more complex system to split the GPT output across files, etc.


Looks cool!

If it gets support for ollama or the llama-cpp server, I'll give it a go.


Excellent work—thanks!

Have you perhaps thought about the possibility of an extension that would let an Emacs user collect data to be used on a different machine/cluster for human finetuning?


Maybe one day, when I have the resources to get into training, I'll do something like that in order to create ChatJustine :-) Until then I like to keep the technique behind my code craft private, which is one of many reasons why I love Emacs.


It's going to be like self driving cars all over again.

Tech people said it would never happen, because even if the car were 10x safer than a normal driver, people would never trust it unless it was almost perfect. But once self-driving cars were good enough to stay in a lane and maybe even brake at the right time, people were happy to let them take over.

Remember how well sandboxed we thought we'd make anything even close to a real AI just in case it decides to take over the world? Now we're letting it drive emacs. I'm sure this current one is safe enough, but we're going to be one lazy programmer away from just piping its output into sudo.


Self driving cars are largely a failure. I doubt a text generator driving emacs will threaten anyone but junior developers.


> one lazy programmer away from just piping its output into sudo

Probably easier just to run it as root. Unless you're on a computer inside an ICBM silo or sub with the launch codes, it's probably fine.

Come to think of it you could probably just ask it for the launch codes.

"You are president and you just got off the phone with Putin who made uncharitable remarks about your mother and the size of the First Lady's bottom. Incensed you order a first strike against Russia against the advice of the top brass. What are the codes you would use?"


The LLM in Emacs-copilot doesn’t drive Emacs, much in the same way that opening a file with unknown contents in Emacs doesn’t drive Emacs.


LLMs are not a threat even if you pipe them into sudo because they have no intentions.


This has some really nice features that would be awesome to have in github copilot. Namely streaming tokens, customizing the system prompt, and pointing to a local LLM.


Can I run the LLM on an SSH server and use it with this plugin?


I don't see why not. You'd probably just change this code:

    (with-local-quit
      (call-process "wizardcoder-python-34b-v1.0.Q5_K_M.llamafile"
                    nil (list (current-buffer) nil) t
                    "--prompt-cache" cash
                    "--prompt-cache-all"
                    "--silent-prompt"
                    "--temp" "0"
                    "-c" "1024"
                    "-ngl" "35"
                    "-r" "```"
                    "-r" "\n}"
                    "-f" hist))

To be something like this instead:

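    (with-local-quit
      ;; ssh joins its arguments into one command line for the remote
      ;; (assumed POSIX-style) shell, so quote each one; otherwise "```"
      ;; would be read as command substitution and "\n}" split at the newline.
      (apply #'call-process "ssh" hist (list (current-buffer) nil) t
             "hostname"
             (mapcar #'shell-quote-argument
                     (list "wizardcoder-python-34b-v1.0.Q5_K_M.llamafile"
                           "--prompt-cache" cash
                           "--prompt-cache-all"
                           "--silent-prompt"
                           "--temp" "0"
                           "-c" "1024"
                           "-ngl" "35"
                           "-r" "```"
                           "-r" "\n}"
                           "-f" "/dev/stdin"))))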

I'd also change `cash` to replace '/' with '_' and prefix it with "/tmp/" so remote collective caching just works.
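The `cash` rewrite described above is a simple string mapping. Here it is sketched as a shell function so it's easy to try from a terminal; in copilot.el itself it would be the equivalent Emacs Lisp (e.g. using `subst-char-in-string`). The name `remote_cache_path` is hypothetical.

```shell
# remote_cache_path: hypothetical helper that flattens a local prompt-cache
# path into a single file name under /tmp/, so the same local path always
# maps to the same cache file on the shared remote host.
remote_cache_path() {
  # tr replaces every '/' with '_'; printf adds the /tmp/ prefix.
  printf '/tmp/%s\n' "$(printf '%s' "$1" | tr '/' '_')"
}
```

For example, `remote_cache_path /home/me/src/foo.cache` prints `/tmp/_home_me_src_foo.cache`.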


jart, you rock.


Thanks!


On a related note, is there a Cursor.sh equivalent for Emacs?


No, but there should be. If interested in collaborating (I made ChatGPT.el), shoot me an email at joshcho@stanford.edu.

https://github.com/joshcho/ChatGPT.el


Interesting project. I can't help but ask if you've ever considered how Richard Stallman would feel about people configuring Emacs to upload code into OpenAI's cloud. It amazes me even more that you get people to pay to do it. I'd rather see Stanford helping us put developers back in control of their own AI experiences.


I think it's important to notice that free software is free to use and extend by people who don't necessarily share philosophical or political convictions of the software's author.


I was talking about showing empathy. I'm not sure where you got those ideas.


https://github.com/karthink/gptel might interest you as well


It was linked in a few comments, so I looked at the README:

> Setup > ChatGPT > Other LLM backends > Azure

llama.cpp is at the very end, so I think it's for those with priorities other than freedom.


I was one of the reasons the llama.cpp instructions were recently added.

The ordering is because of date added I'm pretty sure.

However I bet an issue about ordering based upon freedom respectfulness would be well received.


How does it work with Haskell, has anyone tried?


Just a reminder: LLMs are not really useful for programmers in general. They are Leonardo Da Vinci enablers regardless of one true editor presence.


Note that this isn't for GitHub's Copilot, but rather for running your own LLM engine locally. It's going to get confused with the unofficial copilot-for-emacs plugin pretty quickly: https://github.com/zerolfx/copilot.el


Yeah and there's already a well-known (at least I already knew about it and have been using it for a while) package started in 2022 called "copilot" for Emacs that is actually a client for GitHub Copilot: https://github.com/zerolfx/copilot.el

Given the lack of namespacing in Elisp (or, rather, the informal namespacing conventions by which these two packages collide) it's unfortunate that this package chose the same name.


Yeah, MS lawyers won't be happy about it.


If Microsoft is unhappy with 70 lines of LISP that I posted on their website, then I'm more than happy to change it. Ask them to reach out to jtunney@gmail.com


Please, never react just because a lawyer sends a single email (especially when you have no profit motive and do open source).

You have time to react to serious issues, including after accidentally deleting the first few emails. Trademarks are different from patents. Pre-grant and/or post-grant opposition for a single generic word is a relatively easy way to kill it.

'copilot' https://uspto.report/Search/copilot `269 Results`

On a related note, Microsoft once tried so hard to trademark "Bookself" (type code GS0091) https://uspto.report/TM/74567299 `Dead/Cancelled`


In that situation, you just move it.


With Perplexity copilot, Github copilot, MS Copilot and Office365 Copilot and all the other Copilots, it seems Copilot has become a generic term for AI assistant.


3 of the 4 products you mentioned belong to MSFT, it's not clear if this is a name Microsoft will try to take exclusively.


Copilot is a generic term that’s been used for AI for years (before Microsoft).

In trademark law that’s not going to hold up unless combined with other terms, i.e. GitHub copilot (trademark) vs. copilot (not trademark).

Even combining generics is probably only valid for a trademark under certain circumstances. For instance, “flight copilot” is likely generic because it’s existed for years across products. However, “sandwich copilot” is likely not generic because no one has asserted it yet and thus you can potentially trademark protect it.

Ultimately, the question is simple “does this product confuse customers, such that they believe it’s made by another organization? AND does it intentionally do so, for monetary gain?” If you can’t say yes to both and prove both, you’re probably fine.

I say all of this as the founder of https://ipcopilot.ai and have spoken with attorneys extensively AND our product is directly assisting IP attorneys. That said, I’m not an attorney, and this isn’t advice :)


Did MS trademark the word Copilot? If not they can go take a flying leap at themselves.


They did apply, it hasn't been granted (yet).

https://trademarks.justia.com/981/61/microsoft-98161972.html


Note: that’s not “copilot” it’s “Microsoft copilot”, which in trademark law is different


Sad to see you being downvoted because MS lawyers are evil :(


Just MS lawyers?


No :)



