Ask HN: How have you integrated LLMs in your development workflow?
63 points by mjbale116 37 days ago | 75 comments
Is it working for you? Have you seen any performance increases?



I've tried it, and I don't like it. There's too much confabulation to be useful as code completion, and any task more complicated than that results in logic errors most of the time.

Code completion was fine without LLMs, and solving problems myself usually ends up being quicker than trying to coerce an LLM into doing it correctly and then verifying that the output is actually correct.

The one time I used an LLM in my workflow to good success was using ChatGPT to automatically create an enum of every EU/EEA country and a switch statement over that enum. Those sorts of "grunt work" tasks that don't require any thinking, but a lot of typing, seem to be where LLMs shine.
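For a concrete sense of the shape of that output, here is a minimal sketch (not the exact code; the list is truncated, the vat_prefix use is invented for illustration, and match needs Python 3.10+):

    from enum import Enum

    class EEACountry(Enum):
        AUSTRIA = "AT"
        BELGIUM = "BE"
        BULGARIA = "BG"
        # ... the remaining EU members, plus Iceland and Liechtenstein ...
        NORWAY = "NO"

    def vat_prefix(country: EEACountry) -> str:
        # The "switch statement over that enum" part: one arm per country
        # in the real output, collapsed here for brevity.
        match country:
            case EEACountry.AUSTRIA:
                return "ATU"
            case EEACountry.BELGIUM:
                return "BE"
            case _:
                return country.value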


> create an enum of every EU/EEA country and a switch statement over that enum

Sorry, but if you then have to check that the list of countries is correct and complete, isn't it faster to get a list from somewhere reliable and then do your usual find-and-replace around the separator?

Edit: I just did it for a test: it took less than two minutes, without rush (search engine; open page; find data; open spreadsheet; paste table; paste column in text editor; find-replace around '\n').


I use Cursor primarily with Claude 3.5 Sonnet. Overall a solid productivity increase depending on the task.

I have a few observations:

- I vastly prefer Cursor's Copilot++ UX for autocomplete compared to GitHub's in VSCode, which I used until a few months ago.

- The Composer multi-file editor (cmd+i) is easily its most powerful feature and what I use most often, even when I'm working on single files. It just works better for some reason.

- It's far more effective working in popular stacks, e.g. TypeScript/Next.js, etc. It's rarely a time-saver when working in Elixir, for example.

- In a similar vein, the less 'conventional' your task or code is, the less useful it becomes.

- As the context increases, it gets noticeably less useful. I often find myself having to plan what context I want to feed it and resetting context often.

- It's very effective at 'translation' tasks, e.g. converting a schema from one format to another. It's much less effective at generating complex business logic.

- I only find it useful to generate code I confidently know how to write myself already. Otherwise, it doesn't save me time. The times I've been tempted, it's almost always bitten me.


Pretty similar observations. Using Aider with Claude on an iOS app, I’ve found it can be helpful to scaffold new modules, for example, if I give it some existing code and tell it to copy the conventions. But it’s virtually useless for editing or changing code where it will often produce code that doesn’t compile, has bugs and/or doesn’t solve the requirements.

Anything to do with Swift concurrency it’s completely hopeless, I assume partly because there’s not enough training data yet.


> But it’s virtually useless for editing or changing code where it will often produce code that doesn’t compile, has bugs and/or doesn’t solve the requirements.

That is the exact problem I am trying to solve: modifying code with LLMs really sucks most of the time. I am trying a solution with Abstract Syntax Trees: I have the LLM write the code that will write the code you need. That is, modify the source tree rather than the text representation.

I wrote about my approach here: https://codeplusequalsai.com/static/blog/prompting_llms_to_m...

I do have it working for some cases quite well, but there are lots of pitfalls with this approach too. It does take a lot of context, and the LLMs aren't really that well-versed at writing esprima code specifically, for example. BeautifulSoup does work better, I guess because more people use it and there's more data in the training set.
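To make the idea concrete, here is a hand-written illustration (not code from the linked post) of the kind of tree-modifying script the LLM is asked to produce, using BeautifulSoup since that's the better-supported case; the file name and the edit itself are made up:

    from bs4 import BeautifulSoup

    # Hypothetical task: "add a Cancel button next to every submit button",
    # expressed as an edit to the parse tree rather than a textual diff.
    soup = BeautifulSoup(open("index.html").read(), "html.parser")

    for submit in soup.find_all("button", attrs={"type": "submit"}):
        cancel = soup.new_tag("button", type="button")
        cancel["class"] = "cancel"
        cancel.string = "Cancel"
        submit.insert_after(cancel)

    open("index.html", "w").write(str(soup))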

I'm adding one language at a time, currently have HTML, CSS, Javascript and Python all kind-of working. It's pretty neat but I'm not sure how well it scales yet to larger projects and more difficult requirements/implementations.


Interesting! Bookmarked, I’ll read through it.


Pretty much the same, except I use GitHub Copilot. I have exactly the same remarks, but I also use it to generate test cases, and in addition I use Microsoft Copilot for:

- Writing docstrings. It helps to have a template.

- Rubber ducking. I try to explain my issue to it. It doesn't find the solution 90% of the time, but explaining the issue helps me (and sometimes you get a hint).


Hadn't even picked up on the Composer, was happy with Cmd+K/Cmd+L for weeks!


I have seen none across ChatGPT, Claude and Copilot.

They may be somewhat useful for some small function/lookup, but:

- they will always, without fail, hallucinate non-existent functions, methods, types and libraries. Your job becomes a constant code review, and I'm not here to babysit an eager junior

- Claude will always, without fail, rewrite and break your code, and any fixes you've done to Claude's code. Even if you tell it not to. Again, you end up babysitting an eager junior

- for any question outside of StackOverflow's top-3 languages and top-10 libraries they are next to useless, as they know nothing about anything

These are not "AIs" or code assistants or autocompletes. These are generic token prediction machines which don't care if you write code or Harry Potter fanfics.

If you train them specifically on vast amounts of code and documentation, then maybe, just maybe, they may actually be useful.


Code autocompletion. I think it works in around 60% of cases; sometimes it saves some typing. Performance increase: small. Additionally, I don't learn the API as well as if I were to type it myself, so I'm considering moving back.


I stopped using autocomplete and I don't really miss it. It was really impressive for simple repetitive stuff, but it's an active nuisance when I'm working on something more complex.


This was my experience. Code autocomplete disrupts and derails me much more than it helps me.


Same here. Purely for scaffolding it's fine. But I won't let it decide implementation details. And I've noticed the "copilot pause" slowing me down.

Maybe some day it will be good enough but that day is not now.


Yes, I use copilot, Claude, and sometimes Zed with the built in Claude AI tooling.

It's really useful for basic stuff, scripting, boilerplate, repetitive things, SQL. For example, it's great at converting types from different languages, or generating types from API reference docs.

You gotta be careful though as it is not perfect and if you do something "not typical" it will not work well. Also sometimes it is a bit outdated, produces code with different "style", or just plain wrong stuff, so you have to tweak it a bit.

It's not going to code for you yet (unless you do very basic stuff) but it's a great tool to increase productivity. I do believe I move faster.


People who use code completion LLMs: is your company running a local install, or are they fine with your code getting shared with LLMs in the cloud?


Our company is very much not OK with sharing IP with the cloud, to the extent that we only recently started using GitHub. So we're banned from using these things.


Then why are you using github at all?


Because Microsoft understands corporate needs, and can provide very specific guarantees to corporate clients.


Only if you trust Microsoft to guarantee that none of your data will ever make it out of their data centers.


Microsoft has been in the corporate game for longer than most people on this site have been alive. They understand what corporate clients want, and which guarantees they require. And they go to great lengths to provide what they guarantee.



I use GitHub Copilot at work. We were (and still are) not allowed to use our private accounts, but recently got company accounts we are allowed to use on codebases classified "confidential" or lower. We also have an internal chat interface to OpenAI's models with a similar restriction. I understand there's some extra agreements with Microsoft regarding our data.


They're fine with cloud. But... I suppose most of our most sensitive documents are in Microsoft's SharePoint anyway. Effectively sharing our code with them via Copilot is actually comparatively less problematic.

Personally, my main gripe with it are the response times. But I'm a latency junkie.


My company already hosts their code on Github, so using Copilot is not much of a concession.


My company is getting everyone Cursor licenses. We already have GitHub copilot.


I write more than I code, but I use ChatGPT a lot throughout the day.

I am integrating AI translations into my custom static site generator. I will test the outcome heavily before putting my name (and a big warning) on my translated content, but the early results look good. Getting it right is a lot harder than piping the page through ChatGPT. Everything needs to be translated from the UI strings to the URL slugs.
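A minimal sketch of what that kind of pipeline can look like (assuming the OpenAI Python client; the model name, helper names and naive slug handling are illustrative, not the actual generator):

    import re
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set

    def translate(text: str, target_lang: str) -> str:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system",
                 "content": f"Translate into {target_lang}. Return only the translation."},
                {"role": "user", "content": text},
            ],
        )
        return resp.choices[0].message.content.strip()

    def translated_slug(title: str, target_lang: str) -> str:
        # URL slugs need translating too, not just the visible UI strings.
        # Naive ASCII slugify; real code would handle accents/transliteration.
        t = translate(title, target_lang).lower()
        return re.sub(r"[^a-z0-9]+", "-", t).strip("-")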

My work will no longer just benefit English-speaking immigrants, but also locals and immigrants who Google things in other languages. I am very excited about it.

I also use ChatGPT heavily for “what’s the word/expression for” questions and other fuzzy language queries. As a non-native speaker, I want to know if a given expression is appropriate for the context, if it’s dated, if it’s too informal and if it’s dialect-appropriate.

I also use it for coding, but mostly because it’s faster than reading Python’s docs. I ask it questions I know the answer to, hoping to find better approaches. So much has happened since Python 2.7, and I don’t always know what to ask for.

On occasion I treat it like I treated my mom as a child. I ask it all sorts of questions about the world I observe around me. It’s amazing to get a short, clear answer to work from, instead of sifting through barely relevant, spammy search results. This is super helpful when getting to know new tech and figuring out what it actually is.

It’s just so damn cool to have a sort of real-life Hitchhiker’s Guide slash Pokédex in my pocket. These things appeared in the span of a year, and nobody seems impressed. Well, I am mad impressed.


> It’s amazing to get a short, clear answer to work from, instead of sifting through barely relevant, spammy search results.

I like this premise but don’t like the fact that the answer could be completely false - fabricated and presented with equal confidence as the algorithm has zero understanding, just stats.


That is also the case for actual searches on Google nowadays.


I've been using Zed with Claude 3.5 for a few weeks now, and I find it incredibly useful. Being able to be in full control of what goes into the context, my workflow is usually:

- add files, tabs, terminal output and IDE diagnostics into the context via slash commands

- feed in documentation or other remote web content, also via simple slash commands

- activate workflow mode, which will help you edit the files instead of having to copy things around

- then ask questions, ask for refactoring, ask for new features

Often I like to ask for the high level approach, and if I agree with it let it guide the implementation. It makes mistakes, so I always have to validate and test what it creates, but as you add more information into the context and the LLM has a good amount of stuff to work with, the output quality really improves significantly.

It's a process though, and it takes time to get familiar with the workflow, build intuition for when the LLM falls on its face, when you should try a different approach, etc.

Generally I can echo what others have said, it works best if you already kind of know what you want to achieve and just use the LLM as an assistant that does the grunt work for you, documents the process, provides a rubber duck etc.

Generally, I would not want to work without an integrated LLM anymore, it provides that much value to my workflow. No panacea, no silver bullet, but when used right in the right circumstances it can be incredibly useful.

A secondary use case for me is working on repositories where tasks and todos are structured in markdown files (even things like travel planning). Letting the LLM guide you through todos, create a documentation trail through the process, identify important todos, and carry along information as you go is wonderful; I would absolutely give that a try as well.


For common languages, e.g. Python or JavaScript, it works reasonably well, although it still seems to prefer older versions of libraries way too often. Which means you need to do refactoring right off the bat.

Whenever I use it with Rust, Golang, Scala etc it's not worth the effort.


I find LLMs via Aider great for:

* code translation - e.g. convert a self-contained implementation of a numerical algorithm from one language to another and generate test cases and property tests which make sure the implementations are equivalent (a sketch of such a property test follows after this list). The goal is to avoid having to proofread the generated code.

* one-off scripts - any task where code design doesn't matter, the amount of code is limited to a couple hundred lines (GPT-4o) and the result will be thrown away after use.

* API exploration - producing examples for APIs and languages I'm not fluent in. Reading reference documentation gives a better understanding, LLMs get the results out faster.
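Roughly the shape of the property-test part, as a minimal sketch assuming Python and Hypothesis; gcd_python/gcd_ported are made-up stand-ins for the original and the LLM-translated implementation:

    from hypothesis import given, strategies as st

    def gcd_python(a: int, b: int) -> int:
        while b:
            a, b = b, a % b
        return a

    def gcd_ported(a: int, b: int) -> int:
        # Stand-in for the translated version (e.g. called via FFI or a binding).
        return gcd_python(a, b)

    @given(st.integers(min_value=0, max_value=10**9),
           st.integers(min_value=0, max_value=10**9))
    def test_implementations_agree(a, b):
        # Property: both implementations return the same result for any input.
        assert gcd_python(a, b) == gcd_ported(a, b)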


Why aider?


Not OP, but Aider is pretty useful because it can write files in your repo, run commands, commit changes, etc. It's pretty flexible and easy to give it context by just providing a path.


I use Tabnine - https://www.tabnine.com/ It supports multiple LLMs and has its own internal one, and it can be secure for a company and their codebase, unlike the generic OpenAI, Claude, etc.

I do have OpenAI keys and Claude, and I use them both (just as the mood fits, to see which works best).

I've been coding for decades, so I'm quite experienced. I find that an LLM is no substitute for experience, but it definitely helps with progress. I work regularly in a range of languages: Java, JavaScript with TypeScript, pure-ESM JavaScript, Python, SQL. It's great to have a quick prototype tool.

One key takeaway - learning to "drive the LLM" is a skill by itself. I find that some people are "hesitant" to learn this, and they usually complain about how bad the LLM is at generating code.. but, in reality, they are bad at "driving" the LLM.

If I put you in an F1 car, the car would perform perfectly, but unless you had the skills to handle the car, you will not win any races.. might not get around the track one time..

Also, I'm in my 60's so, this is all "new" tech. I've just never been afraid of "new" tech. I'd hate for some 30-year-old hot-shot to show me up because they learned to master using that LLM tool and I just blew it off as "new tech".

Anyway, my $0.02


I've been using the cursor editor (vs code clone with LLM) for 6 months and it definitely improves productivity. Biggest gains are for writing/refactoring just a few lines at a time, where I can understand and test the new code easily.

Working with multiple products, Python and many different libraries and frameworks I have to constantly look up commands and usage. With the LLM I select a line or two, and ask 'sort by region, exclude pre 2023' and it writes (or adds) the necessary Pandas calls to the dataframe. The LLM has my code, tools documentation and more as context, so I don't have to say much to get the right code, but the questions are important, have to converse with a programmer mindset still.
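For illustration, roughly what that kind of prompt comes back as (a sketch; the region/year/sales column names are made up):

    import pandas as pd

    df = pd.DataFrame({
        "region": ["EU", "US", "EU", "APAC"],
        "year":   [2022, 2023, 2024, 2023],
        "sales":  [100, 250, 300, 175],
    })

    # "sort by region, exclude pre 2023"
    result = (
        df[df["year"] >= 2023]      # drop pre-2023 rows
          .sort_values("region")    # sort by region
          .reset_index(drop=True)
    )
    print(result)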

It has almost completely replaced using Google for helping with code. Often I waste too much time in Slashdot looking at a dozen possible similar situations, but not what I need. The LLM immediately gives me something right in my own code. Usually have to give it some followup commands to tweak the code or take a different approach.


The competition in AI editors is a bit silly at the moment. Everyone and their dog are "building" an AI assisted editor now by duct taping Ollama onto VS Code. I don't like my data being sent to untrusted parties, so I cannot evaluate most of these. On top of that, the things keep evolving as well, and editors that I dismissed a few months ago, are now all of a sudden turning into amazing productivity boosters, thanks to developments in both models as well as in editor tricks.

My money is on Cursor [1], which never ceases to amaze me, and seems to get a lot of traction. The integration is very clever, and it is scary how it figures out what I intend to do. Then again, I'm probably doing mundane tasks most of the time. For the few bright moments in my day I tend to use ChatGPT, because most of my real problems are in application domains, not in code.

I am not a firm believer in forking large open-source projects, though, as it will take a lot of effort to keep up as the codebases diverge. This makes me a bit wary of projects such as Cursor and Void [2]. Somebody needs deep pockets to sustainably surpass the popularity of VS Code. To point out just one problem with forking: VS Code works fine on Ubuntu, but Cursor does not work out of the box there. Having to disable the sandbox is a show-stopper for most.

In that respect, the extensions might be a safer bet, and I think Sourcegraph's Cody and Continue are making the largest waves there. Hard to tell with so many waves.

[1] https://www.cursor.com/

[2] https://voideditor.com/


> I don't like my data being sent to untrusted parties, so I cannot evaluate most of these.

> ...

> My money is on Cursor [1]

Cursor also sends all your data who knows where


Indeed. I am only using Cursor for some hobby projects.

While we're on this subject, is there a simple way to ensure that VS Code extensions do not contact external servers?


> is there a simple way to ensure that VS Code extensions do not contact external servers?

Unfortunately, no idea, but it's a good question I'd like to have an answer to


Until recently I was only using Copilot as an advanced autocomplete and to help writing tests, for which it's pretty useful.

A couple weeks ago I had to create some classes (TypeScript) implementing a quite big spec for a file format to transfer quizzes between applications. I decided to try some more advanced tools, ending up with Cursor and continue.dev. I copied the spec (which is public on the web) into a text file and used it as context, together with a skeleton start for the main class I needed, and experimented with different engines and different prompts.

The end result is that I got a very good starting point for my code, saving me many hours. Surprisingly, for this task, the best result was generated by Gemini 1.5 pro.

Since then I've been trying to integrate these tools more into my day-to-day programming, with varying results, but I'm now convinced that, even with the limits of current LLMs, this technology can have a much higher impact on programming with better harnessing, e.g. integrating it with compiler/code analysis output and automated testing.


Asking ChatGPT to give me boilerplate is a huge productivity win for me still. For example, I recently asked ChatGPT to give me a Terraform file that sets up API Gateway, Lambda, and DynamoDB. The script it gave me worked after only a couple of minor tweaks. Was up and running in about 15 minutes, whereas I'm pretty sure it'd be hours if I had to dredge through the docs myself.


100%. I usually use it for prototyping new features.

Two examples...

1. Recently wanted to build a Chrome plugin and never built one before. Used o1-preview to build it all for me.

2. Wanted to build a visualization of the world with color-coded maps using D3. Again, hadn't used D3 much in the past... Claude basically wrote all the code for me and then I just had to make edits to fit my site/template.


One use case I like a lot is copy-pasting the entire error from the console into the chat window. No explanation or preamble. Just raw output. Most of the time I get some useful leads out of it. It’s not always solving it entirely for me, but even a hint in the right direction can save you hours of digging in the wrong direction.


I wrote my own Notebook Assistant and I use it to quickly make data-driven arguments. It is able to both code and read program outputs, so it's very good at pulling remote sources, then peeking at the result, doing the data wrangling and then displaying the output. I think having the LLM able to read program state is a big upgrade for what you can achieve; it can do test-driven development, for example, because it can read the test report. However, being a side project limits the fluidity of the UX. It's open source and buildless if you want to modify it; it's very hackable by design. https://observablehq.com/@tomlarkworthy/robocoop


I have not. LLMs are generative and can only create generic stuff, and if your project contains lots of generic stuff, is it really interesting? I want to spend time on the parts I find interesting to think thoughts about with my human brain. Problems that have a quick answer in ChatGPT but a more interesting and personal answer after thinking about it for hours and trying 3 different versions of it before deciding on one. Then if I want to quickly test how to create an X11 window I might generate the code and work backwards from that. But… I’m not really in a hurry, and if you aren’t, you don’t really need an LLM.


I haven't, except on one or two occasions where I tried using it to help me figure out how to refactor a bunch of functions quickly.

The downside of LLMs is that they don't remove your need to know your code. And writing code yourself is a very good way to know it.


We have our own tooling; it's very good and definitely increases productivity for most. One observation is that people who are bad at reading code often seem to be negatively impacted versus writing it themselves. So from our teams I think that code reading will be more valuable than writing, if it wasn't already. Just having a loop of telling the LLM something, running it and then telling it what's wrong with the observed result is very slow versus reviewing the code and telling it what's wrong with it instead.

We also use cursor, copilot, claude dev, aider and plandex to see if anything is beating anything else and to get ideas from for our toolkit.


ChatGPT o1-preview is producing amazing results for code and is generally helpful with life. It's not well integrated into many IDE tools yet (I use Cody a bit, which has a wait list for o1-preview), but then I keep going back to just using the ChatGPT interface because it feels cleaner to manually select and upload the relevant file attachments anyway. I'm sure that will change as integrations improve. Claude seems really popular on here, but I wonder if that's just because Sonnet is free and useful enough. I prefer the results of the newer ChatGPT models currently and feel they are worth paying for.


I built a simple desktop tool [1] to streamline my coding workflow with LLMs, like integrating with local code context, managing prompts and custom instructions, etc. It supports copy/paste into ChatGPT/Claude, and sending prompts via API.

Currently I estimate it writes about 70% to 80% code (including adding features to the tool itself), and saved me hours of work per week.

Lately I've been exclusively using the API, since it is cheaper than paying the $20 monthly subscription.

[1] https://prompt.16x.engineer/


I am currently working on a side project. I am not very familiar with the domain, and I am using a language I am not very proficient at. Claude has been a great help, discussing different approaches to a problem, sometimes writing (and explaining) a function or two. Or explaining an error.

On top of that, I have configured a chatgpt code reviewer on my github repos. Most of the comments I get from it on PRs are useless but from time to time it does spot a problem or suggest an improvement.


Code completion. I’m about 50/50 on whether it’s actually helpful or I just spend more time reading the completions to make sure they are correct instead of just learning and typing it out myself.


Writing documentation, amongst other things, has been a real time-saver for me.

I use it to understand new codebases quickly, create the documentation boilerplate for the code I'm working on that needs better docs, or update/rewrite outdated ones.

When the codebase fits in the context window, it's simple. But even if I'm working on a larger thing, it takes a bit of RAG-alike effort to build knowledge topology, and then it's super easy to get docs on (the actual!) architecture, specific components, and all the way down to atomic function level.


God, I hope you carefully check those docs before publishing them


I enjoy using the Projects feature of Claude. You have a workspace to add knowledge (file/image/text) that you are able to update and maintain throughout multiple discussions.


How is Claude performing when the project grows and things get more complicated?

I had some issues with both GPT-4 and Sonnet getting slower and doing pointless refactors as the project grows.


It's easy, you just remove the old content and add the updated files. There is Project level Workspace and Chat level Workspace. Each have their own size limit. Once one chat has accomplished my task, I add whatever relevant Artifacts it produced to the Project Workspace and move on to a new chat.


I’ve tried, but found it to be a distraction. Instead of focusing on the design and code, now I have to focus on getting something out of a wiggling LLM algorithm which I need to corner into producing what I need - and then troubleshoot the output.

It’s 10x easier and more direct to just do it myself, distilling my mental model into precise instructions.

Maybe it works better for languages with more boilerplate - I use Ruby, which is quite terse.


I haven't liked any integrated tooling at all, but that's not such a surprise since I generally disable autocomplete and autocorrect since, like tinsel, I find them distracting.

I do really like to use the plain web browser tools, though (currently claude), for generating boilerplate code that I then review and integrate carefully into my code. It has sped up my workflow in that way.


Like many others I'm working on my own tool [1] that helps me make progress in my hobby projects. An example outcome of using this tool is my js13k game [2] (the idea and majority of code). The tool is being developed using the tool itself (inception ;)).

This experience of developing such tool is invaluable, I doubt I would be able to learn so much about LLMs and AI service providers if I was using some off-the-shelf software. It also allows me to adjust it to my own needs, while the "main stream" tools usually try to please "large enterprise customers", so as a regular user you end up living with annoying bugs/quirks, that will never be fixed (because no large customer asks for it).

In my opinion the future of such tools is multi-modal (both input and output). For my specific use case (hobby gamedev) the goal is to eventually become "the idea guy", who occasionally jumps into the code to perform some fun programming task, and is not able to create assets (GFX/SFX) themselves.

For example: when I'm using my tool, and I want to fix a problem like misaligned buttons, instead of typing an exhaustive prompt, I rather prefer to paste a screenshot, and formulate task with just few words ("align those buttons").

[1] https://github.com/gtanczyk/genaicode/

[2] https://js13kgames.com/2024/games/monster-steps


Lovely project. Happy to see it’s in TypeScript for once. Also very clean code base and good abstractions!


LLM code-aware chats: continue.dev, with Claude 3.5

Autocompletions: GitHub Copilot; still, it’s better than Mistral’s small code models in my opinion


Vercel’s V0 has been amazing. I lean backend, and the whole mess of styling is a continual struggle. V0 gets it right 90% of the time.

But like everything you have to spec it well

Then any top model basically for duplicating work.

“Here is component B, here is component A and test A. Produce a test for B following the same pattern”


LLMs can be revolutionary for approaching areas you do not know well ("how do you do this with this library, provide short working code").

But you need to use Retrieval Augmented Generation (or bare LLMs will either invent things that don't exist or present solutions to problems you didn't ask about).
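A minimal sketch of that retrieval step, assuming the OpenAI Python client and a naive cosine-similarity search over a handful of doc snippets (everything here is placeholder content, not any specific RAG product):

    import numpy as np
    from openai import OpenAI

    client = OpenAI()
    snippets = ["<library docs chunk 1>", "<library docs chunk 2>", "<changelog chunk>"]

    def embed(texts):
        resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([d.embedding for d in resp.data])

    doc_vecs = embed(snippets)

    def ask(question: str, k: int = 2) -> str:
        # Retrieve the k most similar snippets and put them in the prompt,
        # so the model answers from real docs instead of memory.
        q = embed([question])[0]
        scores = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
        context = "\n\n".join(snippets[i] for i in np.argsort(scores)[::-1][:k])
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "Answer using only the provided docs."},
                {"role": "user", "content": f"Docs:\n{context}\n\nQuestion: {question}"},
            ],
        )
        return resp.choices[0].message.content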


I must add a correction: even RAG-based LLM engines can produce wildly off-base responses. The possibilities are magnificent, but reliability is really not there yet.


I use Github Copilot at work (and at home).

I’m a huge fan of keeping things simple by being verbose, and Copilot takes all the drudgery away by filling out all the repeating patterns.

Once in a while when I can’t be bothered to figure out Typescript types, I ask ChatGPT to write one for me.


I only use it to get a sense of a project. Instead of looking for a similar project (and ending up in Medium hell), I now ask Claude for some first steps. I can continue from there. It's like helping a friend get his project going.


It's great for refactorings and text manipulation -- things I used to do with regex can be done using GPT, though that depends on the scope, so it is still not quite there as a regex replacement. Oh, did I say it's great at bash too?


That's hilarious. Well done.


I haven't. I tried them out for a good while, but didn't find the cost/benefit ratio of using them to be to my advantage.


It is fine for expanding CSVs, test data, static arrays and similar. Not comfortable using it as part of an editor or cloud services generally.


I use it as a personal assistant, and as a replacement of internet search. It has been working great for things like refactoring small parts of code, generating boilerplate or working out what an arcane error message can mean. I find it useful for comparing two technologies, and to have it give me a refresher about some topic.

On a personal assistant level, it's been useful to have it remind me of words that I have forgotten, or to have it rephrase a sentence in a different way, to suit some mood, etc.

Occasionally I have fun with it by having it answer in rhymes, or theatrically like a fortune teller, or like lyrics of some gangsta rap.

I don't know about performance increases. What I notice most is that I'm less annoyed with it, than with the general state of the internet. The web pages and I have different goals: I want a specific piece of information, and they want me to load all their ads, affiliates and such in return, and to spend time on the page reading their drivel. It is also horrible to search for version-specific information, for example, I appreciate that it brings up the Ruby answer for the latest version, but I'm stuck on 2.5.0, and so, I need that specifically. LLMs are usually great at this. Orders of magnitude less fluff, rich text-only answer.


Right now I'm working on my first large Clojure codebase, so I do a lot of "what is the idiomatic way to do X?" questioning, pasting in small functions and asking if there's a good simplifying refactor, etc. Before this I was writing a lot of numpy and doing a lot of math, so it was a lot of "here's what I want to calculate, how do I make numpy do that?".

I use Claude now that they've successfully made chatgpt too insufferable to use.

I haven't tried any of the special AI editors since I don't know if I would survive the traumatic injury of having emacs taken away from me.


Writes regexes for me.


I'd rather fucking die.


because?



