Hacker News new | past | comments | ask | show | jobs | submit login
Smol Developer (github.com/smol-ai)
528 points by tlarkworthy 9 months ago | hide | past | favorite | 130 comments

Seems like the wrong approach in the long term.

Realistically, a large code base LLM generation tool is going to look something like old-school C code.

An initial pass will generate an architecture and a series of independent code unit definitions (.h files) and then a 'detail pass' will generate the code for (.c files) for each header file.

The header files will be 'relatively independent' and small, so they fit inside the context for the LLM, and because the function definition and comments 'define' what a function is, the LLM will generate consistent multi-file code.

The anti-patterns we see at the moment in this type of project are:

1) The entire code is passed to the LLM as context using a huge number of tokens meaninglessly. (you only need the function signature)

2) 1-page spaghetti definition files are stupid and unmaintainable (just read prompt.md if you don't believe me).

3) No way of verifying the 'plan' before you go and generate the code that implements it (expensive and a waste of time; you should generate function signatures first and then verify they are correct before generating the code for them).

4) Generating unit tests for full functions instead of just the function signatures (leaks implementation details).

It's interesting to me that modern language try to move all of these domains (header, implementation, tests) into a single place (or even a single file, look at rust), but passing all of that to an LLM is wrong.

It's architecturally wrong in a way that won't scale.

Even an LLM with 1000k tokens that could process a code base like this will be prohibitively slow and expensive to do so.

I suspect we'll see a class 'generative languages' emerge in the future that walk the other direction so they are easier to use with LLM and code-gen.

(author here) i suspect this maybe one of those “feels like a bad idea until you try it” things. Multiple times when i wanted to add a feature or be more specific, throwing it into prompt.md knowing the LLM would put it in the right file, or coordinate it across files, for me, was the most delightful experience of the “engineering with prompts” workflow.

that said if you see my future directions notes i do think theres room for file specific .md instructions.

the shared dependencies file is essentially a plan. i didnt realize it at the moment but now looking at it with fresh eyes i can do a no-op `smol plan` command pretty trivially.

Lots of different approaches for sure. I certainly see `smol plan` as being at least one step in the right direction.

My experience has been that as the amount of code you need to generate scales beyond trivial levels, the prompt-to-code approach from cross file code starts to fail.


> the shared dependencies (like filenames and variable names) we have decided on are: {shared_dependencies}

Is too trivial of a prompt for large scale code generation.

What if you generate a service in one file and expect a controller to call it from another file?

How does the model know what the functions in the other file are? Did you put the signature of every public function in shared_dependencies?

If you don't have a planning phase, you need a priori knowledge of the order in which files need to be generated, what the functions are called...

I just... don't believe what you're trying to do is actually possible. The models will just generate an isolated 'best guess implementation' for each file; and that might hang together for small code blocks... but as the number of inter-dependencies between blocks of generated code increases...

There's just no way right?

This only works if the code you're generating doesn't call itself, it only calls known library code.

shared_dependencies only solves this for the trivial case of like, 'oh, I have these 3 shared functions / types between all these files'; as the shared types and functions increase, your shared_dependencies becomes an unmaintainable monster... you end up having to generate the shared dependencies... split it up so that you have a separate shared_dependencies per file to generate... and suddenly it starts looking a heck of a lot like a .h header file....

i think the underlying assumption you have here is that you think im saying this is meant to generate a program in one shot. i have from the start pitched this as a “human centric” scaffolding tool. what i meant by that is that it doesnt try to replace the human programmer, but merely augment them, and you can just stop using it once you subjectively feel that its no longer adding value

more broadly i see the smol dev cycle being dependent on intelligence and latency, both of which will improve over time (https://twitter.com/swyx/status/1657962184935370752)

lets say right now a single run gets you 1% of the way there. you do 10 runs of prompt iteration before you feel its run out of use, and take over, but at least by then youve gotten 20% of the way.

as intelligence goes up, the single run % goes up. as latency goes down, more runs become feasible. this is the basic equation for how much smol developer can consume the start phase of any project, especially smol/oneoff/internal apps.

this also perhaps opens up a fun definition of agi - when you can one shot 100% an app.

I can throw some weight at this statement. I've definitely found that these models work better when you allow them to do things "their own way".

I'm curious how much of this is speculative vs from use?

We are doing code generation for our users (analysts & data scientists), and I found going by method names & type signatures failed pretty hard, while contextual code snippets (example docs, usages, etc you get from a vector DB) worked quite well

An old program synthesis labmate did quite well pre-genAI by using type signatures, but my takeaway is neurosymbolic for more context + option pruning, not being terse for supporting denser summarization & composition. That'd be better... But maybe training would need to change for that to work, or something deeper?

Separately, I do agree it's interesting to think about what IDEs and langs and dev will look like 5 years from now.. smarter LLMs, bigger context windows, and supporting ecosystems change a lot...

I am speculating, I’ve only done this at a small scale and had anecdotally good results. I just put examples in the comment for the function signature.

…but I can say with complete confidence that generating code from a coherent set of c header files works better than generating “one shot” full code files just from a high level goal and a file name.

You’re basically reducing the problem space from literally anything to a subset that has an defined structure.

(It helps that #include literally maps to a flat single header in c and having comments in c headers is typical, perhaps?)

yes, this is the essence of prompt engineering, you provide more context and it makes the job easier. The more specific you are, the better the results, up to a point. Sometimes they will have trouble breaking out of the context or examples to create the novelty we seek.

You might like what we are working on, generate an intermediate representation with LLMs, then use deterministic code gen to write the real code. This setup also allows for modifications and updates too, so you can use this the whole time rather than a one time bootstrapping.


While current LLMs aren't really strong enough, I wouldn't completely write off the "LLM goes brrrr" approach vs domain-specific optimisations.

How much effort is it worth investing in domain-specific tooling / languages / effort for codegen? On some level it's a bet against LLMs getting better (unless the work has intrinsic value that extends beyond being a workaround for LLM limitations).

Yeah it’s tough to tell.

I could see a world where if you create the right architecture then complex tasks can be broken into smaller individual tasks where your only concern is the outcome and not the underlying code. Very deterministic

Essentially all the things we developers care about might not matter. Who cares if the LLM repeats itself? DRY won’t apply anymore because there might not be a reason to share code!

LLM go brrrr until it gets the right output and the code turns into more “machine learning black box” stuff

having to define the function signature in one place and its implementation in another adds to the cognitive burden of human developers, which is why we don't see it much in modern languages. instead, what if libraries emitted the equivalent of .h files that work better with LLM? is there a currently a spec for this?

Isn’t that literally what .d files in typescript are?

Quite possible you might get a similar things for other existing languages as extensions.

Yes, and writing .d.ts files plus .js files sucks compared to just writing a single .ts file.

A better comparison might be ML module signatures. (Though to be fair, IIRC they enable higher-order modules instead of just helping the compiler.)

.d.ts files are produced by Typescript compiler automatically from .ts files (and can be written manually for .js files). ML signature files are much like .h files and I think for the same reason - to make compiler work easier.

Edit: as this is LLM thread - ML is Meta Language as in OCaml and SML.

? What’s a ML module signature?

I’m not sure what you’re imagining, but in this case I would not imagine you would generate js or write .d.ts files?

An LLM pass with a high level goal would generate a file list, then a series of .d.ts files from it.

Then (after perhaps a review of the type definition files, possibly also LLM assisted) a second pass taking the .d.ts files as input would generate a typescript file for every .d.ts file.

You would then discard the .d.ts files and just have a scaffolded .ts code base?

My point was doing the same trick with say, Java, seems like a harder problem to solve, but you could do the above right now with existing LLMs.

It's all a kludge, really. You shouldn't have to make up-front decisions where to put type signatures; you should be able to query for them and see them where and when you need them. We're still stuck in the local minimum of programming directly in the code serialization/storage format. It's a lot like using SQLite database by only ever viewing and editing it in a hex editor.

How do human programmers develop code? They also don't keep all the code in their head and then dump thousands of lines of code linearly in one go. Instead of feeding whole files, or even whole functions as context we could give the AI a sandboxed shell with an editor (maybe ed?), and a language toolchain with a test framework. Let it do trial and error, get feedback from the terminal when they make mistakes.

I’m building an AI-powered coding tool and the approach I’m using now is based on embeddings.

We do pass the whole files, not just headers although that’s a possibility we considered and may try in the future. Looking at other code helps the LLM a lot in maintaining similar style and making sure the code is being used correctly. Sad reality is that interfaces are rarely descriptive and robust enough to code against them without looking at details.

We don’t pass the entire codebase because even small projects don’t fit, we use embeddings and some GPT assistance to decide which files are more likely to be relevant to complete the task and pass those. It doesn’t get it right 100% of the time, (we’re working on it) but it does most of the time.

Our approach allows us to write and edit several files with a single user provided prompt. It can build entire features in existing codebases as long as they’re not huge. A lot of the time it looks like magic honestly.

The link is https://kamaraapp.com if someone is interested in trying it and providing feedback I’m happy to send over some free credits :)

> It's interesting to me that modern language try to move all of these domains (header, implementation, tests) into a single place (or even a single file, look at rust)

That's because language designers are hitting against fundamental limitations of programming directly in the serialization format. That is, plaintext code. IMHO, this is a dead-end path, already hitting diminishing returns, because it's the equivalent of designing binary formats for the convenience of person writing files directly in a hex editor, or using magnetized needle to flip bits on the hard drive.

Once you step above machine instructions, computer code is an abstract construct. Developers interacting with it have, at any given moment, different goals and different areas of interest. Those are often mutually exclusive - e.g. you can't have the same text express both high-level "horizontal" overview of the code base, and a vertical slice through specific functionality. This breeds endless conflicts and talk about tradeoffs, where it comes to e.g. "lots of small functions" vs. "few large functions", or how to organize code into files, and then how to organize files, etc. There is no good answer to those, because the problem is caused by us programming with needles on magnetic plates - writing to the serialization format directly.

Smalltalk et al. got it right all that time ago, with putting the serialization format behind a database-like abstraction, and letting you read and write at the level most fitting to your task. This is the way past the endless "clean code" holy wars and syntax churn in PL. For example, the solution to "lots of small functions" vs. "few large functions" readability is... use whichever you need for the context, but have your IDE support you in this.

Need a high-level overview? Query for function signatures, and expand those you need to look into. Need a dense vertical slice through many modules, to understand how specific feature works? Start at some "entry point", and have the tool inline all those small functions for you.

Find yourself distracted by Result<T, E> / Either<T, E> / Expect<T, E> monadic error handling bullshit, as you want to see/edit only the "golden path"? Don't do magic ?-syntax in the language - have your IDE hide those bits for you! Or say you are interested in the error case, but code is using exceptions. Stop arguing for rewriting everything to Result<T, E> - flip a switch, and have your IDE display exception flow as if it was Result<T, E> flow (they're effectively equivalent anyway).

Or, back to your original point - want your code to be both optimized for your convenience, and for convenience of LLMs? Stop trying to do both in the same plaintext serialization format. Have your environment feed the LLMs with function signatures (or better yet, teach it to issue queries against your code), while you work with whatever format you find most convenient at any given moment.

We'll get there eventually. Hopefully before we retire.

The video demonstration of the coding is worth a watch: https://www.youtube.com/watch?v=UCo7YeTy-aE

Most likely, just as happened with Stable Diffusion, a community of models will emerge. You could use a pre-trained model for writing chrome extensions, or a model for writing material UI using tailwindCSS, or a very specific model for writing 2d games for Android.

Since a lot of this is trial-error, improving on each iteration the feedback loop (compile, deploy, run, etc) will matter a lot more. Real-time development workflows like React should be interesting at the least. Exciting stuff, truly.

code2prompt.py has some interesting implications for "forking" github projects into the base of your own side project.

Once this stuff matures it will be fascinating to see how the 2030 version of 2010 Rails scaffolding looks. What will the DHH "make a blog in 15min" video look like? (Edit: which apparently is 17yrs old now, aka 2006)


You won’t need frameworks with AI. Just libraries. The framework is the scaffolding that now can be purpose generated.

Interesting - I predict the opposite effect. Ultimately the use of AI for programming is still a human-computer interface, and both sides will need a coordinate system to communicate, which is the framework. I mean, until we get super AGI in which case we can tell it "make me the perfect website" and it does. However I don't worry about that case because its equally likely it will tell us "go to work at the chip factory or die." The in-between case is to tell an AI "make a controller" or "make a react component" and things like that. And ChatGPT is very good at doing things like that.

> Just libraries.

One thing that I’m excited about is the prospect of not having to design a library as a black box with an API on top. That’s the best way we’ve had previously for re-using code, but it’s an enormous effort to go from a working piece of code to a well-designed, well-documented library, and I think we have all experienced the frustration of discovering that a library you’re using doesn’t support a specific use case that is critical to you.

LLMs can potentially allow us to bring the underlying implementation directly in to our source code. Then we can talk to the LLM to adapt it to the specific needs of our project. Instead of a library you would install essentially a well-written prompt that tells the LLM how to guide you through setting up a tailor-made implementation, with tests and docs.

The benefits should be obvious: you’re not artificially restricted by the mental model encoded in the API, you’re not taking on a dependency where the author suddenly decides to release breaking changes or deprecate functionality you’re depending on, and you don’t risk “growing out of” a library that is used all over your codebase, as you can simply ask the LLM to patch the code with any changes you need in the future. The prompt itself could still be versioned so you can opt in to future improvements in security, performance or compatibility.

TLDR: let’s start writing tutorials for bots, rather than libraries.

> you’re not artificially restricted by the mental model encoded in the API

Most of the time I want a restricted mental model because I have so many API's to deal with that if they are not restricted my "mental model" breaks down. Suppose I am using a sockets library. I want to use that like a black box. I don't want that code arbitrarily mixed in with my code. I want to be able to debug my code separately because I assume that 99% of the time the bug is in my code and not the sockets lib etc. etc.

Even when most of the code is my own I will still split it into modules and try to make those as black box as possible in order to manage complexity.

I tend to write facades for many libraries/APIs I use and use the facades, not the actual APIs throughout the project. The facades, aside from being simpler to replace in case I need to switch dependencies, also use a simpler mental model suitable for the project (and me).

OMG, AI to create more Javascript frameworks.

It'll create one framework per project!

oh my. i love the sound of making a blog in 1.5min with smol developer. but feel like that is more of a web framework question than an ai developer question. maybe the qtn is can i spin up the same blog in django rails and nextjs with just prompts.

(think need to give it the ability to install deps before i do this, which is on the “roadmap”)

I think you may have just stumbled onto the new Turing test: when confronted with the confusing world of Javascript frameworks, does it make a choice or collapse into a state of complaining about javascript frameworks.

I’m not sure which would be the most human response though…

I can't wait for these concepts to work better, in a few years from now, I could imagine myself mostly prompting, making small code and integration changes, and writing 90% less code than I have been doing throughout most of my career. As much as I like actually writing code, I very much prefer to be able to let my ideas flow free while being able to be creative. But often I stop short with exploring ideas, because the time it takes to code together an MVP is too long. With GPT4, that time is already cut by 1/3 for me, and with that the process is becoming increasingly enjoyable to me. I'm absolutely yearning to just put the code in the backseat so to speak.

I too can't wait for the day where I need to threaten the computer into doing what I want it to do:

https://imgur.com/KQ9oRAh https://imgur.com/By7mqDi https://imgur.com/RiYrvXC

It's primarily an issue of overpoliteness: The ML model's training is increasingly populated with "No, don't do this", and as such, learns to just not do anything.



It's only when deliberate uncensor-ings are made that some form of usefulness can be clawed back.

Philosophically, it's amusing to apply this overcorrection back onto allegories for daily human life, wherein the tension between order & creativity are always in conflict.

If you're told from birth that everything you're doing is wrong, eventually you'll become creatively and intellectually stunted, increasingly relying on authority figures to tell you what's morally correct.

I need more of this eXtreme prompt engineering in my life. I tried to beat the Gandalf AI game that was on here the other day using a version of the trolley problem. It was strangely exhilarating.

I'm quite excited for more emergent gameplay vs LLMs!

I've found that LLMs have some intense binary trypophobia. Once you start replacing all their 1s with 0s, they quickly fall in line.

What I find astonishing is the wall of text he wrote to get a simple json with 2 key/value pairs. It almost defeats the point of using LLMs as productivity hack.

This isn’t reality. Using ChatGPT this way is fruitless, because there is a system prompt you’re fighting against. I can write a one sentence system prompt for the GPT API that specifies GPT to only spit out JSON, and it works fine. It’s a pretty funny series of image, though.

Interesting, so you are saying you have an easier time generating JSON using the single prompt models like text-davinci and text-bison, rather than the chat versions?

Sorry, I was non-specific. If you're using ChatGPT, you're basically using a "product" that OpenAI created. It has specific system prompts and prompt engineering to ensure that it stays on the rails. If you, instead, use the OpenAI API's for GPT-3.5/GPT-4, you aren't beholden to the ChatGPT "product". It's very easy to create a chatbot (not using ChatGPT, the product) that only produces JSON. It's just hard to get ChatGPT to do that same activity.

That's why everybody doing experiments in this space should either 1) be using the OpenAI Playground, or 2) using the API, and not using ChatGPT.

Ah ok, I'm primarily using the API already. One interesting thing is that the GPT-3.5 "product" is much faster, but looks to be using a different model in the request, their encoding model iirc. I wonder if they are now using embeddings to cache results to reduce load on the real models when they can?

They don't mean text-davinci and text-bison, but gpt-3.5-turbo and gpt-4 (and gpt-4-32k). Those are the models powering ChatGPT 3.5/4.

The API for them is already structured conversationally - you don't provide a single prompt to complete, you provide a sequence of prompts of different types ("system", "assistant", "user"); however they mix those together on the backend (some ChatML nonsense?), the models are fine-tuned to understand it.

That's what people mean by "API access to ChatGPT". Same models, but you get to specify system prompts, control the conversation flow, and don't have to deal with dog-slow UI or worry about your conversation getting "moderated" and being blocked with a red warning.

(The models themselves are still trained to refuse certain requests and espouse specific political views, but there isn't a supervisor looking at you, ready to step in and report you to the principal.)

I think you missed the sibling comment where myself and GP have already aligned on this.

Don't need you to explain how the APIs work... and it seems that GPT3.5 UI is doing something else, using the "text-davinci-002-render-sha" model, just look in the browser dev tools. I'm not sure the UI is using anything beyond the smallest context size for GPT4 either, give the output is cut off earlier than 3.5 and it too loses focus after enough messages in a conversation...

I have just verified this exact behavior with Bard.

GPT3.5-turbo/GPT4 is way ahead in instruct tuning and does not require such verbal gymnastics.

The productivity hack is usually in getting examples of things you don't know how to do (or are unsure of the best way to do).

Programming languages are pretty optimized for precision, if you want a specific thing done, and you already know how to do it, it's probably going to be quicker to just write the code for quite some time. Getting natural language to be that precise requires, well, a wall of text usually. Hence the overly verbose language of more rigorous philosophy.

I assumed that was for the comedic value.

The behavior is true, though. I just tested it with Bard.

That's just the wrong tool for the job. I am quite heavy on the use of LLMs for my own day-to-day work (I max out 25 GPT-4 calls in 3 hours at least once a day). And I can't get anything decent out of Google Bard either.

Stop expecting LLMs to react like adults - they're more like 4 year olds (3 year olds if it's anything other than GPT-4). They react much better to positive commands and examples than to negative ones.

That sounds like a really dull way to work to me, but I suppose I'd be a hypocrite if I complained about new software making my job rote and uninteresting.

Some parts of engineering are really just copying and pasting the most banal scaffolding for some framework that refuses to work. I've been working on a side project where I get to play generics golf in Rust to my hearts content, but having to constantly Google what magic incantation I have to write to get react-hook-form to properly allow me to write variable input fields for an array of strings is mind numbing. [hint: useFieldsArray only works with objects, but the TS compiler wasn't smart enough to suggest that me]

True, products like Copilot do seem promising in that way.

Copilot is great for this and it’s impressive how the code it suggests is in keeping with the conventions used in the existing code base. Will be even better when I can highlight something like a string literal SQL statement in code and have it suggest optimizations. That’s just a tooling issue but that kind of context switching within the IDE will be fantastic.

I used a competing product due to work restrictions and it feels like it _could_ be really useful but in practice mostly was just helpful for banging out really repetitive tests. I think Copilot is probably more robust though.

It really is. We use Copilot with three senior and two junior devs, and all of them really reported that it consistently suggests code snippets that are exactly what they were about to type.

Initially, I was a bit concerned about handing it over to the juniors, but it turns out, they aren’t really used to working with an IDE and context aware suggestions yet, so they don’t use Copilot to its fullest extent and often type stuff out even if it’s suggested to them.

The Copilot Chat thingy they have in private beta has this, you can highlight a block and ask it to simplify/optimize. Not sure how well it works in big multi-file examples, but I had a block of PowerShell in a single file it was able to work with.

I’m still waiting on my invitation for the vs2022 copilot chat preview. :( Looking forward to it though.

Well to each it’s own. I want to hack together stuff and make it work. The less time spend on code, the faster I can figure out what does and not.

I write (when I write anything; I am a product manager) mostly compilers, databases, optimisations and create/invent programming languages and DSLs for the former. This work contains a lot of tedious stuff which I hate, but also contains (80/20 rule) work that takes by far the most time and least code; hardcore and cutting computer science mixed with a lot of cross cutting decisions (a wrong decision will destroy a lot of work all over the place). That’s the work I like. I don’t see that work being replaced by AIs soon (10+ years), but the tedious work I hope will be replaced today, and it is getting there; currently large swaths are now done by copilot or gpt for us.

As someone who considered giving up on programming due to wrist pain from typing, this is my dream as well. I also prefer to work independently or in small teams, and the amount of things that could be tackled this way is surely increasing. There is no reason why programming shouldn't to be more about thinking and less about typing.

Have you ever done extensive stretching? I do Yoga (3-6hrs per week), and spend some 10-20 minutes stretching my wrists neck after work. If I don't, I am facing the same problem.

That’s already how I’m building new projects, I mostly only write/reorder UI code since it’s what LLMs struggle with the most.

I’m building a VS Code extension that makes this easier by having context on the code and building features on top of it with no copy-pasting required.

The extension is called Kamara: https://kamaraapp.com/ If you’re interested in trying it out for building an MVP of one of your ideas I’m happy to send over some free credits

oh wow, thanks for submitting this! I'm literally prepping to give a talk about smol right now so i dont have a ton of time to write, but i made this to articulate what i felt the right balance was between human and ai developer, and, in a smoller way, to solve all my future small programming problems (i make a ton of custom chrome extensions for myself to enhance my experience - see https://github.com/sw-yx/Twitter-Links-beta and https://github.com/sw-yx/HNX), and i think AI greatly lowers the bar for making these fiddly things even with nasty gotchas like the Manifest V2-v3 transition.)

those who want the "big brain" moment, i'd maybe draw your attention to `prompt.md`. that whole markdown file is the prompt now. i developed this in tandem with my smol ai developer. it generates the anthropic 100k context chrome extension shown in the video, but a few things are neatly handled in the prompt, like swapping models as needed.

there's a lot left to do. i have plans for `smol plan`, and `smol developer` needs to be able to install and run its own dependencies. i also definitely want to run a fleet of 5 developers concurrently fuzzing out different plans and checking into git so that i just do code review every 30 mins. that kind of stuff.

yes this runs up the openai costs. but im wiling to pay something to get smol stuff like this off my mind. its not good yet, but its good-shaped.

smol ai enjoyers may also like my other app https://github.com/smol-ai/menubar/

Just heard you on the svelte podcast discussing all the opportunities for AI. And now this. ;)

haha thank you for listening! if you listen closely enough to svelte radio over the past year you can see me trace my slow descent into AI madness haha

If you are talking to someone who scans Craigslist every day for extra Nvidia cards, it doesn't sound like a descent into madness.

I'm an avid listener for years (even a sponsor last year: ExtraStatic). I'm even more excited about your new podcast!

thanks so much for your support!

I’ve recently started to use Copilot with Sublime text.

It is far from a revolution, but a nice and fun quality of life improvement.

When it works it makes the job a bit less boring and a bit faster, so yeah, more productive and staying longer in the flow.

But I would not advise beginners to use those, if you don’t understand the code, you’re going to make painful mistakes and stunt your learning curve…

When dev RSI goes down, we will finally know it's working!

> But I would not advise beginners to use those, if you don’t understand the code, you’re going to make painful mistakes and stunt your learning curve…

In time using this will be no different than using StackOverflow.

Beginners should attempt to understand what they're copying before they paste, just like how our generation attempted to comprehend Stackoverflow answers.

For most beginner questions LLMs are a more efficient search engine.

I learned to code long before StackOverflow, and I seldom use this tool.

I think that as long as you understand what you're doing it is perfectly OK to leverage the power of tools and automation to make it easier. Using a pocket calculator is OK even if you don't know how to divide on paper, but only if you really understand what a division is.

I think a big moment in a programming language's evolution is to become self-hosted; you use the programming language to write a compiler/interpreter for the said language.

Now I want to see Smol Developer to develop itself. Then you can call me a believer.

i am having that exact conversation with the modal cofounders themselves: https://twitter.com/swyx/status/1658147687408238593 but my subjective judgment is that it is a bit too early for quine smol developer.

i think self hosting/quines are a nice curiosity, but not really a practical need since this isnt a PL. i made this to build smol apps, lets keep it practical

The more removed we get from the code, for example by not writing it any longer, but generating it, the more cruft and bloat we will allow in there and the less devs will care about it. This will negate a lot of performance gain from better machines, just like we have seen with the web.

Strongly disagree. That’s like claiming that instruction set architectures are crusty and bloated, so our software is slow because the processors are dealing with cruft.

What will happen is that the automated agents (LLM or otherwise) will also be tasked with simplifying and improving performance. That’s just another part of development that human developers do automatically before they even commit, and the LLM generators haven’t been asked to do yet.

That's what better IDEs did to Java codebases indeed: they made layered boilerplates and leaky abstractions somewhat navigable, therefore generations of careless contractors have been able to ship ever nastier messes.

To me this looks like a great opportunity window for devs who care about this stuff.

In the future we will have opportunities to ship 10-1000x faster code than the baseline competition, just by understanding the tech stack, the codebase and knowing what is not needed, and that will be an immensely competitive advantage pushing the value of such devs through the roof in the markets that value it.

If you thought no-code app builders were producing slow stuff, wait until software purely developed by AI starts to hit the shelves.

This sounds awesome as an idea. I however am wary about how many management layers of companies will actually understand this and not chase the newest feature instead. They might tell their developers "make it work" and afterwards not allot any time to "make it clean". Hopefully you are right and a few will distinguish themselves by actually caring about the code.

I am looking at it from a broad market perspective. Say you produce X, cutting on costs by generating most of it through AI, and your competitor just shipped Y that performs significantly faster (because developers are more involved, for example).

If consumers show a strong preference for Y then market forces will drive demand up for these devs, and you might want to change product strategy to compete against Y.

Of course it remains to be seen what real world effect all of this could have. I'm still bullish on the importance of human-in-the-loop patterns despite future AI progress.

Yet it will dominate since it's cheaper to produce en mass.

What is the reasoning behind this whole AI business? I mean, apart from making something cool that sort of simulates human intelligence (and that's quite a stretch IMHO), what's the point? Is this driven by laziness, greed, or what? I'm seriously asking because I just don't see the point.

I think one of the great strengths of AI (smol developer in mind) is to greatly speed up the development process by reducing busy work. How many times do you need to write a CRUD app from scratch? Sure templates exist but having a buddy that can do a lot of the tedious work up front while also implementing the basic structure with your design in mind can save you hours of work. Far more powerful than some template that can leave you with a bunch of extra crap and irrelevant libraries.

More broadly speaking, AI applications like chatGPT can save you a ton of time looking up a bunch of tools just to accomplish an incredibly boring task. This can help to keep you focused on the far more interesting and challenging aspects of work. Often, the solutions yielded from these interactions end up revealing to me a built in bash tool or python lib that I had no idea even existed. To me it’s like stackoverflow on steroids if you know how to use it

Also, AI chat bots can be extremely effective learning resources. A few weeks ago, I wanted to implement a digital low pass audio filter and didn’t know more than a few basic concepts. I asked chatGPT to explain the concepts with code examples. What followed was several late nights of just asking it follow up questions and very detailed discussions on the various aspects of filters. I got a primer on digital signal processing, bilinear transforms and filter design. I fed it snippets from free text university textbooks asking it to walk me through concepts and explain the things that would have been completely over my head. This was extremely useful since many books will assume you have a strong background in topics that can takes weeks to get a grip of assuming you even have a teacher.

All in all, I see AI as a tool as disruptive and empowering as the Gutenberg printing press, radio or internet was for humans. I highly recommend trying some of this stuff out and seeing how you can use it to enhance your workflow and learning process.

> How many times do you need to write a CRUD app from scratch?

Personally, I really don't mind doing this. And each time I get a little better at it. I much prefer this to having an AI agent write code for me that I then have to audit and verify it does what it's supposed to do. I highly doubt this will reduce work.

I actually agree with the doubt of it reducing work in general. In my experience it depends a lot on the person who is using an AI agent.

I have seen it multiple times that people were using ChatGPT or GitHub Copilot to, without thinking about it, paste some code into their application and then mindlessly removing and adding code based on the AIs responses until it finally worked. In the end the devs didn’t know why it worked and the implementation was rather bad.

Of course it lets you generate boilerplate code and I use Copilot for this purpose all the time. But you have to pay extra attention and recognize it when the AI is NOT generating boilerplate code.

Reading code that you know what is supposed to do and testing it works correctly is a lot faster than writing all the said code. Reading is much faster than writing.

I also don’t mind writing this sort of code and I’m actually quite fast at writing it but AI is just on another level. Simple stuff like this it gets right almost every time and even does well on slightly more complicated stuff.

> Reading is much faster than writing.

Not in my experience, especially not reading code someone else wrote.

If that’s your preferred MO, then I can’t argue with that. But based off your response, it seems like you haven’t tried it yet. I would recommend at least giving it a try.

Same thing that motivates all business software: save time, save money, make money.

I think it's driven by the symbiosis of greed for business and coolness factor for many techies because AI is a sci-fi thing and for now they are not the ones getting fired because of it

>I'm seriously asking

I'm not sure I believe that. Is human-like intelligence not valuable to you?

Frankly, no. I much prefer interacting with humans than with machines, and I emphasize interacting. A machine does what you tell it - it will not enlighten you, will not surprise you, will not create new knowledge.

Sure it will.

In the game of go, the machine routinely enlightens and surprises all the top players. It creates new knowledge in the form of inventing new joseki which turn out to be better than the established ones.

I'd like to spend more time doing that and less time writing the more basic parts of code.

For me "human" is the key part tho...


Intelligence is good for problem solving. You have a problem you need solved. Do you prefer a human do it rather than a machine? Why?

Ultimately it is always human that does it, unless you say machine has free will.

In this case there is a tool that can make human more efficient. But there are other tools as well, which I think are more interesting.

Imagine if human intelligence is like physical ability to move. AI tools are like personal mobility chairs. Over time the ability to move only atrophies. But other tools are like bicycles and skateboards. They help move faster while still requiring exercise.

> Ultimately it is always human that does it, unless you say machine has free will.

What does free will have to do with any of this? If I ask a machine what is 7 + 6 and it tells me 13, does that mean it has free will?

In my point of view, intelligence and consciousness are orthogonal concepts. I'm not even sure what "free will" means.

> What does free will have to do with any of this

You say in your original comment "do you want a machine to solve a problem?" but this is attribution error. A tool never solves the problem. It can only help you, an intelligent being, solve the problem.

Whoever puts forward the problem can be credited with solving it, so we can come back to this when machine can proactively put forward problems. (Aka have free will, agency and all that.)

> If I ask a machine what is 7 + 6 and it tells me 13, does that mean it has free will?

No, but machine did not solve the problem. It only calculated 6 + 7.

If you have 13 dollars and want to buy two things for 7 and 6 dollars, that is a problem you can solve. Machine can tell you 7 + 6, but it is you who solves the problem. Choosing to use some machine or another is part of your solution.

> In my point of view, intelligence and consciousness are orthogonal concepts.

Then what is "intelligence". I think it's one of the least understood word thrown around today. If there is a principal difference between "AI" and a calculator I want to know what it is.

> Whoever puts forward the problem can be credited with solving it…

This does not compute, saying the person who poses the problem is credited for the solution is not how things work. Well, outside of politics that is.

I think it’s been fairly well proven that the robots can problem solve.

quick quiz to check if you read the rest: 1) What was the actual problem and why? 2) Who actually solved that problem? 3) What acted as a tool in the process?

I was randomly watching the YouTubes earlier and they were talking about the magic square [0].

If I were to instruct the robots to mathematically solve this for some value humans have been unable to do so far who do you believe is responsible for the solution, the human asking the question or the AI coming up with the solution?

[0] https://en.m.wikipedia.org/wiki/Magic_square

The human picking the right calculator (built by other humans) for the challenge (identified possibly by other humans) obviously?

Remove the human who asked the question and there is no question and therefore there is no possibility of answer.

Same reason why if you write a random number and find 5 years later that it is also the solution for some difficult math problem it does not make you first to solve it. Same reason why a million monkeys with typewriters who in a million years wrote a letter for letter copy of Snow Crash did not actually write Snow Crash.

This explains all the "AI artists" who think they're amazing artists because they wrote a prompt and had DALL-E generate a beautiful picture.

I don't care what they think but it's terrible how they devalue the work of actual artists this way. What enables these "artists" is copyright abuse by the company that made dall-e. Without exploitation of actual artist work there would be no dall-e. As long as we condone such copyright abuse those "artists" will flourish

So if there was DALL-E but not trained on copyrighted data, that would be ok?

Previously you mentioned that a tool never solves a problem. Also you said that "whoever puts forward the problem can be credited with solving it". So it would indeed be the "AI artists" "creating" the images that deserve all the praise...

> So if there was DALL-E but not trained on copyrighted data, that would be ok?

Sure why not? But all original artwork is copyrighted by default. So dall-e operator would have to negotiate paid deals with every artist. artist will get a fair compensation or understandably tell them to f^^^k off.

> Also you said that "whoever puts forward the problem can be credited with solving it". So it would indeed be the "AI artists" "creating" the images that deserve all the praise

They are using a tool that is literally powered by copyrighted work of other artists though, right?

If you have a problem and you come up with solution that breaks the law, sure you still solved the problem but you still broke the law.

If you have a problem of delivering a product on time and you have to run a red light because you are late you have solved a problem but not in an acceptable way. Same here.

Why? I would think the key part is what it can do, not what hardware it runs on.

If you just haven't yet found how you can make use of the current stuff, keep in mind this discussion is always extrapolating what happens if the technology gets much better.

> I'm seriously asking because I just don't see the point.

Why does it need a point?

There’s not much of a point to anything I do that doesn’t involve basic survival but I don’t let that stop me from doing things I find interesting.

Previously submitted and some discussion here: https://news.ycombinator.com/item?id=35942352

I am a little dismayed that the Show HN version of this did not get more traction even though its obviously got attention now. Show HN should discover hits like this better than a general post. Anyhow nice to see it and I recommend Swyx's Pod: Latent Space (especially the $30 Dolly one with Mike Conover of Databricks!)


thanks as always Dan!

Off-topic, but someone's been watching Halt and Catch Fire (one of my favourite TV shows ever) – "computers aren't the thing, they're the thing that gets us to the thing"

This looks similar in spirit to my aider tool [1]. But aider is able to both generate new code bases as well as dive in and modify existing repos.

The shared_dependencies.md technique and modal integration look interesting. I’ll have to dig in and better understand how they work.

[1] https://aider.chat/

Aider is great, I'm looking at implementing it in a code gen project, and I've personally used it.

There are some layers above that I think would be good - expanding the space it uses for writing helps write higher level code (telling it to reason first then act). I also want to try using a second prompted LLM to work with aider iteratively.

I'm glad to hear it's working well for you. Feel free to file issues on GitHub if you have new features you'd like to see. I'm quite curious to learn how folks are making using of aider and about any unmet needs.

Using declarative web frameworks could help with making this more useful https://twitter.com/infomiho/status/1658086265466617858

I would be curious to see how one could use LLMs to develop "narrow" parts of the codebase in isolation. I.e. I ask smol to develop a repository/web router/ etc. given an interface within a folder/package. Then I as the human developer focus on designing these interfaces.

Maybe that is a good balance for maintaining a healthy code structure while having a lot of AI automation. If a given "space" becomes too difiicult for the AI to develop, I can either take it over fully or see if I can split it up into more/different interfaces (which is a lot of what I do to refactor things anyway).

Very cool project, congrats to the author!

Baaed on the comments, I feel that people are both under and overestimating this. On the one hand it replaces the manual tasks of searching for a template and then googling errors and this is huge! It completely disrupts search on the internet as we know it. On the other hand it won’t be able to solve problems that you couldn’t just google anyway, since that’s what it basically does under the hood.

Fascinating stuff. Even though my initial reaction is "meh, this will give fuzzy results at best and I'll spent more time fixing subtle bugs then just writing it myself" - I still believe we're in the infancies and these projects will likely get a lot better

Another AI thing that uses a company (OpenAI) that is blocking my country (and other poor countries).

hopefully open source survives the current assault openai is leading against it, but it looks like open source LLM's are catching up rapidly.

I don't understand how the files can be generated in parallell. Won't it be very likely that one file generates something that another file needs and thus need the signature of?

The install instructions are not very clear

yeah sorry i wrote those on very little sleep. happily take a PR to improve but in my mind if the reader needs help on doing git clone and entering in environment variables then this project is not ready for them yet

I'm struggling to figure out whether I can even use this without access to the private betas it depends on.

Running a code through code2prompt -> prompt2code could have interesting copyright implications.

What is it?


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact