Yes, it's a mess, and there will be a lot of churn, you're not wrong, but there are foundational concepts underneath it all that you can learn and then it's easy to fit insert-new-feature into your mental model. (Or you can just ignore the new features, and roll your own tools. Some people here do that with a lot of success.)
The foundational mental model to get the hang of is really just:
* An LLM
* ...called in a loop
* ...maintaining a history of stuff it's done in the session (the "context")
* ...with access to tool calls to do things. Like, read files, write files, call bash, etc.
Some people call this "the agentic loop." Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.
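For what it's worth, here's roughly what that looks like: a stripped-down sketch using the Anthropic Python SDK, with only a `bash` tool and no permission checks. The model name is a placeholder, and a real agent would also want read/write-file tools and an approval step.

```python
import subprocess
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

TOOLS = [{
    "name": "bash",
    "description": "Run a shell command and return its combined stdout/stderr.",
    "input_schema": {"type": "object",
                     "properties": {"command": {"type": "string"}},
                     "required": ["command"]},
}]

def run_tool(name, args):
    if name == "bash":
        out = subprocess.run(args["command"], shell=True, capture_output=True, text=True)
        return out.stdout + out.stderr
    return f"unknown tool: {name}"

history = []  # the "context": every user message, model reply, and tool result

def agent_turn(user_input):
    history.append({"role": "user", "content": user_input})
    while True:  # the agentic loop: keep calling the LLM until it stops asking for tools
        reply = client.messages.create(model="claude-sonnet-4-20250514",  # placeholder model name
                                       max_tokens=4096, tools=TOOLS, messages=history)
        history.append({"role": "assistant", "content": reply.content})
        if reply.stop_reason != "tool_use":
            return "".join(b.text for b in reply.content if b.type == "text")
        # Execute every tool call the model asked for and feed the results back in.
        results = [{"type": "tool_result", "tool_use_id": b.id,
                    "content": run_tool(b.name, b.input)}
                   for b in reply.content if b.type == "tool_use"]
        history.append({"role": "user", "content": results})

while True:
    print(agent_turn(input("> ")))
```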
Once you've written your own basic agent, if a new tool comes along, you can easily demystify it by thinking about how you'd implement it yourself. For example, Claude Skills are really just:
1) Skills are just a bunch of files with instructions for the LLM in them.
2) Search for the available "skills" on startup and put all the short descriptions into the context so the LLM knows about them.
3) Also tell the LLM how to "use" a skill. Claude just uses the `bash` tool for that.
4) When Claude wants to use a skill, it uses the "call bash" tool to read in the skill files, then does the thing described in them.
and that's more or less it, glossing over a lot of things that are important but not foundational, like ensuring granular tool permissions, etc.
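If it helps, here's a sketch of what steps 1-3 might look like in the same 100-line-agent spirit. The `skills/<name>/SKILL.md` layout and the `description:` line are my assumptions about the file format, not Anthropic's exact spec:

```python
from pathlib import Path

# Hypothetical layout: each skill lives in skills/<name>/SKILL.md with a short
# "description:" line near the top (the real frontmatter format may differ).
def list_skills(root="skills"):
    lines = []
    for skill_md in Path(root).glob("*/SKILL.md"):
        description = next((l.split(":", 1)[1].strip()
                            for l in skill_md.read_text().splitlines()
                            if l.lower().startswith("description:")), "")
        lines.append(f"- {skill_md.parent.name}: {description} (read {skill_md} for details)")
    return "\n".join(lines)

SYSTEM_PROMPT = f"""You are a coding agent with a bash tool.
Available skills (read the referenced file with bash before using one):
{list_skills()}
"""
```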
One great thing about the MCP craze is that it has given vendors a motivation to expose APIs they didn't offer before. Real example: Notion's public REST API lacks support for duplicating pages. Yes, their web UI can do it, by calling their private REST API, but their private APIs are complex, undocumented, and could stop working at any time with no notice. Then they added it to their MCP server. And MCP is just a JSON-RPC API, so you aren't limited to invoking it from an LLM agent: you can also invoke it from your favourite scripting language with no LLM involved at all.
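To make that concrete, here's a rough sketch of poking a remote MCP server as plain JSON-RPC over HTTP. The endpoint URL, auth header, and tool name are all assumptions; a real client would also do the initialize handshake first, and some servers reply with SSE framing rather than plain JSON:

```python
import requests

MCP_URL = "https://mcp.notion.com/mcp"   # hypothetical remote MCP endpoint
HEADERS = {"Authorization": "Bearer YOUR_TOKEN",   # whatever auth the server expects
           "Content-Type": "application/json",
           "Accept": "application/json, text/event-stream"}

def mcp_call(method, params, req_id=1):
    # MCP is JSON-RPC 2.0 under the hood; "tools/list" and "tools/call" are the
    # standard methods for discovering and invoking tools.
    payload = {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}
    resp = requests.post(MCP_URL, json=payload, headers=HEADERS)
    return resp.text  # may be plain JSON or SSE-framed, depending on the server

# Discover what the server exposes, then invoke a tool directly - no LLM anywhere.
print(mcp_call("tools/list", {}))
print(mcp_call("tools/call", {"name": "duplicate-page",        # hypothetical tool name
                              "arguments": {"page_id": "..."}}, req_id=2))
```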
I remember reading in one of Simon Willison's recent blog posts his half-joking point that MCP got so much traction so fast because adding a remote MCP server allowed tech management at big companies, whose C-suite is asking for an "AI Strategy", to show that they were doing something. I'm sure that is a little bit true - a project framed as "make our API better and more open and well-documented" would likely never have got off the ground at many such places. But that is exactly what this is, really.
At least it's something we all reap the benefits of, even if MCP is really mostly just an api wrapper dressed up as "Advanced AI Technology."
Well, I bet Notion simply forgot that some of those APIs were private before. I started developing with Notion's APIs on the first day they were released. They ship constant updates and I have seen a lot of improvement. There is just no reason they would intentionally put the duplicate-page feature in the MCP server but not the public API.
PS. Just want to say, Notion's MCP is still very buggy. It can't handle code blocks or large pages very well.
> There is just no reason they would intentionally put the duplicate-page feature in the MCP server but not the public API.
I have no idea what is going on inside Notion, but if I had to guess: the web UI (including the private REST API that backs it), the public REST API, and the AI features are separate teams, separate PMs, separate budgets, so it is totally unsurprising that they don't all have the same feature set. Of course, if parity were an executive priority, they could get there, but I can only assume it isn't.
Pretty true, and definitely a good exercise. But if we're going to actually use these things in practice, you need more: things like prompt caching, capabilities/constraints, etc. It's pretty dangerous to let an agent go hog wild in an unprotected environment.
Oh sure! And if I were talking someone through building a barebones agent, I'd definitely tag on a warning along the lines of "but don't actually use this without XYZ!" That said, you can add prompt caching by just setting a couple of parameters in the api calls to the LLM. I agree constraints are a much more complex topic, although even in my 100-line example I am able to fit in a user approval step before file-write or bash actions.
when you say prompt caching, does it mean cache the thing you send to the llm or the thing you get back?
sounds like prompt is what you send, and caching is important here because what you send is derived from previous responses from llm calls earlier?
sorry to sound dense, I struggle to understand where and how in the mental model the non-determinism of a response is dealt with. is it just that it's all cached?
Not dense to ask questions! There are two separate concepts in play:
1) Maintaining the state of the "conversation" history with the LLM. LLMs are stateless, so you have to store the entire series of interactions on the client side in your agent (every user prompt, every LLM response, every tool call, every tool call result). You then send the entire previous conversation history to the LLM every time you call it, so it can "see" what has already happened. In a basic agent, it's essentially just a big list of strings, and you pass it into the LLM api on every LLM call.
2) "Prompt caching", which is a clever optimization in the LLM infrastructure to take advantage of the fact that most LLM interactions involve processing a lot of unchanging past conversation history, plus a little bit of new text at the end. Understanding it requires understanding the internals of LLM transformer architecture, but the essence of it is that you can save a lot of GPU compute time by caching previous result states that then become intermediate states for the next LLM call. You cache on the entire history: the base prompt, the user's messages, the LLM's responses, the LLM's tool calls, everything. As a user of an LLM api, you don't have to worry about how any of it works under the hood, you just have to enable it. The reason to turn it on is it dramatically increases response time and reduces cost.
Very helpful. It helps me better understand the specifics behind each call and response, the internal units and whether those units are sent and received "live" from the LLM or come from a traditional db or cache store.
I'm personally just curious how much any given product adds - how far, how clever, how insightful it is - "on top of" the foundation models. I'm not in it deep enough to make claims one way or the other.
You have a great way of demystifying things. Thanks for the insights here!
Do you think a non-programmer could realistically build a full app using vibe coding?
What fundamentals would you say are essential to understand first?
For context, I’m in finance, but about 8 years ago I built a full app with Angular/Ionic (live on Play Store, under review on Apple Store at that time) after doing a Coursera specialization. That was my first startup attempt, I haven’t coded since.
My current idea is to combine ChatGPT prompts with Lovable to get something built, then fine-tune and iterate using Roo Code (VS Code plugin).
I’d love to try again with vibe coding. Any resources or directions you’d recommend?
If your app just has to display stuff, there are no-code kits available that can help you out. No vibe coding needed.
If your app has to do something useful, your app just exploded in complexity and corner cases that you will have to account for and debug. Also, if it does anything interesting that the LLM has not yet seen a hundred thousand times, you will hit the manual button quite quickly.
Claude especially (with all its deserved praise) fantasizes so much crap together while claiming absolute authority in corner cases that it can become annoying.
That makes sense, I can see how once things get complex or novel, the LLMs start to struggle. I don't think my app is doing anything complex.
For now, my MVP is pretty simple: a small app for people to listen to soundscapes for focus and relaxation. Even if no one uses it, at least it's going to be useful to me, and it will be a fun experiment!
I’m thinking of starting with React + Supabase (through Lovable), that should cover most of what I need early on. Once it’s out of the survival stage, I’ll look into adding more complex functionality.
Curious, in your experience, what’s the best way to keep things reliable when starting simple like this? And are there any good resources you can point to?
You can make that.
The only AI coding tools I have liked are OpenAI Codex and Claude Code.
I would start by working with it to create a design document in Markdown to plan the project.
Then I would close the app to reset context, tell it to read that file, and have it create an implementation plan for the project in various phases.
Then I would reset context again and have it start implementing.
I don't always like that many steps, but for a new user it can help to see ways to use the tools.
I already have a feature list and a basic PRD, and I’m working through the main wireframes right now.
What I’m still figuring out is the planning and architecture side, how to go from that high-level outline to a solid structure for the app. I’d rather move step by step, testing things gradually, than get buried under too much code where I don’t understand anything.
I’m even considering taking a few React courses along the way just to get a better grasp of what’s happening under the hood.
Do you know of any good resources or examples that could help guide this kind of approach? On how to break this down, what documents to have?
> Do you think a non-programmer could realistically build a full app using vibe coding?
For personal or professional use?
If you want to make it public, I would say it's 0% realistic. The bugs, security concerns, performance problems, etc. that you would be unable to fix are impossible to enumerate.
Even if you just had a simple login and kept people's emails and passwords, you could very easily end up with insecure databases and missing protections against simple things like SQL injection.
You would not want to be the face of "vibe coder gives away data of 10k users"
Ideally, I want this to grow into a proper startup. I’m starting solo for now, but as things progress, I’d like to bring in more people. I’m not a tech, product or design person, but AI gives me hope that I can at least get an MVP out and onboard a few early users.
For auth, I’ll be using Supabase, and for the MVP stage I think Lovable should be good enough to build and test with maybe a few hundred users. If there’s traction and things start working, that’s when I’d plan to harden the stack and get proper security and code reviews in place.
One of the issues AI coding has is that it's in some ways very inhuman. The bugs that get introduced are very hard to pick up, because humans wouldn't write the code that way and hence wouldn't make those mistakes.
If you then bring in other devs, you have two paths. Either they build on top of the vibe coding, which leaves you vulnerable to those bugs and honestly makes their life a misery, because they are working on top of code that missed the basic decisions that would help it grow. (Imagine a non-architect built your house: the walls might be straight, but he didn't know to level the floor or to use the right concrete to support the weight of a second floor.)
Or the other path is they rebuild your entire app correctly, with the only advantage being that the MVP and its users showed some viability for the idea. But the time it takes to rewrite it means that, in a fast-moving space like startups, someone can quickly overtake you.
It's a risky proposition that means you are not going to create a very adequate base for the people you might hire.
I would still recommend against it. Think of AI as more like WebMD: it can help someone who is already a doctor, but it will confuse, and potentially hurt, those without enough training to know what to look for.
If I were going to vibe code, I wouldn't use Lovable but Claude Code. You can run it in your terminal.
And I would ask it to use NextAuth, Next.js and Prisma (or another ORM), and connect it with SQLite or an external managed MariaDB server (for easy development you can start with SQLite; for deployment to Vercel you need an external database).
People here shit on Next.js, but thanks to its extensive documentation and usage the LLMs are very good at building with it, and since it forces a certain structure it generally produces decently structured code that is workable for a developer.
Also, Vercel is very easy to deploy to: just connect GitHub and you are done.
Make sure to properly use Git and commit per feature, or even better branch per feature, so you can easily revert to old versions if Claude messes up.
Before starting, spend some time sparring with the GPT-5 thinking model to create a database schema that's future-proof. The challenge here is finding the right balance between over-engineering and simplicity.
One caveat: be careful about letting Claude run migrations on your production database. It can accidentally destroy it. So only run Claude Code against test databases.
I’m not 100% set on Lovable yet. Right now I’m using Stitch AI to build out the wireframes. The main reason I was leaning toward Lovable is that it seems pretty good at UI design and layout.
How does Claude do on that front? Can it handle good UI structure or does it usually need some help from a design tool?
Also, is it possible to get mobile apps out of a Next.js setup?
My thought was to start with the web version, and later maybe wrap it using Cordova (or Capacitor) like I did years ago with Ionic to get Android/iOS versions. Just wondering if that’s still a sensible path today.
> Call it what you want, you can write it in 100 lines of Python. I encourage every programmer I talk to who is remotely curious about LLMs to try that. It is a lightbulb moment.
Definitely want to try this out. Any resources / etc. on getting started?
It uses Go, which is more verbose than Python would be, so he takes 300 lines to do it. Also, his edit_file tool could be a lot simpler (I just make my minimal agent "edit" files by overwriting the entire existing file).
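Concretely, the overwrite-the-whole-file version of "edit" can be as small as this (the tool name and schema here are just what I happen to use in my own toy agent, not anything standard):

```python
from pathlib import Path

# Tool definition in the Anthropic tools format; the model fills in path/content.
WRITE_FILE_TOOL = {
    "name": "write_file",
    "description": "Create or overwrite a file with the given contents.",
    "input_schema": {"type": "object",
                     "properties": {"path": {"type": "string"},
                                    "content": {"type": "string"}},
                     "required": ["path", "content"]},
}

def write_file(path: str, content: str) -> str:
    # "Editing" is just replacing the whole file; no diffs or patch formats needed.
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(content)
    return f"wrote {len(content)} bytes to {path}"
```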
I keep meaning to write a similar blog post with Python, as I think it makes it even clearer how simple the stripped-down essence of a coding agent can be. There is magic, but it all lives in the LLM, not the agent software.
I could, but I'm actually rather snobbish about my writing and don't believe in having LLMs write first drafts (for proofreading and editing, they're great).
(I am not snobbish about my code. If it works and is solid and maintainable I don't care if I wrote it or not. Some people seem to feel a sense of loss when an LLM writes code for them, because of The Craft or whatever. That's not me; I don't have my identity wrapped up in my code. Maybe I did when I was more junior, but I've been in this game long enough to just let it go.)
It's also a very fun project; you can set up a small LLM with Ollama or LM Studio and get working quickly. With MCP it's very quick to make it actually useful.
I’ve done this a few times (pre and post MCP) and learned a lot each time.
How does it call upon the correct skill from a vast library of skills at the right time? Is this where RAG via embeddings / vector search come in? My mental model is still weak in this area, I admit.
I think it has a compact table of contents of all the skills it can call preloaded into context. It's not RAG; it navigates based on references between files, like a coding agent.
This is correct. It just puts a list of skills into context as part of the base prompt. The list must be compact because the whole point of skills is to reduce context bloat by keeping all the details out of context until they are needed. So the list will just be something like: 1) skill name, 2) short (like one sentence) description of what the skill is for, 3) where to find the skill (file path, basically) when it wants to read it in.
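So the chunk that lands in the base prompt might look something like this (format and paths are purely illustrative, not Anthropic's exact wording):

```
Available skills (read the referenced file before using one):
- pdf-forms: Fill in and extract data from PDF forms. File: skills/pdf-forms/SKILL.md
- brand-deck: Build slide decks in the company template. File: skills/brand-deck/SKILL.md
```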
I think, from my experience, what they mean is that tool use is only as good as your model's ability to stick to a given answer template/grammar. For example, if it does tool calling using a JSON format, it needs to stick to that format, not hallucinate extra fields, and use the existing fields properly. This has worked for a few years and LLMs are getting better and better, but the more tools you have, and the more parameters your callable functions take, the higher the risk of errors. You also have systems that constrain the inference itself, for example with the outlines package, by changing the way tokens are sampled (this way you can force a model to stick to a template/grammar, but that can also degrade results in other ways).
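As a toy illustration of the "stick to the grammar" side, here's the validate-and-retry flavour of it (not outlines-style constrained sampling, which hooks into token selection itself). The read_file tool and field names are hypothetical:

```python
import json
from jsonschema import ValidationError, validate  # pip install jsonschema

# The "grammar" the model must stick to for one hypothetical tool call.
TOOL_CALL_SCHEMA = {
    "type": "object",
    "properties": {
        "tool": {"const": "read_file"},
        "arguments": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
            "additionalProperties": False,   # reject hallucinated extra fields
        },
    },
    "required": ["tool", "arguments"],
    "additionalProperties": False,
}

def parse_tool_call(raw: str):
    """Return the parsed call, or None so the caller can re-prompt the model."""
    try:
        call = json.loads(raw)
        validate(call, TOOL_CALL_SCHEMA)
        return call
    except (json.JSONDecodeError, ValidationError):
        return None
```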
I see, thanks for channeling the GP! Yeah, like you say, I just don't think getting the tool call template right is really a problem anymore, at least with the big-labs SotA models that most of us use for coding agents. Claude Sonnet, Gemini, GPT-5 and friends have been heavily, heavily RL-ed into being really good at tool calls, and it's all built into the providers' apis now so you never even see the magic where the tool call is parsed out of the raw response. To be honest, when I first read about tool calls with LLMs I thought, "that'll never work reliably, it'll mess up the syntax sometimes." But in practice, it does work. (Or, to be more precise, if the LLM ever does mess up the grammar, you never know because it's able to seamlessly retry and correct without it ever being visible at the user-facing api layer.) Claude Code plugged into Sonnet (or even Haiku) might do hundreds of tool calls in an hour of work without missing a beat. One of the many surprises of the last few years.