Unlike classic search, which got worse over time due to SEO gaming, AI search might actually improve with scale. If LLMs are trained on real internet discussions (Reddit, forums, reviews), and your product consistently gets called out as bad, the model will eventually reflect that. The pressure shifts from optimizing content to improving the product itself.
I looked into GEO a bit. One thing I've noticed is that you need to optimize as if you're talking to a person, because LLMs semantically understand what topics are about. Search engines typically don't, at least not at that level.
A few take-aways from a study we ran (~800 consumer queries, repeated over a few days):
* AI answers shift a lot. In classic search a page-1 spot can linger for weeks; in our runs, the AI result set often changed overnight.
* Google’s new “AI Mode” and ChatGPT gave the same top recommendation only ~47% of the time on identical queries.
* ChatGPT isn’t even consistent with itself. Results differ sharply depending on whether it falls back to live retrieval or sticks to its training data.
* When it does retrieve, ChatGPT leans heavily on publications it has relationships with (NYPost and People.com for product recs) instead of specialist review sites like rtings.com.
That's not the current OpenAI recipe. The expectation is that your custom data will be retrieved via a function/plugin call and then processed by a chat model.
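A minimal sketch of that retrieve-then-process pattern. The keyword retriever and the prompt assembly below are illustrative stand-ins, not OpenAI's actual interface; a real system would query a search index or vector store and send the assembled prompt to a chat-model API:

```python
# Sketch of the retrieve-then-process (RAG) pattern. The retriever is a
# stub for illustration; real systems use a search index / vector store.

def retrieve(query, documents):
    # Naive keyword-overlap retrieval, purely for demonstration.
    query_words = set(query.lower().split())
    return max(documents, key=lambda d: len(query_words & set(d.lower().split())))

def answer(query, documents):
    context = retrieve(query, documents)
    # A real implementation would send this prompt to a chat model
    # instead of returning it.
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our return policy allows refunds within 30 days.",
    "Shipping takes 3-5 business days.",
]
print(answer("What is the return policy?", docs))
```

The point is the division of labor: retrieval narrows your custom data down to relevant context, and the chat model only ever sees that context plus the question.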
Only the older completion models (davinci, curie, babbage, ada) are available for fine-tuning.
With LLMs, the inputs are highly variable so exact match caching is generally less useful. Semantic caching groups similar inputs and returns relevant results accordingly. So {"dish":"spaghetti bolognese"} and {"dish":"spaghetti with meat sauce"} could return the same cached result.
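A toy illustration of the idea, using a bag-of-words embedding and cosine similarity in place of a real embedding model (the embedding function, similarity threshold, and linear-scan cache are all simplifications):

```python
import math
from collections import Counter

def embed(text):
    # Toy embedding: bag-of-words term counts. A real semantic cache
    # would use a sentence-embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached_result) pairs

    def get(self, query):
        emb = embed(query)
        best_result, best_sim = None, 0.0
        for cached_emb, result in self.entries:
            sim = cosine(emb, cached_emb)
            if sim > best_sim:
                best_result, best_sim = result, sim
        return best_result if best_sim >= self.threshold else None

    def put(self, query, result):
        self.entries.append((embed(query), result))

cache = SemanticCache(threshold=0.3)
cache.put("spaghetti bolognese", "RECIPE_123")
print(cache.get("spaghetti with meat sauce"))  # → RECIPE_123
```

With a real embedding model the two spaghetti queries would score much closer to 1.0; the crude word-overlap embedding only clears a low threshold, which is why the toy threshold is set at 0.3.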
This is really cool, I had a similar idea but didn't build it. I was also thinking a user could take these different prompts (I called them tasks) that anyone could create, and then connect them together like a node graph or visual programming interface, with some ChatGPT middleware that resolves the outputs to inputs.
BTW: Here's a more performant version (fewer tokens) https://preview.promptjoy.com/apis/jNqCA2 that uses a smaller example but will still generate pretty good results.
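The tasks-as-nodes idea above can be sketched in a few lines. Everything here is hypothetical (the `Task` class, the chain runner, and the stub LLM are inventions for illustration); the interesting part in practice would be the middleware that maps one task's output onto the next task's input:

```python
# Sketch of prompts-as-tasks chained like a node graph. The "middleware"
# here is plain string substitution into the next template; the commenter's
# idea is to have an LLM do that output-to-input resolution instead.

class Task:
    def __init__(self, name, template):
        self.name = name
        self.template = template  # expects an {input} placeholder

    def run(self, input_text, llm):
        return llm(self.template.format(input=input_text))

def run_chain(tasks, initial_input, llm):
    # Linear chain for simplicity; a full node graph would topologically
    # sort tasks and fan results out along edges.
    value = initial_input
    for task in tasks:
        value = task.run(value, llm)
    return value

# Stub LLM for demonstration: just echoes the prompt it received.
fake_llm = lambda prompt: f"[response to: {prompt}]"

chain = [
    Task("summarize", "Summarize: {input}"),
    Task("translate", "Translate to French: {input}"),
]
print(run_chain(chain, "a long article...", fake_llm))
```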
Our interest is straightforward: If coding agents scaffold a growing share of new apps, their defaults will become prevalent. Not necessarily because they’re objectively best, but because the choices are frictionless. The model becomes the gatekeeper of early architectural decisions that used to happen in a meeting room.
So we measured what those defaults actually are.
We ran structured app-building prompts through Claude Code, captured the generated repos, and extracted stack choices: auth, UI framework, database, deployment assumptions, package management, etc.
A few observations that stood out:

- Deployment assumptions are shifting toward Vercel and Railway, and away from AWS/GCP-first patterns.
- Defaults evolve as models update. For example, Opus 4.6 recommends Drizzle more frequently than Prisma.
Prompts, raw outputs, and parsing logic are here: https://github.com/amplifying-ai/claude-code-picks
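To give a flavor of what the extraction step might look like, here is a hypothetical sketch that infers stack choices from a generated repo's package.json. The `STACK_SIGNALS` mapping and category names are illustrative inventions, not the repo's actual parsing logic:

```python
import json

# Hypothetical mapping from npm package names to stack categories.
STACK_SIGNALS = {
    "next": ("ui_framework", "Next.js"),
    "tailwindcss": ("ui_framework", "Tailwind CSS"),
    "drizzle-orm": ("orm", "Drizzle"),
    "prisma": ("orm", "Prisma"),
    "next-auth": ("auth", "NextAuth"),
}

def extract_stack(package_json_text):
    pkg = json.loads(package_json_text)
    # Treat runtime and dev dependencies alike as stack signals.
    deps = {**pkg.get("dependencies", {}), **pkg.get("devDependencies", {})}
    stack = {}
    for name in deps:
        if name in STACK_SIGNALS:
            category, label = STACK_SIGNALS[name]
            stack.setdefault(category, []).append(label)
    return stack
```

Run over a corpus of generated repos, counting the labels per category gives the kind of default-stack frequencies described above.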
This is a snapshot in time. An interesting question is how quickly these defaults drift.