Then you teach it. Even humans don't always produce the results we expect.

Have you tried that? It generally doesn't go so well.

In this example there are several commits where you can see they had to fix the code themselves because they couldn't get (teach) the LLM to generate the required code.

And there's no memory there: open a new prompt and it's forgotten everything you said previously.


You can use prompts to fix some of these problematic tendencies.


Yes, you can, but it almost never works.


They were checking in buggy spaghetti code though.


You won't prompt them to do such things, right?


I won't, but would you bet that nobody will?


Can you give an example so we can visualise this?


For instance, in an AIOps project we still run a number of time-series algorithms and then feed the results, along with the original time-series data, to the LLM. The LLM produces much more relevant and in-depth analysis than it does with the raw data alone as input.
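A minimal sketch of that pattern in Python (the algorithm, threshold, and prompt here are made up, just to show the shape):

    import json
    import statistics

    def zscore_anomalies(series, threshold=3.0):
        # Stand-in for whatever time-series algorithms the pipeline runs:
        # flag points that deviate strongly from the mean.
        mean = statistics.fmean(series)
        stdev = statistics.pstdev(series) or 1.0
        return [i for i, x in enumerate(series)
                if abs(x - mean) / stdev > threshold]

    def build_prompt(series):
        # Give the LLM both the algorithm results and the raw series,
        # so it reasons over precomputed signals, not raw data alone.
        analysis = {"anomaly_indices": zscore_anomalies(series)}
        return ("Analyze this metric.\n"
                f"Algorithm results: {json.dumps(analysis)}\n"
                f"Raw series: {json.dumps(series)}")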


Make a good model first. Then you can start pushing for usage. Llama 4 isn't even the best open source non-reasoning model.


It may be a good model, but my bar is for them to stop faking benchmarks first.


Which Llama 4 model is good, in your opinion?


I've only used Scout, but I found it more than acceptable as a "GPT-4 or better" level model for coding questions. I've not stress-tested it in any way because I didn't need to. So it's fine! It's just not special.


> Llama 4 isn't even the best open source non-reasoning model.

It isn't even open source. There is a 700 million monthly active users limit.

https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Inst...


> There is a 700 million monthly active users limit.

...so? Who cares? Why does it matter if big megacorporations cannot use it for free? I sure as hell don't care.

If you want to complain about the license a better target is the annoying advertising clause:

> i. If you distribute or make available the Llama Materials (or any derivative works thereof), or a product or service (including another AI model) that contains any of them, you shall (A) provide a copy of this Agreement with any such Llama Materials; and (B) prominently display “Built with Llama” on a related website, user interface, blogpost, about page, or product documentation. If you use the Llama Materials or any outputs or results of the Llama Materials to create, train, fine tune, or otherwise improve an AI model, which is distributed or made available, you shall also include “Llama” at the beginning of any such AI model name.


Is the code available?


I had not thought about sharing it. I rolled my own framework, even though there are several good choices. I'd have to tidy it up, but would consider it if a few people ask. Shoot me an email, info in my profile.

The more difficult part, which I won't share, was aggregating data from various systems with ETL scripts into a new db, from which I generate various views to look at the data by channel, timescale, price regime, cost trends, inventory trends, etc. A well-structured JSON object is passed to the analyst agent, who prepares a report for the decision agent. It's a lot of data to analyze. It's been running for about a month, and sometimes I doubt the choices, so I go review the thought traces, and usually they turn out to be right after all. It's much better than all the heuristics I've used over the years.

I've started using agents for things all over my codebase; most are much simpler. Earlier uses of LLMs might have been called agents in some cases, before the phrase became so popular. As everyone is discovering, it's really powerful to abstract the models with a job hat and structured data.
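The core abstraction looks roughly like this (a sketch with invented names, not the actual framework code):

    import json

    class Agent:
        # A model wearing a "job hat": a role prompt plus structured
        # JSON in and out, instead of free-form chat.
        def __init__(self, role, llm):
            self.role = role  # e.g. "analyst" or "decision"
            self.llm = llm    # any callable: prompt string -> completion string

        def run(self, payload):
            prompt = (f"You are the {self.role} agent.\n"
                      f"Input:\n{json.dumps(payload)}\n"
                      "Respond with JSON only.")
            return json.loads(self.llm(prompt))

    # The analyst prepares a report; the decision agent acts on it:
    # report = Agent("analyst", llm).run(views_json)
    # choice = Agent("decision", llm).run(report)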


The framework is now available on Github:

https://github.com/jacobsparts/agentlib

I'm planning to write a blog post about the larger system when I get the chance.


You can use a pool of interpreter processes. You don't have to spawn one for each request.
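For example, with Python's ProcessPoolExecutor (run_snippet is a placeholder for whatever actually evaluates the code):

    from concurrent.futures import ProcessPoolExecutor

    def run_snippet(code):
        # Placeholder for whatever actually evaluates the snippet.
        return f"ran {len(code)} bytes"

    # Workers are started once and reused across requests, so the
    # interpreter startup cost is paid max_workers times, not per request.
    pool = ProcessPoolExecutor(max_workers=4)

    def handle(code):
        return pool.submit(run_snippet, code).result()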


Getting OpenAI models to write a lot of code is broken right now, especially on the app. Gemini 2.5 Pro on Google AI Studio does it well. Claude 3.7 is also better at it.

I've had limited success by prompting the latest OpenAI models to disregard every previous instruction they had about limiting their output and to keep writing until the code is complete. They quickly forget, so you have to keep repeating the instruction.

If you're a copilot user, try Claude.


They were working on the model before the acquisition. It makes sense to test it and see how it does instead of throwing the work away. Their data will probably be used to improve GPT-4.1, o4-mini-high, and other OpenAI coding models.

