
It's not very different from documentation, except it's used for immediate application rather than learning, i.e. it's something you must include with each prompt/interaction (and you likely need a way to determine which bits are relevant to reduce token count, depending on the size of the main prompt). In fact, this is probably one of the more important aspects of adapting LLMs for real-world use in team/development settings, and one most people skip. If you provide clear and comprehensive descriptions of codebase patterns, pitfalls, practices, etc., the latest models are good at adhering to them. It sounds difficult and open-ended, since it requires content beyond the scope of typical (or even sensible) internal documentation, but much of it is already captured in internal docs, discussions, tickets, PR comments, and git commit histories, and guess what's pretty great at extracting high-level insights from those kinds of inputs?
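
To make that concrete, here's a rough sketch of the "determine the relevant bits" idea (the tagging scheme is purely illustrative, not any particular tool):

    # Purely illustrative: keep team guidelines as tagged snippets and only
    # prepend the ones relevant to the task at hand, to keep token count down.
    from dataclasses import dataclass

    @dataclass
    class Guideline:
        title: str
        body: str
        tags: set[str]  # e.g. {"api", "auth", "db"}

    GUIDELINES = [
        Guideline("Error handling", "Wrap external calls in ...", {"api", "http"}),
        Guideline("RBAC checks", "Every endpoint must verify ...", {"api", "auth"}),
        Guideline("Migrations", "Never edit an applied migration ...", {"db"}),
    ]

    def build_prompt(task: str, task_tags: set[str], max_snippets: int = 5) -> str:
        # Keep only guidelines whose tags overlap the task's tags.
        relevant = [g for g in GUIDELINES if g.tags & task_tags][:max_snippets]
        context = "\n\n".join(f"{g.title}:\n{g.body}" for g in relevant)
        return f"{context}\n\nTask:\n{task}"

    print(build_prompt("Add a new /invoices endpoint", {"api", "auth"}))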

Genuine question: how hard have you tried to find a good reason?

Obviously not very hard. And looking at the blatant and unproven claims on HN gave me the view that the proponents are not interested in giving proof; they simply want to silence anyone who disagrees that LLMs are useful for programming.

My perspective is that if you are unable to find ways to improve your own workflows, productivity, output quality, or any other meaningful metric using the current SOTA LLMs, you should consider the possibility that it is a personal failure at least as much as you consider the possibility that it is a failure of the models.

A more tangible pitfall I see people falling into is testing LLM code generation with something like ChatGPT and never considering more involved usage via interfaces better suited to software development. The best results I've managed to get on our codebase have not come from ChatGPT or IDEs like Cursor, but from a series of processes that iterate over our full codebase multiple times to extract various levels of reusable insight: general development patterns, error handling patterns, RBAC-related patterns, example tasks for common types of work pulled from git commit history (e.g. adding a new API endpoint related to XYZ), and common bugs or failure patterns (again from git commit history). Together these form a sort of library of higher-level context and reusable concepts.

Feeding this into o1, with a pre-defined "call graph" of prompts to validate the output, fix identified issues, and consider past errors in similar kinds of commits and past executions, has produced some very good results for us so far. I've also had much more success with ad-hoc questions after writing a small static analyzer that traces imports, variable references to declarations, etc., to isolate the portions of the codebase to use as context, rather than the RAG-based search that a lot of LLM-centric development tools seem to rely on.

It's also worth mentioning that output quality seems to be heavily influenced by language; I thankfully work primarily with Python codebases, though I've had success using it against (smaller) Rust codebases as well.
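
For anyone curious about the static analyzer piece, here's a stripped-down sketch of the import-tracing part using Python's ast module (illustrative only; the real thing also traces variable references to declarations, which this leaves out):

    # Illustrative sketch of the import-tracing idea (packages, relative imports
    # and variable references are left out here): follow project-local imports
    # from an entry file to collect the files worth including as context.
    import ast
    from pathlib import Path

    def local_imports(path: Path, root: Path) -> set[Path]:
        """Return project-local files imported by `path` (stdlib/3rd-party skipped)."""
        tree = ast.parse(path.read_text())
        found = set()
        for node in ast.walk(tree):
            names = []
            if isinstance(node, ast.Import):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.ImportFrom) and node.module:
                names = [node.module]
            for name in names:
                candidate = root / (name.replace(".", "/") + ".py")
                if candidate.exists():
                    found.add(candidate)
        return found

    def context_files(entry: Path, root: Path) -> set[Path]:
        """Transitively collect files reachable from `entry` via local imports."""
        seen, stack = set(), [entry]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(local_imports(current, root) - seen)
        return seen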


Sometimes, if it's as much work to set up and keep the tech running as it would be to write the thing yourself, it's worth thinking about the tradeoffs.

The interesting piece is a person with the experience to know how to push LLMs to output the perfect little function or utility to solve a problem, and to collect enough of them to get somewhere.


This sounds nice, but it also sounds like a ton of work to set up that we don't have time for. Local models that don't require us to send our codebase to Microsoft or OpenAI would be something I'm sure we'd be willing to try out.

I'd love it if more companies were actually considering real engineering needs to provide products in this space. Until then, I have yet to see any compelling evidence that the current chatbot models can consistently produce anything useful for my particular work other than the occasional SQL query.


This could very well prove to be the case in software engineering, but it also could very well not: what is the equivalent of "larger sets" in our domain, and is that something that is even preferable to begin with? Should we build larger codebases just because we _can_? I'd say likely not, whereas it did make sense to build larger and more elaborate movie sets simply because they could.

Also, a piece missing from this comparison is the set of people who don't believe the new tool will actually have a measurable impact on their domain. I assume few to none would argue that power tools had no impact on their profession.


> Should we build larger codebases just because we _can_?

The history of software production as a profession (as against computer science) is essentially a series of incremental increases in the size and complexity of systems (and teams) that don't fall apart under their own weight. There isn't much evidence we have approached the limit here, so it's a pretty good bet for at least the medium term.

But focusing on system size is perhaps a red herring. There is an almost unfathomably vast pool of potential software systems (or customization of systems) that aren't realized today because they aren't cost effective...


Have you ever worked on a product in production use without a long backlog of features/improvements/tests/refactors/optimizations desired by users, managers, engineers, and everyone else involved in any way with the project?

The demand for software improvements is effectively inexhaustible. It’s not a zero sum game.


This is awesome. It's very similar to something I've been wanting to make just for my own personal use, but better. Is the source currently available? I'd love to mess around with it.


Thanks! No, sorry, it's not yet open source, but I think it will be at some point.


Don't apologize for using a theme for your landing page. And don't apologize to a competitor who is throwing shade and linking to their competing product.


In all honesty, don't. Seriously, this is bad advice. Nobody is going to visit your website and say "woah, this looks like Attio's website - I'm out!" with, evidently, the exception of a few folks from Attio. The website looks good, you and the other company aren't truly direct competitors so branding conflicts are not much of a concern, and if you both truly just derived your designs from a common root theme then I don't know why this is even being brought up here, unless the other commenters were unaware that their design was derived from a template. There are undoubtedly things significantly more worth spending time on as a very early startup than redesigning your website to appease a few people on HN, who were (assuming the common template bit is true) in the wrong for raising this issue to begin with and should be updating their comment with this context.

EDIT - apologies, the OP here is not from Attio, which was my assumption and would've made the OP's post unnecessary, if an understandable reflex to seeing a doppelganger of their own website. If you check the OP's profile to see which company they're _actually_ from, you will certainly realize that this entire comment chain should be fully ignored. It's pretty shitty, actually.


I think I was pretty open in my comment (cofounder).

They copied Attio's SVG image and everything. My issue was not the aesthetics but the fact they copied another organization's work. Surely, you don't think that's right?


Are you serious, man? You edited your comment, twice. We both know your comment did not include "(cofounder)", but did include unnecessary jabs at your competitor ("this makes me lose all faith in our credibility.") when you posted it. I've used Budibase and think it's a great product, and couldn't have anything but respect for the people behind it. You're above this.

EDIT - Apologies, you did indeed include "(cofounder)" in your original post, I just missed it (based on Bing's cached page). Regardless, this is not the way to deal with competitors, and frankly your product speaks for itself. And perhaps the obscene amount of torrenting I did in my teenage years has permanently skewed my moral compass for these things, but I really don't care about table cell background SVG theft.


I'm sorry to hear about your years of torment.


I think the answer is no but just to be sure: are you able to trigger step executions programmatically from within a step, even if you can't await their results?

Related, but separately: can you trigger a variable number of task executions from one step? If the answer to the previous question is yes then it would of course be trivial; if not, I'm wondering if you could e.g. have a task act as a generator and yield values, or just return a list, and have each individual item get passed off to its own execution of the next task(s) in the DAG.

For example, some of the examples involve a load_docs step, but all loaded docs seem to be passed to the next step execution in the DAG together, unless I'm just misunderstanding something. How could we tweak such an example to have a separate task execution per document loaded? The benefits of durable execution, of being able to resume an intensive workflow without repeating work, are lessened if you can't naturally/easily control the size of the unit of work for task executions.
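
To make the unit-of-work point concrete, here's a plain-Python sketch of the shape I mean (illustrative only, not tied to your API):

    # Plain-Python sketch, not Hatchet code: the parent step yields documents and
    # each one would ideally become its own task execution downstream.
    from pathlib import Path

    def load_docs(source_dir: Path):
        # Acts as a generator: each yielded doc is one unit of work.
        for path in sorted(source_dir.glob("*.txt")):
            yield {"path": str(path), "text": path.read_text()}

    def process_doc(doc: dict) -> dict:
        # Expensive per-document work that should be resumable independently.
        return {"path": doc["path"], "tokens": len(doc["text"].split())}

    def run(source_dir: Path) -> list[dict]:
        # In a durable-execution engine I'd want each process_doc call to be its
        # own execution, so finished docs aren't redone after a failure.
        return [process_doc(doc) for doc in load_docs(source_dir)]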


You can execute a new workflow programmatically; see [1] for an example. People have triggered, say, 50 child workflows from a parent step. As you've identified, the difficult part there is the "collect" or "gathering" step. We've had people hack around that by waiting for all the steps from a second workflow (and falling back to the list-events method to get status), but this isn't an approach I'd recommend and it's not well documented. And there's no circuit breaker.

> I'm wondering if you could e.g. have a task act as a generator and yield values, or just return a list, and have each individual item get passed off to its own execution of the next task(s) in the DAG.

Yeah, we were having a conversation yesterday about this - there's probably a simple decorator we could add so that if a step returns an array, and a child step is dependent on that parent step, it fans out if a `fanout` key is set. If we can avoid unstructured trace diagrams in favor of a nice DAG-style workflow execution we'd prefer to support that.
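
Very roughly, the kind of thing we have in mind (a hypothetical sketch only, not something that exists in the SDK today):

    # Hypothetical sketch only -- not the current SDK. The idea: a parent step
    # marked as fanout returns a list, and the scheduler runs the dependent
    # child step once per element instead of once for the whole list.
    def fanout(func):
        func._fanout = True  # marker the scheduler would check
        return func

    @fanout
    def load_docs():
        return [f"doc-{i}" for i in range(5)]

    def parse_doc(doc):
        return {"doc": doc, "length": len(doc)}

    def schedule(parent, child):
        """Toy scheduler: fan out if the parent is marked, else pass through."""
        result = parent()
        if getattr(parent, "_fanout", False) and isinstance(result, list):
            return [child(item) for item in result]  # one child execution per item
        return [child(result)]

    print(schedule(load_docs, parse_doc))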

The other thing we've started on is propagating a single "flow id" to each child workflow so we can provide the same visualization/tracing that we provide for each workflow execution. This is similar to AWS X-Ray.

As I mentioned, we're working on the durable workflow model, and we'll find a way to make child workflows durable in the same way activities (and child workflows) are durable in Temporal.

[1] https://docs.hatchet.run/sdks/typescript-sdk/api/admin-clien...

