Bnjoroge's comments

Hmm interesting. Do you think deterministic simulation is a much more accessible approach than verification? Also curious what tooling you think is still missing/not up to par?


I assume by deterministic simulation you mean: use a harness wherein you can deterministically run specific executions of a distributed system (e.g., force specific event interleavings, failures, etc.)?

If so, yes it's more accessible. The previous paper I linked (Sieve) used such a technique to reliably execute test plans so any bugs we find can be reproduced. But when it comes to correctness, I still think verification is just much more powerful.

W.r.t tooling, I think there's still quite the usability and knowledge gap to cross. The way I see it, we've had decades to figure out how to engineer complex software systems from simpler parts, with a lot written about software engineering and testing in general.

Full-system verification, in contrast, is still really young, and breaking up a system into an implementation, specification, and proofs currently requires quite some creativity and even awareness of how the verifier works. The Anvil project I mentioned above brought together systems, distributed computing, reliability, and verification experts (including Verus authors) to do this.


Probably talking about something like https://antithesis.com


Didn't take the class, but always felt like his book was too theoretical. I enjoyed "Real World Cryptography" and supplemented it with Dan's book.


just played with it. pretty solid! one question for me: how do you validate that the new changes will work without any issues after applying them? the biggest issue I have with most code generators is the feedback loop from suggested code -> testing it out -> failing -> doing it again. would be great if this were more seamless. additionally, it would be helpful to be able to control the code generation process in real-time.


Thanks for trying it! So far, the testing and validation stage has been left to the developer. While this could change in the future, my experience has been that the models aren't quite good enough yet to make an auto-test/auto-fix loop like you've described the default behavior. You end up with a lot of tire-spinning where you're burning a lot of tokens to fix issues that a human can resolve trivially.

I think it's better for now to use LLMs to generate the bulk of a task, then have the developer clean up and integrate rather than trying to get the LLM to do 100%.

That said, you can accomplish a workflow like this with Plandex already by piping output into context. It would look something like:

  plandex new
  plandex load relevant_context.ts some_more_context.ts
  plandex tell 'some kind of complex task'
  # ...Plandex does its thing, but doesn't get it 100% right
  npm test | plandex load
  plandex tell 'please fix the problems causing the failed tests'

As the models improve, I'm definitely interested in baking this in to make it more automated.


Out of interest, what kind of cost ranges are you seeing users spend on the OpenAI API using Plandex? (if only anecdotally)


There's a pretty huge range. It's a function of how much you load into context and how long the task is. So if you dump an entire directory with 100k tokens into context and then proceed to do a task that requires 20 steps, that will cost you a lot. Maybe >$10. But a small task where you're just making a few changes and only have a few small files in context (say 5k tokens) won't cost much at all, maybe like $0.10.
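
To make that arithmetic concrete, here's a rough back-of-the-envelope sketch. It assumes the full context is resent on every step, and the per-1k-token prices are illustrative placeholders rather than actual OpenAI rates; plug in whatever your model charges:

  # Rough cost model: input cost scales with context size and step count,
  # since the context is assumed to be resent on each step.
  # Prices are placeholders -- substitute your model's actual rates.
  def estimate_cost(context_tokens, steps, output_tokens_per_step=1000,
                    input_price_per_1k=0.01, output_price_per_1k=0.03):
      input_cost = steps * context_tokens / 1000 * input_price_per_1k
      output_cost = steps * output_tokens_per_step / 1000 * output_price_per_1k
      return input_cost + output_cost

  print(estimate_cost(100_000, 20))  # big task: ~$20.60 with these placeholder rates
  print(estimate_cost(5_000, 3))     # small task: ~$0.24 with these placeholder rates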

I haven't done too much digging into exactly how much people who are using Plandex Cloud are spending, but I'd say the range is quite wide even among people who are using the tool frequently. Some are doing small tasks here and there and not spending much--maybe they're on track for $5-10 per month, while I'd guess some other heavy users are on track to spend hundreds per month.


GitHubNext Dev here: We have a terminal feature that allows you to connect to a sandbox in the cloud so you can validate these changes before pushing them out to your repo.


do you guys support languages other than python or js/ts?


The core E2B SDK [0] can run almost any language through custom sandboxes [1]. We have users running R, for example.

The code interpreter SDK [2] that's using our core SDK supports only Python for now but we plan to support more languages. What languages would you like us to support?

[0] https://e2b.dev/docs/sandbox/api/process

[1] https://e2b.dev/docs/sandbox/templates/overview

[2] https://github.com/e2b-dev/code-interpreter


you should look at lance (https://lancedb.github.io/lance/)


folks at https://unikraft.io just released their (closed) beta. seems like there's still interest.


meh, lmk when they actually ship something that's not bs


definitely shifting to more engineering-focused roles, so at the very least I'd work on swe/infra skills


zed's great but they need to prioritise remote ssh like vscode. literally the main way most people do dev


Gonna need a citation on that. I _highly_ doubt most developers write code across a secure shell.


Not "most" but certainly many. I love remote dev (via Remote Containers in VSCode) for large C++ codebases (which could not even compile on my Macbook because of different architecture). Running `clangd` LSP on my local machine would be a nightmare (it could easily eat dozens of CPU cores and tens of gb of RAM when indexing), but on remote workstation it is a breeze. Also I could work on multiple branches simultaneously, spinning multiple VMs and running VSCode remotely on them. So basically my Mac is just a typewriter and everything heavy happens on a remote VM — isn't it nice?


I'm not sure that claim is correct, but could you describe what that setup looks like, and the benefits? A link to something more is also fine


I have never tried the remote ssh setup (even though vscodium is my primary editor). I don't know anyone who uses it daily, and I've never heard anyone irl mention doing it. I think you're slightly biased here by your experience, and assume everyone shares it.

Edit: hell, I wanted to check remote ssh in vscodium now to see what I'm missing, and it looks like the extensions basically contain MS DRM and are only compatible with official vscode. The license is also proprietary[1] and makes it impossible to use them with vscodium or other OSS projects. Yikes.

[1]: https://github.com/VSCodium/vscodium/wiki/Extensions-Compati...


> literally the main way most people do dev

not correct at all


Almost no one uses remote SSH to do development.


It's how I do it since my workstation has way more resources.


Huh? I haven't seen remote SSH used in the past few companies I've worked at. Usually it's a local docker setup.


big fan of openlayer since rdv!

