Hacker News | btrettel's comments

I had a similar setup in 2023, but the computer was reformatted after I moved. I wrote a HN comment about the setup before: https://news.ycombinator.com/item?id=37792204

I liked it and intend to use a similar setup in the future. There were quite a few "rough edges", unfortunately. In retrospect, a tiling window manager would have been a better choice.

I found Midnight Commander to be great for this, with its integrated file manager, file viewer (mcview), editor (mcedit), and diff (mcdiff).

I didn't realize how much I relied on a unified clipboard until I didn't have one any longer. mcedit's clipboard was a file (or one of them was?), so I had to adjust some workflows.

The biggest problem came from my need to view a lot of PDF files. I had a framebuffer PDF viewer that was pretty clunky. It did not work with tmux and PDF files could not be opened directly from Midnight Commander as I recall. This specifically is why I'm thinking about a tiling window manager as I won't have to pick a clunky PDF viewer and the remainder will just work.


I had a similar experience when I was a TA at UT Austin that I wrote about on HN years ago: https://news.ycombinator.com/item?id=23163472

Such a bizarre story, and such a bizarre "lecturer".

Can you say more now, like what subject he/she was teaching?


My PhD was in mechanical engineering. I don't want to get more specific than that on the subject in case it identifies the lecturer. I guess I used the phrase "lecturer" to distinguish him from my PhD advisor.

Thanks for replying anyway. BTW, by putting "lecturer" in quotes I meant that his/her behavior was at odds with his/her role.

Have you all considered adding scientific articles to your bibliographic database? Finding existing translations of scientific articles can be a real pain. I know because I spent a lot of time doing that during my PhD [1].

For a while I was collaborating with Victor Venema in the volunteer organization Translate Science [2] to try to create a bibliographic database of scientific translations, but unfortunately Victor died, and I became too busy to continue.

[1] https://academia.stackexchange.com/a/93209/31143

[2] https://translate-science.codeberg.page/


Thanks for the link; Translate Science is exactly the kind of gap-filling project that makes sense once you see how fragmented the bibliographic layer is. Sorry to hear about Victor; I'd seen the repos but hadn't known.

Scientific translations are a different animal from what I've been working on, in ways that make them both easier and harder. Easier because scholarly communication already has a near-universal identifier (DOI) and, in principle, Crossref metadata. Harder because most translated articles never get their own DOI — they live as post-hoc PDFs on an author's site or inside an institutional repository (HAL, SciELO, J-STAGE, NII) with no machine-readable back-reference to the original, and the original's Crossref record almost never points at them. So the signal is worse than with books despite the underlying infrastructure being better.

The approach that might transfer: instead of trying to convince publishers or journals to register translations (they won't), scrape what's already sitting in institutional repositories and national scientific databases, then reconcile by author + title fingerprint + language. The multilingual matching pipeline I use for books is probably the right shape for the article problem too, though the authority side is messier there. ORCID helps; affiliations drift and make it harder.

Not something I'm committing to build, but I'd be curious to see what you and Victor had assembled if any of it is still reachable. Happy to compare notes offline if useful.


Thanks for the reply. You're right that the data for this is very fragmented. Victor was looking at Crossref metadata. I think he always had what he was doing on Codeberg, though I'm not sure. I was looking at arXiv and at printed translation indices from the 1960s to 1980s. These indices list paper translations that today sit uncatalogued in archives at the Library of Congress, British Library, and other libraries/archives. (The indices list which libraries have each translation, and in my experience what they say is accurate for the Library of Congress.) OCR was not cooperating on turning my scans of the translation indices into something I could parse, despite the indices having a regular structure indicating that they were computer-generated. LLMs likely would help with that now, but all of this was pre-ChatGPT. My plan was to automatically convert the bibliographic data in the indices to DOIs, but as it turns out, a large fraction of the articles in the indices do not have DOIs. We ultimately did not consolidate these sources.

Anyhow, it's obviously a huge task and I don't expect you to build this. I was just curious if you had thought about it as you clearly have a lot of relevant infrastructure in place. If I ever get the time and interest to work on this again, I'll reach out to you.


Similar to bragging about LOC, I have noticed in my own field of computational fluid dynamics that some vibe coders brag about how large or rigorous their test suites are. The problem is that whenever I look more closely into the tests, they are not outstanding and are less rigorous than my own manually created tests. There often are big gaps in vibe coded tests. I don't care if you have 1 million tests. 1 million easy tests or 1 million tests that don't cover the right parts of the code aren't worth much.


Yes, I've found tests are the one thing I need to write. I then also need to be sure to keep 'git diff'ing the tests, to make sure Claude doesn't decide to 'fix' the tests when its code doesn't work.

When I am rigorous about the tests, Claude has done an amazing job implementing some tricky algorithms from some difficult academic papers, saving me time overall, but it does require more babysitting than I would like.


Give Claude a separate user and make the tests not writable for it. Generally you should limit Claude to write access on only the specific things it needs to edit. This will save you tokens because it will fail faster when it goes off the rails.


Don't even need a separate user if you're on Linux (or WSL). Just use the sandbox feature; you can specify allowed directories for read and/or write.

The sandbox is powered by bubblewrap (used by Flatpaks) so I trust it.


You might want to look into property-based testing, e.g. python-hypothesis, if you use that language. It's great, and even finds minimal counter-examples.
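For anyone who hasn't seen the idea, here is a minimal stdlib-only sketch of what "property plus minimal counter-example" means. It is not Hypothesis and does not use its API; the names `find_counterexample`, `shrink`, and the deliberately buggy `dedup_sort` are made up for illustration, and a real library generates and shrinks inputs far more cleverly.

```python
import random


def dedup_sort(xs):
    """Deliberately buggy 'sort' (hypothetical): set() silently drops duplicates."""
    return sorted(set(xs))


def prop_dedup_sort_keeps_length(xs):
    """Property: sorting must not change the number of elements."""
    return len(dedup_sort(xs)) == len(xs)


def find_counterexample(prop, trials=200, seed=0):
    """Try random small integer lists until one violates the property."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        if not prop(xs):
            return xs
    return None


def shrink(prop, xs):
    """Greedy shrinking: keep deleting single elements while the property still fails."""
    changed = True
    while changed:
        changed = False
        for i in range(len(xs)):
            candidate = xs[:i] + xs[i + 1:]
            if not prop(candidate):
                xs = candidate
                changed = True
                break
    return xs


counterexample = find_counterexample(prop_dedup_sort_keeps_length)
minimal = shrink(prop_dedup_sort_keeps_length, counterexample)
print(counterexample, "shrinks to", minimal)
```

The shrunk list always ends up as two equal numbers, which points straight at the duplicate-dropping bug instead of leaving you to stare at an eight-element random list.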


“Red/green TDD” (i.e. actual TDD) and mutation testing (which LLMs can help with) are good ways to keep those tests under control.

Not gonna help with the test code quality, but at least the tests are going to be relevant.
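A rough sketch of the mutation testing idea, using only Python's `ast` module: mutate the code slightly and see whether the tests notice. The function `is_adult`, the single `>=` to `>` mutation, and the two test suites are all hypothetical; real mutation testing tools apply many more mutation operators.

```python
import ast

# Code under test (hypothetical example).
SOURCE = "def is_adult(age):\n    return age >= 18\n"


class BoundaryMutator(ast.NodeTransformer):
    """Turn every `>=` into `>`, a classic off-by-one mutant."""

    def visit_Compare(self, node):
        self.generic_visit(node)
        node.ops = [ast.Gt() if isinstance(op, ast.GtE) else op for op in node.ops]
        return node


def load(tree):
    """Compile a module AST and return the resulting namespace."""
    namespace = {}
    exec(compile(tree, "<mutant>", "exec"), namespace)
    return namespace


def weak_tests(f):
    """Easy tests that stay far away from the boundary."""
    return f(30) is True and f(5) is False


def strong_tests(f):
    """Same tests plus one that sits exactly on the boundary."""
    return weak_tests(f) and f(18) is True


def mutant_survives(tests):
    """True if the tests pass on the original code and still pass on the mutant."""
    original = load(ast.parse(SOURCE))["is_adult"]
    mutated = ast.fix_missing_locations(BoundaryMutator().visit(ast.parse(SOURCE)))
    mutant = load(mutated)["is_adult"]
    assert tests(original), "tests must pass on the unmutated code first"
    return tests(mutant)


print("weak suite lets the mutant survive:", mutant_survives(weak_tests))
print("strong suite lets the mutant survive:", mutant_survives(strong_tests))
```

A surviving mutant means the suite never exercised the behavior that changed, no matter how many tests it contains.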


If you start with the failing tests, you can give them, plus the spec, to another agent (human or silicon) for review.

It's a bit like pre-registering your study in medicine.


It's a struggle to get LLMs to generate tests that aren't entirely stupid.

Like grepping the source code for a string, or assert(1==1, true).

You have to have a curated list of every kind of test not to write or you get hundreds of pointless-at-best tests.


What I've observed in computational fluid dynamics is that LLMs seem to grab common validation cases used often in the literature, regardless of the relevance to the problem at hand. "Lid-driven cavity" cases were used by the two vibe coded simulators I commented on at r/cfd, for instance. I never liked the lid-driven cavity problem because it rarely ever resembles an actual use case. A way better validation case would be an experiment on the same type of problem the user intends to solve. I think the lid-driven cavity problem is often picked in the literature because the geometry is easy to set up, not because it's relevant or particularly challenging. I don't know if this problem is due to vibe coders not actually having a particular use case in mind or LLMs overemphasizing what's common.

LLMs seem to also avoid checking the math of the simulator. In CFD, this is called verification. The comparisons are almost exclusively against experiments (validation), but it's possible for a model to be implemented incorrectly and for calibration of the model to hide that fact. It's common to check the order-of-accuracy of the numerical scheme to test whether it was implemented correctly, but I haven't seen any vibe coders do that. (LLMs definitely know about that procedure as I've asked multiple LLMs about it before. It's not an obscure procedure.)
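To make the order-of-accuracy check concrete, here is a minimal sketch on a toy problem rather than a CFD solver. If the error behaves like e(h) ≈ C·h^p, then refining the step by a factor r gives the observed order p ≈ log(e(h)/e(h/r)) / log(r). The function names and the choice of sin(x) at x = 1 are assumptions for illustration only.

```python
import math


def central_difference(f, x, h):
    """Second-order accurate approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


def forward_difference(f, x, h):
    """First-order accurate approximation of f'(x).

    Stands in here for an implementation mistake: one-sided where
    a centered stencil was intended.
    """
    return (f(x + h) - f(x)) / h


def observed_order(scheme, f, dfdx_exact, x, h, ratio=2.0):
    """Estimate the order of accuracy from errors at step sizes h and h/ratio."""
    error_coarse = abs(scheme(f, x, h) - dfdx_exact(x))
    error_fine = abs(scheme(f, x, h / ratio) - dfdx_exact(x))
    return math.log(error_coarse / error_fine) / math.log(ratio)


p_central = observed_order(central_difference, math.sin, math.cos, x=1.0, h=1e-2)
p_forward = observed_order(forward_difference, math.sin, math.cos, x=1.0, h=1e-2)
print(f"central: p = {p_central:.3f}, forward: p = {p_forward:.3f}")
```

A verification test would assert that the observed order is close to the theoretical one (2 for the centered scheme). The one-sided version comes out near 1, which is exactly the kind of implementation error that calibrating against experiments can hide.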


Both of these points seem like things you could easily instruct an LLM to build into its testing strategy.


I think so too. In case it's unclear, I don't use LLMs for coding at the moment and was just commenting on what I've seen from others who do in computational fluid dynamics.

Edit: Let me add that while I think it would be easy to instruct an LLM to do what I'd like, LLMs don't do these things by default despite them being recognized as best practices, and I'm not confident in LLMs getting the data or references right for validation tests. My own experience is that LLMs are pretty bad when it comes to reproducing citations, and they tend to miss a lot of the literature.


> You have to have a curated list of every kind of test not to write

This should be distilled into a tool. Some kind of AST based code analyser/linter that fails if it sees stupid test structures.

Just having it in plain english in a HOW-TO-TEST.md file is hit and miss.
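A minimal sketch of what such a linter could look like for Python tests, using the stdlib `ast` module. It only knows two rules (asserts built purely from literals, and `x == x` comparisons); the function names and the sample tests are made up, and a real tool would need many more rules.

```python
import ast


def is_constant_expr(node):
    """True if the expression contains no names, calls, attributes, or subscripts."""
    return all(
        not isinstance(child, (ast.Name, ast.Call, ast.Attribute, ast.Subscript))
        for child in ast.walk(node)
    )


def find_pointless_asserts(source):
    """Return sorted (line, reason) pairs for asserts that cannot fail meaningfully."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assert):
            continue
        test = node.test
        if is_constant_expr(test):
            findings.append((node.lineno, "assertion uses only literals"))
        elif (
            isinstance(test, ast.Compare)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.Eq)
            and ast.dump(test.left) == ast.dump(test.comparators[0])
        ):
            findings.append((node.lineno, "both sides of == are identical"))
    return sorted(findings)


# Hypothetical test file contents; compute() is never called, only parsed.
SAMPLE = '''
def test_math():
    assert 1 == 1

def test_echo():
    result = compute()
    assert result == result

def test_real():
    assert compute() == 42
'''

for line, reason in find_pointless_asserts(SAMPLE):
    print(f"line {line}: {reason}")
```

Run as a CI step, a non-empty result would fail the build, which is harder for an agent to ignore than a paragraph in a markdown file.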


> have a curated list of every kind of test not to write

I've seen a lot of people interact with LLMs like this and I'm skeptical.

It's not how you'd "teach" a human (effectively). Teaching (humans) with positive examples is generally much more effective than with negative examples. You'd show them examples of good tests to write, discuss the properties you want, etc...

I try to interact with LLMs the same way. I certainly wouldn't say I've solved "how to interact with LLMs" but it seems to at least mostly work - though I haven't done any (pseudo-)scientific comparison testing or anything.

I'm curious if anyone else has opinions on what the best approach is here? Especially if backed up by actual data.


It's going to be difficult for anyone to have any more "data" than you already do. It's early days for all of us. It's not like there's anyone with 20 years of 2026 AI coding assistant experience.

However we can say based on the architecture of the LLMs and how they work that if you want them to not do something, you really don't want to mention the thing you don't want them to do at all. Eventually the negation gets smeared away and the thing you don't want them to do becomes something they consider. You want to stay as positive as possible and flood them with what you do want them to do, so they're too busy doing that to even consider what you didn't want them to do. You just plain don't want the thing you don't want in their vector space at all, not even with adjectives hanging on them.


I don't have much data to go on (in accordance with what 'jerf wrote), however I offer a high-level, abstract perspective.

The ideal set of outcomes exists as a tiny subspace of a high-dimensional space of possible solutions. Almost all those solutions are bad. Giving negative examples is removing some specific bits of the possibility space from consideration[0] - not very useful, since almost everything else that remains is bad too. Giving positive examples is narrowing down the search area to where the good solutions are likely to be - drastically more effective.

A more humane intuition[1], something I've observed as a parent and also through introspection. When I tell my kid to do something, and they don't understand WTF it is that I want, they'll do something weird and entirely undesirable. If I tell them, "don't do that - and also don't do [some other thing they haven't even thought of yet]", it's not going to improve the outcome; even repeated attempts at correction don't seem effective. In contrast, if I tell (or better, show) them what to do, they usually get the idea quickly, and whatever random experiments/play they invent, is more likely to still be helpful.

--

[0] - While paradoxically also highlighting them - it's the "don't think of a pink elephant" phenomenon.

[1] - Yes, I love anthropomorphizing LLMs, because it works.


It's not a person. Unlike a person, it has a tremendous "memory" of everything ever done that its creators could get access to.

If I tell it what to do, I bias it towards doing those things and limit its ability to think of things I didn't think of myself, which is what I want in testing. In separate passes, sure, a pass where I prescribe types and specific tests is effective. But I also want it to think of things I didn't, and a prompt like "write excellent tests that don't break these rules..." is how you get that.


Two things:

1. Tests have always been about both the function of the application and the communication of what should be occurring, to the larger team or to yourself six months down the road.

With automated software development, communication with the LLM itself is a much larger part of it, so I feel like it's "ok" to have lots of easy tests that are less about rigor and more about "yes this is how this should work".

2. Ideally we're going to get to the point where the tooling allows for adversarial agents, with one writing code and one writing tests. Even for now, just popping open a terminal window separate from your main coding terminal and generating + running tests in it is helpful.


The trick is crafting the minimal number of tests.


It is like reward hacking, where the reward function (in this case, the tests) is exploited to achieve the goal. It wants to declare victory and be rewarded, so the tests it writes are not critical of the code under test. This is probably in the RL pre-training data; I am of course merely speculating.


I do CFD in my day job, though not for electronics cooling. I don't think this is as easy as you imagine. It's relatively easy to make pretty pictures, but just because the picture is pretty doesn't mean that it's physically accurate or mathematically correct. Lack of resolution could be an issue, but there are plenty of more subtle problems as well. Jet impingement is known to cause problems with turbulence models, though some models claim to solve the issue. Plus, turbulence modeling isn't always predictive, and might require a certain amount of calibration any time a model is used in a new scenario. Add on top of that the fact that the computational cost of these simulations often is extremely high, even with turbulence models. Maybe people building PCs have plenty of unused CPUs and GPUs, though.

Unfortunately, I don't think CFD and turbulence modeling are things that you can just start doing well without learning a lot before starting.


You are probably right, my only exposure to CFD was through listening in at conferences, haha. It seems neat, though. They always had the coolest pictures.

I wonder, could there be any play in the fact that PC cases tend to be a little bit less general than just, like, any 3D model? There are only so many cases. Plus most of the parts are rectangular, and most of the surfaces are aligned to the same set of axes.

Cabling might be a problem.


Location: United States (Open to any US location)

Remote: Yes, open to remote, hybrid, or in office

Willing to relocate: Yes

Technologies: Fortran, Python (Matplotlib, Numpy, Pandas, Scipy), OpenMP, Git/GitHub, Linux, Bash, others...

Résumé/CV: Available on request

Email: 7b8ci3kl@trettel.us

GitHub: https://github.com/btrettel

Personal website: http://trettel.us/

I'm Ben Trettel, an experienced mechanical engineer with a PhD, specializing in computational fluid dynamics, design optimization, and verification & validation of computer simulations.

I am particularly interested in opportunities to build cutting-edge physical products where computational simulation and design optimization are key.


Cool! Are you interested in defense? If so, we're based in Austin, TX and in the current YC batch.


I agree, my immediate reaction was that a mechanical engineer is not a trades worker.

I majored in mechanical engineering at college. We had a required programming class. A lot of people like myself already knew how to program before we took the class too. We also had a required electronics class. My experience is that most folks with CS degrees would be surprised by the breadth of what mechanical/aerospace/chemical/etc. engineers learn.


Location: United States (Open to any US location)

Remote: Yes, open to remote, hybrid, or in office

Willing to relocate: Yes

Technologies: Fortran, Python (Matplotlib, Numpy, Pandas, Scipy), OpenMP, Git/GitHub, Linux, Bash, others...

Résumé/CV: Available on request

Email: 7b8ci3kl@trettel.us

GitHub: https://github.com/btrettel

Personal website: http://trettel.us/

I'm Ben Trettel, an experienced mechanical engineer with a PhD, specializing in computational fluid dynamics, design optimization, and verification & validation of computer simulations. Also, I am knowledgeable about patent law from time spent at the USPTO as a patent examiner.

I am particularly interested in opportunities to build cutting-edge physical products where computational simulation and design optimization are key.


Nice work. I made a similar (but much less capable) Python script [1] for my own use before and I can say that a tool like this is useful to keep the docs in sync with the code.

My script only detects whether a checksum for a segment of code doesn't match, using directives placed in the code (not a separate file as you've done). For example:

    #tripwire$ begin 094359D3 Update docs section blah if necessary.
    [...]
    #tripwire$ end

Also, my script knows nothing about pull requests and is basically a linter. So it's definitely not as capable.

[1] https://github.com/btrettel/flt/blob/main/py/tripwire.py
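For readers who want the gist without opening the link, here is an independent sketch of the directive idea. It is not the code in tripwire.py: the use of CRC32 is an assumption (chosen only because the checksum above is 8 hex digits), and the function names, regexes, and sample segment are invented for illustration.

```python
import re
import zlib

# Directive syntax follows the example above; everything else is an assumption.
BEGIN = re.compile(r"^\s*#tripwire\$ begin ([0-9A-Fa-f]{8}) (.*)$")
END = re.compile(r"^\s*#tripwire\$ end\s*$")


def checksum(lines):
    """CRC32 of the guarded lines as 8 uppercase hex digits (assumed algorithm)."""
    return format(zlib.crc32("\n".join(lines).encode("utf-8")), "08X")


def check_tripwires(source):
    """Return (line_number, message) for each guarded segment whose checksum changed."""
    stale = []
    start = None  # (line number, expected checksum, message) of the open directive
    body = []
    for number, line in enumerate(source.splitlines(), start=1):
        begin = BEGIN.match(line)
        if begin:
            start = (number, begin.group(1).upper(), begin.group(2))
            body = []
        elif END.match(line) and start is not None:
            if checksum(body) != start[1]:
                stale.append((start[0], start[2]))
            start = None
        elif start is not None:
            body.append(line)
    return stale


# Hypothetical guarded segment.
guarded = ["def area(r):", "    return 3.14159 * r * r"]
template = "#tripwire$ begin {} Update docs section blah if necessary.\n{}\n#tripwire$ end\n"
fresh = template.format(checksum(guarded), "\n".join(guarded))
edited = fresh.replace("3.14159", "3.14")

print(check_tripwires(fresh))   # nothing reported for unchanged code
print(check_tripwires(edited))  # reports the directive whose segment changed
```

After updating the docs, you would recompute the checksum and paste it into the `begin` line to re-arm the tripwire.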

***

Edit: I just checked my notes. I might have got the idea for my script from this earlier Hacker News comment: https://news.ycombinator.com/item?id=25423514


This is really cool, had no idea someone had solved a similar problem this way. The checksum idea is genius!!


Where's the report referred to here? I'm doing Google searches including `site:morganstanley.com` for a bunch of quotes in this article and I can't find any single report that contains all of what's mentioned. I couldn't find anything by browsing their website either. I'm wondering if a lot of this is AI hallucination.

