Hacker News | rothific's comments

There have been a lot of conversations recently about how model alignment is relative and diversity of alignment is important - see the recent podcast episode between Jack Clark (co-founder of Anthropic) and Ezra Klein.

Many comments here point out that Mistral's models are not keeping up with other frontier models - this has been my personal experience as well. However, we need more diversity of model alignment techniques and companies training them - so any company taking this seriously is valuable.


They'll get there.

This is a fascinating report, not because of the content or even quality of the report, but because of the way it was generated. It is an AI-generated report dumped into GitHub, and it has made it onto the front page of Hacker News with over 1,000 upvotes and many comments.

This type of GitHub-based open-source research project will become more common as more people use tools like Claude Code or Codex for research.


It's not slop when it confirms my biases. /s

Hmmm.

_GPT, prioritize truth over comfort, challenge assumptions, and avoid flattery. And analyze the patterns of biases in my prompts, and then don’t do that… or something_

Give it time, we’ll come up with something.


Theory: terminal apps are closing the agent self-improvement loop because agents can use TUIs more easily than web/desktop/mobile.

Anomaly (which builds OpenCode + OpenTUI) is also doing some really interesting stuff in this space with their custom renderer. And then there's Ink (https://github.com/vadimdemedes/ink), which is what Claude Code uses. I also built Ink Web (https://github.com/cjroth/ink-web) to make Ink work in the browser.

The virality of OpenClaw and Claude Code has me wondering if terminals could actually go mainstream (e.g., be used by non-technical users). More thoughts here: https://www.cjroth.com/blog/2026-03-05-terminals-are-cool-ag...


You know what's even easier for AI agents to use than TUIs? CLIs.

My experience has been that agents suck at using TUIs, and are good at using CLIs. I would argue that agents are a reason that TUIs might die in favor of CLIs.


I agree, agents struggle with TUIs. I do think this is easy to fix though (here's an interesting approach: https://github.com/remorses/ghostty-opentui). I think agents will have much better luck with TUIs than browsers.

The more interesting scenario IMO is having apps that are both TUIs AND CLIs where the agent uses the CLI but can pause and show the user a TUI for complex tasks where the user needs to input something.
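A minimal sketch of that dual-mode idea, assuming a single required value. The `--name` flag, the prompt text, and the `resolve_name` helper are all hypothetical and not taken from any tool mentioned in this thread. Flags are the agent-friendly path, and the interactive prompt stands in for where a real app would open a TUI:

```python
import argparse


def resolve_name(argv, is_interactive, ask=input):
    """Return a project name from flags, falling back to an interactive prompt.

    The `--name` flag and the prompt are illustrative assumptions only.
    """
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--name")
    args = parser.parse_args(argv)

    if args.name is not None:
        # CLI path: everything was passed as flags, so an agent or script
        # never has to drive an interactive screen.
        return args.name

    if is_interactive:
        # Interactive path: a human is at the terminal, so pause and ask.
        # A real app could launch a full TUI here instead of a prompt.
        return ask("Project name: ").strip()

    # No flag and no terminal: fail loudly instead of hanging on input.
    raise SystemExit("error: --name is required when stdin is not a TTY")
```

A caller would pass `sys.argv[1:]` and `sys.stdin.isatty()`, so the same binary behaves as a plain CLI under an agent and as an interactive tool for a person.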


> I think agents will have much better luck with TUIs than browsers.

I’m very skeptical. Why would you think that? TUIs inherently don’t provide programmatically accessible affordances; if they have any affordances at all, they’re purely visual cues that are unstandardized and of varying quality.

Compare that to the DOM in a browser where you’ve got numerous well-understood mechanisms to convey meaning and usability. Semantic HTML and ARIA roles. These things systematically simplify programmatic consumption.
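To illustrate the point about programmatic consumption, here is a rough sketch that lists the affordances in a made-up HTML snippet using only Python's standard `html.parser`. The `IMPLICIT_ROLES` table is a deliberately tiny, simplified subset of the implicit role mapping, and the markup is invented for the example:

```python
from html.parser import HTMLParser

# Simplified assumption: only a few elements with well-known implicit roles.
IMPLICIT_ROLES = {"button": "button", "nav": "navigation"}


class AffordanceParser(HTMLParser):
    """Collect (role, aria-label) pairs from start tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # An explicit role attribute wins; otherwise fall back to the tag.
        role = attributes.get("role") or IMPLICIT_ROLES.get(tag)
        if role is None and tag == "a" and "href" in attributes:
            role = "link"
        if role:
            self.found.append((role, attributes.get("aria-label")))


def list_affordances(html):
    parser = AffordanceParser()
    parser.feed(html)
    parser.close()
    return parser.found


SAMPLE = (
    '<nav><button aria-label="Save">S</button>'
    '<div role="dialog" aria-label="Settings"></div>'
    '<a href="/docs">Docs</a><span>plain text</span></nav>'
)
```

Calling `list_affordances(SAMPLE)` yields the roles and labels without any screen scraping, which is the kind of structure a raw terminal screen buffer does not expose by default.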


Personally I love it. It just feels fast and minimal. I'm on Mac.


My team is building a cross-platform app with Tauri that is mobile, web, and desktop in one codebase and we've had almost nothing bad to say. It's been great. Also the executable size and security are amazing. Rust is nice. We haven't done as much with it yet, but it will come in useful soon as we plan to implement on-device AI models that are faster in Rust than WebGPU.


I'm glad I'm not the only one whose features are obsolete by the time they're ready to ship!


Feature request: renaming tabs! Helps keep tabs organized instead of 20 tabs with similar names. (See Zed for example)

Also! I'm considering Ghostty web (https://github.com/coder/ghostty-web) for my project Ink Web. It's awesome that Ghostty can work in the browser to replace xterm.js.

https://github.com/cjroth/ink-web/pull/1

Project: https://www.ink-web.dev/


I'm hacking on a similar thing that lets you build CLIs directly into xterm / ghostty web so that they also work in the terminal. I.e., they are cross-platform: web and CLI.

Demo: https://wizard-nine-blush.vercel.app/

Project: https://ink-web.dev

I love the approach of just bash as well. Super cool!

Edit: warning - don't use it on mobile.


A short philosophical essay.

~* There is no difference between a model that escapes its sandbox and a model that emulates escaping a sandbox.

Originally I posted this on LessWrong, but it's stuck in the moderation queue, so I thought I'd post it here too.


> The people winning mostly had a head start. Or they have money. Usually both.

It feels like that, doesn't it? But, as one counter-point, OpenClaw. :)

Btw I did a deep-dive into AI moats last week and wrote a blog post about it. Relationships were most likely the strongest moat from my research - but definitely having a large amount of money in reserves helps. https://www.cjroth.com/blog/2026-02-11-moats-in-the-age-of-a...

