If you look at it through the lens of Starlink alone, though, it makes a lot of sense. The new v3 sats do solve a lot of Starlink's problems in denser (suburban) areas, with much more capacity per sat.
On the happy path, if Starship works and v3 works, each Starship launch would put up ~60 Tbit/sec of downstream capacity and ~10 Tbit/sec of upstream (ideal-world figures, yada yada).
With Falcon 9 and v2 Mini, each launch provides ~2 Tbit/sec of downstream and ~0.2 Tbit/sec of upstream.
So roughly 30x the downstream and 50x the upstream.
The upstream is probably the critical part to improve for congestion as well (on asymmetrical technologies like DOCSIS, upstream usually gets saturated before downstream in most retail ISP configs, and that causes a terrible user experience).
So Starship could be ~40x the cost per launch of Falcon 9 and it would still make sense financially. I assume it won't be anywhere near that. And this doesn't take into account how much further ahead Starlink will be of all the other competitors trying to spring up.
I think if you ignore everything else and just focus on Starlink (which has the potential to be a $100bn/yr business easily) it is worth them rolling the dice. Whether it actually works or not is interesting to discuss - but figures like $500m/launch are really pocket change for the scale (and potential scale) that Starlink operates at.
Unfortunately this is unlikely to be successful. There is just no way you can run a reliable overnight passenger service on the intercity rail infrastructure of the US.
The track is mostly single track and heavily used by freight (where a few hours' delay isn't the end of the world). Multi-hour delays are extremely common, and even with it being overnight, if you set off at say 11pm and aim to arrive at 8am, a 3-hour delay could see you arriving at 11am and your VIP passengers missing all their business meetings. They won't return!
FWIW I don't think Europe's push for overnight sleeper trains will be very effective either. It doesn't work well with overnight maintenance windows, and the yield per train is extremely low (100 passengers in 50 'rooms' vs ~1,000 seated passengers dictates a ~10x ticket price). It's also extremely complex in Europe, with many different signalling/communication systems, traction systems, etc.
Agree with all of this. The ridership might be there but I doubt the business case is - and this is true in Europe as well. As far as I can tell sleeper services generally are not money-makers, and are subsidized by regular passenger service.
Which may be a desirable policy outcome for state rail agencies - but this is a private venture!
I think cost is an under-appreciated aspect of this. You're carrying 5-10x fewer passengers per-train, at greater cost (the cost of turning over a stateroom is many times higher than cleaning a coach seat, along with linens, food, etc.), on very expensive custom equipment that isn't suitable for other uses.
There seem to be two "major" (really heavy scare quotes here) players in the US private sleeper service scene. Dreamstar IMO is the more promising of the two (heavy caveat that this is relative to each other, not absolute odds) by realizing the only way to make the economics work is the ability to charge $$$$$ for tickets.
The other (Lunatrain) IMO is just out to lunch, with a claimed focus on affordability. None of the above leads to affordable tickets.
As somebody who is working on exactly this problem, I’d say the problem can be solved technically if one can get a high density of passengers, while providing privacy and comfort.
Most of the sleeper startups basically just work with renderings, we work with iterating on full sized mock ups. We did ergonomics/market testing with hundreds of test users. We have evidence that with the right cabin technology, you can be profitable, even produce a margin, and significantly disrupt air travel.
Have taken this in Europe, was really nice, and I consider this quite competitive with inter-European airlines:
It saves a lot of time, because you can use central train stations instead of transferring to/from the airport, and you depart late in the evening (getting full use out of the departure day) and arrive fairly early in the morning (so you don't lose much of that day, either).
So it does not have to be cheaper than a domestic flight, it just has to be competitive with flight + 2x airport transfer + hotel, and while it might be slightly less comfortable than a hotel room, you avoid airport transfers and security, which is nice.
Yes, but it's clearly not competitive because it has such a tiny market share and runs at a giant loss. And that's in Europe, with exceptional infrastructure compared to the US (a lot of 4-track lines, with everything at least double track, allowing freight to get out of the way).
The Northeast Regional (including Acela) is also very successful. But you may not consider that long distance. In fact, I thought most of Amtrak's profits came from that. There are a couple of other city pairs that do reasonably but, yes, mostly not.
I took the Acela NY to Boston this year, and won't ever do it again. What was the time difference from the non-high speed supposed to be? I think it's a 40 minute difference or something.
We had multiple delays along our route and the 4 hour trip turned into almost 7.
Yeah, there are various regions but I don't consider it long distance until it has a sleeper (which I don't believe Acela has).
Pacific Surfliner is also very popular and close to profitable.
To be fair, Amtrak could make some or more of its routes profitable depending on how it allocates expenses - after all, they get hired to run some of the commuter rails, too.
No, Acela doesn't really make sense for Boston to DC travelers in general although I've done it when time wasn't really critical. Yeah Pacific SW combinations can work--again if time isn't of the essence. Have also done other relatively short routes like Raleigh to Charlotte.
LA → San Diego is competitive or even beats car trips (during rush hour).
It’s always my great sadness that they didn't make that the first high-speed rail segment; heck, a few billion dollars on double track and other improvements would have cut an hour off that time. I don’t even know if LA Union Station has straight-through tracks yet.
I would be surprised if it has even 1% market share on the routes they are on (vs driving, coach, day trains and flights).
For example, in the UK on London -> Edinburgh, the Caledonian Sleeper has on average 250 passengers per day (and this is the entire route - not just London to Edinburgh). Given there are roughly ~2tph throughout the day for about 16 hours a day, each with ~1,000 seats (with very high load factors), that's about 30,000 passengers/day on the "day" trains. Probably roughly that again flying. Add driving and coaches and it is absolutely tiny.
I almost reflexively avoid both intra-European air travel and renting a car when I can. But if you're being cost-conscious, it's hard to avoid the European budget airlines compared to other alternatives a lot of the time, especially if you're traveling between major cities.
Like, where? I lived in four EU countries and travelled to most of the member states. I only heard of the Austrian sleeper and some luxury Swiss experience. Never heard of anyone taking a sleeper in modern times. The Moscow-Paris train used to be a thing, but that's in the past.
The fact that _you_ have not heard of it does not mean that it does not exist. The Austrian ÖBB has created a massive network over the last few years, including some innovative new rolling stock with sleeping capsules. Many classic routes within larger countries still exist, like the multiple routes in the UK, France and Sweden. There are new private startups like European Sleeper that operate an Antwerp - Prague sleeper.
There's the London to Edinburgh Caledonian Sleeper, which I took, and I took a sleeper from Brussels to Koln to Vienna? Koln was a stopover. Forget where the sleeper segment was. But I'll admit these were situations where neither time nor money were critical and they were intermixed with business trips.
In the UK at least the grocery stores are completely empty. I've barely seen anyone in them. Bizarrely they are shutting loads down in London but opening new ones at the same time. Absolutely no idea what the strategy is, they must be throwing out the majority of the fresh food stock they have.
The problem I've got with JWTs is that you can rarely (never, really, in my experience?) assume anything in the JWT apart from the user id is valid for a long period of time.
For the simplest use case of client auth state: you want to be able to revoke auth straight away if an account is compromised. This means you have to check the auth database on every request anyway, and you probably could have got whatever else was in the claims there quickly.
Same with roles; if you downgrade an admin user to a lower 'class' of user then you don't want it to take minutes to take effect.
So then all you are left with is a unified client id format, which is somewhat useful, but not really the 'promise' of JWTs (I feel?).
Active user logouts, deletions, and permission changes are rare, so the size of revocation lists is extremely small compared to the number of tokens in existence.
You can keep revocations in a very fast lookup system (e.g. broadcasts + an in-memory store), combined with reasonably short token renewals, like 5-60 minutes.
This massively cuts down the number of token validity checks and makes the system tolerant to downtime of the auth system. That's less relevant for basic apps where the auth data is in the same DB as all the other data, but that is rarely the case in larger systems.
And when building distributed systems a bunch of systems don't really care about those changes immediately anyway so the propagation delay is acceptable, or you can push the burden of refreshing to the client for privilege expansion cases, etc.
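A minimal TypeScript sketch of that pattern (library choice, token lifetime, and the broadcast wiring here are assumptions, not a prescription): tokens are verified locally, and revoked token IDs arrive over whatever broadcast channel you use and only need to be remembered for as long as a token could still be alive.

```ts
import { jwtVerify } from 'jose';

const TOKEN_TTL_SECONDS = 15 * 60;             // assumed 15-minute token lifetime
const revokedJtis = new Map<string, number>(); // jti -> unix seconds after which it can be forgotten

// Hook this up to your broadcast mechanism (pub/sub, Postgres NOTIFY, etc.).
export function onRevocationBroadcast(jti: string) {
  revokedJtis.set(jti, Math.floor(Date.now() / 1000) + TOKEN_TTL_SECONDS);
}

// Entries older than the max token lifetime can no longer match a live token, so drop them.
setInterval(() => {
  const now = Math.floor(Date.now() / 1000);
  for (const [jti, dropAfter] of revokedJtis) {
    if (dropAfter < now) revokedJtis.delete(jti);
  }
}, 60_000);

// Per-request check: signature + expiry are verified locally, then the in-memory deny set.
export async function verifyRequestToken(token: string, signingKey: Uint8Array) {
  const { payload } = await jwtVerify(token, signingKey); // throws on bad signature or expired token
  if (typeof payload.jti === 'string' && revokedJtis.has(payload.jti)) {
    throw new Error('token revoked');
  }
  return payload;
}
```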
This makes it sound like you've only worked in an extremely narrow domain.
It's not rare, it happens constantly in enterprise software, project management software, anything where you have collaboration.
What is so frustrating about tech like JWTs is that it fits the fairly rare, high-profile websites like Reddit, Netflix, etc. but doesn't fit ANYTHING else.
Everyone else wants immediate revocation of rights, not waiting for a token to expire.
And yet we all have to suffer this subpar tech because someone wrote a blog post about it and a bunch of moronic software "architects" made it the only option. If you don't JWT somehow you're doing it wrong, even though it should in fact be an extremely niche way of doing Auth at scale.
Simple cookie based tokens were and still are a much better choice for many applications.
The size of the revocation list is irrelevant. As soon as you have to do a call to get the revocation list you might as well just include the rest of it in the call as well.
It’s important to consider that JWT is a series of specs and folks can choose to use any of them to suit their needs.
In fact, it can be used to create simple tokens—even if you store them in a database in a traditional authentication sense.
But it is also helpful to be able to use OIDC, for example, with continuous delivery workflows to authenticate code for deployment. These use JWT and it works quite well I think.
Note: technically JWT is only one of the specs so it’s not exactly correct how I’m referring to it, but I think of them collectively as JWT. :)
> What is so frustrating about tech like JWTs is that it fits the fairly rare, high-profile websites like Reddit, Netflix, etc. but doesn't fit ANYTHING else.
This is only conceivably true if your ability to design services only goes as far as reusing Reddit-like use cases for everything and anything.
But everyone else is not encumbered by that limitation.
> Everyone else wants immediate revocation of rights, not waiting for a token to expire.
Where exactly does a JWT prevent you from rejecting revoked tokens? I mean, JWTs support short-lived tokens, jti denylists, single-use tokens with nonces, etc. Why are you blaming JWTs for problems you're creating for yourself?
Can you tell me of any instance where someone's auth needed to be revoked within 5 minutes and a delay was not acceptable? I think it's more of an imaginary 'five nines' engineering thing than real life.
Firstly I don't think most people who use JWTs use 5 min refreshes. But even assuming that - any collaboration software. Imagine you invite a user by mistake to an internal wiki, you don't really want them looking at the content for 5 minutes. Much better to be able to revoke instantly.
Then you have anything that handles financial data. If you're a bank and you get a call that a fraudster is taking over an account, you want to be able to revoke access straight away. Waiting another 5 minutes could mean many thousands more in losses (simplified example, but you hopefully get my drift), for which the regulator may arguably hold the bank liable.
There are also many other "UX" problems; you don't want roles to be out of sync for 5 minutes either. Imagine you are collaborating on a web app and you need to give a colleague write access to the system for an urgent deadline. She's sitting next to you and has to wait 5 minutes (or do a forced logout/login) before she gets access, even after refreshing the page.
Finally, it's really far from ideal to be using 5 min refreshes. For idle users with a tab open, you will have clients constantly pinging the backend for refreshes. Imagine some sort of IoT use case where you have thousands of devices on very bandwidth-limited wide area networks.
Furthermore - it's a total mess on mobile apps. Imagine you have an app (say a food delivery app) that is powered by push notifications for delivery status. If you've got a 5 min token and you push down an update via push notifications telling it to get new data from a HTTP endpoint to update a widget, your token will almost certainly be expired by the time the delivery is on the way. You then need to do a background token refresh which may or may not be possible on the OS in question.
You don't tell them they are fired and then revoke access immediately. Either access is already revoked or they are given a reasonable time to close out (you have until end of day before we revoke access; we will revoke access after this meeting; etc.). Either way, a JWT expiring every second versus every 5 minutes does not change things.
I'm trying to be sensible here not dream up straw man scenarios of which there are many.
Some auth servers implement it. Keycloak does[0]. Auth0 doesn't as far as I can tell[1]. FusionAuth (my employer) has had it listed as a possible feature for years[2] but it never has had the community feedback to bubble it up to the top of our todo list.
I don't think status lists solve the requirement for near-realtime revocations.
The statuslist itself has a TTL and does not get re-loaded until that TTL expires. This is practically similar to the common practice of having a stateful refresh token and a stateless access token. The statuslist "ttl" claim is equivalent to the "exp" claim of the access token in that regard, and it comes with the same tradeoffs. You can have a lower TTL for statuslist, but that comes at the cost of higher frequency of high-latency network calls due to cache misses.
The classic solution to avoid this (in the common case where you can fit the entire revocation list in memory) is to have a push-based or pub/sub-based mechanism for propagating revocations to token verifiers.
If you read the draft, the TTL is clearly specified as optional.
> (...) and does not get re-loaded until that TTL expires.
That is false. The draft clearly states that the optional TTL is intended to "specify the maximum amount of time, in seconds, that the Status List Token can be cached by a consumer before a fresh copy SHOULD be retrieved."
> You can have a lower TTL for statuslist, but that comes at the cost of higher frequency of high-latency network calls due to cache misses.
The concept of a TTL specifies the staleness limit, and anyone can refresh the cache at a fraction of the TTL. In fact, some cache revalidation strategies trigger refreshes at random moments well within the TTL.
There is also a practical limit to how frequently you refresh a token revocation list. Some organizations have a 5-10min tolerance period for basic, general-purpose access tokens, and fall back to shorter-lived and even one-time access tokens for privileged operations. So if you have privileged operations being allowed when using long-lived tokens, your problem is not the revocation list.
In that case, when and how would you reload the statuslist?
Again, it doesn't matter if TTL and caching is optional, what matters is that this specification has NOTHING to do with a pub/sub-based or push-based mechanism as described by GGGP.
This draft specifies a list that can be cached and/or refreshed periodically or on demand. This means that there will always be some specified refresh frequency and you cannot have near-real-time refreshes.
> There is also a practical limit to how frequently you refresh a token revocation list. Some organizations have a 5-10min tolerance period for basic, general-purpose access tokens, and fall back to shorter-lived and even one-time access tokens for privileged operations. So if you have privileged operations being allowed when using long-lived tokens, your problem is not the revocation list.
That's totally cool. Some organizations are obviously happy with delayed revocations for non-sensitive operations, which they could easily achieve with stateful refresh tokens, without the added complexity of revocation lists. Stateful and revocable refresh tokens are already supported by many OAuth 2.0 implementations such as Keycloak and Auth0[1]. All you have to do is set the access token's TTL to 5-10 minutes and you'll get the same effect as described above. The performance characteristics may be worse, but many apps that are happy with delayed revocation are happy with this simple solution.
Unfortunately, there are many products where immediate revocation is required. For instance, administrative dashboards and consoles where most operations are sensitive. You can force token validity check through an API call for all operations, but that makes stateless access tokens useless.
What the original post above proposed is a common pattern[2] that lets you have the performance characteristics (zero extra latency) of stateless tokens together with the security characteristics of a stateful access token (revocation is registered in near-real-time, usually less than 10 seconds). This approach is supported by WSO2[3], for instance. The statuslist spec does nothing to standardize this approach.
> In that case, when and how would you reload the statuslist?
It only depends on your own requirements. You can easily implement pull-based or push-based approaches if they suit your needs. I know some companies enforce a 10min tolerance on revoked access tokens, and yet some resource servers poll them at a much higher frequency.
> Again, it doesn't matter if TTL and caching is optional (...)
I agree, it doesn't. TTL is not relevant at all. If you go for a pull-based approach, you pick the refresh strategy that suits your needs. TTL means nothing if it's longer than your refresh periods.
> This draft specifies a list that can be cached and/or refreshed periodically or on demand. This means that there will always be some specified refresh frequency and you cannot have near-real-time refreshes.
Yes. You know what makes sense for you. It's not for the standard to specify the max frequency. I mean, do you think the spec should specify max expiry periods for tokens?
Try to think about the problem. What would you do if the standard somehow specified a TTL and it was greater than your personal needs?
> you want to be able to revoke auth straight away if an account is compromised
It really depends on the system. In my experience, there are tons of apps that want to be able to revoke access but weigh that against transparent re-authentication. OIDC handles both nicely with:
* short access/id token lifetimes (seconds to minutes)
* regular transparent refreshes of those tokens (using a refresh token that is good for days to months)
This flexibility lets developers use the same technology for banks (with a shorter lifetime for both access/id tokens and refresh tokens) and consumer applications (with a short lifetime for access/id tokens and a longer lifetime for refresh tokens).
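For illustration, a transparent refresh is just the standard OAuth 2.0 refresh_token grant (RFC 6749, section 6); the endpoint URL and client_id below are placeholders, and confidential clients would also send a client secret or use another client authentication method.

```ts
// Hypothetical helper: exchange a long-lived refresh token for a fresh short-lived
// access/id token. If the refresh token itself has been revoked, this fails and the
// app falls back to an interactive login.
async function refreshTokens(refreshToken: string) {
  const res = await fetch('https://auth.example.com/oauth/token', {
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
    body: new URLSearchParams({
      grant_type: 'refresh_token',
      refresh_token: refreshToken,
      client_id: 'my-public-client', // placeholder
    }),
  });
  if (!res.ok) throw new Error('refresh rejected; re-authenticate interactively');
  return res.json() as Promise<{
    access_token: string;
    expires_in: number;
    refresh_token?: string;
    id_token?: string;
  }>;
}
```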
> For the simplest use case of client auth state: you want to be able to revoke auth straight away if an account is compromised. This means you have to check the auth database on every request anyway, and you probably could have got whatever else was in the claims there quickly.
FWIW, I built a system previously that got around this "having to check the DB on every access to check for revocations" issue that worked quite well. Two important things to realize:
1. Revocations (which are usually basically "explicit logout") are actually quite rare in a lot of user application patterns. E.g. for many web apps, users very rarely explicitly log out. It's even rarer for mobile apps.
2. You only need to keep around a list of revocations for as long as your token expiry is. For example, if your token expiration is 30 mins, and you expire a user's tokens at noon, by 12:30 PM you can drop that revocation statement, because any tokens affected by that revocation would have expired anyway.
Thus, if you have a relatively short token expiration (say, half an hour), your token revocation list can almost always fit in memory. So here's what I built:
1. The interface to see if a token has been revoked is basically "getEarliestTokenIssuedAt(userId: string): Date" - essentially, the earliest possible issuance timestamp for a token from a particular user to be considered valid. So revoking a user's previously issued tokens just means setting this date to Now(); any token issued before that will be considered invalid.
2. I had a table in Postgres that just stored the user ID and the earliest valid token date. I used Postgres' NOTIFY functionality to send a broadcast to all my servers whenever a row was added to this table.
3. My servers then just had what was a local copy of this table, but stored in memory. Again, remember that I could just drop entries that were older than the longest token expiration date, so this could fit in memory.
On the off chance that somehow the current revocation list couldn't fit in memory, I built something into the system that allowed it to essentially say "memory is full", which would cause it to fall back to calling Postgres; but again, that situation would naturally clear up after a few minutes once revocations went back down and the token expiration window passed.
This sounds more complicated than it actually was. It has the benefits of:
1. Almost no statefulness, which was great for scalability.
2. Verifying a token could still (almost) always be done in memory. Over a couple of years of running the system I never actually hit a state where the in-memory revocation list got too big.
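A rough sketch of how that could look in TypeScript with node-postgres; the channel name, payload shape, and function names are invented for illustration, and the original system's details may differ.

```ts
import { Client } from 'pg';

// userId -> earliest unix timestamp (seconds) at which an issued token is still valid
const earliestValidIat = new Map<string, number>();

export async function startRevocationListener(connectionString: string) {
  const pg = new Client({ connectionString });
  await pg.connect();
  // Assumes a trigger on the revocation table calls pg_notify('token_revocations', payload).
  await pg.query('LISTEN token_revocations');
  pg.on('notification', (msg) => {
    if (!msg.payload) return;
    const { userId, notBefore } = JSON.parse(msg.payload) as { userId: string; notBefore: number };
    earliestValidIat.set(userId, notBefore);
    // Entries older than the max token lifetime can be pruned, as described above.
  });
}

// Called after the JWT's signature and exp have already been verified locally.
export function isIssuedAtAcceptable(userId: string, iat: number): boolean {
  const floor = earliestValidIat.get(userId);
  return floor === undefined || iat >= floor;
}
```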
Just seems like a lot of extra fiddly stuff to go wrong for monolithic apps. I get it if you have gone all in on microservices, as each "client" request can fan out to hundreds of requests, each requiring an auth check.
But still, I'm not sure I've seen an auth/roles database that couldn't fit (at least) the important stuff in RAM, fwiw. Even 1TB of RAM is relatively affordable (if you are not on the hyperscalers) and you could fit billions of users in that, which at least in theory means you can just check everything and not have another store to worry about.
> You only need to keep around a list of revocations for as long as your token expiry is. For example, if your token expiration is 30 mins, and you expire a user's tokens at noon, by 12:30 PM you can drop that revocation statement, because any tokens affected by that revocation would have expired anyway.
And this sort of thing is basically what Redis is for, right? Spin up a Docker container, use it as a simple key-value store (really just a key store). When someone manually invalidates a token, push it in, with the expiry date it has anyway.
You might not even need to store the token itself, just a piece of data contained in the claims saying the account is in a good state. Any number of tokens can then be issued, and the validation step would ensure the claims are still correct.
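Something like this, perhaps, using node-redis (key naming and the exact check are assumptions): the revoked token ID is stored with a TTL equal to the token's remaining lifetime, so the deny list cleans itself up.

```ts
import { createClient } from 'redis';

const redis = createClient({ url: 'redis://localhost:6379' });
await redis.connect(); // ESM top-level await

// Record a manual revocation; the key disappears once the token would have expired anyway.
export async function revokeToken(jti: string, tokenExpUnixSeconds: number) {
  const remaining = tokenExpUnixSeconds - Math.floor(Date.now() / 1000);
  if (remaining > 0) await redis.set(`revoked:${jti}`, '1', { EX: remaining });
}

// Per-request check after the JWT signature/expiry have been verified.
export async function isRevoked(jti: string): Promise<boolean> {
  return (await redis.exists(`revoked:${jti}`)) === 1;
}
```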
What you describe sounds like it will make any explicit log out action users do on any device turn into a ”log me out from all devices” action, which was probably not at all the user’s intent unless that is the only explicit option you give them.
A "logout" action from the user should just delete the JWT from the device he is using. Asuming the token wasn't compromised, there is no backend work involved.
Is this as secure as doing a blacklist for non-expired tokens? No, it isn't. It is a sane tradeoff between decent security and implementation complexity.
Terminating sessions on other devices is not possible, but another tradeoff is using a "Logout from all devices" mechanism. In that case you just have a global "token not issued before" field, and when you log out from all devices, you set that timestamp to the current time (and all previously issued tokens will fail authentication).
But again, a tradeoff. Your individual requirements may vary.
> 1. Almost no statefulness, which was great for scalability.
This is called "eventual consistency", it's probably fine in practice but you still do have a lot of state. Personally, if I have any in-application state at all, I would use a sticky cookie on the LB to send each client to the same instance.
This seems like about the best that can be done (well, you could go full Bloom filter to squeeze that revocation list size down even further), but it does seem vulnerable to DoS: Create 10000 accounts and log them all out at the same time to force the server into the slow PostgreSQL mode.
Any system that allows you to create 10000 accounts is already vulnerable to DoS.
Also, as vintermann suggested, you can use a faster, domain-specific database if you're concerned about this becoming an issue. And sometimes edge cases like this aren't worth considering until you hit them.
Yes, but generally magic links are only used for authentication. So if you delete or downgrade the principal, whoever uses that magic link to authenticate can only perform the operations associated with the principal, and that check is performed after the magic link is verified, unless the magic link is also used to carry auth claims.
Clicking on links in emails is a security risk because they could be spam. I don't do that unless it's the only way to move forward and then I double check the url. Basically I only use it to sign up then never again if possible.
I have a random idea regarding compromised tokens, which may not hold water.
What if you put things like the client's IP address in the token? Then the server can reject (and mark for compromise) as soon as they receive any request from a different ip address?
I realise this will also invalidate people who somehow roam between IP addresses, say DHCP/wireless in a larger building.
Enterprise customers often have split-tunnel VPNs or proxies (with PAC configs) where part of the traffic may go through a VPN and another part goes directly. So for example a customer admin might configure an app that does email and WebRTC so that the real-time traffic (media and the associated signalling) goes directly and the email traffic goes via some TLS-intercepting proxy for some compliance or DLP reason. This can result in one application having multiple public IPs for different network requests, even while it is on one internal network (not even jumping between networks like you say). That isn't something that the application author can control; it's the customer admin who decides to do that.
There are two in-use RFCs to make compromised tokens much harder to use by attackers. Neither use IP addresses, but both bind the token to the client using some form of cryptography.
RFC 8705 section 3[0], binds tokens by adding a signature of a client certificate presented to the server doing authentication. Then any server receiving that token can check to see that the client certificate presented is the same (strictly speaking, hashes to the same value). This works great if you have client certs everywhere and can handle provisioning and revoking them.
RFC 9449[1] is a more recent one that uses cryptographic primitives in the client to create proof of private key possessions. From the spec:
> The main data structure introduced by this specification is a DPoP proof JWT that is sent as a header in an HTTP request, as described in detail below. A client uses a DPoP proof JWT to prove the possession of a private key corresponding to a certain public key.
These standards are robust ways to ensure a client presenting a token is the client who obtained it.
Note that both depend on other secrets (client cert, private key) being kept secure.
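To give a feel for what RFC 9449 looks like on the client side, here is a hedged sketch using the jose npm library (algorithm choice and key handling are simplified; when presenting a DPoP-bound access token to a resource server you would also include the `ath` access token hash claim).

```ts
import { SignJWT, generateKeyPair, exportJWK } from 'jose';
import { randomUUID } from 'node:crypto';

// The client holds the private key; only the public JWK travels in the proof header.
const { publicKey, privateKey } = await generateKeyPair('ES256');
const publicJwk = await exportJWK(publicKey);

// Build a DPoP proof JWT for a specific HTTP method and URL.
export async function dpopProof(method: string, url: string): Promise<string> {
  return new SignJWT({ htm: method, htu: url, jti: randomUUID() })
    .setProtectedHeader({ alg: 'ES256', typ: 'dpop+jwt', jwk: publicJwk })
    .setIssuedAt()
    .sign(privateKey);
}

// Usage (illustrative): send the proof alongside the DPoP-bound access token.
// fetch(url, { headers: { Authorization: `DPoP ${accessToken}`, DPoP: await dpopProof('GET', url) } });
```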
Yup. I worked for a media company that made IP part of the access token for media playback. An absolute PITA for the mobile app, and made (reliable) background downloads impossible on iOS. A bad idea. They went out of business.
You don't have to check the DB on every request. Just store a list of revoked tokens in a fast cache, like Redis, with a TTL longer than the longest token lifetime, and reject/force reauth for any token that matches.
"Bearer" and JWT are orthogonal. Tokens in other format or stateful formats can be bearer tokens, while JWTs can use non-bearer authentication methods. For instance, RFC 9449 (DPoP) describes an authentication method where you have to provide a PoP (based on JWS) in addition to an access token (which may or may not be JWT).
> For the simplest use case of client auth state: you want to be able to revoke auth straight away if an account is compromised. This means you have to check the auth database on every request anyway, and you probably could have got whatever else was in the claims there quickly.
I fail to see the relevance of your scenarios regarding JWTs. I mean, I get your frustration. However, none of it is related to JWTs. Take a moment to read what you wrote: if your account is compromised, the attacker started abusing credentials the moment he got them. The moment the attacker got hold of valid credentials is not the moment you discovered the attack, let alone the moment you forced the compromised account through a global sign-off. This means that your scenario does not prevent abuse. You are revoking a token that was already being abused.
Also, as someone who has implemented JWT-based access controls in resource servers, checking revocation lists is a basic scenario. It's very often implemented as a very basic and very fast endpoint that provides a list of JWT IDs. The resource server polls this endpoint to check for changes, and checks the list on every call as part of the JWT check. The time window between revoking a token and rejecting said token in a request is dictated by how frequently you poll the endpoint. Do you think, say, 1 second is too long?
> Same with roles; if you downgrade an admin user to a lower 'class' of user then you don't want it to take minutes to take effect.
It's the exact same scenario: you force a client to refresh its access tokens, and you revoke the tokens that were already issued. Again, is 1 second too long?
Also, nothing forces you to include roles in a JWT. OAuth2 doesn't. Nothing prevents your resource server from just using the jti to fetch roles from another service. Nevertheless, are you sure that service would be updated as fast or faster than a token revocation?
> So then all you are left with is a unified client id format, which is somewhat useful, but not really the 'promise' of JWTs (I feel?).
OAuth2 is just that. What's wrong with OAuth?
Also, it seems you are completely missing the point of JWTs. Their whole shtick is that they allow resource servers to verify access tokens locally without being forced to consume external services. Token revocation and global sign-offs are often reported as gotchas, but given how infrequently these scenarios take place and how trivial they are to handle, periodically polling an endpoint hardly changes that.
I don't understand why you seem to think JWTs can't be used for authorization, and the fact that you denigrate this as a "puppynoob" approach is telling, but not necessarily in the way you think it is.
Many large systems with millions of users (e.g. Google's Firebase) store user claims in the token, and that can be (and is) used to validate permissions.
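To make that concrete, here is an illustrative Firebase Admin sketch (not the commenter's code; the role name is made up). The custom claim gets baked into the ID token, so permission checks can be done locally when the token is verified, with the caveat discussed in this thread: a changed claim only shows up once the client refreshes its ID token.

```ts
import { initializeApp, applicationDefault } from 'firebase-admin/app';
import { getAuth } from 'firebase-admin/auth';

initializeApp({ credential: applicationDefault() });

// Attach a role-like claim to a user (e.g. from an admin-only endpoint).
export async function grantEditor(uid: string) {
  await getAuth().setCustomUserClaims(uid, { role: 'editor' });
}

// On each request: verify the ID token and read the claim from the decoded payload.
export async function requireEditor(idToken: string) {
  const decoded = await getAuth().verifyIdToken(idToken);
  if (decoded.role !== 'editor') throw new Error('forbidden');
  return decoded;
}
```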
JWT is basically just a signed JSON container format (JWS) with a couple of standard claims and recommended verification procedures. They can be used as a component of an authorization system but they would just be a small component of that. I think the question should be whether JWT is a good fit to play a part in authorization?
If you just need to tack in a bunch of scopes which are configured per clients and are rarely changed, or if you need to track authentication method and time for sensitive operations ("amr" and "auth_time" claims in OIDC), JWT could probably do the job. But for more fine-grained RBAC (or any user-permission-based scheme), JWT is quite problematic.
For one, you will need to revoke the access token every time permissions change. On certain type of systems this could happen quite often (e.g. every time somebody shares a folder with you), and you don't want to constantly log the user out, so all clients would have to have logic to automatically refresh the access token (with a permission-less refresh token) on revocation.
The other option is pure bloat: the moment you go beyond a simple admin flag or a fixed set of roles, you're already on the expressway to a bloated 10kb token. In any system that supports a technically unlimited number of resources and roles, storing all ACLs in the token gets you very large tokens really fast.
JWTs can be used to store permission bits, but doing it properly is a very complex topic, and it usually works well only as a "subset of the claimed user id's permission set" or as a "short-lived token passed to a third party to perform an action on your behalf".
One can also build quite complex systems based on taking multiple claims as a whole plus a signature, but in my experience that's a matter of taste compared with just stuffing user info into the token.
If it’s a major compromise you can simply roll out a new key… invalidating all current JWTs and forcing a new login… you could also group signing keys by user type to further minimise the refreshes.
I think this is the biggest alignment problem with LLMs in the short term. They are getting scarily good at this.
I recently found a pretty serious security vulnerability in a very niche open-source server I sometimes use. This took virtually no effort using LLMs. I'm worried that there is a huge long tail of software out there which wasn't worth manually hunting for vulnerabilities in for nefarious purposes, but which, once that is automated, could lead to really serious problems.
The (obvious) flipside of this coin is that it allows us to run this adversarially against our own codebases, catching bugs that could otherwise have been found by a researcher, but that we can instead patch proactively.
I wouldn't (personally) call it an alignment issue, as such.
If attackers can automatically scan code for vulnerabilities, so can defenders. You could make it part of your commit approval process or scan every build or something.
A lot of this code isn't updated though. Think of how many abandoned wordpress plugins there are (for example). So the defenders could, but how do they get that code to fix it?
I agree that over time you end up with a steady state, but in the short-to-medium term the attackers have a huge advantage.
They've got a lot better if you have access to their 5G Standalone network. But it does require a new SIM card + compatible phone. It's night and day...
I agree with this and it matches my thinking (however, the fact that this is even an option is an incredible breakthrough that, pre-LLMs, I did not expect to happen, potentially for decades).
I have basically given up on "in IDE" AI for now. I simply have a web interface on my 2nd monitor of whatever the "best" LLM (currently Gemini, was Claude) is and copy paste snippets of code back and forth or ask questions.
This way I have to physically copy and paste everything back - or just redo it "my way" - which seems to be enough of an overhead that I have to mentally understand it. When it's in an IDE and I just have to tick accept, I end up getting over-eager with it, over-accepting things and then wishing I hadn't, and spending more time reverting stuff when the penny drops later that this is actually a bad way to do things.
It's really more of a UX/psychology problem at the moment. I don't think the 'git diff' view of suggested changes that many IDEs use is the right one - I'm psychologically conditioned to review these like pull requests, and it seems my critique of those is just not suited to critiquing LLM code. 99% of PR reviews I do are about finding edge cases and clarifying things. They're not about looking for very plausible yet subtly completely wrong code (for the most part).
To give a more concrete example: if someone is doing something incredibly wrong in 'human PRs', they will tend to name the variables wrong because they clearly don't understand the concept, at which point a red flag goes off in my head.
In LLM PRs the variables are often named perfectly - they just don't do what they say they will. This means my 'red flag' doesn't fire as quickly.
The reason it's a farce is because most banks are using some off the shelf system from one of the big vendors in the space OR legacy systems, or both. FIS is a good example.
They have basically no real motive to improve anything (the lock in is utterly extreme) and no doubt will charge through the eyeballs for any improvements - especially ones that are regulatory related.
You can see the difference between a legacy bank and some of the neobanks in the UK. It's absolutely night and day when they own their own modern tech stack.
> using some off the shelf system from one of the big vendors
This also gives the bank 'cover' should an exploit be uncovered in "big vendors" system. They (the bank) are safe liability wise (or at least they think they are) because they used "approved vendor Y" for their authentication system.
If they created their own system, then they would be unable to offload the liability onto someone else.
> If they created their own system, then they would be unable to offload the liability onto someone else.
In a sense. The big banks in the US created Zelle with one of the specific outcomes being to offload liability for unauthorized transactions more on to the consumer than themselves.