The special (or at least new to me) things about Zed (when you use it with the built-in agent, instead of one of the ones available through ACP) basically boil down to the fact that it's a hyper-advanced CRDT-based collaborative editor meant for live pair programming in the same file, so it can just treat agents like another collaborator.
1. the diffs from the agent just show up in the regular file you were editing; you're not forced to use a special completion model, or to view the changes in a special temporary staging mode or a different window.
2. you can continue to edit the exact same source code without accepting or rejecting the changes, even in the same places, and nothing breaks — the diffs still look right, and doing an accept or reject Just Works afterwards.
3. you can accept or reject changes piecemeal, and the model doesn't get confused by this at all and have to go "oh wait, the file was/wasn't changed, let me re-read..." or whatever.
4. Even though you haven't accepted the changes, the model can continue to make new ones, since they're stored as branches in the CRDT. So you can have it iterate on its suggestions before you accept them, without forcing it to start completely over either (it sees the file as if its changes were accepted).
5. Moreover, the actual files on disk are in the state it suggests, meaning you can compile, fuzz, test, run, etc. to see what its proposed changes do before accepting them.
6. you can click a follow button and see which files it has open, where it's looking in them, and watch as it edits the text, like you're following a dude in Dwarf Fortress. This means you can very quickly know what it's working on and when, correct it, or hop in to work on the same file it is.
7. It can actually go back and edit the same place multiple times as part of a thinking chain, or even as part of the same edit, which has some pretty cool implications for final code quality, both because it can iterate on its suggestion before you accept it and because of point (9) below.
8. It streams its code diffs, instead of hanging and then producing them as a single gigantic tool call. Seeing it edit the text live, instead of having to wait for a final complete diff to come through that you either accept or reject, is a huge boon for iteration time compared to e.g. Claude Code, because you can stop and correct it midway, and also read along as it goes, so you're more in lockstep with what's happening.
9. Crucially, because the text it's suggesting is actually in the buffer at all times, you can see LSP, tree-sitter, and linter feedback, all inline and live as it writes code; and as soon as it's done with an edit, it can see those diagnostics too. So it can actually iterate on what it's doing with feedback, while it's still in the middle of a series of changes and before you accept anything, instead of you having to accept the whole diff to see what the LSP says.
For a dev's sake - and before I go ask my new AI overlord - what is a good rule of thumb re: indexing? My naive assumption has always just been that if it's a column you use to filter a lot of your queries, and you're unhappy with current perf, index it. Examples would be datetime columns (if sorting server-side) and columns like TenantId, UserEmail, etc.
I'd say it depends a bit on your access pattern, the query frequency, and the time budget you have for each query, and beyond that on your insert-performance requirements. I'm mostly referring to PostgreSQL here, since that's what we use at work. There, I have like 2-3 rules of thumb by now:
One notable query pattern treats tables like hashmaps: you have a query, you expect 0 or 1 results, and you want those results as fast as possible, even more so than usual. This is usually accompanied by a low to very low write frequency relative to reads. Examples are - like you say - a login: give me the account for this mail; generally, you'll have many more logins than registrations. Or: give me the current chat room for a user; they enter a chat room once every few hours and then look into it a lot. This usually warrants an index without much analysis. Foreign-key relationships often fall into this pattern too, but imo that would require evidence from query plans before looking further into it, and normally the primary key should handle these.
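Concretely, that first pattern is usually just a unique index on the lookup column - a sketch with made-up table and column names:

  -- hypothetical accounts table: one row per email, read far more often than written
  CREATE UNIQUE INDEX accounts_email_idx ON accounts (email);

  -- "give me the account for this mail": expects 0 or 1 rows,
  -- and becomes an index probe instead of a scan
  SELECT id, password_hash FROM accounts WHERE email = 'someone@example.com';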
After that, it becomes a bit weird until you think about it, because you get into things like index size and selectivity. The interesting part there is: Postgres on a decently quick core can iterate linearly through a few tens of thousands up to a low hundreds of thousands of rows in a few micro- to milliseconds, per parallel worker. This is weighed against the cost of traversing a potentially large index - and an index cutting the dataset into too many very tiny pieces will be large. This might cause Postgres to stop using an index even if it matches conditions in the query, because just chugging through the dataset brute force in parallel is faster. And yes, I was apprehensive at first, but the optimizer tends to be right there.
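You can watch the planner make that call yourself; a quick sketch with a made-up table (enable_seqscan is only for experimenting on a test box, not something to set in production):

  -- see which plan Postgres actually picks for the filter
  EXPLAIN ANALYZE SELECT * FROM events WHERE status = 'done';

  -- temporarily discourage seq scans so the planner shows the index path (if an index exists)
  SET enable_seqscan = off;
  EXPLAIN ANALYZE SELECT * FROM events WHERE status = 'done';
  RESET enable_seqscan;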
So yeah, your approach sounds right there. Unless you really know this index is really good for the DB and you can make specific points about rows inspected per query, just don't bother. Once a query - or a class of queries - starts running slow, throw the whole bus of EXPLAIN ANALYZE and something like a query plan visualizer[1] at it to identify the slow part. This will generally point to a slow sequential scan, a slow parallel sequential scan, or a crazy-sized join. And then you can start thinking about creating an index on the conditions used in the query to chop the dataset into blocks of roughly c * 10k rows, with c being like 5-25ish usually.
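For that last step, something like this (schema and numbers made up; the GROUP BY just gives a feel for the block sizes an index would produce):

  -- how big would the per-tenant "blocks" be?
  SELECT tenant_id, count(*) AS rows_per_tenant
  FROM orders
  GROUP BY tenant_id
  ORDER BY rows_per_tenant DESC
  LIMIT 10;

  -- if those land in the tens of thousands, a plain btree usually does it;
  -- CONCURRENTLY avoids blocking writes while it builds
  CREATE INDEX CONCURRENTLY orders_tenant_id_idx ON orders (tenant_id);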
That c depends more on your performance requirements and the nature of these queries, as well as the write load. If this is a web app, with loads of reads and few inserts, c should probably be on the smaller side. Tenant ID tends to be a good column, as does CreatedDate for a sequential data stream like an audit log or some tracking/reporting stuff, or the IDs that transactions tend to reference this thing by. If you're looking at a write-heavy reporting setup, it might be better to be a bit coarser for quicker inserts, because who cares if a reporting query chugs for 5 seconds? Though 5 seconds would be in that ugly uncanny valley - I can't just go and get coffee in good faith while the query runs, but it's too slow to be a good experience, lol.
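E.g. for the audit-log kind of table, one coarse composite index along these lines (again, made-up names) tends to cover both the tenant filter and the date-range reporting queries:

  -- leading column handles "all rows for tenant X",
  -- the pair handles "tenant X within this date range"
  CREATE INDEX audit_log_tenant_created_idx
    ON audit_log (tenant_id, created_at);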
Something to consider for the latter is also the overall set of query patterns across the whole application, which depends on the database schema. This, again, suggests that delaying index creation until well into the maturity of a codebase is a good idea. I've had quite a few applications and reporting-like solutions which ended up with like 6-8 different filter patterns throughout their queries. And sure, we could have created a tailored index structure to support each of those individually perfectly, but then we'd have ended up with a crazy number of indexes, which in turn wouldn't be great for insert performance. Instead, since we waited for issues, we could identify 2 indexes that would support each individual query only to like 60-80% of the effect of a tailored index structure, but these 2 indexes would support all queries across the board and - as we later saw - would support 90%+ of all queries created further down the line to a similar degree.
So, waiting until we had an understanding of the dataset and actual problems with actual data meant we could solve a lot of issues with a morning of analysis and a small touch to the system. And then performance issues were gone for a very long time.