stemchar's comments

stemchar · 2026-05-27T09:50:38 1779875438

I probably would've done the same. "I don't remember what the function is called" would've been fine-ish, but reaching for a regex is just insane.

roenxi · 2026-05-27T11:11:44 1779880304

A regexp basically comes with a compiler. Who knows what sort of optimisations they've built in under the hood. It wouldn't be surprising if there was a special fast-path for efficiently searching for a substring; that'd be effective in practice.

But more importantly it is hugely context sensitive on how often the function is going to be called and what IO needs to happen around it to decide if speed matters at all.

Using a regex as a first attempt is entirely reasonable. Especially in an interview about Python. If we care about efficiently doing substring matching, Python isn't the language of choice. If a programmer just wants to remember how regexes work and get on with their day, they'll do fine at handling string problems.

petsfed · 2026-05-27T15:50:56 1779897056

Questions like "how would you search for a substring?" are so incredibly dependent on what you're doing on a day-to-day basis, and what you're doing with the data once you've split it. Just because .split(...) is in all the tutorials doesn't mean the codebase you've worked on for the last 5 years actually uses that specific call with any regularity, and it may well be the case that your codebase does use regexes more often (maybe for query-portability purposes).

I write bare metal firmware, primarily in C, and I've had to make it a point to explain, in most every interview I do, that I've only ever used malloc(...) in tutorials. "In my world, malloc is a 4-letter word". So while I know what it does, and how it works, I actually have to google its usage, and I'm not as keyed into its pitfalls, because every system I've ever worked on could not afford the risks associated with dynamic memory allocation.

All of this to say, bad interviewers go looking for a specific answer, good interviewers go looking for good process. All of the jobs I've held are ones that accepted that I was rusty on this or that specific call, but could think about the system holistically.

StilesCrisis · 2026-05-27T13:56:21 1779890181

Why? Unless it's an extraordinarily hot code path, it doesn't matter. A regex once compiled will be quite efficient.

gspetr · 2026-05-27T16:29:53 1779899393

It's simple: unless you're given a specific broader context (like "we have an enterprise customer data pruning system that needs to handle a broad range of corner cases"), you must not resort to overengineering this early in an interview.

suttontom · 2026-05-28T02:18:52 1779934732

This is a good example of being bad at writing code.

Chris2048 · 2026-05-27T15:10:12 1779894612

And an extra import. Also, it sounds like they were looking specifically for knowledge of built-in operators.
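
For readers comparing the two approaches discussed in this thread, here is a minimal sketch of a built-in substring test next to a regex-based one. The function names and sample strings are made up for illustration; only `in`, `re.search`, and `re.escape` are real Python features.

```python
import re


def contains_builtin(text: str, needle: str) -> bool:
    # The `in` operator performs a plain substring test; no import is needed.
    return needle in text


def contains_regex(text: str, needle: str) -> bool:
    # re.escape keeps characters such as "." or "+" literal, so this
    # behaves like a plain substring test as well.
    return re.search(re.escape(needle), text) is not None


# Hypothetical sample data.
serial = "serial-PL42-0001"
print(contains_builtin(serial, "PL42"))
print(contains_regex(serial, "PL42"))

# Without escaping, "." is a wildcard, so the regex matches where
# the plain substring test does not.
print("a.c" in "abc")
print(re.search("a.c", "abc") is not None)
```

The unescaped case is the usual pitfall when a regex stands in for a substring search: the needle is interpreted as a pattern rather than as literal text.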

JCTheDenthog · 2026-05-27T17:15:24 1779902124

You have a problem, so you try to solve it with a regex. You now have two problems.

cucumber3732842 · 2026-05-27T11:02:23 1779879743

It kind of depends on the substring and problem context.

Arbitrary substring in arbitrary text vs extracting embedded plant code from product serial numbers.

As long as you've got a good explanation for what you chose and why you chose it and the pro/con it's probably fine.

teiferer · 2026-05-27T11:41:56 1779882116

Sarcastic or for real? Because I find that an obvious choice, a little depending on context though.

stemchar · 2026-05-21T03:36:42 1779334602

This doesn't sound correct. Source?

no-name-here · 2026-05-21T05:06:43 1779340003

In recent months, Linus said it specifically about code for a personal side project of his. The quote was in the commit message. (I’m not the grandparent commenter, and I think grandparent commenter’s claims may be too broad or require context.)

yrds96 · 2026-05-21T07:24:42 1779348282

There's some missing context: the vibecoded application was written in Python, while the main code in this side project was written manually in C by Torvalds. He never said that AI produces better code than him in the language where he is proficient.

helloplanets · 2026-05-21T07:41:27 1779349287

https://github.com/torvalds/AudioNoise

> The python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.

stemchar · 2026-02-05T02:51:16 1770259876

> Personally I've always called this style "declarative schema management" since the input declares the desired state, and the tool figures out how to transition the database to that state.

Personally I've called it a mistake, since there's no way a tool can infer what happened based on that information.

evanelias · 2026-02-05T03:36:10 1770262570

For schema changes, it absolutely can, for every situation except table renames or column renames.

That might sound like a major caveat, but many companies either ban renames or have a special "out-of-band" process for them anyway, once a table is being used in production. This is necessary because renames have substantial deploy-order complexity, i.e. you cannot make the schema change at the same exact instant as the corresponding application change, and the vast majority of ORMs don't provide anything to make this sane.

In any case, many thousands of companies use declarative schema management. Some of the largest companies on earth use it. It is known to work, and when engineered properly, it definitely improves development velocity.
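
To illustrate why renames are the exception, here is a rough sketch of a declarative diff over column definitions. It is a toy model, not how any particular tool is implemented; `plan_column_changes`, the table name, and the column names are all hypothetical, and the emitted SQL is generic rather than tied to one DBMS.

```python
def plan_column_changes(table: str,
                        current: dict[str, str],
                        desired: dict[str, str]) -> list[str]:
    """Compare current and desired columns (name -> type) and plan DDL."""
    statements = []
    for name, col_type in desired.items():
        if name not in current:
            statements.append(
                f"ALTER TABLE {table} ADD COLUMN {name} {col_type}")
        elif current[name] != col_type:
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} TYPE {col_type}")
    for name in current:
        if name not in desired:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name}")
    return statements


# A rename from "username" to "user_name" is indistinguishable from
# "drop one column, add another" when only the two states are known.
current = {"id": "bigint", "username": "text"}
desired = {"id": "bigint", "user_name": "text"}
for statement in plan_column_changes("users", current, desired):
    print(statement)
```

With only the before and after states as input, the planner has no way to tell that `user_name` is the old `username`, which is why a rename needs extra information from the user.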

sroussey · 2026-02-05T04:42:38 1770266558

Uh, any database of sufficient size is going to do migrations “out of band” as they can take hours or days and you never have code requiring those changes ship at migration start.

For small things where you don’t have a DBA or whatever, sure, use tooling like you would for auto-changes in local development.

evanelias · 2026-02-05T05:54:27 1770270867

Very large tech companies completely automate the schema change process (at least for all common operations) so that development teams can make schema changes at scale without direct DBA involvement. The more sophisticated companies handle this regardless of table size, sharding, operational events, etc. It makes a massive difference in execution speed for the entire company.

Renames aren't compatible with that automation flow though, which is what I meant by "out-of-band". They rely on careful orchestration alongside code change deploys, which gets especially nasty when you have thousands of application servers and thousands of database shards. In some DBMS, companies automate them using a careful dance of view-swapping, but that seems brittle performance-wise / operationally.

sroussey · 2026-02-09T19:26:35 1770665195

Sure, but they don’t go out with the code. They go well before.

evanelias · 2026-02-09T21:25:19 1770672319

Right, but my point was that renames in particular typically can't go out well before the corresponding application change [1]. Thus, renames are "out of band" relative to the company's normal schema change process. (This is orthogonal to how schema changes are always "out of band" relative to code deploys; that wasn't what I was referring to.)

[1] In theory a custom ORM could have some kind of dynamic conditional logic for table or column renames, i.e. some way to configure it to retry a query with the "new" name if the query using the "old" name fails. But that has a huge perf impact, and I'm not aware of any common ORMs that do this. So generally if you want to rename a table or column that is already used in prod, there's no way to do it without causing user-facing errors or having system downtime during the period between the SQL rename DDL and the application code change redeploy.
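
As a rough sketch of the retry idea in footnote [1], assuming SQLite via Python's standard `sqlite3` module purely for illustration: the helper name, table, and column names are hypothetical, and a real ORM would need far more careful error classification than catching `OperationalError` wholesale.

```python
import sqlite3


def query_with_rename_fallback(conn, sql_template, old_name, new_name,
                               params=()):
    """Try the old column name first; if that fails, retry with the new one."""
    try:
        return conn.execute(sql_template.format(col=old_name),
                            params).fetchall()
    except sqlite3.OperationalError:
        # Assumption: the failure means the rename DDL has already run.
        # Every query pays for a failed attempt from this point on.
        return conn.execute(sql_template.format(col=new_name),
                            params).fetchall()


# Simulate a database where the column has already been renamed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, user_name TEXT)")
conn.execute("INSERT INTO users (user_name) VALUES ('alice')")

rows = query_with_rename_fallback(
    conn, "SELECT {col} FROM users WHERE id = ?",
    old_name="username", new_name="user_name", params=(1,))
print(rows)
```

The extra failed round trip after the rename is the performance cost the footnote mentions.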

tracker1 · 2026-02-05T19:28:19 1770319699

Not to mention apps that may have differing versions deployed on client infrastructure with different test/release cycles... this is where something like grate is really useful imo.

stemchar · 2026-02-05T02:48:39 1770259719

I renamed a column and it added a new one.

andrewg · 2026-02-05T06:19:47 1770272387

From the docs: https://github.com/sqldef/sqldef?tab=readme-ov-file#renaming...

You tell it what’s being renamed with a special comment.

stemchar · 2026-01-23T10:07:55 1769162875

> If you use streaming replication (ie. WAL shipping over the replication connection), a single replica getting really far behind can eventually cause the primary to block writes. Some time back I commented on the behaviour: https://news.ycombinator.com/item?id=45758543

I'd like to know more, since I don't understand how this could happen. When you say "block", what do you mean exactly?

bostik · 2026-01-23T17:56:33 1769190993

I have to run part of this by guesswork, because it's based on what I could observe at the time. Never had the courage to dive in to the actual postgres source code, but my educated guess is that it's a side effect of the MVCC model.

Combination of: streaming replication; long-running reads on a replica; lots[þ] of writes to the primary. While the read in the replica is going it will generate a temporary table under the hood (because the read "holds the table open by point in time"). Something in this scenario leaked the state from replica to primary, because after several hours the primary would error out, and the logs showed that it failed to write because the old table was held in place in the replica and the two tables had deviated too far apart in time / versions.

It is seared into my memory because the thing just did not make any sense, and even figuring out WHY the writes had stopped at the primary took quite a bit of digging. I do remember that when the read at the replica was forcefully terminated, the primary was eventually released.

þ: The ballpark would have been tens of millions of rows.

ants_a · 2026-01-24T12:04:43 1769256283

What you are describing here does not match how postgres works. A read on the replica does not generate temporary tables, nor can anything on the replica create locks on the primary. The only two things a replica can do are hold back transaction log removal and hold back the vacuum cleanup horizon. I think you may have misdiagnosed your problem.