A regexp basically comes with a compiler. Who knows what sort of optimisations they've built in under the hood. It wouldn't be surprising if there was a special fast-path for efficiently searching for a substring; that'd be effective in practice.
But more importantly, whether speed matters at all is hugely context sensitive: it depends on how often the function is going to be called and what IO needs to happen around it.
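Whether the regex engine's overhead matters can be measured rather than guessed. Below is a minimal sketch that assumes a synthetic haystack and needle chosen purely for illustration. It checks that the built-in `in` operator and a precompiled `re.search` agree, then times both with `timeit`. Absolute numbers vary by machine and Python version, so none are claimed here.

```python
import re
import timeit

# Synthetic input, chosen only for illustration.
HAYSTACK = "lorem ipsum " * 10_000 + "needle"
NEEDLE = "needle"

# Compile once; re.escape makes the needle a literal, not a pattern.
NEEDLE_RE = re.compile(re.escape(NEEDLE))


def contains_in(text: str, sub: str) -> bool:
    """Plain substring containment with the built-in `in` operator."""
    return sub in text


def contains_regex(text: str) -> bool:
    """Substring containment via the precompiled regex."""
    return NEEDLE_RE.search(text) is not None


def time_both(number: int = 1_000) -> dict:
    """Return total seconds for `number` searches with each approach."""
    return {
        "in": timeit.timeit(lambda: contains_in(HAYSTACK, NEEDLE), number=number),
        "re": timeit.timeit(lambda: contains_regex(HAYSTACK), number=number),
    }


if __name__ == "__main__":
    # Both approaches must agree before comparing their speed means anything.
    assert contains_in(HAYSTACK, NEEDLE) == contains_regex(HAYSTACK)
    for name, seconds in time_both().items():
        print(f"{name}: {seconds:.4f}s")
```

If the difference only shows up over thousands of calls and the real code spends its time waiting on IO, the choice hardly matters.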
Using a regex as a first attempt is entirely reasonable. Especially in an interview about Python. If we care about efficiently doing substring matching, Python isn't the language of choice. If a programmer just wants to remember how regexes work and get on with their day, they'll do fine at handling string problems.
Questions like "how would you search for a substring?" are so incredibly dependent on what you're doing on a day-to-day basis, and what you're doing with the data once you've split it. Just because .split(...) is in all the tutorials doesn't mean the codebase you've worked on for the last 5 years actually uses that specific call with any regularity, and it may well be the case that your codebase does use regexes more often (maybe for query-portability purposes).
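To make the `.split(...)` versus regex point concrete: which one is "right" depends on what happens to the pieces afterwards. Here is a small sketch using a made-up comma-separated line with inconsistent spacing.

```python
import re

# Made-up input with inconsistent spacing around the commas.
line = "alice, bob,carol ,  dave"

# str.split on a literal separator keeps the surrounding whitespace.
by_split = line.split(",")
print(by_split)  # ['alice', ' bob', 'carol ', '  dave']

# A regex split can swallow optional whitespace around each comma.
by_regex = re.split(r"\s*,\s*", line)
print(by_regex)  # ['alice', 'bob', 'carol', 'dave']

# Split-then-strip gets the same result without any regex.
by_split_strip = [part.strip() for part in line.split(",")]
print(by_split_strip)  # ['alice', 'bob', 'carol', 'dave']
```

Neither form is wrong. Someone whose codebase already leans on regexes will likely reach for the second one, while the split-then-strip idiom needs no regex at all.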
I write bare metal firmware, primarily in C, and I've had to make it a point to explain, in almost every interview I do, that I've only ever used malloc(...) in tutorials. "In my world, malloc is a 4-letter word". So while I know what it does, and how it works, I actually have to google its usage, and I'm not as keyed into its pitfalls, because every system I've ever worked on could not afford the risks associated with dynamic memory allocation.
All of this to say, bad interviewers go looking for a specific answer, good interviewers go looking for good process. All of the jobs I've held are ones that accepted that I was rusty on this or that specific call, but could think about the system holistically.
It's simple: unless you're given a specific broader context (like "we have an enterprise customer data pruning system that needs to handle a broad range of corner cases"), you must not resort to overengineering this early in an interview.
In recent months, Linus said it specifically about code for a personal side project of his. The quote was in the commit message. (I’m not the grandparent commenter, and I think grandparent commenter’s claims may be too broad or require context.)
They're missing the context: the vibe-coded application was written in Python, while the main code of this side project was written manually in C by Torvalds. He never said that AI produces better code than him in a language he is proficient in.
> The python visualizer tool has been basically written by vibe-coding. I know more about analog filters -- and that's not saying much -- than I do about python. It started out as my typical "google and do the monkey-see-monkey-do" kind of programming, but then I cut out the middle-man -- me -- and just used Google Antigravity to do the audio sample visualizer.
> Personally I've always called this style "declarative schema management" since the input declares the desired state, and the tool figures out how to transition the database to that state.
Personally I've called it a mistake, since there's no way a tool can infer what happened based on that information.
For schema changes, it absolutely can, for every situation except table renames or column renames.
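A toy sketch of why renames are the exception: a declarative tool only sees the current and desired column sets, so it can diff them into add, drop, and type-change statements, but a renamed column looks exactly like one column being dropped and another being added. The `plan_changes` helper and the dict-of-types schema representation are hypothetical simplifications. The `ALTER COLUMN ... TYPE` syntax is PostgreSQL-style, and real tools also have to handle indexes, constraints, defaults, and much more.

```python
from typing import Dict, List

# Hypothetical, simplified schema: column name -> SQL type.
Schema = Dict[str, str]


def plan_changes(table: str, current: Schema, desired: Schema) -> List[str]:
    """Diff two column sets into DDL that moves `current` to `desired`."""
    statements: List[str] = []
    for name, sql_type in desired.items():
        if name not in current:
            statements.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
        elif current[name] != sql_type:
            # PostgreSQL-style type change; other DBMSs spell this differently.
            statements.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} TYPE {sql_type};"
            )
    for name in current:
        if name not in desired:
            statements.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return statements


# A rename, expressed declaratively: `email` became `email_address`.
current = {"id": "bigint", "email": "text"}
desired = {"id": "bigint", "email_address": "text"}

for stmt in plan_changes("users", current, desired):
    print(stmt)
# ALTER TABLE users ADD COLUMN email_address text;
# ALTER TABLE users DROP COLUMN email;
```

Applied as-is, that plan would throw away the data in `email`, which is why a rename needs a human in the loop rather than a pure state diff.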
That might sound like a major caveat, but many companies either ban renames or have a special "out-of-band" process for them anyway, once a table is being used in production. This is necessary because renames have substantial deploy-order complexity, i.e. you cannot make the schema change at the same exact instant as the corresponding application change, and the vast majority of ORMs don't provide anything to make this sane.
In any case, many thousands of companies use declarative schema management. Some of the largest companies on earth use it. It is known to work, and when engineered properly, it definitely improves development velocity.
Uh, any database of sufficient size is going to do migrations “out of band” as they can take hours or days and you never have code requiring those changes ship at migration start.
For small things where you don’t have a DBA or whatever, sure, use tooling like you would for auto-changes in local development.
Very large tech companies completely automate the schema change process (at least for all common operations) so that development teams can make schema changes at scale without direct DBA involvement. The more sophisticated companies handle this regardless of table size, sharding, operational events, etc. It makes a massive difference in execution speed for the entire company.
Renames aren't compatible with that automation flow though, which is what I meant by "out-of-band". They rely on careful orchestration alongside code change deploys, which gets especially nasty when you have thousands of application servers and thousands of database shards. In some DBMS, companies automate them using a careful dance of view-swapping, but that seems brittle performance-wise / operationally.
Right, but my point was that renames in particular typically can't go out well before the corresponding application change [1]. Thus, renames are "out of band" relative to the company's normal schema change process. (This is orthogonal to how schema changes are always "out of band" relative to code deploys; that wasn't what I was referring to.)
[1] In theory a custom ORM could have some kind of dynamic conditional logic for table or column renames, i.e. some way to configure it to retry a query with the "new" name if the query using the "old" name fails. But that has a huge perf impact, and I'm not aware of any common ORMs that do this. So generally if you want to rename a table or column that is already used in prod, there's no way to do it without causing user-facing errors or having system downtime during the period between the SQL rename DDL and the application code change redeploy.
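The fallback idea from [1] can be sketched against an in-memory SQLite database using the stdlib `sqlite3` module. The `select_column_with_fallback` helper is hypothetical and not part of any ORM: it tries the old column name first and retries with the new one if the database rejects the query. `ALTER TABLE ... RENAME COLUMN` requires SQLite 3.25.0 or newer. It also shows the cost the footnote mentions: once the rename has run, every query pays for a failed first attempt.

```python
import sqlite3


def select_column_with_fallback(conn, table, old_name, new_name):
    """Hypothetical ORM-style fallback: try the old name, then the new one.

    Identifiers are interpolated directly, so they must come from code,
    never from user input.
    """
    for column in (old_name, new_name):
        try:
            return conn.execute(f"SELECT {column} FROM {table}").fetchall()
        except sqlite3.OperationalError:
            # e.g. "no such column: email" once the rename has run.
            continue
    raise LookupError(f"neither {old_name!r} nor {new_name!r} exists in {table}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, email TEXT)")
conn.execute("INSERT INTO users VALUES (1, 'a@example.com')")

# Before the rename DDL: the first attempt succeeds.
before = select_column_with_fallback(conn, "users", "email", "email_address")

# Simulate the rename landing before the application redeploy.
conn.execute("ALTER TABLE users RENAME COLUMN email TO email_address")

# After: the old name fails, and the retry with the new name succeeds.
after = select_column_with_fallback(conn, "users", "email", "email_address")
print(before, after)
```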
Not to mention apps that may have differing versions deployed on client infrastructure with different test/release cycles... this is where something like grate is really useful imo.
> If you use streaming replication (ie. WAL shipping over the replication connection), a single replica getting really far behind can eventually cause the primary to block writes. Some time back I commented on the behaviour: https://news.ycombinator.com/item?id=45758543
I'd like to know more, since I don't understand how this could happen. When you say "block", what do you mean exactly?
Part of this is guesswork, because it's based on what I could observe at the time. I never had the courage to dive into the actual postgres source code, but my educated guess is that it's a side effect of the MVCC model.
Combination of: streaming replication; long-running reads on a replica; lots[þ] of writes to the primary. While the read in the replica is going it will generate a temporary table under the hood (because the read "holds the table open by point in time"). Something in this scenario leaked the state from replica to primary, because after several hours the primary would error out, and the logs showed that it failed to write because the old table was held in place in the replica and the two tables had deviated too far apart in time / versions.
It's seared into my memory because the thing just did not make any sense, and even figuring out WHY the writes had stopped on the primary took quite a bit of digging. I do remember that when the read on the replica was forcefully terminated, the primary was eventually released.
þ: The ballpark would have been tens of millions of rows.
What you are describing here does not match how postgres works. A read on the replica does not generate temporary tables, nor can anything on the replica create locks on the primary. The only two things a replica can do are hold back transaction log removal and hold back the vacuum cleanup horizon. I think you may have misdiagnosed your problem.
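On the lag side: on the primary, `pg_stat_replication` reports per-standby LSNs such as `sent_lsn` and `replay_lsn`, and `pg_wal_lsn_diff` turns two LSNs into a byte distance. The arithmetic is simple enough to sketch in Python. The LSN values below are made up for illustration. An LSN's text form is two hex numbers, the high and low 32 bits of a 64-bit WAL position.

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN like '16/B374D848' to a 64-bit integer.

    The text form is 'HIGH/LOW': the high and low 32 bits, in hex.
    """
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(ahead_lsn: str, behind_lsn: str) -> int:
    """Byte distance between two LSNs, as pg_wal_lsn_diff(ahead, behind) reports."""
    return parse_lsn(ahead_lsn) - parse_lsn(behind_lsn)


# Made-up values, e.g. sent_lsn vs replay_lsn for one standby.
print(lag_bytes("0/3000000", "0/1000000"))  # 33554432 bytes (32 MiB)
print(lag_bytes("1/0", "0/FFFFFFFF"))       # 1: crossing the 32-bit boundary
```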