He won’t answer the question from a strategic perspective. He will answer only from a legal standpoint. That’s the only perspective he is an expert in.
This comment appears frequently and always surprises me. Do people just... not know regex? It seems so foreign to me.
It's not like it's some obscure thing, it's absolutely ubiquitous.
Relatively speaking, it's not very complicated: it's widely documented, has vast learning resources, and has some of the best ROI of any DSL. It's fun to joke that it looks like line noise, but really, there is not a lot to learn before you can understand 90% of the expressions people actually write.
It takes far longer to tell an AI what you want than to write a regex yourself.
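To put some substance behind that "90%" claim: a handful of constructs covers most everyday patterns. An illustrative Python sketch (the examples are mine, not from the thread):

```python
import re

# Literal text: most regexes are mostly just the characters themselves.
assert re.search(r'cat', 'concatenate')

# Character classes and counted repeats: \d is a digit, {n} is a count.
assert re.fullmatch(r'\d{4}-\d{2}-\d{2}', '2024-01-31')

# Bracketed classes and * (zero or more): a typical identifier pattern.
assert re.fullmatch(r'[A-Za-z_]\w*', 'my_var')

# Groups: parentheses capture the pieces you want to pull out.
m = re.search(r'(\w+)@(\w+)\.com', 'bob@example.com')
assert m.groups() == ('bob', 'example')
```

Literals, classes, repeats, anchors, and groups: that short list really does account for the bulk of expressions you'll meet in the wild.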
"It takes far longer to tell an AI what you want than to write a regex yourself."
My experience is the exact opposite. Writing anything but the simplest regex by hand still takes me significant time, and I've been using them for decades.
Getting an LLM to spit out a regex is so much less work. Especially since an LLM already knows the details of the different potential dialects of regex.
I use them to write regexes in PostgreSQL, Python, JavaScript, ripgrep... they've turned writing a regex from something I expect to involve a bunch of documentation diving to something I'll do on a whim.
Here's a recent example - my prompt included a copy of a PostgreSQL schema and these instructions:
Write me a SQL query to extract
all of my images and their alt
tags using regular expressions.
In HTML documents it should look
for either <img .* src="..." .*
alt="..." or <img alt="..." .*
src="..." (images may be self-
closing XHTML style in some
places). In Markdown they will
always be ![alt text](url)
The markdown portion of that is a good example of the kind of regex I don't enjoy writing by hand, due to the need to remember exactly which characters to escape and how:
(REGEXP_MATCHES(commentary,
'!\[([^\]]*)\]\(([^)]*)\)', 'g'))[2] AS src,
(REGEXP_MATCHES(commentary,
'!\[([^\]]*)\]\(([^)]*)\)', 'g'))[1] AS alt_text
Perhaps Perl has given me Stockholm Syndrome, but when I look at your escaped regex example, it's extremely natural for me. In fact, I'd say it's a little too simple, because the LLM forgot to exclude unnecessary whitespace:
(REGEXP_MATCHES(commentary,
'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[2] AS src,
(REGEXP_MATCHES(commentary,
'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)', 'g'))[1] AS alt_text
That is just nitpicking a one-off example though, I understand your wider point.
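The difference between the two versions is easy to see by running both against padded input. A quick sketch using Python's re module (PostgreSQL's regex flavour differs in places, but both patterns happen to be valid in both engines):

```python
import re

# The LLM's original pattern and the whitespace-trimming revision above.
original = r'!\[([^\]]*)\]\(([^)]*)\)'
improved = r'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)'

text = '![ a kitten ]( kitten.png )'

# The original captures the padding; the revision strips it.
assert re.search(original, text).groups() == (' a kitten ', ' kitten.png ')
assert re.search(improved, text).groups() == ('a kitten', 'kitten.png')
```

The `\s*` anchors eat the padding, and the lazy `*?` quantifiers stop the capture groups from greedily reclaiming it.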
I appreciate the LLM is useful for problems outside one's usual scope of comfort. I'm mainly saying that I think it's a skill where the "time economics" really are in favor of learning it and expanding your scope. As in, it does not take a lot of learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly. Certainly not the case for all skills, but I truly believe regex is one of them due to its small scope and ubiquitous application. The LLM can be used for the remaining 10% of really complicated cases.
As you've been using regex for decades, there is already a large subset of problems where you're faster than the LLM. So that problem space exists; it's all about tuning your learning time to right-size it for how frequently the problems are encountered. Regex, I think, is simple enough and frequent enough that this works very well.
> As in, it does not take a lot learning time before you're faster than the LLM for 90% of things, and those things occur frequently enough that your "learning time deficit" gets repaid quickly.
It doesn't matter how fast I get at regex, I still won't be able to type out any but the shortest (<5 character) patterns quicker than an LLM can. They are typing assistants that can make really good guesses about my vaguely worded intent.
As for learning deficit: I am learning so much more thanks to heavy use of LLMs!
Prior to LLMs the idea of using a 100 line PostgreSQL query with embedded regex to answer a mild curiosity about my use of alt text would have finished at the idea stage: that's not a high value enough problem for me to invest more than a couple of minutes, so I would not have done it at all.
Good points. Also, looking at your original example, I noticed that humans can not only explain the regularities they expect in many different ways (correcting along the way), they can also ask the LLM to base the result on a reference. In your example you provided a template with an img tag and brackets showing the different attribute patterns. But one can also just ask for an HTML-style tag, as I did with "Please create a regex for extracting image file names when an HTML-style img tag appears in a text" (not posting the result here, but "src" is clearly visible in the answer). So "knowledge" from other domains gets applied to the regex creation.
I know regex. But I use it so sparingly that every time I need it I forget again the character for word boundary, or the character for whitespace, or the exact incantation for negative lookahead. Is it >!? Who knows.
A shortcut to type in natural language and get something I can validate in seconds is really useful.
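For reference, those three incantations side by side in Python (it's `(?!...)` for negative lookahead; examples are illustrative, not from the thread):

```python
import re

# \b is a word boundary: 'cat' the word, not 'cat' inside another word.
assert re.search(r'\bcat\b', 'the cat sat')
assert not re.search(r'\bcat\b', 'concatenate')

# \s is whitespace (space, tab, newline...).
assert re.fullmatch(r'a\s+b', 'a   b')

# (?!...) is negative lookahead: match 'foo' only if 'bar' doesn't follow.
assert re.match(r'foo(?!bar)', 'foobaz')
assert not re.match(r'foo(?!bar)', 'foobar')
```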
How do you validate it if you don't know the syntax? Or are you saying that looking up syntax -> semantics is significantly quicker than semantics -> syntax? Which I don't find to be the case. What takes time is grokking the semantics in context, which you have to do in both cases.
In my case most of my regex is for utility scripts for text processing. That means that I just run the script, and if it does what I want it to do I know I'm done.
LLMs have been amazing in my experience putting together awk scripts or shell scripts in general. I've also discovered many more tools and features I wouldn't have otherwise.
That doesn’t answer the question. By “validate”, I mean “prove to yourself that the regular expression is correct”. Much like with program code, you can’t do that by only testing it. You need to understand what the expression actually says.
Testing something is the best way to prove that it behaves correctly in all the cases you can think of. Relying on your own (fallible) understanding is dangerous.
Of course, there may be cases you didn't think of where it behaves incorrectly. But if that's true, you're just as likely to forget those cases when studying the expression to see "what it actually says". If you have tests, fixing a broken case (once you discover it) is easy to do without breaking the existing cases you care about.
So for me, getting an AI to write a regex, and writing some tests for it (possibly with AI help) is a reasonable way to work.
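A minimal sketch of that workflow in Python, reusing the Markdown-image pattern from upthread; each assertion records one behaviour the regex must keep, so a later fix can't silently break it:

```python
import re

# Pattern from the thread's Markdown-image example (whitespace-trimming version).
MD_IMAGE = re.compile(r'!\[\s*([^\]]*?)\s*\]\(\s*([^)]*?)\s*\)')

def extract_images(text):
    """Return (alt_text, src) pairs for every Markdown image in text."""
    return MD_IMAGE.findall(text)

# Behaviour-level checks: the cases I care about, written down once.
assert extract_images('![a cat](cat.png)') == [('a cat', 'cat.png')]
assert extract_images('![ padded ]( img.png )') == [('padded', 'img.png')]
assert extract_images('[not an image](x)') == []
```

When a new edge case turns up, it becomes one more assertion, and the AI (or I) can rework the pattern against the full list.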
I don’t believe this is true. That’s why we do mathematical proofs, instead of only testing all the cases one can think of. It’s important to sanity-check one’s understanding with tests, but mere black-box testing is no substitute for the understanding.
in theory theory and practice are the same, in practice not really
in the context of regex, you have to know which dialect and programming language version of regex you're targeting, for example. it's not really universal how all libs/languages work
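One concrete dialect difference, observable without leaving Python: named capture groups are spelled `(?P<name>...)` in Python's re, while JavaScript and .NET spell them `(?<name>...)`, which Python rejects outright (illustrative sketch):

```python
import re

# Python dialect: fine.
re.compile(r'(?P<year>\d{4})')

# JS/.NET dialect: a syntax error in Python's re module.
try:
    re.compile(r'(?<year>\d{4})')
    js_style_ok = True
except re.error:
    js_style_ok = False

assert js_style_ok is False
```

The same pattern string can be valid in one engine and an error, or worse, subtly different, in another.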
I respectfully disagree. Thankfully, I don't need to write regex much, so when I do it's always like it's the first time. I don't find the syntax particularly intuitive and I always rely on web-based or third party tools to validate my regex.
Whenever I have worked on code smells (performance issues, fuzzy test failures, etc.), regex was third only to poorly written SQL queries and network latency.
All-in-all, not a good experience for me. Regex is the one task that I almost entirely rely on GitHub Copilot in the 3-4 times a year I have to.
I was using Perl in the late 90s for sysadmin stuff, have written web scrapers in Python, and have a solid history with regex. That being said, AI can still write really complex lookbehind/lookahead/nested extraction code MUCH faster and with fewer bugs than me, because regex is easy to make small mistakes with even when proficient.
IME it's not just longer, but also more difficult to tell the LLM precisely what you want than to write it yourself if you need a somewhat novel RegExp, which won't be all over the training data.
I needed one to do something with Markdown which was a very internal BigCo thing to need to do, something I'd never have written without weird requirements in play. It wasn't that tricky, but going back trying to get LLMs to replicate it after the fact from the same description I was working from, they were hopeless. I need to dig that out again and try it on the latest models.
There's often a bunch of edge cases that people overlook. And you also get quadratic behaviour for some fairly 'simple' looking regexes that few people seem aware of.
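The blow-up comes from backtracking. A small illustrative sketch in Python of the classic nested-quantifier case (timings are machine-dependent, so only the relative gap matters):

```python
import re
import time

def match_time(pattern, subject):
    """Time a single re.match call."""
    start = time.perf_counter()
    re.match(pattern, subject)
    return time.perf_counter() - start

subject = 'a' * 20 + 'b'   # almost matches, then fails at the very end

# Nested quantifier: on failure the engine retries every way of splitting
# the run of a's between the inner and outer +, which grows exponentially.
slow = match_time(r'(a+)+$', subject)

# Same language, no nesting: fails in linear time.
fast = match_time(r'a+$', subject)

assert slow > fast   # the gap roughly doubles with each extra 'a'
```

Both patterns describe the same set of strings; only the second one is safe to point at untrusted input.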
I use regex as an alternative to wildcards in various apps like notepad++ and vscode. The format is different in each app. And the syntax is somewhat different. I have to research it each time. And complex regex is a nightmare.
Which is why I would ask an AI to build it if it could.
I personally didn’t really understand how to write regex until I understood “regular languages” properly, then it was obvious.
I’ve found that the vast majority of programmers today do not have any foundation in formal languages and/or the theory of computation (something that 10 years ago was pretty common to assume).
It used to be pretty safe to assume that everyone from perl hackers to computer science theorists understood regex pretty well, but I’ve found it’s increasingly a rare skill. While it used to be common for all programmers to understand these things, even people with a CS background view that as some annoying course they forgot as soon as the exam was over.
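For anyone curious what that foundation buys you: a regex over a regular language is just notation for a finite state machine. A hand-rolled recogniser for the language of `a*b`, purely as a sketch (not from the thread):

```python
def matches_a_star_b(s):
    """Recognise the same strings as the regex a*b, with an explicit DFA."""
    state = 'start'              # 'start': still reading a's; 'done': saw the b
    for ch in s:
        if state == 'start' and ch == 'a':
            continue             # stay in 'start', consuming a's
        elif state == 'start' and ch == 'b':
            state = 'done'       # exactly one b ends the string
        else:
            return False         # anything after the b, or a stray character
    return state == 'done'

assert matches_a_star_b('aaab')
assert matches_a_star_b('b')
assert not matches_a_star_b('aba')
```

Once you see `a*b` as states and transitions, constructs like alternation and repetition stop being syntax to memorize and become operations on machines.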
The first language I used to solve real problems was Perl, where regex is a first-class citizen. In Python less so; most of my Python scripts don't use it. I love regex but know several developers who avoid it like the plague. You don't know what you don't know, and there's nothing wrong with that. LLMs are super helpful for getting up to speed on stuff.
took me 25.75 seconds, including learning how the website worked. I actually solved it in ~15 seconds, but I hadn't realized I got the correct answer because it was far too simple.
"Text to SQL", "text to regex", "text to shell", etc. will never fundamentally work because the reason we have computer languages is to express specific requirements with no ambiguity.
With an AI prompt you'll have to do the same thing, just more verbosely.
You will have to do what every programmer hates, write a full formal specification in English.
You can get sent there by your affiliation. You don't have to necessarily commit a crime. And all the administration has to do is make a mistake and send you to the wrong place.
The vast majority are being sent to nearby countries as well.
The Mastodon post should have linked to that. But he wanted to rant about its practical effects on the system, namely SMTP servers. LE did not really mention how this impacts developers.
The letsencrypt article is literally the first link in the Mastodon post, and indeed the Mastodon post gives more context than letsencrypt's blog on the impact, as well as a link down-thread to where the rationale is in the Chrome root program.
Fuchsia as the core makes much more sense. It replaces Linux for a start and completely changes the security model to something a LOT more defensible, among a bunch of other benefits.
I worked on Android at Google until 2023 and can 99.999% confirm for you that Fuchsia, as the outside understands it, is DOA (i.e. as some sort of next-gen OS, or failing that, some kernel that's on track to replace Linux in Android).
Long story short is you can imagine in 2019 there was X amount of engineers, 95% on Android and 5% on what you'd call Fuchsia.
The central argument up top became about why the renegade band split off from Android/Chrome etc. to do Fuchsia in... the early-to-mid 2010s?... and whether it was going to provide a significant step forward. This became framed in terms of "# of devices shipped", in which case there was no contest.
Funnily enough, this very article is about N dominoes down from there (de-investment in Fuchsia; defenestration of the head software guy of Android/Chrome/ChromeOS etc.; an ex-Moto hardware guy is in charge now).
Don't read this comment too closely, I was not in the room. For example, I have absolutely no actual quotes, or relayed quotes, to 100% confirm some set of individuals became focused on # of devices shipped.
Just obsessive enough to track wtf was going on, and on big enough projects, and trustworthy and hard working, and clearly without party or clique, such that I could get good info when asked, as it was clear my only concern was making things that were good and making sure all of Google's products could be part of that story.
Thanks for that insight. Obviously there’s a lot of context in there that isn’t at all clear outside.
One part I find hard to reconcile with all of that is that even just looking at public facing stuff alone it seems to be under VERY active development.
You're seeing the public-facing side of it as a wholesale Android replacement; think of this repo as the tip of the iceberg. Well, it used to be, but the rest of the iceberg disappeared.
It's still shipping, successfully, on millions(?) of devices yearly via Nest Hubs and such. Never say never. But all public-facing signs are fairly clear: the big move is ChromeOS into Android, not Android onto Fuchsia. And note how invested Google is in the Nest Hubs (read: not at all; they've languished for years now).
(I'm also curious if we could get a stats-based thing on, say, 2019 vs. 2024. My out-of-thin-air prior would be 30% more activity in 2019 than now. But I also figured there were 10 commits a day now, not 100.)
(I tried checking out the whole repo, but then all the apps on my macbook informed me en masse there was 0 disk space left :X Doesn't look like there's a GitHub mirror)
(cheers btw, you're my kind of people, one of the more soul-sucking parts of Google was finding out that kind of person is few, and far between) (i.e. curious and into It, not just here to make your boss or partner happy)
I guess that's the thing: when I now look at the commits, it's way past what's reasonable for Nest. I see Chromium is rolled back in, for example; Flutter is back in there; everything seems to be getting the Bazel treatment; I see a bunch of non-publicly-announced releases this year... like things are clearly happening on some level, in a way that's not entirely consistent with the idea of a product in the midst of a death roll.
Do you think it’s possible that thinking evolved slightly beyond the number of devices shipped in the meantime perhaps?
I'd always been of the opinion that even if they just got the whole starnix Linux-syscall-compatibility-layer thing (which is what a lot of recent commits seem to point towards) to a good point and stopped there, it would still make sense in so many use cases that it would have been justified.
It would also change the "number of devices shipped" question from one dying product line to billions overnight, which certainly might change the conversation. Furthermore, it would put them in a pretty unique commercial position for more high-assurance computing, given the security model, which would presumably have a lot of follow-on effects.
Also thanks again for taking the time to answer, you’re right I am genuinely very curious about this.
They did. The Lumia had this feature. It also had a liquid cooling system. But the Windows computer was quite limited. This was before they migrated to Windows OneCore.