More

cosmie · 2025-03-07T16:49:17 1741366157

Could be that they relied on AI to generate the gnarly regex ¯\_(ツ)_/¯

ddalex · 2025-03-07T16:50:44 1741366244

Any sufficiently advanced regex is indistinguishable from AI.

wslh · 2025-03-07T16:55:47 1741366547

Even simple regex ones, since we don't know if they were created by a human or an AI.

J_Shelby_J · 2025-03-07T19:04:01 1741374241

With enough if statements, you could create a LLM.

readthenotes1 · 2025-03-07T18:12:30 1741371150

I'm trying to think of a funny way to cite Clark's Law, but failing...

Izkata · 2025-03-07T20:00:26 1741377626

Regexes occasionally get called "black magic", and there is is an inverse of Clarke's Third Law: Any sufficiently advanced magic is indistinguishable from technology.

xeonmc · 2025-03-08T15:58:57 1741449537

I think you meant “any sufficiently commoditized magic is indistinguishable from technology”, or “any sufficiently analyzed magic is indistinguishable from science.”

jajko · 2025-03-07T22:11:51 1741385511

I am still waiting for an LLM trained to focus on effin' regexes and their variants like sed, somebody please do a page with ads for this and you will have a nice little side income and warm fuzzy feeling on top of it.

Natural language -> fully working one, I don't mean some email validators but way more complex stuff. Although, I've recently had a case which was too much even for regexes in any form or spec, then sort of grammar-based parser needed to be done from scratch.

vlovich123 · 2025-03-08T05:49:31 1741412971

I haven’t had the need for super complicated regexes, but ChatGPT and Claude both worked fine to generate a regex to extract markdown code blocks.

cosmie · 2025-02-23T20:50:56 1740343856

Take a look at Tapo or Kasa devices (both TP-Link products).

I recently got a few to try out, and expressly chose them because they do motion and sound detection on device and also support microSD for local recording.

I've only had them a few weeks so can't speak to any slow-showing pain points, but so far both the video doorbell[1] and two inexpensive cameras I purchased[2][3] to test out have been awesome.

I set up an automation so they record continuously when I leave and when home to record on detection only (motion for all three and sound for the Kasa camera) to try to be economical on wearing out the SD cards. But for me personally knowing I'll likely have those go out on me and need replaced was an ok trade off for the convenience, and probably a wash financially because everything I wanted happens locally whereas I kept seeing them gated behind a subscription plan when looking at other options.

There's also an option in the app to enable them to speak stream locally to a NAS or NVR via RTSP if you want to do that with them. So I can eventually set that up for more reliability when the eventual SD burnout occurs, and scratch my tinkering itch with things like streaming it to Frigate and testing it out vs the native detection features, without any actual presasure to need to since it all just works as is.

The doorbell is what I was originally needing the continuous local recording and on-device object detection for. The two cameras were bonuses I threw in to grab a few of their inexpensive models to try out while I was at it. And so far for about $100 in total I've been impressed. Key word being so far – they're still recent enough I might be in my honeymoon phase with them and just don't know it yet.

[1] https://www.amazon.com/dp/B0CQQZZXH9

[2] https://www.amazon.com/dp/B08ZXJJTYJ

[3] https://www.amazon.com/dp/B0DBPBSYMQ

rahimnathwani · 2025-02-23T21:14:10 1740345250

Yeah the $30 2k camera you linked seems good, but I worry about local card storage, because if someone steals the camera you have no evidence who did it!

I wish the camera could stream to Frigate/whatever but stream empty delta frames when nothing is detected.

cosmie · 2024-12-13T18:56:26 1734116186

From your experience, what would be the best way to handle spreadsheets?

simonw · 2024-12-13T19:01:19 1734116479

I don't think tabular data of any sort is a particularly good fit for LLMs at the moment. What are you trying to do with it?

If you want to answer questions like "how many students does Everglade High School have?" and you have a spreadsheet of schools where one of the columns is "number of students" I guess you could feed that into an LLM, but it doesn't feel like a great tool for the job.

I'd instead use systems like ChatGPT Code Interpreter where the LLM gets to load up that data programatically and answer questions by running code against it. Text-to-SQL systems could work well for that too.

btown · 2024-12-13T19:34:50 1734118490

This is an active area of research: https://github.com/SpursGoZmy/Awesome-Tabular-LLMs is a good starting point!

cosmie · 2024-12-13T20:58:44 1734123524

For me personally, a lot of times it's for table augmentation purposes. Appending additional columns to a dataset, such as a cleaned/standardized version of another field, extracting a value from another field, or appending categorization attributes (sometimes pre-seeded and sometimes just giving it general direction).

Or sometimes I'll manually curate a field like that, and then ask it to generate an Excel function that can be used to produce as similar a result as possible for automated categorization in the future.

So in most cases I both want to provide it with tabular data, and also want tabular data back out. In general I've gotten decent results for these sorts of use cases, but when it falls down it's almost always addressable by tinkering with the formatting related instructions – sometimes by tweaking the input and sometimes by tweaking the instructions for the desired output.

nprateem · 2024-12-13T22:02:59 1734127379

Give it the data as separate columns. For each cell give it the row index and the data.

That way it's just working with lists but can easily key that eg all this data is in row 3, etc. Tell it to correlate data by the first value in the pair like that.

__mharrison__ · 2024-12-13T23:07:58 1734131278

LLMs are decent at Pandas.

I say "decent" because most of the available training data for Pandas does things in a naive way.

OTOH, they are horrible at Polars. (I figure this is mostly a lack of training data.)

disgruntledphd2 · 2024-12-14T08:25:08 1734164708

> I say "decent" because most of the available training data for Pandas does things in a naive way.

They're around the level of the median user, which is pretty bad as pandas is a big and complicated API with many different approaches available (as is base R, in case people think I'm just hating on pandas).

danielmarkbruce · 2024-12-13T19:15:51 1734117351

Many LLMs are ok with json and html tables. Not perfect, but not terrible.

simonw · 2024-12-13T20:09:20 1734120560

I've seen enough examples of an LLM misinterpreting a column or row - resulting in returning the incorrect answer to a question because it was off by one in one of the directions - that I'm nervous about trusting them for this.

JSON objects are different - there the key/value relationship is closer in the set of tokens which usually makes it more reliable.

danielmarkbruce · 2024-12-13T22:39:10 1734129550

yeah... so, you want to two step it. Parse the table into something structured, then answer the question. For a lot of LLM "problems", it's about the same as teaching a kid a multi-step problem in math - if you try to do it in one step, you are going to have a hard time .

layer8 · 2024-12-13T22:14:34 1734128074

Markdown isn’t suitable for most spreadsheets in the first place, IMO.

irskep · 2024-12-13T19:39:22 1734118762

The only reason I'm not immediately answering is because I need to check whether it's a trade secret. We do our own thing that I haven't seen anywhere else and works super well. Sorry for being mysterious, I'll try to get an OK to share.

Edit: yeah I can't talk about it, sorry

cosmie · 2024-08-15T18:22:25 1723746145

Yes.

The flap created for LASIK (and LASIK-like surgeries, such as SMILE or LASEK) heals, but doesn't have the structural integrity that occurs when the epithelium has to fully regrow like for PRK. So that flap becomes a semi-permanent weakness that can be dislocated down the road and cause problems.

In general PRK is still considered the safest laser surgery option, but trades off the long-term risk of the epitheium flap for a much longer initial post-op recovery time. With PRK you have to be careful that there's no hazing as the epithelium regrows, but once it's regrown it's as good as it ever was. So for folks with a high risk of future eye injuries, PRK tends to be preferred (or required, in some instances like the special forces).

cosmie · 2024-07-30T12:28:27 1722342507

> Websites will just add a CNAME entry that points to whatever service they were using before. Then it's a second party (subdomain) cookie.

A lot of tracking prevention mechanisms have started baking in CNAME uncloaking in the last few years precisely for that reason. Safari/WebKit[1], Brave[2], uBlock Origin (on Firefox only)[3], and NextDNS[4] just to name a few.

At this point the industry has moved onto straight up reverse proxying so it's all first party context. In milder instances it's in the form of server-side tagging[5] (which isn't a true reverse proxy, but can easily be used as one). But at least in those instances the website operators are the ones that typically own the server-side tagging process and have oversight/control/visibility into what they're putting in place.

But that has a high bar for implementation and relatively few companies have the resources or competence for that sort of thing. So it's much easier to persuade website operators to put a pure, dumb reverse proxy in place that gives them an endpoint under the first party domain to load resources from and send hits through[6]. Including being able to use HTTP set-cookie headers in the responses, while they're at it. Which is coincidentally the only long-lived cookie that still exists in Safari/WebKit, since things like "Keep me logged in" functionality would break if they started auto-purging those too.

If it's written in javascript, it's gone in 7 days even if it's first party. And if it's an HTTP header from a CNAME, it's also gone in 7 days. Only cookies set with an HTTP set-cookie header from a first party context are durable anymore. So that's exactly where advertisers are going into as an end-run in the game of cat and mouse - with surprisingly willing adoption from website operators, who are desperate to get their attribution back and don't quite understand the risk profile it exposes them to when they approve letting a third party operator masquarade so deeply as the website operator itself.

[1] https://webkit.org/blog/11338/cname-cloaking-and-bounce-trac...

[2] https://brave.com/privacy-updates/6-cname-trickery/

[3] https://github.com/gorhill/uBlock/wiki/uBlock-Origin-works-b...

[4] https://medium.com/nextdns/nextdns-added-cname-uncloaking-su...

[5] https://developers.google.com/tag-platform/tag-manager/serve...

[6] https://developers.google.com/tag-platform/tag-manager/first...

[7] https://webkit.org/tracking-prevention/ (towards the bottom of the page)

cosmie · 2024-06-21T05:39:19 1718948359

That's what lawyers and accountants are for. A transaction between a sufficiently complex web of inter-related legal entities is indistinguishable from arms-length transactions.

You do not buy a syringe for a 1000x markup. You stand up a group purchasing organization (GPO) as a subsidiary and spin off your entire procurement department to it. You have your negotiating team accept nominal trade discounts off catalog prices, and instead prioritize lump-sum off invoice rebates at various spending thresholds. So while you previously got a $50 syringe discounted to $25 from a supplier, now your group purchasing subsidiary is paying nominally less than ~$50 per syringe but recieving a lump sum rebate against accounts payable from a supplier equivalent to $25 per syringe based on your primary client's expected spending volume (which is pretty predictable considering your former procurement department turned subsidiary has been working with this supplier for ages).

The group purchasing subsidiary then adds a nominal markup of their supply catalog, say 20% or so. So that $50 syringe is sold to the hospital for $60. A 20% markup is considered fair and reasonable so your auditors give it their blessing. Suddenly the hosital is paying $35 more per syringe but the supplier is still getting the same $25 they always have. You also sign up another local hospital to join the GPO and negotiate even higher rebate thresholds. Both hospitals auditors and lawyers point to the industry wide practice of GPOs and their perceived benefits, making plausible enough defenses against both criminal and civil complaints. The executive team of the GPO that came over from their parent orgs and orchestrated this whole thing get generous (but fair market rate) employment contracts and benefits that just happen to absorb the maojrity of that $35 per syringe profit so very little ends up bubbling back up to the non-profit parent entities.

Do the same thing with selling your real estate to a real estate holding company, which leases it back to you at market rates (that just keep going up and up). Do the same with your nurses – spin up a nursing staffing company and contract all your nurse staffing through it. Same with physicians – spin off any directly employed physicians into a physician staffing company and conmtract with them for services.

As long as you follow the appropriate corporate formalities, you suddenly have a ton of knobs you can turn to engineer any particular operating margin you want your healthcare system to be perceived as having. This isn't limited to hospital systems, but with the prevalent level of inefficient middlemen entities that already exist in the US healthcare system and contribute to runaway costs, it's pretty damn easy to throw a few of your own into the mix in a manner that passes legal scrutinty around self-dealing. There are also plenty of liability-related reasons to justify such setups as well, so it's not purely a shift around profits from your tax-exempt non-profit entity.

And insurers don't actually care much at all about any of this. As they're required to pay out 80% of premiums as claims and only allowed 20% for administrative expenses and profits, the easiest way to increase profits is for claims to increase (and subsequently allowing the absolute value of that 20% piece to grow). If hospital charges go up across the board because of these sorts of shenanigans, that's as much a boon to the insurers paying out as it is to whatever lucky winners are siphoning off the profits from those related entity subsidiaries.

[1] https://www.aeaweb.org/research/regulating-health-insurers-a...

avn2109 · 2024-06-24T14:55:03 1719240903

Super pro comment, should be much higher in the thread. You should write about this in more detail somewhere! And link it here.

cosmie · 2024-04-16T17:58:08 1713290288

What modem did they replace it with?

cosmie · on Nov 4, 2023

They're specifically formatted ASCII text files[1][2].

A single file can have multiple batches. A single batch can have multiple transactions.

For example, a company may upload an ACH file to their bank for payroll that contains a single batch in it and in that single batch they have entries for each employee's payroll deposit. Or maybe it has two batches – one for payroll deposits and a second one for reimbursements (or bonuses) that get deposited separately. Maybe it even has batches for things like 401k transfers to custodian accounts.

Their bank may receive that batch file, validate that and combine those batches with all the other ones it has and send those as a combined file to the clearing house.[3]

That said, nothing stops you from transfer if a file with a single batch containing a single transaction. Such as when you need to initiate an immediate transfer instead of collating it with all the other transactions at the end of the day.

At the end of the day, the ACH network is just a bunch of text files being pushed to sftp servers and servers periodically polling for new text files in those sftp directories to pick up and process.

[1] https://achdevguide.nacha.org/ach-file-overview

[2] https://github.com/moov-io/ach/blob/master/docs/file-structu...

[3] My understanding of this aspect is vague at best, so may be misconstruing how they're aggregated or passed through to the clearing house.

cosmie · on Nov 4, 2023

And that spreadsheet likely has an obligatory off by one row reference tucked away starting halfway down the employee lookup table in a hidden sheet, nonchalantly adding it's own wtf factor for who is and isn't included in the laying off in a subtle enough fashion to go unnoticed and accepted as is.

cosmie · on April 22, 2023

That's the default option. Commingling is an opt-in process to use the manufacture barcode instead of a unique label on each box, and makes it easier for Amazon to potentially fulfill orders by being able to pick inventory from a wider range of warehouses and make shipping more efficient.