> I can talk about concepts like "atoms" or "bacteria" or "black holes" with anyone, and they'll know what they are - even if their knowledge of those subjects isn't in depth.
I'm not convinced this is an unalloyed good. Knowing that a disease is caused by "bacteria" instead of "demons" isn't really helpful if you don't have a deep understanding of exactly what bacteria is. See, for example, all of the people who want antibiotics whenever they're sick for any reason. We've just replaced one set of weird beliefs in the general populace with another and given it a veneer of science.
> Knowing that a disease is caused by "bacteria" instead of "demons" isn't really helpful if you don't have a deep understanding of exactly what bacteria is.
This is a poor example. Even an incomplete image of the germ theory of disease is a massive improvement over thinking illness is caused by demons. An extremely superficial understanding of bacteria as "microscopic organisms which can make you sick" gives good justification why people should do things like wash their hands, cover their mouth when coughing, and not lick the railing on a subway.
Knowing the difference between bacteria being living organisms and viruses being not-quite-alive does not qualify as a "deep understanding" though.
Further, the presence of people misunderstanding something that most of the population knows pretty well in no way makes teaching that subject to the population bad. Your assertion would require that believing demons cause sickness actually has benefits we've lost.
But more people today know, at a baseline level, what bacteria are and their role in disease than before, when all we had were demons/bad humors/etc.
There are functionally illiterate people too in modern day and the average reading level is still elementary school level, but that's vastly better than before when the average person couldn't read at all.
Suicide does not have stable reporting rates. It was very stigmatized in the past, and so investigators would notoriously report suicides as "unknown cause of death" if they could.
Violent crime, on the other hand, is much more correlated with things like poverty than with mental health.
I think it's quite obviously the case that there are no clear indicators of what "mental health" looked like 100 years ago and earlier. Any projection into the past will involve a lot of extrapolation and all sorts of biases.
They very clearly explain why this matters in the "Why should I care?" section. Partially quoting them:
> Harry Potter is an innocent example, but this problem is far more costly when it comes to higher value use-cases. For example, we analyze insurance policies. They’re 70-120 pages long, very dense and expect the reader to create logical links between information spread across pages (say, a sentence each on pages 5 and 95). So, answering a question like “what is my fire damage coverage?” means you have to read: Page 2 (the premium), Page 3 (the deductible and limit), Page 78 (the fire damage exclusions), Page 94 (the legal definition of “fire damage”).
It's not at all obvious how you could write code to do that for you. Solving the "Harry Potter Problem" as stated seems like a natural prerequisite for doing this much more high stakes (and harder to benchmark) task, even if there are "better" ways of solving the Harry Potter problem.
> Solving the "Harry Potter Problem" as stated seems like a natural prerequisite for doing this much more high stakes (and harder to benchmark) task
Not really. The "Harry Potter Problem" as formulated is asking an LLM to solve a problem that they are architecturally unsuited for. They do poorly at counting and similar algorithms tasks no matter the size of the context provided. The correct approach to allowing an AI agent to solve a problem like this one would be (as OP indicates) to have it recognize that this is an algorithmic challenge that it needs to write code to solve, then have it write the code and execute it.
Asking specific questions about your insurance policy is a qualitatively different type of problem that algorithms are bad at, but it's the kind of problem that LLMs are already very good at in smaller context windows. Making progress on that type of problem requires only extending a model's capabilities to use the context, not simultaneously building out a framework for solving algorithmic problems.
So if anything it's the reverse: solving the insurance problem would be a prerequisite to solving the Harry Potter Problem.
LLMs can't count well. This is in large part a tokenization issue. That doesn't mean they couldn't answer all those kinds of questions. Maybe the current state of the art can't, but you won't find out by asking it to count.
The WHO list of essential medicines is not just over-the-counter drugs. It includes things like the chemotherapy drug cisplatin. I happened to need that for testicular cancer ~10 years ago, and the treatment cost was $50k (as "paid" by insurance). That overall seems pretty reasonable to me for the treatment I received, but definitely not something I'd expect the median American to be able to pay out of pocket.
The median American would not have to pay out of pocket, as nearly every American has health insurance (since the ACA, it is actually illegal not to have insurance).
I think it's accurate to say that the median American is insured, with only 8% of the population uninsured [1]. Although, to put that percentage in perspective, that's 26 million people and likely thousands in excess mortality relative to the insured population.
I believe you're referring to the ACA's "individual mandate", which imposed a federal tax penalty for being uninsured. I won't argue whether that makes it illegal or not, but I can say that the individual mandate was eliminated by the Tax Cuts and Jobs Act as of 2019 [1]. There's no longer a federal tax penalty for being uninsured.
This is purely anecdotal, but of that 8% (26 million), I would posit that most of those people are uninsured by choice. e.g., probably mostly young, maybe part-time workers without chronic illnesses.
Your wording in this comment (and the Twitter/comment video) gives off the same vibes as the Google April 1st videos for things like Gmail Motion (https://www.youtube.com/playlist?list=PLAD8wFTLnQKeDsINWn8Wj...). I honestly thought this was full sarcasm at first.
I don't see how that would have helped in this case. This was not a resource at a known location that was supposed to be only available to logged in users. This was a resource that the admins didn't know about available at an unknown url that was exposed to the public internet due to a configuration error. Are you going to write a test case for every possible url in your server to make sure it's not being exposed?
Something that could work is including a random hash as a first hidden email inside of every client, and then regularly searching outbound traffic for that hash. But that would be rather expensive.
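The canary idea above could be sketched roughly like this. This is a minimal illustration, not a real product: how the tokens are stored per client and how outbound traffic is captured are left as assumptions, and the function names are hypothetical.

```python
# Sketch of the canary-email idea: seed each client with a unique random
# token (hidden in a first email), then scan captured outbound payloads
# for any known token. A hit means that client's data is leaking.
import secrets

def make_canary():
    # 128-bit random hex string, unique per client
    return secrets.token_hex(16)

def find_leaked_canaries(payload, known_canaries):
    """Return the canaries that appear in an outbound payload."""
    return [c for c in known_canaries if c in payload]
```

The expensive part, as noted, is the scanning: every outbound payload has to be checked against the full set of known canaries (or fed through something like an Aho-Corasick multi-pattern matcher to make that tractable).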
n=1, head of a security at a fintech. We perform automated scans of external facing sensitive routes and pages after deploys, checking for PII, PAN, and SPI indicators, kicked off by Github Actions. We also use a WAF with two person config change reviews (change management), which assists in preventing unexpected routes or parts of web properties being made public unexpectedly due to continuous integration and deployment practices (balancing dev velocity with security/compliance concerns).
Not within the resources of all orgs of course, but there is a lot of low hanging fruit through code alone that improves outcomes. Effective web security, data security, and data privacy are not trivial.
You don't need to check every one though. Or any. You create a known account with known content in it (similar to your hash idea) and monitor that.
Even if they never got around to automating it and were highly laissez-faire, manually checking that account with those testcases say once a month would have caught this within 30 days. That still sucks but it's at least an order of magnitude less suck than the situation they're in now.
If the screenshot in the article isn't edited, this was an HTTP service exposed to the internet on an unusual port (81). I'd propose the following test cases:
1) Are there any unexpected internet-facing services?
* Once per week (or per month, if there are thousands of internet-facing resources) use masscan or similar to quickly check for any open TCP ports on all internet-facing IPs/DNS names currently in use by the company.
* Check the list of open ports against a very short global allowlist of port numbers. In 2024, that list is probably just 80 and 443.
* Check each host/port combination against a per-host allowlist of more specific ports. e.g. the mail servers might allow 25, 465, 587, and 993.
* If a host/port combination doesn't match either allowlist, alert a human.
Edit: one could probably also implement this as a check when infrastructure is deployed, e.g. "if this container image/pod definition/whatever is internet-facing, check the list of forwarded ports against the allowlists". I've been out of the infrastructure world for too long to give a solid recommendation there, though.
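The allowlist comparison in check 1 is simple enough to sketch. Hostnames, ports, and the alerting mechanism below are hypothetical placeholders; the input is assumed to be (host, port) pairs already parsed out of masscan or similar output.

```python
# Sketch: compare scan results against a global allowlist and
# per-host allowlists; anything matching neither goes to a human.

GLOBAL_ALLOW = {80, 443}
PER_HOST_ALLOW = {
    "mail.example.com": {25, 465, 587, 993},
}

def unexpected_ports(scan_results):
    """scan_results: iterable of (host, port) pairs from a port scan."""
    alerts = []
    for host, port in scan_results:
        if port in GLOBAL_ALLOW:
            continue
        if port in PER_HOST_ALLOW.get(host, set()):
            continue
        alerts.append((host, port))  # escalate to a human
    return alerts
```

Something listening on port 81, as in the article, would fail both checks and get flagged.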
2) Every time an internet-facing resource is created or updated (e.g. a NAT or load-balancer entry from public IP to private IP is changed, a Route 53 entry is added or altered, etc.), automatically run an automated vulnerability scan using a tool that supports customizing the checks. Make sure the list of checks is curated to pre-filter any noise ("you have a robots.txt file!"). Alert a human if any of the checks come up positive.
OpenVAS, etc. should easily flag "directory listing enabled", which is almost never something you'd find intentionally set up on a server unless your organization is a super old-school Unix/Linux software developer/vendor.
Any decent commercial tool (and probably OpenVAS as well) should also have easily flagged content that disclosed email addresses, in this case.
3) Pay for a Shodan account. Set up a recurring job to check every week/month/whatever for your organization name, any public netblocks, etc. Generate a report of anything found during the current check that wasn't found during the previous check, and have a human review it. This one would take some more work, because there would need to be a mechanism for the human(s) to add filtering rules to weed out the inevitable false positives.
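The diff-and-review step of check 3 boils down to a set difference. In this sketch, `previous` and `current` are assumed to be sets of (host, port) pairs already fetched by the recurring job (e.g. from the Shodan API), and `FILTER_RULES` stands in for the human-maintained false-positive list.

```python
# Sketch: report only findings that are new since the last run and
# not covered by a human-curated filter list.

FILTER_RULES = {("static.example.com", 443)}  # hypothetical known-good entry

def new_findings(previous, current):
    """Return new (host, port) findings, minus filtered false positives."""
    return sorted((current - previous) - FILTER_RULES)
```

Everything the function returns is what lands in the human-review report; entries the reviewer deems benign get promoted into `FILTER_RULES` so they don't reappear.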
There was rather a lot of NATO coordination in the US-led invasions of both Iraq and Afghanistan. None of the military missions in these countries were in response to the Article V mutual defense clause of the NATO treaty. It's very easy to see how these operations (and therefore the NATO alliance) would be seen as aggressive to these countries.
This is false. Standard decoding algorithms like beam search can "backtrack" and are widely used in generative language models.
It is true that exhaustive search over output sequences is exponential in the length of the sequence, so heuristics (like a fixed beam width) are used to keep the runtime practical, and this limits the "backtracking" ability. But that limitation is purely for computational convenience's sake and not something inherent in the model.
One of the reasons I've intentionally decided not to become independently wealthy is that I want to have to explain to other people why I'm doing things. Part of my work is "charity-ish", and by not being able to do things on my own, I'm forced to improve my communication skills and involve other people in these charity activities. I think that ultimately improves the final outcome, even if the process is immensely more frustrating.
I am referring to technical work specifically. Where most of the time people don’t even know what they want until they see it.
Creating mock-ups and going back and forth costs time and money.
While most of the time what I do is good enough - I am getting tired of people trying to block work until unimportant details are “discussed”.
It would not be frustrating if customers were willing to pay for mock-up work and then for the actual work. But most want a working application right away, not a mockup, while also wanting to spend time discussing details that would be clear in a mock-up or in the first version of the app.
Could be so with charity, or depending on the field, I suppose. I think innovating or figuring out new tech is different: if something is easily understood and explainable, it's probably already been done, and if you have a great idea that hasn't been done, it's probably because it's really hard to explain or to sell others on it.
I suspect the numbers would be worse if you looked at households instead of individuals due to declining marriage rates (but I'm not willing to put in the effort to find numbers).
I don't understand what you're saying. To me, if the rate stays flat but represents fewer married couples, then this same rate actually means more homes are owned by people of that age.
Meaning - if guy and girl are married and own a house together, that counts as 2 people towards home owner bucket.
If they are not married, they'd need to each own a house for the same rate to hold.