The obvious next step here is to see how well this generalises to arbitrary inputs :)

Did you read the paper? Do you have specific criticisms of their problem statement, methodology, or results? There is a growing body of research indicating that there is, in fact, a taxonomy of 'hallucinations'; that they may have different causes and representations; and that there are technical mitigations with varying levels of effectiveness.


AI detectors do not work. I have spoken with many people who think that the particular writing style of commercial LLMs (ChatGPT, Gemini, Claude) is the result of some intrinsic characteristic of LLMs - either the data or the architecture. The belief is that this particular tone of 'voice' (chirpy sycophant), textual structure (bullet lists and verbosity), and vocabulary ('delve', et al.) serves and will continue to serve as an easy identifier of generated content.

Unfortunately, this is not the case. You can detect only the most obvious cases of the output from these tools. The distinctive presentation of these tools is a very intentional design choice - partly by the construction of the RLHF process, partly through the incentives given to and selection of human feedback agents, and in the case of Claude, partly through direct steering via SAEs (sparse autoencoder activation manipulation). This is done for mostly obvious reasons: it's inoffensive, 'seems' to be truth-y and informative (qualities selected for in the RLHF process), and doesn't ask much of the user. The models are also steered to avoid having a clear 'point of view', agenda, or point to make - characteristics which tend to identify a human writer. They are steered away from highly persuasive behaviour, although there is evidence that they are extremely effective at writing this way (https://www.anthropic.com/news/measuring-model-persuasivenes...). The same arguments apply to spelling and grammar errors, and so on. These are design choices for public-facing, commercial products with no particular audience.
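
To make the steering point concrete, here's a toy sketch of what activation steering looks like mechanically. All names (the model, the layer index, the feature vector) are hypothetical, and this is emphatically not Anthropic's actual pipeline - just the general shape of the technique:

    # Toy sketch of activation steering; NOT Anthropic's implementation.
    # `model`, the layer index, and `feature_dir` are all hypothetical.
    import torch

    def make_steering_hook(feature_dir: torch.Tensor, scale: float):
        """Forward hook that nudges the residual stream along one
        feature direction (e.g. a sparse autoencoder decoder vector).
        Assumes the hooked module returns a plain tensor."""
        def hook(module, inputs, output):
            # output: (batch, seq, d_model) residual stream activations
            return output + scale * feature_dir
        return hook

    # Hypothetical usage on some open transformer:
    # handle = model.transformer.h[20].register_forward_hook(
    #     make_steering_hook(feature_dir, scale=4.0))
    # ... generate text, then handle.remove()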

An AI detector may be able to identify that a text has some of these properties in cases where they are exceptionally obvious, but fails in the general case. Worse still, students will begin to naturally write like these tools because they are continually exposed to text produced by them!

You can easily get an LLM to produce text in a variety of styles, some of which are entirely dissimilar to normal human writing, such as unique amalgamations of many different and discordant styles. You can get the models to produce highly coherent text which is indistinguishable from that of any individual person, with any particular agenda and tone of voice that you want. You can get the models to produce text with varying cadence, with incredible cleverness of diction and structure, with intermittent errors and backtracking, and _anything else you can imagine_. It's not super easy to get the commercial products to do this, but it's trivial to get an open source model to behave this way - see the sketch below. So you can guarantee that a million open source solutions will pop up for students and working professionals to produce 'undetectable' AI output. This battle is lost, and there is no closing Pandora's box. My earlier point about students slowly adopting the style of the commercial LLMs really frightens me in particular, because it is a shallow, pointless way of writing which demands little to no interaction with the text, tends to be devoid of questions or rhetorical devices, and, in my opinion, makes us worse at thinking.
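
As for how little effort the style control takes, a minimal sketch with the `transformers` library (the model name is a placeholder for any open instruction-tuned model):

    # Sketch: style control of an open model via plain prompting.
    # The model name is a placeholder; any open instruct model will do.
    from transformers import pipeline

    gen = pipeline("text-generation", model="some-org/some-open-7b-instruct")

    essay = "The Roman Republic fell for several interlocking reasons..."
    prompt = (
        "Rewrite this in the voice of a tired student: short sentences, "
        "one or two typos, a mid-thought correction, no bullet lists.\n\n"
        + essay
    )
    print(gen(prompt, max_new_tokens=400)[0]["generated_text"])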

We need to search for new solutions and new approaches for education.


> We need to search for new solutions and new approaches for education.

Thank you for that and for everything you wrote above it. I completely agree, and you put it much better than I could have.

I teach at a university in Japan. We started struggling with such issues in 2017, soon after Google Translate suddenly got better and nonnative writers became able to use it to produce okay writing in English or another second language. Discussions about how to respond continued among educators—with no consensus being reached—until the release of ChatGPT, which kicked the problem into overdrive. As you say, new approaches to education are absolutely necessary, but finding them and getting stakeholders to agree to them is proving to be very, very difficult.


I recently deployed an AI detector for a large K-12 platform (multi-state, 20k+ students), and these tools DO work in the sense of saving teachers time.

You have to understand: you are a smart professional who will try to avoid being detected, but 6th-12th grade students can be incredibly lazy and prone to procrastination. You may take the time to add a tone, style, and cadence to your prompt, but many students do not. Submissions can be so bad that you find the "As an AI assistant..." line in the work. About 11% of assignments blatantly use AI, and after manual review of over 3,000 submitted assignments, GPTZero proved quite capable, with very few (<20) false positives.

Do you want teachers wasting time loading, reviewing, and ultimately commenting on clear AI slop? No, you do not; they have very little time as it is, and that time is better spent helping other students.

Of course, you need a process to deal with false positives, the same way we had one for our plagiarism detector. We had to make decisions many years ago about what percentage of false positives is okay, and what the process looks like when it's wrong.

Put simply, the end goal isn't to catch everyone; it's to catch the worst offenders so that your staff don't get worn down and your students get a better education.
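
Roughly, the policy amounts to a conservative threshold plus mandatory human review. A sketch of the shape of it (the helper function and the threshold are illustrative stand-ins, not our production values or GPTZero's actual client):

    # Illustrative triage sketch; detector_score() is a hypothetical
    # stand-in for whatever detector you use, and 0.98 is not our real cutoff.
    def detector_score(text: str) -> float:
        """Return probability in [0, 1] that `text` is AI-generated."""
        raise NotImplementedError("wire up your detector of choice here")

    def triage(submission: str) -> str:
        if detector_score(submission) >= 0.98:  # only the blatant cases
            return "flag_for_human_review"      # a person always reviews first
        return "route_to_teacher"               # everything else grades normally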


Doesn't Google Docs have a feature that shows writing history?

You could ask students to start writing in Google Docs, and whenever someone gets a false positive, they can prove they wrote it through that.

And besides, 99% of people who use AI to write don't bother contesting a flag as a false positive, so giving students the right to contest wouldn't be much of a problem long term.


Yeah, those are great points. Our students do use Google Docs today, and you're right: most students don't even contest it.

We let them resubmit a new paper when they are caught, and they get some one-on-one time with a tutor to help move them forward. Typically they were stuck or rushing, which is why they dumped a whole AI-slop assignment into our LMS.
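
For what it's worth, the revision-history check can even be automated, since the Drive API exposes a document's revisions. A sketch, assuming you've already done the OAuth setup (`creds` and the file ID are placeholders):

    # Sketch: list a Google Doc's revision timestamps via the Drive v3 API.
    # `creds` comes from your own OAuth flow; FILE_ID is a placeholder.
    from googleapiclient.discovery import build

    FILE_ID = "your-doc-id-here"
    drive = build("drive", "v3", credentials=creds)
    resp = drive.revisions().list(
        fileId=FILE_ID, fields="revisions(id,modifiedTime)").execute()
    for rev in resp.get("revisions", []):
        print(rev["modifiedTime"], rev["id"])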


So uh, things are not looking so good for actual physics these days, I gather?


Former high energy theorist here: things are not looking so good for high energy physics (both theoretical and experimental), which, loosely speaking, accounted for maybe 1/3-1/2 of Nobel Prizes in the 20th century. That's part of the reason I got out. I'm inclined to say astrophysics and cosmology, another pillar of our fundamental understanding of the universe, isn't doing that well either - probably in 'okay-ish, but not as exciting as it used to be' territory. I'm not qualified to talk about other fields.


I think saying they're not looking good might be a bit of an exaggeration. Both high energy physics and astrophysics are between technological generations right now, which is why things are a bit slower than usual.

With astrophysics, we're probably going to need the more sensitive gravitational-wave detectors now in development to become operational before the next big breakthroughs. With high energy physics, many particle colliders and synchrotron light sources seem to be undergoing major upgrades these days. While particle colliders tend to get the spotlight in the public eye and are in a weird spot regarding expected research outcomes, light sources are still doing pretty well, afaik.

This Nobel, I think, happened mainly because AI has overwhelmingly dominated the public's perception of scientific/technological progress this year.


> With high energy physics, many particle colliders and synchrotron light sources seem to be undergoing major upgrades these days.

AFAIK synchrotron light sources are tools for materials science and other applied fields, not high energy physics. Did I miss something?

I am also puzzled by the "many particle colliders". There is currently only one capable of operating at the high energy frontier. It's getting a luminosity upgrade [1] which will increase the number of events, but those will still be the 14 TeV proton-proton collisions it's been producing for years. There is some hope that collecting more statistics will reveal something currently hidden in the background noise, but I wouldn't bet on it.

[1] https://home.cern/science/accelerators/high-luminosity-lhc


>AFAIK synchrotron light sources are tools for materials science and other applied fields, not high energy physics. Did I miss something?

When you put it like that - yeah, I was kind of being stupid. During my stint doing research at a synchrotron light source, I was constantly told to think like a physicist (rather than a computer engineer), and most of the work of everyone who wasn't a beamline scientist was primarily physics-focused, which is what led me to think that way. But you're right that it doesn't make much sense to call them high energy physics research tools first and foremost.

>I am also puzzled by the "many particle colliders". There is currently only one capable of operating at the high energy frontier. It's getting a luminosity upgrade [1] which will increase the number of events, but those will still be the 14 TeV proton-proton collisions it's been producing for years. There is some hope that collecting more statistics will reveal something currently hidden in the background noise, but I wouldn't bet on it.

The RHIC is also in the process of being upgraded to the EIC. But overall, yes, that's why I said they were in a 'weird' spot. I too am not convinced that the upgrades will offer Nobel-tier breakthroughs.


What are you considering "high energy physics"? "1/3-1/2 of Nobel Prizes in the 20th century" is a significant overestimation unless you are including topics not traditionally included in high energy physics. For example, there were many Nobel prizes in nuclear physics, which shares various parallels with high energy physics in terms of historical origins, experimental techniques, and theoretical foundations. But nuclear physics is in a very exciting era of experimental and theoretical developments, so your "not looking so good" description does not apply.


Much of nuclear physics was effectively "high energy physics" (or, more appropriately, elementary particle physics) back in the day; it ceased to be elementary or high energy at some point. My very loose categorization covers everything on the microscopic path towards fundamental theories; there's also a macroscopic path, cosmology.

Edit: Expanded a few times.


Agreed on that. My disagreement is with the statement that everything that was once referred to as high energy physics is "not looking so good". Nuclear physics in particular does not feel stuck in the way I've heard some high energy physicists talk about their field.


As a layman, the visualization of black holes, the superstructure above and below the Milky Way, JWST’s distant galaxy discoveries, gravitational wave detectors as mentioned, and some of the Kuiper Belt observations all seem to be interesting and exciting.

Oh and the death of string theory!


Interesting thought. I hear some voices saying theoretical physics is stuck with string theory, but am not really qualified to make a judgement.


The Nobel Prize was awarded to theoretical work in 2021: https://www.nobelprize.org/prizes/physics/2021/popular-infor...

"Theoretical physics" is such a big and ambiguous concept that physicists tend not to use the term in discussions. Theoretical work these days often involves a lot of numerical simulation on supercomputers, which is a kind of 'experiment' in its own right. It is usually more productive to just mention the specific field - astronomy, condensed matter, AMO, etc. - and you can be sure there are always a lot of discoveries in each area.


Physics is not stuck in string theory as physics is not just high energy theoretical particle physics. There's also more going on in high energy theoretical particle physics than just "string theory".


Much of the experimental action in recent decades has been in low energy physics: down near absolute zero, where quantum effects dominate and many of the stranger predictions of quantum mechanics can be observed directly. The Nobel Prizes in Physics for 1996, 1997, 1998, 2001, and 2003 were all based on experimental work near absolute zero.


Well I'm sure a $50 billion collider will fix things.


Please bro just one more collider. Just one more collider bro. I swear bro we're gonna fix physics forever. Just one more collider bro. We could go up or even underground. Please bro just one more collider.


L-theanine (200mg) with around 100-150mg of caffeine has an extremely noticeable, positive effect on my ability to focus, my feeling of "well-situatedness", and my overall calmness. L-theanine by itself doesn't seem to do much. Caffeine on its own wakes me up but makes me feel jittery and anxious, so it's definitely an interaction effect. Taurine has a much smaller effect on calmness, sans interactions - often indistinguishable from any other mild focus exercise like box breathing or stretching.


There's quite good support for the caffeine/theanine interaction in the literature -- I was mainly trying to see if it could improve sleep, which is why I didn't take it with caffeine. It would be interesting to do a blinded cognitive test at some point, though, to get some estimate of how much cognitive performance increases. I need to think about ways to measure that, aside from flashcards. I actually have the website connected to a chess API, so that might be a nice test.
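
For blinding, the simplest thing I can think of is a balanced schedule plus a key file that stays unopened until the end - something like this sketch (assumes identical-looking capsules prepared from the key, ideally by someone else):

    # Minimal n-of-1 blinding sketch: balanced random schedule + a key
    # file that stays unopened until all daily scores are logged.
    import csv
    import random

    conditions = ["caffeine+theanine", "caffeine", "theanine", "placebo"]
    schedule = conditions * 7          # 7 blinded days per condition
    random.shuffle(schedule)

    with open("blinding_key.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["day", "condition"])
        for day, condition in enumerate(schedule, start=1):
            writer.writerow([day, condition])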


Are you consuming these things pure or via tea/coffee?


It's not - the FTC released a statement on this very topic a few months ago: https://www.ftc.gov/business-guidance/blog/2024/03/price-fix...


We won't really know the answer until SCOTUS rules on it.


We already know the answer. The question is: will the Supreme Court allow a monopoly price-setter anyway?


I don’t see this even making it to SCOTUS unless there’s a circuit split somewhere.


Police in large American cities are not likely to be of much assistance in this situation. Assuming they attend at all, I would expect them to not understand the nature of the issue and probably proceed to make it much worse.


After reading the paper, I'm really unsure what the novel contribution is. It feels like they're attempting to rebrand well-understood concepts from various fields (control systems theory, etc.). The provided mathematical definition of antifragility is somewhat unconvincing too: it's not that it's wrong, per se, but in the effort to find something sufficiently broad to apply to many different fields of applied dynamical theory, they've had to adopt a definition that is a bit unintuitive and overly general.


This is really funny - it's bordering on truly absurd, almost incomprehensible madness to consider doing this seriously. I can't think of a single property you'd desire in a control system (state observability, auditability, guarantees on out-of-band input behaviour, stability under shocks, etc, etc) that would be present in an LLM control model.

I don't want to be disrespectful to the authors, and it's (vaguely) interesting to see how far they've been able to go with this, but this idea is still an abomination.


Was fun while it lasted! Will be interesting watching the internal story of the original lab unfold as it all becomes public eventually.


Just because someone on Twitter said it's over doesn't mean it's over. Prediction markets still put a ~1/5 chance on it being legit.


Can you share which markets you're following?


This is the only one I'm following myself:

https://manifold.markets/QuantumObserver/will-the-lk99-room-...


No, it wasn't. Things like this (or at least the dubious media frenzy around them) erode the public's trust in science.


Science is like that. The problem is not with science or with this material not being superconducting; the problem is with how science is explained in the media. A hypothesis failing its test is a normal and critical part of the scientific method.


It's important that the public understand the difference between a preprint on arXiv, a paper in a serious peer-reviewed journal, and the truth.

We saw a lot of "miracle cures" during the COVID-19 pandemic whose only support was a bunch of preprints (and some of those preprints were horrible; I read a few of them to write angry comments on HN).

A paper in a serious peer-reviewed journal (preregistered and with a randomized control group, where possible and relevant) is much better evidence, but it can still be wrong.

