Hacker News | itintheory's comments

Appears to be where the actual link, http://partnerportal.anthropic.com/s/partner-registration, redirects. Site.com is some Salesforce-related domain.

Huh, so you got http; I'm now getting linked to: https://partnerportal.anthropic.com/s/partner-registration

Which Firefox warns me has an untrusted cert.


Classic vibe coding; everyone involved in AI has blinders when it comes to their own dogfood.

What do you think it is about machine learning that makes it hard to replicate? I'm an outsider to academic research, but it seems like computer based science would be uniquely easy - publish the code, publish the data, and let other people run it. Unless it's a matter of scale, or access to specific hardware.

A lot of things are easy if you ignore the incentive structure. E.g. a lot of papers will no longer be published if the data must be published. You’d lose all published research from ML labs. Many people like you would say “that’s perfectly okay; we don’t need them” but others prefer to be able to see papers like Language Models Are Few-Shot Learners https://arxiv.org/abs/2005.14165

So the answer is that we still want to see a lot of the papers we currently see because knowing the technique helps a lot. So it’s fine to lose replicability here for us. I’d rather have that paper than replicability through dataset openness.


But the lab must publish at least the general category of data, and if that doesn't replicate, then the model only works on a more specific category than they claim (e.g. only their dataset).

Even with the exact same dataset and architecture, ML results aren't perfectly replicable due to random weight initialisation, training data order, and non-deterministic GPU operations. I've trained identical networks on identical data and gotten different final weights and performance metrics.

This doesn't mean the model only works on that specific dataset - it means ML training is inherently stochastic. The question isn't 'can you get identical results' but 'can you get comparable performance on similar data distributions'.
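A toy illustration of the point (a pure-Python sketch, not any real training setup): the exact same "training recipe" run with different seeds ends at different weights, yet similar accuracy, because the seed controls both the random initialisation and the data order.

```python
import random

def train_perceptron(data, seed, epochs=100, lr=0.1):
    """Toy perceptron; the seed controls init and shuffle order."""
    rng = random.Random(seed)
    w, b = rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)  # random init
    for _ in range(epochs):
        rng.shuffle(data)                                  # random data order
        for x, y in data:
            pred = 1 if w * x + b > 0 else 0
            w += lr * (y - pred) * x
            b += lr * (y - pred)
    return w, b

# Toy separable data: label 1 iff x > 0
data = [(x / 10, 1 if x > 0 else 0) for x in range(-10, 11) if x != 0]

runs = [train_perceptron(list(data), seed=s) for s in range(3)]
accs = []
for w, b in runs:
    correct = sum((1 if w * x + b > 0 else 0) == y for x, y in data)
    accs.append(correct / len(data))

print(runs)  # different final (w, b) per seed
print(accs)  # but comparable accuracy across seeds
```

Same data, same architecture, same hyperparameters; the only thing that changed is the seed, and the final weights already differ.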


Then researchers should re-train their models a couple times, and if they can't get consistent results, figure out why. This doesn't even mean they must throw out the work: a paper "here's why our replications failed" followed by "here's how to eliminate the failure" or "here's why our study is wrong" is useful for future experiments and deserves publication.

As per my previous comment - we are discussing stochastic systems.

By definition, they involve variance that cannot be explained or eliminated through simple repetition. Demanding a 'deterministic' explanation for stochastic noise is a category error; it's like asking a meteorologist to explain why a specific raindrop fell an inch to the left during a storm replication.
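The sensible check for a stochastic system is distributional: re-run with several seeds, report mean and spread, and ask whether two setups agree within run-to-run noise. A stdlib-only sketch with made-up accuracy numbers (not real results):

```python
from statistics import mean, stdev

# Hypothetical accuracies from re-running the same recipe with
# different seeds (illustrative numbers only).
model_a = [0.912, 0.905, 0.918, 0.909, 0.914]
model_b = [0.907, 0.911, 0.903, 0.915, 0.908]

ma, sa = mean(model_a), stdev(model_a)
mb, sb = mean(model_b), stdev(model_b)
print(f"A: {ma:.3f} +/- {sa:.3f}")
print(f"B: {mb:.3f} +/- {sb:.3f}")

# "Replicates" here means the means agree within seed-to-seed noise,
# not that any single run reproduces another bit-for-bit.
comparable = abs(ma - mb) < 2 * max(sa, sb)
print("comparable performance:", comparable)
```

The 2-sigma threshold is a crude choice for illustration; the point is that the unit of comparison is the distribution over runs, not a single run.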


Lack of will. That was one of the main results from the Whitaker survey in 2020. Making your code reusable and easy to understand is significant work that has no direct benefit for a researcher's career, particularly because research code grows wildly as researchers keep trying things.

Working on the next paper is seen as the better choice.

Moreover if your code is easy for others to run then you're likely to be hit with people wanting support, or even open yourself to the risk of someone finding errors in your code (the survey's result, not my own beliefs).

There are other issues, of course. Just running the code doesn't mean something is replicable. Science is replicated when studies are repeated independently by many teams.

There are many other failure modes: SOTA-hacking, benchmark gaming, and a lack of rigorous analysis of results, for example. And that's ignoring data leakage and other sillier mistakes (which still happen in published work! Even in work published in very good venues.)

Authors don't do much of anything to disabuse readers of the idea that they simply got really lucky with their pseudorandom number generators during initialization, shuffling, etc. As long as it beats SOTA, who cares whether it is actually a meaningful improvement? Of course, doing multiple runs with a decent bootstrap to get some estimate of the average behavior is often really expensive and really slow, and deadlines are always so tight. There is also the matter that the field converged on an experimentation methodology that isn't actually correct. Once you start reusing test sets, your experiments stop being approximations of a random sampling process and you quickly find yourself outside the guarantees provided by statistical theory (a similar sort of mistake to the one scientists in other fields make when interpreting p-values). There be dragons out there, and statistical demons might come to eat your heart, or your network could converge to an implementation of NetHack.
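For what it's worth, even a crude percentile bootstrap over per-example test results gives a sense of whether a "SOTA beat" clears the noise floor. A stdlib-only sketch with synthetic correctness data (not from any real model):

```python
import random

# Hypothetical per-example correctness on a held-out test set
# (1 = correct, 0 = wrong); synthetic, roughly 90% accurate.
rng = random.Random(0)
results = [1 if rng.random() < 0.9 else 0 for _ in range(1000)]

def bootstrap_ci(xs, n_resamples=2000, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the mean of xs."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(xs, k=len(xs))) / len(xs)
        for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

lo, hi = bootstrap_ci(results)
acc = sum(results) / len(results)
print(f"accuracy {acc:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
# If a rival's score falls inside this interval, "beating SOTA"
# may be nothing more than seed luck.
```

This only captures sampling noise over the test set, not training stochasticity; bootstrapping over multiple training runs is the more honest (and more expensive) version.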

Scale also plays into that, of course, and use of private data as the other comment mentioned.

Ultimately, machine learning research is just too competitive and moves too fast. There are tens of thousands (hundreds of thousands, maybe?) of people all working on closely related problems, all rushing to publish their results before someone else publishes something that overlaps too much with their own work. Nobody is going to be as careful as they should be, because they can't afford to. It's more profitable to carefully carve out the minimal publishable amount of work and do that, splitting a result into several small papers you can pump out every few months. The first thing that tends to get sacrificed in that process is reliability.


> stenography

I think you mean steganography[0]. Stenography is shorthand, used for transcription.

[0] https://en.wikipedia.org/wiki/Steganography


Any references on this? I hear this argument a lot. In fact, in a talk on AI last week I heard someone say:

"If you click the thumbs up button to rate a chat, the AI provider will use the contents for training, so our company's policy is never to click the thumbs up button"

That seemed so farcical I had a hard time taking this person seriously. Enterprise plans must give some strong guarantees around data usage, right?


Obviously I can only speak from my personal experience, but just from my own circle I have 5 examples of companies that were “no AI, IP and all that” and are now full-on “every developer must use CC, Cursor…”

How many companies today don’t have an “AI strategy” and aren’t fearing they’ll be left behind? In my small circle we went from “most are not using AI” to “none are not using AI” in a fairly short period of time.


This is why most businesses only have ChatGPT subscriptions. Plus their integration into existing Microsoft products and billing.

Trusting Microsoft seems like a right move /s

Microsoft already has all their business data through handling their document storage and email. Trusting another of their services not to use that data for Microsoft's own purposes is reasonable.

On its face, I'm not sure that's a new question. Bots using browser automation frameworks (Puppeteer, Selenium, Playwright, etc.) have been around for a while. Bot detection tools use signals like cursor movement speed, accuracy, keyboard timing, etc. How those detection tools might adapt to support legitimate bot users does seem like an open question to me, though.
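As a toy example of the kind of signal those tools look at (purely illustrative; no real detection product works anywhere near this naively):

```python
from statistics import stdev

def looks_scripted(key_intervals_ms, min_jitter_ms=15.0):
    """Crude heuristic: human inter-keystroke intervals jitter a lot;
    naive automation often types with near-constant timing."""
    if len(key_intervals_ms) < 2:
        return False
    return stdev(key_intervals_ms) < min_jitter_ms

human = [182, 95, 240, 130, 310, 88, 150]  # made-up human timings (ms)
bot = [50, 50, 51, 50, 50, 50, 49]         # made-up scripted timings (ms)

print(looks_scripted(human))  # False
print(looks_scripted(bot))    # True
```

Real systems combine many such signals (and frameworks can randomise timings to evade exactly this), which is part of why distinguishing legitimate agent traffic from abuse is hard.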

> Fear-mongering isn't lucrative, isn't dopamine triggering

Isn't it? Isn't fear-mongering one of the main selling points for news-media? And a driving factor of engagement in social media?


I have this problem with calc.exe. Sometimes it'll launch from the start menu, but often won't. I pinned it to the taskbar, but muscle memory is a powerful force, so I usually try to launch it from the start menu first.


> you have to install them locally once

or install Docker and have the agent run CLI commands in docker containers that mount the local directory. That way you essentially never have to install anything. I imagine there's a "skill" that you could set up to describe how to use docker (or podman or whatever) for all CLI interactions, but I haven't tried yet.
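A sketch of that pattern, just to illustrate the shape of the wrapped command (the image name and the example command are hypothetical):

```python
import os
import shlex

def containerized(cmd, image="python:3.12-slim"):
    """Wrap a CLI command so it runs in a throwaway container with the
    current directory mounted at /work. Image name is illustrative."""
    return [
        "docker", "run", "--rm",
        "-v", f"{os.getcwd()}:/work",
        "-w", "/work",
        image,
    ] + cmd

wrapped = containerized(["pip", "install", "-r", "requirements.txt"])
print(shlex.join(wrapped))
# The agent would run this instead of installing anything on the host;
# subprocess.run(wrapped) executes it when Docker is available.
```

Note the container is discarded (`--rm`) after each command, so only changes written to the mounted directory persist.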


Seems like the .scr files trigger CrowdStrike Falcon. It's not clear where the executables being run here come from...


Probably because .scr is not meant to be run directly, but malware regularly does that.


> a series of photos side by side of the same subject

Cameras are now "enhancing" photos with AI automatically. The contents of a 'real' photo are increasingly generated. The line is blurring and it's only going to get worse.

