Hacker News | dax_'s comments

Normally I would agree, but I've seen this happen too often. Common sense be damned, just make the number look good.


My experience is exactly the opposite (company with more than 10k employees). Getting anything done in Azure takes me 10x as long: all of Azure is managed by one team, everything requires approvals, and there's lots of bureaucracy. It also turns out to be extremely expensive. Per our guidelines, everything needs to be isolated within the company intranet (unless it really has to be external), which often means we need premium-tier services in Azure, and those can be really, really pricey.

On the other hand, if I request a virtual server, it takes less than a week, and I can work with it much more freely.


It's just as possible that they need to invest more and more for negligible improvements to model performance. These companies are burning through money at an astonishing rate.

And as the internet deteriorates under AI slop, finding good training material will become increasingly difficult. It's already happening that incorrect AI-generated information is cited as a source for new AI answers.


They are burning through money, but their revenue is scaling at a similar rate.

I'm sure most companies have understood the "AI outputs feeding AIs" incest issue for a while and have many methods to avoid it. That's why so much has been put into synthetic data pipelines for years.


It's just one of those sites that focuses on one thing, and does that extremely well, without trying to extract as much money from its users as possible. Rare thing nowadays.


With Windows 10 going out of support soon, I suspect there will be an increase in Linux adoption. After all, why throw out perfectly good hardware because of an arbitrary rule Microsoft made? I, for one, know I'll be installing Linux for some relatives.


GDPR doesn't stop personal data from being stored. It governs whom it can be shared with, when it has to be deleted, and requires that no more data be collected than necessary. It also gives users transparency about how their data is used.

And if I were to hand over personal information to an AI company, I'd absolutely prefer one that actually complies with GDPR.


Yeah, I mean, how would they know how to remove it from 'memory', since they have no way of knowing with 100% accuracy which parts of my chat are PII?


The cautious approach on their part would be to just delete the whole thing on any subject access deletion request.


Yes, if they aren't using that to train.


As a metaphor (well, a simile) think of it like if they were providing you with an FTP server or cloud storage. It's your choice what, if any, personal data you put into the system, and your responsibility to manage it, not theirs.

As for what to do if you, with a customer's permission, put their PD (PII being the American term) into the system and then get a request to delete it... I'm not sure; sorry, I'm not an expert on LLMs. But it's your responsibility not to put PD into the system unless you're confident that the company providing the services won't spread it beyond your control, and unless you know how to manage it (including deleting it if and when required) going forward.

Hopefully somebody else can come along and fill in my gaps on the options there - perhaps it's as simple as telling it "please remove all traces of X from memory", I don't know.

edit: Of course, you could sign an agreement with an AI provider for them to be a "data controller", giving them responsibility for managing the data in a GDPR-compliant way, but I'm not aware of Mistral offering that option.

edit 2: Given my non-expertise on LLMs, and my experience dealing with GDPR issues, my personal feeling is that I wouldn't be comfortable using any LLM for processing PD that wasn't entirely under my control, privately hosted. If I had something I wanted to do that required using SOTA models and therefore needed to use inference provided by a company like Mistral, I'd want either myself or my colleagues to understand a hell of a lot more about the subject than I currently do before going down that road. Thankfully it's not something I've had to dig into so far.


Well if it continues like this, that's what will happen. And I dread that future.

No one will care to share anything for free anymore, because AI companies are profiting off their hard work. And there's no way to prevent that from happening, because these crawlers don't identify themselves.


I'm 99% sure I already saw a product launch on HN for precisely this idea.


Why would we embrace that even more? In software development we try to keep things as deterministic as possible. The more variables we introduce into our software, the more complicated it becomes.

The whole notion of adding LLM prompts as a replacement for code just seems utterly insane to me. It would be a massive waste of resources, since we'd be re-prompting the AI far more often than necessary. It must also be fun to debug, as it may or may not work correctly depending on how the LLM is feeling at that moment. Compilation should always be deterministic, given the same environment.


Some algorithms are inherently probabilistic (bloom filters are a very common example, HyperLogLog is another). If we accept that probabilistic algorithms are useful, then we can extrapolate that to using LLMs (or other neural networks) for similar useful work.
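For anyone who hasn't run into one, a Bloom filter shows the trade-off nicely: membership tests can return false positives but never false negatives, in exchange for tiny memory use. A toy sketch (class name and parameters are my own, not any particular library's API):

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: `might_contain` can return a false
    positive, but never a false negative for an added item."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a plain int doubles as a bit array

    def _indexes(self, item):
        # Derive num_hashes independent bit positions from SHA-256.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, item):
        for idx in self._indexes(item):
            self.bits |= 1 << idx

    def might_contain(self, item):
        # True only if every derived bit is set.
        return all((self.bits >> idx) & 1 for idx in self._indexes(item))
```

Note that despite being "probabilistic" in its guarantees, the structure itself is fully deterministic: the same inputs always set the same bits.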

You can make the LLM/NN deterministic. That was never a problem.
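To illustrate: the sampling step is the only source of randomness in decoding, and it is trivially controllable with greedy decoding or a fixed seed. A toy sampler (not any real inference library's API):

```python
import math
import random


def sample_token(logits, temperature=1.0, seed=None):
    """Toy next-token sampler over a list of logits.

    temperature=0 means greedy decoding (always the arg-max);
    a fixed seed makes temperature-based sampling reproducible.
    """
    if temperature == 0:
        # Greedy decoding: fully deterministic, no RNG involved.
        return max(range(len(logits)), key=lambda i: logits[i])

    # Softmax with temperature (max-subtraction for numerical stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # A seeded RNG makes the "random" draw repeatable run to run.
    rng = random.Random(seed)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]
```

(In practice, batching and floating-point non-associativity on GPUs can still introduce run-to-run variation, but that's an engineering concern, not a fundamental one.)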


Microsoft has really been putting a lot of focus on improving it with each release. I love reading the blog articles for each major release that outline all the performance work that was done: https://devblogs.microsoft.com/dotnet/performance-improvemen...


A warning for those not in the know: the performance-improvement posts famously give mobile browsers trouble because they are so massive, thanks to the sheer extent of the improvements and the amount of detail the posts go into.


I just viewed the one linked above, and aside from a couple-second initial render delay, the post displayed nicely, at full frame rate.

Old Note 9, Chrome and Firefox.

Non-flagship mobile devices could very well choke on one of those pages, but most newer devices should display them with little grief.


Interesting, I thought I saw the usual complaints even in the past year.


And if you look at the PRs for the core, there are Intel people hacking away at the low-level routines too, to make them run better on their latest server CPUs.

