
Just reiterates that you don't own your data hosted on cloud providers; this time there's a clear sign, but I can guarantee that google's systems read and aggregated data inside your private docs ages ago.

This concern was first raised when Gmail started, 20 years ago now; at the time people reeled at the idea of "google reads your emails to give you ads", but at the same time the 1 GB inbox and fresh UI was a compelling argument.

I think they learned from it, and google drive and co were less "scary" or less overt with scanning the stuff you have in it, also because they wanted to get that sweet corporate money.




Of course Google reads and aggregates data inside your private docs. How would it provide search over your documents otherwise?


This feels a lot like "Of course they use their hands, they couldn't give you a massage otherwise" but it's in reply to a news article about the person who agreed to being touched being punched.


When I hit search, do the search right then. Don't grep out of a stored cache of prior searches.


The thing that makes it possible for search to be fast is pre-crawling and pre-indexing.

Some other engines don't do this, and the difference is remarkable. Try a full-content search in Windows 7: you'll be staring at the dialog for two minutes while it tries to find a file that's in the same directory you started the search in.
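
To make the difference concrete, here is a toy sketch of the two approaches. It is an illustration only, not how Drive or Windows search is actually implemented; the function names (`build_index`, `search_indexed`, `search_scan`) and the sample documents are made up for the example.

```python
from collections import defaultdict


def tokenize(text):
    # Naive tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def build_index(docs):
    # Pre-indexing: read every document once, ahead of time,
    # and record which documents contain each word.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search_indexed(index, query):
    # Query time: look up each word and intersect the document sets.
    # No document content is read here.
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def search_scan(docs, query):
    # "Do the search right then": re-read every document on every query.
    words = tokenize(query)
    if not words:
        return set()
    return {
        doc_id
        for doc_id, text in docs.items()
        if all(word in tokenize(text) for word in words)
    }


# Hypothetical sample data.
docs = {
    "a.txt": "quarterly budget draft",
    "b.txt": "budget meeting notes",
    "c.txt": "holiday photos",
}
index = build_index(docs)
print(sorted(search_indexed(index, "budget")))      # ['a.txt', 'b.txt']
print(sorted(search_scan(docs, "budget notes")))    # ['b.txt']
```

Both return the same answers. The scan touches every document on every query, while the index pays that cost once up front, which is exactly why the service has to have read your docs before you ever hit search.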


You said nothing about fast in your original comment, though, so now you've moved the goalposts.


I'm not really engaging in forensics-style debate. If you don't already know why "fast" is so integral to search as a feature that it goes without mention, I don't think we are enough on the same page to discourse on the topic.


I think returning results in a timely manner is more than an acceptable assumption.

The poster clearly thought about search in terms of the existing Google search functionality which is near instantaneous.

Usability matters to the average end user and a delayed search is not usable for most people.


re: data on cloud providers: I trust ProtonDrive to not use my data because it is encrypted in transit and in place.

Apple now encrypts most data in transit and in place also, and they document which data is protected. I am up in the air on whether a future Apple will want to use my data for training public models. Apple’s design of pre-trained core LLMs, with local training of pluggable fine-tuning layers, would seem to be fine privacy-wise, but I don’t really know.

I tend to trust the privacy of Google Drive less because I have authorized access to drive from Colab Pro, and a few third parties. That said, if this article is true, then less trust.

Your analogy with early Gmail is good. I got access to Gmail three years before it became public (Peter Norvig gave me an early private invite) and I liked, at the time, very relevant ads next to my Gmail. I also gave Google AI plus (or whatever they called their $20/month service) full access to all my Google properties because I wanted to experiment with the usefulness of LLMs integrated into a Workplace-type environment.

So I have, of my own volition, surrendered privacy on Google properties.


All it takes is a "simple" typo in the code that checks if the user has granted access to their content. Something as amateur (which I still find myself occasionally doing) as "if (allowInvasiveScanning = true)", which assigns instead of compares and so is always true, going "undetected" for any period of time gives them a way out yet still gains them access to all the things. Just scanning these docs one time is all they need.


> but I can guarantee that google's systems read and aggregated data inside your private docs ages ago

That is how search works, yes.

But if you’re trying to imply that everyone’s private data was scraped and loaded into their LLM, then no, that’s obviously a conspiracy theory.

It’s incredible to me that people think Google has convinced tens of thousands of engineers to quietly keep secret an epic conspiracy theory about abusing everyone’s private data.


I dunno, bro, software engineers have repeatedly shown total lack of wider judgment in these contexts over the years. Not to say there is, in fact, some kind of “epic conspiracy,” just that SWEs appear not to take much time to consider just what it is their code ends up being used for. Incidentally, that would be one way to start to get out of the mess we’ve found ourselves in: start holding SWEs accountable for their work. You work on privacy-destroying projects that society pushes back against, it’s fair game to put you under the microscope. Perhaps not legally, but we as a society shouldn’t hold back from directed criticism and social accountability for the individual engineers who enable this kind of shit. That will not be a popular take here. Perhaps it will be some solace to know I advocated the same kind of accountability for lawyers who enabled torture and other governmental malfeasance in the GWoT years. I was also looked at askance by other lawyers for daring to suggest such a thing. In that way, SWEs remind me of lawyers in how they view their own work. “What, I’m not personally responsible for what my client chooses to use my services for.”

Yeah, you are, actually.


I was hanging out around startup incubators, and, by extension, many wantrepreneurs. When asked about business model, the knee jerk reaction was usually “we’re going to sell data!” regardless of product. I was appalled by how hard it is to keep founders from abusing the data when I worked at startups. GDPR and the likes are seen as an annoyance and they make every effort to find a loophole.


To riff on the famous Upton Sinclair quote:

“It is difficult to get an engineer to see something, when his salary depends on his not seeing it.”


> that’s obviously a conspiracy theory.

Well, while some of our fellow humans are far too quick to conclude that everything comes from some conspiracy, that shouldn't push us to the opposite extreme of denying that any conspiracy can exist.

In that case, whether these actors do it or not is almost irrelevant: they have the means and incentives to do so. What safeguards civil society is putting in place to prevent it from happening is a far more interesting matter.


> It’s incredible to me that people think Google has convinced tens of thousands of engineers to quietly keep secret an epic conspiracy theory about abusing everyone’s private data.

With NDAs being all over the place, it does strike me as doable.

NDAs should have a time limit.

Additionally, no-one in their right mind will be a whistleblower nowadays.


isn't that how it's supposed to work? we just need >0 people to blow a whistle if a whistle needs blowing. we don't need to rely on the people fearing for their jobs so long as >0 people are willing to sacrifice their careers/lives when there's some injustice so great it's worth dying for


Ya, no.

Sludge in a hole in your shop's backyard? Not worth it.

High level of chemicals in the air which may cause stillborns? Not worth it.

Scanning private files for an AI training? Not worth it.

Genocide? Not worth it.

There is absolutely nothing nowadays worth being banished from your livelihood.

Snowden and Assange are heroes. And insane. Threw away $$$ for their morals. Stanislav Petrov threw his career away instead of passing it up and letting it be somebody else's problem.


> that’s obviously a conspiracy theory.

Of course. Google (and Apple, Microsoft not so much - but it is for your own good) will deny that they store your encryption keys.


One time I booked something on Expedia, which resulted in an itinerary email to my Gmail account. Lo and behold, minutes later I got a native CTA on the Android home screen to set up some thing or another on Google’s trip product. I dropped Android since, but Gmail is proving harder to shake.



