That aside, I don't find this too upsetting. They don't let people mine cryptocurrency either, for example; I think it's fine that they limit how you can use their free (or cheap) "service".
I tried using Paperspace for a while, but it worked maybe 1 time in 3, they (temporarily) banned me for no reason, they sent me "itemized" bills with empty line items, and their support was no help.
If getting a GPU is not practical, I'd try one of the K80s or other cheaper GPUs on AWS for all your debugging, and then port to the more expensive instances only when you need them. Doing this as a hobbyist is relatively inexpensive; I think a lot of the cost hobbyists incur comes from reserving a GPU when you don't actually need it.
OVH charges $2 per hour for a V100S
I'm also not a fan of having to upload files to Google Drive and then write special code to import them. It means my code isn't portable to run locally or anywhere else and it's just extra effort. I just want normal file access and to write scripts like I normally would.
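For reference, the Colab-specific incantation looks something like this (the file path below is a made-up example; drive.mount itself is Colab's real API), and it's exactly what makes the code non-portable:

    # Colab-only import; it doesn't exist outside Colab, which is the portability problem
    from google.colab import drive

    # Prompts for OAuth, then exposes your Drive under /content/drive
    drive.mount('/content/drive')

    # Files then live at a Colab-specific absolute path instead of a normal relative one
    with open('/content/drive/MyDrive/data/input.txt') as f:
        print(f.read())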
Still, I was spurred to ask the question because some people are categorically opposed to anything Google due to privacy issues.
That's a use case more suited to running a VM itself.
Hobbyists often use our free tier (30 hrs of GPU per month) or pay for more compute ($0.69/hour) to do a variety of GPU workloads.
We do prohibit things like DDOS attacks and cryptocurrency mining.
Paid services on AWS:
- Amazon SageMaker Studio
- Amazon SageMaker Notebook Instances (think EC2 + Jupyter + integration with AWS services)
They provide persistent storage and access to the full JupyterLab or Jupyter Notebook software, unlike Colab. I find this makes life far easier, since all my normal terminal workflows work fine.
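To illustrate, spinning up one of those notebook instances is a couple of API calls; here's a minimal boto3 sketch (the instance name and IAM role ARN are placeholders you'd replace with your own):

    import boto3

    sagemaker = boto3.client("sagemaker")

    # SageMaker provisions the EC2 host, a persistent EBS volume, and the
    # Jupyter/JupyterLab server for you.
    sagemaker.create_notebook_instance(
        NotebookInstanceName="hobby-notebook",   # placeholder name
        InstanceType="ml.t3.medium",             # cheap CPU box; use ml.p3.* etc. for GPUs
        RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        VolumeSizeInGB=20,                       # the persistent storage mentioned above
    )

    # Poll until it's InService, then open Jupyter from the console (or a presigned URL)
    status = sagemaker.describe_notebook_instance(
        NotebookInstanceName="hobby-notebook")["NotebookInstanceStatus"]
    print(status)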
OVH actually has a (distantly) similar service (much less user-friendly than Grid, and without the whole "grid" feature, but that matters less for hobbyists if you're not doing a lot of parallel runs). I liked the OVH one a lot in principle, but in practice I found it too buggy to use properly (and they don't have customer support). For a budget project it could be worth trying.
I've migrated several notebooks from Colab to JupyterLab when playing around with various generative art models; it's a drop-in replacement and runs on your local machine's hardware.
I did have to do some fiddling to install a bunch of Nvidia crap on my local machine, however; blame CUDA.
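Once the drivers and CUDA toolkit are in, a quick sanity check saves some head-scratching (assuming a PyTorch-based notebook; substitute your framework's equivalent):

    import torch

    # Verify the local CUDA install actually works before running a migrated notebook
    print(torch.__version__)
    print(torch.cuda.is_available())   # False usually means a driver/toolkit mismatch
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))
        x = torch.randn(1024, 1024, device="cuda")
        print((x @ x).sum().item())    # trivial op to confirm kernels actually launch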
Disclaimer: I work for Deepnote
The GPU add-on costs about $7.50 per hour for a 12 GB K80 (the full card has 24 GB across two dies, so presumably you share it? Also, it's from eight(?) years ago).
As an add-on, other vendors would give you V100s (possibly multiple ones).
I am pretty sure that's not actually what you are offering, because that would be ridiculous. So you probably want to update your pricing page ;-)
Perhaps it just follows Google Cloud's user data policy as long as you're using a paid service? But the latter might be a legitimate concern: some customers may require that their data not leave the networks covered by the contract, and depending on that, these kinds of public services increase the likelihood of such accidents.
Same thing Microsoft does with Windows, github and VSCode.
Just hoovering up all the ideas you have.
I'd honestly be shocked if Google/Microsoft were monitoring the clients of their public clouds and keeping the data to build their own products from.
A breach in that area would cost them potentially billions in losses.
Source: I work on FedRAMP compliance for GCP Databases.
AWS US East-West (Northern Virginia, Ohio, Oregon, Northern California) is FedRAMP compliant at moderate impact level.
Outside of the DoD space, the main distinctions between these offerings are where data can be stored, and where vendor employees are located and to what level they are vetted.
they have access to the private data, by definition. the entire point is that it's not exposed nor is it used. clearly you've never worked at one of the companies in question lol. when you're on-call you might need to actually access private data.
True data-sovereignty products are something EU countries want US companies to implement, though; then the data is simply inaccessible, neither public nor private.
the industry is moving in this direction with things like AWS Outposts
When I was on-call and had to dive deep to solve customers' issues at one of those companies, we could only access anonymized data stripped of all PII (including emails and even IP addresses), to the point where the only non-scrambled things we could get were timestamps and error stacktraces/executed commands (which were also stripped of all PII, which made debugging a bit more difficult).
And every instance of access to that anonymized, no-PII data required a physical authenticator, and it would automatically send an email to your manager (and some mailing group, I forget which) and create a special log entry indicating "such-and-such accessed this anonymized ABC data for XYZ stated purpose".
and contact them through Signal or whatever.
Read this and learn!
The Gentleperson's Guide To Forum Spies (spooks, feds, etc.)
What's the harm in that, you know? The worst that would happen is that corporations would have a hard time astroturfing. The best that would happen is there'd be less PR.
All you had to do was copy one of the opening paragraphs from a thread on a forum like dosbods.co.uk or housepricecrash.co.uk and paste it into Google, and you could see the exact same starting conversations taking place, word for word, on a variety of websites, like PistonHeads, Mumsnet and others. The press help elevate websites like Mumsnet to help with the "consensus cracking", aka psychological manipulation.
Different avatars, but the conversations always started off word-for-word identical, because invariably non-spooky accounts, i.e. members of the public, would join in, and then the conversation would deviate. This is one of the methods used to phish people.
The spooks were on House Price Crash because they wanted to know why people there were hoping for a crash. The reason being that Maslow's hierarchy of needs suggests shelter is one of the basic needs, so mortgages are a way of controlling people en masse, which we have seen since the late 90s with property prices going through the roof and endless hours of property porn on TV. Northern Rock's 125% mortgages showed people would overextend themselves for the idea that on paper they were rich, gambling on capital appreciation without realising tax rules might change in the future. It's one of the many, many ways the so-called free press have played their part in manipulating and controlling the public, along with the banks lending without caution, etc., but this is the beauty of the hierarchical structure of control, be it employment or the other entities which control everyone's lives.
I've actually met the former head of MI5, Andrew Parker, in Bishop's Stortford, but I didn't know him; I only recognised him when he gave a speech on terrorism some years later that was broadcast on the BBC Six O'Clock News. What was interesting is that he used his real-life accent when talking to me, but used BBC Received Pronunciation when giving the speech on terrorism. I also met James Brokenshire MP, then Tory housing minister, with Dame Stella Rimington and Baron Jonathan Evans at the pump house at Wastwater, Scafell Pike, the day after Theresa May called the election! It was their way of telling me they were operating on Housepricecrash.co.uk. Why else would they drag the government housing minister up to Scafell Pike when he was supposed to be off getting lung cancer treatment?!
It's the spooks' job to protect the economy, so phishing people looking for a housing crash is one remit; however, they didn't realise they were giving themselves away in more ways than one! I'm always reminded that the smallest of viruses can kill the largest of hosts! ROFL
AWS has a similar quota policy, but they seem less strict about use cases.
Stop spreading fud.
Please don't promote FUD.
So, you're right. I started with a quota of 1 GPU and can make an instance, but if I request more quota it's immediately turned down (no human involved, AFAICT). It just says to contact your sales rep. I wasn't aware they auto-reject GPU quota increases (that wasn't the case when I worked there).
Be less quick to "fact check" anything that disagrees with the conventional wisdom.
Anyway, I checked, and AWS does the same thing now too. In fact, AWS took 12 hours to respond (rather than declining immediately), and then said they'd think about it, without a real no.
I checked in on this, and these are new controls (new since I last evaluated the cloud with a hobby account), added because of miners. And the message makes it clear: if you want to proceed, talk to a human. I don't like that, but it's the same way it works for my work account.
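For what it's worth, the AWS request itself is scriptable; a minimal boto3 sketch (the quota code below is illustrative only; list the real codes for your instance family first):

    import boto3

    quotas = boto3.client("service-quotas")

    # Find the EC2 quota you care about (e.g. vCPU limits for GPU instance families)
    for q in quotas.list_service_quotas(ServiceCode="ec2")["Quotas"]:
        if "G and VT" in q["QuotaName"] or "P instances" in q["QuotaName"]:
            print(q["QuotaCode"], q["QuotaName"], q["Value"])

    # Ask for more; this is the step that now gets auto-reviewed or bounced to a human
    resp = quotas.request_service_quota_increase(
        ServiceCode="ec2",
        QuotaCode="L-DB2E81BA",   # illustrative; use a code printed above
        DesiredValue=8.0,
    )
    print(resp["RequestedQuota"]["Status"])   # e.g. PENDING, CASE_OPENED, DENIED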
Wouldn't be surprised if this is the first step towards correlating "Are you the 1% using huge amounts of Colab time to only run deepfake models?"
We'll see what the follow-up is, but at least the initial implementation seems less auto-ban-y than more mature Google services default to.
As consumers we can use or not use platforms like Google, Twitter, etc., as we like.
Colab's ban will be good for small third-party GPU cloud providers, and that is a good thing. Same with Twitter: in the last month there is now better material on Mastodon.
Can you use Colab from VSCode? Notebooks are great for exploration but not for complex projects.
You have to be careful to back up your code frequently (what I did was push to GitHub), since your SSH session goes away when your "kernel" (or whatever your Colab session is called) goes away. You wouldn't have this worry if you were just using the web interface, since the code is always saved.
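In practice, a checkpoint cell like this, re-run every so often (the branch and commit message are just examples), takes the sting out of surprise disconnects:

    # Run in a Colab cell: the ! prefix is IPython shell magic.
    # Snapshots work-in-progress to GitHub before the runtime is reclaimed.
    !git add -A
    !git commit -m "wip: checkpoint before Colab session dies" || echo "nothing to commit"
    !git push origin main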
I'd imagine that, if you want to use VSCode, its remote-editing features, which I keep hearing a lot about, would come in handy for editing over SSH!
He gave the same talk at Google: https://www.youtube.com/watch?v=gbYXBJOFgeI
It's slowly becoming reality.
Likewise, if Google decides it wants to preserve its service for students and researchers, then I hope that people chewing up resources for malicious/questionable use cases are booted off.
That's to say manufacturers want to prevent someone who buys a gaming GPU from using it to mine crypto or for ML, and they don't want someone who buys a ML GPU to use it for mining, and so on.
Any good (fiction or non-fiction) book on this? By "this" I mean governments trying to control overtly or secretly the use of computing resources. Bonus points if it is not Charlie Stross (I can't stand his writings).
Edit: Actually, it's not really clear.
In this Section 3, the phrase “you will not” means “you will not, will not attempt, and will not permit a third party to”.
When you use the Paid Service, you will not:
- share your Google account password with someone else to allow them to access any Paid Service that the person did not order;
- access the Paid Service other than by means authorized by Google;
- copy, sell, rent, or sublicense the Paid Services to any third party;
- use the Paid Service as a general file-hosting or media-serving platform;
- engage in peer-to-peer file-sharing; or
This is an extremely expensive service to build and run, requiring tens of millions of dollars of hardware to get started, and it doesn't produce revenue.
Instead of folksy sayings, see if you can answer the actual question:
What motivation does another company have for creating this service?
They won't need to create this deepfake training specifically, they just need to turn a blind eye to their existing hardware being applied this way.
Three motivations come to mind:
1. It's a convenient selling point to grow the user base. "We give you complete freedom to run your models and don't spy on you like the other guys do! We will never stop you from doing things unless the government forces us to!"
2. It's an easy way to frontrun criminal activity that is poised to disrupt society in the near future. Hire a CEO with a cool sounding hacker name, advertise on a bunch of telegram groups, while secretly logging all identifiers and making all inputs/outputs searchable by law enforcement :)
3. By owning the data used to train and generate Deepfakes, you have a rich dataset that could be used academically or for training counter-models to detect future Deepfakes.
1. Not really
3. No you don't
E pur si existit.
1. A critical part of AI that was missing no longer is: a model of our world. In natural language processing (and more generally anything that interacts with humans), it has been an eternal problem that to understand sentences you must know the context, and therefore must, in theory, know 'everything' about our world and culture. As far as I'm concerned, the current large transformer models have captured this. There is genuine intelligence here: still a lack of reasoning, but an abundance of knowledge.
2. We are entering a terrifying era of computing. The amount of resources needed to get the genuine AI above, are monstrous (at least with our current methods). Not only raw computing power, but also data. This means an unavoidable centralization and monopolization of something that I predict will be an essential component for human-computer interaction in the decades to come.
What we're seeing here is a sign of (2).
My prediction? Google and Facebook are here to stay. Their primary income right now is advertisement, I expect this to change over the next decade. Specifically these two companies are sitting on the biggest 'oil field' in modern history: an endless pipeline of modern human culture, ready to feed into ever larger models of our world.
The web-scale data is also available: we already have a dataset superior to the one Imagen used: https://laion.ai/laion-5b-a-new-era-of-open-large-scale-mult...
Instead of despairing, we should train capable models in the open, like BigScience does: https://bigscience.huggingface.co/blog/what-language-model-t... and petition our governments to train & make available large models under permissive licenses, for the benefit of the public and small businesses.
The big tech exceptionalism is overblown, you don't really need exotic engineering to train an LLM with Jax and a TPU pod, or whatever accelerator you managed to procure.
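To make that concrete, the unglamorous core of data-parallel training in JAX is a few lines of pmap; here's a minimal sketch with a toy linear model standing in for the transformer (all shapes and the learning rate are illustrative):

    import functools
    import jax
    import jax.numpy as jnp

    def loss_fn(params, batch):
        # Toy linear model in place of a transformer; the parallelism pattern is identical
        preds = batch["x"] @ params["w"]
        return jnp.mean((preds - batch["y"]) ** 2)

    @functools.partial(jax.pmap, axis_name="devices")
    def train_step(params, batch):
        loss, grads = jax.value_and_grad(loss_fn)(params, batch)
        # All-reduce: average gradients across every core in the pod slice
        grads = jax.lax.pmean(grads, axis_name="devices")
        params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
        return params, loss

    n = jax.local_device_count()
    # Replicate parameters onto each device; shard the batch along a leading device axis
    params = jax.device_put_replicated({"w": jnp.ones((16, 1))}, jax.local_devices())
    batch = {"x": jnp.ones((n, 8, 16)), "y": jnp.ones((n, 8, 1))}
    params, loss = train_step(params, batch)
    print(loss)

The genuinely hard parts (data pipelines, checkpointing, sharded optimizer state) are engineering effort, but not exotic engineering.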
If my experience in machine learning (which admittedly is small-scale) applies to these large-scale projects, I'd say that for each press-ready model there are probably dozens of earlier attempts/versions that didn't make the cut. So I'd always add an order of magnitude on top of the final model's cost: if the last training run cost $1M, the project probably cost closer to $10M.
> The big tech exceptionalism is overblown, you don't really need exotic engineering to train an LLM with Jax and a TPU pod, or whatever accelerator you managed to procure.
While I agree with you that the current best models are still within reach of motivated small-medium sized groups outside of big tech, if the scaling hypothesis holds up, the next big step forward is likely even larger and more expensive models. I was skeptical of the scaling hypothesis before, but much less so now.
> petition our governments to train & make available large models under permissive licenses
I would say that if it comes to this we are already firmly in the "centralization and monopolization" territory if we have to ask nation-states for their computational generosity.
Honestly, they're probably finding that, like cryptocurrency miners, free-tier deepfake users burn lots of GPU time compared to others, who might run a couple of GPU tasks for a little while to get a few results or test things.