That aside, I don't find this too upsetting. They don't let people mine cryptocurrency either, for example; I think it's fine that they limit how you can use their free (or cheap) "service".
I tried using Paperspace for a while, but it worked maybe 1 time in 3, they (temporarily) banned me for no reason, they sent me "itemized" bills with empty line items, and their support was no help.
If getting a GPU is not practical, I'd try one of the K80s or other cheaper GPUs on AWS for all your debugging, and then port to the more expensive instances only when you need them. Doing this as a hobbyist is relatively inexpensive; I think a lot of the cost hobbyists incur comes from reserving a GPU when you don't actually need it.
OVH charges $2 per hour for a V100S
I'm also not a fan of having to upload files to Google Drive and then write special code to import them. It means my code isn't portable to run locally or anywhere else and it's just extra effort. I just want normal file access and to write scripts like I normally would.
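For reference, the Colab-specific incantation looks something like this (the file path below is a made-up example; drive.mount itself is Colab's real API), and it's exactly what makes the code non-portable:

    # Colab-only import; it doesn't exist outside Colab, which is the portability problem
    from google.colab import drive

    # Prompts for OAuth, then exposes your Drive under /content/drive
    drive.mount('/content/drive')

    # Files then live at a Colab-specific absolute path instead of a normal relative one
    with open('/content/drive/MyDrive/data/input.txt') as f:
        print(f.read())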
Still, I was spurred to ask the question because some people are categorically opposed to anything Google due to privacy issues.
That's a use case more suited to running a VM itself.
Hobbyists often use our free tier (30 hrs of GPU per month) or pay for more compute ($0.69/hour) to do a variety of GPU workloads.
We do prohibit things like DDOS attacks and cryptocurrency mining.
Paid services on AWS:
- Amazon SageMaker Studio
- Amazon SageMaker Notebook Instances (think EC2 + Jupyter + integration with AWS services)
They provide persistent storage and access to the full JupyterLab or Jupyter Notebook software, unlike Colab. I find this makes life far easier, since all my normal terminal workflows work fine.
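To illustrate, spinning up one of those notebook instances is a couple of API calls; here's a minimal boto3 sketch (the instance name and IAM role ARN are placeholders you'd replace with your own):

    import boto3

    sagemaker = boto3.client("sagemaker")

    # SageMaker provisions the EC2 host, a persistent EBS volume, and the
    # Jupyter/JupyterLab server for you.
    sagemaker.create_notebook_instance(
        NotebookInstanceName="hobby-notebook",   # placeholder name
        InstanceType="ml.t3.medium",             # cheap CPU box; use ml.p3.* etc. for GPUs
        RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder role
        VolumeSizeInGB=20,                       # the persistent storage mentioned above
    )

    # Poll until it's InService, then open Jupyter from the console (or a presigned URL)
    status = sagemaker.describe_notebook_instance(
        NotebookInstanceName="hobby-notebook")["NotebookInstanceStatus"]
    print(status)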
OVH actually has a (distantly) similar service (much less user-friendly than Grid, and without the whole "grid" feature, but that matters less for hobbyists if you're not doing a lot of parallel runs). I liked the OVH one a lot in principle, but in practice I found it too buggy to use properly (and they don't have customer support). For a budget project it could be worth trying.
I've migrated several notebooks from Colab to JupyterLab when playing around with various generative art models; it's a drop-in replacement and runs on your local machine's hardware.
I did have to do some fiddling to install a bunch of Nvidia crap on my local machine, however; blame CUDA.
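Once the drivers and CUDA toolkit are in, a quick sanity check saves some head-scratching (assuming a PyTorch-based notebook; substitute your framework's equivalent):

    import torch

    # Verify the local CUDA install actually works before running a migrated notebook
    print(torch.__version__)
    print(torch.cuda.is_available())   # False usually means a driver/toolkit mismatch
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))
        x = torch.randn(1024, 1024, device="cuda")
        print((x @ x).sum().item())    # trivial op to confirm kernels actually launch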
Disclaimer: I work for Deepnote
The GPU add-on costs about $7.50 per hour for a 12 GB K80 (the full card has 24 GB across two dies, so presumably you share it? Also, it's from eight(?) years ago).
As an add-on, other vendors would give you V100s (possibly multiple ones).
I am pretty sure that's not actually what you are offering, because that would be ridiculous. So you probably want to update your pricing page ;-)
Perhaps it just follows Google Cloud's user data policy as long as you're using a paid service? But the latter might be a legitimate concern: some customers may require that their data not leave the networks covered by the contract, and depending on that, these kinds of public services increase the likelihood of such accidents.
Same thing Microsoft does with Windows, github and VSCode.
Just hoovering up all the ideas you have.
I'd honestly be shocked if Google/Microsoft were monitoring the clients of their public clouds and keeping the data to build their own products from.
A breach in that area would cost them potentially billions in losses.
Source: I work on FedRAMP compliance for GCP Databases.
AWS US East-West (Northern Virginia, Ohio, Oregon, Northern California) is FedRAMP compliant at moderate impact level.
Outside of the DoD space, the main distinctions between these offerings are where data can be stored, and where vendor employees are located and to what level they are vetted.
they have access to the private data, by definition. the entire point is that it's not exposed nor is it used. clearly you've never worked at one of the companies in question lol. when you're on-call you might need to actually access private data.
True data-sovereignty products are something EU countries want US companies to implement, though; then the data is simply inaccessible, neither public nor private.
the industry is moving in this direction with things like AWS Outposts
When I was on-call and had to dive deep to solve customers' issues at one of those companies, we could only access anonymized data stripped of all PII (including emails and even IP addresses), to the point where the only non-scrambled things we could get were timestamps and error stacktraces/executed commands (which were also stripped of all PII, which made debugging a bit more difficult).
And every instance of access to that anonymized, no-PII data required a physical authenticator, and it would automatically send an email to your manager (and some mailing group, I forget which) and create a special log entry indicating "such-and-such accessed this anonymized ABC data for XYZ stated purpose".
and contact them through Signal or whatever.
Read this and learn!
The Gentleperson's Guide To Forum Spies (spooks, feds, etc.)
What's the harm in that, you know? The worst that would happen is that corporations would have a hard time astroturfing. The best that would happen is there'd be less PR.
All you had to do was copy one of the opening paragraphs from a thread on a forum like dosbods.co.uk or housepricecrash.co.uk and paste it into Google, and you could see the exact same starting conversations taking place, word for word, on a variety of websites, like PistonHeads, Mumsnet and others. The press help elevate websites like Mumsnet to help with the "consensus cracking", aka psychological manipulation.
Different avatars, but the conversations always started off word-for-word identical, because invariably non-spooky accounts, i.e. members of the public, would join in, and then the conversation would deviate. This is one of the methods used to phish people.
The spooks were on House Price Crash because they wanted to know why people there were hoping for a crash. The reason being that Maslow's hierarchy of needs suggests shelter is one of the basic needs, so mortgages are a way of controlling people en masse, which we have seen since the late 90s with property prices going through the roof and endless hours of property porn on TV. Northern Rock's 125% mortgages showed people would overextend themselves for the idea that on paper they were rich, gambling on capital appreciation without realising tax rules might change in the future. It's one of the many, many ways the so-called free press have played their part in manipulating and controlling the public, along with the banks lending without caution, etc., but this is the beauty of the hierarchical structure of control, be it employment or the other entities which control everyone's lives.
I've actually met the former head of MI5, Andrew Parker, in Bishop's Stortford, but I didn't know him; I only recognised him when he gave a speech on terrorism some years later that was broadcast on the BBC Six O'Clock News. What was interesting is that he used his real-life accent when talking to me, but used BBC Received Pronunciation when giving the speech on terrorism. I also met James Brokenshire MP, then Tory housing minister, with Dame Stella Rimington and Baron Jonathan Evans at the pump house at Wastwater, Scafell Pike, the day after Theresa May called the election! It was their way of telling me they were operating on Housepricecrash.co.uk. Why else would they drag the government housing minister up to Scafell Pike when he was supposed to be off getting lung cancer treatment?!
It's the spooks' job to protect the economy, so phishing people looking for a housing crash is one remit; however, they didn't realise they were giving themselves away in more ways than one! I'm always reminded that the smallest of viruses can kill the largest of hosts! ROFL
AWS has a similar quota policy, but they seem less strict about use cases.
Stop spreading fud.
Please don't promote FUD.
So, you're right. I started with a quota of 1 GPU and can make an instance, but if I request more quota it's immediately turned down (no human involved, AFAICT). It just says to contact your sales rep. I wasn't aware they auto-reject GPU quota increases (that wasn't the case when I worked there).
Be less quick to "fact check" anything that disagrees with the conventional wisdom.
Anyway, I checked, and AWS does the same thing now too. In fact, AWS took 12 hours to respond (rather than declining immediately), and then said they'd think about it, without a real no.
I checked in on this, and these are new controls (new since I last evaluated the cloud with a hobby account), added because of miners. And the message makes it clear: if you want to proceed, talk to a human. I don't like that, but it's the same way it works for my work account.
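For what it's worth, the AWS request itself is scriptable; a minimal boto3 sketch (the quota code below is illustrative only; list the real codes for your instance family first):

    import boto3

    quotas = boto3.client("service-quotas")

    # Find the EC2 quota you care about (e.g. vCPU limits for GPU instance families)
    for q in quotas.list_service_quotas(ServiceCode="ec2")["Quotas"]:
        if "G and VT" in q["QuotaName"] or "P instances" in q["QuotaName"]:
            print(q["QuotaCode"], q["QuotaName"], q["Value"])

    # Ask for more; this is the step that now gets auto-reviewed or bounced to a human
    resp = quotas.request_service_quota_increase(
        ServiceCode="ec2",
        QuotaCode="L-DB2E81BA",   # illustrative; use a code printed above
        DesiredValue=8.0,
    )
    print(resp["RequestedQuota"]["Status"])   # e.g. PENDING, CASE_OPENED, DENIED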
Wouldn't be surprised if this is the first step towards correlating "Are you the 1% using huge amounts of Colab time to only run deepfake models?"
We'll see what the follow-up is, but at least the initial implementation seems less auto-ban-y than more mature Google services default to.
As consumers we can use or not use platforms like Google, Twitter, etc., as we like.
Colab's ban will be good for small third-party GPU cloud providers, and that is a good thing. Same with Twitter: in the last month there is now better material on Mastodon.
Can you use Colab from VSCode? Notebooks are great for exploration but not for complex projects.
You have to be careful to back up your code frequently (what I did was push to GitHub), since your SSH session goes away when your "kernel" (or whatever your Colab session is called) goes away. You wouldn't have this worry if you were just using the web interface, since the code is always saved.
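In practice, a checkpoint cell like this, re-run every so often (the branch and commit message are just examples), takes the sting out of surprise disconnects:

    # Run in a Colab cell: the ! prefix is IPython shell magic.
    # Snapshots work-in-progress to GitHub before the runtime is reclaimed.
    !git add -A
    !git commit -m "wip: checkpoint before Colab session dies" || echo "nothing to commit"
    !git push origin main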
I'd imagine that, if you want to use VSCode, its remote-editing features, which I keep hearing a lot about, would come in handy for editing over SSH!
He gave the same talk at Google: https://www.youtube.com/watch?v=gbYXBJOFgeI
It's slowly becoming reality.
Likewise, if Google decides it wants to preserve its service for students and researchers, then I hope that people chewing up resources for malicious/questionable use cases are booted off.
That's to say manufacturers want to prevent someone who buys a gaming GPU from using it to mine crypto or for ML, and they don't want someone who buys a ML GPU to use it for mining, and so on.
Any good (fiction or non-fiction) book on this? By "this" I mean governments trying to control overtly or secretly the use of computing resources. Bonus points if it is not Charlie Stross (I can't stand his writings).
Edit: Actually, it's not really clear.
In this Section 3, the phrase “you will not” means “you will not, will not attempt, and will not permit a third party to”.
When you use the Paid Service, you will not:
- share your Google account password with someone else to allow them to access any Paid Service that the person did not order;
- access the Paid Service other than by means authorized by Google;
- copy, sell, rent, or sublicense the Paid Services to any third party;
- use the Paid Service as a general file-hosting or media-serving platform;
- engage in peer-to-peer file-sharing; or
This is an extremely expensive service to build and run, requiring tens of millions of dollars of hardware to get started, and it doesn't produce revenue.
Instead of folksy sayings, see if you can answer the actual question:
What motivation does another company have for creating this service?
They won't need to create this deepfake training specifically, they just need to turn a blind eye to their existing hardware being applied this way.
Three motivations come to mind:
1. It's a convenient selling point to grow the user base. "We give you complete freedom to run your models and don't spy on you like the other guys do! We will never stop you from doing things unless the government forces us to!"
2. It's an easy way to frontrun criminal activity that is poised to disrupt society in the near future. Hire a CEO with a cool sounding hacker name, advertise on a bunch of telegram groups, while secretly logging all identifiers and making all inputs/outputs searchable by law enforcement :)
3. By owning the data used to train and generate Deepfakes, you have a rich dataset that could be used academically or for training counter-models to detect future Deepfakes.
1. Not really
3. No you don't
E pur si existit.
1. A critical part of AI that was missing no longer is: a model of our world. In natural language processing (and more generally anything that interacts with humans), it has been an eternal problem that to understand sentences you must know the context, and therefore must, in theory, know 'everything' about our world and culture. As far as I'm concerned, the current large transformer models have captured this. There is genuine intelligence here: still a lack of reasoning, but an abundance of knowledge.
2. We are entering a terrifying era of computing. The amount of resources needed to get the genuine AI above, are monstrous (at least with our current methods). Not only raw computing power, but also data. This means an unavoidable centralization and monopolization of something that I predict will be an essential component for human-computer interaction in the decades to come.
What we're seeing here is a sign of (2).
My prediction? Google and Facebook are here to stay. Their primary income right now is advertisement, I expect this to change over the next decade. Specifically these two companies are sitting on the biggest 'oil field' in modern history: an endless pipeline of modern human culture, ready to feed into ever larger models of our world.
The web-scale data is also available: we already have a dataset superior to the one Imagen used: https://laion.ai/laion-5b-a-new-era-of-open-large-scale-mult...
Instead of despairing, we should train capable models in the open, like BigScience does: https://bigscience.huggingface.co/blog/what-language-model-t... and petition our governments to train & make available large models under permissive licenses, for the benefit of the public and small businesses.
The big tech exceptionalism is overblown, you don't really need exotic engineering to train an LLM with Jax and a TPU pod, or whatever accelerator you managed to procure.
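To make that concrete, the unglamorous core of data-parallel training in JAX is a few lines of pmap; here's a minimal sketch with a toy linear model standing in for the transformer (all shapes and the learning rate are illustrative):

    import functools
    import jax
    import jax.numpy as jnp

    def loss_fn(params, batch):
        # Toy linear model in place of a transformer; the parallelism pattern is identical
        preds = batch["x"] @ params["w"]
        return jnp.mean((preds - batch["y"]) ** 2)

    @functools.partial(jax.pmap, axis_name="devices")
    def train_step(params, batch):
        loss, grads = jax.value_and_grad(loss_fn)(params, batch)
        # All-reduce: average gradients across every core in the pod slice
        grads = jax.lax.pmean(grads, axis_name="devices")
        params = jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)
        return params, loss

    n = jax.local_device_count()
    # Replicate parameters onto each device; shard the batch along a leading device axis
    params = jax.device_put_replicated({"w": jnp.ones((16, 1))}, jax.local_devices())
    batch = {"x": jnp.ones((n, 8, 16)), "y": jnp.ones((n, 8, 1))}
    params, loss = train_step(params, batch)
    print(loss)

The genuinely hard parts (data pipelines, checkpointing, sharded optimizer state) are engineering effort, but not exotic engineering.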
If my experience in machine learning (which admittedly is small-scale) applies to these large-scale projects, I'd say that for each press-ready model there are probably dozens of earlier attempts/versions that didn't make the cut. So I'd always add an order of magnitude on top of the final model's cost: if the last training run cost $1M, the project probably cost closer to $10M.
> The big tech exceptionalism is overblown, you don't really need exotic engineering to train an LLM with Jax and a TPU pod, or whatever accelerator you managed to procure.
While I agree with you that the current best models are still within reach of motivated small-medium sized groups outside of big tech, if the scaling hypothesis holds up, the next big step forward is likely even larger and more expensive models. I was skeptical of the scaling hypothesis before, but much less so now.
> petition our governments to train & make available large models under permissive licenses
I would say that if it comes to this we are already firmly in the "centralization and monopolization" territory if we have to ask nation-states for their computational generosity.
Honestly, they're probably finding that, like cryptocurrency miners, free-tier deepfake users burn lots of GPU time compared to others, who might run a couple of GPU tasks for a little while to get a few results or test things.