[dupe] Colaboratory, a free cloud based Jupyter notebook environment requires no setup (research.google.com)
113 points by saranshk 11 months ago | 33 comments

See also Seedbank, which is Google's curated hub of Colaboratory notebooks: http://tools.google.com/seedbank/

I also wrote a blog post detailing my personal experiences with using/abusing the free GPU in Colaboratory to build text-generating neural networks: https://minimaxir.com/2018/05/text-neural-networks/

One of the under-appreciated aspects of Colaboratory is that it's completely integrated into the Google Drive ecosystem, including multiple real-time users of the same notebook (sharing the same VM). This was a real game-changer for me.

The real-time use-case has a nice wow factor to it; I've used it as a way to pair program for data science problems. The input cells sync in real-time (a la Google Docs), and so too do the output cells when one person runs a cell. And it's nice to be able to leave comment threads on a cell that can be resolved as a form of peer review.

But what made Colab a game-changer for me is how it let me seamlessly put my notebooks and a VM into Google Drive, making anything I put in a notebook accessible to anyone within my organization without needing to set up an environment, be it shared or local.

My last organization was a small rare disease research foundation, and I primarily worked on the fundraising side of the house; it was not a technical organization. When thinking about the longevity of my work, I realized that even the one person managing IT for them probably couldn't set up, let alone justify maintaining, a networked Jupyter environment. So rather than ask for that and store all my analyses and small utilities on GitHub, I built everything on top of Google Drive and Colab. Folks were used to using Drive for everything else, so my work was adjacent and discoverable to the team it was pertinent to, and they could either look at the outcomes of prior runs or change a few variables and run it again without needing me. I left recently, and I've heard from a few former colleagues that they're still using many of these notebooks and discovering others I'd built on their own time.

For a small data analysis operation in a Google Apps organization, Colaboratory is a godsend.

I've actually had the opposite experience: I upgraded my Drive storage for an ML project and was still unable to load the datasets into Colab reliably. Hope this story gets better. In the meantime I'm using SageMaker and Kaggle Kernels.

> including multiple real-time users of the same notebook (sharing the same VM)

In my experience our cell contents were synced but our Python state was not. This makes collaborating highly confusing and error prone.

I only knew of Colaboratory before now because it's ended up contributing an awful lot of games to LeelaZero, the AlphaZero clone:


It's perfect for starting to explore ML frameworks. It lets you switch into GPU mode (Tesla K80 GPU, but with limitations: less RAM, etc.).

The provided VM has 13GB RAM: more than enough for beginner DL projects.

For non-beginner projects, you can use a batch generator to avoid storing everything in memory. (Keras has a good fit_generator utility)
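For anyone who hasn't written one, here is a minimal sketch of a batch generator in plain Python. It assumes `samples` is something cheap to hold in memory (for example a list of file paths) and that the hypothetical `load_fn` does the expensive loading one batch at a time; both names are mine, not Keras API.

```python
import random


def batch_generator(samples, batch_size, load_fn=lambda s: s, shuffle=True, seed=None):
    """Yield lists of at most `batch_size` loaded samples, looping forever."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    while True:  # Keras-style generators are expected to loop indefinitely
        if shuffle:
            rng.shuffle(indices)
        for start in range(0, len(indices), batch_size):
            batch_idx = indices[start:start + batch_size]
            # Only this batch is materialized; everything else stays on disk
            # (or wherever `load_fn` reads it from).
            yield [load_fn(samples[i]) for i in batch_idx]


gen = batch_generator(list(range(5)), batch_size=2, shuffle=False)
first_batches = [next(gen) for _ in range(4)]
print(first_batches)
```

A generator passed to Keras's `fit_generator` would need to yield `(inputs, targets)` tuples of arrays rather than plain lists, and you'd pass `steps_per_epoch` so Keras knows where an epoch ends.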

I was referring to VRAM. It was < 500 MB. But you are right. It's more than enough for beginner projects.

Then you didn't get the K80. There are two types of GPUs in Colab and it's really random which you get on a given run. One is extra weak, the other is a K80.
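If you want to check which GPU a given run landed on, the usual way in a notebook is just `!nvidia-smi`. A rough Python equivalent, assuming the `nvidia-smi` binary is on the PATH of the VM (the helper name `describe_gpus` is made up for this sketch):

```python
import shutil
import subprocess


def describe_gpus():
    """Return one 'name, total memory' string per visible NVIDIA GPU, or []."""
    if shutil.which("nvidia-smi") is None:
        return []  # no NVIDIA driver tools on this machine
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []  # tool present but no usable GPU
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = describe_gpus()
print(gpus if gpus else "no NVIDIA GPU visible")
```

Running this at the top of a notebook makes it obvious early on whether it's worth restarting the runtime to try for the better card.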

Are Colab notebooks fully compatible with Jupyter? (i.e. exporting to .ipynb is completely lossless)

I'm worried about Google embracing and extending Jupyter notebooks, and then deciding to retire the service a few years later.

Your `.ipynb` files can always be exported and are in the open-source format, so yes, it's completely lossless in terms of the data format.

But Python notebooks also make assumptions about the Jupyter environment they're run in, so you might have to tweak the ipynb if your environment is substantially different from what the Colab machines use, e.g., the data source for your models.

We have used it to set up a small ML project. We decided to move away from it and to go back to Jupyter. We have found that file I/O is proprietary to Colab (e.g., the Google Drive interface). We had to rewrite that part of the notebook when going back to Jupyter.

Our main reason for moving away from it was the fact that it is difficult to run long jobs on Colab. It was good to start working on the project, but any real ML task took too long to finish, if at all.

I see it as a teaching tool for people who do not have admin access to install a full Python environment or who are interested in trying out basic things before investing time and effort in setting up Jupyter.
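On the file I/O point above: one way to limit the rewrite is to keep the Colab-specific part in a single helper. This is only a sketch. `drive.mount` from `google.colab` is the real Colab call, but the `drive_root` default, the `my_project` folder and the `train.csv` file name are assumptions for illustration (the mounted folder name has differed between Colab versions).

```python
from pathlib import Path


def running_in_colab():
    """True only when the google.colab package can be imported."""
    try:
        import google.colab  # noqa: F401
    except ImportError:
        return False
    return True


def data_root(local_root="data", drive_root="/content/drive/My Drive",
              drive_subdir="my_project"):
    """Pick a data directory that works both in Colab and in plain Jupyter."""
    if running_in_colab():
        from google.colab import drive
        drive.mount("/content/drive")  # prompts for authorization in Colab
        return Path(drive_root) / drive_subdir  # assumed Drive layout
    return Path(local_root)


DATA = data_root()
print(DATA / "train.csv")
```

The rest of the notebook then only ever touches `DATA`, so moving back to a local Jupyter install means changing one default rather than every cell that reads a file.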

> Our main reason for moving away from it was the fact that it is difficult to run long jobs on Colab. It was good to start working on the project, but any real ML task took too long to finish, if at all.

I like Jupyter notebooks (whether they're running in Colab or not) for data analysis and post-hoc model analysis, but I'd never recommend using them to actually train models in the real world, unless your real-world models are extremely fast to train (like, < 5 minutes). YMMV, but I constantly have to reconnect, restart the kernel etc. -- I consider them completely unreliable in terms of retaining anything I actually care about (i.e., any interesting model or result that can't be recomputed in <5 minutes). Of course, you can save snapshots along the way and resume from them, but to me the notebook interface has never really encouraged this sort of workflow -- IMO if you can't quickly shift+enter through every cell of the notebook when you first start it up and see all the same outputs you saw the first time in a couple minutes, it's probably not the right tool. (I brush up against or cross this line constantly myself, and it's always a painful experience.)

Maybe they'll get better and work for this kind of thing in the future, but for now I wouldn't recommend them for anything beyond analysis.
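For reference, the snapshot-and-resume workflow mentioned above can be fairly small. This is a toy sketch, not anyone's actual training code: the "training step" is a stand-in multiplication, and `train_with_checkpoints` and the JSON checkpoint layout are made up for illustration.

```python
import json
import os
import tempfile


def train_with_checkpoints(total_steps, checkpoint_path, save_every=100):
    """Run a toy training loop, resuming from `checkpoint_path` if it exists."""
    state = {"step": 0, "loss": 1.0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)  # pick up where the last session stopped

    while state["step"] < total_steps:
        state["step"] += 1
        state["loss"] *= 0.99  # stand-in for a real training step
        if state["step"] % save_every == 0 or state["step"] == total_steps:
            # Write to a temp file and rename, so a disconnect mid-write
            # cannot leave a truncated checkpoint behind.
            tmp_path = checkpoint_path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(state, f)
            os.replace(tmp_path, checkpoint_path)
    return state


# Simulate two sessions sharing one checkpoint in a throwaway directory.
ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
first = train_with_checkpoints(250, ckpt)
resumed = train_with_checkpoints(400, ckpt)  # continues from step 250
print(first["step"], resumed["step"])
```

With a real model you'd save weights and optimizer state instead of a dict, and on Colab you'd want the checkpoint somewhere that outlives the VM, such as Drive.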

Can you run GPU stuff for free? From reading the FAQ it seems like you can. That seems pretty awesome.

Yes you can! We used it for a university project and it allowed us to test different model configurations and datasets much faster. Before that, we'd run the training over the day / night on one of our personal desktop computers with a dedicated GPU. Using Colab reduced the time needed for training by about 75%.

Of note, the cost of the VM and GPU used in Colaboratory is only $0.54/hr total. If you are doing a research project, it might be more pragmatic in terms of workflow to use a more powerful machine that's slightly more expensive.

Yep. Tesla K80 for 12 hours max at a stretch. Has awesome integration with many other services too.

Not sure where your 12 hour figure came from. I found this though:

> Colaboratory is intended for interactive use. Long-running background computations, particularly on GPUs, may be stopped.

It was mentioned in a mail inviting beta testers around November last year I think. Not sure. This article has a reference to the 12 hours claim. https://towardsdatascience.com/fast-ai-lesson-1-on-google-co...

Yeah - this kind of tool is ripe for crypto mining abuse.

I love this product, works with nodejs kernel also.

JavaScript ML! Wow, I didn't know they'd expanded kernel support beyond Python. Thanks for mentioning!

  !npm config set user 0
  !npm config set unsafe-perm true
  !npm install -g --unsafe-perm ijavascript zeromq node-gyp node-pre-gyp webpack
  !ijsinstall --install=global
  !jupyter-kernelspec list
  !apt-get install -yy git build-tools

After talking with Google support about adding the kernel, there probably isn't a menu item for it, so you have to set the language in the .ipynb json manually.
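Editing the JSON by hand works, but it can also be scripted. A sketch under assumptions: the notebook format keeps the kernel choice under `metadata.kernelspec`, and the kernel name `javascript` and display name used here are what I'd expect ijavascript to register, so check them against the `jupyter-kernelspec list` output before relying on them. `set_kernelspec` is a made-up helper.

```python
import copy
import json


def set_kernelspec(notebook, name, display_name, language):
    """Return a copy of a parsed .ipynb dict with its kernelspec replaced."""
    updated = copy.deepcopy(notebook)  # leave the caller's dict untouched
    metadata = updated.setdefault("metadata", {})
    metadata["kernelspec"] = {
        "name": name,                  # must match an installed kernelspec
        "display_name": display_name,
        "language": language,
    }
    return updated


# A minimal notebook as it would look after json.load() on an .ipynb file.
nb = {
    "cells": [],
    "metadata": {"kernelspec": {"name": "python3", "display_name": "Python 3"}},
    "nbformat": 4,
    "nbformat_minor": 0,
}
js_nb = set_kernelspec(nb, "javascript", "Javascript (Node.js)", "javascript")
print(json.dumps(js_nb["metadata"], indent=1))
```

To apply it to a real file, `json.load` the `.ipynb`, pass the dict through the helper, and `json.dump` the result back out before uploading it.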

Very cool, could be useful for programming interviews, especially for data science and similar positions where interactive plotting might be important.

"free cloud based"

What exactly is paying for it, then?

It pays off when the same people want to deploy in production and choose Google Cloud instead of AWS.

Amazing. Hopefully, it continues to improve.

I tried it a few months ago and was unable to import pyhash.

  !pip install pyhash
  import pyhash
... seems to work fine.

Thanks, maybe something has changed since I tried it. I'll have another look.

Seems like a clone of https://notebooks.azure.com. What does this do that the Azure one doesn't? When would you use one or the other?

They're both just Jupyter with cloud backends. They all do basically the same thing, with the exception of potential integrations with Google/MSFT-specific things.

E.g., Google lets you load data from Drive.

So when making the comparison I'd just examine my resource needs, how easy I find each to use, and/or resource pricing, then pick one that works. If neither suits my needs, I'd just use Jupyter on my own machine(s). If you're on Azure, the MSFT one might have handy integrations that minimize how much you need to think about ops. Similarly with Colab / GCP.
