Hello HN,
I'm Jakub and I'm the founder of Deepnote (https://deepnote.com/). We're building a better data science notebook.
As an engineer, I spent most of my time working on developer tools, building IDEs, and studying human-computer interaction. I helped build a couple of startups, I built tools for JavaScript development, and worked on Firefox DevTools. But once I started to work with data scientists, all those code editors and IDEs that I knew as a software engineer suddenly stopped being the right tool for the job. Notebooks were.
Notebooks as we know them today have many pain points (versioning, reproducibility, collaboration). They don't work well with other tools. They don't exactly encourage best practices. But none of these are fundamental flaws of the notebook paradigm. They are signs of a new computational medium. Much like spreadsheets in the 1980s.
Two years ago, my co-founders and I started to think about a better data science notebook. Deepnote is built on top of the Jupyter ecosystem. We are using the same format, and we intend to remain fully compatible in both directions. But to solve the above problems, we've introduced significant changes.
First, we made collaboration a first-class citizen. To allow for this, Deepnote runs in the cloud by default. Every Deepnote notebook is easily shareable (like Google Docs) and easy to understand even by non-technical users.
Second, we completely redesigned the interface to encourage best practices: writing clean code, defining dependencies, and creating reproducible notebooks. We also built a really good autocomplete system and added a variable explorer.
Third, we made Deepnote easy to integrate with other services. We didn't want to build another data science platform where people work with an iframed notebook. We want to build an amazing notebook that plays well with other services, databases, ML platforms, and the Jupyter ecosystem.
Check out a 2-min demo here: https://www.loom.com/share/b7e05ecca78047c2a2f687d77be8ecea
Building a new computational medium is hard. It takes time. Today, we're launching a public beta of Deepnote. Not everything works yet. Some pieces are missing. But we also have a lot in store, including versioning, code reviews, and visualizations. We still have a lot to learn too, so I'd love to hear your thoughts and feedback.
My main question is how, or whether, Deepnote addresses issues of reproducibility. Is this a priority for your team? You mention it a few times in your post here, but there is not much in the docs. I looked it up and found only this:
> Even though the Custom environment cache is implemented using Docker images, it doesn't primarily serve the reproducibility problem. The aim of the feature is to significantly speed up the start time of your projects. In other words, you should consider it to be only a cache at this point.
My experience with notebooks suggests that the main (computational) reproducibility challenges are:
A) 'hidden state' information (e.g. cells executed out of order, variables changed and then reverted but not re-run); and
B) no clear infrastructure for documenting/caching dependencies (I see you have a terminal option, and the web-based access should address some of this, but something like 'conda env create -f environment.yml' doesn't seem possible out of the box).
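To make point A concrete, here is a minimal sketch of a hidden-state check. It assumes the standard Jupyter .ipynb JSON layout (nbformat 4), where each code cell stores an `execution_count`. The function name `execution_order_issues` is my own and is not part of Jupyter or Deepnote. It only flags cells that were saved unexecuted or out of top-to-bottom order; it cannot tell whether a cell was edited after it last ran.

```python
import json


def execution_order_issues(notebook: dict) -> list:
    """Return warnings for code cells that were not run top to bottom.

    A notebook passes if its non-empty code cells carry the
    execution counts 1, 2, 3, ... in document order.
    """
    issues = []
    expected = 1
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue
        # nbformat stores source as a string or a list of strings.
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if not source.strip():
            continue  # skip empty code cells
        count = cell.get("execution_count")
        if count is None:
            issues.append(f"cell {index}: not executed")
        elif count != expected:
            issues.append(
                f"cell {index}: ran as step {count}, expected step {expected}"
            )
        expected += 1
    return issues


def check_file(path: str) -> list:
    """Load an .ipynb file from disk and check it."""
    with open(path, encoding="utf-8") as handle:
        return execution_order_issues(json.load(handle))
```

Calling `check_file` on a notebook path returns an empty list when the saved notebook reflects a clean "restart and run all" session. A check like this could run before a notebook is shared or committed.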
I would understand if these issues are not priorities for you; I don't think most data science projects need to be run in the far future, and most teams can informally sync their dependencies.
If reproducibility is a core priority, do you plan to write something about how Deepnote serves that purpose? I'd be glad to take a close look if you do (I have written/worked a fair bit on this in the past).