
Show HN: dstack – an open-source tool to build data applications easily - kaudinya
Dear HN,

I am Riwaj, the cofounder of dstack.ai (https://github.com/dstackai).

A few months ago, we built an online service that allows users to publish data visualizations from Python or R. The idea was to build a tool that did not require additional programming or front-end development for publishing data visualizations. The publishing code can be invoked from a Jupyter notebook, RMarkdown, or a Python or R script. Once the data is pushed, it can be accessed via a browser.

Open-sourcing dstack:
During our customer discovery phase, we realized that dstack.ai should integrate with far more open-source data science frameworks than we could integrate ourselves. For example, as a user, I want to push a matplotlib plot, a TensorFlow model, a plotly chart, or a pandas dataframe, and I expect the presentation layer to fully support it. Supporting all types of artifacts and providing all the tools to work with them on our own seems to be a very challenging task.
So we open-sourced the framework. Now you can build dstack locally and run it on your own servers, or in a cloud of your choice if that's needed.
More details on the project, how to use it, and the source code of the server can be found in the https://github.com/dstackai/dstack repo. The client packages for Python and R are available at https://github.com/dstackai/dstack-py and https://github.com/dstackai/dstack-r, respectively.

What's next:
User callbacks - so that an application shows not just pre-calculated visualizations but can also fetch data from a store and process it in real time.
ML models - so that data scientists can publish a stack that binds together a pre-calculated ML model and user parameters.
Use cases - support specific use cases that help data scientists turn data science models into data applications as fast as possible.

We would be happy to get your feedback on the open-source framework and to hear what kinds of use cases you think can be built on top of it.
Thank you.
======
bicepjai
I am trying to use dstack on my device and it still asked for login
information, which prompted me to read the terms, and under "User Content" I
noticed this:

> You hereby grant to Company an irreversible, nonexclusive, royalty-free
> and fully paid, worldwide license to reproduce, distribute, publicly display
> and perform, prepare derivative works of, incorporate into other works, and
> otherwise use and exploit your User Content, and to grant sublicenses of the
> foregoing rights, solely for the purposes of including your User Content in
> the Site. You hereby irreversibly waive any claims and assertions of moral
> rights or attribution with respect to your User Content.

Are terms like these common?

~~~
snowwrestler
Yes, pretty common. They are intended to avoid a situation where you post your
own copyrighted [1] content to their site, then sue them for displaying it
without a license.

The long list makes it seem very broad, but this phrase constrains it quite a
bit: "solely for the purposes of including your User Content in the Site."
This would prevent them from using your content in an ad, or selling your
content to some other company, for instance.

[1] Under U.S. federal law, all content is copyrighted upon creation. I hold
the copyright on this comment, and I have granted Y Combinator a license to
display it on the HN site.

EDIT - here is the relevant sentence from the HN terms of use agreement. It's
actually broader than the language you quoted.

> By uploading any User Content you hereby grant and will grant Y Combinator
> and its affiliated companies a nonexclusive, worldwide, royalty free, fully
> paid up, transferable, sublicensable, perpetual, irrevocable license to
> copy, display, upload, perform, distribute, store, modify and otherwise use
> your User Content for any Y Combinator-related purpose in any form, medium
> or technology now known or later developed.

~~~
bicepjai
Does this apply only to content posted on dstack data servers?

~~~
peterschmidt
Yes, these Terms of Use cover only the use of dstack.ai (hosted version) and
don't cover the open-source tool.

The open-source tool is fully covered by Apache 2.0.

------
gitgud
The GitHub README is a bit confusing (for someone with no experience). The
landing page [1] gives a much clearer outline of what this actually is.

Seems like it could be good for making data-driven dashboard graphs, although
the React library [2] looks like it needs a bit more work.

Congrats on shipping something though!

[1] [https://dstack.ai/](https://dstack.ai/)

[2] [https://github.com/dstackai/dstack-react](https://github.com/dstackai/dstack-react)

~~~
peterschmidt
Thank you very much. You're right, the README file needs improvement. We also
don't have many tutorials yet that show the tool from a practical point of
view. Our short-term plans include: 1) improving the documentation and
writing more use-case-specific tutorials; 2) adding more functionality for
more interactive applications, including machine learning applications.

EDIT: Speaking of the react library, we've just finished a refactoring and
plan to improve it too. Please don't hesitate to share your feedback, over
email or via GitHub issues. And thank you!

------
Cypher
I looked through GitHub and see you have code examples. Can you also include
the visual output so I can get a sense of what it'll achieve without having to
set it up to see?

~~~
peterschmidt
Sure,

1. Here's the simplest tutorial on how to make an interactive dashboard and
share it: [https://docs.dstack.ai/tutorials/dashboards-tutorial](https://docs.dstack.ai/tutorials/dashboards-tutorial)
It includes screenshots.

2. Here's another tutorial with more realistic data. Output:
[https://dstack.ai/gallery/d/b56128a3-522e-42d7-8662-9b1a768d...](https://dstack.ai/gallery/d/b56128a3-522e-42d7-8662-9b1a768dbc8c)
The code for it is available at
[https://github.com/dstackai/dstack-tutorials-py/blob/master/...](https://github.com/dstackai/dstack-tutorials-py/blob/master/covid-19-speed.ipynb)

Actually, we have very few examples. We're going to make more of them this
week.

------
peterthehacker
Thanks for sharing!

Can you elaborate on “What’s next”?

> User callbacks- so that application shows not just pre-calculated
> visualizations but also can fetch data from a store and process it in real-
> time.

How are you envisioning this working? Will dstack be like a database? How will
“user callbacks” be triggered?

~~~
peterschmidt
Hi, thanks for the question. This feature is still in the design stage. The
idea is pretty simple. Currently, you can push a pre-calculated visualization
and associate it with particular user input. However, in many cases it's not
possible to precalculate all possible combinations of user input in advance.
That's why we'd like to let the user push not a visualization but a function
that produces a visualization. This function will be triggered when the user
changes the input. Such a function can build the visualization on the fly and,
if needed, take the data from an external source.
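
To make the difference concrete, here is a minimal sketch in plain Python. It is not the dstack API (the feature is described above as still being designed); `PrecalculatedStack`, `CallbackStack`, `push` and `render` are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, str]


def _key(params: Params) -> Tuple[Tuple[str, str], ...]:
    """Turn a parameter dict into a hashable, order-independent key."""
    return tuple(sorted(params.items()))


class PrecalculatedStack:
    """Hypothetical model of the current approach: one pushed artifact
    per combination of user input."""

    def __init__(self) -> None:
        self._frames: Dict[Tuple[Tuple[str, str], ...], object] = {}

    def push(self, artifact: object, params: Params) -> None:
        self._frames[_key(params)] = artifact

    def render(self, params: Params) -> object:
        # Raises KeyError for any combination that was never pushed.
        return self._frames[_key(params)]


class CallbackStack:
    """Hypothetical model of the planned approach: push a function and
    call it whenever the user changes the input."""

    def __init__(self, fn: Callable[..., object]) -> None:
        self._fn = fn

    def render(self, params: Params) -> object:
        return self._fn(**params)


def sales_chart(region: str, year: str) -> str:
    # Stand-in for code that would query a store and draw a chart.
    return f"chart for {region} in {year}"
```

With the first class, `render` only works for inputs someone pushed ahead of time; with the second, `CallbackStack(sales_chart).render({"region": "US", "year": "2020"})` computes the result on demand.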

------
helltone
Looks great, how do you compare to other alternatives in the same space?

~~~
lullibrulli2
What alternatives come to your mind? I'm looking for a solution like that and
would love to get some insights.

~~~
shakedown1
streamlit.io is one that I've just finished trying, and I find it easy to use.

------
jordz
Does this need some License information adding to your repo? Something that
protects you from someone taking this and running their own paid hosted
version :) I take it it isn't MIT!

~~~
peterschmidt
We open-sourced it under Apache 2.0 which we find quite permissive and OS-
community friendly. You're welcome to run a hosted version. We actually have
it running on dstack.ai (it's free currently but we of course plan to have
paid features if there is such a need).

------
phtevus
Your website theme seems to be similar to many others lately re: the cartoon
figures etc. Is this a free/paid/opensource template?

~~~
peterschmidt
The one on the website if I recall correctly was mostly designed in house.
Gonna double check that with the designer. The blog post image was taken from
[https://icons8.com/illustrations](https://icons8.com/illustrations) (free).
There are actually a lot of free illustration libraries nowadays. One of my
favorites is [https://undraw.co/](https://undraw.co/)

~~~
phtevus
Thanks!

------
mushufasa
how does this compare to dash by plotly, or r shiny, in terms of the intended
use case?

It looks like you are more of a holistic platform, including a workflow
scheduler etc.

~~~
peterschmidt
One thing is certainly that we would like our tool to be agnostic to data
science tools and work with all of them. So you can use pretty much any
visualization or ML library.

Another thing is that we’d like to eliminate the need to do any programming or
HTML/CSS as much as possible.

The jobs that are available as part of the hosted solution are not yet part of
the open-source library, but this is certainly something for us to consider
moving under open source too.

We are currently at quite an early stage and a lot of work is still ahead.
We’ll appreciate any feedback and suggestions on where to steer the roadmap.

Gonna work on preparing more use-case-specific tutorials within the coming weeks.

------
PaulHoule
How do you deal with differential versioning of code and data, and the fact
that people don't always execute notebooks from top to bottom?

For instance, suppose I have a notebook that takes 2 hours to generate a
model. From the viewpoint of explaining it I'd like to make a notebook where I
start from the beginning, train the model, then use it.

If I want to show it to people I want to save all the results and re-render
them, not rerun the calculation, certainly if I want to show off the results
in a 1 hour talk!

From the viewpoint of reproducibility, however, you have to be able to run the
notebook from top to bottom and get a 'correct' result. I'm not going to say
the 'same' result because many calculations are stochastic in nature (e.g.
random numbers) or because often the data changes. (Let's say I have somebody
make a notebook that does April's sales reports -- shouldn't I just be able to
point it to the May data to make May's sales reports?)
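
One way to picture the tension between "re-render saved results" and "rerun on new data" is a cache keyed by code, data and random seed. The sketch below is only an illustration under that assumption, not something dstack or any notebook tool is known to do; `fingerprint` and `cached_result` are hypothetical helpers.

```python
import hashlib
import pickle
from pathlib import Path
from typing import Callable


def fingerprint(code: str, data: bytes, seed: int) -> str:
    """Hash the code, the input data and the seed into one cache key."""
    h = hashlib.sha256()
    for part in (code.encode("utf-8"), data, str(seed).encode("ascii")):
        # Length prefix keeps ("a", b"bc") distinct from ("ab", b"c").
        h.update(len(part).to_bytes(8, "big"))
        h.update(part)
    return h.hexdigest()


def cached_result(cache_dir, code: str, data: bytes, seed: int,
                  compute: Callable[[bytes, int], object]) -> object:
    """Return the saved result for this (code, data, seed) if one exists;
    otherwise run the slow computation once and save it."""
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    path = cache / (fingerprint(code, data, seed) + ".pkl")
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = compute(data, seed)
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

Showing April's results in a talk hits the cache, while pointing the same code at the May data changes the fingerprint and triggers a fresh run. Fixing the seed is what makes a stochastic step repeatable here.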

Between the long time delays (longer than people can hold a context in their
mind, longer than they want to wait) for the system to settle down and the
total complexity I find that many people involved with data science violently
resist confronting the above issues. The effects are much like the visual
"blind spot" -- you might get a series of projects that were 98% completed
but didn't quite deliver business value although everybody feels like they did
their part.

Like other vendors in this crowded space, dstack leads with technology as the
key problematic: "e.g. supports Python and R", "matplotlib, TensorFlow, plotly, ..."

It's certainly true that people don't want to face up to reality in that area.
Maybe 50% or 90% of the "waste" in the area involves setting your dependencies
up, begging your boss to get you access to "the cloud of your choice if that's
what's needed". The trouble is that investment in particular technologies
is of temporary value (maybe people will still be using R in 2030, maybe they
won't be using Tensorflow, almost certainly plotly gets bought by Google and
shut down by then).

Years back I researched the problem of running Tensorflow models that we got
off the pavement, building a database that says TF version X depends on CUDA
version Y, cuDNN version Z, and being able to have multiple copies of the
userspace GPU drivers installed simultaneously (e.g. just put 'em in a
directory and set the library path to point at 'em -- don't even need
containers!)
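
A rough sketch of that idea, with everything here assumed for illustration: the table entries are examples to be checked against TensorFlow's published tested build configurations, and `/opt/gpu-libs` with its `cuda-X/lib64` and `cudnn-X/lib64` subdirectories is a hypothetical layout, not the setup described above.

```python
import os

# Illustrative entries only; verify against TensorFlow's tested
# build configurations before relying on them.
COMPAT = {
    "1.15": {"cuda": "10.0", "cudnn": "7.4"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
}

LIB_ROOT = "/opt/gpu-libs"  # hypothetical directory holding unpacked libraries


def library_dirs(tf_version, root=LIB_ROOT):
    """Directories holding the userspace GPU libraries for one TF version."""
    deps = COMPAT[tf_version]
    return [
        os.path.join(root, "cuda-" + deps["cuda"], "lib64"),
        os.path.join(root, "cudnn-" + deps["cudnn"], "lib64"),
    ]


def env_for(tf_version, base_env=None, root=LIB_ROOT):
    """Copy the environment and prepend the matching library directories
    to LD_LIBRARY_PATH, so several versions can coexist on one machine."""
    env = dict(os.environ if base_env is None else base_env)
    paths = library_dirs(tf_version, root)
    existing = env.get("LD_LIBRARY_PATH")
    if existing:
        paths.append(existing)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(paths)
    return env
```

A model pinned to one version could then be launched with something like `subprocess.run(["python", "predict.py"], env=env_for("1.15"))`, where `predict.py` is a placeholder script name.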

I could have sworn Google looked at my source because they did the one thing
that could have broken that strategy. Also the company I was working for lost
interest in that particular shiny thing. That's a basic problem with
maintaining a distribution of other people's software -- like treading water
it takes effort just to stay in one place.

The more fundamental problems that turn up in going from data to decision and
products are eternal and not tied to a particular technology. If you solve
those problems rather than chase the shiny you might break out of the pack.

~~~
peterschmidt
I agree with your point. Reproducibility and versioning are an important yet
very challenging topic right now, and not many tools seem to help with it. And
it might be that the problem is not specifically about tools but rather about
mindsets and workflows.

IMO dstack is a lot about process. Technologies can change. The process often
stays. We'd like to find the best way to solve problems people face every day
regardless of a particular technology.

One more little thing which might be relevant is that dstack actually tracks
revisions. What we haven't figured out yet is how to link a particular
revision of the application with the particular revision of the code /
notebook.

------
bmarotta
Nice initiative

