
Ask HN: Data scientists on HN, would love your feedback on our startup idea - tixocloud
Hi,

We’re building a tool that automates and optimises data science work so that data science teams can become more productive. Given your expertise, I would love to get your thoughts so that we build something useful.

Please let me know if you can spare 10 minutes to help a fellow HNer out.

Thanks in advance.
======
mswen
Too vague a description. I have no idea what you are trying to build.

What are you trying to automate? Model building? If so, are you trying to make
it easy for a business operations person to build their own models without
help from a data scientist? Or are you trying to automate the conversion of
models into production workflows that integrate with other enterprise
applications? That is data engineering.

What are you trying to optimize? My time in doing analysis? Trying to guide
analysis down useful paths and eliminate dead ends with features that are not
actually helpful?

What do you include in your domain of data science? Sometimes people reduce it
to neural networks and deep learning architectures. Others include all types
of statistical learning. Others include NLP, while others might focus on IoT
or time series analysis. At least in part, tools are tied to the specific
subset of the data science domain that a person is working within.

~~~
tixocloud
For automation, I am automating the data engineering portion so yeah,
productionalizing models, documenting, version control, etc.

There will be some pre-built models for a business operations person to
explore but a data scientist can also build their own model in R and/or
Python.

For optimisation, exactly. I'm looking to guide your dataset exploration to
help you find features faster: summarise the data automatically and, in the
future, suggest which features may or may not be useful, while still leaving
you as the expert to decide.

For my domain of data science, it will be in the realm of model building.
Think of it as more of a workflow tool to keep the data science team in sync,
so the intention is to be tool agnostic.

I’m also looking at reusable datasets that are certified and cleaned by data
engineers or other team members as well as pre-cleaned data available for
purchase to add to the model (looking to tap into free public open datasets as
well).

Hope that helps? Thanks for your comment. I’ll amend my post to make it
clearer.

