Launch HN: Mozart Data (YC S20) – One-stop shop for a modern data pipeline
106 points by pfduke02 32 days ago | 37 comments
Hi HN, we're Pete and Dan, and together with our team we’ve built Mozart Data (https://www.mozartdata.com/), a tool to get companies started on collecting and organizing data to help drive better decisions. Mozart is a “modern data stack” -- we set up and manage a (Snowflake) data warehouse, automate ETL pipelines, provide an interface to schedule and visualize data transformations, and connect whatever data-visualization tool you want to use. For most teams, in under an hour, you can be querying data from your SaaS tools and databases.

Ten years ago, we started a hot sauce company, Bacon Hot Sauce, together. But more relevantly, we have spent the last two decades building data pipelines at startups like Clover Health, Eaze, Opendoor, Playdom, and Zenefits. For example, at Yammer, we built a tool called “Avocado,” our end-to-end analysis toolchain -- we loaded data from our production database and relevant SaaS tools like Salesforce, we scheduled data transformations (similar to Airflow), and we had a front-end BI tool where we wrote and shared queries and dashboards. Today Avocado is two tools, Mozart Data and Mode Analytics (a collaborative analytics tool). We have basically been building similar data tools for years (though the names and underlying technologies have changed).

Dan and I decided to build a product that brings the same tools and technology to earlier-stage companies (so that you don’t need to make an early hire in data engineering). We’ve built a platform where business users can load data and create and schedule transformations with just SQL, wrapped in an interface anyone can use -- no Python, no Jinja, no custom language. We connect to over 150 SaaS tools and databases; most just need credentials to send data to Mozart. There is no need to define DAGs (we parse your SQL transforms to automatically infer the way data flows through the pipeline). Mozart does the rote and cumbersome data engineering that typically takes a while to set up and maintain, so that you can tackle the problems your company is uniquely suited to solve.
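Mozart's parser is internal to the product, but the general idea of inferring an execution DAG from SQL transforms can be sketched in a few lines. This is a toy illustration (the transform names and naive regex are invented; a production parser must handle CTEs, quoting, subqueries, and more):

```python
import re
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Hypothetical transforms: name -> SQL body. We scan FROM/JOIN clauses
# for table names to discover which transforms feed which.
transforms = {
    "clean_users": "SELECT id, email FROM raw_users",
    "clean_orders": "SELECT id, user_id, total FROM raw_orders",
    "ltv": ("SELECT u.id, SUM(o.total) AS ltv FROM clean_users u "
            "JOIN clean_orders o ON o.user_id = u.id GROUP BY u.id"),
}

def referenced_tables(sql):
    """Naively pull table names that follow FROM or JOIN keywords."""
    return set(re.findall(r"(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", sql, re.IGNORECASE))

# Dependency graph: each transform depends on any other transform it
# reads from (source tables like raw_* are not nodes).
graph = {
    name: referenced_tables(sql) & transforms.keys()
    for name, sql in transforms.items()
}

# A topological order is a valid run schedule -- no hand-written DAG.
order = list(TopologicalSorter(graph).static_order())
print(order)  # clean_users and clean_orders first, ltv last
```

The point is that the user only writes SELECT statements; the scheduling order falls out of the SQL itself.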

Most data companies have focused on a single slice of the data pipeline (ETL, warehousing, BI). The maturation of data tools over the last decade has made now the time to combine them into an easy solution accessible to data scientists and business operations alike. We believe that there is immense value in centralizing and cleaning your data, as well as setting up the core tables for downstream analysis in your BI tool. Customers like Rippling, Tempo, and Zeplin use Mozart to automate key metrics dashboards, calculate CAC and LTV, or identify customers at risk of churn. We want to empower the teams -- like revenue and sales ops -- that have a lot of data and know what they want to do with it, but don’t have the engineering bandwidth to execute.

Try us out and see for yourself - you can sign up (https://app.mozartdata.com/signup) and immediately start loading, querying, cleaning, and analyzing your data in Mozart. We offer a free 14-day trial (no credit card required). After the free trial, we charge metered pricing based on compute time used and data ingested. We’d love to hear about your experiences with data pipelines and any ideas/feedback/questions you might have about what we’re building.

As a data professional, I've got to admit I'm having difficulty differentiating your service from any of a number of other similar offerings like stitch or supermetrics.

You say you charge metered pricing, but this information seems to be missing from your site. I understand it's hard to price a new product, but I personally need to know pricing before I can recommend a product to a client -- the more available this information is, the easier it is to compare you to others.

I do like the SQL transforms, they don't replace DAG orchestration tools like Airflow but it's a very nice feature that covers a lot of what companies with basic data needs will want.

ETL tools like Stitch provide similar and critical functionality. We do this and host/store the data, as well as offer SQL transforms. This enables teams to put together a data pipeline with just one tool.

In terms of pricing, we charge by monthly active rows (MAR) and compute time. An introductory package with 500k MAR and 500k compute seconds costs $1,000/month, but we try to tailor to individual company needs.

We just started using Mozart at Modern Treasury (S18), and have been really happy so far. We didn't want to spend a ton of time setting up data tooling, so we liked that we could use Mozart to get up and running really fast. All we've had to do is write our transforms, and things like snapshotting, scheduling, etc are taken care of for us. Pete, Dan, and team have been really responsive to our questions and good partners. At first I was a bit skeptical and we were just going to do Snowflake+Fivetran ourselves. But after talking to some of their larger customers, I was convinced that (a) it would save us time and (b) it could scale with us.

Thanks Matt, Modern Treasury is an ideal customer. They have all the chops to build the right data stack, but they’re laser focused on their core business. Great to be working together.

Reminds me of Panoply, which had native SaaS integrations, a managed Redshift instance on the back-end, and a BI layer on top. Basically a fully turn-key "modern data stack" [1]. The stack is way easier to operate than it has ever been before, but still requires folks with expertise to manage each of the components.

[1] https://blog.getdbt.com/future-of-the-modern-data-stack/

It’s an apt comparison. The “fully turn-key ‘modern data stack’” is exactly what we’re going for. A key technology difference -- managed Redshift vs. managed Snowflake. Because Snowflake separates compute and storage, the pricing and scalability as data volumes grow become meaningful.

Here’s a more thorough writeup from our CTO, Dan… https://www.mozartdata.com/post/mozart-data-cto-and-co-found...

Riffing off of BugsJustFindMe's comment, what kind of privacy/security can you offer at this time? I would love to use a tool like Mozart, but I work with data that contains protected health information (PHI). PHI requires a greater degree of privacy. People working with proprietary financial information have similar concerns.

I have a lot of experience with this coming from a background in healthcare. We are not HIPAA compliant yet, so that might be a dealbreaker for some.

There are workarounds. E.g., for database connectors and some other connectors, we let you specify which schemas/tables/columns to sync, so you can choose not to sync PII columns (or hash them) and still get a ton of value from the other data and/or aggregates.

And not for PHI, but some of our customers pull all their data into Mozart, write data transformations within Mozart to redact sensitive data, then use role-based access control to give the rest of the company full access to the redacted tables, while only certain people have access to the full data.
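The hash-or-redact pattern can be sketched in a few lines. This is a toy illustration only (the column names and salt are invented, and in practice this kind of redaction would run as a SQL transform inside the warehouse, not in application code):

```python
import hashlib

# Hypothetical raw rows containing PII.
raw_users = [
    {"id": 1, "email": "ada@example.com", "plan": "pro"},
    {"id": 2, "email": "alan@example.com", "plan": "free"},
]

SALT = "not-a-real-secret"  # illustration only; manage real salts securely

def redact(row, pii_columns=("email",)):
    """Replace PII columns with a salted SHA-256 digest, so analysts can
    still join and count on the column without seeing the raw value."""
    out = dict(row)
    for col in pii_columns:
        digest = hashlib.sha256((SALT + out[col]).encode()).hexdigest()
        out[col] = digest[:16]  # truncated for readability
    return out

redacted_users = [redact(r) for r in raw_users]
print(redacted_users[0]["plan"])  # non-PII columns pass through untouched
```

Because the hash is deterministic, the redacted column still joins and deduplicates correctly downstream.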

That said, the security of our customers' data is our top priority regardless of what type of data it is. We're currently in the process of being audited for SOC 2 Type 2.

Pete and Dan, thanks for the overview. Very interesting. A few questions I'm hoping you might be able to clarify:

- do you have a wrapper around Snowflake?
- do you support data streaming?
- who are your target customers (size, domain, etc.)?
- have customers identified gaps in their own data coverage/needs to use this pipeline (i.e. 1st-party data is limited)? If so, where do you point them to cover any gaps (e.g. external sources or partners)?
- have you received any feedback that says whether customers are not able to make progress with their BI, not due to ETL, but as a result of poor/unmaintainable data modeling?
- how do you handle scenarios where customers prefer to host their own data? Is that common?
- is it possible for customers to run certain components of the ETL process/pipeline on their own systems? Have you found that to be a frequent request so far?

I'm really impressed with the list of data sources (120+) you have at launch. How long did you spend integrating each of these tools?

Having some experience here, I can say that this is typically not a quick process since it depends so much on third-parties, so it's really cool you have such a large library of connectors.

Thanks! As mentioned in other comments, we partner with and use PBF (Powered by Fivetran) for connectors we believe are best in class. We are committed to ETL reliability; ease of use/setup and automatically managing changes are critical for success. In addition to PBF, we leverage Singer Taps, and our team is adding to the long tail of connectors.

Congrats on the launch! I don't mean to hijack this thread, but as a day-to-day data engineer, I can't help but think that even though this explosion of ETL solutions is undeniably helpful, these tools don't really get to the root of the problem. These days, you've got every company -- from small startups to large corps -- warehousing data. But the real value proposition isn't just having access to that raw data, but rather drawing insights out of it.

I'm not sure this is even doable without a dedicated data scientist, but a potential solution is a two-way marketplace that connects companies with data scientists to help make heads or tails of the data they're storing. Otherwise, it's just sitting in a data lake somewhere. (Not sure if something like this exists already, I'm just thinking out loud.)

I’m in extreme agreement in part -- for a company to get value out of its data, you want someone skilled at cleaning the data, cutting it properly, and teasing out the insights. Where I disagree: that person can be a data scientist, but doesn’t need to be. I believe there is a growing population of data-savvy employees without that title -- many might not even have “data” in their title at all (they are in business operations, marketing, finance, and sales) -- and many of them write SQL and are very comfortable manipulating data in BI tools, R, Python, Excel, or GSheets.

I also believe that company context matters a lot. I think so much of getting started with extracting value from data is getting up the learning curve of understanding what it means (which columns have the truth). One of the reasons that we don’t have a lot of canned reports is that understanding these edge cases within a company often matters a lot (and that not accounting for the nuance can often lead to a misinference). With this in mind, the explosion of ETL solutions and products like Mozart Data means that others at the company can specialize in their business context, as opposed to needing someone who can do all aspects of data including engineering, data science, analysis, and communicating/presenting it.

> connects companies with data scientists to help make heads or tails of the data they're storing

The consulting "data scientist" is likely able to do a better job if they have experience with the idiosyncrasies of the individual company's operations. If you get a fresh data scientist every time, they need to repeat the ramp-up period before they are in a position to add value.

This suggests a model where the company keeps the same consultant on retainer and brings them on board whenever a situation pops up where the consultant may be able to assist.

(This isn't a particularly novel suggestion; the same one is made in a 60s/70s-era thesis investigating how applicable operations research is to small businesses.)

I'm curious what you've found works the best for finding people (employees, contractors, other resources) capable of drawing insights (or making heads/tails) out of the data? We're always trying to have a helpful perspective for customers - as well as wanting to give a great "push in the back" to get them going on that dimension as well.

I fully agree with your previous statements. I worked as a BI consultant for 5 years and in-house for 4 years. "Consultant" can be misleading; we really created things (not only hot air) ;-) We built visualizations (dashboards) and data models, mainly with Qlik -- the complete road from data extract to visualization/analytics.

I think the most efficient way is to have in-house staff for visualizing data and extracting insights. They should be able to cover 70-90% of the demand. The remaining part, and possible peaks in demand, should be covered by contractors.

In the long run, this ensures you have a reliable contractor who already knows you (the company) as well as the infrastructure and meaning of your data. It helps a lot, for example, when an employee is sick or has left the company: you can bridge the gap with almost no delay.

Most employees don't want (and often don't have time) to learn additional tools for analytics and data visualization on top of their daily business. And the data models quickly become too complex for "casual users". Making "self-service BI" really possible requires a lot of work upfront (to prepare data, etc.). I don't think I've ever seen "self-service BI" working in the real world (maybe some "power users" in finance and controlling).

IMHO the best case is a specialized BI team that works together with the domain experts to create insights. Normally the people in specific departments know their data very well, and they are very helpful in the process of finding insights. They have often already done reports/calculations before, but the manual process is just too complicated, too slow, or whatever.

Shri here - former YC founder and previously at Eaze. Peter Fishman (founder of Mozart) is the most intelligent and pragmatic data leader I've met and was a joy to work with. If you're looking to set up your data stack, I can say wholeheartedly that you're in good hands with Pete. :)

Thanks!! More accurately put, you’re in good hands with Mozart Data. Our product aims to enable anyone to set up the data stack, without any data engineering needed. But ultimately, we want to help you be successful on your data journey, which is more than just a data stack -- it is defining core tables, common analysis templates, and creating a great data culture.

Could you clarify how you are working with Fivetran for (some?) of the integrations?

Are you partnered with them or would there be additional Fivetran fees if an integration went through them? I noticed when clicking on the Xero integration.

We partner with and use PBF (Powered by Fivetran) for some connectors, which we believe are best in class. In addition, we use Singer Taps and have also custom-built some connectors. There are no additional fees for extract-transform-load, whether Fivetran or any other ETL service (we cover those). The primary additional data cost is for a BI tool, though there are a number of free options to connect to.

Thank you.

Could you contrast your offering with a Stitch/Snowflake/dbt setup?

Functionality-wise that stack would be very similar! A core design principle of ours is that you should be able to have the power of a modern data platform even if all you know is a bit of SQL. So our product is functionally similar to a stack like Stitch+Snowflake+dbt (and we use some of those under-the-hood), but we try to wrap it all in an easier-to-use interface (e.g. typically to snapshot a table you write a few lines of config code, whereas in Mozart you just flip a toggle), and be more cost-competitive for smaller orgs.
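For readers who haven't set up table snapshotting before, here is a toy sketch of the kind of bookkeeping that toggle automates -- an append-only, SCD-type-2-style history. All field names are invented; a real implementation runs as SQL inside the warehouse:

```python
def snapshot(history, current_rows, as_of, key="id"):
    """Close out versions whose values changed (or disappeared) and open
    new versions, so the full history of every row is preserved."""
    current = {r[key]: r for r in current_rows}
    out = []
    for h in history:
        if h["valid_to"] is not None:            # old version, already closed
            out.append(h)
            continue
        vals = {k: v for k, v in h.items() if k not in ("valid_from", "valid_to")}
        if current.get(h[key]) == vals:
            out.append(h)                        # unchanged: stays open
        else:
            out.append({**h, "valid_to": as_of})  # changed or deleted: close
    for k, r in current.items():
        # open a new version for new keys and for keys we just closed
        if not any(o[key] == k and o["valid_to"] is None for o in out):
            out.append({**r, "valid_from": as_of, "valid_to": None})
    return out

# Day 1: first snapshot. Day 2: the plan changed, so history grows.
h1 = snapshot([], [{"id": 1, "plan": "free"}], "2020-09-01")
h2 = snapshot(h1, [{"id": 1, "plan": "pro"}], "2020-09-02")
```

After day 2, `h2` holds the closed-out "free" version plus an open "pro" version, which is what lets you answer "what was this row's state last month?"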

From the description, I thought you host/store the data, and provide analytics visualization too. But looks like you Move, Transform, Sync data. Doesn't this exist already?

Thanks for the feedback, maybe we need to make that clearer. We do host/store the data - under the hood we're using Snowflake for warehousing, but we don't currently provide visualizations too. Once your data is organized most people hook up a BI tool, and/or export to Excel/Gsheets.

Components of this certainly already exist, we're trying to put it all together in a single platform and make this functionality easier to use.

Got it. Love the name btw. Good luck!

Thanks! We couldn't resist a good pun on "data orchestration."


These are not idle questions:

1. What do multi-source joins look like?

2. How expensive are they as a function of the sizes of the "tables" being joined?

I should clarify, step 1 in most pipelines is pulling data out of the sources and replicating it in Snowflake. Then a multi-source join is a normal ANSI SQL join on literal tables in different schemas of the same database, not "tables".

(Some call this model "ETLT", where the first ETL part is just moving data from APIs or other databases into a shared db, and the extra "T" joining that data across sources or otherwise organizing it in useful ways.)
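That extra "T" is easy to illustrate: once both sources land in one database, a cross-source join is ordinary SQL. A toy demo using SQLite's ATTACH in place of warehouse schemas (all table and column names are invented):

```python
import sqlite3

# Two "sources" landed into one database; SQLite's ATTACH stands in for
# Snowflake schemas here.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS crm")
conn.execute("ATTACH DATABASE ':memory:' AS billing")

conn.execute("CREATE TABLE crm.accounts (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.invoices (account_id INTEGER, amount REAL)")
conn.executemany("INSERT INTO crm.accounts VALUES (?, ?)",
                 [(1, "Acme"), (2, "Globex")])
conn.executemany("INSERT INTO billing.invoices VALUES (?, ?)",
                 [(1, 100.0), (1, 50.0), (2, 75.0)])

# The extra "T": an ordinary ANSI join across sources, now just schemas.
rows = conn.execute("""
    SELECT a.name, SUM(i.amount) AS revenue
    FROM crm.accounts a
    JOIN billing.invoices i ON i.account_id = a.id
    GROUP BY a.name
    ORDER BY a.name
""").fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.0)]
```

Nothing special happens at join time; all the work was in replicating the sources into one place first.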

Thank you for your clarification.

do you run dbt under the hood, or did you create your own transformation layer solution?

We have created our own transformation layer solution, which includes scheduling, run & version history, and lineage; we do not use dbt under the hood. We share a philosophy of being able to write transforms in SQL one layer above the BI tool -- this leads to greater consistency of downstream answers and allows for business users and analysts to write the business logic into the core tables.

I might be interested; I have some questions though -- is it best to contact you via the form on the site? I don't see an email there or on your HN profile.

Best contact would be -- amadeus [at] mozartdata.com

Can I self host? So many of these services assume that I'm interested in sending my data to a third party, but I'm not. I want a tool, not a service.

Sorry, we don't support self-hosting yet.
