
This looks like an amalgamation of 8+ open source projects or industries, each with products put forth by companies that have dozens of employees and have worked on their products for years.

It also doesn't even categorize the products they compete with correctly[0].

Why not contribute some of your resources to one of the many active open source libraries already trying to solve some of these problems, and focus your engineering efforts on your core product?

[0] Fivetran is only listed under "Orchestrate" but actually competes directly with Alooma in Extract and Load. Also, there are DOZENS of companies in that space. https://gitlab.com/meltano/meltano/blob/master/README.md#dat...




What we're doing differently is making one product that covers the whole lifecycle, instead of having to string tools together. It took us many months to string our toolset together and we felt there had to be a better way. Just like GitLab, we try to leverage existing open source projects wherever possible.

I agree Fivetran also belongs in Extract and Load, and I updated it: https://gitlab.com/meltano/meltano/commit/1df9813f5ab42c4479... Do you think it should be removed from Orchestrate? Any other suggestions for proprietary products in that category?


As someone who works very, very closely in this industry, I would just be very careful how much of this you think you want to bite off.

Consider how you trust using dbt more than rolling your own transformation tool. Why wouldn't this apply to the rest of your stack? The 10+ companies that offer data extraction and loading are likely a better choice. Again with Analytics - the dozens of companies that offer BI tools are probably going to be the better choice.

Maybe you can build all these tools better than the hundreds of companies with thousands of employees and millions of dollars. It just seems like the odds that you build the best of each are very low.

I would have been more impressed if your team had designed some API that other tools/platforms could plug in to coordinate a lot of the above jobs with your CI system. There is a SERIOUS need for that and I've had a lot of conversations with companies about what that would look like.

To answer your question: no, Fivetran does not currently belong in the orchestration area, IMO. I've heard they are soon to release some sort of orchestration tooling to compete with dbt, but it isn't the type of orchestration you get with Airflow.


slap_shot, one of our major goals is to provide a solution that startups and small companies can use to start putting their data to work.

It shouldn't take weeks of effort, a data engineer, multiple proprietary solutions, and tens of thousands of dollars to answer key questions like CAC or the efficiency of a given marketing campaign.

We're hoping to lower the barrier to entry in both cost and effort, by providing an open source pre-packaged solution.
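To make the CAC question concrete, here is a minimal sketch of the arithmetic, assuming the common definition (acquisition spend divided by new customers acquired). The `CampaignSpend` record, its field names, and the campaign names are hypothetical illustrations, not anything from Meltano.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass
class CampaignSpend:
    """Hypothetical record: one row per marketing campaign."""
    campaign: str
    spend: float          # total acquisition spend for the campaign
    new_customers: int    # customers attributed to the campaign


def cac(spend: float, new_customers: int) -> Optional[float]:
    """Customer acquisition cost: spend divided by customers acquired.

    Returns None when no customers were acquired, since the ratio
    is undefined in that case.
    """
    if new_customers == 0:
        return None
    return spend / new_customers


def cac_by_campaign(rows: Iterable[CampaignSpend]) -> Dict[str, Optional[float]]:
    """Compute CAC per campaign so campaigns can be compared."""
    return {row.campaign: cac(row.spend, row.new_customers) for row in rows}


# Example rows with made-up numbers, purely for illustration.
rows = [
    CampaignSpend("search_ads", 1000.0, 4),
    CampaignSpend("newsletter", 500.0, 0),
]
result = cac_by_campaign(rows)
```

The arithmetic is trivial; the effort described above goes into getting spend and customer counts out of separate systems and into one place.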


[Obligatory: Someone is downvoting you, but it isn't me. I upvoted you]

Yeah, I get that. The analytics space is very complex, and companies, even ones with good engineering teams, typically don't have the internal knowledge or resources to put all this together.

In addition to working in this space, my company helps companies set up their analytics stack.

We typically set them up with one cloud-based data integration tool (the one with the most integrations they need at the best price), dbt, and one BI tool (usually Looker or Periscope, in that order). All in, that takes us a few weeks to get them set up and going.

I applaud your effort. I just struggle to understand why you accept punting on transformations and using dbt (an amazing library, by the way; great choice), but then try to tackle something like integrations or BI tools. The complexity of both of those is massive and there are great open source efforts already out there.

I'm eager to see where this goes.


"but then try to tackle something like integrations or BI tools. The complexity of both of those is massive and there are great open source efforts already out there."

I would love to hear your suggestion for a great open source BI tool. We tried Superset and Metabase, but neither came close to what we could do with Looker. That is why we're giving Meltano Analyze a shot.

BTW, do you want to do a livestreamed video call to discuss further in the next 30 minutes? You have a lot of knowledge. If so, please email me and comment here.

Update: He did email, and the livestream will happen at https://www.youtube.com/watch?v=F8tEDq3K_pE


Sure - just shot you an email at website@yourhandle.com


What a great interview. @slap_shot, you had great questions and you are so well spoken. Really appreciate the feedback. We're all taking notes here. Hope you will keep an eye on our issue tracker for Meltano and give us your feedback as things come up.


I have no horse in this race, but this is so cool that one minute you're exchanging comments on HN and the next you're livestreaming a conversation on the topic! What a world :)


Glad you like it!


My pleasure! Will watch the repo closely and contribute if I can. Very curious to see where this goes.

Also, I had a coffee at 5PM with someone, which is way too late to be drinking coffee, and it is evident in how quickly I'm talking >.<


I liked the fast pace! But normally I do watch YouTube at 150% speed :) Thanks for the chat.


One of the maintainers of dbt here -- this was an amazing conversation and I'm so happy I caught it. Thanks both for sharing :)


The Meltano team loves dbt. Thanks for making and maintaining it. Maybe the best open source project in the data space.


Thanks to both of you for your time doing that discussion!

@slap_shot and anyone else — I'm curious if you have thoughts on, or even have heard of the Ballerina language? It's a programming language for doing data integration work, built by the ESB/integration consultancy WSO2. It seems to have a lot of eng resources sunk into it but surprisingly little fanfare.

The CEO's interview with the Software Engineering Daily podcast was great: https://softwareengineeringdaily.com/2018/07/12/ballerina-la...

The language site tends toward buzzword-salad, but clearly has had a lot of love and thought put into it: https://ballerina.io/philosophy/


This looks very interesting! I haven't heard of it before, but you've definitely piqued my interest. Thank you for the links.


Non-native English speaker here. You mentioned an OSS solution called "Inbulk" or something like that during the conversation. Could you spell it? I'm pretty interested in finding out more about that project, but Google returns a lot of unrelated results, because of the name I guess...



For batch-type workloads, Embulk has been a really excellent tool for my company for all extract and load steps. We do most of the transformations in the db/warehouse.


> designed some API that other tools/platforms could plug in to coordinate a lot of the above jobs with your CI system

That's GitHub's strategy. Don't choose solutions for their customers. Be a platform other tools can plug into.

GitLab's strategy is to cobble together a bunch of open source software (including their own) to provide a solution out of the box. It's not necessarily the best one for you, but it's certainly less effort for you.


On the analytics side, we're using GitLab CI as our orchestration tool. We're pushing it to its limits and trying to find ways to make it better for us (i.e. data teams) and for GitLab more generally.

I'd love to learn more about what you'd like to see CI be able to do from a dataops perspective.


Yo. Just keep doing what you're doing. I dig it.

I'm not 100% on board with all the tools you are using, but stringing together random SaaS tools and having to survey a random number of open source tools in order to assemble a sensible platform makes way less sense.

At the very least, what we end up with is a group of folks working together in the open to surface some of the limitations and challenges and attempt to work out some of the alternative solutions to the problems that arise in this space.

So, I applaud your effort. Ignore the salesmen and the haters.


Thanks for the positive comment! We're generally taking the same approach that was taken with GitLab the product: do it out in the open, iterate constantly, and work with the community. Especially doing it out in the open enables these sorts of _awesome_ conversations! And we definitely want feedback - this needs to work for more than just us!


Our goal is to meet our data team's needs by answering our company's data questions.

A lot of the solutions out there are fantastic but aren't up to the tasks we need them for. Why shouldn't the whole life cycle be in one tool, be open source, and be version controllable? That's what we are looking for in a tool.


There's no inherent reason that the whole life cycle can't be handled in a single tool. However, tens of thousands of person-years have been spent on these tools, so people here are pointing out that it is a tall order for any company to create one tool that integrates everything. This goes doubly if it is only going to be a side project to GitLab itself.


Thanks. It's currently the full-time job of 4 devs. I agree, a tall order. It's a lot of work. We won't reproduce every feature of every product tomorrow.


We'll only get there if Meltano gets a lot of community contributions. We think there is space for an open source end-to-end product that works out of the box. The contributions will tell if we're right.


You could say the same about GitHub/GitLab themselves... that they mash together git and JIRA and .plan etc.


_Especially_ GitLab. Basically their entire product seems to be about building a whole bunch of separate tools and integrating them seamlessly into each other. GitLab has a built-in CI system, a deployment pipeline with Kubernetes integration, a built-in Docker container registry, performance monitoring tools for deployed applications, automated static analysis tools, etc. Describing it as "an amalgamation of 8+ open source projects or industries" seems pretty accurate.

That's by no means a bad thing though. While yes, there are downsides to tightly coupled tools, there are also advantages. If GitLab is trying to do the same thing for data analytics that they've already done for source control, they may very well succeed.


Thanks. That is exactly the plan.



