
Show HN: Shipyard – Serverless Workflow Platform to Simplify DataOps - blakeburch
https://www.shipyardapp.com
======
blakeburch
Hey Hacker News! I’m Blake, one of the co-founders at Shipyard.

In my past line of work, I was running the data teams for a digital
advertising agency, handling high-throughput information for Fortune 500
brands. From my time in the trenches, I noticed that:

\- Cloud services are needlessly complex for the average user, but low-code
platforms can’t handle complex business problems or scale

\- Data Teams are often being tasked with navigating and solving technical
problems outside their wheelhouse (DevOps, app hosting, database management,
queueing systems, etc.)

\- Organizations struggle to make use of their data because solutions take
weeks to build and are difficult to get into the hands of others (typically
residing on someone’s local machine)

Over the past year, I’ve tried to synthesize what I’ve learned to solve some
of these issues. My goal is to make it easier for people to act on their data
quickly and in an automated fashion.

With Shipyard, our team has built a high-end workflow product that allows data
teams to quickly build, automate, and share pipelines without the hassle of
managing infrastructure. Rather than highlighting features, I thought the
community might enjoy understanding our core product tenets.

\- Your code should run the same locally as it does on our platform. Code
shouldn’t be written to accommodate the limitations and configuration of the
tools you’re using.

\- Templates should be a first-class citizen. When you build something to do a
specific task internally, you should be able to share it for use by anyone in
your org, regardless of their technical skill. You should also have the
ability to easily see template usage and make wide-scale updates.

\- Workflows should be simple, flexible, and modular. Any shape of pathing or
parallelism should be supported. Data generated upstream should be readily
available to all tasks downstream. Components of a workflow should be small in
scope and reusable anywhere.

\- Tooling should serve data teams in a way that lets them spend more time
focusing on what they’re good at - solving problems with data.
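To make the first and third tenets concrete, here is a minimal sketch (my own illustration, not Shipyard's API; `run_task`, `SOURCE_NAME`, and `ROW_LIMIT` are hypothetical names) of a task script that behaves the same on a laptop as on a hosted runner: configuration comes from environment variables with local-friendly defaults, and hand-off to downstream steps happens through an ordinary file.

```python
import json
import os


def run_task(output_dir: str = ".") -> str:
    # Configuration is read from the environment, with defaults that work
    # locally, so no code changes are needed to move between machines.
    source_name = os.environ.get("SOURCE_NAME", "local_test")
    row_limit = int(os.environ.get("ROW_LIMIT", "100"))

    # Stand-in "extract" step; a real task would call an API or database here.
    rows = [{"id": i, "source": source_name} for i in range(row_limit)]

    # Results are written to a plain file so any downstream task can pick
    # them up, regardless of language or scheduler.
    out_path = os.path.join(output_dir, "extract_output.json")
    with open(out_path, "w") as f:
        json.dump(rows, f)
    return out_path


if __name__ == "__main__":
    print(run_task())
```

Because the script has no scheduler-specific imports, the same file can be run by cron, by hand, or uploaded to a managed platform unchanged.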

We’re releasing the product to the public today. Out of the gate, we have over
50 open-source blueprints built to help teams quickly integrate their cloud
services, without needing to write any code. However, you can also run custom
code of your own (Python, Bash) using your favorite packages (dbt, Great
Expectations, Pandas, etc.) and link it all together to create powerful data
solutions. We’ve seen teams use the product to build data quality alerts,
third-party vendor integrations, live marketing updates via API calls,
self-service data refreshes, and typical ETL pipelines.
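For a sense of what a "data quality alert" step might look like, here is a stdlib-only sketch (my own illustration, not Shipyard code; `quality_check` is a hypothetical name, and a real pipeline would more likely use Pandas or Great Expectations) that scans a CSV for missing or duplicate IDs so a downstream step can decide whether to fire an alert.

```python
import csv
import io


def quality_check(csv_text: str, id_column: str = "id") -> list:
    """Return a list of human-readable issues found in the CSV text."""
    issues = []
    seen = set()
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_no, row in enumerate(reader, start=2):  # header is line 1
        value = (row.get(id_column) or "").strip()
        if not value:
            issues.append(f"line {line_no}: missing {id_column}")
        elif value in seen:
            issues.append(f"line {line_no}: duplicate {id_column} {value!r}")
        else:
            seen.add(value)
    return issues


sample = "id,name\n1,a\n,b\n1,c\n"
print(quality_check(sample))
# → ["line 3: missing id", "line 4: duplicate id '1'"]
```

In a workflow, the returned issue list would be written to a file or passed downstream, where a notification step (e.g. a Slack or email blueprint) decides whether anyone needs to be paged.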

We’ll be around today to answer any questions the community may have. Looking
forward to the discussion!

~~~
sails
Hi Blake, nice product, and great to see your templates on GitHub; they really
helped me understand what the blueprints do.

I've been using Prefect for a while as an ELT coordinator. It's in use as a
"smart cron" rather than a sophisticated DAG creator, but so far I've been
pretty happy with it.

From what I've read on your website and watching the video, Shipyard (love the
theme, but I'm all over nautical themes) seems like a "Zapier for
data/analytics engineering teams". How close is that to your idea?

From the demo, I imagine using this in the following scenario:

1\. Don't want to self-host any infrastructure

2\. Need to write a few scripts to glue some things together (client wants
dataset in email weekly / lead scoring hack / data quality check)

3\. Would benefit from some pre-built source/sink/chat connectors and recipes.

I'd guess you aren't positioning this as a bulk ELT tool, but perhaps more for
a scenario where execution order is priority.

My only thought for an addition would be to enable self-hosting code for my
python/bash vessels, to ensure I have better control over deployments.

~~~
blakeburch
Hey Matt - You hit the nail on the head with "Zapier for data/analytics
engineering teams" and your example scenario. Our goal is to make it as simple
as possible to get any quick data scripts up and running. However, we are
positioning this as a tool where you can accomplish bulk ETL. The
infrastructure is designed to scale for long-running jobs with heavy data
usage. It's easier to demo and understand the "quick solutions", but we have
teams running Fleets that download and process 100s of GBs of data every day.

Agreed on the need to allow self-hosting the code. It's currently on our
roadmap to build in native GitHub connections so that teams can host and
version-control the code there, connecting Shipyard as part of their CI/CD
flow. Having to copy/paste or upload the code is a means to an end right now,
but not much different than the flow for something like AWS Lambda.

As for Prefect, we've heard a lot of great things about their setup. However,
there are a few areas where we find it lacking:

\- Python only. We want to enable workflows that connect solutions built in
multiple languages (which we support through Bash for now).

\- Infrastructure management is your responsibility, so you can still hit
snags and resource constraints.

\- Their workflow-as-code setup requires you to change how you write your
scripts to make them work in Prefect... and it's my understanding that all the
steps of a workflow have to live in a single script. We're opting for a more
modular approach, where code can easily run on your local machine, or your own
infrastructure, without needing to be rewritten.

I'd love to get more feedback about your current setup and experience. Feel
free to hit me up at blake[at]shipyardapp.com

