Launch HN: Patterns (YC S21) – A much faster way to build and deploy data apps
149 points by kvh on Nov 30, 2022 | 36 comments
Hey HN, I’m Ken, co-founder of Patterns (https://www.patterns.app/) with my friend Chris. Patterns gets rid of repetitive gruntwork when building and deploying data applications. We abstract away the micro-management of compute, storage, orchestration, and visualization, letting you focus on your specific app’s logic. Our goal is to give you a 10x productivity boost when building these things. Basically, we’re Heroku for AI apps. There’s a demo video here: https://www.patterns.app/videos/homepage/demo4k.mp4.

Our target audience is data engineers and scientists who feel limited by Jupyter notebooks and frustrated with Airflow. They're stitching together business apps (e.g. CRMs or email marketing tools) and AI models, and building proprietary automations and analytics in between them (e.g. generating a customer health score and acting on it). We want to solve the impedance mismatch between analytical systems (like your pipelines for counting customers and revenue) and automations (like a "do something on customer signup" event).

We built Patterns because of our frustration trying to ship data and AI projects. We are data scientists and engineers and have built data stacks over the past 10 years for a wide variety of companies—from small startups to large enterprises across FinTech, Ecommerce, and SaaS. In every situation, we’ve been let down by the tools available in the market.

Every data team spends immense time and resources reinventing the wheel because none of the existing tools work end-to-end (and getting 5 different tools to work together properly is almost as much work as writing them all yourself). ML tools focus on just modeling; notebook tools are brittle, hard to maintain, and don’t help with ETL or operationalization; and orchestration tools don’t integrate well with the development process.

As a result, when we worked on data applications—things like a trading bot side-project, a risk scoring model at a startup, and a PLG (product-led growth) automation at a big company—we spent 90% of our time doing things that weren’t specific to the app itself: getting and cleaning data, building connections to external systems and software, and orchestrating and productionizing. We built Patterns to address these issues and make developing data and AI apps a much better experience.

At its core, Patterns is a reactive (i.e. automatically updating) graph architecture with powerful node abstractions: Python, SQL, Table, Chart, Webhook, etc. You build your app as a graph using the node types that make sense, and write whatever custom code you need to implement your specific app.
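
To make the reactive-graph idea concrete, here is a purely illustrative sketch in plain Python (not Patterns' actual SDK; the Graph class, the node decorator, and the table names are all hypothetical): each node declares the tables it reads and writes, and when an upstream table changes, downstream nodes re-execute automatically.

```python
from collections import defaultdict
from typing import Callable

# Purely illustrative: a tiny reactive graph where writing to a "table"
# triggers every node that declared it as an input. All names are hypothetical.
class Graph:
    def __init__(self):
        self.tables: dict[str, list] = defaultdict(list)
        self.subscribers: dict[str, list[Callable]] = defaultdict(list)

    def node(self, inputs: list[str], output: str):
        def register(fn):
            def run():
                result = fn(*(self.tables[name] for name in inputs))
                self.write(output, result)
            for name in inputs:
                self.subscribers[name].append(run)
            return fn
        return register

    def write(self, table: str, rows: list):
        self.tables[table] = rows
        for run in self.subscribers[table]:  # reactive: downstream nodes re-run
            run()

graph = Graph()

@graph.node(inputs=["signups"], output="health_scores")
def score_customers(signups):
    # Placeholder scoring logic for illustration only
    return [{"email": s["email"], "score": len(s["email"]) % 10} for s in signups]

@graph.node(inputs=["health_scores"], output="alerts")
def alert_on_low_scores(scores):
    return [s for s in scores if s["score"] < 6]  # flag low-health customers

# A webhook-like event lands in the "signups" table; both downstream nodes react.
graph.write("signups", [{"email": "ada@example.com"}])
print(graph.tables["alerts"])
```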

We built this architecture for modularity, composability, and testability, with structurally-typed data interfaces. This lets you build and deploy data automations and pipelines quickly and safely. You write and add your own code as you need it, taking advantage of a library of forkable open-source components—see https://www.patterns.app/marketplace/components and https://github.com/patterns-app/patterns-components.git .

Patterns apps are fully defined by files and code, so you can check them into Git the same way you would anything else—but we also provide an editable UI representation for each app. You work at either level, depending on what’s convenient, and your changes propagate automatically to the other level with two-way consistency.

One surprising thing we’ve learned while building this is that the problem actually gets simpler when you broaden the scope. Individual parts of the data stack that are huge challenges in isolation—data observability, lineage, versioning, error handling, productionizing—become much easier when you have a unified “operating system”.

Our customers include SaaS and ecommerce co’s building customer data platforms, fintech companies building lending and risk engines, and AI companies building prompt engineering pipelines.

Here are some apps we think you might like and can clone: 1. Free Eng Advice - a GPT-3 slack bot: (https://studio.patterns.app/graph/kybe52ek5riu2qobghbk/eng-a...) 2. GPT3 Automated Sales Email Generator: (https://studio.patterns.app/graph/8g8a5d0vrqfp8r9r4f64/sales...) 3. Sales lead enrichment, scoring, and routing: (https://studio.patterns.app/graph/9e11ml5wchab3r9167kk/lead-...)

Oh and we have two Hacker News specials. Our Getting Started Tutorial features a Hacker News semantic search and alerting bot (https://www.patterns.app/docs/quick-start). We also built a template app that uses a LLM from Cohere.ai to classify HN stories into categories like AI, Programming, Crypto, etc. (https://studio.patterns.app/graph/n996ii6owwi5djujyfki/hn-co...).

Long-term, we want to build a collaborative ecosystem of reusable components and apps. To enable this, we’ve created abstractions over both data infrastructure (https://github.com/kvh/dcp.git) and “structurally-typed data interfaces” (https://github.com/kvh/common-model.git), along with a protocol for running data operations in Python or SQL (other languages soon) in a standard way across any cloud database or compute engine.

Thanks for reading this—we hope you’ll take a look! Patterns is an idea I’ve had in my head for over a decade now, and I feel blessed to have the chance to finally build it out with the best co-founder on the planet (thanks Chris!) and a world-class engineering team.

We’re still early beta and have a long road ahead, but we’re ready to be tried and eager for your feedback!




Looks good! I'd love to see a bring-your-own-hardware/cloud version that still includes all other features / UI. I read you plan to open source the execution engine but not sure if that's the full solution. Would pay for this, but in a different cloud and where I can reuse the hardware I already pay for.


Still working out the details. Would love to hear what hardware you're using. Shoot me a message? patterns.app/contact


I really like the mix of drag and drop with the ability to also write code. This is something that I always run into with tools like Zapier - they get me close, but then I need a small amount of customization that isn't built in and then hit a wall. This seems like it might be a nice solution. Congrats on the launch.


It’d be great to show the debugging experience in the video (in fact, I’d prefer seeing that over the breadth of features). E.g. what happens when there’s a syntax error in my sql query or the python code fails on an invalid input?

That tends to be the critical make it or break it feature when you’re writing code in an app builder.


Agree, debugging is a critical user experience! In Patterns, you'll see the full stack trace and all logs when you execute Python or SQL.


Congrats on the launch! As an iOS dev who dabbles in ML, I'm having trouble understanding what you mean by "data applications" and who this is for. I'm guessing it's targeted at teams that crank out lots of small apps and therefore investment in learning your platform would make sense? It would be helpful if you gave a clearer explanation of the use case(s), beyond the generic "customer data platforms, fintech companies building lending and risk engines, and AI companies building prompt engineering pipelines" (which, tbh, means nothing to me).


Our target audience is data engineers and scientists who are stitching together business apps (CRMs, email marketing tools, etc.) and building proprietary automations and analytics in between them, like generating a customer health score and acting on it.

One of the problems we intend to solve is the separation between analytical systems (like all your pipelines for counting customers and revenue) and your automations (like doing something on a customer signup event).


That was a good and important question - I've moved cstanley's answer to the top text so other people will get the information sooner. Thanks!


Love the focus on these AI workflows. Any plans to support other AI models like Dalle / Stable Diffusion?


Absolutely. Once we add better support for file storage, building workflows that leverage APIs like Dalle/Stable Diffusion will become much easier.


What is the timeline on this? Thank you.


How is it different to node-based data tools such as Alteryx, Easy Data Transform or Knime?


Those are great tools, but built for a different era. We've built Patterns with the goal of fostering an open ecosystem of components and solutions that interface with modern cloud infrastructure and the rest of the modern data stack, so folks can build on top of others' work. As more and more data lives in the cloud, in standard SaaS tools, more and more businesses are solving the same data problems over and over. We hope to fix that!


So more of a development platform than an end user tool?


Well, because of this ad, I've now at least heard of this one.


This looks like a data equivalent of WordPress, and I don't mean that in a good way. What happens when someone no longer wants to use Patterns?


Good concern. All Patterns apps are fully defined by code that you can download. We're also building an open-source execution engine; once that lands, you'll be able to self-host forever if desired.


Cool video! It feels like the love child of Zapier and Airflow — in a good way!


Got a demo a few months back, was told Kotlin support (as alternative to Python) is on the roadmap... how soon might that be a reality?


Great launch guys. The marketplace portion will be huge.


Congrats Ken and Chris! I think Patterns is very interesting, having closely followed since the early days (but not affiliated).

For those interested, I've been doing some exploring around automation tools more broadly, and this is my attempt at orientation, within the context that Patterns exists (my own interpretation).

The paradigm is within the context of data analysts and data engineers required to centralise data for reporting and automation (see "analytical + operational workflows" on the Patterns website). This process tool chain broadly looks like the following aggressively reduced flow of core components:

`Extract > Integrate > Analyse/Report > Automate` (and Ops)

- Extract: Regularly pull data into a centralised storage/compute location from multiple source systems such as SaaS APIs, databases, logs etc (eg Airbyte)

- Integrate: Combine data into a coherent unified standardised data warehouse (data warehouse is a thing you build not a tool you buy) (eg dbt, Pandas)

- Analyse: Explore data, discover facets/aspects/dimensions in the combined data. Typically notebook type tools, and includes data science and analytics (eg Jupyter)

- Reporting: Dashboards and regular reports on formalised data concepts such as metrics, using BI tools (eg Metabase)

- Automation: Tools that exist to trigger actions on the data available in the system, typically by sending those actions into other systems. (eg Patterns!, also Zapier et al, which are more consumer oriented, no version control etc)

- Ops: The tools needed to effectively achieve the above, many sub-categories (eg Airflow)

---

Some observations to then mention, specifically about Patterns/Automation:

1. Probably best to be used in conjunction with the previous components. Automation tools seemingly can exist without them, but Integrate, Analyse, and Reporting are specific, highly related, and likely required processes at any kind of scale (team, data volume, or complexity). At least this has been true historically, and companies have deployed significant resources to implement these tools.

2. However, one obviously can use just an Automation tool alone, as they provide the Ops and technical components to run the full `Extract > Integrate > Analyse/Report > Automate` process, and possibly with the least complexity, and the best containment of abstractions. This is a huge gain for small teams with limited resources.

3. My reservation for bigger teams is the "Integrate" component, which if not done carefully (regardless of where/how) leads to a mountain of technical debt in maintaining the data transformations (data modelling), and nothing but care solves this very time-consuming process.

4. Data is stored on Patterns, and it would be interesting to know how users with pre-existing processes would extract data into other tools, say for example to write scored lead IDs into another system for targeting (see Reverse ETL tools).

5. Most Automation tools lack a real "user input" component by default, as they seem not to be designed to build user-interface CRUD apps per se. This is similar to Hex.Tech (as far as I can tell?), which has an "app" interface, but users cannot really change the state of the app; if they reload the page, their inputs will be lost. (I think I am correct here for Patterns - could a "non-technical user" change the lead scoring parameters without getting into the code?) Feels like a simple feature, but probably very complex to implement.

6. On face value, Patterns feels like a better way to deploy data analytics in terms of what users are trying to achieve (create and share insight, and automate in specific instances), with nice abstractions over storage, streams, scheduling, and compute (none of which an analyst wants to touch directly if they can avoid it).

Distinct but possibly worthwhile mentioning these categories:

- Reverse ETL: A similar desire to Automation; send the data back to a tool that can use it to automate something

- No-code: providing CRUD App development capabilities without (much) code (eg Retool)

- Feature Flag/Remote Config: Provide non-technical users with an interface into the configuration of a web-app (eg Flagsmith)


Spot on analysis of the market — we also think the next frontier for data eng/science is towards further automation. After all, dashboards are input to a person that makes a decision and takes an action; and to your point, tools on the market don’t really serve this use case.

To this end, we plan to support more human-in-the-loop workflows by expanding on our dashboard feature and enabling stateful user input interfaces.

One small note on point #4: while we provide out-of-the-box infrastructure for easy setup, Patterns can run on your existing data warehouse — so it plays nicely with big teams that have existing tools.


> makes a decision and takes an action

Exactly, this is such an obvious pattern and yet the second half is so badly catered for.

#4 - Great, that is useful, not obvious from the docs/examples that this is possible.


Love the videos! Would it be possible to have a clickable interactive demo where we can see what each node is doing?


We have a clickable tutorial behind login, and we could easily do the same experience for an app we embed on the website. Thank you for the idea!


This is a step ahead indeed. Good job!!


Wow! Congrats! Looks amazing! :)


Congrats on the launch! Looks really nice. Curious about the marketplace: how is it populated? Are these a set of apps that are available? Can third parties build apps that would be available? And could a user build a custom app?


The marketplace is an open ecosystem, yes! Anyone can build their own components and apps and submit them. More details here https://www.patterns.app/docs/marketplace-faq/, and guide for building your own: https://www.patterns.app/docs/dev/building-components. It's early days but our goal is coverage of all data sources and sinks, the ontology layer of common transformations and ETL logic, and AI / ML models.


This looks so good! Congrats on the launch!


Yahoo Pipes would have been proud.


First want to say congrats to the Patterns team for launching a gorgeous looking tool. Very minimal and approachable. Massive kudos!

Disclaimer: we're building something very similar and I'm curious about a couple of things.

One of the questions our users have asked us often is how to minimize the dependence on "product specific" components/nodes/steps. For example, if you write CI for GitHub Actions you may use a bunch of GitHub Action references.

Looking at the `graph.yml` in some of the examples you shared, you use a similar approach (e.g. patterns/openai-completion@v4). That means that whenever you depend on such components, your automation/data pipeline becomes more tied to the specific tool (GitHub Actions/Patterns), effectively locking in users.

How are you helping users feel comfortable with that problem (I don't want to invest in something that's not portable)? It's something we've struggled with ourselves as we're expanding the "out of the box" capabilities you get.

Furthermore, would have loved to see this as an open source project. But I guess the second best thing to open source is some open source contributions and `dcp` and `common-model` look quite interesting!

For those who are curious, I'm one of the authors of https://github.com/orchest/orchest


Yes, great point, we share that concern. All of our components (e.g. patterns/openai-completion@v4) are open-source and can be downloaded and "dehydrated" into your Patterns app. They all use the same public API available to all apps.

We're working towards a fully open-source execution engine for Patterns -- we want people to invest with full confidence in a long-term ecosystem. For us, sequencing meant dialing in the end-to-end UX and then taking those learnings to build the best framework and ecosystem with a strong foundation. Stay tuned!

Thank you for the kind words and congrats on the great work on Orchest!


I am working on something adjacent to this problem. We focus much less on data pipelines and more on automation, but in the end we also have an abstraction for flows that one can use to build data pipelines. The lock-in issue was something I thought a lot about, and I ended up deciding that our generic steps should just be plain code in TypeScript/Python/Go/Bash; the only requirement is that those code snippets have a main function and return a result. We built https://hub.windmill.dev where users can share their scripts directly, and we have a team of moderators to approve the ones to integrate directly into the main product. The goal with those snippets is that they are generic enough to be reusable outside of Windmill, and the Python ones might work straight out of the box for Orchest.

nb: author of https://github.com/windmill-labs/windmill


Thanks for chipping in.

I’ve been leaning towards this direction. I think I/O is the biggest part that still needs fixing in the case of plain code steps: input being data/stream and parameterization/config, and output being some sort of typed data/stream.

My “let’s not reinvent the wheel” alarm is going off when I write that, though. Examples that come to mind are text-based (Unix / https://scale.com/blog/text-universal-interface) but also the Singer tap protocol (https://github.com/singer-io/getting-started/blob/master/doc...). And config obviously has many standard forms like INI, YAML, JSON, environment key-value pairs, and more.

At the same time, text feels horribly inefficient as encoding for some of the data objects being passed around in these flows. More specialized and optimized binary formats come to mind (Arrow, HDF5, Protobuf).

Plenty of directions to explore, each with their own advantages and disadvantages. I wonder which direction is favored by users of tools like ours. Will be good to poll (do they even care?).

PS Windmill looks equally impressive! Nice job


Yes, inputs/outputs are likely the most interesting problem for our diverse specs of flows.

Because data pipelines are not the primary concern of Windmill, we took the stance that the inputs and outputs of steps are simply JSON in, JSON out. For all the languages, we extract the JSON object into the different parameters of the main function, and then we wrap the return value with the respective language's native serializer for the output (e.g. JSON.stringify in TypeScript). Then each step can use a JavaScript expression executed by V8 to do some lightweight transformation from the output of any step to the input of that step.

A lot of the simplification we made comes from parsing the main function's parameters into the corresponding JSON Schema, supporting deeply nested objects when relevant.
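
For illustration, here is a minimal sketch of that convention in Python (not Windmill's actual runner; the function and parameter names are hypothetical): a step is just a script whose typed main parameters can be reflected into a JSON-Schema-like description, with the JSON payload unpacked into those parameters and the return value serialized back to JSON.

```python
import inspect
import json

# A plain-code step: typed parameters in, JSON-serializable result out.
# Hypothetical example step, not a real Windmill script.
def main(lead_email: str, score_threshold: float = 0.8) -> dict:
    score = 0.9  # placeholder for real scoring logic
    return {"email": lead_email, "qualified": score >= score_threshold}

# Sketch of how a runner could reflect the main parameters into a
# JSON-Schema-like description.
def describe(fn) -> dict:
    sig = inspect.signature(fn)
    types = {str: "string", float: "number", int: "integer", bool: "boolean"}
    return {
        "type": "object",
        "properties": {
            name: {"type": types.get(p.annotation, "object")}
            for name, p in sig.parameters.items()
        },
    }

# JSON in, JSON out: unpack the payload into parameters, serialize the result.
def run(fn, payload: str) -> str:
    kwargs = json.loads(payload)
    return json.dumps(fn(**kwargs))

if __name__ == "__main__":
    print(json.dumps(describe(main), indent=2))
    print(run(main, '{"lead_email": "a@b.co"}'))
```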

That works great for automations that do not have big inputs/outputs, but not for data. So what we do for data is use a folder that we symlink so it is shared by all steps, if a specific flag for that flow is set. That also forces the same worker to process all the steps inside that flow, when otherwise flow steps could have been processed by any worker. It is very fast since it's all local filesystem, but not super scalable.

I am not pleased with that solution and believe that if we were to expand on the data problem, we would certainly rely on fast network and HDFS/Amazon EFS/etc to simply share that mounted folder across the network.

Anyway, sorry for the rambling, but I do feel like we're all taking different approaches to the same underlying problem of building the best abstraction for flows, and I believe we might learn from each other's choices.

ps: congrats Patterns on the launch, the tool looks absolutely amazing.



