
Show HN: Arc – a declarative data transformation framework - seddonm1
https://arc.tripl.ai
======
seddonm1
Hi everyone, I would like to show you the project we have been working on for a
couple of years: Arc, an opinionated framework for defining predictable,
repeatable and manageable data transformation pipelines:

\- predictable in that transformations are defined as data, not code

\- repeatable in that if a job is executed multiple times it will produce the
same result

\- manageable in that execution considerations and logging have been baked in
from the start

\- MIT-licensed, open source and cloud agnostic
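To make "data, not code" and "repeatable" concrete, here is a minimal sketch of the general idea in Python. The pipeline is a JSON document and a small interpreter executes it stage by stage. The stage types (`InlineExtract`, `FilterTransform`) and keys (`inputView`, `outputView`, `column`, `equals`) are hypothetical illustrations, not Arc's actual job schema or engine.

```python
import json
from typing import Any, Callable

# Hypothetical job definition: the pipeline is plain data (JSON), not code.
# Stage types and keys are illustrative only, not Arc's real schema.
JOB = json.loads("""
[
  {"type": "InlineExtract", "outputView": "customers",
   "rows": [{"id": 1, "country": "AU"}, {"id": 2, "country": "NZ"},
            {"id": 3, "country": "AU"}]},
  {"type": "FilterTransform", "inputView": "customers",
   "outputView": "au_customers", "column": "country", "equals": "AU"}
]
""")

Views = dict[str, list[dict[str, Any]]]


def inline_extract(stage: dict, views: Views) -> None:
    # Copy the rows so running the job never mutates the definition itself.
    views[stage["outputView"]] = [dict(r) for r in stage["rows"]]


def filter_transform(stage: dict, views: Views) -> None:
    # Keep only rows whose configured column equals the configured value.
    rows = views[stage["inputView"]]
    views[stage["outputView"]] = [
        r for r in rows if r[stage["column"]] == stage["equals"]
    ]


# Map each declarative stage "type" to the function that executes it.
STAGES: dict[str, Callable[[dict, Views], None]] = {
    "InlineExtract": inline_extract,
    "FilterTransform": filter_transform,
}


def run(job: list[dict]) -> Views:
    # Every run starts from an empty set of views, so executing the same
    # definition against the same inputs produces the same result.
    views: Views = {}
    for stage in job:
        STAGES[stage["type"]](stage, views)
    return views


result = run(JOB)
```

Calling `run(JOB)` twice returns equal results, and `result["au_customers"]` holds the two `AU` rows. Because the job is only data, the same definition could in principle be handed to a different interpreter.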

We have seen that it is hard to scale data engineering teams in a code-first
environment. Arc addresses many of the problems we have seen data engineering
and data science teams struggle with. It:

\- makes data engineering accessible to audiences beyond data engineers: you
don't need to be proficient in Scala/Spark to introduce data engineering into
your team

\- has a Jupyter Notebook-based development environment for quickly building logic

\- provides a clear path to production for machine learning (via MLTransform,
TensorflowServingTransform or HTTPTransform for models as a service)

\- has a plugin system allowing federated development for any features not in
the base framework
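One common way a plugin system like this can work is a registry that maps a stage type name to its implementation, so a team can ship a new stage in its own package without changing the core. The sketch below is a hypothetical Python illustration of that pattern; `register`, `execute` and `UpperCaseTransform` are invented names, not Arc's plugin API (Arc itself runs on the JVM).

```python
from typing import Any, Callable

Stage = dict[str, Any]
Views = dict[str, list[dict[str, Any]]]

# Hypothetical registry mapping a stage "type" string to its implementation.
REGISTRY: dict[str, Callable[[Stage, Views], None]] = {}


def register(stage_type: str):
    """Decorator a plugin module uses to add a stage type to the registry."""
    def wrap(fn: Callable[[Stage, Views], None]):
        if stage_type in REGISTRY:
            raise ValueError(f"stage type {stage_type!r} already registered")
        REGISTRY[stage_type] = fn
        return fn
    return wrap


# A team-maintained plugin; in practice it would live in its own package.
@register("UpperCaseTransform")
def upper_case(stage: Stage, views: Views) -> None:
    col = stage["column"]
    views[stage["outputView"]] = [
        {**row, col: row[col].upper()} for row in views[stage["inputView"]]
    ]


def execute(stage: Stage, views: Views) -> None:
    # Look up the implementation by name; fail clearly if no plugin provides it.
    try:
        impl = REGISTRY[stage["type"]]
    except KeyError:
        raise ValueError(
            f"unknown stage type {stage['type']!r}; is the plugin installed?"
        ) from None
    impl(stage, views)


views: Views = {"names": [{"name": "ada"}, {"name": "grace"}]}
execute({"type": "UpperCaseTransform", "inputView": "names",
         "outputView": "names_upper", "column": "name"}, views)
```

The core only knows how to look up and call registered stages, which keeps federated features out of the base framework.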

Currently it uses the Apache Spark execution engine, but because job definitions
are declarative, they could be executed against other engines in the future.

Please let us know if you have any feedback/suggestions.

