
DataJet – Fully managed Apache NiFi as a service. Visual Data flows without code - amig0ld
https://www.datajet.dev/
======
tiew9Vii
Some positive comments here on NiFi. I found it to not be that great.

I used it 4 or so years ago, so things may have changed.

1) you get a big canvas which is just drag and drop. The canvas can get huge
and you end up scrolling all over the screen etc

2) leads on from the previous point: very easy to accidentally drag something
to the wrong place or stop a particular flow

3) some flows get stuck and are then a pain to restart

4) it was hard to version and do the dev / test / prod deployment. You got a
huge XML file if you exported the flows, which does not diff well

5) it’s hard to test. You can create your own processors and unit test them,
then deploy the jars, but that doesn’t seem the nifi way. Instead you use the
built in processors and glue them together. They are configuration
essentially, configured in a GUI, so you can’t test them programmatically. You
can use the groovy script support which is built in, but you hit the same
issue: how do you test inside nifi?

6) no multi tenancy, or at best weak acls. Bit of a pain if you want a shared
nifi server as you have one massive canvas

I can see the appeal if a team had their own nifi instances but I still find
it a kludge, and writing a small specific app / microservice is far simpler
and more efficient for ETL pipelines etc
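
To make point 5 concrete: the same kind of logic written as a plain function
is trivial to unit test, which is what you lose when it lives in GUI
configuration. A minimal sketch only; `normalize_record` and the record shape
are made up for illustration and have nothing to do with NiFi's API.

```python
import json


def normalize_record(raw: str) -> str:
    """Hypothetical ETL step: clean up the email field of one JSON record."""
    record = json.loads(raw)
    # Trim whitespace and lowercase so downstream joins match reliably.
    record["email"] = record["email"].strip().lower()
    # sort_keys gives stable output, which also keeps diffs readable.
    return json.dumps(record, sort_keys=True)


def test_normalize_record() -> None:
    """Plain unit test; runs under pytest or by calling it directly."""
    out = normalize_record('{"email": "  Bob@Example.COM ", "id": 7}')
    assert json.loads(out) == {"email": "bob@example.com", "id": 7}
```

The test runs in milliseconds with no server, canvas or deployment involved,
which is roughly the argument for the small app / microservice approach.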

------
not_good_coder
Suspect... free trial without any mention of price. Makes no sense.

~~~
toohotatopic
Also: no personal details on the homepage about the team and the ownership
structure. Why should I share my data with somebody who doesn't share his?

------
mooreds
That's cool! I looked at NiFi for a past company and it seemed pretty put
together. IIRC, the UI felt a bit dated like Java Swing, but the capabilities
were awesome.

That said, two worries:

* marketing this is going to be a bigger lift. Folks don't build data flows for fun, they use them to solve business problems. What business problems will this solve?

* "visual" programming claims always make me suspicious, because at some point you usually have to dive below the abstraction.

Anyway, best of luck!

~~~
cpr
> "visual" programming claims always make me suspicious, because at some point
> you usually have to dive below the abstraction.

Yes, but if each node is a well-defined data transformation, visual flows are
a great way to express overall processing.

I.e., it's not general visual programming, which is a hard problem.

------
lima
Can NiFi pipelines be versioned in Git?

~~~
lars_francke
Yes, via the NiFi Registry

------
dbs
I wonder how successful these "managed <open source stack> as a service"
startups are.

~~~
Gys
Managed database services (for mysql and many others), for one, are very
popular.

------
antman
NiFi is underrepresented online although it's very capable. Currently
extending a dataflow with thousands of steps and it's a breeze.

~~~
goodfight
I'm curious what domain you are using Apache NiFi for? I have used it before
for analytical graph based pipelines. It is very capable but it definitely
takes a different mindset to architect problems with NiFi.

------
mjirv
For anyone who has used DataJet or knows about it - why would I choose it over
a tool like Matillion?

~~~
lars_francke
I don't know about DataJet or Matillion but I do know about NiFi.

One reason would be that it's Open Source, easily extendable and you can host
it yourself if needed.

(I'm not saying that this is a good reason for everyone but for some)

------
polskibus
Is NiFi that bad to run by yourself?

~~~
pinopinopino
No, I run a small nifi cluster (8 nodes) on aws and it is very stable in my
opinion. You need a zookeeper cluster, which is also very stable.

I made the setup scalable, which is a nice feature, but you can do without it.
Nifi has no problems being scaled down or up.

The biggest problem is when your zookeeper cluster goes down. Nifi happily
goes on with processing, but now it can't update the state of its processors,
so it keeps doing the same stuff over and over again. Perhaps this is already
fixed, dunno. I now shut the cluster down if I detect zookeeper is down, and
start everything up again once zookeeper is ok. But this happens about once
every two years.
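
The watchdog described above could be sketched roughly like this. It assumes
ZooKeeper's four-letter `ruok` command is reachable (newer ZooKeeper releases
need it allowed via `4lw.commands.whitelist`); `zookeeper_is_ok` and
`decide_action` are hypothetical helper names, and actually stopping or
starting NiFi is left out because that depends on the setup.

```python
import socket


def zookeeper_is_ok(host: str, port: int = 2181, timeout: float = 2.0) -> bool:
    """Send 'ruok' to a ZooKeeper node; a healthy node answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            # Read until we have the 4-byte answer or the server closes.
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:
                    break
                reply += chunk
    except OSError:
        # Refused connection, timeout, DNS failure: treat all as "down".
        return False
    return reply == b"imok"


def decide_action(zk_ok: bool, nifi_running: bool) -> str:
    """Return what the watchdog should do: 'stop', 'start' or 'noop'."""
    if not zk_ok and nifi_running:
        return "stop"   # avoid reprocessing while state can't be saved
    if zk_ok and not nifi_running:
        return "start"  # zookeeper is back, bring nifi up again
    return "noop"
```

A cron job or small loop would call `zookeeper_is_ok` for each zookeeper node
and feed the result into `decide_action`.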

It can also talk with LDAP, which is nice in corporate organizations. And it
has a shit ton of processors that do the work for you. If you have lots of
etl flows, I can warmly advise having a look at it.

I update it once every 6 months or so, it is an internal tool. I haven't had
any hassle with that yet.

------
amig0ld
Use the best in class open source solution for managing and integrating data
without the hassle of maintaining a secured, scalable cluster. Using DataJet
you can build data flows that are as simple or as complex as you like with no
code.

Join millions of data engineers and developers in using the Apache NiFi based
platform to build anything from database replication to ETL to IoT data
ingestion.

~~~
MaxBarraclough
HackerNews does not appreciate marketing spiel.

Are you associated with this project?

~~~
BubRoss
> HackerNews does not appreciate marketing spiel.

Really? What do you think this link is? It just goes to a landing page to sell
you and get you to sign up. This is an obvious advertisement, so why do you
only notice when someone makes it a comment instead of a link?

~~~
mbreese
The link is assumed to be marketing. We all know what it is. Hopefully the
site will tell us why it is interesting or useful. But, we know it's a landing
page trying to get people to sign up.

However, when you use marketing babble in a comment, it adds no value to the
conversation. "Join millions of..." that is just not helpful to the HN
conversation.

Now, if the comment was -- _"we developed this to try and solve X problem for
Y users. Let us know what you think or if you have questions..."_ -- then it
might have been a useful place to start the conversation. Marketing copy is
not.

(And they probably did more harm than good by posting that comment)

