
Apache NiFi - boredgamer2
http://nifi.apache.org/index.html
======
cs02rm0
I've used it a fair bit, though not for a couple of years. Few points, some of
which may be out of date:

* I've seen customers fall into the trap of thinking they don't need expensive developers because you can drag and drop; anyone who can use a mouse can crack on with NiFi.

* It persisted its config to an XML file, including the positions of boxes on the UI. Trying to keep this config in source control with multiple devs working on it was impossible.

* Some people take the view that you should use 'native' NiFi processors and not custom code. This results in huge graphs of processors, with thousands of boxes and lines between them that you have to follow. It's made both better and worse by being able to descend and ascend levels in the graph; the complexity quickly becomes insane that way.

* You're essentially programming with it. I've no doubt you could use it to write, say, an XMPP server if so inclined. Which means you can do a great many things of huge complexity. Programming tools have developed models for inheritance and composition, abstraction, static analysis, etc., which NiFi just didn't have. The amount of repeated logic I've seen its configuration accumulate is beyond anything I've seen from any novice programmer.

I ended up feeling like it could be an OK choice in a very small number of
places, but I never got to work on one of those. The NSA linking together
multiple systems with a light touch is possibly one such use case. For most
everyone else, I couldn't recommend it.

~~~
closeparen
I have not found that argument persuasive to the managers who believe that
coding is inherently wasteful if what you’re trying to do is _technically
possible_ in a workflow builder GUI.

~~~
marcinzm
That's true but, personally, I just avoid working for managers like that.

~~~
closeparen
I can avoid this at the line manager level, but Directors love these things.

~~~
tatersolid
No, we don’t.

------
mcnichol
We used NiFi...one of the worst experiences.

It installs like an appliance and feels like you are grappling with a legacy
tool weighed down by a classic view on architecture and maintenance.

We had built a data pipeline for very high-scale data. The theory of it was
very much a TIBCO-type approach to data pipelines.

Sadly, the reality was also a TIBCO-type approach to data pipelines.

This is one person's experience and opinion, and I am super jaded by it
because a vendor crammed it down one of our directors' throats, and he
subsequently crammed it down ours after we warned how it would turn out. It
ended up being a very leaky and obtuse abstraction that didn't belong in our
data pipeline once you planned for how it would be maintained longer-term.

I ultimately left that company. It had as much to do with their leadership
and tooling dictation as anything else; NiFi was one of many pains. I am sure
there are places using NiFi that will never outgrow the tool, so take it with
a grain of salt.

Said company ultimately struggled for the very reasons those of us who left
were predicting: the tooling pipeline was a mess, and they were thrashing on
trying to get it right, constantly breaking things by forcing this solution,
along with others, into the flow. Lots of finger-pointing.

It sucks to have that "I told you so..." moment when you never wanted that
outcome for them. I just couldn't be a part of their spiral anymore.

~~~
edmundsauto
Can you elaborate on what you mean by a TIBCO-like approach? I haven't used
their tools, but would like to know more about the issues you ran into. What
were examples of the leaky abstraction?

~~~
brutus1213
I'd like to second this request. I have encountered event buses and ETL in a
number of places over my career - I don't understand what the heck TIBCO does
beyond something simple like RabbitMQ/ZeroMQ. How is this different from
Pub-Sub (and its variants)? Any pointers to books or blog posts would be
really appreciated.

~~~
mcnichol
TIBCO is very much providing queuing/caching to shuttle data from one point to
another.

The goal is even more so to be the interconnect for all systems across a
varied enterprise at a higher level. It's all pub-sub under the hood. Think
cheap butts in seats doing the same work for a "negligible hit on
performance".

In the same way you can plug random devices into outlets around the house all
served by some powerplant you don't know (or even need to care about), TIBCO
attempts to provide that same interface.

Data does need some restructuring, whether these are aggregations,
transformations, etc. So they provide steps in the process where you can
perform these operations through a drag and drop UI.

There is an input and an output defined in XML that you don't have to code,
but that is managed and can be seen. The engine beneath provides the lower
layers of routing, bytecode, and implementation, letting you just drag blocks
around on a screen, "connecting things".

The goal is very pure: I have many people in my organization who know how
data flows, but not all of them are developers. How can I enable them to
connect my organization without everyone needing to be a developer?

The gap between theory and practice is always the interesting observation.
What I saw happen (as was mentioned elsewhere) is that very strong developers
became weak by relying on this tool (or simply left for adequately
challenging work). When the world moved on to something else, so much had
changed that it was almost a career change to get back into development.

They went from understanding Java 3/4 and JEE to Java 11, Spring, and DI
frameworks....I saw a lot give up or move over to product management roles.
This only made the tension between on-premise infrastructure teams and public
cloud teams more divisive and toxic. I don't think it's anything uncommon in
other areas; I just feel like we've reached a full revolution in this
particular space (and not the first revolution either).

~~~
edmundsauto
Thanks for a clear explanation without dismissing the product as garbage.
(it's in that space where techies hate it, but it must provide value since
it's so expensive!)

Why do non-technical people need to understand the data flow? It seems like
documentation (data dictionaries) would be preferred. Or, are they useful for
very non-technical people, while TIBCO data flow understanding is useful for
people who are data savvy but not tech savvy?

~~~
mcnichol
It is a butts in seats equation.

If you can have less expensive operators driving and mapping the world and
place all the smarts in the pipes, you can drive down opex and divert cash to
capex for competitive advantage.

Linux and much of the streaming software world is smart people, dumb pipes.

If you invert that, you have more automation, predictability, and control at
lower cost. The risk is a lot of eggs in one basket: when the market turns,
if the company you are buying from mismanages tech, or if they can't keep
pace with change...you go along for their ride. Every company big and small
falls into this technical debt. I have many opinions on why, as I am sure
many do.

There is a lot baked into that comment, but it's the constant tug-of-war
every CIO is trying to wrap their head around: how do we do more with less
and gain an advantage?

------
gopalv
NiFi's biggest strength is that it is a 2-way system - it is not Storm, it is
not Flink, it is not Kafka, it is not SQS+Lambda.

I like to think of it like Scribe from FB, but with an extremely dynamic
configuration protocol.

The places where it really shines are where you can't get away with those,
and the problem actually needs a system that can back-pressure and modify
flows all the way to the source - it is a spiderweb data collection tool.

So someone trying to build Complex Event Processing workflows or time-range
join operations with it will probably succeed at small scale, but start
pulling their hair out at the 5-10 GB/s rate.

So its real utility is in that it deploys outside your DC, not inside it.

This is the Site-to-Site functionality, and MiNiFi is the smallest chunk of
it, shrunk into a simple C++ agent you can deploy in every physical location
(say, a warehouse or grocery store).

The actually useful part of that is the SDLC cycle for NiFi, which lets you
push updates to a flow. So you might start with low-granularity parsing of
your payment logs on the remote side, but you can switch your attention over
to it and remove sampling on the fly if you want.

If you're an airline flying over the arctic, you might have an airline-rated
MiNiFi box on board which sends low traffic until a central controller pushes
a "give me more info on fuel rates".

Or a cold chain warehouse which is monitoring temperature on average, until
you notice spikes and ask for granular data to compare to power fluctuations.

It is a data extraction & collection tool, not a processing and reporting tool
(though it can do that, it is still a tool for bringing data after
extraction/sampling, not enrichment).

------
monstrado
Incredible piece of software. I've used it in production at my last two jobs.
You can build almost anything in NiFi once you get into the mindset of how it
works.

A good way to get started with NiFi is to use it as a highly available
quartz-cron scheduler. For example, running "some process" every 5 seconds.
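For reference, a rough sketch of that configuration (field labels are approximate and vary by version): NiFi's "CRON driven" scheduling strategy takes a Quartz cron expression, which adds a leading seconds field to the usual five:

```text
# On the processor's Scheduling tab (a sketch, not exact UI text):
Scheduling Strategy: CRON driven
Run Schedule:        */5 * * * * ?    # sec min hour day month day-of-week
```

The `?` is Quartz's "no specific value" for the day-of-week field; `*/5` in the seconds field fires every 5 seconds.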

Disclaimer: I'm an Apache NiFi committer.

Here's an article you might find interesting about its ability to scale:

[https://blog.cloudera.com/benchmarking-nifi-performance-and-scalability/](https://blog.cloudera.com/benchmarking-nifi-performance-and-scalability/)

Disclaimer v2: I used to work at Cloudera

~~~
nightowl_games
This page is classic Apache project in that I read it and have no idea what it
does. Can you high level explain what this thing is really for?

~~~
taftster
Agreed. So here's an attempt to describe NiFi at a high level.

Fundamentally NiFi is a "dataflow engine", a system that can be used to
automate data transfer from different and varying types of sources and sinks.
It has a fairly usable UI that enables a "dataflow manager" (end user) to
perform transformation, routing and delivery of data using a "drag-n-drop"
configuration approach.

Getting data into or out of your application/system, or performing simple
schema transformations, is a common (maybe tedious) task that most developers
face. NiFi helps connect the dots, so to speak, and decouples the
receipt/delivery of data away from your application. NiFi comes with a set of
"batteries included" connectors for almost every transport protocol you would
generally need. And it's modular so you can create your own processing
components as well.

NiFi is fundamentally modeled after what's called "Flow-Based Programming"[1],
which is a style of programming that facilitates composition of black-box
processing units. It can run at an enterprise or IoT level, depending on where
that decomposition best fits into your architecture.

[1] [https://en.wikipedia.org/wiki/Flow-based_programming](https://en.wikipedia.org/wiki/Flow-based_programming)
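The flow-based idea can be sketched in a few lines of Python (all names hypothetical, not NiFi's API): black-box processing units connected by queues, each unit seeing only what arrives on its input.

```python
from queue import Queue

def source(out_q):
    # Black box 1: emit raw records downstream; knows nothing about consumers.
    for record in ["alice,30", "bob,25"]:
        out_q.put(record)
    out_q.put(None)  # end-of-stream marker

def transform(in_q, out_q):
    # Black box 2: reshape records without knowing source or destination.
    while (record := in_q.get()) is not None:
        name, age = record.split(",")
        out_q.put({"name": name, "age": int(age)})
    out_q.put(None)

def sink(in_q):
    # Black box 3: deliver downstream; here we just collect the output.
    collected = []
    while (record := in_q.get()) is not None:
        collected.append(record)
    return collected

q1, q2 = Queue(), Queue()
source(q1)
transform(q1, q2)
results = sink(q2)
print(results)  # [{'name': 'alice', 'age': 30}, {'name': 'bob', 'age': 25}]
```

Because each unit only touches its queues, you can swap any one of them out without the others noticing, which is the composition property the article describes.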

(disclaimer: I'm affiliated with the NiFi project)

~~~
_JamesA_
Is this a layer on Apache Camel [1] or something completely different?

[1]: [https://camel.apache.org/](https://camel.apache.org/)

~~~
monstrado
It was built from scratch at the NSA and open sourced a few years ago.

~~~
zmmmmm
I'm curious how you would compare it to Apache Camel - when would you use
Camel and when would you use this?

We use camel with DSLs to make programmatic workflows that connect data flows
together. However Camel itself doesn't typically carry the data. Sometimes it
SFTPs files around etc., but mostly, it is just a messaging layer.

Is that the main difference here?

------
taftster
NiFi at first glance sometimes just looks like a glorified GUI for building
out a data-delivery application. But NiFi doesn't just compile an application
to be deployed on your network. Instead, the "power" of NiFi is that it allows
an operations staff to perform the regular day-in-day-out task of monitoring,
regulating and if needed modifying the delivery of data to an enterprise.

NiFi gives insight to your enterprise data streams in a way that allows
"active" dataflow management. If a system is down, NiFi allows dataflow
operations to make changes and deal with problems directly, right at tier 1
support.

It's often the case that an enterprise software developer has an ongoing role
of ensuring the healthy state of the applications from their team. They don't
just develop, they are frequently on call and must ensure that data is flowing
properly. NiFi helps decouple those roles, so that the operations of dataflow
can be actively managed by a dedicated support team that is more tightly
integrated with the "mission" of their dataflow.

NiFi additionally offers some features, which most programmers skip, that
help with the resiliency of the application. For example:

\- the concept of "back pressure" is baked into NiFi. This helps ensure that
downstream systems don't get overrun by data, allowing NiFi to send upstream
signals to slow or buffer the stream.

\- data provenance, the ability to see where every piece of data in the system
originated and was delivered (the pedigree of the data). Includes the ability
to "replay" data as needed.

\- dynamic routing, allowing a dataflow operator to actively manage a stream,
splicing it, or stopping delivery to one source and delivering to another.
Sources and Sinks can be temporarily stopped and queued data placed into
another route. Representational forms can be changed (csv -> xml -> json,
avro), and even schemas can be changed based on stream.
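The back-pressure point in the first bullet can be illustrated with a bounded queue (a rough sketch, not NiFi's actual mechanism): a fast producer blocks instead of overrunning a slow consumer, so nothing is lost.

```python
import threading
from queue import Queue

buffer = Queue(maxsize=3)   # the back-pressure threshold
delivered = []

def producer():
    # Fast upstream source: blocks on put() whenever the buffer is full.
    for i in range(10):
        buffer.put(i)
    buffer.put(None)  # end-of-stream marker

def consumer():
    # Slow downstream system: drains the buffer at its own pace.
    while (item := buffer.get()) is not None:
        delivered.append(item)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start()
t1.join(); t2.join()
print(delivered)  # [0, 1, ..., 9] -- all items arrive despite the tiny buffer
```

The queue's `maxsize` plays the role of NiFi's queue threshold: instead of dropping data or crashing the consumer, the pressure propagates upstream to the producer.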

Anyone can write a shell script that uses curl to connect with a data source,
piping to grep/sed/awk and sending to a database. NiFi is more about
visualizing that dataflow, seeing it in real-time, and making adjustments to
it as needed. It also helps answer the "what happens when things go wrong"
question, the ability to back-off if under contention, or replay in case of
failure.

(disclaimer: affiliated with NiFi)

------
banjoriver
NiFi is very good at reliably moving data at very high volumes and low
latency, with a large number of mature integrations, in a way that allows for
fine-grained tuning, and I've seen first hand that it is very scalable. Its
internal architecture is very principled: [https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html](https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html)

Out of the box it is incredibly powerful and easy to use; in particular, its
data provenance, monitoring, queueing, and back-pressure capabilities are
hard to match; a custom solution would take extensive development to even
come close to these features.

It is not code, and that means it is resistant to code-based tooling. For
years its critical weakness was migrating flows between environments, but
this has been mostly resolved. If you are in a place with dev teams and
separate ops teams, and lots of process required to make prod changes, this
was problematic.

However, the GUI flow programming is insanely powerful and is ideal when you
need to do rapid prototyping, or quickly adapt existing pipelines; this same
power and flexibility means that you can shoot yourself in the foot. As others
have said, this is not a tool for non technical people; you need to understand
systems, resource management, and the principles of scaling high volume
distributed workloads.

This flow-based visual approach makes understanding what is happening easier
for someone coming along later. I've seen a solution that required a dozen
containers of redis, multiple programming languages, zookeeper, a custom
gui, and mediocre operational visibility, be migrated to a simple nifi flow
that was 10 connected squares in a row. The complexity of the custom
solution, even though it was very stable and had nice code quality, meant
that that solution became legacy debt quickly after it was deployed. Now that
same data flow is much easier to understand, and has great operational
monitoring.

Some suggestions:

\- limit NiFi's scope to data routing and movement, and avoid data
transformations or ETL in the flow. This ensures you can scale to your
network limits, and aren't cpu/memory bound by the transformation of content.

\- constrain the scope of each instance of nifi, and don't deploy 100s of
flows onto a single cluster.

\- you can do a lot with a single node; only go to a cluster for HA and when
you know you need the scale.

------
unixhero
Phew! Happy to have read the comments here. They say a lot. I will go with
Apache Airflow for all my workflow needs from now on. I wasn't entirely sure
if this was the best bet, but after seeing all of this I am now.

I know of a massive installation [0], which is about to be open sourced,
where Apache NiFi is used in the middle of the stack as a key component. No
dismissal of the capabilities this package offers is intended.

[0]
[https://sikkerhetsfestivalen.no/bidrag2019/138](https://sikkerhetsfestivalen.no/bidrag2019/138)

slides [slide #32]:
[https://static1.squarespace.com/static/5c2f61585b409bfa28a47010/t/5d76acacbff92a46b71dfdba/1568058561846/ELK-i-solnedgang.pdf](https://static1.squarespace.com/static/5c2f61585b409bfa28a47010/t/5d76acacbff92a46b71dfdba/1568058561846/ELK-i-solnedgang.pdf)

------
pacofvf
For the love of god, don't use NiFi to trigger an Airflow DAG.

~~~
recov
Can you expand? We just set this workflow up and it seems to be working fine.

~~~
pacofvf
NiFi is meant for stream processing and Airflow for batch processing. If
your NiFi flow triggers an Airflow DAG, that means your entire process is
batch processing and you shouldn't be using NiFi in the first place. If you
still want to do stream processing, then use Airflow sensors to "trigger" it.
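Framework details aside, a sensor is essentially a poll-until-ready loop, which Airflow wraps with scheduling and timeouts; a rough sketch of the pattern (all names hypothetical, not Airflow's API):

```python
import time

def wait_for(condition, poke_interval=0.01, timeout=1.0):
    """Poll `condition` until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True       # downstream work can proceed
        time.sleep(poke_interval)
    return False              # gave up; a real sensor would fail the task

# Example: "sense" that an upstream step has produced its output.
ready = {"flag": False}
def upstream_done():
    ready["flag"] = True      # simulate the upstream step finishing
    return ready["flag"]

print(wait_for(upstream_done))  # True
```

The point of the pattern is that the batch side pulls when it is ready, rather than having the streaming side push a trigger across the boundary.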

------
yawz
If you're considering Apache NiFi, you should also look at Apache Airflow and
Uber Cadence to decide what model would work best for you.

~~~
ekianjo
They do have totally different use cases, so it should be fairly quick to
decide which one is for you.

~~~
mac01021
Can you explain what the different use cases are? In particular, what is
Airflow good at that Nifi is not?

I have used Nifi a little bit and Airflow not at all. Reading the home pages
of the two products, it's hard for me to know when it would be more
appropriate to use Airflow than Nifi.

They both schedule jobs and move data according to control flow topologies
that you build in a GUI, right?

~~~
lsofzz
From watching the presentation on Youtube
([https://youtu.be/sQCgtCoZyFQ](https://youtu.be/sQCgtCoZyFQ)), it seems NiFi
is geared towards the acquisition of data, gluing/batching/massaging the flow
between systems, and providing the necessary interfaces to downstream
systems.

\- Would love to see the ability to develop custom NiFi processors in
Go/Rust/Elixir etc.

\- XML is a big pain in the rear.

\- Being container-aware is big win. Stateless is even better.

I see a good opportunity there for users like me to explore NiFi's
capabilities in the future.

> They both schedule jobs and move data according to control flow topologies
> that you build in a GUI, right?

Airflow, on the other hand, is designed to run scheduled jobs (whether
batched or otherwise). The 'job' can really be anything - build / data
processing pipelines, system configuration management pipelines, and so on.
In Airflow parlance, one can create connected DAGs as pipelines that massage
the data in the way you intend.

They both share some commonalities, but I do gravitate towards their use
cases being subtly different, with an important difference highlighted above.

------
corndoge
Can someone explain what this is? I can't find anything on the website that
explains it

~~~
gregw134
Sure, I've worked with it. It lets you visually build data pipelines. It's
extremely useful for getting work done quickly - you just drag and drop
prebuilt connectors to things like elasticsearch, s3, or twitter and you have
a data pipeline, including automatic backoff and the ability to inspect the
data at each step. It's visual, so it's easy to tell what's going on. The
biggest downside is that it's not automatically distributed. You can set it
up to be distributed, but you have to do the plumbing yourself on the nifi
graph by dropping nodes for routing tasks between nifi servers. Overall, a
perfect tool for quickly building a pipeline that can be easily shown to the
business and in which you can visually see all data and errors at each step.

~~~
hestefisk
Why would one consider distributing it in the first place? IMHO that would
require distributed transactions, which would make it hugely complex.

------
rfsliva
We are using NiFi as our dataflow engine for real time data ingest. We are
using a current version, 1.11.4, and have several instances running including
a development instance. The interface provides our team the ability to do
quick iterative development and testing. An example of one of our use cases:
we have 2 dataflows that ingest data from 2 different vehicle location/status
systems and pump it into SQL Server. At the same time, another dataflow
merges the data from SQL Server and sends it to Azure Event Hub. These
dataflows were easy to set up, test, and extend. This replaced a process that
was written in Go.

------
endlessmike89
Nifi is a good (not great) tool, mostly because of all of the functionality
you get out of the box. It comes with almost any kind of connector you would
need for moving data. There's a pretty steep learning curve, but once you push
through that, creating a new data flow from scratch is quick and easy. It
sucks that other people in this thread have had bad experiences with Nifi, and
I can't say that I haven't. However, it has generally been a positive addition
to my team's stack.

------
haddr
I've heard some second-hand opinions on running NiFi in prod, and all of them
were rather negative, some saying it was a mistake. That was around one year
ago. I wonder if things have changed since then.

------
sixhobbits
I have never heard of this before, and I'm sad that profit-driven, marketing
speak has taken over even non-profit product pages.

> An easy to use, powerful, and reliable system.

This is the title. That's the most important sentence, and it's absolutely
meaningless.

It's bad enough that everything has to "sell" \- just describe to me what
your product does and I'll decide if I need it or not. Don't try to convince
me.

If you have to sell, do it by differentiating yourself from your competitors.
No one is calling themselves "Difficult to use, weak, and unreliable", so
saying the opposite is not differentiation.

When did we accept marketing-speak as the default mode of communication?
Can't we have some landing pages that are essays? Or even a few paragraphs,
instead of trying-to-be-catchy bullet-point phrases in a large font?

~~~
taywrobel
> This is the title. That's the most important sentence, and it's absolutely
> meaningless.

Well, yeah, it's meaningless if you cut off the second half...

> ...to process and distribute data.

That's what it does. The adjectives before it aren't the meat of the sentence.

------
pazo
I have experience from multiple projects with NiFi, and it was the main
reason for me and others quitting the company. Somehow management was
convinced by some salesmen that this would be the silver bullet; however, all
of their deliveries were delayed. We experienced issues debugging flows with
performance problems, and even basic version control was problematic due to
IDs being replaced every time.

------
josephmosby
NiFi is a fantastic tool for a certain set of organizational constraints.

* It doesn't need much in the way of dependencies to run. If you can get Java onto a machine, you can probably get NiFi to run on that machine. That is HUGE if you are operating in an environment where getting any new dependencies installed is an operational nightmare.

* It doesn't require a lot of overhead. Specifically, no database.

* You can write components for it that don't require a whole lot of tweaking for small changes to the incoming data. So, if I have a machine processing a JSON file that looks like XXYX and another machine processing a nearly identical JSON file that looks like XYXX, the tweaks can be made pretty easily.
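That kind of small tweak can be sketched as a config-driven key mapping (all field names here are hypothetical): the same component handles two nearly identical JSON shapes by swapping in a different mapping rather than changing code.

```python
import json

def normalize(raw_json, key_map):
    """Rename incoming fields to a canonical schema using key_map."""
    record = json.loads(raw_json)
    return {canonical: record[incoming] for incoming, canonical in key_map.items()}

# Two machines emit nearly identical JSON with different field names.
site_a = '{"dev_id": "42", "temp_c": "19.5"}'
site_b = '{"deviceId": "42", "temperature": "19.5"}'

# The per-site "tweak" is just this mapping, not new code.
map_a = {"dev_id": "device", "temp_c": "reading"}
map_b = {"deviceId": "device", "temperature": "reading"}

print(normalize(site_a, map_a))  # {'device': '42', 'reading': '19.5'}
print(normalize(site_b, map_b))  # the same canonical shape
```

The design choice matches the comment above: when inputs differ only slightly, push the difference into configuration so each "mostly similar" instance stays trivially maintainable.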

So, if you're looking for a lightweight, low overhead, easily configurable
tool that may be running in an environment where you've got to run lots of
little instances that are mostly similar but not quite, NiFi is great.

If you are running a centralized data pipeline where you have a dedicated team
of data engineers to keep the data flowing, there are better options out
there.

------
tspann
No more XML. Check out NiFi 1.11.4; it does everything you need for easy
ingest. If you are reading some files and putting them into Kafka or S3 or a
database or MongoDB or HBase or Hive or Impala or Oracle or Kudu or ..., it's
genius.

[https://www.datainmotion.dev/](https://www.datainmotion.dev/)

~~~
advaita
As far as I can see, your link has nothing to do with the 1.11.4 release.
Maybe you wanted to link to a specific page?

~~~
tspann
[https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html](https://www.datainmotion.dev/2019/11/exploring-apache-nifi-110-parameters.html)

------
Sodman
Having used NiFi in production, my biggest issue with it is handling source
control and multiple environments. As the "IDE" is effectively also the
runtime, the lines between "local", "stage", and "prod" are easy to blur.

They have a built-in source control product called "NiFi Registry", which can
even be backed by git. The workflow for promoting flows between environments
feels clunky though, especially as so much environment-specific configuration
is required once your number of components gets high enough.

Moving our Java, Ruby or Go code between environments or handling versioning
and releases was a piece of cake, in comparison.

------
tomrod
Do I understand what this is: a general-purpose SSIS-type data integration,
pipeline, and workflow tool?

If so, how does it compare to SSIS, dbt, and other projects (please name!)?

Otherwise, what is an analogous toolset?

------
benjaminwootton
I have been working on a new product which competes with NiFi, providing
streaming data transformations.

Think: if order value > 100, and the customer has ordered 3 times in the last
hour, and the product will be in stock tomorrow.

Kafka streams, Flink and Dataflow are super powerful and I think there is room
for a GUI tool.

Would be great to hear experiences of NiFi in this domain or discuss the space
with any experienced users. Will add contact details in my profile.

~~~
joshz404
Have you heard of BPEL? :D

~~~
zokier
From wikipedia:

> the open BPMS movement led by JBoss and Intalio Inc., IBM and Microsoft
> decided to combine these languages into a new language, BPEL4WS. In April
> 2003, BEA Systems, IBM, Microsoft, SAP, and Siebel Systems submitted BPEL4WS
> 1.1 to OASIS for standardization via the Web Services BPEL Technical
> Committee

if that doesn't scare you, then oh boy...

Just look at the list of organizations mentioned there, I don't know if you
could make it more enterprisy.

------
kentosi
For those who were around during the mid-2000s, is this basically another
revival of SOA (Service Oriented Architecture)?

I watched one of the explanation videos and it brought back memories.

My dislike of the approach back then, which I hope they've addressed now, is
that while everything looked fine and dandy while designing things in a UI,
when something broke it was a whole heap of generated XML no one could read.

------
aasasd
So Apache has at least a handful of software packages that do about the same
thing, but with different interfaces and connectivity?

~~~
aquaticsunset
This has sorta been my experience with a lot of Apache projects recently. The
differences between them are becoming quite nuanced.

I'm trying to piece together the main reason why someone would pick this over
Camel, or vice-versa. I know they're different - but not night and day.

------
jszymborski
So, the comments here have mostly ranged from neutral to negative regarding
their experience.

I have a problem where I want to stream data to an ML layer and then stream
that to a web app (e.g. Laravel or Django).

Reading the docs here, this seems like this would solve this problem, but was
wondering if people had alternatives given that people seem to think poorly of
this application.

~~~
contravariant
Nifi is fantastically good at one thing, which is dataflow. Where you've got
data coming in at point A, but you need it at point B, and for some reason
can't convince either A or B to connect directly.

It's not a message bus, nor is it a data processing framework, nor a
scheduler, nor an ETL tool. If you try to use it for one of those you're in
for a bad time.

What you're describing sounds like you might need a message bus (think ZeroMQ,
Kafka, etc). Assuming you're writing the software yourself and want to connect
it together.

~~~
dajohnson89
could you elaborate on the difference between what op is asking for and what
nifi/airflow does? to me the use case of moving data through a couple of
different services could be solved by a message bus OR nifi.

~~~
contravariant
Nifi is designed to handle the problems that crop up when two systems can't
talk directly to each other. It puts a buffer between them to allow one
process to keep sending data when the other isn't quite fast enough, it can do
some basic transformation when the data isn't quite in the right format, etc.

However, if you just need something to send messages, then you're better off
using a tool that does just that. You don't need the overhead of a system
that can connect arbitrary applications talking incompatible protocols; you
just need a single protocol that allows your applications to send each other
data.

In their docs Nifi calls itself a dataflow tool and calls dataflow a necessary
evil. It's the band-aid you need when you've got a mismatch between the way
data is generated and the way it is consumed. It would be insane to
deliberately create such a mismatch just to use Nifi.

~~~
dajohnson89
ok that makes sense. but in that case wouldn't it be easier to write your own
adapters for each data source?

~~~
contravariant
Well no, Nifi has _a lot_ of adapters that you can use out of the box, making
them yourself isn't easier.

------
ibishvintilli
I like the way it buffers messages. You can basically stop a process and it
will continue where it left off. It is easy to create a distributed cluster.
It has hundreds of different connectors for external sources. On the other
hand, it is very bulky. Making it work with HTTPS was horrible. You cannot
just put it behind a reverse proxy.

------
dikei
We only use NiFi on the edge of our data lake. It's very good at bulk loading,
pulling log files/sensor data from hundreds of systems into our systems.

However, it does not handle small records well, and deploying custom
processors is a pain, so don't use it to replace your stream processing
framework.

------
gatorbait83
Our team found this adapter to integrate ML with NiFi pretty handy:
[https://dev.to/tspannhw/easy-deep-learning-in-apache-nifi-with-djl-2d79](https://dev.to/tspannhw/easy-deep-learning-in-apache-nifi-with-djl-2d79)

~~~
tspann
Thanks! I wrote a few more of those.
[https://github.com/tspannhw/ExecuteClouderaML](https://github.com/tspannhw/ExecuteClouderaML)

------
takeda
It reminds me of LONI Pipeline[1], which was created for the need of a
neuroscience lab to process images of brain scans.

[1] [http://pipeline.loni.usc.edu/](http://pipeline.loni.usc.edu/)

------
onetrickwolf
I am new to this site; why is there just a link to Apache NiFi on the front
page? Is this somehow news? Sorry, not trying to be rude, it just confuses me
a bit since NiFi has been around for some time.

~~~
andrewflnr
Someone found it interesting, I guess. :) Personally, I'm finding the
discussion very interesting, since I haven't seen it here before.

Sometimes Wikipedia articles hit the front page. It's fine. Usually.

------
throwawaysea
Is this an open source self hosted equivalent to
[https://aws.amazon.com/datapipeline/](https://aws.amazon.com/datapipeline/)?

~~~
sixdimensional
Yes, basically. AWS Data Pipeline includes spinning up and tearing down VMs
to run the pipelines.

NiFi has a much more extensible architecture, though; you can pretty much do
anything you want in a data pipeline in NiFi, and it's open source, of
course.

------
fmakunbound
Reminds me of the early 2000s when we were all into BPM, graphical or
otherwise. The drawbacks are pretty obvious. I bet the engineers who built it
had fun, tho.

~~~
blendo
As a Java programmer back then, watching with horror as consultants brought
in their flavor of the month (MQ/Spring/Camel), I could only pad my resume
and pretend, saying "never look a gift horse in the mouth".

------
yalogin
For someone not involved in the web stack, reading through that page tells me
nothing. Can someone tell me what use cases this is meant to solve?

~~~
taftster
I tried to give an answer here. Does this help?

[https://news.ycombinator.com/item?id=23147150](https://news.ycombinator.com/item?id=23147150)

------
hestefisk
Does NiFi still use XML output for its source code? It makes it very hard to
put under source control. Overall it’s a nice tool and very fast.

------
dmtroyer
Has anyone replaced Mirth with NiFi for HL7 slinging?

------
meh206
Our devs hated it due to ease of accidental flow fsck ups. And good luck
scaling it or trying to put it behind an LB!

------
iofiiiiiiiii
> An easy to use, powerful, and reliable system to process and distribute
> data.

But what is it?

------
century19
Brought to you by the NSA ;-)

~~~
taftster
I mean, yes. Paid for by US taxpayers. I for one am glad to see at least some
of my investment returned, regardless of your opinions about the NSA.

------
J0_k3r
inb4 this shit gets bloated to hell like apache httpd

------
mberning
We have a team using this at work. They had built a process and needed it to
be put on a VM and run periodically. They said the requirements were a dual
core machine and 8gb of ram. The “binary” was like 1.8gb. I’m sure it included
a jre and a full nifi runtime, but god damn that is ridiculous. Had this
process been built using go or crystal or something like that it probably
would have been less than a megabyte and able to run with 512mb of ram.

~~~
ismail
From the spec (8gb) it appears that NiFi was not needed. Were you co-locating
zookeeper on the same nodes?

Also you say it needed to be run periodically? It’s supposed to be a long
running service. If you need something that you can spin up and shutdown in a
container or VM then it is probably not the solution.

Use it if you have high volumes of data you need to transport from A -> B. We
run on a cluster with 256gb ram and 128 cores per node.

