
Connected Components – The one Graph Algorithm you need to know - mlwhiz
https://mlwhiz.com/blog/2018/12/07/connected_components/
======
KineticLensman
In case people are wondering, the algorithm is 'connected clustering'

> You can think of Connected Components in very layman's terms as sort of a
> hard clustering algorithm which finds clusters/islands in related/connected
> data. As a concrete example: Say you have data about roads joining any two
> cities in the world. And you need to find out all the continents in the
> world and which city they contain.

The article doesn't contain a concise pseudo-code description of the algorithm
but instead sneaks up on it by throwing in lots of python that implements
various graph operations.

~~~
dahart
> The article doesn't contain a concise pseudo-code description of the
> algorithm but instead sneaks up on it by throwing in lots of python that
> implements various graph operations.

True, that's really a lot of code while failing to outline the simple
algorithm or concisely describe the basic idea. It doesn't mention complexity
either, which is linear.

It might also help beginners to mention the image analogue, i.e. connected
components of a regular grid graph, which is called 'flood fill'.

> In case people are wondering, the algorithm is 'connected clustering'

FWIW, I've always heard the algorithm called 'connected components', and a
quick Google search on both terms gives me more relevant hits for 'connected
components'. In fact, I don't get a single directly relevant hit that
describes this algorithm on the first page for 'connected clustering'.

------
throwaway2016a
I find articles like this interesting. While I have hired plenty of software
engineers and data scientists without Computer Science degrees I do think
sometimes people without them are at a disadvantage.

Breadth First Search is a freshman year concept at many schools. Sophomore at
the latest.

On the other hand it gives me some respect for people who have to learn this
all themselves. Having 4 years to take off from work and just learn this stuff
is a huge luxury and not necessary in a lot of cases. But at least for me I am
very thankful for the structure of a formal degree. I don't think I could have
self-taught all of this.

~~~
ne9xt
I'm a mechatronics engineer working as an embedded software engineer. I've had
to rapidly learn and teach myself a whole lot to be able to do my job
effectively and correctly.

Given that, I studied many of the standard CS algos when I was brushing up for
a FAANG interview. I found this _gem_ of a guide:
[https://bradfieldcs.com/algos/](https://bradfieldcs.com/algos/) and after
working through the examples, I could understand the behavior and underlying
mechanisms of these algorithms. which completely helps when reading about
articles like this!

Anyway, I think one would only be at a disadvantage if they are not seeking to
continuously learn & satiate one's curiosity.

------
LawnboyMax
I guess it's also worth mentioning strongly connected components (SCCs) in
this context (i.e. there is a directed path from any node to any other node in
a component).

In the context of data analytics a SCC finding algo can be used to identify
bottlenecks of communication/transportation networks and fault tolerance of
distributed networks. In this case we are particularly interested in edges
that connect SCCs. Another application is clustering. For instance, finding
clusters of accounts that all follow each other on Twitter. In this case each
such cluster is a SCC.

Kosaraju–Sharir algorithm is an easy to understand SCC algorithm with linear
running time that basically consists of two BFS passes.

P.S. NetworkX ([https://networkx.github.io/](https://networkx.github.io/)) is
a great Python library for working with graphs. It has implementations of many
common graph algos including SCC finding algorithm (it is an implementation of
Tarjan's algorithm).

------
kingkongjaffa
I did this in a module called decision mathematics at A-level.

This is a nice implementation of classical graph problems.

Stuff like traveling salesman etc. Dijkstras algorithm.

The bit I am missing is how does this help me solve a day to day problem?

how can this be applied in something modern other than say a shortest path /
route finding algorithm?

I suppose it's a way to reach an end point most efficiently and that makes it
useful for nueral nets which are essentially tree's with weighted edge values.

the easiest path through the tree is something you could find and that might
illuminate a route to optimizing your "thing" whatever that is.

Am I getting the idea of this right or not? ....

~~~
daveFNbuck
It's actually not even a nice implementation of a classical graph algorithm.
They use a Python list as a queue. Python lists are only efficient when used
as stacks. calling list.pop(0) or list.remove(list[0]) takes time proportional
to the length of the list. This makes the algorithm take quadratic time, when
it should be linear.

For this to be a nice implementation, they should use the standard library
deque for the BFS or just pop from the end of the list to do a more efficient
DFS for the connected component algorithm.

------
blt
The title is absurdly reductive. Anyone interested in graphs should be
encouraged to learn about shortest paths, max flow, spanning trees, etc. and
to identify some of the major np-hard graph problems. "Addition - the one
Arithmetic Algorithm you need to know".

------
commandlinefan
If you find this interesting, you should check out Donald Knuth’s latest “The
Art of Computer Programming, 4A: Combinatorial Algorithms, Part 1”. It goes
into a lot of depth about the theory behind all of this. You don’t need to
have read the first three for 4A to make sense.

------
a-dub
Also known as union-find. Favorite implementation is Sedgewick's weighted
quick union with path compression from the first chapter of his algorithms
text. Also cool is an implementation in the graph chapter which actually makes
use of DFS instead.

