
Algebraic patterns – Semigroup - lebek
http://philipnilsson.github.io/Badness10k/posts/2016-07-14-functional-patterns-semigroup.html
======
chubot
I think of these programming constructs as Monoids rather than semigroups. A
monoid is just a semigroup with an identity element:

[http://mathworld.wolfram.com/Monoid.html](http://mathworld.wolfram.com/Monoid.html)

Most of his examples have an identity, giving a monoid.

    
    
        * (string, concat): identity ''
        * numbers with addition/multiplication: identity 0/1
        * numbers with min/max: Numbers in computers are 
          finite, so this is quite useful.  stdint.h defines 
          INT32_MAX, INT32_MIN, etc.
        * booleans with && / || : true / false
        * function composition: identity function f(x) = x
        * frequency map: {} or {A: 0, B:0, ...} depending on 
          how you define it
    

Comparators: Is this a bug in the post? It doesn't seem to follow the pattern
because the arity of the functions is different. In a semigroup or monoid all
the elements are supposed to be drawn from the same set.

Other examples I can think of:

    
    
        * Python's dict with respect to member .update() 
         (if you think of it as an operator rather than
         mutating member).  Identity is {}
        * Unix Pipes: identity 'cat' (note that the shell
          supports point-free programming)

~~~
cottonseed
As you say, most semigroups are just monoids where you forget the identity.
Here's an MO thread on examples of monoids that aren't trivial restrictions:

[http://math.stackexchange.com/questions/460/are-there-any-
in...](http://math.stackexchange.com/questions/460/are-there-any-interesting-
semigroups-that-arent-monoids)

~~~
moomin
Even with these examples, the absence of an identity can be useful. e.g. Non-
empty lists form a semigroup.

~~~
JadeNB
> e.g. Non-empty lists form a semigroup.

If I take everyone's meanings correctly, what you describe (a monoid with the
identity _removed_ ) is different from what cottonseed
([https://news.ycombinator.com/item?id=12110936](https://news.ycombinator.com/item?id=12110936))
describes, which is a monoid with the identity _forgotten_. That is, it's
still there, but we don't require it to be there axiomatically.

Note that _every_ monoid is a semigroup (because it satisfies a stronger
collection of axioms), but not every monoid with identity removed is a
semigroup (because it's possible to combine two non-identity elements to get
the identity element): consider, for example, the integers without 0.

------
skybrian
I'm not sure how useful this is in practice. For example, addition in actual
computer languages isn't necessarily associative and compilers often can't
treat them as associative.

Floating point numbers added in the wrong order can result in catastrophic
cancellation. So for example, as far as JavaScript is concerned, numbers
aren't associative.

Integers may overflow. Twos-complement addition is associative, but only
because if you get the same bogus answer in the case of overflow. If you
actually detect overflow, you may get overflow or not depending on the order
in which you add the numbers.

Oftentimes we ignore these issues, but it seems like if you're talking about
taking advantage of associative rules to reorder calculations, you have to
think about it. Thinking in terms of semigroups instead of concrete datatypes
seems like a likely source of bugs when your model makes assumptions that
aren't actually true.

~~~
GFK_of_xmaspast
Is unsigned arithmetic non-associative?

~~~
groovy2shoes
Integer addition is associative, even for fixed-bitwidth integers (which
constitute "the integers modulo 2^ _n_ " for some bitwidth _n_ ; e.g. 8-bit
bytes under addition form the group of integers modulo 2^8 (i.e., integers
modulo 256)).

Note that both the full integers as well as the integers modulo _n_ are groups
under addition. As groups are more restrictive than semigroups, all groups are
also semigroups (groups are semigroups that also have an identity element and
where every element is invertible).

Integer multiplication also forms a semigroup (and in fact a monoid, due to
identity), but not a full group due to zero not having an inverse. So, it's
usually said that "the non-zero integers" form a group.

~~~
JadeNB
> Integer multiplication also forms a semigroup (and in fact a monoid, due to
> identity), but not a full group due to zero not having an inverse.

Wait, what? Very few integers have multiplicative inverses (in the integers).
If you meant to speak of modular inverses, then even that is false for
composite moduli like 2^n (with n > 1), which may be why im4w1
([https://news.ycombinator.com/item?id=12113934](https://news.ycombinator.com/item?id=12113934))
asked if you meant to work modulo a prime.

~~~
groovy2shoes
Ah, you're right. I don't know what I was thinking. Thanks for the correction!

------
vikiomega9
I really wish the applications section had more detail or perhaps an
"editorial". I don't really follow the consequence of a frequency map being
associative.

~~~
noelwelsh
Yeah, the application section is pretty poor. The number one large-scale
practical application of semigroups (and, more often, their extension the
monoid) is in parallel and concurrent systems. Associativity says that order
of operations doesn't matter, which in turn implies you can distribute
computations however you like across multiple machines and still get the
correct result. This is the big idea leveraged in systems like Summingbird and
other big data frameworks (e.g.
[https://arxiv.org/abs/1304.7544](https://arxiv.org/abs/1304.7544)) and in
CRDTs.

~~~
Ixiaus
This is a good reply and I'll add to it.

Note that CRDT stands for Conflict free Replicated Datatype and there are a
few sub-acronym expansions: Convergent Replicated Datatype and Commutative
Replicated Datatype.

[EDIT] I wanted to add that CRDTs were mentioned here because in order for
them to provide the kind of guarantees they do they _must_ satisfy the law of
associativity (or, be a semigroup), commutativity, and idempotency. So,
technically, you have to go further than semigroups with _commutative
semigroups_. [/EDIT]

At Plum we made _heavy_ use of CRDT's on-top of an eventually consistent
whole-state replication technology called Plumtree, no relation to the
company's name, just coincidence, within the internet-connected dimmer we
built called the Lightpad.

The primary design goal was to enable advanced configurations: Arbitrary
groups of Lightpads could be controlled from any other one on the same network
by binding it to a specific finger gesture. We also did not want this
configuration to be dependent upon any single master, they had to be truly
master-less.

In-general, Eventual Consistency is pretty scary when you're deploying it to
embedded internet-connected devices where you can't immediately shell onto the
device like you can with a server in a data center. I have lots of stories
here I'll write about sometime of the pains I encountered while developing
this solution and how I ended up on CRDTs.

Strong Eventual Consistency (typically, CRDTs deployed on-top of an eventually
consistent substrate), though, is very safe and provides a lot of guarantees
about the concurrent behavior of the algorithm as it acts upon your data.

Using a CRDT (specifically the ORSWOT - Observed Removal Set Without
Tombstones) eliminated the majority of our pains and concerns with data-loss,
conflicting concurrent writes, and integrity in the face of flaky consumer-
grade home networks (Plumtree is highly-available and partition tolerant -
perfect for our needs where we have to be able to continue moving forward even
if a _majority or more_ of the Lightpads go offline and never come back up).

~~~
ergl
It's always good to hear about IOT companies using CRDTs because it seems like
the perfect fit for it, yet you don't hear very much about it. Also
interesting hearing you mention you were using plumtree for your replication
backend.

To elaborate more on your point, CRDTs are super helpful while trying to
distribute a lot of devices, but they only get you so far. While building
Lasp[0][1], a distributed data-flow language using CRDTs, we found out a lot
of scalability problems with naive usage of CRDTs. We are aiming to reach
10k-20k nodes in the near future, so we are focusing a lot on reducing network
usage.

State-based CRDTs send their full state to all their peers, which works ok
when your states are small, but they introduce lots of overhead in any other
case. Operation-based CRDTs only send the actions performed on it (add 1, or
rmv 2, for example), but these are not idempotent and require a lot of
guarantees from the underlying distribution backend.

We are focusing on using Delta CRDTs, that combine the low network usage of
operation-based structures, with the idempotence of state-based approaches.

Using plumtree for your backend makes it resilient to network failures, but
using the default full membership protocol makes it almost unusable when
you're dealing with a big number of nodes. Using alternative protocols like
Hyparview greatly reduces the number of protocol messages in your network.

Finally, since Lasp is a data-flow language, we are applying control-flow
analysis to select and remove unused or intermediate values in the program,
thus also reducing network usage.

[0]: [http://lasp-lang.org/](http://lasp-lang.org/) [1]:
[https://github.com/lasp-lang/lasp](https://github.com/lasp-lang/lasp)

~~~
Ixiaus
Yeah I'm excited to see what you guys come up with on delta CRDTs.

Our node cluster sizes are fairly manageable so we can tolerate the
inefficiency right now but it will be nice to have a solution in-place when we
want to optimize.

I did not know about HyParView. Thank you!

------
kccqzy
Might be slightly off topic but in the recently released Glasgow Haskell
Compiler, semigroups are now built into the base library. No more dependencies
when using it.

~~~
Bromskloss
How does that work in Haskell? Is the user of a semigroup somehow guaranteed
that its operation is associative?

~~~
wereHamster
The person who implements Semigroup for a data type must ensure that the laws
hold.

~~~
haimez
Which is where property based testing really shines. Still requires the
implementer to be a good citizen and write those tests though.

~~~
kristianp
AKA QuickCheck.
[https://en.wikipedia.org/wiki/QuickCheck](https://en.wikipedia.org/wiki/QuickCheck)

------
dkarapetyan
Incidentally the typical mathematical curriculum does not even bother with
monoids and semigroups when it comes to the initial introduction. The
introductory material jumps right into the definition of a group and then
mentions on the side that removing a few key pieces leaves you with a monoid
and a semigroup. Same with fields, rings, rigs, etc. The most constrained
piece is introduced first and then dropping requirements leaves you with
something less constrained for which you can prove fewer theorems.

I learned groups and fields first and then learned about the more relaxed
versions. I'm not sure which approach is better though. Groups are certainly
more fun to play with but monoids probably show up in more contexts.

