

Q about Lisp union - paul_reiners

As I thought I understood it, Lisp union implements set union.  So, I thought that<p><pre><code>  (union '(a a) '(a b))
</code></pre>
would return<p><pre><code>  (a b)
</code></pre>
And, in fact, it does.  This led me to think that<p><pre><code>  (union '(a a) '(b))
</code></pre>
would also return<p><pre><code>  (A B)
</code></pre>
However, it returns:<p><pre><code>  (A A B)
</code></pre>
Why does the fact that there is one less 'a in the second argument in the second call mean that there is one more 'a in the result?  This is definitely not set-theoretic union.  Why is the behavior of Lisp union defined this way?<p>Here is the sequence of calls again:<p><pre><code>  [1]&#62; (union '(a a) '(a b))
  (A B)
  [2]&#62; (union '(a a) '(b))
  (A A B)</code></pre>
======
paul_reiners
Actually, I take back my statement that this is not set-theoretic union.

If I'm willing to accept '(a a) as representing the set '(a) when I use it as
an input to union, I should be just as willing to accept '(a a b) as
representing the set '(a b) when I receive it back from union.

------
philh
Union seems to be implemented in clisp by copying the second set, and adding
to the copy any elements from the first set that aren't in the second. If
there's a duplicate in the first set, they both get added because it's the
original of the second set that gets tested against rather than the copy.

So (union '(a b) '(a a)) gives (b a a).

------
maximilian
My set theory knowledge just says that the union of two sets gives you all the
elements in them, with no repeats, so I would also follow your thinking. Even
when I work with mathematica, I often union two lists to remove duplicate
entries, because mathematica makes sure to remove duplicate entries.

The no-duplicates would seem to be a very important thing about union, and
when they are removed or not. Union is just the complement of intersect, which
would be the empty set for (a a) int. (b). So the union would be all the
elements: (a a b) and after removing all duplicates you'd get (a b), which is
exactly the behavior what you and I expect. Must be something with the
duplicates rule.

------
aggieben
makes sense to me, and it seems right by set-theory too. Take this all with a
grain of salt, though, because it's been a while since I was taught set-
theory, but here's my stab at it:

there is an intersection between '(a a) and '(a b); the element 'a. Thus, the
union would be the set of things inside the intersection --- 'a --- and the
parts outside it --- 'b.

With the second union, there is no intersection, so the union is merely the
set of things not in the intersection, which is the original sets, resulting
in '(a a b). There is no relation defined in the union operation between those
things not in the intersection, so they remain in the union set.

------
konsl
From [http://www.cs.queensu.ca/software_docs/gnudev/gcl-
ansi/gcl_9...](http://www.cs.queensu.ca/software_docs/gnudev/gcl-
ansi/gcl_932.html):

... If either list-1 or list-2 has duplicate entries within it, the redundant
entries might or might not appear in the result ...

~~~
paul_reiners
Yes, you're right. The fact that they leave the behavior of union in this case
undefined in the standard is surprising to me. The fact that SBCL and GNU
CLISP implement it the way they do is just bizarre to me.

~~~
konsl
It was probably left in for better performance. Glad I could help.

