
Programming Algorithms in Lisp: Data Structures - nafizh
https://lisp-univ-etc.blogspot.com/2019/08/programming-algorithms-data-structures.html
======
tom_mellior
Everything about this is bizarre. It spends a lot of time explaining what a
pair is, but then just assumes that computational complexity and O-notation
are known.

Union-find is a strange choice as a first data structure, but OK, maybe it
could be made to work. However, none of the variants presented is like union-
find in actual implementations. There is no path compression. The second
variant of uf-add looks like it tries to do something path-compression-like,
but it's in the union or (usually) the find operation where it needs to be.

And then, the uf-union operation: No practical union-find implementation
accesses the list of all entries in this operation. A linear traversal over
all entries defeats the purpose of having all operations be amortized almost-
constant time. (Also, uf-add seems to allow adding new entries to the data
structure but does _not_ add them to the points list. There are some bugs
waiting to happen.)

I won't try to judge the Lisp part of this book, but the algorithms and data
structures part is deeply flawed, and nobody should try to use this as a first
or primary learning resource.

~~~
vseloved
> but then just assumes that computational complexity and O-notation are
> known.

At the bottom, you can find a reference to the previously published chapters.
Complexity & big-O were addressed in the initial one. (This is part of a book;
all the org details are explained in part 1.)

> There is no path compression.

The second variant is path compression. Yes, it may also be implemented in
find, but it will make the code more complicated, in my view.

> And then, the uf-union operation: No practical union-find implementation
> accesses the list of all entries in this operation.

You are correct here. I'll make a change, thanks.

> Also, uf-add seems to allow adding new entries to the data structure but
> does not add them to the points list

This isn't needed here, as it is supposed that we already have the list of
points; it's just not arranged for an efficient disjointness test. I,
actually, had a reference to it initially, but removed it as I considered it
redundant. I'll think about adding it back.

~~~
tom_mellior
> At the bottom, you can find a reference to the previously published
> chapters. Complexity & big-O was addressed in the initial.

I stand corrected, thanks. I knew this was part of a book and did go to the
sidebar to check the titles of older chapters, but there the first part is
called '"Programming Algorithms" Book', not 'Introduction & Complexity' like
at the bottom. Going only by the title, I didn't think there would be
complexity there. My fault.

> The second variant is path compression.

Yes, but only on the "add" operation. It's more important to have it
elsewhere. Imagine a set of elements {a, b, c, d, e} where you do a sequence
of pairwise union operations on (a, b), (b, c), etc., and end up with the data
structure degenerated into a linked list a -> b -> c -> d -> e. If you now
repeatedly check whether a and b are in the same class, you end up searching
the entire list twice, every time you do this search.

You could try adding path compression in union, but I don't think a complete,
efficient solution is possible: You would always be able to construct small
trees that you would have to link to produce bigger, deeper trees.

That's why path compression always goes where it matters and can make a
useful difference: in find. In the example above, depending on the details,
you would have one or two full linear searches, and then every query on a and
b would be constant time.
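A minimal C++ sketch of this (illustrative code, not the book's) with the compression in find, where it matters: the first find on a deep chain re-points every visited node directly at the root, so later queries on the same elements are near-constant time.

```cpp
#include <vector>

// Minimal union-find sketch with path compression in find.
struct UnionFind {
    std::vector<int> parent;

    explicit UnionFind(int n) : parent(n) {
        for (int i = 0; i < n; ++i) parent[i] = i;  // each element starts as its own root
    }

    int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]);  // path compression happens here
        return parent[x];
    }

    void unite(int a, int b) {
        parent[find(a)] = find(b);  // naive link; union by size/rank would bound tree depth
    }

    bool same(int a, int b) { return find(a) == find(b); }
};
```

Running `unite` over the pairs (0,1)...(3,4) builds exactly the degenerate chain described above; the first `same(0, 4)` does one full traversal and then flattens the whole path.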

> Yes, it may also be implemented in find, but it will make the code more
> complicated, in my view.

That's one of the reasons why union-find might not be the best introductory
data structure. As for your view, to some extent, when you teach standard
stuff, your view is not as important as actually teaching what you claim to
teach. If you claim to teach union-find, you don't get to pick something
somewhat similar but with radically different complexity.

> This isn't needed here [...]. I ll think of adding it back.

No, don't! You're right that a full list of nodes is never needed by the data
structure itself. A _user_ of the data structure might need one, but also they
might not. I have in the past used union-find in settings where it was more
convenient _not_ to have a complete list of entries.

~~~
vseloved
I examined Union-Find in more detail and found the critical flaw in my
description that you were pointing to. Unfortunately, I had forgotten about it
and had a simplified view of this data structure that focused on improving
Find but neglected Union. The funny thing is that it was discussed, at times,
at group job interviews where I participated as one of the interviewers, and
we didn't go this deep to uncover that flaw :) (or, maybe, it was long enough
ago for me to forget).

So, I've corrected the Union-Find example to be in line with the proper
presentation
([https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf](https://www.cs.princeton.edu/~rs/AlgsDS07/01UnionFind.pdf)),
removing uf-add in the process. Thanks for the very valuable input.

Surely, that made the example not so bright (as we couldn't reduce everything
to a constant-time version with very simple code), but I don't think that the
example is inappropriate. It still remains quite simple from the code
standpoint. Another reason I picked it was that it didn't require an
explanation of any additional data structure, not even an array, and all the
information could be efficiently represented using structs. This still holds.
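For the record, the weighted quick-union (union by size) variant from those slides looks roughly like this (a C++ sketch for illustration; the names are mine, not the book's code). Always hanging the smaller tree under the larger one keeps every tree at most O(log n) deep, so the root search never degenerates into a linear scan:

```cpp
#include <utility>
#include <vector>

// Weighted quick-union: union by size, no path compression needed
// to get logarithmic tree depth.
struct WeightedUF {
    std::vector<int> parent, size;

    explicit WeightedUF(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; ++i) parent[i] = i;
    }

    int root(int x) {
        while (parent[x] != x) x = parent[x];  // walk up; depth is O(log n)
        return x;
    }

    void unite(int a, int b) {
        int ra = root(a), rb = root(b);
        if (ra == rb) return;
        if (size[ra] < size[rb]) std::swap(ra, rb);  // make ra the bigger root
        parent[rb] = ra;                             // smaller tree hangs under bigger
        size[ra] += size[rb];
    }

    bool connected(int a, int b) { return root(a) == root(b); }
};
```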

------
verandacoffee
In the first installment of this series, the author uses := for assignment,
instead of setf. This is at least relying on undefined behaviour, as keywords
are meant to be constant symbols, not themselves assignable. SBCL allows
assignment to the function slot of a keyword, but LispWorks doesn't. Using
undefined behaviour almost certainly guarantees incompatibilities between
implementations and should always be avoided. Here, it is a central feature.
This makes the book untrustworthy, whatever good features it might have.

I must point out that setf is extensible (defsetf and so on) and therefore
much better anyway.

(I must say though that one of the poor choices made in the C-family of
languages is the choice of = as assignment, when Algol already had used := for
this.)

~~~
vseloved
This was already discussed in great detail (greater than such a minor thing
deserves) in the comments to the first and next chapters. And, by the way,
LispWorks also allows it (see here:
[https://github.com/vseloved/rutils/pull/39](https://github.com/vseloved/rutils/pull/39)
- a person even bothered to provide a fix to make it work there). So, get
over it: this book is not about nitpicking but about trying to explain things
in an intelligible and accessible manner. A luxury I didn't always have with
these topics...

~~~
verandacoffee
I'm not sure pointing out undefined behaviour is nitpicking. It could in some
cases be crucial. At the very least, it ties your code to only some Common
Lisp implementations.

I know you can assign to keywords in LispWorks (using a restart), but that
should be done in your rutils library, wrapping the definition of :=, :*, :/.
And then you should do the same for all other implementations, which may
differ in what condition is signalled, and whether there really is a restart
for this, and of course whether it actually is allowed in that implementation.

~~~
vseloved
Sorry, getting back to the initial point: can you point to a statement in the
standard that forbids assigning to the symbol-function of keywords or deems it
unspecified behavior? I see only this:
[http://www.lispworks.com/documentation/HyperSpec/Body/t_kwd.htm](http://www.lispworks.com/documentation/HyperSpec/Body/t_kwd.htm)
which doesn't state anything like this. Besides these constraints, keyword
symbols are normal symbols, so they should abide by the same rules.

~~~
verandacoffee
Following your link: "3. It causes the symbol to become a constant variable."
and following the link to constant variable: "constant variable n. a variable,
the value of which can never change; that is, a keyword[1] or a named
constant. ``The symbols t, nil, :direction, and most-positive-fixnum are
constant variables.''"

This doesn't mention the function slot of the symbol, so one could perhaps
argue that therefore the function slot is not meant to be a constant variable,
but one could equally well argue the opposite. The various implementations
obviously differ on this. That alone means that this is either undefined
behaviour, or that some of them have had this as a very long-standing bug, at
least since their inception.

~~~
vseloved
> This doesn't mention the function slot of the symbol, so one could perhaps
> argue that therefore the function slot is not meant to be a constant
> variable

Exactly

> but one could equally well argue the opposite

No, as variables and functions are separate namespaces, and there's a more
general rule about functions that applies to keywords.

Your argument seems to be based on a notion that keywords are something unique
and special in CL. They are, but only to the extent that is specified in this
section. Otherwise, they are the same as other symbols.

To me, Lisp is not about an "everything not explicitly allowed is forbidden"
mentality, but about "everything not explicitly forbidden is allowed". And
it's, actually, an important trait of the language that makes me value it
more than others. So, sorry, I value your opinion, but I think that such
things as := need to exist, if only to broaden the horizons of people with
such opinions :)

~~~
verandacoffee
> ...I value your opinion...

And I value your work on this book.

I apologize for having said at the beginning that "This makes the book
untrustworthy, whatever good features it might have." It was unwarranted,
unsubstantiated, and much too harsh.

I do however stand by my claim that it is undefined behaviour. (You could help
by making the suggested changes to rutils.)

------
OskarS
> _To implement an arbitrary data structure, these languages provide a special
> mechanism called records, structs, objects, etc. The proper name for it
> would be "tuple". It is the data structure that consists of a number of
> fields each one holding either a primitive value, another tuple or a pointer
> to another tuple of any type. This way a tuple can represent any structure,
> including nested and recursive ones. In the context of type theory, such
> structures are called sum types_

Wait, what? Tuples aren't sum types, they're product types. Like, they're the
"cartesian product" of some other types. A sum type would be like a tagged
union (or Rust enum) where something can be either one type or the other. I
haven't studied a lot of algebraic type theory, but Wikipedia agrees with me
on that point [0]

Seems like a fairly elementary mistake in an article of this kind...

EDIT: also, this, a few paragraphs down:

> _Moreover, in C++ the pass-by-reference has two flavors: pass the pointer (a
> common approach) and pass by what's actually called a reference, which is a
> syntax sugar over pointers that allows accessing the argument with
> non-pointer syntax (dot instead of arrow)_

Pass-by-reference in C++ is most certainly not just syntactic sugar for
passing in a pointer. A compiler may implement it that way (or it may not),
but it's totally wrong to say that passing by reference is just syntactic
sugar so you don't have to use arrow notation. The most obvious differences
are that a reference can never be null, it can't be changed to point to some
other object, you can't do pointer arithmetic on it, and you always
dereference it when you use it. Also: in idiomatic C++, "passing by
reference" is far more common than "passing by pointer".
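To make the contrast concrete, a small C++ sketch (the function names here are made up for illustration):

```cpp
// Both versions avoid copying, but a pointer parameter may be null and may
// be reseated, while a reference is the caller's object under another name.
void by_pointer(int* p) {
    if (p != nullptr)  // callers may legally pass null, so the callee must guard
        *p += 1;
}

void by_reference(int& r) {
    r += 1;  // no null check is possible or needed; r cannot be reseated either
}
```

At the call site, `by_pointer(&x)` must take an address explicitly, while `by_reference(x)` looks like pass-by-value; and `by_pointer(nullptr)` compiles and runs, a call that has no reference analogue.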

I'm going to stop reading this article now. I don't have a lot of faith that
this person really knows what they're talking about.

[0]:
[https://en.wikipedia.org/wiki/Algebraic_data_type](https://en.wikipedia.org/wiki/Algebraic_data_type)

~~~
vseloved
> Wait, what? Tuples aren't sum types, they're product types.

Thanks for the correction. As I wrote in the introduction, there will be some
errors in these beta-version chapters that are published on the blog (as I,
obviously, don't have correctors and reviewers), so I'm thankful to all who
notice and point them out.

> it's totally wrong to say that passing by reference is just syntactic sugar

OK, I'd say that it's still syntactic sugar with some benefits :) But I don't
agree that it's totally wrong. Some of the things you mentioned are just
side-effects (that you can't do pointer arithmetic or pass a null pointer.
Actually, it seems that you can with some jiggling:
[https://stackoverflow.com/questions/8571078/pass-by-pointer-pass-by-reference/8571101#8571101](https://stackoverflow.com/questions/8571078/pass-by-pointer-pass-by-reference/8571101#8571101)).
The point I wanted to make is that pass-by-reference is mostly the same
thing. It's, actually, quite a confusing topic (as are many C++ solutions)
that was not properly presented to me when I studied it in school, and this
book doesn't try to present it either, but I had to, at least, mention it
here as I was talking about different passing styles.

~~~
OskarS
It's not mostly the same thing; the difference is pretty significant both in
practice and in a "type-theoretic" sense. I get what you're saying, that
"pass by reference" and "pass by pointer" are both methods to pass an object
into a function without copying, but they are very different. In an article
like this, where the whole point is to talk about these things at a more
fundamental level, it is bad to claim that they are the same.

~~~
vseloved
The point of the article is not to talk about the low-level specifics of some
language, apart from, maybe, Lisp sometimes. But, surely, not C++ (although,
if you could provide a link with a good and accessible explanation of the
differences between the 2 styles, I'd add it as a reference for further
reading). Still, I don't see that fundamental difference that you mention. If
the differences originate from the points that you listed in the original
comment, can you, please, make the link more explicit? I understand
pass-by-reference as being a slightly higher-level and safer C++ alternative
to C's direct pointer-based style. But, surely, I may be missing something.
But it seems that the people who answered the SO question I linked are also
missing it. For example, here's a quote: "Basically, references and pointers
are not very different from an implementation point of view, the main (and
very important) difference is in the philosophy: a reference is the object
itself, just with a different name."

~~~
pjmlp
Here is pass by reference and pass by pointer in Pascal.

    
    
    type
        Point = record
            x, y : integer
        end;
        PPoint = ^Point;  { a named pointer type, needed for the parameter below }

    procedure ByPointer(p : PPoint);
    begin
        p^.x := 12
    end;

    procedure ByReference(var p : Point);
    begin
        p.x := 12;
        p.y := 33
    end;

    procedure Example;
    var
        p : Point;
    begin
        p.x := 23;
        p.y := 28;
        WriteLn('The point is ', p.x, ' ', p.y);

        ByPointer(@p);
        WriteLn('The point is ', p.x, ' ', p.y);

        p.x := 23;
        p.y := 28;
        WriteLn('The point is ', p.x, ' ', p.y);

        ByReference(p);
        WriteLn('The point is ', p.x, ' ', p.y)
    end;
    
    

Actually the concept is already exposed in Algol, a couple of years before C
was even an idea.

~~~
vseloved
Good point. So, here, it is exactly syntax sugar, as, unlike in C++, nothing
except the syntax is different from the pointer-based version. Not to mention
that, in the article, I was describing the general concept, which has one
(and only one) variant in C and Java, and 2 in Pascal or C++.

------
seanhunter
The word "continuous" here is used (incorrectly) to mean "contiguous".

"Continuous" means ongoing in time (as in "continuing"). "Contiguous" means
"touching on a boundary". So a "contiguous data structure" is one that
"occupies a single chunk of memory and its contents are stored in adjacent
memory blocks."

Continuity would only really mean anything for time-series data.

~~~
lgessler
At least in general usage, outside of any domain-specific conventions,
"continuous" doesn't have to have the temporal sense. The first entry in the
OED for "continuous":

> a. Characterized by continuity; extending in space without interruption of
> substance; having no interstices or breaks; having its parts in immediate
> connection; connected, unbroken.

> 1673 N. Grew Idea Phytol. Hist. ii. iii. 65 It is compounded of two Bodies.
> The one parenchymous; continuous throughout; yet somewhat pliable without a
> solution of its continuity.

> 1704 I. Newton Opticks ii. ii. 41 The dark intervals must be diminished,
> until the neighbouring Rings become continuous, and are blended.

> 1796 R. Southey Joan of Arc vii. 6 Round the city stretch'd Their line
> continuous, massy as the wall Erst by the fearful Roman..raised.

> 1859 C. Darwin Origin of Species xi. 352 In most cases the area inhabited by
> a species is continuous.

> 1879 J. N. Lockyer Elem. Lessons Astron. vi. 228 If we light a match and
> observe its spectrum, we find that it is continuous—that is, from red
> through the whole gamut of colour to the visible limit of the violet.

> 1881 J. C. Maxwell Treat. Electr. & Magnetism (ed. 2) I. 6 Without
> describing a continuous line in space.

~~~
seanhunter
OK, well, "contiguous" is more specifically applicable here at least. When I
search "continuous data structure" on DDG, it auto-includes results for
"contiguous data structure".

------
zshrdlu
I'm enjoying this series. A while back I was playing around with binary
arithmetic coding and statistical modelling. For the modelling I was using a
simple Markov model, and I hit performance issues. I explored all kinds of
trees but eventually settled on hash tables. Eventually I had to use my own
specialized bit vectors and hash tables, but it was my first brush with a
problem-space where data structures and algorithms are critical.

The algorithm worked amazingly well for small strings, outperforming the PAQ
versions I tried, but it lost compression power as the input got longer. I
didn't have the mathematical expertise to figure out why that was the case.

Still didn't get it to the kind of performance I liked, but maybe someone with
greater insight could take a look at it:
[https://github.com/shrdlu68/ac-experiment](https://github.com/shrdlu68/ac-experiment)

------
taeric
Common Lisp's multiple values took some getting used to. In particular,
having gotten used to tuples and lists, it is somewhat mind-bending to
realize you can return multiple things without either.

~~~
dreamcompiler
The key feature of multiple values is that each value gets passed back to the
caller on the stack, not the heap. So they don't cons garbage the way
returning a list or struct would.

This is a natural thing to do in assembly language. Common Lisp just makes it
available in a high level language.
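For comparison, the nearest C++ analogue (just a sketch, not Lisp): returning a small std::tuple likewise allocates nothing on the heap.

```cpp
#include <tuple>

// Cross-language analogy: a function returning a small std::tuple hands its
// values back on the stack (or in registers), much as Common Lisp's
// (values ...) avoids consing a list.
std::tuple<int, int> divmod(int a, int b) {
    return {a / b, a % b};  // quotient and remainder as two return values
}
```

At the call site, structured bindings play the role of multiple-value-bind: `auto [q, r] = divmod(17, 5);` unpacks both values without any intermediate container.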

~~~
heisig
Even better, the first few multiple values are typically returned in CPU
registers and not on the stack.

------
meenzu
This is great, thanks for creating this! I love the first few chapters and
I'm really liking the code explanations in this one.

