

How to go about designing Data structures/models for a large program? - varrunr

I am writing a chess program, but my experience with writing large programs from scratch is limited, and I would like to get better at it. Can anyone suggest good practices, or the best way to start designing the classes and data structures?
======
EdwardCoffin
I suggest starting with a simple, straightforward representation for the data
model, and then programming the functionality you want. As you implement this
functionality, you will encounter pain-points which make you wish you'd used a
different representation. After a number of these, you may have some ideas for
a different, more appropriate representation you wish you'd used. If the two
representations are close enough, you may be able to just refactor what you
have into this new form, then continue. If they are different enough, declare
what you have done so far a learning experience, abandon it, and start over,
using your new idea(s) for a new representation.

I can think of two different major projects I've done in the past five years
in which I started again from scratch, after major revelations made it clear
to me that it was the right thing to do. In one I settled on the third object
model, and in the other I know I went through at least two. These were quick iterations
though: I settled on the third and final object model within a month of
starting, and the project completed in seven months, so it's not like I was
throwing out that much work.

Avoid premature optimization with all your might.

Fowler's book Refactoring might be helpful with ideas of how to morph data
models.

A couple of quotations from More Programming Pearls by Jon Bentley:

"Plan to throw one away, you will anyhow." [Fred Brooks's law of prototypes]
"It is faster to make a four-inch mirror then a six-inch mirror than to make a
six-inch mirror." [Thompson's rule for first-time telescope makers]

~~~
varrunr
Thanks a lot! I am also looking for a resource that would help me design
classes well in C++.

~~~
EdwardCoffin
I'm afraid I don't know C++, so I can't offer any suggestions there. I can
recommend that you take advantage of the type system, since you have one, to
enforce constraints in your data structures rather than enforcing them in
code. By this I mean (for instance) that if one object can have a relationship
with any of a subset of the rest, strive to make it impossible for that object
to relate to one of the other objects. In Java one would perhaps do this by
making those objects implement a special interface, and having the one object
relate to things with that interface. I know C++ has an equivalent (pure
virtual classes?), but I don't know what it is.
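
For what it's worth, the usual C++ counterpart of a Java interface is an abstract class whose member functions are pure virtual. Here is a minimal sketch of the idea, assuming hypothetical names (Player, HumanPlayer, EnginePlayer, Clock, Game) that are illustration only and not from any real chess codebase. A Game can only relate to things that implement Player, so handing it a Clock is a compile error rather than a runtime check:

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical interface: an abstract class with a pure virtual function.
class Player {
public:
    virtual ~Player() = default;
    virtual std::string name() const = 0;  // "= 0" makes this pure virtual
};

class HumanPlayer : public Player {
public:
    explicit HumanPlayer(std::string name) : name_(std::move(name)) {}
    std::string name() const override { return name_; }
private:
    std::string name_;
};

class EnginePlayer : public Player {
public:
    std::string name() const override { return "engine"; }
};

// Part of the program, but deliberately NOT a Player.
class Clock {};

// A Game can only be built from Players. Passing a std::shared_ptr<Clock>
// fails to compile. (Null pointers are still possible; the type system
// does not rule those out in this sketch.)
class Game {
public:
    Game(std::shared_ptr<const Player> white, std::shared_ptr<const Player> black)
        : white_(std::move(white)), black_(std::move(black)) {}

    std::string describe() const {
        return white_->name() + " vs " + black_->name();
    }
private:
    std::shared_ptr<const Player> white_;
    std::shared_ptr<const Player> black_;
};
```

Something like Game(std::make_shared<HumanPlayer>("Alice"), std::make_shared<EnginePlayer>()) compiles, while substituting std::make_shared<Clock>() for either argument does not.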

One caveat is that it is quite possible to take this advice a little too far,
and wind up with an overly complicated object model. It's a fine line, and
quite hard to see where it is. Early on, I used to build a lot of
simple object models whose constraints were enforced in code, which seemed
easiest at the time. Much later in the project, though, it made the program
much more complicated overall, because as I built more and more code to manipulate the model, I
found I was repeatedly having to enforce constraints that I might not even
remember properly (or worse, not enforcing them at all, and encountering
runtime errors due to situations that shouldn't have arisen in the first
place). Later, I tried for more complicated models, but occasionally got
paralyzed in attempting to capture every single constraint in a hugely
complicated model. There's no principle for seeing the line that I can see,
just painful experience.

If you have a C++ guru or mentor, try to get them to recommend a good C++
codebase with a good data model that you can use as a reference. If you can
get one, you might study it carefully and contemplate the way they designed
the model, considering what the alternative representations were, and what
consequences would have arisen from using one of these alternatives. If you
can discuss them with a guru, so much the better. I'd say this is how I
learned the bulk of what I know (or think I know) about the design of object
models.

One of the most influential experiences in my career was integrating a
third-party framework that was under development into my employer's system, also
under development. This meant that I didn't do much coding of my own; instead
I was regularly taking new versions of the third party system, and figuring
out how to weld it into our constantly changing codebase without changing
either too much. To do this I had to spend a lot of time studying both systems
and talking with developers and designers on both sides. I learned a lot from
that. Fortunately, the designers of both systems (of the parts relevant to me,
anyway) were really talented, and strove to do things the right way, even if
it meant throwing things away or slipping the schedule a little. To this day I
will often solve design problems by thinking back and theorizing how the
designers of those systems would have done it.

I should also mention that the final redesigns I mentioned in my previous
comment were almost always the result of my taking my hard-won lessons of the
first month or so of coding, and sitting down for a week with a pencil and
paper and just drawing object relationship diagrams instead of coding. I'd
draw the diagrams, then mentally go over the various tasks the program would
have to perform, imagining how it would manipulate the object model I had
drawn before me. This exposed many flaws very early in the process, preventing
me from doing much unnecessary coding. You can use a semi-formal methodology
here (like UML, but without all of the formalism - you just need to have a
clear way of representing object hierarchies). Simple boxes and lines are fine
too.

------
EdwardCoffin
Using version control and frequently checking in your changes (I'm thinking
several times a day, not a big checkin once a week) will allow you to pursue
avenues of exploration, possibly realize they are dead-ends, and roll back to
an earlier state with no difficulty. This can be really liberating. Don't
worry if they are non-working checkins (do those on a branch if you want),
just make sure you have preserved significant stages in case your subsequent
work doesn't pan out.

Implement some form of "print myself" function (operator<<, I think; it's
toString() in Java) on each object in your data model. Being able to see
something within a debugger, beyond the usual "object at address 0xDEADBEEF"
type of information, can be very helpful.
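
A minimal sketch of what that could look like in C++, assuming hypothetical Square and Move types (files and ranks stored as 0..7) that are illustration only. Overloading operator<< for std::ostream is the standard way to make a type printable; the to_string helper is an extra convenience for logging, and whether a debugger will call either of these for you depends on the debugger:

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical board coordinate: file 0..7 maps to 'a'..'h', rank 0..7 to 1..8.
struct Square {
    int file;
    int rank;
};

// Prints a square in the usual algebraic form, e.g. "e2".
std::ostream& operator<<(std::ostream& os, const Square& sq) {
    return os << static_cast<char>('a' + sq.file) << (sq.rank + 1);
}

// Hypothetical move: just a source and a destination square.
struct Move {
    Square from;
    Square to;
};

// Reuses the Square printer, so a move prints as e.g. "e2e4".
std::ostream& operator<<(std::ostream& os, const Move& m) {
    return os << m.from << m.to;
}

// Convenience wrapper for log messages and test assertions.
std::string to_string(const Move& m) {
    std::ostringstream out;
    out << m;
    return out.str();
}
```

With this in place, std::cout << move works anywhere in the program, and composite objects (a list of moves, a whole position) can build their own printers out of the smaller ones.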

Since you are working on a program in a well understood domain (chess), there
is a lot of available data (PGN databases) and a number of other
implementations. I think it would be worth spending some time building some
scaffolding that will let you take advantage of these for testing. For
example, if you are building a move generator, you can build some harnesses
that will take libraries of PGN games, and use your move generator to
successively generate the candidate moves for each move of each game, and make
sure that the move that actually followed is one that your move generator is
capable of generating. You can also build a harness to compare your move
generator's candidate moves to another implementation's, and make sure that
they agree. This kind of thing can take a long time to run, but if you set up
something like continuous integration to run it continuously and
asynchronously, or simply run it overnight, you can catch some problems early
enough that you can handle them more gracefully than later, especially if the
cause is ultimately the object model - you want to fix that before you build
lots of code that depends on it and is resistant to changes.
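
Both harnesses can be sketched in a few lines of C++. This is only a rough outline under some loud assumptions: positions and moves are passed around as plain strings (for example a FEN string and coordinate notation like "e2e4"), a move generator is any callable taking a position and returning its candidate moves, and reading the PGN files into PlayedMove records happens elsewhere. All of the names here (MoveList, MoveGenerator, PlayedMove, find_disagreements, find_missing_moves) are hypothetical:

```cpp
#include <algorithm>
#include <functional>
#include <string>
#include <vector>

using MoveList = std::vector<std::string>;
// Hypothetical generator signature: position in, candidate moves out.
using MoveGenerator = std::function<MoveList(const std::string&)>;

// Harness 1: compare two generators and return every position
// on which their candidate move sets differ (order is ignored).
std::vector<std::string> find_disagreements(
        const std::vector<std::string>& positions,
        const MoveGenerator& mine,
        const MoveGenerator& reference) {
    std::vector<std::string> failures;
    for (const auto& pos : positions) {
        MoveList a = mine(pos);
        MoveList b = reference(pos);
        std::sort(a.begin(), a.end());
        std::sort(b.begin(), b.end());
        if (a != b) failures.push_back(pos);
    }
    return failures;
}

// One step of a recorded game: the position before the move, and the
// move that was actually played from it.
struct PlayedMove {
    std::string position;
    std::string move;
};

// Harness 2: replay recorded games and return every step whose played
// move the generator failed to produce as a candidate.
std::vector<PlayedMove> find_missing_moves(
        const std::vector<PlayedMove>& game,
        const MoveGenerator& mine) {
    std::vector<PlayedMove> missing;
    for (const auto& step : game) {
        const MoveList candidates = mine(step.position);
        if (std::find(candidates.begin(), candidates.end(), step.move) ==
                candidates.end()) {
            missing.push_back(step);
        }
    }
    return missing;
}
```

An empty result from either function means the generator passed on that input. Returning the failing positions, rather than a bare pass/fail flag, gives you something concrete to feed back into a debugger the next morning.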

I'm not sure whether this is as easy in a non-GC language, but I've found
that using immutable data objects makes many things easier to understand. I'd
seriously consider making a position (a particular state of the chess board)
immutable, so that applying a move to a position produces a new position
instead of modifying the existing one. The functional language literature has a
fair bit to say about the benefits of this approach with respect to
understanding what a program is doing at any given time.
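
In C++ this can be approximated with value semantics, without needing a garbage collector. Below is a minimal sketch, assuming a deliberately simplified, hypothetical Position class: 64 squares indexed 0..63, one char per square, '.' for empty, and no castling, en passant, or side-to-move state. Every public member function is const, and "changing" the board means copying it and returning the copy:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical, simplified board: one char per square, '.' means empty.
class Position {
public:
    Position() { squares_.fill('.'); }

    char at(int square) const {
        assert(square >= 0 && square < 64);
        return squares_[static_cast<std::size_t>(square)];
    }

    // Returns a copy with `piece` placed on `square`; *this is unchanged.
    Position with_piece(int square, char piece) const {
        assert(square >= 0 && square < 64);
        Position next = *this;
        next.squares_[static_cast<std::size_t>(square)] = piece;
        return next;
    }

    // Returns a copy with the piece on `from` moved to `to`; *this is unchanged.
    // No legality checking is done in this sketch.
    Position with_move(int from, int to) const {
        return with_piece(to, at(from)).with_piece(from, '.');
    }

private:
    std::array<char, 64> squares_;
};
```

Declaring positions as const Position (as in const Position after = start.with_move(12, 28);) keeps the compiler from letting anyone assign over them. Copying 64 bytes per move is cheap enough for getting started; whether it is fast enough for a deep search is something to measure later rather than assume now.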

