
Code icebergs - bjplink
http://www.gabrielweinberg.com/blog/2010/11/code-icebergs.html
======
sdrinf
Code iceberg is in the eye of the beholder. Recently started bizdev-people
consistently underestimate the time requirements for certain well-exercised
tasks.

Some of the most common icebergs are:

-form validation (seriously -one of the most highly exercised user-interaction paths; it's all over the place, and scales semi-exponentially with the number of fields)

-search ("how hard could it be? you just put an input form there, then figure out what the user thought, then display it" -exact quote)

-anything that has to process natural language. I mean _everything_. Wanna split up a text into sentences? How do you differentiate between dr. mr., 2004. jun. , and valid sentence-enders? Generating an indefinite article ("a", "an") before a noun? Keep in mind that 1,2,@,$,=, and other characters might also be valid noun first-letters :) etc.
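
To make the sentence-splitting point concrete, here is a minimal Python sketch. The abbreviation list and the `split_sentences` name are made up for illustration and are nowhere near complete; this is not how any particular NLP library does it.

```python
import re

# Hypothetical, deliberately incomplete abbreviation list (an assumption for
# this sketch); real text needs far more entries and still gets it wrong.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "jun.", "etc."}


def split_sentences(text):
    """Split after tokens ending in . ! or ?, unless the token is a known abbreviation."""
    sentences = []
    current = []
    for token in text.split():
        current.append(token)
        ends_like_sentence = re.search(r"[.!?]$", token) is not None
        if ends_like_sentence and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:
        sentences.append(" ".join(current))
    return sentences


# The "how hard could it be" version: split on punctuation followed by whitespace.
naive = re.split(r"(?<=[.!?])\s+", "Dr. Smith arrived. He was late.")

# The abbreviation-aware version handles this input...
better = split_sentences("Dr. Smith arrived. He was late.")

# ...but still breaks on a date written as "2004. jun. 3".
still_wrong = split_sentences("Born 2004. jun. 3 in Pecs.")
```

The naive split cuts right after "Dr.", the abbreviation list fixes that one case, and the date example still gets chopped after "2004." because nothing in the token itself says whether the period ends a sentence.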

In my experience, the best anti-iceberg pattern is to follow a portfolio
approach: for each requirement that smells like an iceberg, have a fallback
plan in place, i.e. after N hours of sunk investment, execution shifts to
plan B. This usually works out much better than banging away on the same problem
for days.

~~~
praptak
_"How do you differentiate between dr. mr., 2004. jun. , and valid sentence-
enders?"_

Fun fact: there exists a convention stipulating a double space after a period
that ends a sentence. Not that I'm advocating relying on this for any serious
purposes.

~~~
jerf
HTML ended that. I have followed all periods with two spaces in this post. How
can you tell on the screen?

In fact I just checked, and HN is honoring the two spaces, in that they are
output to the actual HTML sent to your browser. And of course, yes, there were
other trends that would have ended this anyhow, "two spaces" is meaningless in
a non-monospaced font and ever more stuff is going proportional as the
computing power necessary to do that continues its steady march from
"prohibitive" to "trivial", but the WWW certainly beat the corpse to death
again.

~~~
nitrogen
The monospaced or proportional fonts aren't the issue; it's that HTML treats
all sequences of whitespace within text as a single space. It's quite annoying
to those of us who like our double-spaced sentences. When a word processor
generates HTML, every double-spaced sentence ends with this:

    [space]&nbsp;

Or even this:

    [space]<span class="something-about-space">[space]</span>
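
A rough Python sketch of the collapsing behaviour, and of why the non-breaking space survives it. This only approximates default (`white-space: normal`) rendering and ignores `pre` and friends; both function names are invented for the example.

```python
import re


def collapse_whitespace(text):
    """Approximate browser rendering of normal-flow text: each run of
    spaces, tabs and line breaks becomes a single space."""
    return re.sub(r"[ \t\n\r\f]+", " ", text)


def preserve_double_space(text):
    """Turn 'period, space, space' into 'period, space, U+00A0'.
    U+00A0 (what &nbsp; decodes to) is not collapsed."""
    return re.sub(r"\.  ", ". \u00a0", text)


typed = "First sentence.  Second sentence."

# What the reader sees if the two spaces are sent as-is.
rendered = collapse_whitespace(typed)

# What the reader sees after the word-processor style substitution.
kept = collapse_whitespace(preserve_double_space(typed))
```

Here `rendered` comes out with a single space after the period, while `kept` keeps a visible second gap because the non-breaking space is not part of the collapsed set.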

~~~
jerf
HTML and proportional fonts are two separate issues. HTML ignores them, which
is one problem; two spaces in a proportional font being less immediately
obvious than on a typewriter is another problem. You can come up with some
other issues too if you think about it. It all adds up to a dead tradition.

------
Hexstream
Try to beat this one:

I spent _4 months_ , _full-time_ (I think I was a bit depressive and
unproductive, though (that might have to do with the difficulty of the problem
a bit)) to make a goddamn "error message merging" system where you specify
some merge rules for error messages, and then said messages are "merged"
efficiently at runtime (with another of my adaptations of the awesome Rete
algorithm).

For instance, as a trivial example merging "Please enter your Username." and
"Please enter your Password." could yield "Please enter your Username and
Password."
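
For illustration only, here is the trivial case as a Python sketch. This is not the rule-based system described above and has nothing to do with Rete: it is a single hard-coded rule, and the `PATTERN` and `merge_messages` names are hypothetical.

```python
import re

# Hypothetical message shape (an assumption): only messages that match
# exactly this template are merged; everything else is passed through.
PATTERN = re.compile(r"^Please enter your (.+)\.$")


def merge_messages(messages):
    """Merge 'Please enter your X.' messages into one message."""
    fields = []
    others = []
    for message in messages:
        match = PATTERN.match(message)
        if match:
            fields.append(match.group(1))
        else:
            others.append(message)
    if not fields:
        return others
    if len(fields) == 1:
        joined = fields[0]
    else:
        # "A and B" for two fields, "A, B and C" for three or more.
        joined = ", ".join(fields[:-1]) + " and " + fields[-1]
    return ["Please enter your " + joined + "."] + others


merged = merge_messages([
    "Please enter your Username.",
    "Please enter your Password.",
])
```

That is the tip of the iceberg. User-specified rules, messages with different templates, ordering, localization and doing it efficiently at runtime are the part below the waterline.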

Merging error messages efficiently with a great concise syntax looked SO EASY
=/ I was wondering why the hell no websites (that I know of) do that because
it's a pretty obvious feature to me... Well, now I know. People don't really
mind that much about these things (maybe they just have low expectations) AND
it's really hard to implement.

I finally made it, right now the implementation is utter crap, with some
missing features and some bugs but the general architecture is there and
works. I'll clean it up and document it within a few months, probably.

It's one of the hardest things I ever made so far in programming.

~~~
GFischer
Heh, I was asked to do the same thing for the current system/website we're
building here at the corporation I work for.

Fortunately we were able to convince them it wasn't time well spent, but it
would be neat.

------
edanm
Steve Yegge once wrote a post on this sort of thing. It's called "Have you
ever legalized marijuana?", but it doesn't focus on marijuana all that much.

Well worth the read: [http://steve-yegge.blogspot.com/2009/04/have-you-ever-
legali...](http://steve-yegge.blogspot.com/2009/04/have-you-ever-legalized-
marijuana.html).

~~~
Nitramp
Except that his conclusion is way off for the Marijuana part. There are lots
of countries where Marijuana is legal and/or tolerated, so apparently the
legal systems are a lot more flexible than the US, or Yegge's estimates for
refactoring costs in the legal system are way off.

------
pmjordan
I've probably fallen for these more often than I'd like to admit. Yet
sometimes a naïve "how hard could it be" is exactly the right approach that
leads to unexpectedly simple solutions.

------
marcusbooster
Every now and again I'll remember that guy who said he was going to build a
Stack Overflow clone in a weekend. I still chuckle.

~~~
jules
This might not be such a fantasy as you think it is. Yes, it will be very
unpolished. But you can get the basics in place in a couple of hours: login,
questions, answers and voting. I agree though that to get it to the level of
stackoverflow will be very hard indeed in a weekend ;)

~~~
nl
Anything is easy if you ignore the hard parts.

------
derefr
I think a code iceberg is actually a symptom of either working at too low a
level, or relying on a library with missing/broken features. You see libfoo,
and you think "great! I'll use that to implement foobar-ization
functionality!" but then, after playing with it for a bit, you realize that
foobar-ization actually requires you to do all sorts of crazy things with the
output of libfoo before you can use it in XYZApp.

Now, you can put all those crazy foobar-izing things _into_ XYZApp, and
that'll work—but they should really either go into libfoo itself, or into a
new library (libfoobarize) that uses libfoo.

This is the case with the example in the article: DuckDuckGo shouldn't be
parsing Wikipedia to make its own abstracts. MediaWiki already creates
abstracts—they're just _bad_ abstracts. The correct thing to do, since
MediaWiki is just a regular ol' FOSS project, is to write a patch that makes
MediaWiki spit out _good_ abstracts, that _are_ actually trivial to use in
DuckDuckGo. Or, even better, if you know MediaWiki cares about having good
abstracts, just submit it as an issue to their tracker and let them do it for
you. In other words, repeat the programmer's litany to stave off NIH: "It's
not my job. I shall buy, not build. 80% of the features at 20% of the cost.
Don't ask a question, send a message. No god-objects. Encapsulate,
encapsulate, encapsulate."

Note that, of course, there are cases where there really _is_ no libfoo—but
then you're doing something totally new, and you can tell the client right up-
front "no one's ever done this before, so we have to schedule time for R&D
before we can even tell you how much time this feature will take."

There is also the case where the only libfoo/libfoobarize is a proprietary one
used by the people you're trying to steal market-share from by implementing
this feature, in which case you can tell your client "we know it's possible,
but we don't know how long it took them to build it. What we _do_ know is that
no one else has yet copied them, which means that foobar-ization _isn't_
trivial. It'll probably take a while."

------
gfodor
Another class of code icebergs are numerical algorithms. Often a few dozen
lines of Matlab or R can be the result of months of effort. Failed approaches,
tolerance thresholds, manual data cleansing, and more can all end up living as
a few lines of math.

~~~
leif
Most math takes the form of icebergs. :-)

------
dhruvbird
Hey it's great you got a name for them now!

You won't believe how many times I've been in a discussion about something and
the other person has said "oh that's easy to do" or "it can be done in a few
hours" when in fact if they were to go into the details, they would see the
hiding devil...

~~~
praptak
My personal red-flag phrase is "You just need to ..."

Favorite occurrence: "You just need to build a state machine." Yeah, saving the
whole browser-side state (did I mention third-party GUI components?) of an
application and reestablishing the server-side session state to match it is
really easy with this piece of sage advice.

~~~
damncabbage
"Just" is unofficially a banned word around here. :)

------
patio11
I always annotate these on development plans with HBG: Here Be Dragons.

~~~
caf
Enquiring minds want to know: Why does G denote Dragons?

~~~
patio11
iPad typing is not recommended before morning coffee.

------
amix
I think great products tend to have a lot of complexity, but most of the
complexity is hidden away from the user. This picture sums up this thought
(and it mimics Gabriel's iceberg metaphor quite well):
[http://amix.dk/blog/post/19555#The-essence-of-minimal-
produc...](http://amix.dk/blog/post/19555#The-essence-of-minimal-product-
design)

------
bediger
I think this is one way that J.P. Lewis' "Large Limits to Software Estimation"
(<http://scribblethink.org/Work/Softestim/softestim.html>) comes out in
reality.

Naturally, there's an uncountable infinity of ways that comes out in reality,
but Code Icebergs are a common way.

------
moondistance
On the topic of mining Wikipedia, DBpedia (<http://dbpedia.org>) is a
fantastic source for structured Wikipedia content. Extracting data is pretty
easy with SPARQL.
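
A small Python sketch of what such a query might look like. It only builds the request URL and makes no network call. The endpoint address, the `dbo:abstract` property and the `format` parameter are assumptions based on how the public DBpedia endpoint is commonly used, so check the DBpedia documentation before relying on them; `abstract_query_url` is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed public endpoint (not taken from the thread above).
ENDPOINT = "https://dbpedia.org/sparql"


def abstract_query_url(resource, lang="en"):
    """Build a GET URL asking for the abstract of one DBpedia resource."""
    query = (
        "PREFIX dbo: <http://dbpedia.org/ontology/>\n"
        "SELECT ?abstract WHERE {\n"
        f"  <http://dbpedia.org/resource/{resource}> dbo:abstract ?abstract .\n"
        f'  FILTER (lang(?abstract) = "{lang}")\n'
        "}"
    )
    params = {"query": query, "format": "application/sparql-results+json"}
    return ENDPOINT + "?" + urlencode(params)


url = abstract_query_url("Iceberg")
```

Fetching `url` with any HTTP client should, under those assumptions, return the abstract as JSON.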

Freebase (<http://freebase.com>) isn't bad, either.

~~~
jules
The DBpedia license would force him to license his code under GPL or similar,
no?

~~~
moondistance
I haven't read the GNU Free Documentation License
([http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_...](http://en.wikipedia.org/wiki/Wikipedia:Text_of_the_GNU_Free_Documentation_License)),
but I don't think so.

"DBpedia is derived from Wikipedia and is distributed under the same licensing
terms as Wikipedia itself" <http://wiki.dbpedia.org/Datasets#h18-19>

Please correct me if I'm mistaken!

------
bialecki
Twitter is one of my favorite examples of this. Lots of people look at it and
say, "it's so easy to build." I'd like to see them scale it to
millions of users.

~~~
boulderdash
scaling twitter _is_ easy. it is getting the users that is hard. I can say
this because I was part of a service in 1999 that had more users and volume
(bytes/hits/etc.) yet didn't have nearly the same issues.

~~~
nl
_I can say this because I was part of a service in 1999 that had more users
and volume (bytes/hits/etc.) yet didn't have nearly the same issues._

Not to be rude, but I doubt it.

Twitter currently has 175 million users[1]. Estimates in 1999 for the online
population of the entire internet were 259 million, with 110 million in the US
[2].

In 1999, I imagine Yahoo and maybe a couple of other sites
(Microsoft/Excite/AOL/Lycos?) were getting similar traffic numbers to what
Twitter does today. BUT the scaling is very different, because Twitter
requires fan-out of messages, which none of those sites did.
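
To show what fan-out means here, a toy in-memory Python sketch of fan-out on write. It is an illustration of the general idea only, not a description of Twitter's actual architecture, and every name in it is invented.

```python
from collections import defaultdict, deque


class Timelines:
    """Toy fan-out-on-write store: a post is copied into every follower's timeline."""

    def __init__(self, timeline_size=800):
        # author -> set of users following that author
        self.followers = defaultdict(set)
        # user -> bounded timeline, newest entry first
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_size))
        # number of timeline inserts performed so far
        self.writes = 0

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, text):
        # One incoming message turns into one write per follower.
        for follower in self.followers[author]:
            self.timelines[follower].appendleft((author, text))
            self.writes += 1

    def read(self, user):
        return list(self.timelines[user])


t = Timelines()
for i in range(1000):
    t.follow(f"fan{i}", "celebrity")
t.post("celebrity", "hello")
```

After that single post, `t.writes` is 1000. A page view on a 1999 portal was one read; here one write is multiplied by the follower count, which is why raw hit counts are not a fair comparison.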

[1]: <http://www.pcmag.com/article2/0,2817,2371826,00.asp>

[2]:
[http://web.archive.org/web/20000208141450/http://c-i-a.com/1...](http://web.archive.org/web/20000208141450/http://c-i-a.com/199911iu.htm)

------
callmevlad
A great example comes from 37signals -
[http://answers.37signals.com/basecamp/52-how-can-i-move-
or-c...](http://answers.37signals.com/basecamp/52-how-can-i-move-or-copy-to-
do-lists-between-projects) \- look for Sam Stephenson's reply (from Sept 8th)
in which he explains all the edge cases below the surface.

------
staktrace
There's the classic rule of thumb from Brooks' Mythical Man-Month: turning a
program into a programming systems product takes 9x the effort.

------
rams
Icebergs are a good analogy for technical debt as well. The hackish stuff
below the surface, often done against better advice, invariably causes serious
damage.

