
1% the Code (2001) - mr_tyzic
http://www.colorforth.com/1percent.html
======
sam_lowry_
6 years ago, I rewrote 30 000 lines of C++ code in 1000 lines of Ruby.

A year ago, I rewrote 424 lines of Java+Spring+Hibernate in 18 lines of bash.
This is less glorious, but if you compare the size of the deliverable, it's
39 MB for the J2EE webapp against… 772 bytes for the shell script.

At my current job, I maintain an app with 500,000 lines of code that could
easily fit in 20,000.

Code bloat is indeed a recurrent problem.

~~~
mightybyte
7 or 8 years ago I had to take over a Java project on the order of ~10k lines.
I reduced the code size by 77% by rewriting it in, wait for it...Java.

~~~
userbinator
Java is quite a bit more verbose than some other languages, but I think the
culture is a significant factor; C# is similar. I've noticed the tendency to
overabstract and overgeneralise, apply design patterns for the sake of using
them, etc. is particularly strong. It's inexplicably hard to trace out the
train of thought that makes things like this appear:

[https://news.ycombinator.com/item?id=4549544](https://news.ycombinator.com/item?id=4549544)

I've written small Java apps --- one class only, i.e. one source file ---
which needn't be any more complex, yet coworkers have said there's something
uncomfortable about my code, though they can't explain exactly why.
I'd get comments like "wouldn't it be better if you made this (only used once
and very trivial) line of code a separate function?" "could you use more
classes?" (for a <100LoC script-ish thing that had almost no duplicated code
nor much in the way of loops.) It's almost as if they can't get their heads
around how _simple_ something can be, so it somehow feels very wrong to them.

Then there's the extremist "premature optimisation is evil" attitude, which I
think is completely misguided because more code and complexity is bad not only
in terms of computer time but also programmer time --- it takes more time for
programmers to design, write, read, and debug more complex code.

~~~
rms_returns
The verbosity in Java is soon going to be reduced considerably by three new
features included in `Java 8`, namely:

1\. _Lambdas:_ Lambdas are brief and to the point; the expressions reflect the
terseness of `C/C++`. In many situations, you can do away with interface
creation entirely by just using a lambda instead. Consider all the lines of
code saved in legacy code by just using lambdas!

2\. _Default methods:_ Again, `interface patching` is a curse that a lot of
Java projects suffer from. Requirements change slightly, and what your typical
enterprise Java dev does is add patches upon patches and layers upon layers in
an interface. With default methods, you can extend your existing interfaces
without affecting the ABI of your app. Again, tons of redundant code saved
here.

3\. _Type annotations:_ Again, your typical Java devs have to resort to ugly
hacks to place any constraints on object properties. With type annotations,
all you have to do is:

    @NonNull String str;

And lo and behold! With a checker that understands the annotation (such as the
Checker Framework), any assignment of `null` to `str` gets flagged at compile
time. So, lots of validation code being saved here again.

To the best of my knowledge, none of the above features exist in the `C#`
language (yet), so I take it as an edge Java has over C#.
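As a rough illustrative sketch of points 1 and 2 above (the `Greeter` interface and all names here are invented, not from any real codebase): a one-line lambda can stand in for a multi-line anonymous interface implementation, and a default method extends an interface without breaking its existing implementors.

```java
public class Java8Sketch {
    // Hypothetical single-method interface; greetLoudly was "patched in" later
    // as a default method, so existing implementors keep compiling unchanged.
    interface Greeter {
        String greet(String name);

        default String greetLoudly(String name) {
            return greet(name).toUpperCase();
        }
    }

    public static void main(String[] args) {
        // Pre-Java-8 style: an anonymous class spanning several lines.
        Greeter verbose = new Greeter() {
            @Override
            public String greet(String name) {
                return "hello, " + name;
            }
        };

        // Java 8 style: the same behavior as a one-line lambda.
        Greeter brief = name -> "hello, " + name;

        System.out.println(verbose.greet("world"));     // hello, world
        System.out.println(brief.greetLoudly("world")); // HELLO, WORLD
    }
}
```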

~~~
paulddraper
If by "soon" you mean a year and a half ago, then yes, soon :)

~~~
rms_returns
I don't know what major project you are talking about that uses these
features. I'm not talking about small projects by John Does, I'm talking about
LARGE infrastructure projects. For example, the Android SDK is extensively
used in the Java world, and it has tons of interfaces as either arguments or
class helpers which could be substantially overhauled by using these features.

~~~
SideburnsOfDoom
Please don't move the goalposts; you stated that C# does not have lambdas, but
it has had them for 9 years now.

You also said "I don't know what major project you are talking which uses
these features". As far as I have seen, and I've seen a fair amount, any and
every non-trivial C# program (and many of the trivial ones too) uses lambdas.
Even if it's just in the small-scoped, rote use e.g. of

    var matches = someList.Where(item => item.x > y);

Technically that counts as "using a lambda". Don't knock it for being trivial,
it's a gateway drug ;)
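For comparison, the Java 8 equivalent of that C# one-liner looks nearly identical using the Streams API (a sketch; the list and threshold are made up):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class FilterExample {
    public static void main(String[] args) {
        List<Integer> someList = Arrays.asList(1, 5, 3, 8);
        int y = 4;
        // item -> item > y is the same small-scoped, rote lambda use
        List<Integer> matches = someList.stream()
                .filter(item -> item > y)
                .collect(Collectors.toList());
        System.out.println(matches); // [5, 8]
    }
}
```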

------
archgoon
>But I'm game. Give me a problem with 1,000,000 lines of C. But don't expect
me to read the C, I couldn't. And don't think I'll have to write 10,000 lines
of Forth. Just give me the specs of the problem, and documentation of the
interface.

I could be wrong, but I'd wager that a lot of projects that are 1,000,000
lines of C don't have well defined interfaces or documentation.

However, if the author wants to give it a shot, they could try replacing v8
with pure forth. That's a fairly well defined problem, they just have to
maintain compatibility with the exposed v8 api.

~~~
kctess5
I'm not sure of the stats on this one, but it seems that something with
1,000,000 LoC would _have_ to have decent documentation, or it would be
totally unmaintainable and wouldn't grow to that size without dying a horrible
death.

~~~
protomyth
You would think that, but it can grow without understanding by adding code at
the edges which expands then starts the cycle over. A code archeologist[1]
would see a central island with bridges to developed islands with more bridges
and various ships running between interfaces. It's particularly fun with
multiple subsystem teams. Each one builds its own island of code and the
bridges are scary.

1) please say that exists

~~~
archgoon
> 1) please say that exists

The term 'software archaeology' has been used for a while:

[https://en.wikipedia.org/wiki/Software_archaeology](https://en.wikipedia.org/wiki/Software_archaeology)

And the wiki page suggests there are some consultants in the field.

Many programmers trying to understand open source projects will dabble a bit
in this.

~~~
hga
I have been just such a consultant/employee doing this, even started calling
it software archaeology some time in the '90s. It's a very painful way of
earning money, but there's a whole lot of "poorly documented or undocumented
legacy software implementations" out there. Generally effectively
undocumented: many projects start out with some documentation, don't keep it
up to date, and by the time you get your hands on the mess, all of it,
including comments, is more a statement of intention at one point in time than
something directly and immediately useful.

~~~
alch-
> It's a very painful way of earning money

But... strangely fun, in a twisted way? I love a blank canvas or a well-written
codebase as much as the next guy, but I find that fixing "legacy"[1] codebases
can be pretty enjoyable.

Slowly figuring out and fixing a "legacy" codebase, to me, is basically
solving a giant, complicated puzzle that was left for you by your
predecessors.

Granted, you're mired in evil gunk from the past, but every time you figure
out a small piece of the garbage and refactor it into something nice, you get
to feel awesome. Of course, this assumes that you've convinced management that
the codebase must be tamed[2] and that this will take time; otherwise you're
just fighting with your hands tied.

Then again, maybe I'm just a masochist.

[1] "Legacy", because people don't like it when you call it "evil gunk from
the past that must be destroyed".

[2] And it must indeed be tamed, because otherwise it will just grow more and
more evil until development grinds to a halt. If management understands this,
they'd be crazy to choose the "let the beast grow" option.

~~~
otakucode
It certainly has its moments. Hell, just looking through the legacy code and
seeing what other people did can be entertaining in a 'holy shit would you
look at that!' way. I will never forget when I was working on a codebase in
which the author apparently did not know how to loop or how to hold user
interface controls in an array or something. There were 10 textboxes on the
screen. Instead of looping through the textboxes and calling a stored
procedure with the content of each one (it was PL/SQL inside an Oracle Form...
a dead structure never meant to be used to create a full application which is
what they'd done with it), they first handled the case where textbox 1 had
content, but the others did not, with one procedure call. In the else, it
handled the situation that textboxes 1 and 2 had content but the others did
not with 2 procedure calls.... and on and on and on for all 10 textboxes.
Pages and pages of nearly identical code. I would have expected any self-
respecting programmer to either have said 'there HAS to be a better way' and
stubbornly search for such a way, refusing to proceed along the terrible path
I saw before me, or else leave the profession entirely. But someone somewhere
went through and constructed the whole repugnant edifice...

------
RodgerTheGreat
The most important idea in Forth is factoring. Breaking complex tasks into
small pieces. In an ideal Forth codebase, most definitions are about a line
long, contain at most one control structure and reference no named variables.

Forth's stack-oriented semantics make function calls very cheap: a function
call and return is literally two instructions, whereas languages which use
activation records will turn a function call into dozens of instructions,
copying arguments around, backing up registers, etc. Since it's
computationally inexpensive, you don't need to sweat over whether your
optimizing compiler will manage to inline a function.

Short, simple and mostly pure functions are easy to test in isolation: there
aren't deeply nested code paths to exercise.

Lots of short functions let your codebase reuse more of those functions: any
redundancy can be collapsed together, both shrinking the codebase and
improving test coverage. This effect is magnified in more complex programs.

You can do this sort of thing in other languages, but Forth's syntax _and_
semantics for function definition and invocation are both very lightweight,
and this makes it much easier to apply.
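The one-line-definition discipline itself doesn't require Forth to try out; here's a hedged Java approximation (all names invented for illustration), where each definition is a single expression built from smaller ones:

```java
public class Factored {
    // Each definition is one line, in the spirit of a short Forth word.
    static int square(int n) { return n * n; }

    static int sumOfSquares(int a, int b) { return square(a) + square(b); }

    // Larger words are compositions of smaller, separately testable ones.
    static boolean isPythagorean(int a, int b, int c) {
        return sumOfSquares(a, b) == square(c);
    }

    public static void main(String[] args) {
        System.out.println(isPythagorean(3, 4, 5)); // true
    }
}
```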

~~~
tehwalrus
It sounds like what Clean Code[1] really wanted. I still think it's a great
shame that all the book's examples are in Java (although, in fairness, they
are very pseudocode-ish).

[1]
[https://books.google.co.uk/books/about/Clean_Code.html?id=dw...](https://books.google.co.uk/books/about/Clean_Code.html?id=dwSfGQAACAAJ)

------
benbenolson
I think that the author of this article is mistaking LOC for readability.
Despite colorForth being wonderfully concise and independent of any operating
system, that doesn't make it convenient or readable for future programmers.
Just because one program is some number of lines longer than another doesn't
mean it's less readable.

Take Perl, for example. I often write Perl scripts, and they're very few
lines, but nobody can read them, and readability is really what matters. So,
I'll expand them to be more C-like (calling functions with parentheses, using
fewer implicit arguments and scalar/list contexts, etc.), thus making them
easily readable even by people who don't know Perl. I regard this revised
version as much better than the more Perl-like and obfuscated (although
shorter!) version.

~~~
nchelluri
I do think readability is key, and found the article intriguing so I took a
very brief look at some colorForth sample code:
[http://colorforth.com/ide.html](http://colorforth.com/ide.html) . It may be
that I'm just extremely new to a very foreign programming language, but I did
not understand it at all.

~~~
djsumdog
I thought I was reading a satire article. That was the first source I found
when looking through the site. I still wasn't sure if it was satire or not.

------
superuser2
>Code is scattered in a vast hierarchy of files. You can't find a definition
unless you already know where it is.

Not if you're using a proper IDE (or the right emacs/vim plugins, which are
effectively an IDE).

>Code is indented to indicate nesting. As code is edited and processed, this
cue is often lost or incorrect.

Not if you're using a proper IDE configured for the project's chosen style. My
engineering org also runs linters in pre-commit hooks and has them comment
automatically on code reviews.

>Sometimes a line of code contains only a parenthesis, or semicolon. This
reduces the density of the code, and the difficulty of reading it.

Depends on the style guide you choose. Also, some people find this more
readable.

>There's no documentation. Except for the ubiquitous comments. These interrupt
the code, further reducing density, but rarely conveying useful insight.

The only documentation I've ever found useful (aside from comments) was
automatically generated from comments, explaining the signatures of methods
and what they do.
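For readers unfamiliar with that workflow: in Java land the canonical example is Javadoc, where structured comments on a method are extracted into browsable documentation of its signature and behavior. A minimal sketch (the class and method are invented):

```java
public class MathDoc {
    /**
     * Clamps {@code value} into the inclusive range [{@code lo}, {@code hi}].
     * Running the javadoc tool turns this comment into generated API docs.
     *
     * @param value the input to clamp
     * @param lo    the lower bound (assumed to be no greater than {@code hi})
     * @param hi    the upper bound
     * @return {@code value} if within range, otherwise the nearest bound
     */
    public static int clamp(int value, int lo, int hi) {
        return Math.max(lo, Math.min(hi, value));
    }

    public static void main(String[] args) {
        System.out.println(clamp(15, 0, 10)); // 10
    }
}
```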

~~~
seanwilson
Not commenting on any language in particular, but you can mitigate any
language's issues with a combination of a better IDE, commit hooks, linters,
etc. However, it would take less effort if the language itself enforced these
constraints so that everyone using it wrote code the same way.

------
modulus1
> It's strongly typed, with a bewildering variety of types to keep straight.
> More errors.

Odd complaint, especially to be levied against C.

~~~
bluetomcat
If anything, C is frustratingly weakly-typed. The integer conversion rules and
the signed/unsigned mess are its biggest mistakes.

~~~
cbsmith
Yup. People get static and strong typing murder up.

~~~
Jtsummers
I think you meant mixed up. It certainly does seem, though, that thinking your
weak and static type system is strongly typed can lead to errors that could
kill a program, and, depending on what the program does, harm people and other
systems.

------
mwcampbell
This post by Yossi Kreinin offers an interesting perspective on Forth and
Chuck Moore's general approach to programming:

[http://yosefk.com/blog/my-history-with-forth-stack-machines.html](http://yosefk.com/blog/my-history-with-forth-stack-machines.html)

~~~
lolc
Interesting take.

I remember Forth only from the time I was looking to replace `dc` with a more
powerful calculator. I figured Forth could fit the bill because it was a stack
language.

Whoops, we didn't get along immediately. I realized that Forth was not meant
to be my calculator; if anything, I could have made it into my calculator.

------
fffrad
When you look at code you wrote in the past, whether it is 1000 lines, 3000,
or 100,000, you can always think of a way to remove the redundancies and such
and turn it into a smaller number of lines.

But the first time you write it, the first time you are solving the problem,
it is much harder to focus on that part.

This is not to say that it's OK to have an unmaintainable million-line
application, but rather that it is not something we willingly do to make
ourselves look more important in a company.

------
chipsy
Proclaiming that the syntax only causes problems stands in complete defiance
of most people's preferences in programming syntax.

We like syntax errors because they mold thoughts more precisely. Going against
that is also going against a lot of "productivity enhancers" - we build them
because we expect to have a lot of code, and we want to reduce the kinds of
errors that it may contain. That doesn't mean that we should abuse them to
write as much code as we can, but it does turn out that way in practice.

So in an odd way Moore and co are right - we would be more free if we also
constrained ourselves more on this point.

But the broader thought is that you can apply such constraints at any time, in
tandem with what you already like. And that is more likely to produce an
innovation than "just Forth".

~~~
Guvante
I think that needless compiler errors are an interesting point that isn't
discussed often.

We have those productivity enhancers because the language makes it easy to
make a mistake there. Ideally we would want to eliminate those mistakes
without the syntactic headache.

Although how to do that is obviously a complex exercise, I do think that
weighing "code bloat is bad" against "bugs are worse" can provide interesting
insights into what we really want from languages.

~~~
dietrichepp
It's discussed fairly often with static analyzers. Static analysis is a
powerful tool, but false positives erode user confidence in the tool.

I've worked with code bases where thousands of warnings (and even several
errors) were left in place, which meant that you had to basically ignore all
diagnostic messages every time you compiled. At that point, you might as well
compile with 2>/dev/null. Someone even wrote a wrapper script that would diff
the stderr
against the "known good" stderr.

My general impression of that project was that it was dying a slow death
unless someone went in and restored developer confidence in the build process,
and the most likely person was me :-/

------
throwaway999888
If some C programmers have that _scoff at higher level programmers_ attitude,
Chuck Moore is the equivalent but addressed at C programmers (and higher, I
guess).

I have a feeling he's not the kind of person a C programmer could say things
like "C is high-level (obvious one-to-one mapping) assembly" and "C is
(blanket) efficient" around, and get away with it.

------
Procrastes
Chuck Moore came out and spoke to us at work not long ago, and I have to admit
I had a total fanboy moment. There are very few people in the world with as
clear a view of how to make an entire system, hardware and software, directly
reflect the problem to be solved with no cruft. There's a lot to learn just
from the way he decomposes a problem.

~~~
otakucode
As things have been developing, first with virtualization becoming widespread,
and then with things like Docker continuing the trend of packing things off
into 'separated' containers, I have wondered at what point we will get to the
place where a tool is developed which provides a highly modular kernel along
with the tools to quickly spin up a whole system which has exactly and only
the resources a particular subsystem needs. And, of course, a basic system
that spins these things up when needed. Not just microservices, but atto-scale
services. It sounds like this is the sort of thing Forth promoters could help
us realize.

------
nickpsecurity
1% the code, 0.01% the reader comprehension. I'll stick with a good ALGOL or
LISP language any day. Especially with macros plus support for real HW rather
than 18-bit etc.

------
overgard
I agree with how verbose some of these languages are, but this is a bogus
argument. You can shorten almost any design on a rewrite, regardless of
language. Not to mention the "density" metric is ridiculous. Minified
javascript is certainly more dense than regular javascript, but nobody would
argue it's simpler.

------
necessity
People like to blame programming languages for programmers' faults.

------
smartmic
I just started to learn the Factor programming language. I think of it as a
modern implementation of a stack-based language like Forth (similar to what
Clojure is to Lisp).

I admit, the first steps were a bit brain-twisting, but after getting my hands
dirty, programming feels relieving. Like a rain shower on a steamy hot day. No
worrying about syntax and punctuation; elegantly shuffling the stack is fun.
For me, a perfect demonstration of KISS.

------
mhd
As someone who's still doing a lot of Perl: can't we just ease up on the line
count fetish? Some languages are really bloated, but I'd still say that more
often than not, the sheer number of lines isn't the biggest factor in a
program's cognitive load.

------
qwertyuiop924
You have got to be kidding me. I mean come on, you're joking, right? If you
gave me some qualification, maybe. I know that FORTH is more compact than C,
and the decrease in code size might be reasonable in some cases, but the
reasons you give are insane! Come on!

Elaborate Syntax: Your syntax is simple, C's is less simple. If you want
elaborate, look at PERL or Haskell.

Redundancy and Confusing Types: Fair enough, but the types have to be the ones
they are because C's supposed to be close to the metal. Therefore, you need
specific bit-sized types. It could be better than it is, though.

Strong typing causes errors: Um. Take that up with the angry Haskellers lining
up behind you. I'll just duck off this way... Honestly, C's type system is
pretty awful, but that's not because it's strongly typed.

Infix: ...Yeah, pretty much, but the mainstream won't accept anything else.

Parens: Tell that to the Lispers. Parens shine for syntax parsing, and the
incredibly common editor support makes it even nicer than Python for editing.

Unclear how source will be translated: Well, I guess, but it's still more
predictable than anything higher up the stack, like, just to pick a random
example, most modern FORTH implementations.

Subroutine calls are expensive: Compared to what? One mov or push instruction
for each arg, and then a call instruction, at least on x86. Most other
processor architectures seem to have similar things going. Those are all
pretty fast.

Elaborate Compiler, Object Libraries: Yes, compared to FORTH, the compiler is
elaborate, but that isn't why object libraries are distributed, seeing as
there's a C compiler for every system under the sun. And distributing object
libraries isn't exactly hard.

Lots and lots of files: Yeah, this sucks. But every other system is going to
have the same problem, and a proper module system that associates functions to
files is really the only way to fix it, aside from oddly specific
introspection utilities. Most other languages, say, for example, FORTH, don't
have this, AFAIK.

Indentation as an indication of nesting: And you suggest counting braces as an
alternative? Or just not having nesting? Because most languages, say, for
instance, FORTH, have nesting.

No Docs: In BAD C code, this is true. As in bad code in any language, like,
for instance, FORTH; any language designer knows that a language cannot make
bad programmers good.

Names are Hyphenated: "I think hyphens suck" isn't a problem with the
language, it's just a personal preference.

Constants are named: With my passing familiarity with FORTH, I don't know what
he's on about here. Could somebody explain, so I can see if I agree?

Preoccupation with Contingencies: Because everybody loves leaky abstractions
that fail at critical moments for ill-defined reasons.

Conditional Compilation: This feature sucks. It really does. The problem is, I
don't see what else they could have done. So my response here is, Do you have
any better way to do cross platform compatibility? Because I would honestly
love to see it. I'm serious.

Hooks: Ahhh, I see. You're from the beautiful-diamond camp of software design.
I can recognize you guys by your catchy slogan, "Don't Design With The Future
In Mind."

Programmers' best interest to exaggerate complexity: No language can fix this.
It's impossible, because this is a social pressure.

Portability: Yeah, basically.

Maybe I haven't seen enough C. Maybe I haven't seen enough FORTH. Maybe I just
can't get into Chuck's mindset here. But his claims just seem insane.

EDIT: I read some more off of Chuck's pages, he seems to be thinking more in
terms of embedded architectures and single purpose code. Fair enough, in those
situations, some of his reasoning makes sense. But the presentation of this
page still makes him look like he's off his rocker.

~~~
0xcde4c3db
> Maybe I just can't get into Chuck's mindset here.

I think this is the crux of it. As far as I can tell, Moore's attitude is
basically that "software architecture" in the conventional sense is mostly a
lot of churn solving problems of its own creation instead of solving
user/customer problems. Looking at it from another angle, the main function of
operating systems, virtualization monitors, container frameworks, etc. is to
let multiple applications share a single computer. Moore's approach is to make
lots of computers so that you can easily afford to have multiple computers per
application. [1]

[1]: [http://www.greenarraychips.com/](http://www.greenarraychips.com/)

~~~
qwertyuiop924
That certainly explains some of it, but some of his points still make no
sense.

------
HillaryBriss
It has elaborate syntax

