

What I Learned from Watching Notch Code - Mizza
http://gun.io/blog/what-i-learned-from-watching-notch-code/

======
AndyKelley
"Here's the most important bit: Notch's testing is mind-bogglingly thorough."

For the record, this is completely contrary to his testing method for
Minecraft. Every time there is a new version of Minecraft, there are more bugs
than features. This is even true when a "bugfix" release comes out. More bugs
are introduced in a "bugfix" release than are fixed. Seriously, check it out:
<http://www.minecraftwiki.net/wiki/Version_history>

I say this as the owner of a library that provides a bot API for Minecraft.
<https://github.com/superjoe30/mineflayer>

All I'm saying is... "thoroughness" and "testing" are not two words I expected
to be in the same sentence as Notch's methodology.

~~~
regularfry
Maybe he's learnt.

~~~
Dylan16807
The pre-1.8 release less than a week ago had an obvious and crippling bug. An
experience system was added, and when you died you dropped your experience.
But instead of doing so as a single object, it made thousands of objects each
holding one single XP point. This made the game completely unplayable if you
were in the same area, and even crippled multiplayer servers for people not in
the area.

Also, while that is the most recent glaring bug, in the past there have been
multiple occasions where a release broke trees so that the leaves didn't decay
after the trunk was removed. Punching down a tree is something you do in the
first 30 seconds of a normal Minecraft game.

~~~
ugh
It was a pre-release. Also: leaves not decaying was the intended behavior at
the time, it wasn’t a bug.

But that’s not really the point. There is no way to test Minecraft thoroughly
if you are pretty much on your own. It has too many features. Minecraft is too
complex for this method to work.

Not that it matters, really.

~~~
AndyKelley
I disagree with this. Using my bot framework I actually began writing a test
suite. I stopped because, for various reasons, it wouldn't actually have been
very helpful for what I was doing, but I progressed far enough to know it was
very doable.

------
chrisrhoden
moved from the blog comments section:

Woah, man. Woah.

I think you're missing the primary benefit here, which is that he avoids
regressions. By the time you are adding feature 256, you have 255 existing
features which could be affected by the code changes you introduce. This is a
huge part of software engineering, and one that has gotten lots of attention
over the years.

The process you describe for your development sounds extremely tedious, but
probably won't break down until month 2 of development on a team of one. Once
you reach a level of complexity beyond this, that's where automated testing
proves its worth.

The style of testing you're describing is commonly referred to as integration
testing or acceptance testing, because it is designed to test the full stack
in harmony. There are a number of great frameworks out there to help you do
this. Cucumber is the one that's gotten the most love in the circles I am
familiar with. You can write your steps in Python or JavaScript, so don't
worry that it's written in Ruby.

The typical thing to do once you've started doing automated testing is to
actually _write your tests first_, watch them fail, then write the code to
make the test pass. This forces you to ensure you have good test coverage
(every feature is tested) and has been shown to result in better-designed
systems.

You have tons of reading to do if you want to learn more about this, but hit
me up if you want a basic rundown.
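For anyone who hasn't seen the red-green cycle in practice, here's a minimal
sketch of it in plain pytest (the `slugify` function and its behavior are made
up purely for illustration):

```python
# Step 1: write the tests first. At this point slugify() doesn't
# exist yet, so running pytest fails ("red").

def test_slugify_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"

def test_slugify_lowercases():
    assert slugify("FooBar Baz") == "foobar-baz"

# Step 2: write just enough code to make the tests pass ("green").
def slugify(title):
    # lowercase, split on whitespace, rejoin with hyphens
    return "-".join(title.lower().split())
```

Run `pytest` after each step; every feature ends up with a test because the
test is what forced the feature into existence.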

~~~
genbattle
I think games are hard (if not impossible) to test with unit tests or codified
tests like the ones you mentioned.

At the company I work for we rely mostly on manual testing for the games and
applications we make. We do some automated testing in the form of TinyTask
tasks that are left on overnight to hammer an application. In terms
of release testing, especially with a custom game engine, there's no easy way
to codify a test which actually plays through a game where there is some
random mechanic, or checks for things like the visual accuracy, windows
flashing up on the screen when they're not supposed to, etc.

Browser testing frameworks like Selenium are great for UI testing, but they
require identifying interface elements by their IDs, or performing a 100%
image-recognition match for targeting. If anyone knows of a good UI testing
framework that would work for games/DirectX apps, I'd definitely love to hear
about it.

I'm not bagging automated testing; I think it's a great way to save time and
avoid the boring bits. I just don't think it's very relevant to game design,
but for the OP's enterprise app job it would definitely be suitable.

~~~
chipsy
I think the #1 reason why games don't need a lot of automated testing is
serialization, or rather the common lack of it in most game worlds. If the
data gets corrupt, you can reset the game and all is well again. When the rare
corner cases pop up, it's easy to play whack-a-mole with them, because they're
all "shallow" in the sense of the game starting from a clean reset frequently,
so you can usually reproduce the bug quickly.

Of course, if the game does need heavy-duty serialization of everything, all
bugs are potentially deadly. And this is largely the case for the line-of-
business, social, and productivity apps, because that data is considered
mission-critical. Corruption is not OK, reverting to backups is to be avoided.
When a game has that kind of requirement, things get a lot tougher - and so
games have naturally evolved to favor minimizing the save data to nothing or a
few stats.

And this is borne out by the exceptions. There are two genres that have a
history of tending to be buggy because of some form of long-term data
corruption: large-scope RPGs and turn-based strategy games. It's hard to
reach the bugs in those games, so it's also hard to fix them.

~~~
genbattle
This is the difficulty for the company I work for. We make a range of desktop
applications including launchers, productivity apps, media apps, games, etc.

Because all of these apps are based on a game engine, we get the complex and
hard to test bugs that games get (texture cache corruption, null pointers in
the scene tree, etc.). There's much more serialization and pressure than with
pure games (but still not as much as with enterprise applications) because
we're dealing with transactions with 3rd party services, or because a screw up
could corrupt a user's whole music collection, or because users don't expect
their music player to crash every 4 hours.

The game engine adds a huge amount of extra variability to our applications:
we have to watch out not only for obscure bugs in our custom script code
(which the engine silently tries/fails to execute anyway), but also for bugs
in the (C/C++-based) game engine it is driving. The upside of using an engine
with a simple scripting language is the shorter development time to get things
off the ground and the high-performance/shiny visuals, but I feel it costs us
much more down the line in terms of stability and extensibility.

~~~
palish
Wait, wait... "texture cache corruption", and "null pointers in the scene
tree"?

Edit: It's both scary and awesome that you rigged a game engine to manage a
music library a-la iTunes. Are you profitable?

~~~
genbattle
I'm not sure I would be allowed/qualified to say much about our financials.
From what I can gather we are profitable, with ongoing contracts for clients
such as Dell.

<http://www.unlimitedrealities.com/blog/video-the-dell-stage-touch-ui-and-music-stage-apps/>

Those issues I mentioned are examples of things that we have actually run
into.

edit: Our specialization in touchscreen development is really what drives the
company, but hardware accelerated graphics are also a huge draw.

~~~
spoondan
At the risk of going too far off topic and without intending to be too harsh:
at least in the video, that interface looks nice, but there is too much lag
between when a tap/gesture is made and when the interface reflects it. This
breaks the direct manipulation metaphor and is, I think, a showstopper. The
pinch to zoom example was particularly off-putting: how can I know how much
I'm zooming when the interface doesn't track my gesture? I cannot imagine
being happy with pinching, waiting a second to see what's happened, then
pinching and waiting to adjust (repeat until I get it right or am too
frustrated).

~~~
genbattle
I don't think there's too much danger in going off-topic if the conversation
is interesting and has some substance.

Yeah, I think lag in touchscreen interaction is a huge problem, and it's
largely a result of hardware. We've had to ship on Atom-based hardware with
2003-era DX9 graphics. Even the latest Intel chips with DX10 are horridly slow
when it comes to graphics.

The other side is actual touchscreen hardware. The machine used in the video
is an HP Touchsmart, which uses an optical touchscreen panel. These panels
have a latency of around 100-200ms, and that's before we even start processing
the touch event information. Capacitive touch sensors are much better, but
they're expensive to manufacture above about 10 inches.

In the end lag/accuracy is a reality of the low-cost hardware OEMs use, and
there will always be some trade-off between hardware cost and performance.

------
chime
I do something similar with my web-app code. I keep an iPad open next to me
when I code. Anytime I hit Cmd+B in my text-editor, it rsyncs the site to the
server and some JS code running on the iPad automatically refreshes the page.
It works quite well for creating HTML layout and updating CSS nearly-on-the-
fly. If I have 4-5 browsers open on my Mac, they all instantly update too. So
all I have to do is Cmd+Tab through them to verify it all looks good.

~~~
g0atbutt
Can you go into more detail about how you set everything up? This sounds
immensely useful.

~~~
chime
Editor: BBEdit. Cmd+B is hooked to the following AppleScript for my project:

    
    
        tell application "BBEdit"
            save front document
        end tell
        
        tell application "Terminal"
            if (count of windows) is 0 then
                do script "/path.to.my/make.command"
            else
                do script "/path.to.my/make.command" in window 1
            end if
        end tell
        
        tell application "Google Chrome"
            activate
        end tell
    

My make.command (simplified):

    
    
        #!/bin/sh
        coffee -c mycode.coffee
        lessc -x mycode.less > mycode.css
        rsync -avz --delete --force --exclude ".DS*" \
              -e "ssh -i my.ppk" /localpath/www user@me.com:/remotepath/www
    

On my server, I run a changed.php file:

    
    
        function fileHash($fn) {
            echo sprintf("%u.", crc32(file_get_contents(dirname(__FILE__) . '/' . $fn)));
        }
    
        header('Access-Control-Allow-Origin: *');
        fileHash('mycode.js');
        fileHash('mycode.css');
    

On all my HTML pages, I run this piece of CoffeeScript (compiled above to JS):

    
    
        # defined elsewhere:
        #   debugmode (bool)
        #   reloadIfChangedLast (empty string)
        #   getCurrentView() returns #pageHashURL
    
        reloadIfChanged = ->
            if not debugmode
                return false
            $.ajax
                type: 'GET'
                url: 'me.com/changed.php?rnd=' + Math.random()
                error: ->
                    setTimeout reloadIfChanged, 500
                    return
                success: (data) ->
                    if reloadIfChangedLast and data and
                       reloadIfChangedLast isnt data
                        reloadIfChangedLast = data
                        document.location.href = 'index.html?rnd=' +
                                                 Math.random() +
                                                 '#' + getCurrentView()
                    else
                        if data
                            reloadIfChangedLast = data
                        setTimeout reloadIfChanged, 500
                    return
            return
        

It may seem like a lot of hassle, but for me it's just copy-paste and edit a
few things per project. And the benefits are tremendous.

~~~
GeneralMaximus
Brilliant. AppleScript is underrated as an automation language. Sure, it's a
bit cumbersome to write, but nobody says you _have_ to use it for everything.
It plays well enough with shell scripts that you can leave the heavy lifting
to another language and use AppleScript as a bridge between the command line
and the UI. A friend and I once wrote a bot in Python to control various
music players from IRC. The bot uses AppleScript to talk to iTunes on OS X,
and GObject introspection (or whatever they use on GNOME these days; I have
little experience with desktop Linux) to talk to Rhythmbox. See
<http://github.com/GeneralMaximus/amazing-horse>

All of Apple's official applications support scripting, as do most good third
party apps. You don't need extra JS on your page to autorefresh Safari. This
little snippet reloads the front-most Safari tab:

    
    
      tell application "Safari"
        set sameURL to URL of tab 1 of front window
        set URL of tab 1 of front window to sameURL
      end tell
    

You can even send Safari a snippet of JS to run. It's possible to automate any
UI interaction by sending mouse clicks or keyboard events to _tell application
"System Events"_.
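For reference, the _do JavaScript_ command mentioned above looks like this,
targeting the front tab the same way as the reload snippet (reloading the page
is just one thing you can do with it):

```applescript
tell application "Safari"
  -- run an arbitrary snippet of JS in the front-most tab
  do JavaScript "window.location.reload()" in tab 1 of front window
end tell
```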

You can look at any app's AppleScript dictionary using the AppleScript Editor.
Go File -> Open AppleScript Dictionary ... and then pick your app.

~~~
anthonyb
I do a similar thing in bash for more normal development - basically a script
to run my unit tests, look for OK/Error in the output and print either a
couple of bars in green or the error output in red. It runs over and over
again in a separate monitor, so as soon as you have wrong code, you know about
it.
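A rough sketch of what that loop can look like, assuming a test runner that
prints "OK" on success (the `run_tests.sh` name in the comment is a
placeholder; substitute your real command):

```shell
#!/bin/sh
# One pass of the watcher: run a test command, grep its output for
# OK, and print a green bar on success or the raw output in red on
# failure. On a spare monitor, loop it:
#   while true; do run_and_report ./run_tests.sh; sleep 2; done
GREEN='\033[32m'; RED='\033[31m'; RESET='\033[0m'

run_and_report() {
    # "$@" is the test command; anything printing "OK" counts as a pass
    out=$("$@" 2>&1)
    if printf '%s\n' "$out" | grep -q 'OK'; then
        printf "%b==================== OK ====================%b\n" "$GREEN" "$RESET"
    else
        printf "%b%s%b\n" "$RED" "$out" "$RESET"
    fi
}

# demo with a stand-in command instead of a real test suite
run_and_report echo "OK: 3 tests passed"
```

The point is the always-on feedback: as soon as the code is wrong, the
red output appears without you having to remember to run anything.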

------
technomancy

        He began by building the engine, and to do this he used 
        the ‘HotSwap’ functionality of the Java JVM 1.4.2, which 
        continuously updates the running code when it detects 
        that a class has changed.
    
        If anybody at Google is reading this: if you add this 
        feature to the Android emulator, I will literally 
        drive to your house and kiss you on the mouth.
    

Indeed, the fact that Android _doesn't_ already have this makes it hard for me
to take it seriously as a platform. Immediate feedback is crucial to any
development environment.

~~~
aboodman
Use an actual device, not the emulator. The emulator is super slow, but with
an actual device you can have 30s iteration times.

Edit: I know 30s is still nowhere near ideal. Hell, on Linux Chromium (a
massive project) we have faster iterations than that. But it's good compared
to the Android emulator.

I think you should be able to go even faster, but I need to dig into how Ant
actually works. It seems to be redoing more work than necessary for each
build.

~~~
atomicdog
Bit OT, but what device would you recommend as a cheap 'testing' platform?
I'm not looking for a new phone/tablet, just something for development.

~~~
aboodman
Sorry, I don't know. I use a Nexus S as my primary phone so that's what I test
on.

------
Markku
I learned some things too. However, even if you have a tiny piece of software,
like a game that you can reliably complete manually in 20 minutes, playing it
all the way through after every change is way too slow for "live coding". You
should see your results instantly, just like your test suite should run in
seconds. If you have to wait minutes to compile, deploy, and test, your
productivity will fall. A full integration/acceptance test suite can take
longer, but it should be as automatic as possible; you can't use it for rapid
feedback like this, but rather for regression/acceptance testing.

Notch was able to take advantage of immediate feedback for most of his coding.
The first hours, spent on rendering, were tested by watching the world being
rendered live. I've been doing the same developing a game in Clojure. When he
got as far as the gameplay, he slowed down considerably, since he had to
wander around in the game world to test each new feature. For example, he
made temporary shortcut passages so he was able to test the boss monsters
without actually having to pass through the levels.

He is a person who can concentrate on delivering results and hack away at code
for long stretches. This is the kind of code he has presumably written again
and again for years. What professional web developer can't hack together a
small site as quickly? Where most people fail is attention span and drive! I
know my unfinished projects speak to that :)

The most interesting part of watching Notch code, to me, was seeing how he
used his tools. And it was inspiring and motivating to see the progress. The
actual code was very hackish, but quality code is not important in a
throw-away project anyway :)

------
wnight
The way the article describes Notch's testing methods sounds like a glaring
anti-pattern. This reminds me of my first QA job... Thankfully another post in
here suggests that he actually takes pains to mold the level design, etc, to
make the game more easily testable.

For those who didn't RTFA (though since this isn't Slashdot, there shouldn't
be any), Notch is reputed to have given some segment of his game a complete
replay every time he made a change, though it isn't mentioned exactly how
often this is or how big the changes are. A note is made that his build
scripts make this almost instantaneous.

This is a problem because manual testing takes forever and causes tester
fatigue. You stop doing a good job.

It can be harder to test behavior in a 3D game than in a text-filtering app,
but imho a good design is a testable one. (This does not necessarily mean I'm
that good of a designer yet...)

Automate, automate, automate.

------
pyrotechnick
I'm convinced the moderator of this blog has actually fallen in love with
Notch. They won't approve my comment because it's derogatory toward Notch's
dev practices...

~~~
Markku
Especially in this Ludum Dare case, development practices and implementation
details do not matter as much as results. And he did get results; that's why
it was interesting. I'd like to hear what you think was wrong, and why. This
was his show of productivity and results. Other people have other ways, and
other circumstances can be completely different. And should be different.

