

Obviously Correct: implications for language design - samstokes
http://blog.ezyang.com/2011/10/obviously-correct/

======
JoshTriplett
The article suggests that static typing precludes duck typing. However, duck
typing can work just fine at compile time. Duck typing simply means typing
based on the ability to use particular methods or access particular
properties. That just means the type system needs sufficient expressiveness to
say that a function takes anything for which a particular method or property
exists.

Haskell's typeclasses, for instance, support a form of duck typing. Go can do
something similar with its interfaces.

Duck typing does not imply dynamic runtime typing; static duck typing seems
just as useful.
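As a sketch of what compile-time duck typing looks like, here is a small Go example (the `Quacker`/`describe` names are made up for illustration). Any type with the right method satisfies the interface, with no explicit "implements" declaration, and the check happens at compile time:

```go
package main

import "fmt"

// Quacker is satisfied by any type with a Quack() string method --
// no explicit "implements" declaration is needed.
type Quacker interface {
	Quack() string
}

type Duck struct{}

func (Duck) Quack() string { return "quack" }

type Robot struct{}

func (Robot) Quack() string { return "beep" }

// describe accepts anything that structurally satisfies Quacker;
// passing a type without Quack() is a compile error, not a runtime one.
func describe(q Quacker) string {
	return "says " + q.Quack()
}

func main() {
	fmt.Println(describe(Duck{}))  // says quack
	fmt.Println(describe(Robot{})) // says beep
}
```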

~~~
m0th87
also known as structural typing:
<http://en.wikipedia.org/wiki/Structural_type_system>

~~~
JoshTriplett
Exactly the term I wanted, thanks! Just didn't manage to remember it.

Structural typing allows the creation of functions which only require their
inputs to have the specific methods or properties needed to perform the given
function, rather than an exact type. Effectively, this creates a minimal
supertype for each function, such that the function can accept any subtype of
that minimal supertype.

~~~
beza1e1
The common bias against structural typing is that there might be types with
the same structure but subtly different semantics, e.g. coordinate systems.
Same structural type, different semantics:

    
    
      struct cartesian_coords { int x; int y; }
      struct polar_coords     { int r; int theta; }

~~~
JoshTriplett
Most of the structural typing systems I've seen count the names of fields as
part of the type, not just the types of the fields. So, those two structs
would have incompatible types in such a system.

------
einhverfr
It seems to me that much software can use a layered approach to get the best
out of both systems. For example, in LedgerSMB, we use a highly engineered
database on PostgreSQL, which means we can make the code "obviously correct"
with regard to a number of invalid data bugs. We can then assign security to
API operations declaratively using SQL roles. This makes the code obviously
correct for certain classes of security issues and helps to mitigate other
classes of security issues (for example SQL injection).

One of the key things to understand, though, is that "obviously correct" is
always relative. There are, in fact, ways to circumvent these measures, and it
is important to understand where the problem areas lie. This means that most
of the code may be "obviously correct" and some may be a bit less so. Knowing
where that less-obvious code is allows one to spend review time there.

The application above is written in Perl. However, we are moving to Moose
because it gives us a better ability to declare data constraints.

I guess I'd call it "declaratively correct" instead of "obviously correct."

------
michaelfeathers
Nice set of qualities that make aspects of code obviously correct, but he
forgot tests.

Like static typing, they are not perfect. They only catch certain classes of
errors. But, when you have them, you have the assurance that the code behaves
as it does in the test when the same conditions occur in production.

~~~
ezyang
Disagree. Intuitively, the difference is that an "obviously correct"
methodology changes the _way_ you write code, whereas tests don't. Tests have
to be run; the mere existence of a test doesn't mean your code is correct.
But, crucially, the things a test checks don't generalize.

This isn't a perfect dividing line. If you add hooks for DI because you need
to inject a mock, arguably that's changing the way you code (this is the
schtick of TDD, after all!) Arguably, you have to "run the typechecker" in
order to see if you actually have well typed code, and for a sufficiently
powerful type checker this might be like running a program anyway. But
hopefully the basic gist of the argument is there.

~~~
Locke1689
_But, crucially, the things a test checks don't generalize._

I think this is the most important thing to note. When I say

    
    
        Example test_fac1 : fac 3 = 6.
        Proof. reflexivity. Qed.
    

This is completely different from the statement

    
    
        Theorem eq_fac : forall (n : nat),
           fac n = prod 1 n.
    

Tests are simply a mathematical proof of a relation on a specific subset of
the domain and codomain of the function. Types, in the broad sense, are
constrained proofs over the properties of all elements in the domain and
codomain.

Types are useful -- they're perhaps the greatest success of formality in
software engineering ever. However, they're not complete and completeness is
_hard_. Types succeed because they provide a lot of benefit for very little
pain (the constraints you discuss in your post). Your assertion is simply
reinforced when we look at the other classes of correctness that we could
guarantee. We could write fixpoint definitions for all functions which require
structural recursion. We could push all side effects to typed lambda calculus
sugar.

Fundamentally, we could _prove_ our code correct. But that's a pain in the
ass. And it _still_ doesn't work. Tests complement proofs -- they ensure that
our own conception of the definition fits our expectations. We could do all
the work to formally prove our conjectures, but if our definitions were wrong
the conclusions would be useless.

What I'm saying is that all of these things should work together -- and the
result is a balancing act. We want additional guarantees that our programs are
correct, but it requires us to program differently because only certain types
of programs have the properties that we wish to exploit. The benefit is that
we _do_ have this additional information, so I agree that we should _use it_.

------
chalst
>What all of these “obviously correct” methodologies ask you do is to
sacrifice varying degrees of expressiveness at their altar.

In the case of static types, this is simply untrue. Embedding dynamic types
into static type systems is a triviality. With compiled code, there is
generally a substantial speedup at runtime associated with a good static type
scheme.

See, e.g.,
<http://suereth.blogspot.com/2010/07/monkey-patching-duck-typing-and-type.html>

~~~
ezyang
I think this is missing the point. On a practical level, most of these
"obviously correct" methodologies have escape hatches: i.e. FFIs for memory
safety. On a theoretical level, by embedding dynamic types in a statically
typed language, you have a stratification where code written in the host
typing is "clear of typing bugs", but code written in the embedded typing is
less sound. You _need_ to distinguish between these two layers.

~~~
chalst
The point made in the OP is that expressiveness is sacrificed by having these
safety features. This is simply not the case with static typing, because of
the ease with which dynamic typing is embedded in a static type system.

"code written in the embedded typing is less sound" - Well, the point may be
that static typing did not bring quite the safety some thought it did: it only
promised that these functions, if they do return, return a value of a given
type. The ability to model dynamic typing inside static typing shows just how
weak this guarantee can be. More generally, it shows there is no
expressiveness sacrifice.

~~~
ezyang
I think the crux of the issue here is "triviality."

I claim that it is nontrivial to embed dynamic types in a statically typed
language. Usually this is due to the need to add lots of explicit coercions in
order to interface with all of the code that is actually statically typed. I
can embed a dynamically typed programming language in my statically typed
language, with its own libraries, but that's hardly "trivial".
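A minimal sketch of those explicit coercions, assuming a hypothetical `Dynamic` wrapper in Go: every crossing from the embedded dynamic layer back into statically typed host code needs a runtime check that can fail.

```go
package main

import (
	"errors"
	"fmt"
)

// Dynamic is a toy "dynamically typed" value: an any plus runtime checks.
// This is an illustrative embedding, not a standard library type.
type Dynamic struct{ v any }

// AsInt is the explicit coercion back into the static layer; unlike the
// host type system, it can fail at runtime.
func (d Dynamic) AsInt() (int, error) {
	n, ok := d.v.(int)
	if !ok {
		return 0, errors.New("not an int")
	}
	return n, nil
}

// addStatic lives in the statically typed "host" layer.
func addStatic(a, b int) int { return a + b }

func main() {
	x, y := Dynamic{3}, Dynamic{4}
	// Each call into the host layer needs a coercion and an error check.
	a, err1 := x.AsInt()
	b, err2 := y.AsInt()
	if err1 != nil || err2 != nil {
		panic("type error at runtime")
	}
	fmt.Println(addStatic(a, b)) // 7
}
```

Even this tiny example shows the stratification: the host layer stays "clear of typing bugs," while the `Dynamic` layer reintroduces runtime type errors at every boundary.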

Difficulty matters. Otherwise I can claim that I can bypass memory safety by
writing a simulator for x86 in a memory safe language, and, well, the code
that runs might violate memory safety (in some alternate sense), and thus,
there is no expressiveness sacrifice!

~~~
chalst
"lots of explicit coercions" - even without recourse to generics, the overhead
tends to be small. Really, try it. There's no need to write interpreters, let
alone architecture simulators.

I tend to think the attraction of dynamic typing is the ability to run
incomplete or ill-typed programs. If we have type inference, there is no real
advantage in terms of conciseness.

------
rdtsc
> CPython was never explicitly engineered for performance, whereas the JVM had
> decades of work poured into it.

* Often IO latency and throughput are such that fast CPU processing doesn't really matter. IO is the bottleneck, and Python is often fast enough.

* Look at the speed and improvements in PyPy (keeping in mind their team size and budget) in just the last few years vs the JVM's man-hours over the years. Just because decades of performance work were poured into something doesn't make it proportionally better.

~~~
ezyang
Interestingly enough, I was talking to Quora engineers about the problems they
were facing scaling their website, and there was no doubt about it: CPU was
one of the big problems. Yes, yes, I know, don't overengineer your systems in
the beginning, but I really get the sense that if software needs to scale, it
is going to need to scale in all dimensions, including CPU.

I'll note that the "JVM has had more hours poured into it" argument is meant
to give PyPy the benefit of the doubt!

------
tmcw
Quora's move from Python to Scala is a 'clear indicator' of the 'fact' that
type safety is better? That's just a weak and clearly incomplete line of
reasoning.

~~~
ezyang
Yeah, my pre-readers jumped on that too.

