I used to be a huge dynamic languages fan, with Python being my favorite language over the majority of my career. Then I worked on a large Python project of ~100k LOC with a team of ten. That's when I realized that writing code faster isn't the problem. Reading it is, making changes to code someone else wrote is, and refactoring across dozens of modules is. Static languages help a lot with all three. I still love dynamic languages for small tasks, but I'd rather use a static language like Go which still keeps much of the dynamic feel, and helps catch my mistakes at compile time. I just set a watcher on the project directory to recompile automatically in a terminal on another screen (actually to run the unit tests, which includes compiling). That's constant feedback, with edit-compile-test turnaround comparable to a dynamic language. But the best part is the unit tests. They run so much faster with Go that I don't get distracted waiting for them to finish, and that keeps me in the zone and much more productive.
I disagree; I warmed to dynamic typing specifically because of the experience of working with 2-3 million lines of C++ code. The problem with static typing at that size is that you run into very frequent impedance mismatches. A lot of the code you write ends up being about translating from one type to another that is almost identical.
You also get dependency problems you don't have with dynamic languages because you aren't required to import type definitions to use an object from somewhere else. Your code easily gets tightly coupled, and you end up with a nightmare of compile times.
The couplings get so intricate over time that it becomes an almost intractable problem to figure out how to take it apart.
That is much easier in a dynamic language. You can tackle problems far more independently of each other because dynamic languages don't impose as many dependencies between types.
> You also get dependency problems you don't have with dynamic languages because you aren't required to import type definitions to use an object from somewhere else
The dependency is still there, you are using that object in your code.
Dynamically typed languages give you the worst of both worlds: the same dependencies and zero discoverability.
> You can tackle problems far more independently of each other because dynamic languages don't impose as many dependencies between types.
This doesn't make sense. The code is exactly the same except that you don't have type annotations with dynamically typed languages. In other words, you can't tell exactly how the code you're looking at interacts with the rest of the code base.
Which makes it impossible to untangle as it grows, and impossible for automated tools to figure out what these dependencies are.
> The code is exactly the same except that you don't have type annotations with dynamically typed languages.
No they are not the same.
Here's some key differences:
- If at runtime an execution doesn't even reach the code, then it doesn't matter at all for that particular execution whether the types were "correct". With static typing you need to get it right for all possible executions before you can even compile, regardless of how unlikely or unimportant those executions are. In practice, having to get it all right for all executions before you can even run one is a massive productivity loss in a lot of scenarios.
- Not only might a "type" of something in a dynamic language not be expressible statically (at least in the majority of static languages) in the first place, dynamic code is never dependent on any particular type. You might get an unintended "type" in and the code can still work. There is a dependency on an infinite number of inexpressible "types".
You are completely ignoring the point of dynamic typing and you are using it as if it were static typing without annotations, by your own admission. But that's your problem. Similarly I could also use statically typed language as if it were dynamic and then claim that static typing is exactly the same as dynamic except more verbose.
Not possible with static typing and type inference (which is static typing without annotation) but done all the time with dynamic typing. (Though still not a good idea, in my mind, but to each their own.)
So I just told you about a few things that are physically impossible in static languages; therefore a dynamic language cannot be a subset of a static language. For example, you first need to prove that it's possible to tell a static language "for now, for this execution, I don't care about these 3150935031 trillion code paths being correct, just these 123 code paths, so just compile and do it". Without that you are just trolling.
I made this point in an earlier comment, but I'm a firm believer that if the speed at which you can physically type code is the bottleneck, then you are going to eventually have much bigger problems. Hastily written code will require more time to fix than it will cost you to slow down and think about a good design, regardless of language.
The effort of writing (typing) code is (at least) linearly proportional to the effort of reading said code (including the time spent thinking in your head before you typed it in). And by "effort" I'm strictly referring to the physical aspect of reading the text, disregarding the comprehension part.
Since many advocates of static typing emphasize the importance of reading/refactoring in large code bases, this would be a huge factor as well.
If you can code quickly, then the effort required to refactor previously written bad code is lower. I.e. it is not that you can type faster that matters; it is that you can fit more change iterations into the same time frame. And many times multiple change iterations are required because you can't see beforehand whether a particular refactoring will be a net positive or negative.
Strongly typed languages beat dynamic languages hands down for refactoring. The ability to know immediately what broke without having to run all your code paths dominates any time savings gained by typing faster.
I'm a fan of strongly typed languages with good type inference (and therefore less typing), but even in verbose languages like Java this is true about refactoring.
I agree, I think there are many more issues to consider. However, the OP seemed to be focusing on that particular argument, hence my comment. A few examples:
"The results were that the people using the dynamic version of the language got stuff done much quicker"
"it took less time to find those errors [in the statically typed language] than it did to write the type safe code in the first place [in the dynamically typed language]."
"[programs written in]" dynamic languages took significantly less effort to create (less time)"
I mean, unless you live under a rock you have to realize that there are a lot of complex non-trivial apps that run in languages like Python, JavaScript, and Ruby.
Sure, but that doesn't prove anything besides the fact you can write apps in any language.
The question is more: would writing these apps in a statically typed languages have been "better" along certain axes (easier to maintain, easier to write, easier to evolve, easier to debug, faster to ship, easier to recruit for, easier to ramp up newcomers on the team, etc...).
You said that it "quickly crumbles". What exactly does that mean? Sounds like you are softening your stance now, so why even say something so definitive?
Because automatic refactoring tools are not available for dynamically typed languages, programmers are more hesitant to refactor. So the technical debt accumulates and the code gets worse because of that death spiral.
That's what I mean when I said that dynamically typed languages crumble under large code bases.
> I'd rather use a static language like Go which still keeps much of the dynamic feel, and helps catch my mistakes at compile time.
Right on. It's the "dynamic feel" which is really the key. It's the ability to iterate quickly, the access to anything in the runtime, and the ease of building software tools. It's an environment that makes you powerful by enabling everything and staying out of the way. Whether that's provided by "dynamic" typing is just an implementation detail.
Unused imports and variables being errors is pretty squarely counter to that philosophy.
I don't think Go represents the "static typing, but lenient" philosophy particularly better than C#, for example, does. Go seems to focus on simple typing, not lenient typing. There's a big difference between the two.
I am never bothered by unused variables when I'm doing exploratory programming in Go because if the variable is still unused, I haven't written enough code to get to the point of running it for testing.
I used to be bothered by the import thing, but then I installed goimports, and now Go is actually better than other languages for imports. Just write "foo.Bar" in your code and goimports will automatically add "baz.io/foo/" to the imports section on save.
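Roughly what that looks like in practice, using the same hypothetical "baz.io/foo" path (a sketch, assuming the package is resolvable locally):

    package main

    // Before saving, the file just uses foo.Bar with no import line.
    // On save, goimports resolves the identifier and inserts the import itself:
    import "baz.io/foo"

    func main() {
        foo.Bar()
    }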
How about optional type systems? I feel like in Go I have to use something (like an interface) to achieve the feel of dynamic typing but it feels much worse than having optional types or no types at all.
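(Roughly the kind of thing I mean, sketched below: an empty interface plus a type switch, which is the usual way to get dynamic-ish behavior in Go.)

    package main

    import "fmt"

    // describe accepts any value and recovers its concrete type at runtime.
    func describe(v interface{}) string {
        switch x := v.(type) {
        case int:
            return fmt.Sprintf("int: %d", x)
        case string:
            return fmt.Sprintf("string: %q", x)
        default:
            return fmt.Sprintf("something else: %v", x)
        }
    }

    func main() {
        fmt.Println(describe(42))
        fmt.Println(describe("hello"))
    }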
100k lines of Python? I find it hard to believe that this project is properly organized. Rarely do you see 100k lines of Python solving a single problem; instead, a project will be divided into many components, some of which can be reused in other projects. There shouldn't be any difference between working on 1k-line and 1-million-line code bases as long as the code is decoupled. Testing each part in isolation helps a lot (as opposed to running tests for a big monolithic ball of mud), as it ensures contributors are not introducing crazy dependencies.
I'm not against types at all, but don't think they solve problems related to big code bases.
Is there ever a project of that size that's properly organized? We did our best under the constraints we had. There was always a long wishlist of things we would change if we had the time, and some of them we were able to do.
> But the best part is the unit tests. They run so much faster with Go that I don't get distracted waiting for them to finish, and that keeps me in the zone and much more productive.
You're describing a problem specific to dynamic languages with slow implementations (Ruby, Python). A language like JS running on an optimized multi-tier JIT such as V8 will essentially run your unit tests just as fast.
Is that really the case for unit tests? (This will be different for an actual production run.) Apart from some things shared across the whole codebase (loggers, db access, etc.), most of the code is unlikely to ever get "hot" enough for the optimised JIT to kick in.
You'd think so, but in reality if you take something like a big Java codebase and disable the compilers, then run the unit tests, they are always waaaaaay slower.
The "shared across the whole codebase" stuff tends to be much bigger than you'd think. For instance the collections libraries, string manipulation, any core code or data structures in your own codebase, etc, all get compiled very fast. You don't have to run something very often for it to become eligible for compilation.
While that's true about the usage, I'm not sure it applies to that comparison. All primitive collections, string manipulation, simple ops, etc. are already written in a non-interpreted language and compiled AOT in Python. I expect they're also native in V8 and Ruby, although I don't know those codebases.
I'm not sure about "core code". I mean, the context is "unit tests", not "functional tests". If you have hot core code in your unit tests, then something smells... Hot helpers, and cross-cutting utilities, sure, and I mentioned them.
If the code isn't hot enough to leave the interpreter into tier 1, then the engine has judged that interpretation is likely to be faster than non-optimized compilation plus machine code execution. An AOT compiler with no interpreter (such as Go) only has the latter option. So, assuming the JIT is tuned properly, the resulting turnaround time will be faster than AOT.
But your comment wasn't about Go, it was about "slow dynamic" (Python, Ruby) vs V8. I'm saying that's just "non-optimized" vs "very likely non-optimized", so a multi-tier JIT shouldn't affect much of the unit test runs.
Even following that logic about JIT vs AOT in a single run, the Python vs V8 equivalent would be: non-optimized code vs non-optimized code that still has to keep track of execution counters, therefore the one without counters should be faster.
Regarding unittesting Go vs V8... I'd really like to see a big project where it could be compared - unfortunately that's not realistic.
In the case of, say, CPython vs. V8, it may well be a tossup. But I highly suspect that V8 would usually win even for unit tests because it'd be able to optimize functions that many unit tests share. For example, it could optimize the test harness itself (important when it's a nontrivial test harness like Cucumber or whatever), it could optimize the string search/regex functions, and so on.
Several flavours of Lisp have been used for building very big pieces of software (e.g. Emacs, Orbitz). Same for some Erlang products. I think this depends on the architecture, documentation, etc.
Dynamic typing allows for loose protocols, which is sometimes very advantageous when you don't know exactly the software you are building. PG explains this much better.
Nonetheless, I'd love to see more optional static typing systems on top of Lisp.
You cannot program effectively unless you understand the syntax and semantics of the data representations you are using, and when you do understand those things, perhaps you are unlikely to make type errors in the first place.
I meant the domain of the problem, so I was saying dynamic typing is advantageous for exploratory programming and modeling. Whereas static typing is easier to use if you have pre-planned all your programs.
I wasn't intending to disagree with you; my intent was to suggest a possible explanation for why static typing may be less helpful in practice than some people (myself included) would have guessed. To put it another way, programmers have to be aware of their data's types regardless of whether they are using a statically or dynamically typed language. If this is correct, then we can assume that the developers of successful large projects have this ability, regardless of the language used in that project.
From my experience, a statically-typed language also lets me think upfront about the structure of the data that a program will manipulate. Getting your data structured properly from the start would result in straightforward algorithms and less future refactoring, which is certainly more efficient than throwing away and rewriting code every few months.
Absolutely this. Most programs are really all about data. I build my data structures first. Always. If someone can't tell me approximately what data I'm working with, I won't work on that project. Data first. Then the data drives an interface. When you have those two pieces of the puzzle figured out, then we can start designing the code that solves the problems needed to get the data from the source to the destination.
I'm sure there are domains I'm not experienced in where that Dev model doesn't work. Like games or embedded systems and stuff. But for building web applications or statistical platforms, that's my workflow because if you don't get the data right, you're fucked.
I think the data-first approach sometimes leads to a problem of: "We saw we could, but never asked if we should."
The particular scenario I have in mind relates to a company with an internal system for managing work from proposal to invoice. There was a lot of data there, but it was never exactly clear what parts are used by what processes, what rules are invariant, and aspects of it existed purely for historical reasons. Sometimes you could implement a feature for Important Manager #1 only to discover that Important Manager #2 doesn't want it to apply to his employees. Then you'd put in special-case-code, and the dance continued.
In a situation like that, I'd far rather focus first on what the goal and process should be, and then use that occasion to refactor the stored data to accurately represent the evolving model for "how we do business".
I think you may be overestimating the quality of data I'm griping about. Some of the tables are mostly-unused metadata about optional key-value pairs stored in other tables. Some of the key-value pairs are queried constantly but still haven't become "real columns" because nobody wants to rock the boat.
So instead there are dozens of extra joins and queries going on, and looking at CREATE TABLE definitions helps you find only about 60% of the data-points you might be interested in. In some places entities are related not by a link-table or foreign-key, but by having a similar prefix in a text-value. (So WHERE clauses contain substring checks.)
I believe one of the many contributing causes is that people tried to store their data ASAP, before they knew exactly how they wanted to use it, and then each time afterwards they assure themselves the data is technically available without stopping to consider whether it's available in the right way.
It has nothing to do with upfront waterfall design specifically. Making choices about how you represent information is an integral and unavoidable part of algorithm design at any level of detail.
There are no projects for which a certain amount of thinking before you write is ill-advised.
Indeed, and personally I find that having the types around also makes it much safer to do the inevitable rounds of refactoring once you figure out what you're really building. (Especially in a language like Haskell where most of the type information is actually inferred from context. It requires absolutely minimal type annotation.)
EDIT: Maybe it's just my bias, but it seems to me that most of the "dynamic" crowd just haven't tried a truly "agile"[1] statically typed language like Haskell or O'Caml.
[1] Hey, if they want to abuse the connotations of out-of-context words to describe the paradigm, so can I. :)
This is obviously not true. Many (if not most) projects in the Node.js community, for example, are made up of hundreds of thousands of lines of dynamically typed code (mostly npm modules written by other people), and these projects are usually very easy to manage (many Node.js developers won't even realize that they have so much code running behind the scenes because third-party Node.js modules are taken for granted). The secret is breaking everything up into well-defined modules.
I suppose the same can be said about Python and other dynamically typed languages.
Sure, and Windows adds ~40 million lines of code to most .NET projects. For comparison, just the Linux kernel is ~12M lines of code.
It's not that dynamic languages force bad designs, it's that they promote bad designs. Breaking a project into lots of tiny chunks is one way to manage complexity, but it also adds a lot of overhead, as components can't assume much about the other side of their API. A good tradeoff for an OS, not that great for an FPS.
That said, there are times where dynamic languages allow you to get by with far less code.
> I'd rather use a static language like Go which still keeps much of the dynamic feel
If I'm going to use a language for the purpose of having a type system that's supposed to keep me out of trouble, it's not going to be something like Go, in which I'm going to spend time fighting (or end-running around) the type system where it's not expressive/flexible enough, but I'm not going to get much in the way of useful help except on the most basic level.
There are diminishing returns with the type system. Having a super expressive/complex type system like Haskell can be good, but it also adds a lot to the overhead in keeping the type system happy. It can distract you from actually solving the problem at hand. I think Go strikes a happy balance between the two extremes of dynamic and super-powerful type systems.
I may be biased, but seriously? Go is like on the very very super-primitive end of that scale. Only something like C is more primitive. Java has a more powerful type system! (Standard) Haskell isn't even near the most powerful extreme.
Even the person who gave the talk would have agreed with you. That was his gut feeling as well. Then he went and looked at the actual research. The whole point of the talk is that the actual evidence doesn't back up your or his gut feelings.
This is why I find the gradual typing paradigm so interesting! You begin by writing your program in an untyped language, allowing for easier iteration and more exploratory programming. As your codebase grows, you transition modules over to static typing.
Agreed 100%. This is actually the reason that I have been picking up go lately, you get some of the more lenient syntax of python, with static typing guarantees.
I'm surprised that people only seem to be considering Go as an upgrade to static typing, from languages like Python or Ruby.
Kotlin has a very lightweight syntax with lots of type inference as well, but it also has many features that people often seem to miss in Go. It also has transparent access to the huge number of Java libraries, which is a big benefit. It seems like a natural path for people who liked dynamic languages. It's much less well known, of course.
Well there are a few reasons that I decided to pick up go rather than something like kotlin.
1. No JVM needed. This may not be a benefit if you need to access Java libraries; I just like that you can compile a binary.
2. Great concurrency support
3. Seems to be more popular. This may sound like a poor reason to choose a language, but if it's more popular there will be better documentation, libraries, IDE support, etc.
That's usually a good heuristic. But Kotlin has better IDEs, tools, debuggers, profilers etc than Go. Partly because it's made by a developer tools company (they make many of the top IDEs), and partly because it can leverage all the tools made for Java.
WRT not needing a JVM, I don't perceive much advantage to avoiding one unless you need small downloads, i.e. desktop apps. But Go isn't really suitable for that use case anyway.
I've found that if you have strict static explicit extensible typing and a lot of patience, you can get very nice results. It takes a long time to get the ball rolling if you're writing C++ with a lot of enums and structs (and no pointers or templates, except those the STL uses internally); but once you've built your infrastructure, everything else becomes much easier.
I think the pro-type inference crowd really misses this point.
Types have two jobs: improve performance and force documentation of expectations. Of the two jobs, the latter is more important because you can hack the performance issue with JIT and whatnot.
Extreme type inference risks losing the whole point of static typing. If you make a function, it should have its types documented, period. Either the type expectations a function has are trivial, in which case you may as well just write them out, or they are complicated, in which case you absolutely need to document them. Type inference is great for function invocation and okay as typeclass/generic system, but in a large project, you should never create a function or method without specifying the types you expect to handle.
Having worked with (and sometimes implemented) different flavors of both static and dynamic languages over the course of my 20+ years in programming, I still feel a little guilty about not coming down on one extreme side like most of my colleagues.
I do like the fact that static typing gives me more confidence when refactoring things, and I do appreciate the speedier execution, but at the same time, when programming in a dynamic language I only rarely run into bugs that could have been avoided by a (moderately strict) type system.
To me, the best feature of dynamically typed languages is dynamic container objects (JS object literals, PHP arrays, Lua tables, Lisp/Scheme lists), because instead of having every single container be a purpose-built one-off abstraction that is often not even completely inspectable, much less extendable, I can just have one standard structure for any data.
In the end though, both paradigms are tools to be used appropriately. While a great many problems can be solved with either of them, they clearly each have strengths and weaknesses to be understood within the context of the program's purpose.
I'm in the same boat. For most of my career I've worked with dynamically typed languages, but I recently spent a year writing iOS apps with Swift. For me, the biggest win with statically typed languages isn't necessarily catching those obvious bugs (which, as you said, are fairly easy to avoid with discipline), but rather the increased discoverability. I can quickly glance at the type definition for a function/method and generally get an idea of how to use it, whereas the equivalent in dynamic languages usually involves reading documentation or looking at examples. And when this is integrated into your development environment like in the JetBrains products, Xcode, or Visual Studio, it's all seamless in a way that's not really possible with a dynamically typed language.
Of course, the other side of it is that statically typed languages tend to be less expressive, which makes things like JSON processing needlessly verbose. So it goes both ways.
I _like_ looking at header files. They're a good, compiler-mandated summary of what all this code is _for_. Figuring out what Python code does is always such a pain.
(I do, however, develop mostly in C++, so there's probably all kinds of workflow crap I don't know about in Python)
I still regret the death of Strongtalk --- a gradually typed Smalltalk variant.
It gave you dynamic types, when you wanted them; and Smalltalk is pretty much the most dynamic of dynamically typed languages. And it gave you static types, when you want them; including proper parametric types and polymorphism. So you got the best of both worlds.
Some of the good statically typed languages give you type-safe inspectable one-off containers with minimal syntax: tuples. Combined with type inference, this tool is pretty powerful at passing bunches of disparate data around while preserving type safety.
Unfortunately many industrial languages, like Java, C++, or Go, don't have this feature. .NET has them, though.
Even in Python I greatly appreciate `collections.namedtuple` that helps me create cheap ad-hoc containers while preserving excellent readability of the code. The latter helps avoid bugs greatly: e.g. it is much harder to write `foo.bar` by mistake than to put a wrong index in `foo[2]`.
It's been a while since I last wrote C# code, does it really allow for heterogeneous value types? If so, that's cool. However, from my recollection, the C# API itself is all about opaque objects and behaviors. That's not a criticism of the paradigm, and I remember liking the C# standard libs and structures a lot better than Java's, but it's not something I would equate to generic heterogeneous containers in dynamic languages. It's not in question whether you can replicate some of that in .NET, it's more about whether that's a core language idiom or not.
Edit: `collections.namedtuple` is Python, mixed that up.
It's hard to tell exactly what GP is referring to. It could be the parameterized Tuple type, which has built-in implementations for up to 8 fields. But those will all just be named Item1, Item2, etc.
The conversation makes me lean more towards C# anonymous types, which are just syntactic sugar for creating classes with public, read-only properties. But they feel like creating named, statically-typed tuples.
I must admit, the idea of an un-specialised tuple with 15 elements is pretty damn scary. If I saw that in a codebase it'd be a real WTF moment. After three or four fields you should really be naming them, and if your language makes defining small classes painful, it's better to fix that with a Scala-type case class syntax (or data classes in Kotlin).
Those parameterized Tuple types are from the olden days before LINQ. Now C# has anonymous types, which can handily replace them. In the end, these are compiled down to classes with read-only properties, so it's almost no pain to implement such classes.
    var v = new { Amount = 108, Message = "Hello" };
    Console.WriteLine(v.Amount + v.Message);
The problem with anonymous types is any time you have to come back up from any inner scope that you are inside of. Fortunately, stuff like ReSharper makes it easy to transform an anonymous class into a real one, but then you have to battle an explosion of DTO objects, unless you are ruthless about consolidating similar but not-quite-the-same data together.
That being said, anonymous types are a godsend for doing EF and LINQ stuff, where the scope is limited. Beats the hell out of the old ADO.NET stuff. Also for easily creating garbage to get serialized for JSON returns to front-end web code.
Sure; the side effect of being "anonymous". (Which really just means that you can't write code to address it directly; it's a perfectly normal .NET class under the hood.) But you hit it exactly that they address that transitional pain point between addressable object and serialization format.
I tend to specifically not pare down data objects that are "similar-but-different", as I usually find it's nothing but a pain point later when they evolve to look more "different-but-with-a-couple-similarities". Easier overall to just call different things different names from the start.
I'm assuming you're talking about heterogeneous collections, which, while totally possible with powerful static type systems, I'd consider not to be a great idea. If even I don't know what's in my collections now (if I knew I could model it in the type system – it takes 30 seconds at most), how on Earth is my successor going to know in six months' time? Heterogeneous collections are a maintenance nightmare. I have only ever seen them used in a way where they should be refactored first thing in the morning.
Ultimately, that's what makes HList / HMap / Records so interesting to my mind - heterogeneously typed containers without the manually-generated boilerplate.
> To me, the best feature of dynamically typed languages is dynamic container objects (JS object literals, PHP arrays, Lua tables, Lisp/Scheme lists), because instead of having every single container be a purpose-built one-off abstraction that is often not even completely inspectable, much less extendable, I can just have one standard structure for any data.
This is perfectly possible in static languages as well.
It depends on the language and it depends on how it's idiomatically used. In theory, you can make capability claims about most languages interchangeably, but the question is if the language really affords it or not. For example, I would say that Rust's enum gives you a lot of this capability easily, whereas C/C++ usually does not - unions and explicit casting not withstanding. But I do fully acknowledge you could make that claim for almost any language, and be technically correct.
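As a rough Go illustration of the "one structure for any data" idea in a static language (a sketch, using maps of empty interfaces), with the usual trade-off that getting values back out requires assertions:

    package main

    import "fmt"

    func main() {
        // An ad-hoc, fully inspectable container, loosely like a JS object literal.
        user := map[string]interface{}{
            "name": "ada",
            "tags": []interface{}{"admin", "ops"},
        }
        // The trade-off: pulling concrete values back out needs type assertions.
        if name, ok := user["name"].(string); ok {
            fmt.Println("name:", name)
        }
        for key := range user {
            fmt.Println("has key:", key)
        }
    }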
Dan Luu covers a ton of literature here[0]. The takeaway is that all of the major studies are flawed in pretty serious ways, enough that you can't draw strong conclusions.
The study the OP references (from the original talk) seems suspect to me; in particular, the fact that the researcher created both a static and a dynamic version of his own language. Generalizations about the time required to "get stuff done" in all (or even most) statically/dynamically typed languages feel unwarranted. And then there are statements like this:
"he found [that] it took less time to find [type] errors than it did to write the type safe code in the first place."
Maybe that's an indication that his homegrown statically typed language was a crappy language, rather than exposing some fundamental truth about statically typed languages in general?
The first thing I wondered about on seeing that claim was what sort of error reporting was implemented in the type checker and the (dynamic) language runtime. Comparison against C for "what does static typing offer" also feels like something of a strawman.
Furthermore, if the programs were small, it is likely that their authors would have the whole thing in their mental 'working memory', and therefore likely to catch type errors as they made them.
"he found [that] it took less time to find [type] errors than it did to write the type safe code in the first place."
But type errors in dynamically-typed programs are found by testing, and testing time grows much faster than linearly with respect to program size, while the amount of type information grows approximately linearly (or perhaps less than linearly, if Halstead's relationship between program length and the number of distinct operators and operands holds.)
> Maybe that's an indication that his homegrown [...] language was a crappy language, rather than exposing some fundamental truth about [...] languages in general?
Can you come around and make this same argument any time someone makes a blanket statement about static languages being unconditionally better than dynamic ones?
I guess what I was hoping to emphasize is that the number of variables involved in trying to quantify something like this make it difficult (probably impossible) to have an accurate comparison -- even more so when attempting to generalize the results of those comparisons to larger categories of programming languages.
I haven't read that paper yet, but this sort of test seems like it should have the least noise in the results. I'm also surprised that the solutions took longer in the typed language, but it really depends on how much cognitive overhead the language imposes. For instance, modern typed languages are moving to implicit types for local variables, while the alternative can be a lot of pointless typing (on the keyboard) [though there are debates to be had about making some of those types explicit to aid readability]. If the languages really are similar, the only types you have to specify are for member variables and functions, right?
I agree that studies of the form, "How long does it take to implement some app A in language X" are asking the wrong question.
I thought we all knew and agreed that the difficult part of most applications is maintenance. How does static or non-static affect a new hire's ability to get up to speed and support an app? How does it affect long-term maintenance, refactoring, adding features, and fixing bugs?
I'm not entirely sure how to answer that question, but I'm not convinced that controlled studies will point to an answer.
It might be interesting to look at corporate metadata and try to find out how much companies spend over the lifecycle of an application on the whole development and maintenance of a project. See if you can find projects with enough similarities to warrant a comparison and compare a Java app with a Python app. Probably impossible, honestly.
I don't have a strong opinion on the matter. In my experience, the design and leadership of a project have a far bigger impact on the ease of development and maintenance than the type system. I've seen very large but very elegant C# programs that were a pleasure to work on and maintain. I've seen Python web systems that are an absolute nightmare to work on and reason about.
Probably most people here have experiences going both ways with different type systems. I blame the developer for god-awful messes and praise the developer when I see clean, sensible code. I never blame the language. Except for JavaScript. I will always blame JavaScript. Hell, JavaScript is probably what's really causing global warming.
In general, a clear head and some discipline count more for code quality than any type system, and the lack of either definitely does more damage than any type system can.
But I'd go along with the idea that statically typed languages force more discipline on younger developers who haven't yet got the experience to know what clear-headedness even means.
Working in C# was a great experience for me coming from a self taught Python background, no question about it. It also improved and helped me structure my Python code, and becoming better at designing good Python code in turn helped me write better C#.
There can be a virtuous cycle in working with different languages and type systems.
But we're still gonna grab out the most universally reviled and attacked paper on the subject over and over again because it says what a lot of dynamic typing evangelists really want to hear.
This is why I think gradual typing with structural types is the Right Way, and will be what all modern programming languages move towards (like how we don't have to worry about manual memory management anymore).
JavaScript with Flow, or TypeScript, is a great example of this. Perl 6 uses gradual typing.
The key thing is that there are times where you want the inflexibility of static typing, and there are times where you want the benefits of dynamic typing. Structural types also remove dependencies because the function defines the structure of the type it expects, and not a specific reference.
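Go's implicitly satisfied interfaces give a rough feel for the structural part, even though Go isn't gradually typed: the function declares only the shape it needs, and any type with matching methods satisfies it without ever naming the interface. A minimal sketch:

    package main

    import "fmt"

    // Greeter is declared by the consumer and names only the shape it needs.
    type Greeter interface {
        Name() string
    }

    func Greet(g Greeter) string {
        return "hello, " + g.Name()
    }

    // User never references Greeter, yet satisfies it structurally.
    type User struct{ name string }

    func (u User) Name() string { return u.name }

    func main() {
        fmt.Println(Greet(User{name: "ada"}))
    }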
Also, I think GitHub repos (and the person's tests) are heavily biased towards individual projects. There's a massive difference between something only one person works on and something that an entire team is developing over the course of years (as team members come and go).
The biggest one, IMO is that dynamic languages are simpler and more flexible.
We can usually describe the semantics of a statically typed language based on a simpler dynamic language together with a type system^1. The purpose of the type system is to ensure that certain undesired behaviors do not occur when we execute the program: adding a number to a string, calling a non-existent method on an object, and so on. From this point of view, a type system adds static guarantees to your program (certain classes of bugs never happen, the editor has an easier time doing autocomplete, etc.) at the cost of increased complexity and learning curve (you need to learn the type system in addition to the runtime semantics) and decreased flexibility (every type system inevitably disallows some programs that would not have resulted in one of the undesired behaviors had you just had faith and run everything).
I also think it helps to see static typing as a spectrum. For example, most languages only detect division by zero and index-out-of-bounds errors at runtime. Dependently typed systems can be used to guarantee that errors like these do not happen at runtime, but they are much harder to learn (lots of advanced type theory is involved) and use (you need to do a lot more work to appease the type system).
^1 - A notable exception is languages that use type-based name overloading. But if you "desugar" the overloading, then the static types are usually erasable at runtime.
While I appreciate the sentiment of settling debates with data, you have to actually measure the right things. That statistic that says that only 2% of bugs would be prevented by static types? That's based on the assumption that everything that would be a type error in a statically typed language manifests as an exception in Python. But that's obviously not even close to being true. The really annoying bugs that type systems prevent are far more insidious than that. For example, Python will happily let you use a comparison operator on an int and a function. So if you accidentally write `f < 10` instead of `f() < 10`, you will not get a type error, but your program will have a bug that manifests in your program logic going wrong and leading to wrong results somewhere down the line.
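To make it concrete, here is a toy sketch of the same slip in a statically typed language (Go), where it simply doesn't compile:

    package main

    import "fmt"

    func f() int { return 3 }

    func main() {
        // if f < 10 {  // rejected at compile time: < is not defined on a func value
        if f() < 10 {
            fmt.Println("small enough")
        }
    }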
I've analysed all bug tickets for a Python system at a previous job for several months, tracking how many of our bugs would have been prevented by a Haskell-style type system. This isn't very scientific either, but for my sample it was somewhere between 70%-90% depending on your interpretation. I'm not saying this generalises to all projects, but I can definitely say that the 2%-number is hilariously wrong.
I would think that the increased productivity of Python over C++ has more to do with Python being a higher-level language than with static typing per se.
I'm translating some Python into C++ right now for performance reasons. Manual memory management, the lack of list comprehensions, and many other things are making it slower to write than the Python was, but having to specify types is really the least of it. The extra visual noise often makes the code less readable, but that isn't true of all static languages.
It's good to see hard data on this. I hope more people will publish similar research in the future. I've done a lot of work with both statically and dynamically typed languages and I've always known that dynamic languages were more productive.
I've often made comments on HN suggesting my preference for dynamically typed languages and these comments often got downvoted - This surprised me because I thought that my view was the consensus.
To be fair, I think the rise of the web and in particular of the JSON format for APIs and the use of typeless NoSQL databases have favored dynamically typed languages. JSON objects have no type, so when you write statically typed code, you have to add logic to cast everything into concrete types instead of accepting the data as provided. If you use a NoSQL database, you will get dynamic typing in the storage layer as well so you won't have to worry about types anymore... In such a scenario, you can enforce the consistency of various parts of your data as much or as little as you like.
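For instance, in Go the payload typically has to be declared as a concrete struct before you can touch the data; a minimal sketch with a made-up shape:

    package main

    import (
        "encoding/json"
        "fmt"
    )

    // The payload has to be given a concrete shape before the data can be used.
    type Order struct {
        ID    int     `json:"id"`
        Total float64 `json:"total"`
    }

    func main() {
        payload := []byte(`{"id": 42, "total": 9.99}`)
        var o Order
        if err := json.Unmarshal(payload, &o); err != nil {
            panic(err)
        }
        fmt.Println(o.ID, o.Total)
    }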
I believe that dynamic languages make initial development faster, and maintenance more expensive. In particular, with static types it is much easier to launch into a refactor and depend on the type system to tell you about dependencies that you forgot about. Similar refactors in a dynamic language are scarier.
That said, I personally prefer working in dynamic languages. It is more fun for me. But I don't think it is necessarily the right choice for all employers.
This seems littered with potential fallacies to me, but I'm only basing that off the summary of the video. It isn't really useful to lump all statically-typed languages into one bucket since some have confusing compiler messages and some don't. Same with counting "type errors" of dynamic languages, it seems the definition of "type errors" would entirely depend on how exhaustive the hypothetical static type system would be. Also, it's kind of weird (and I've seen this before from other dynamic type enthusiasts) to imply that unit tests are only for dynamically typed code - we use them in statically-typed languages too!
I generally hold the point of view that dynamically typed languages are great for prototyping and when you are in the "build fast and break stuff" phase, and that statically typed languages are better for when correctness and maintenance matter more. It's generally mapped my career path as I've moved from being a freelancer for small clients to a contractor/consultant for large clients. One basic indication for when you might want to consider pulling more statically-typed languages into your stack is when you start running into those really confusing run-time bugs/behaviors that are really hard to track down.
Entirely anecdotally, I found that in the same amount of time I can write down more functionality in a (toy) Haskell program than in a (toy) Python program. Partly this is because finding trivial errors becomes easier (despite cryptic compiler messages), partly because the language and standard library often let you express algorithms in fewer words. (Due to this, without a strict type system and type inference, Haskell code would likely become brittle far faster than Python code.)
This 'Unreasonable Effectiveness' naming scheme has really started to be abused, huh? I mean, this really just points out that dynamic languages are somewhat faster to work with than static languages - hardly 'Unreasonable'. I think as a rule large project/codebase -> static typing is nice, but for fast scripting/problem solving dynamic is clearly the way to go (hardly original). I don't think the cited data here provides a good argument against that: the study involved a quick problem (several hours), and the bug breakdown does not convince me that having type checking does not prevent many bugs in the long term. The final point about anti-modularity is just weird given OOP and all. Still, some interesting data.
Good to see some effort put into making a case in the debate (for either side). I drank the Haskell koolaid pretty hard a few years back, and I wanted to believe that all the anecdotes people used to support the language had to be true. You're more productive because of all the compiler can do for you, you're safer, you're more correct, etc... My experience after some time was that significant effort was spent with me serving the type system, rather than it serving me. The article's statements are in line with the lessons I've learned, and I'm not going to feel bad about throwing such an anecdote around on a message board... but more to the point, I feel good that some people are taking strides to make this conversation less anecdotal.
That first datapoint... at worst, programs took 60,000 seconds to write. That's 1,000 minutes, which is just under 17 hours, or about two working days. At that scale, it took less time for the programmer to catch the type errors than it would have taken to write the static types that catch them. I can buy that.
Now make the code base 10,000,000 lines. Maintain it for two decades with dozens of programmers. Now how relevant is that 2-days-at-the-worst datapoint?
He's doing experiments with programming in the small, probably with students as subjects. Not useful. Typing is most valuable for enforcing consistency within large programs, or between software components from different sources, where a change in one part may break other parts. If he wants to experiment with this, he should have groups of three or four students independently write parts of a program, then integrate them.
There's convergence on when to declare types. As I mentioned in a previous post, the trend is toward declaring function parameter types, but inferring the types of local variables. Go and Rust take this route. C++ now supports it with "auto". Even Python is considering adding optional function parameter type declarations.
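In Go, that convergence looks roughly like this (a sketch): the function signature is fully annotated, the locals are not:

    package main

    import "fmt"

    // Parameter and return types are declared explicitly at the boundary...
    func Mean(xs []float64) float64 {
        // ...while local variables are inferred with :=
        sum := 0.0
        for _, x := range xs {
            sum += x
        }
        return sum / float64(len(xs))
    }

    func main() {
        fmt.Println(Mean([]float64{1, 2, 3, 4}))
    }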
I would love to get a report from someone with a lot of Erlang experience.
Erlang is a great example of an industry oriented dynamically typed programming language.
But... they recently, maybe more than 5 years ago now, went to a lot of trouble to integrate a really nice static type system into it. They also had a type analyser, Dialyzer, for a long time before that.
The impression I had was that some significant number of people experienced some dynamic typing pain and a lot of effort was put into reducing this by strengthening the type system.
I am just a part time Erlang hacker, so I will keep out of this debate. But I would love to hear from anyone who experienced this in a professional development environment.
I know everyone has their preference here so it doesn't help to just talk about anecdotes.
That said, the author really doesn't seem to take note of the largest benefit of type safety: Self-documentation.
Looking at someone else's source code with type hints gives you so much more of an idea of what's going on and what sorts of parameters any given function might take. With static types, that documentation is guaranteed to be right.
Dynamic languages also encourage the use of clever reflection in ways that make your code unreadable to someone who has limited scope of context.
For me, I'd rather be using a lib created in a static language.
I'm glad to see this... I've felt for many years now that I'm far more productive in dynamic languages than static ones. I always really liked C#, but since I started using Node.js more, I find I get a lot more done; when I use modules and functional patterns, I tend to avoid a lot of the bugs I get in C#/Java, with simpler code that's easier to understand... Unit tests become only slightly more interesting, as I have to use proxyquire to override the require system for the module I'm testing, but in the end the actual code is easier to reason about. Also, DI/IoC is mostly unnecessary, which is a huge painful issue when debugging .NET code (and I would presume Java is similar).
It does take a bit of discipline to keep things smaller/modular and not do too much in any single file/module... but overall the code comes out more reliable. It works very well for front-end systems and the direct services they work with; farther back in the stack, using another, more systems-oriented language may be a better option.
> Another point he made is that writing static types is often gross and unmaintainable whereas writing unit tests not.
I suppose they can become "gross and unmaintainable", if you do it badly. And I'm sure some people do it badly, but... really? That sounds like someone didn't know how to use static types. (Yeah, I know, No True Scotsman...)
I was about to comment about my personal experience, but realized the irony before posting. I won't say I'm convinced either way, but it's great to see some data enter the conversation.
Your irony meter is more sensitive than that of a large number of commenters whose responses are variants of "no, because... anecdote!"
I agree, it is interesting to see data, though I struggle to design a study in my head that would really show this either way. But... anecdote... it does amuse me how various languages claim to improve development time/safety/bug rates/cost (OOP, types, functional, etc.). But these big claims are not quite borne out by companies investing in those languages wiping the floor with their competitors. If there really was a significant difference, I'd expect evolution to take its course.
Attempting to argue with ideology is a self-defeating proposition.
In the presence of 'belief' there is no room for reason.
Which reminds me, I haven't seen Dogma in a while. Metatron is also one of my favorite roles played by Alan Rickman. I guess I found my plans for this evening.
One of his slides says that C++ and Java are equally productive, but less productive than C. Are we supposed to believe that OO is a net negative but automatic memory management makes no difference?
The earlier parts of the talk generalize out to advanced languages like F# and OCaml, but the "data" part of the talk assumes static languages are completely represented by C++, Java, and C#. It was pretty disappointing.
I've been doing a lot of programming in both lately (Clojure and Objective-C), and did a ton of Java earlier in my career, and I find I like both. The biggest pain point for me in Clojure is the inability to easily refactor.
To overcome the limitation that the machine can't help as much when using a dynamic language, I've found that writing good unit tests with solid coverage gets me pretty far, and shortens the feedback loop when I break something.
But yeah, I sure do miss the right-click and 'rename' feature from Eclipse and Xcode when writing Clojure.
I've used both dynamic and statically typed languages, and also dabbled in some of the more flexible static languages like Haskell and Scala. While I personally prefer statically typed languages, I find Perl 6's gradual typing to take an interesting approach. Perl 6's type checker operates differently on functions and methods: functions, roles, and private methods are checked at compile time, but public methods are checked at run time. As Jonathan Worthington wrote in a presentation on Perl 6's approach, lexical scoping is "your language" but a public method call is "the object's language": "it's for the receiving object to decide how to dispatch the method," which might be familiar to people who have used Smalltalk. This approach also allows easy interoperability with libraries from dynamic languages, such as using Perl 5 or Python code in Perl 6.
(working full-time on JavaScript code for last 4 years)
TL;DR: a good dynamic codebase is possible, but not when you throw too many devs at it (particularly junior devs), and only when you set a very high quality bar. Otherwise, in finite time the codebase will converge to a big pile of mess.
IMO a big dynamic-language codebase with multiple people working on it can only survive with an extensive test suite, lots of mock data available, automated quality tooling (jshint etc.) and proper code review in place. All of those are well-known best practices, but they require good developers and a certain discipline.
In particular, when the app retrieves lots of data from a server, it's easy to get lost when the frontend app doesn't even know what data it operates on; it just knows it gets "some JSON". In the old-school Java world you'd have this mapped into a bean and you'd at least know what you were working with.
Previous project I worked on was an inherited poor JS codebase with lots of unrefactorable magic happening inside, multiple event buses and what not. Since JS is a very dynamic and very permissive language, someone can abuse some language constructs to make it very hard to reason about the code.
In a pathological extreme you may get "everything is an object" codebase but no idea what keys each object has.
Bonus points if the code is passing some JSON around and adding some keys to that object in an ad-hoc manner, scattered along many functions.
No tests and no mocks meant that to learn that, you'd have to run the app and put breakpoints in. But the same variable might have had a totally different type of data inside depending on how some if-elses executed.
Most of our bugs were due to some objects not having some keys (sometimes) for some reason etc.
So, if you want to slice off an object and name it just use ES6 classes.
    class NameMe {
        constructor (obj) {
            for (var prop in obj) this[prop] = obj[prop];
        }
    }
This literally just spits out a named object with an identical structure to the object it's being created with.
If you want to make all keys visible and guarantee no exceptions are thrown when a key isn't set, just define default values.
    class NameMe {
        string = '';
        number = 0;
        bool = false;
        array = [];
        object = {};
        constructor (obj) {
            for (var prop in obj) this[prop] = obj[prop];
        }
    }
You could always *gasp* initialize everything to null if you're concerned about the defaults screwing up application logic. That's how static type systems address initialization by default anyway.
Want to enforce specific types? Add setter methods that include validation checks. Do I need to continue?
None of these characteristics are difficult to define in Javascript. Structuring data in a manner that's easy to reason about is no more difficult in Javascript than it is in any typed language. Even before the introduction of classes you could achieve the same using prototypes.
If the code you've encountered was difficult to reason about, it's because the devs who wrote it suck at writing code that's easy to reason about. Self-documenting code is a naming problem not a type theory problem.
Any code base, whether static or dynamic, should contain an extensive collection of mock data and unit tests to verify common cases and check for breaking edge cases. Unless the project is a one-off fire-and-forget implementation, in which case: who cares, maintenance is somebody else's problem (I'm not stating this is a good idea, just what usually happens in practice).
----
Bonus: How about a couple of cases that 'literally can't even' be done in statically typed OOP languages.
Extending existing objects:
Sure, you can use object inheritance, but that requires explicitly defining a class that represents the new structure. Except that now creates a deeply coupled relationship between the child and parent classes, which will inevitably become a maintenance issue if the business logic ever changes. Say hello to technical debt.
Proof: Why do all of the collection classes in Java inherit from vector?
You can accomplish the same in JS in any object (class based or not) using .extend() (provided by most 3rd party frameworks). Which essentially does a deep copy of all the object properties. No deep coupling between objects, no technical debt incurred.
Multiple Inheritance:
I'm surprised nobody in the OOP community talks about this anymore.
Since OOP relationships between objects only point to the external interfaces (ie class definition) rather than reference the underlying data, there needs to be a system to represent those relationships.
Multiple inheritance was cast off as not technically feasible due to the ambiguities that arise when a class inherits the same ancestor along multiple paths, a.k.a. 'the diamond problem.'
The solution: branch everything off of a central 'god object' and represent all relationships between classes as a directed acyclic graph.
This is an artificial constraint that only applies to OOP. It not only makes it unnecessarily painful to reason about the structure of class definitions, but also makes it impossible to pass data to adjacent leaves in the DAG without some form of external global state (i.e. the singleton pattern).
In JS it's very easy to use multiple inheritance. Just create an object and add data from as many other objects as you want using .bind(). Done with an object but still need to maintain its state? Return a new function that keeps a reference to its enclosing scope, i.e. use it as a closure.
Stop for a second to consider. The .bind() method, which is crucial to adding functional-like aspects to an imperative programming language, is simply not possible in classical OOP. The closest equivalent is to pass ref/out params as arguments into the constructor (i.e. creating more deep links to external classes).
My personal experience with a large project which I started in Python and later moved to Haskell: I did indeed get things done quickly in Python and had the majority of the problem solved. Then I hit a few nasty bugs which made me change/refactor the code, and that's where my problems started. I quickly realized refactoring a huge codebase in Python was really difficult. Maybe there is a better way to organize my Python code, I do not know. Then I moved to Haskell (partly because of the excellent Parsec library, which made things much simpler compared to the yacc-style PLY I was using in Python). Initially, the fix-compile-execute cycle was really painful, but I soon realized how I could figure out some functional bugs (not type bugs) just by reasoning about the types. The compiler too helped in some cases with valid type conversion errors. I would have caught such issues in Python only if I had a very large test suite covering those corner cases. Nevertheless, I am happy with the move and my love for static typing only goes up every day.
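For readers unfamiliar with the combinator style, here is a toy sketch of what Parsec code tends to look like (an illustrative grammar, not the one from that project): each parser is an ordinary value, and bigger parsers are composed from smaller ones.

import Text.Parsec
import Text.Parsec.String (Parser)

-- Toy example: parse semicolon-separated "key=value" pairs.
pair :: Parser (String, String)
pair = do
  key <- many1 letter
  _   <- char '='
  val <- many1 (noneOf ";")
  return (key, val)

pairs :: Parser [(String, String)]
pairs = pair `sepBy` char ';'

main :: IO ()
main = print (parse pairs "" "host=localhost;port=8080")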
A lot of points being made seem to focus on how quickly one is able to write code, and how dynamically typed languages are better since "you can write code faster!". If the speed at which you are physically able to write your code is the bottleneck, you are going to have problems regardless of whether your language is dynamically typed or statically typed.
Does he mention performance? (I haven't watched the whole video.) That would be the biggest reason to go for C++, say, for me as a game programmer.
Also (once again) this is a very webapp-developer-oriented talk; stuff like "stringly typed programming" and "all we do is put strings in http requests" ignores all non-webapp devs.
This does not take into account the fact that IDEs provide a massive amount of help for statically typed languages and marginal to no help for dynamically typed ones.
For instance, every IDE I have tried fails miserably on Python. Javascript is even worse, even with a very smart tool like Idea.
Interesting - the primary point, that devs spend more time satisfying type safety than they would spend fixing the few type bugs they'd have made, matches my impressions.
That said, when it comes to tests there are definitely times I miss having strong typing.
I recall one poster several years ago remarking with surprise when he discovered that not everyone agreed with him on what was most important to optimize: dev time or run time.
In my world, devs are massively overworked/overneeded, so anything to reduce dev time that doesn't cripple the product sounds like a good thing. Different markets will have different needs, but I've found a number of devs that consider run time vastly more important regardless of market.
If you want to argue that most hip dynamic languages will allow faster development than most hip static languages in many situations? Sure, I'll buy that. E.g. jumping from C++ to Ruby for a bit was, for many of the projects I worked on, a major productivity boost.
But I buy it because it's got enough weasel words, and it's focused on the languages in practice rather than the actual language attribute. Because I can think of clear and concrete examples where adding static typing helped: E.g. using Typescript to add static typing information to existing Javascript. This massively improved my speed in picking up new APIs via judicious abuse of Intellisense. And typescript as I'm using it is doing very little more than just adding static type information - much closer to purely comparing static vs dynamic typing than any combination of languages I can spot on these charts.
The article's statistics don't measure the productivity difference of static typing vs dynamic typing - they measure certain statically typed languages vs certain dynamically typed languages (among a million other caveats.) C++ vs Python? The biggest difference there isn't static vs dynamic - the former is weighed down by some of the most cumbersome explicit type annotation in existence (when explicit type info isn't fundamental to static typing at all). Worse, it has the nastiest grammar causing horrible build times, an absolutely insane dependency system, and a standard so absolutely rife with implementation-defined, unspecified, and undefined behavior that it doesn't even define the size of its common integer types. And I've yet to meet a C++ codebase that doesn't use its common integer types.
The article also dismisses type errors too lightly, IMO. "Out of 670,000 issues only 3 percent were type errors (errors a static typed language would have caught)". But does this include such things as SQL injections, where SQL data was mistakenly treated as SQL commands? While most APIs don't leverage the type system to differentiate these in a way that will cause errors (be it at runtime or build time), I do consider this a type error, one that could be caught with a type system. Unfortunately, handling the single vanilla string type you typically get tends to take precedence over creating such a type separation...
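A hedged sketch of what that separation could look like (hypothetical types, not something the article proposes):

-- Hypothetical sketch: keep trusted SQL text and untrusted user data
-- as distinct types so they cannot be concatenated by accident.
newtype SqlCommand = SqlCommand String
newtype UserInput  = UserInput String

-- The only way user input can reach SQL text is through this function,
-- which escapes it (naively, for illustration only).
bindParam :: SqlCommand -> UserInput -> SqlCommand
bindParam (SqlCommand sql) (UserInput raw) =
  SqlCommand (sql ++ concatMap escape raw)
  where
    escape '\'' = "''"
    escape c    = [c]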
> People believe static typing catches bugs. People believe static typing helps document code. People believe static typing makes IDEs work better and therefore save time, etc. BUT … those are all just beliefs not backed up by any data
Yeah... no. It's backed up by data. Mathematics, even.
Types enable safe automatic refactorings. Without types, refactorings can break your code if not supervised by humans.
It's not that it can make just IDEs 'work better'. The compiler can work better with a great static type system. Haskell, for instance, is mind blowing.
Perhaps the problem is that static typing is confused with how some languages implement it. It can, and has, been implemented such that it dramatically improves compilation, correctness, static analysis and documentation without making code horrendously verbose.
What's mind-blowing is the volume and obscurity of the error messages you get from the type checker. It's harder to understand the error message than to debug the program if compiled "unsafely".
It's a threshold I guess. My formal proof teacher said he thinks in type checking unification all the time. To him a program is a proof tree construction.
With time, the way I approach code has also become much more abstract. You look at invariants more than at the code itself.
This isn't true. The errors are definitely hard to understand if you're new to Haskell and to what they're actually saying. As you get more familiar, the errors become clearer.
What do you consider a refactoring? By Wikipedia's definition it's "Code refactoring is the process of restructuring existing computer code – changing the factoring – without changing its external behavior." By that definition, if some operation breaks code, it's either not a refactoring, or it's due to a bug.
Have you got some examples for your issue? I'd say that with static typing you can apply a refactoring, while with dynamic typing you can try and hope you applied a refactoring (because everything about your variables could change at runtime).
You can refactor anything but refactoring can only be performed automatically and safely with types around. Without types, a human is required to verify that the tool didn't break the code with its refactoring, because it's just guessing at this point.
There is no guessing, there is just a false sense of security.
I have seen plenty of bugs introduced by refactoring in static languages. Static typing cannot verify correct behavior.
I sit on the fence in the dynamic vs. static typing debate, but I think the uncertainty of a dynamic language is often an advantage because you are much more likely to pay attention to the behavior of the system and not just the types. And at the end of the day, it's the behavior we're interested in, not the types!
I've seen multiple times when bugs have been pushed to prod due to a false sense of security because "it compiled".
What kind of bugs do you mean? Of course the refactoring functions can have bugs in them, but static typing provides enough information to refactor correctly, or to tell you that it cannot be done automatically.
Or to put it differently: what's an example of a transformation which is valid with respect to all types, order of allocation, synchronisation, and any other elements of the language, yet results in different behaviour (excluding runtime introspection, that is)? And can anything in the example identify why it cannot be done automatically? (Also a valid outcome.)
> There is no guessing, there is just a false sense of security.
How is it false?
> the uncertainty of a dynamic language is often an advantage
How can uncertainty ever be an advantage over certainty?
> I've seen multiple times when bugs have been pushed to prod due to a false sense of security because "it compiled".
That same program would have compiled just as well with a dynamically typed language. Except with potentially more bugs that the compiler didn't catch.
Considering the examples use C, C++, and Java as the statically typed languages, I wasn't impressed. There are far better languages. Also, the performance of the solution will be worse in dynamic languages.
> This first slide is from a research paper where the researcher wrote his own language and made both a statically typed and a dynamically typed version, then got a bunch of people to solve programming problems in it. The results were that the people using the dynamic version of the language got stuff done much quicker.
Does this first plot control for notions of quality and extensibility of the different solutions? A faster-to-develop but sloppier solution in a dynamic language which requires more painful investment to refactor for future use cases should not necessarily be viewed as better. If you are only saving short term time at the expense of much more long-term time, then whether it is a net win for you depends on your discount function.
> What was most interesting was that he tracked how much time was spent debugging type errors. In other words errors that the statically typed language would have caught. What he found was it took less time to find those errors than it did to write the type safe code in the first place.
For which developers, and with what level of experience with static typing? This was true for me 3 months after I started learning Haskell. Now I have > 8 years of Python experience and less than 2 years experience with Haskell and the type system demonstrably speeds me up. Way, way faster to use Haskell's type system first than to use Python's type system and trace backs to debug type errors later. (I still like and use Python a lot -- just sayin.)
> The guy giving the talk, Robert Smallshire, did his own research where he scanned github, 1.7 million repos, 3.6 million issues, to get some data. What he found was that there were very few type error based issues for dynamic languages.
> So for example take python. Out of 670,000 issues only 3 percent were type errors (errors a static typed language would have caught)
This strikes me as one of the most problematic parts of the post. To me this just seems to be evidence that in Python, at least, TypeError is more common when you are using something interactively, and you can resolve the issue for yourself (because it generally directly means you are using it wrong, and it's not the library's fault).
This also resonates with my experience with Pandas on GitHub. Early on there was a lot of TypeError stuff with index-related issues, but once the bulk of that work became mature, index errors were then a signal of a novice user who needed to change the user code, and not at all an indication of a library problem worthy of opening a GitHub issue.
It seems totally reasonable to me to hypothesize that the types of problems worthy of becoming GitHub issues are not usually TypeError. But TypeError might still be a huge proportion of all of the errors encountered out in the wild.
Further, there's also some selection effect here for users who actually post things to GitHub. When I worked in quant finance, and everything was in Python, it was an hourly occurrence for hugely important parts of the system to hit type errors, and they were all incredibly painful to fix in the legacy code. This was just accepted as a way of life, and because the investment staff weren't incentivized to care much about code, they usually just hacked their own workarounds, and would never have dreamt of actually opening a GitHub issue about type errors (that would be way too slow of a dev cycle for them, which is why the state of the code was so poor in the first place!)
> His point there is that all that static boilerplate you write to make a statically typed language happy, all of it is only catching 2% of your bugs.
This is absolutely false and not a valid generalization of the presented data. For one, a major claim of static typing proponents is that static typing prevents bugs from ever being introduced, and allows you to use a compiler workflow to verifiably remove entire classes of bugs. When you run some bit of Python and it does not produce a TypeError -- that doesn't mean the code is free of errors. It might just mean you got lucky that the data or the user selections or whatever didn't happen to hit the TypeError corner case. With a static language, you know that certain classes of errors are not even possible -- not just that they didn't happen to occur this one time, but that they cannot occur. This is very different.
Further, another claim of static typing proponents is that the design process of statically typed code also leads to fewer bugs, because the mandate for static types forces you to clarify befuddled design ideas before the program will work. The benefit of this is murkier, for sure, but it's still something that can't be addressed by this particular data.
> Some other study compared reliability across languages and found no significant differences. In other words neither static nor dynamic languages did better at reliability.
It's interesting to me that that chart doesn't include any functional languages. Let's try it again with a pure functional language and see, and then also compare, say, Clojure with Haskell. If it keeps on robustly bearing out the same trend, then I might start to question my current beliefs on defect rates in dynamic, imperative languages.
> Part of that was reflected in size of code. Dynamic languages need less code.
This again is relative to the ability of a developer and also relative to different types of tasks. However, it's not really fair to compare languages like C, where brevity of syntax was not too big of a language design priority, with a language like Python, where brevity of syntax is sometimes militant (just try talking with "Pythonistas" on Stack Overflow about why one-liner-ness is really not that useful). And also, at least part of the result is fixed for you: static typing at the very least requires the extra type annotations -- although here again you could try against something like Haskell where you have very powerful type inference. I would be extremely surprised if, for equivalently experienced developers, Haskell programs were not consistently shorter than Python programs.
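As an illustration of that inference (a made-up helper, not taken from any study), GHC will accept the following with no annotation written on the helper and infer the general type itself:

-- No type annotation written for pluck;
-- GHC infers pluck :: Eq a => a -> [(a, b)] -> [b]
pluck key xs = [v | (k, v) <- xs, k == key]

main :: IO ()
main = print (pluck "age" [("name", "Ada"), ("age", "36")])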
> He points out for example when he’s in python he misses the auto completion and yet he’s still more productive in python than C#
Try Jedi in emacs (or whatever the equivalent must be in vim). Although, I for one hate IDEs (get off my lawn) and I also hate autocompletion and editor utilities that jump to function or class definitions. I've never noticed a significant speed up from these, except possibly when I am merely reading code from a large codebase that is brand new to me. But I have often experienced huge slowdowns from the features getting in my way.
> Another point he made is that writing static types is often gross and unmaintainable, whereas writing unit tests is not.
See Haskell. Also, writing unit tests can be a nightmare in OO and imperative settings, where you need some inscrutable cascade of mocked architecture to be able to test things. This is where something like Haskell's QuickCheck can make life a lot easier. I'm sure you could cook up something like that in Python too. But I strongly believe that writing unit tests in Python is way uglier and more frustrating than writing type annotations in Haskell.
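For readers who haven't seen it, here's a minimal QuickCheck property (a toy property chosen purely for illustration): QuickCheck generates the inputs itself, so there's no hand-written mock or fixture cascade.

import Test.QuickCheck

-- Toy property: reversing a list twice gives back the original list.
prop_reverseTwice :: [Int] -> Bool
prop_reverseTwice xs = reverse (reverse xs) == xs

main :: IO ()
main = quickCheck prop_reverseTwice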
> Static types are also anti-modular. You have some library that exports say a Person (name, age ..). Any code that uses that data needs to see the definition for Person. They’re now tightly coupled. I’m probably not explaining this point well. Watch the video around 48:20.
This seems just wrong to me. You can declare structs as opaque in C and provide public helper functions that internally create the data, apply other static functions to it, and then produce results from it. In Haskell, it's very common to avoid exporting value constructors for data types, and to instead provide helper functions that allow the implementations to remain hidden from anyone using the module. Modularity really has nothing at all to do with the dynamic vs. static typing debate.
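As a sketch of the Haskell pattern being described (a hypothetical Person module, not taken from the talk): only the type and the helper functions are exported, so callers never depend on the representation.

-- The export list names the type Person but not its constructor or fields,
-- so code in other modules can use Person values without ever seeing
-- (or depending on) how they are represented.
module Person (Person, mkPerson, greeting) where

data Person = Person { name :: String, age :: Int }

-- Smart constructor: the only way other modules can build a Person.
mkPerson :: String -> Int -> Maybe Person
mkPerson n a
  | a < 0     = Nothing
  | otherwise = Just (Person n a)

greeting :: Person -> String
greeting p = "Hello, " ++ name p ++ " (" ++ show (age p) ++ ")"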
I'll also throw one more downside of dynamic typing into the ring -- you sometimes will see really poor attempts to use so-called "defensive programming." In Python this is an especially bad code smell -- you'll see a huge block of assert statements right at the top of a function definition, in which all kinds of type properties and invariants of the arguments are asserted, so that TypeError can be raised immediately.
For one, in a dynamic typing setting, it's probably better if that stuff is the burden of the caller rather than the callee: in the spirit of a function "doing one thing and doing it well", it shouldn't also have to carry around all of its own type and invariant assertions. Notice that in a static language this isn't a problem, and is even a huge benefit, because it doesn't require the huge, human-error-laden block of asserts to achieve it. Just a nice, simple static type annotation, and the compiler will deal with it.
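For contrast, the whole assert block collapses into a signature in a statically typed language (a hypothetical function, shown in Haskell just for illustration):

-- The signature alone rules out wrong argument types at compile time;
-- no runtime block of isinstance-style assertions is needed.
totalQuantity :: [Int] -> Int
totalQuantity = sum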
Related to this, and as a final point, we also need to give more "severity" to dynamic typing exceptions that occur at run time due to type errors. For example, in the financial job I mentioned before, it was commonplace for an analyst to submit a very large batch processing job to the internal job manager. Some of these jobs took > 48 hours to compute and the output would mutate databases and so on.
So when someone set it running on Friday evening and expected there to be results in a database on Monday, imagine how awful it was to see that a TypeError had occurred and that not only did your manually created assertions fail to capture it, but also, there was no way of proving it couldn't happen without just running your code -- so you burnt maybe 30 hours of computational effort just to be told that upon hitting a certain point in the code, here's a TypeError.
This kind of error, which is categorically eliminated from possibility in a well-written static language program, should count for way, way more than a simple and stupid "oh I tried to call the API function with a list instead of a tuple, whoops my bad, let me just arrow-up in IPython and do it again" Type Error (though it's not clear to me that several of the referenced data in the post would make this distinction or penalize these types of errors more).
> You can declare structs as static in C and provide public helper functions that internally create data types, apply other static functions to them, and the produce results from them.
You can do that, with training and careful effort. But it was a design flaw that you have to do it manually, and that it isn't mandatory and trivial for even beginners to do. At the time C was "designed," this wasn't necessarily known to be important. We have no excuse today. But languages which do this wrong by default are still popular.
> In Haskell, it's very common to avoid exporting value constructors for data types, and to instead provide helper functions that allow for the implementations to remain hidden from anyone using the module.
In general, if calls require knowledge of type information at the call site, and the type needs to change for any reason (which becomes more likely as type annotation reaches further into program semantics) then all the call sites will need to be updated, or there will be an error.
In any published library, this means backward compatibility is completely broken and everyone else's code needs to change.
This is a misdesign in C and in a number of "statically typed" languages which crib from it.
> you'll see a huge block of assert statements right at the top of a function definition,
I almost never see this. The only time I see it is when a dogmatic true believer in the ideology of static typing writes Python. People can do stupid things in any language.
> this isn't a problem and even is a huge benefit because it doesn't require the huge, human-error-laden block of asserts to achieve it.
Humans are still required to provide type information, which means they can still make errors. Even better, correcting these errors often affects the interface at call sites, which means the fix has to break backward compatibility.
> so you burnt maybe 30 hours of computational effort just to be told that upon hitting a certain point in the code, here's a TypeError.
You were not reasoning correctly about your code. Proper testing should have been your safety net, but you weren't testing properly. If you are even vaguely trained and you are even vaguely trying, writing code which emits TypeError in production takes some doing.
The number of shops which never have problems in production is vanishingly small in ANY language.
It sounds to me like you got started in Python, and are identifying beginner's mistakes with the language itself.
> In general, if calls require knowledge of type information at the call site, and the type needs to change for any reason (which becomes more likely as type annotation reaches further into program semantics) then all the call sites will need to be updated, or there will be an error. In any published library, this means backward compatibility is completely broken and everyone else's code needs to change.
Notice I said you avoid exporting the value constructors. You're still free to export or not export the data type itself as you wish, allowing users to reference the type in type annotations while still not letting them ever construct their own value of the type except through helper functions.
This achieves even better modularity, because then, in the implementation file, you can change what happens with the value constructors however you want, and you can preserve backward compatibility to your heart's content without ever requiring the users of the data type to even be aware that anything is changing.
Maybe you are referring to something else, but I am referring to data type and value constructors in Haskell. The data type itself is a distinct semantic construct in Haskell from the constructors of values of that data type, and they can have different privacy properties.
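Concretely, the distinction lives in the module's export list (hypothetical module and names, shown only to illustrate the syntax):

-- Exports the type only: other modules can mention Config in signatures,
-- but cannot pattern match on it or build one directly.
module Config (Config, fromFile) where

-- By contrast, 'module Config (Config(..), fromFile) where' would also
-- export the MkConfig constructor and tie callers to the representation.
data Config = MkConfig { path :: FilePath, retries :: Int }

fromFile :: FilePath -> IO Config
fromFile p = return (MkConfig p 3)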
> I almost never see this.
Well, I've seen it over and over in production critical code in three different organizations ... so our anecdotes disagree.
> Humans are still required to provide type information, which means they can still make errors. Even better, correcting these errors often affects the interface at call sites, which means the fix has to break backward compatibility.
It depends on the language. In Haskell, for example, you could just make a sum type (a "type union"), with one constructor allowing passage of the old-style interface and one for the new, corrected version. It's very easy to do, still has the upsides of type checking, and doesn't break backward compatibility.
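A hedged sketch of that idea (a hypothetical interface, not one from this discussion): the old and new argument shapes become constructors of one type, so existing callers keep working while new callers get the corrected version.

-- Hypothetical: the lookup used to take a bare user id;
-- the corrected interface also wants a region code.
data LookupKey
  = LegacyId String          -- old callers keep passing just the id
  | RegionId String String   -- new, corrected shape: id plus region

lookupUser :: LookupKey -> IO ()
lookupUser (LegacyId uid)        = putStrLn ("legacy lookup: " ++ uid)
lookupUser (RegionId uid region) = putStrLn ("lookup " ++ uid ++ " in " ++ region)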
> You were not reasoning correctly about your code. Proper testing should have been your safety net,
Except you missed the relevant test case, whereas a tool like QuickCheck would have had a better shot at discovering a corner case that humans couldn't have anticipated.
> It sounds to me like you got started in Python, and are identifying beginner's mistakes with the language itself.
I'm not sure what you're referring to. The code I was working with was written by a mix of many Python developers. Some were core committers to the Python language itself; some were data analysts who didn't want to be programming.
I can say that I haven't had significant front-end experience in Python. But I've touched most other major areas, particularly very low-level NumPy code, LLVM stuff with both Numba and llvmlite, pandas, Excel tools, and many different database technologies and ORMs.
I will say, though, that in the projects where we switched from pure Python over to statically typed Cython, it cleared up tons and tons of our issues, many of them almost overnight.
Rather than me finding beginner mistakes in Python, it seems to me like you worked on one single system that suffered a lot of issues with backward compatibility, and you're generalizing that backward compatibility experience to other areas where you're less familiar (like solving the same backward compatibility stuff in Haskell).