Relating natural language aptitude to differences in learning programming (nature.com)
81 points by mooreds on March 3, 2020 | 48 comments



> one participant was excluded because he was an extreme outlier in learning rate (>3 sd away from the mean).

Can anyone explain why they did this? Was it just to prove their hypothesis? I bet this guy had really high numeracy scores as well.

Edit: They also botched some of the numbers:

> (Fig. 1A: mean learning rate = 1.25, range = 0.81–2.0, sd = 0.24)

The 2.0 point is not shown in Fig. 1A; the highest learning rate plotted is 15 lessons over 10 sessions, i.e. 1.5. So they included the outlier in the reported range but removed it from the plots.

Also, I don't find it unreasonable that a person completed 20 lessons compared to the others' 15, so I don't see why this was so extreme that he had to be removed. You could argue that the later lessons are harder, so it was unreasonable, but then you have just admitted that the learning-rate metric is not linear, and your regression model doesn't work, since it assumes that going from 0.5 to 0.6 is equal to going from 1.5 to 1.6! As we can see from the plot, many of the best learners got stuck at lesson 16 for an entire session without making progress, while the slowest learners took 10 sessions to reach roughly the level the fast learners reached after the first. That is almost a 10x difference, but the plot shows it as just 2.5x.
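
To make that concrete, here is a toy illustration with hypothetical numbers (not taken from the paper):

    # Two hypothetical learners, each measured over 10 sessions:
    fast_rate = 16 / 10  # reaches lesson 8 in session 1, then stalls at 16 -> rate 1.6
    slow_rate = 8 / 10   # reaches lesson 8 only after 10 sessions          -> rate 0.8
    print(fast_rate / slow_rate)  # 2.0x apart by the learning-rate metric
    print(10 / 1)                 # 10.0x apart in sessions needed to reach lesson 8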


Removing outliers is a common practice in data science. A quick search for "why removing outliers" returns a lot of discussion about when it should and should not be done. I encourage you to read some of it to get familiar with the arguments.

In this specific case, I suppose they could have added some motivation for their decision, but I don't think they did it in bad faith. They only had 36 samples, and a 3 sd event occurs once in roughly 300 cases: this suggests that the data point might have been erroneous, or that there could have been some other factor at play (e.g. earlier exposure to programming concepts, even disguised as something else).
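
For reference, a quick back-of-the-envelope check of that figure (assuming a normal distribution, using scipy):

    from scipy.stats import norm

    # Probability of landing more than 3 standard deviations from the mean,
    # in either direction, under a normal model:
    p = 2 * norm.sf(3)
    print(p, round(1 / p))  # ~0.0027, i.e. roughly 1 in 370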

If you are interested in the study, you can find the data and the script in the linked repository: https://github.com/UWCCDL/ComputerWhisperers

The range for the learning rate seems wrong. If you check the data, there is no data point with 2.0: https://github.com/UWCCDL/ComputerWhisperers/blob/master/Com...


> They only had 36 samples, and a 3sd event occurs once in ~300 cases: this hints to the fact that that data point might have been erroneous

Or it hints that the distribution of learning rates is not gaussian. When there's an "n-sigma" event, it's usually much more likely that the model is wrong than that the event is that rare.
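
For example, a toy comparison of the same numeric cutoff under a normal model versus a heavier-tailed one (using scipy):

    from scipy.stats import norm, t

    # How often an observation lands beyond 3 on the standardized scale:
    print(norm.sf(3))  # ~0.0013 under a normal model
    print(t.sf(3, 3))  # ~0.029 under a heavier-tailed Student's t (df=3), ~20x more often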


> Was it just to prove their hypothesis?

Yes.

Removing outliers is common, but extremely sketchy. The existence of outliers in the dataset at all is most likely because they are assuming a probability distribution that does not hold. If they are using the wrong distribution, none of their results mean anything.


> Can anyone explain why did they do this? Was it just to prove their hypothesis? I bet this guy had really high numeracy scores as well.

It's likely a predetermined exclusion criterion (although they should explicitly state that). An extreme outlier on the primary measure may suggest that the subject was not doing the task correctly (resulting in an extremely low score) or it may suggest that the subject misreported their lack of programming knowledge (resulting in an extremely high score).

Depending on what you're looking at, it can be standard procedure to exclude extreme outliers, but it should be announced as predetermined and be principled.


With statistics you want to generalize: find the simplest model that captures/explains most of the variation. Leaving that one outlier in would result in a worse model.


I wonder if this predicts any better than a generic IQ test?

The idea behind IQ is that lots of different cognitive tasks are correlated.


From their supplementary table, this looks like a general cognitive ability effect, although they don't seem to have included the range of measures that typically go into general cognitive ability assessments.

The short answer is that it's difficult to say from their results, and they don't explicitly test that, but it looks like it.


That would throw a lot of race-and-IQ suppositions under the bus, seeing how Africans commonly speak 3 or 4 languages, sometimes more, while a large majority of white people and Asians are monolingual.


A majority of Europeans speak 2 languages, a quarter speaks 3 or more. So it isn't a white thing, it is an American thing. And it isn't because Americans are dumber than Europeans; it is just that learning new languages makes more sense for Europeans than for Americans, since European languages are smaller and foreign languages are physically closer.

Also, mastering a single language is harder than learning to get by in multiple languages: even though a majority of Europeans can speak multiple languages, a majority of them are not really all that good even in their mother tongue, as we can see from standardized testing. They aren't worse than Americans, but they aren't better either; they get roughly the same results.


>A majority of Europeans speak 2 languages, a quarter speaks 3 or more.

You haven't met many Europeans, have you? Not all of them are Dutch, Scandinavian, urban youth, or immigrants.

>a majority of them are not really all that good even in their mother tongue as we can see from standardized testing

This makes no sense. You can't be not good in your mother tongue - it's your mother tongue. Standardized testing has no say in how a language is structured; that's an absurdly prescriptivist point of view. As for foreign-language ability, in real life it's not assessed by making people sit down and answer test questions; it's assessed by how close to a native they speak and write. Someone who casually uses slang and "bad forms" and occasionally makes native-sounding mistakes is more proficient in a language than someone who speaks an academic version of it and got good grades on a test.


I had a great-grandmother from rural Moldova. She spoke Vlach, Bulgarian and Russian.

Speaking three languages is not a big deal when you live in a village where everyone else speaks them.


Apologies, I'll amend my previous comment.

>You haven't met many Europeans, have you? Not all of them are Dutch, Scandinavian, urban youth, immigrants, or throwawaybbb's great-grandmother.


My grandmother knew 6 languages.


> a large majority of white people and Asians are monolingual

Please edit "white people" to "American and Chinese white people"... people from smaller countries are more prone to knowing more languages :P Also people who migrate or travel a lot. Also, since race discrimination is still a thing in the US and elsewhere, smart people of color do their best to appear at least as smart as they really are, and knowing foreign languages probably helps a lot.

Being white and of the right nationality in a developed country simply allows you to be lazier: you can afford to appear "dumb" in casual life, or to focus only on what you care about, so you'd probably not bother learning any foreign languages; the ROI would be very bad for you, and others will learn your native language instead... We all have limited amounts of brainpower, and we try to invest it as well as we can.


The number of languages spoken is largely a function of the environment and doesn't have much to do with intelligence. If you take a highly multilingual African and drop them in America, in two generations you will likely have mostly monolingual descendants.



You'd have to link to something more specific than "African immigration to the US" to make your point.


I had always hypothesized that numeracy, besides being generally useful in itself, is mostly useful to programmers as application domain knowledge rather than as an indicator of how well they learn programming concepts or programming languages. That said, an awful lot of problem domains are heavy in mathematics.


(This is armchair psychologist speculation...)

I wonder if numeracy is actually a factor correlated with the propensity for modeling systems, which is very much relevant to learning to program in any programming language. In my anecdotal experience, being fluid with numbers often comes from having a bias towards thinking about the world in terms of numerical relationships.


Honestly the biggest benefit I can see with knowing some math beyond basic arithmetic for some types of programming is understanding code complexity and big-O notation.

If you're doing 3D graphics projections you're going to want linear algebra. If you're doing financial forecasts you'll probably want to know calculus. If you're modeling throughput of large message-passing systems with multiple cooperating processes you need to know ratios and proportions. If you're the one maintaining the query planner for a database or maintaining a large, high-performance data store you're going to want to understand relational algebra.

For a CRUD application or some light server automation understanding data flows, boolean logic, and some really light set theory helps but is not necessary to get started. There's an awful lot of code out there that is not actuarial, simulation, or engineering code.

People understand things numerically, sure, but also spatially, linguistically, mechanically, or logically. Mathematicians and physicists can give you more precise descriptions of many things that people don't do much math about in their daily lives. Just like people don't need to be linguists to communicate, people don't need to do a lot of heavy math just because the underlying tools are based in math and mathematical logic.

Math can definitely help one understand the system better and excel at optimizing code and solving certain programming issues. And as I said, a lot of the problem domains people write code for are themselves very heavy with math. There's a reason CS grew largely out of mathematics and physics departments at universities, but this study was about understanding the concepts of programming in a high-level language.


I don't disagree with you, but as far as I'm aware the study was talking about "numeracy", not math.

From the Wikipedia article on "numeracy":

Fundamental (or rudimentary) numeracy skills include understanding of the real number line, time, measurement, and estimation.[3] Fundamental skills include basic skills (the ability to identify and understand numbers) and computational skills (the ability to perform simple arithmetical operations and compare numerical magnitudes).

More sophisticated numeracy skills include understanding of ratio concepts (notably fractions, proportions, percentages, and probabilities), and knowing when and how to perform multistep operations.[3] Two categories of skills are included at the higher levels: the analytical skills (the ability to understand numerical information, such as required to interpret graphs and charts) and the statistical skills (the ability to apply higher probabilistic and statistical computation, such as conditional probabilities).

A variety of tests have been developed for assessing numeracy and health numeracy.

That's just basic arithmetic, not any of the heavy math that you described. I'd argue that for a lot of CRUD apps, numeracy as a skill is quite important - though that's not relevant to the topic of picking up programming languages/concepts.


Well, "numeracy" and "fundamental numeracy" / "rudimentary numeracy" leave a bit of wiggle room. Conditional probabilities and the ability to interpret charts are already quite a bit more advanced than percentages. None of this is really as helpful as Boolean logic for basic programming skills in a high-level language.


They asked the subjects to write a Rock Paper Scissors game in Python, then asked programmers to mark what the subjects did. Unfortunately, they didn't provide the source code the subjects wrote.

From what I've seen online, this is usually done with an if/else tree, which is very far from a good solution.

The way I would model this is with a group [0] - something you can't possibly see without a very substantial background in mathematics.

Programming has as much to do with learning a programming language as novel writing has to do with learning punctuation.

[0] In Python 3.6+:

    import random

    class hand:
        # 'wins' maps a pair of hands to the winning hand:
        # wins[a][b] is the hand that wins when a is played against b
        # (ties map to the shared hand).
        wins = {'r': {'r': 'r', 'p': 'p', 's': 'r'},
                'p': {'r': 'p', 'p': 'p', 's': 's'},
                's': {'r': 'r', 'p': 's', 's': 's'}}

        def __init__(self):
            # A new hand defaults to a random throw (the computer's move).
            self.value = random.choice(list(hand.wins.keys()))

        def assign(self):
            # Let the player choose a throw instead.
            value = input("enter r, p or s: ")
            assert value in hand.wins.keys()
            self.value = value

        def __mul__(self, other):
            # Combining two hands yields the winning hand.
            return hand.wins[self.value][other.value]

    if __name__ == "__main__":
        for i in range(10):
            h1 = hand()
            h2 = hand()
            print(f"{h1.value} * {h2.value} = {h1 * h2}")


I don't have time to read this whole study, but am a little skeptical of the hypothesis.

Much of natural language aptitude is about fuzzy logic with lots of exceptions (in English at least) and contextual intuition (especially in Chinese). Programming computers, by contrast, requires more mathematically oriented thinking and raw working memory for abstract symbolic logic.

I recall Joel Spolsky (I think) wrote in one of his essays a long time ago that one of the obstacles to learning programming for some people is that they keep trying to find some kind of meaning in the symbols, e.g. they remember things by associating meaning with them. People who pick up programming more easily have no need to find meaning; they just remember things directly, without any kind of association aids.

Natural language is all about meaning on multiple levels - literal, contextual and implicit. Someone naturally good at that may not necessarily be naturally good at programming.

For anyone who read the study, was this addressed in any way?


"Across outcome variables, fluid reasoning and working-memory capacity explained 34% of the variance, followed by language aptitude (17%), resting-state EEG power in beta and low-gamma bands (10%), and numeracy (2%). "

Seems fluid intelligence is twice as important.


> Much of natural language aptitude is about fuzzy logic with lots of exceptions (in English at least) and contextual intuition (especially in Chinese).

Not saying you're wrong, but do you have a source to back up this claim?


Linguistics, literature, epistemology, philosophy, cognitive behavior studies,... for starters.

Language is a social construct. What you see on paper or a screen, the vocal sounds you hear as someone speaks, those are just physical representations. They don't carry any inherent meaning.

All human interaction is based on a shared understanding of the world. When I state that "one plus one equals two", I assume that my audience and I agree on what each of the terms in that sentence means, and that we share the same frame of reference to assert that statement as true or false.

I could easily invent an entirely new language where the only difference is that the words "two" and "three" swapped meaning. At that point, the sentence "one plus one equals three" carries just as much truth, as long as we share a common understanding of what "three" means.

The difficulty then is that a common understanding of shared meaning isn't always clear cut. Culture, education level, aptitude, personality,... all impact how we perceive and interpret the world, and how we'll use language to build an abstract mental model from which we can assert our own individual identity.

What many programmers tend to forget is that a "high-level programming language" is exactly that: a vocabulary and a syntax that mimic a natural language and allow you to describe the world. It's NOT merely an abstraction of the low-level internals of a computer, as is often assumed.

Implementing a feature request then consists of interpreting whatever you've read in a spec, user story, brief,... transforming that into your own mental model, and then expressing it using the limited formal set of symbols provided by the language. Functional testing is basically verifying that your own unexamined assumptions and biases didn't break a shared framework of understanding or create a disconnect between you and the client or the user.

Developers have a penchant for creating new frameworks and languages based on a fallacy: that theirs will somehow be able to "fix" the "fuzzy logic of language", whereas that's inherently impossible. Unless you're the last person alive, with no one to challenge you on how to interpret things, that is.


The scatter plots seem quite spread out. If you removed one outlier person from the numeracy plot it would be much more correlated. Maybe that person is a special case somehow.

I wonder if there is any formalized way in statistical hypothesis testing to quantify how much the conclusions could change by removing at most k data points (e.g. k=1).
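
(One formalized approach is leave-one-out refitting, i.e. a jackknife, together with influence measures such as Cook's distance. A minimal leave-one-out sketch, assuming numpy/scipy; the function name here is just for illustration:)

    import numpy as np
    from scipy import stats

    def loo_slope_range(x, y):
        """Refit a simple regression with each point left out in turn and
        report how far the slope and its p-value can move."""
        x, y = np.asarray(x), np.asarray(y)
        slopes, pvals = [], []
        for i in range(len(x)):
            keep = np.arange(len(x)) != i
            res = stats.linregress(x[keep], y[keep])
            slopes.append(res.slope)
            pvals.append(res.pvalue)
        return (min(slopes), max(slopes)), (min(pvals), max(pvals))

For k > 1 you would refit over all subsets of size k, which gets expensive quickly.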


It is suspicious that they had so many participants with 0 correct answers out of 8. Here are the 4 easier questions:

> If the chance of getting a disease is 10%, how many people would be expected to get the disease? Out of 1000?

> If the chance of getting a disease is 20 out of 100, this would be the same as having a _____% chance of getting the disease.

> Imagine that we roll a fair, six‐sided die 1000 times. Out of 1000 rolls, how many times do you think the die would come up as an even number?

> In the BIG BUCKS LOTTERY, the chances of winning a $10.00 prize are 1%. What is your best guess about how many people would win a $10.00 prize if 1000 people each buy a single ticket from BIG BUCKS?

https://onlinelibrary.wiley.com/doi/10.1002/bdm.1751

I guess a lot of people never learn what % means, and since basically all the questions are related to %, they would fail just from that. So I'm not sure if this is a good test of numeracy; it feels like it relies too much on a few key items.
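
(For reference, the intended answers are all one-step percentage arithmetic:)

    # Intended answers to the four easier items:
    print(1000 * 10 // 100)  # 100 people out of 1000 at a 10% chance
    print(20 * 100 // 100)   # 20 out of 100 = a 20% chance
    print(1000 * 3 // 6)     # 500 expected even rolls (3 of 6 faces are even)
    print(1000 * 1 // 100)   # 10 expected winners at a 1% chance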

Edit: Also, that test was designed to give a normal distribution in a typical population; in this study we see a lot of 0's and then no 1's or 2's. So it feels like the population tested wasn't representative, or some of the students just didn't bother with the math questions. Since students are forced to participate, I wouldn't be surprised if a few of them just skipped trying at all on the math parts.


>> Imagine that we roll a fair, six‐sided die 1000 times. Out of 1000 rolls, how many times do you think the die would come up as an even number?

A classic example of a question you can only answer if you don't know what you're talking about. I bet they think the answer is 500.

500 is the most likely result, but the odds of actually getting exactly 500 evens in 1000 rolls (the same as 500 heads on 1000 flips of a fair coin) are 2.5%, or 1 in 40. A little way out, at 505 (or 495), the odds have fallen all the way to... 2.4%.

This is kind of like asking "Imagine that we roll a fair, six-sided die one time. Out of that one roll, which number do you think would come up?"

Except, assuming you make the best possible guess both times, you're more than six times as likely to be right for my revised question.
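
A quick check of those figures (treating "even on a fair die" as a coin flip with p = 1/2, using scipy):

    from scipy.stats import binom

    print(binom.pmf(500, 1000, 0.5))  # ~0.025: exactly 500 evens in 1000 rolls
    print(binom.pmf(505, 1000, 0.5))  # ~0.024: barely lower a few counts away
    print(1 / 6)                      # ~0.167: best guess on a single roll, >6x more likely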


Isn't the correct answer about 500? Sure, getting exactly 500 would not be highly probable. But getting approximately 500 would be.


Is "approximately 3.5" the answer to "which number will come up when I roll this 6-sided die once?"?


The expected value would be exactly 3.5 ( https://en.wikipedia.org/wiki/Expected_value#Finite_case ). But isn't the question malformed, since for a single roll only one of the discrete values can come up?


How is "which number will come up when I roll this die?" better or worse posed than "how many even numbers will come up when I roll this die several times?"?


Good point, but the solution is not a statistic for how much the results depend on k data points; the solution is to have a much larger sample. N=36 clearly does not support far-reaching generalizations.


That's actually pretty typical for behavioral-science scatter plots. Getting a statistically significant regression line (at the 95% level) is not something that is obvious from looking at the raw data.


I think eyeballing a scatterplot is often more useful than just looking at the p-value.


One of the big takeaways from my stats class is that eyeballing a scatterplot is much less useful than you think it is.


Eyeballing something is useful to get theories that can actually explain the data. Numeric statistics are useful to disprove those.

If you are at an exploratory phase and your plots look like a sky map, then you have a bad data representation. In that case you can't even extrapolate a positive correlation into a theory that one value will grow when the other grows in any specific case.


Maybe I am missing something. This is a study with 36 participants, fitting responses to a relatively high-dimensional model with several input and explanatory variables.

I think one could reasonably expect to see a wide range of outcomes in this circumstance, but it does not seem newsworthy.


I have been wondering about this for a while, because I have some anecdata of a few kids with very strong language aptitude but somewhat weak numeracy. And they are pretty advanced for their ages in learning programming.


Interesting, although of course these things are very hard to study rigorously. When I did the UnderC interactive interpreter, my hypothesis was that people learn programming better in a 'conversational' setting, just as with human language. It remains a hypothesis - it is probably more true for some people, who just like REPLs :). A stronger case can be made for an "interactive rich environment", e.g. the classic LOGO experiments.


I don't understand how an n=36 study got into Nature in this day and age.


This is "Scientific Reports", an open access journal also published by the Nature publishing group. It's not the actual journal "Nature".


The title should read

"Relating natural language aptitude to differences in learning basic PYTHON programming"

as it doesn't represent other programming disciplines (e.g. low-level programming, concurrency).


I'm curious as to whether the correlation between natural language aptitude and programming aptitude runs both ways.


How many programmers scored higher on verbal SATs than math?



