
> If you have the research paper, someone in the field could reimplement them in a few days.

Hi I did this while I was at Google Brain and it took our team of three more like a year. The "reimplementation" part took 3 months or so and the rest of the time was literally trying to debug and figure out all of the subtleties that were not quite mentioned in the paper. See https://openreview.net/forum?id=H1eerhIpLV




Replication crisis: https://en.wikipedia.org/wiki/Replication_crisis :

> The replication crisis (also called the replicability crisis and the reproducibility crisis) is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce. Because the reproducibility of empirical results is an essential part of the scientific method,[2] such failures undermine the credibility of theories building on them and potentially call into question substantial parts of scientific knowledge.

People should publish automated tests. How does a performance optimizer know that they haven't changed the output if there are no known-good inputs and outputs documented as executable tests? pytest with Hypothesis seems like a nice, compact way to specify such tests.
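
For example, a minimal sketch of what that could look like with pytest and Hypothesis (the softmax function here is just a stand-in for whatever the paper's reference implementation actually exposes):

    # Pin one known-good input/output pair from the reference implementation,
    # then add a Hypothesis property test over many generated inputs.
    import numpy as np
    import hypothesis.strategies as st
    from hypothesis import given
    from hypothesis.extra.numpy import arrays

    def softmax(x):
        z = x - np.max(x)
        e = np.exp(z)
        return e / e.sum()

    def test_known_good_output():
        # Recorded from the original code; an optimized rewrite must still match.
        x = np.array([1.0, 2.0, 3.0])
        expected = np.array([0.09003057, 0.24472847, 0.66524096])
        np.testing.assert_allclose(softmax(x), expected, rtol=1e-6)

    @given(arrays(np.float64, shape=5,
                  elements=st.floats(-10, 10, allow_nan=False)))
    def test_softmax_properties(x):
        y = softmax(x)
        assert np.all(y >= 0)
        np.testing.assert_allclose(y.sum(), 1.0, rtol=1e-9)

A performance-oriented rewrite then has something executable to run against before and after the change.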

AlphaZero: https://en.wikipedia.org/wiki/AlphaZero

GH topic "AlphaZero" https://github.com/topics/alphazero

I believe there are one or more JAX implementations of AlphaZero?

Though there's not yet a quantum-inference-based self-play (AlphaZero) algorithm?

TIL that the modified snow plow problem is a variation on TSP, and that there are already quantum algorithms capable of optimally solving TSP.


I think I agree with everything you've said here, but just want to note that while we absolutely should (where relevant) expect published code including automated tests, we should not typically consider reproduction that reuses that code to be "replication" per se. As I understand it, replication isn't merely a test for fraud (which rerunning should typically detect) and mistakes (which rerunning might sometimes detect) but also a test that the paper successfully communicates the ideas such that other human minds can work with them.


Sources of variance: experimental design, hardware, software, irrelevant environmental conditions/state, data (sample(s)), analysis.

Can you run the notebook again with the exact same data sample (input) and get the same charts and summary statistics (output)? Is there a way to test the stability of those outputs over time?

Can you run the same experiment (the same 'experimental design'), ceteris paribus (everything else being equal), with a different sample (input) and get a very similar output? Is it stable, differentiable, independent, nonlinear, reversible? Does it converge?
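
A rough sketch of how those two checks could be written down as tests; run_analysis() and the data file paths are hypothetical stand-ins for the notebook's actual computation and archived inputs:

    # Hypothetical example: same-input determinism and different-sample stability.
    # run_analysis() stands in for the notebook's code; it takes a data sample
    # and returns a dict of summary statistics.
    import numpy as np

    def run_analysis(sample, seed=0):
        rng = np.random.default_rng(seed)   # pin any internal randomness
        boot = rng.choice(sample, size=(200, len(sample)), replace=True)
        return {"mean": float(sample.mean()),
                "std": float(sample.std(ddof=1)),
                "boot_se": float(boot.mean(axis=1).std(ddof=1))}

    def test_same_input_same_output():
        sample = np.loadtxt("data/sample_a.txt")    # hypothetical archived input
        assert run_analysis(sample, seed=42) == run_analysis(sample, seed=42)

    def test_different_sample_similar_output():
        a = run_analysis(np.loadtxt("data/sample_a.txt"), seed=42)
        b = run_analysis(np.loadtxt("data/sample_b.txt"), seed=42)
        assert abs(a["mean"] - b["mean"]) < 0.1     # tolerance depends on the data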

Now I have to go look up the definitions for Replication, Repeatability, Reproducibility


Replication: https://en.wikipedia.org/wiki/Replication (disambiguation)

Replication_(scientific_method) -> Reproducibility https://en.wikipedia.org/wiki/Reproducibility :

> Measures of reproducibility and repeatability: In chemistry, the terms reproducibility and repeatability are used with a specific quantitative meaning. [7] In inter-laboratory experiments, a concentration or other quantity of a chemical substance is measured repeatedly in different laboratories to assess the variability of the measurements. Then, the standard deviation of the difference between two values obtained within the same laboratory is called repeatability. The standard deviation for the difference between two measurements from different laboratories is called reproducibility. [8] These measures are related to the more general concept of variance components in metrology.
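
A small worked illustration of those two quantities, in the variance-components style the article alludes to (roughly ISO 5725; the numbers are invented for the example):

    # rows = laboratories, columns = repeat measurements within each lab
    import numpy as np

    measurements = np.array([
        [10.1, 10.2, 10.0],
        [10.4, 10.5, 10.6],
        [ 9.8,  9.9,  9.7],
    ])
    n_labs, n_reps = measurements.shape
    lab_means = measurements.mean(axis=1)

    # Repeatability variance: pooled within-laboratory variance.
    s_r2 = measurements.var(axis=1, ddof=1).mean()

    # Between-laboratory variance component (floored at zero).
    s_L2 = max(lab_means.var(ddof=1) - s_r2 / n_reps, 0.0)

    # Reproducibility variance: within-lab plus between-lab components.
    s_R2 = s_r2 + s_L2

    print("repeatability  s_r  =", round(float(np.sqrt(s_r2)), 3))
    print("reproducibility s_R =", round(float(np.sqrt(s_R2)), 3))

Reproducibility is always at least as large as repeatability, since it includes the between-laboratory component on top of the within-laboratory one.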

Replication (statistics) https://en.wikipedia.org/wiki/Replication_(statistics) :

> In engineering, science, and statistics, replication is the repetition of an experimental condition so that the variability associated with the phenomenon can be estimated. ASTM, in standard E1847, defines replication as "... the repetition of the set of all the treatment combinations to be compared in an experiment. Each of the repetitions is called a replicate."

> Replication is not the same as repeated measurements of the same item: they are dealt with differently in statistical experimental design and data analysis.

> For proper sampling, a process or batch of products should be in reasonable statistical control; inherent random variation is present but variation due to assignable (special) causes is not. Evaluation or testing of a single item does not allow for item-to-item variation and may not represent the batch or process. Replication is needed to account for this variation among items and treatments.

Accuracy and precision: https://en.m.wikipedia.org/wiki/Accuracy_and_precision :

> In simpler terms, given a statistical sample or set of data points from repeated measurements of the same quantity, the sample or set can be said to be accurate if their average is close to the true value of the quantity being measured, while the set can be said to be precise if their standard deviation is relatively small.
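
A tiny illustration of that distinction (the true value and readings are made up):

    import numpy as np

    true_value = 100.0
    readings = np.array([100.9, 101.1, 101.0, 100.8, 101.2])  # tight but offset

    bias = readings.mean() - true_value   # accuracy: is the average near the truth?
    spread = readings.std(ddof=1)         # precision: is the scatter small?

    print("mean =", readings.mean(), "bias =", round(bias, 2))   # ~ +1.0: not accurate
    print("std  =", round(spread, 2))                            # ~ 0.16: precise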

Reproducible builds; to isolate and minimize software variance: https://en.wikipedia.org/wiki/Reproducible_builds

Re: reproducibility, containers, Jupyter books, REES, repo2docker: https://news.ycombinator.com/item?id=32965961 https://westurner.github.io/hnlog/#comment-32965961 (Ctrl-F #linkedreproducibility)


All-time classic Hacker News moment: "heh, someone in the field could write this in a few days" "Hi, it's me; three of us literally work at Google Brain and it took us a year"


I don't know the subject matter well enough to make the call, but it's possible the OP is making a general statement that's generally true, even if it doesn't hold in this specific case where it took a year.


One thing that's not appreciated by many who haven't tried to implement a NN is how subtle bugs can be. When you look at code for a NN, it's generally pretty simple. However, what happens when your code doesn't produce the output you were expecting? When that happens, it can be very difficult and time consuming to find the subtle issue with your code.
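
One classic instance of that kind of subtle bug (not specific to the AlphaZero reimplementation): applying softmax over the wrong axis. Everything still runs, nothing crashes, and probabilities still sum to 1 along some axis, so the mistake is easy to miss:

    import numpy as np

    def softmax(x, axis):
        z = x - x.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    logits = np.random.randn(32, 10)       # batch of 32 examples, 10 classes

    probs_ok = softmax(logits, axis=1)     # intended: normalize over classes
    probs_bug = softmax(logits, axis=0)    # bug: normalizes over the batch

    print(probs_ok.sum(axis=1)[:3])    # ~1.0 per example, as expected
    print(probs_bug.sum(axis=0)[:3])   # also ~1.0, so nothing obviously breaks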


How come you weren't able to just get it from DeepMind given that they are a subsidiary of Google? Is there a lot of red tape involved in exchanging IP like that?


They were & are very protective of the AlphaGo "brand", is the best-case explanation.


That defeats the purpose of reimplementing it?



