Differences between the word2vec paper and its implementation

DannyBee · on June 3, 2019

Speaking as someone who has read about 40 years of papers in compiler optimization, it's very interesting.

In the early days, there were fairly exact algorithms that worked as described, and were implemented as described, but were pseudocoded in papers. Where the pseudocode differed from implementation, differences were described in great detail (IE they may say an array can be shared but isn't to make the pseudocode easier to describe).

Then, over time, things start to get further away from that. You start to see papers published with algorithms that either don't work as described, or are so inefficient as to be unusable. Like literally cannot work. Where you can get source code, later looking at the source code shows that's not what they did at all.

One infamous example of this is SSAPRE - to this day, people have a lot of trouble understanding the paper (and it has significant errors that make the algorithm incorrect as written). The concept sure, but the exact algorithm - less so. Reading the source code to it in Open64 - it is just wildly different than the paper (and often requires a lot of thought for people to convince themselves it is correct).

It's not just better engineering/datastructures vs research algorithms.

The one shining counterexample is the Rice folks who wrote their massively scalar compiler in nuweb (one of many literate programming environments), so the descriptions/papers and code were in the same place - these are very very readable and useful papers in my experience.

Nowadays it's coming back to the earlier daysdue to github/et al. People seem to try to make the code more like the paper algorithm since they now release the code.

Word2vec appears to be a counterexample (maybe because they released the code they didn't feel a need to get the paper as right)

bayareanative · on June 4, 2019

Editors gotta be more rigorous and only accept papers with completely reproducible portable examples, i.e., docker images, literate code and source code repos. Pseudocode is helpful to be platform neutral, but if it's not precise enough to implemented as code, then it's still a proprietary figment of someone else's imagination akin to the squishy social sciences where almost anything goes, not rigorously reproducible science. Keep the standards high or the quality will taper off.

PS: Sometimes I think some researchers think they're helping themselves keep their research proprietary so they will able to monetize their special knowledge or implementation, especially if no one else can make it work ("knowledge" (job) security/silo). Why do the hard work of figuring out how to make a novel AI/ML algorithm if it can be readily commercially monetized without recompense? (Modern Western civ doesn't have a good patronage system to uniformly support arts, trades and sciences.)

opportune · on June 4, 2019

In some fields, like in ML/AI or in other data-sciencey fields, keeping your code / training data closed prevents other researchers from building or improving on your work. It's more than just monetization, in that case it's just tragedy-of-the-commons career growth

duckmysick · on June 4, 2019

> keeping your code / training data closed prevents other researchers from building or improving on your work.

I might be naive in this regard, but isn't that the main point of doing research?

dekhn · on June 4, 2019

Sure, but the challenge is, if you're in a competitive field and you make it easy for your competitors, you're going to have a harder time establishing job security. Currently many fields like biology, CS, ML, physics have fewer well-paying job positions than contenders, so every competitive advantage makes a difference. I know plenty of professors who got there by carefully writing papers that lacked specific details on how to reproduce a cutting-edge, "hot" paper.

opportune · on June 4, 2019

In theory, yes. The actual main point of doing research, from the researcher's point of view, is to advance their career. Typically this means posting cutting-edge results in high impact journals using novel methods. Relinquishing control over crucial details allowing a researcher to continue publishing high-impact papers would increase the competition for the researcher in that academic space, making it harder for that individual to advance their career.

duckmysick · on June 4, 2019

How do the sources of funding (R&D divisions, universities, governments) fit in this picture? Do they have access to the "secret sauce" or do they have to pay extra consultation fees on top of research grants.

gdy · on June 4, 2019

"Sometimes I think some researchers think they're helping themselves keep their research proprietary so they will able to monetize their special knowledge or implementation, especially if no one else can make it work ("knowledge" (job) security/silo)."

My very limited experience says it is true. While our team worked for a commercial company in a certain highly competetive industry, I've both read many papers describing algos that on a closer look couldn't work as described and myself was forbidden to publish anything allowing our competitors or potential buyers to reproduce our algorithms. Our team ended up submitting a paper that was turned down by a reviewer and called an advertisement for the lack of details. None of us wanted to 'fix' that by adding fake details as in the papers we had read, so we ended up not publishing anything while working on the projects in that industry.

p1esk · on June 4, 2019

And who exactly is supposed to verify that code is reproducible for all the papers that get accepted?

Qworg · on June 4, 2019

The editors, previous to accepting. If it doesn't run out of the box and reproduce the result, it doesn't get published.

elcomet · on June 4, 2019

It's not the editor's job to replicate the results. Imagine the same requirements in medical research, this would be crazy. Editors accept a paper based on the quality of the paper and the claim.

Then the paper needs to be replicated a few times by independent research teams to be considered valid. This is the step we are probably missing in AI. Any published paper (and even unpublished, on arxiv) are considered valid and by the community but sometimes cannot be replicated.

sampo · on June 4, 2019

> It's not the editor's job to replicate the results.

They obviously meant the reviewers. Not everyone is intimately familiar with academic terminology.

elcomet · on June 4, 2019

The reviewers are commissioned by the editor, so I would say it's the same thing. Unless you meant the reviewers should replicate the papers with their own ressources, but I think it's not what we want. Reviewers are benevolent, and they don't have time nor resources to replicate results.

Maybe a solution would be a platform, like a CI for machine learning where authors would send their codes, and the CI would run it for them, and make the results public. Then everyone could check the code and results, reviewers included.

p1esk · on June 4, 2019

Please keep in mind that currently there are a lot of unqualified people reviewing papers, even at top conferences, as evidenced by OpenReview.net. There was an experiment done at NIPS a few years ago where something like half of the rejected papers got accepted after having been reviewed independently by different reviewers, and/or the other way around. If people can't even agree if paper is good or not, while assuming all results are reproducible, then we have a bigger problem.

To me, getting the right people to review the paper is more important than reproducible code.

scott_s · on June 4, 2019

There is a difference between unqualified people, and qualified but busy people. While there may be unqualified people reviewing papers, even qualified people who don't put in enough time and effort to review a paper comes to the same result. But the solution for that problem will look different.

There's also a big difference between accepting a paper and judging if it is good. At a top conference like NIPS, lots of good papers will get rejected. That's the nature of being a top conference in a hot field. It's similar to a high school valedictorian not getting into Harvard: it's not a judgement that they are a bad student. But when you're an elite thing, and lots of people want in, you end up looking for reasons to reject rather than accept. You will end up rejecting good papers/students in such a case. NIPS, or NeurIPS now, is a particularly interesting example, as a colleague recently told me that they just got 12,000 submissions this year.

There's also an issue in computer science papers where people may not like your paper based on personal taste. They may not be convinced you're solving a real problem, and if your paper involves a new artifact, the design of it. I refer to these as "your baby is ugly" reviews. They don't think you're wrong, they just don't like it.

I think people's time is a bigger thing than the wrong people. One problem, I think, is that we tend to compete for these once-a-year conferences, causing an enormous time crunch for reviewers. VLDB has a great model where they do rolling acceptance throughout the year, accepting a few papers every month, even though the conference happens once a year (http://vldb.org/2019/?submission-guidelines). I believe this is the best hybrid between the typical CS conference system, and the more common journal system in the rest of science and engineering.

p1esk · on June 5, 2019

Submitting a paper to a conference for it to be published 11 months later is not going to work - the paper will be hopelessly obsolete by then.

A better solution imo is to improve openreview.net process somehow, so that I'm motivated to go there, find papers relevant to my research (ideally get notified when such papers are posted for review), and leave a review (and perhaps vote on other reviews just like we do here on HN, or do something else to influence the acceptance decision). Obviously there should be methods to prevent abuse: moderators, reputations of reviewers, restricting reviewers to those with specific publications (e.g. based on keywords), etc. Btw, code reproducibility could be used to give some weight in such public review process.

scott_s · on June 5, 2019

> Submitting a paper to a conference for it to be published 11 months later is not going to work - the paper will be hopelessly obsolete by then.

That's not how the VLDB process works. What they did was establish a PVLDB journal which has a monthly deadline, and it accepts 5-12 papers a month. The papers are public on the website a few months after acceptance. (See: https://vldb.org/pvldb/vol12.html) The VLDB conference is then all of the papers that appeared in PVLDB in the past year.

I would be in favor of exploring your model as well, but I also see the hybrid model developed by VLDB as superior to the standard conference submission and review process.

elcomet · on June 5, 2019

This is ok if you can upload preprints on arxiv or equivalent. The obsolescence of papers already happens now, when a conference is 6 months after the submission, and the preprints system works fine.

Qworg · on June 4, 2019

This is a reasonable approach - a forcing function to require working code (social or otherwise) would be a good first step.

p1esk · on June 4, 2019

Some code might require 8 GPUs and a week to verify a single result. There could be a dozen results to be verified in a paper.

Are you sure you have thought this through?

anewhnaccount2 · on June 4, 2019

The experiments themselves should be conducted on fully logged public infrastructure so that it impossible to miss anything out. This final log would include the inputs: shell commands/git repositories/Dockerfiles, training and testing data; important intermediate data: Docker images and trained models; and outputs: tables of numbers or what have you. This way, the whole thing is vouched for automatically, and the editors don't need to do anything.

It would be a big change, but this is actually possible, and it would make things easier for everyone down the line.

solidasparagus · on June 5, 2019

Who is going to pay for that?

Qworg · on June 4, 2019

I have thought it through. There are multiple major papers that are published each year that purport to show an advancement, but fail to demonstrate that advancement under different random parameters.

This was a major topic at NIPS 2017, yet doesn't seem to be important enough to address.

tsbinz · on June 4, 2019

8 gpus and a week is cheap even, there are popular papers that need upwards of 10k GPU hours.

sampo · on June 4, 2019

Well, it usually takes months to get a paper reviewed anyway, so it's not like there isn't time for that...

pfortuny · on June 4, 2019

It is not like, ehem, Elsevier does not have the money to pay for this...

solidasparagus · on June 5, 2019

AI papers do not get published by Elsevier

kaitai · on June 4, 2019

Agree with sibling commenters that editors shouldn't be doing this themselves -- in my field, for instance, an editor is just the paid-in-prestige prof who agrees to look through articles submitted in a certain area, find referees for the paper, cajole the referees to review it at 6 months, cajole the referees again at one year, etc., and then make a decision based on the referee reports. (Clearly I'm not in CS as a 1.5-year turnaround is not so nuts in my field.)

Anyhow, I think Qworg has a point in that the editors do have responsibility for what they accept. It's true that referees do not have the resources to reproduce results: I for instance have limited time on the shared high-performance computing resources we have, so I'm not going to use it on someone else's paper. However, as another commenter says, this could be paid for. A journal could potentially "sell" this as a competitive advantage: papers in this journal were actually reproduced, by paid staff who are neither editors nor referees, people paid in currency that can be exchanged for goods (not academic currency).

nullwasamistake · on June 3, 2019

I ran into the same thing when implementing a hash table sorting algorithm from a recent paper. Nowhere in the paper does it mention that table size MUST be a power of two.

The new table structure is supposedly more memory efficient than anything else out there. That's the whole point of it.

So it's nice that they leave out "btw only works with size power of two". I got 2/3 of the way through an optimized implementation before I realized it wasn't any more efficient than the previous contender, in fact much less so, if you couldn't fit your set into exactly a power of two size.

Not outright lying but creative ommission for sure

bjourne · on June 4, 2019

Backing arrays for fast hash tables are always powers of 2. That's how they are implemented in all mainstream languages. Details are often omitted in academia because of publishing constraints or because the authors thought they were obvious.

rurban · on June 4, 2019

Fast yes, but more insecure. Using prime sized tables will get all hash bits being used. With power of 2 you can get away with the mod, but it's easier to crack, and the grow factor is bigger than the ideal fibonacci sequence, or a prime sequence.

bjourne · on June 6, 2019

As long as you have a decent hash function that distributes the bits properly it doesn't matter. IME, it makes no difference performance wise if you the golden ratio (1.61...) as the growth factor or 2.

ur-whale · on June 4, 2019

I think it's ok to names names in this specific instance: you'll save other people a lot of pain.

nullwasamistake · on June 4, 2019

It's only been implemented ~3 times besides the paper reference and I do too much ranting on HN to risk connecting it to my professional image. Particularly critisicm of big tech companies I may want to work for someday. Sorry.

It's very niche, I doubt many will run into it. Writing your own hash table is generally considered a bad idea :). I was just very bored and annoyed with memory limits. It will only affect implementors, which is less than 5 people in 4 years. Good example of implementor beware though, I'm sure it's a common issue

opportune · on June 4, 2019

Is it more efficient in terms of algorithmic complexity or just performance on real systems?

I don't think it's creative omission if it's a complexity improvement but it's definitely worse if the performance gains were just empirical but relied on that one condition.

bollu · on June 3, 2019

As someone who works on compiler optimisations, I'd love to get a link the Rice compiler's source code --- do you happen to have a link?

acqq · on June 4, 2019

It also appeared interesting to me, but then I discovered that I already have a nice book from the Rice compiler people:

https://www.amazon.com/Engineering-Compiler-Keith-Cooper/dp/...

And the example of the paper with literate code:

https://www.cs.ucsb.edu/~ckrintz/papers/shared.ps.gz

Still it would be nice if somebody decides to get and also make available the papers or code for which the links are now dead, e.g. those linked from here:

http://keith.rice.edu/publications-2/technical-reports/

DannyBee · on June 4, 2019

Not sure i still have it around ping me at the email in my profile and i'll see what i can do. It used to be on their ftp server, which they took down.

I expect if you email ken kennedy over at rice, he may be able to find it as well.

ms013 · on June 4, 2019

Ken died like 15+ years ago. Contact John Mellor-Crummey instead: he still does compiler/language work there AFAIK.

DannyBee · on June 4, 2019

Sorry, yes, Ken passed away, i meant keith cooper, i was writing quickly.

bollu · on June 4, 2019

You seem to have taken down your e-mail information on your profile :)

karlding · on June 4, 2019

What if you simply worked on the AST/token level and converted that into pseudocode, assuming you either annotated your functions with a description or named them following some sort of convention? Then you could write your code and then "compile" your source files to pseudocode.

Seems like this could be useful for writing specification documents or anything safety critical..

RyEgswuCsn · on June 3, 2019

I find the title of the article rather exaggerating...

As of the first difference pointed out in the article, one of the CS224D lectures on word2vec did addressed it:

https://youtu.be/aRqn8t1hLxs?t=2650

It was also mentioned later in the lecture that having two vectors representing each word is meant to make the optimisation easier (so it's kind of a trick); at the end, the two vectors learnt will have to be averaged over in order to reach a single vector for each word.

To be fair, the fact that each word is represented by two vectors was also mentioned in the original paper describing word2vec:

https://arxiv.org/pdf/1310.4546.pdf

On page 3, just beneath equation (2).

Why so surprised?

billconan · on June 3, 2019

For the past week I have been frustrated by an opensource code of a deep learning paper. This type of things are so common in academia. The particular code I looked at has missing documentation, hardcoded local paths, broken dataset download links and broken pretrained model download links. I have to fix bugs before the code can run. I'm very curious how did the author run that code with the bugs.

I call them insincerely opensourced projects.

liuliu · on June 3, 2019

Maintaining a pipeline that works takes effort that can be spent somewhere else. As long as it can be reproduced with a little effort from me, I really don't mind. I would rather they spend time to try a lot of different things, build intuitions, and search better backbone structures than maintaining a CI system so that every time their code updates can train imageNet to < 20% top-1 error rate.

That should be left to TensorFlow model zoo or GluonCV model zoo. I simply look at these research "open-source" as reproducible research.

PaulHoule · on June 3, 2019

Yeah but I wonder what fraction of CS papers test the algorithm that they thought they were testing...

seanmcdirmid · on June 4, 2019

A lot of CS papers aren’t testing algorithms at all.

opportune · on June 4, 2019

It is very common in the last few pages of a CS paper to benchmark your results against other algorithms or models

seanmcdirmid · on June 4, 2019

And it’s also very common not to. CS papers are pretty diverse in what they are talking about, many aren’t algorithm papers.

billconan · on June 3, 2019

I'm not asking for much, but the code should be runnable at least.

This is why I think jupyter notebook and things alike are so important as the future publishing media. Reproducibility is very important.

taeric · on June 4, 2019

Jupyter notebooks are laughably not reproducible right now. They may get there, but they encourage bad development habits. Actively.

colechristensen · on June 3, 2019

It is very common. Scientist code isn't software engineering code (and software engineering code is very often worthy of criticism)

It's just good news when you can find source code at all instead of just being told vague things about something the author did.

itronitron · on June 4, 2019

In my opinion, a paper should describe the contribution in sufficient detail that it can be implemented by someone with similar expertise. I believe that would foster greater innovation as people would necessarily have greater familiarity with all of the details, and they would better see why the thing works or doesn't.

raverbashing · on June 4, 2019

> Scientist code isn't software engineering code (and software engineering code is very often worthy of criticism)

Absolutely correct, and not in a good way

Though to be fair, people doing research have other interests besides maintainable code (though half or more of the annoyances will probably come and bite them later)

curl_e · on June 4, 2019

The good news is people are trying to address this. Starting in the UK there is a movement for Research Software Engineering https://rse.ac.uk/ and happening right now is the German RSE conference https://www.de-rse.org/en/conf2019/

dTal · on June 4, 2019

>It's just good news when you can find source code at all instead of just being told vague things about something the author did.

This is extremely common, and I don't see why it should be allowed. I could easily write a paper claiming some kind of interesting result, fake a few "ground truth vs my algorithm" pairs, and talk a bit of mathy-sounding piffle about what it does - and it would be indistinguishable from many papers I have read.

In my opinion - provide runnable source, or what you have isn't a paper, it's a boast.

nickpsecurity · on June 3, 2019

One explained to me here that the problem is they're mainly rewarded by funding agencies for how many papers they publish. That's why quality went down in general. Then, since funding agencies aren't rewarding code, they're either not encouraged to get it right or actively discouraged from doing it to focus on next paper.

This sad situation is both bad for open source and science. Bad for science since inaccurate results that probably aren't even reproducible aren't the knowledge advancements we need. Funding agencies like NSF need to change their policies to address the root cause here. Universities will follow where the incentives lead them.

not2b · on June 3, 2019

Then some other academic can get credit for publishing a rigorous paper describing the actual algorithm, explaining why the original paper is wrong (without speculating on the motive, just the facts), and why the approach taken in the code is in fact better. Bonus points for describing a further improvements, with working code.

YeGoblynQueenne · on June 3, 2019

In machine learning, you don't get credit for publishing rigorous papers. You get credits for publishing papers that show improved performance:

One big challenge the community faces is that if you want to get a paper published in machine learning now it's got to have a table in it, with all these different data sets across the top, and all these different methods along the side, and your method has to look like the best one. If it doesn’t look like that, it’s hard to get published. I don't think that's encouraging people to think about radically new ideas.

Now if you send in a paper that has a radically new idea, there's no chance in hell it will get accepted, because it's going to get some junior reviewer who doesn't understand it. Or it’s going to get a senior reviewer who's trying to review too many papers and doesn't understand it first time round and assumes it must be nonsense. Anything that makes the brain hurt is not going to get accepted. And I think that's really bad.

Geoff Hinton interview:

https://www.wired.com/story/googles-ai-guru-computers-think-...

I guess people look at statistical machine learning and deep learning, see all the formulae and hear all the calculus terminology and think - "oh, wow, that's a really rigorous field! Look at all the formalisms!".

But it's not. It's an extremely, almost exclusively, empirical field. The mathiness and the formulae are just unfortunate attempts to pass off the whole endeavour as something that it's not- some kind of careful science that uncovers deep truths about intelligence and cognition. In truth, it's all just about beating other peoples' systems in very specific benchmarks.

If it wasn't for this culture of pretensions to higher science, machine learning papers would most likely be written with much more clarity than they are now and mistakes like the one described in the above article would be rare.

ur-whale · on June 4, 2019

>It's an extremely, almost exclusively, empirical field.

Fully agree, and it's a necessary disease in young fields like ML (akin to grid search in fact).

But at some point, there will need to be some sort of theoretical foundation brought to bear or advancement will grind down to a halt

And the academic reward mechanism needs to start reflecting that fact.

inimino · on June 4, 2019

I've been quite disappointed to see how many papers from "cutting edge" research groups are chasing small improvements in well-known benchmarks by finding new techniques that happen to work, while there is a lot less effort put into finding out why. I guess Geoff Hinton's explanation about what gets published today explains it.

patrick5415 · on June 4, 2019

Yet there are approximately 0 papers doing that. Flawed or not not, the original paper still beats the socks off your approach on novelty. And (perceived) novelty is the lifeblood of the acedemic.

kaitai · on June 4, 2019

There is a phenomenon in academia I call the "pissing on a lamppost" phenomenon. A well-known researcher mentions his thoughts on (topic) in the penultimate paragraph of a paper on something else. When you talk to them, they say "oh, I don't actually know how to do that, but I think it should work!" So some sucker like me goes & figures out how it actually works, but... BigName has already discussed it so it's not a new result (see citation for penultimate paragraph) so we are not interested in publishing it :)

You can get around it if you show BigName is wrong but if they've been vague enough that's hard.

throwaway287391 · on June 3, 2019

This sort of thing is aggravating to read. Frankly it comes off as really entitled. As researchers, the expectation is now that we not only have to do the research and write a paper like the good old days, but we have to release the code too. Okay, fine. But now that's not enough either -- the code has to be well-documented and clean. Ugh, alright, fine -- it's going to take me a few extra weeks of not doing research, but I'll clean up all the code, rerun experiments to make sure it all still works like before, and add a bunch of documentation. But no, still not good enough -- it has to run at the press of a button in your particular programming environment. If we don't know how to write a script (or couldn't be bothered to spend the time writing one) to check that the data is on disk and, if not, crawl a website to download some huge dataset in one click, test our code on your OS, your CPU/GPU/TPU/..., etc., we were being "insincere" with our open-sourcing efforts.

KirinDave · on June 4, 2019

Pardon me, maybe I just misunderstood the whole idea of research but what good is it if it's not reproducible?

I can understand it may be part of a meaningful personal journey for you, and I appreciate that. But if no one else can validate your research they're correct to discredit it and you.

So what is the optimal outcome here? Should we hold you to a standard of reproducibility even if it is as minimal as, "actually describe your algorithms correctly and don't misrepresent a piece of code and a paper?" Or should everyone just decide you can find your own research funding if it's not going to help anyone?

archgoon · on June 4, 2019

The original idea behind "reproducible" is that the ideas conveyed in the paper should be enough to reproduce the results. Physicists and biologists are not expected to drive over to your lab to figure out what's wrong with your setup.

Now, that said, reproducibility is terrible in many fields. CS has an opportunity to act as a trailblazer here, but it should be noted that this would be holding themselves to a higher standard than their peers in other fields. As a result, there's going to be a learning process for everyone as they figure out how to make this all work. :)

throwawayjava · on June 4, 2019

Some pretty good computer science got done before devops was gifted to the world.

And some pretty good science got done before computer scientists were gifted to the world.

I'm genuinely skeptical that modern software engineering practices are a good way of thinking about reproduction in science. Even in computer science. There's a lot that scientists can learn from software engineering (and in fact I've helped run workshops in the past on exactly this topic), but science is not engineering.

KirinDave · on June 5, 2019

> Some pretty good computer science got done before devops was gifted to the world.

I'm happy to talk about this if you want. One of the most important aspects of this work was that people like Dijkstra started using notions that approached what real computers could read while remaining human-readable. This is some measure of classical "reproducibility". And work like McCarthy's was revolutionary in part because it was a definition of reproducibility as a result!

I can give examples of shockingly good papers that are struggling to see the light of day in their industry because they're written in ways that make them hard not only to understand, but to reproduce.

So don't presume to lecture me about this. Part of the reason the word2vec paper stands out is precisely because this is such a deviation from the norm to have a paper misrepresent its most fundamental component: the algorithm.

KirinDave · on June 5, 2019

Okay, by the standard you laid out: the paper that is the subject of this paper failed.

But similarly, if you say, "Here are our statistical models, they're in a notation for a private, custom MCMC system you can't use" that's probably failing the standard even if the work is good.

seanmcdirmid · on June 4, 2019

If I have to go dig up someone’s code to reproduce their result because I don’t believe/understand the idea in its published form, they’ve already lost me.

Word2vec is a great example of a piece of research that conveys a great idea with plausible results, where in fact, the numbers are less useful than the story behind them. Of course, google is a big corp that can just put out useful interesting stuff without worrying about how to play the science game.

p1esk · on June 4, 2019

Pardon me, maybe I just misunderstood the whole idea of "reproducible" but it is only not reproducible if you tried to implement the ideas described in the paper yourself, and I mean really tried, and contacted the authors for help, and still failed to achieve the claimed results.

throwawayjava · on June 4, 2019

> if you tried to implement the ideas described in the paper yourself, and I mean really tried, and contacted the authors for help, and still failed to achieve the claimed results.

This is the actually important definition of "reproducible".

"has an install.sh" is really nice, but it's far more important that an informed reader can recreate the artifact for themselves from the written description.

The "must be able to apt install or it's not real science" is a particularly dangerous path to go down. It's how you end up with e.g. shit loads of well-engineered LISP or FORTRAN code with no actual scientific insights or knowledge transfer. Which has actually happened in the past. A lot.

telchar · on June 5, 2019

I don't think people in this thread are insisting on a downloadable package. Just that the algorithm published in the write-up accurately describe the work that was actually done to achieve the stated results. If even that threshold can't be met, IMO the authors are polluting the literature with bad work.

derefr · on June 4, 2019

> and contacted the authors for help

Isn’t the whole point of a paper, to obviate needing to talk to the authors? Like, so that science is a ratchet that doesn’t slip backward the moment the authors die?

visarga · on June 4, 2019

Related to w2v, the proof is in the vectors. They work. And, by the way, w2v has had a whole family of variants which were built on top of the original implementation (such as Doc2Vec, Doc2VecC and others).

Related to most ML code, it's not written in C but in Python frameworks that implement whole layers as building blocks. So it's not as hard to read/change. You can take something complex like the transformer and understand it in a few minutes. The most puzzling part with framework code is figuring out the shape of tensors and, sometimes, what each dimension means.

Regarding reproducibility - it's hard to achieve on account of parallelisation. You see, neural nets use floats, and floats are not real numbers. Half the float numbers lie between [-1, 1] and the rest outside. If you do 1e-10+1e10 you get 1e10. So if you add 1000 times 1e-10 to 1e+10 you still get 1e+10, but if you add them up together first, the result is different. So float summation is not commutative. Depending on race conditions the order in which summation occurs could change and the result change as well, even if you use the same random seeds. And if we give up parallelism then we can't run the experiments any more.

Dylan16807 · on June 4, 2019

For almost all ML training, it's straightforward to break it into a fixed number of chunks without race conditions.

That's if float rounding issues are even a big problem in the first place. If your results are within .1% over a few runs it's reproducible enough for most purposes.

throwawayjava · on June 4, 2019

> what good is it if it's not reproducible?

I am one of those people who actually did the extra weeks/months to properly test/review/document/release my code and data sets (you can apt-get install my "research artifacts").

In retrospect, it was a poor use of my time and a poor use of my sponsoring institution's time. "apt-get install" is NOT what we mean by "reproducible" in science.

High-quality or easy to install code is not necessary for a result to be reproducible. True reproduction would mean coding the algorithm from scratch by following the written description, and that's what it literally means in most other fields of science.

You can't download & install a Large Hadron Collider in an afternoon. Does that mean the LHC experiments are "not reproducible"? Of course not.

But that's not even the important point. The really important point is that, in most cases, high-quality code is not even sufficient for a result to be reproducible! See: the article we're discussing.

IMO, the "blindly rerun the code" definition of reproduction is actually a HUGE barrier to creating a true culture of reproducability in computer science. It results in super lazy reviewing where "public source code that's easy to install and puts the correct-sounding shit into STDOUT" becomes a stand-in for "paper actually describes a novel idea in enough detail that it can be truly reproduced".

> So what is the optimal outcome here?

An optimal allocation of scientists' time and effort.

As a scientist who has actually done that leg work, I don't think packaging code so that it runs with a single click is the best use of public money in science in 99.9999% of cases. That time is much better spent on writing and other dissemination explaining the ideas that make the code work (in some cases well-documented source code is the best description but in other cases prose is much more effective and illuminating). Or on coming up with new ideas that are even better than the old ones.

Which I guess is just another way of saying that scientists should spend their time on science, not engineering.

P.S. When shitting on "scientists" for not being good enough software engineers, please remember who's going to be doing the actual work you're demanding. It's mostly phd students who make $30K/yr. And they have to do this work in their free time because their 60 hr/wk day job is fully allocated to doing the actual science. I.e., treat scientists who maintain their code as you would treat FOSS contributors who are making 5x-10x+ less than you while working longer hours. Because maintaining high-quality code is something they are almost certainly doing in their free time.

KirinDave · on June 5, 2019

> High-quality or easy to install code is not necessary for a result to be reproducible. True reproduction would mean coding the algorithm from scratch by following the written description, and that's what it literally means in most other fields of science.

Which failed here. However, you're going to have a difficult time convincing many people that the attributes you described would be bad properties, just that they might not justify their expense.

> You can't download & install a Large Hadron Collider in an afternoon. Does that mean the LHC experiments are "not reproducible"? Of course not.

By the same token though, the LHC repeats experiments and solicits feedback on how to improve their methods, which they go to great lengths to publish and simulate, because they're aware of this problem.

> IMO, the "blindly rerun the code" definition of reproduction is actually a HUGE barrier to creating a true culture of reproducability in computer science. It results in super lazy reviewing where "public source code that's easy to install and puts the correct-sounding shit into STDOUT" becomes a stand-in for "paper actually describes a novel idea in enough detail that it can be truly reproduced".

Ah, yes. Yes. "If this code is TOO reproducible then people might reproduce it, and handwave handwave the quality of papers would decline.

That's certainly NOT the case in pure CS papers, which have only improved since the days when folks felt that "Lenses, Bananas and Barbed Wire" was how folks should go about writing papers.

Now, physics might be different. But there is surely a middle ground between, "I've shipped you a LHC just plug in in lol" and "This paper doesn't even remotely describe how we achived the results."

If you believe that wasting the time of scientists is bad, then surely you're for clear papers with accurate descriptions of the methods so that those who go and reproduce your work are not sent on wild goose chases?

> As a scientist who has actually done that leg work, I don't think packaging code so that it runs with a single click is the best use of public money in science in 99.9999% of cases.

No, we got that part. But surely someone does and maybe you can design your work to leverage that rather than reproducing and discarding scaffolding. My big concern here is that a lot of scientists (like you claim to be) are underqualified and unpracticed at software, and thus are surely seeing at least some aspect of their work distorted by software and hardware issues.

> Which I guess is just another way of saying that scientists should spend their time on science, not engineering.

Scientists are not going to be able to escape engineering. No one else is going to build what they need besides them.

> please remember who's going to be doing the actual work you're demanding. It's mostly phd students who make $30K/yr. And they have to do this work in their free time because their 60 hr/wk day job is fully allocated to doing the actual science.

Yeah, I'm aware. I suspect their lot would be better if your attitude wasn't that their work is disposable and unimportant.

patrick5415 · on June 4, 2019

Reproducible research does not mean you don’t have to do any work. The original author isn’t going to come to your lab and clean your pipettes nor should they. Grow up.

cfcf14 · on June 4, 2019

The research published in this case (the original word2vec paper) is quite literally not what it claims to be. It is factually incorrect. Is that really the responsibility of the individual attempting to reproduce the work to identify and correct?

How wrong should the original work be before it ceases to be a case of 'clean your own pipettes'?

patrick5415 · on June 4, 2019

I’d say at the same point that others in the field would generally agree it’s fradulent. I’m not familiar enough with the field of word2vec to say if this is the case or not. But regardless, asking that the implementation be representative of the paper is a lot different than demanding docker containers and install scripts.

KirinDave · on June 5, 2019

Patrick,

Could you PLEASE consider reading the article we're all discussing before you roll into the comments section of an article about it with your strong-but-loosely-held opinions? You're arguing against a point that almost no one is putting forward (that software should be one click).

liquidify · on June 3, 2019

If there are bugs in your code that would have prevented you from being able to do what you claim to do, then to the readers of your paper, it seems more likely that you might not have performed the experimentation that you claim to.

throwaway287391 · on June 3, 2019

What? This has nothing to do with bugs in the code that would've affected the original author's ability to run the original experiments described in the paper. The user I replied to is talking about things like hard-coded data paths and wanting to be able to run the code at the press of a button on any platform. The expectation for this level of cross-platform compatibility is difficult to achieve for engineering firms with millions to invest in making it work properly. He's mad that individual machine learning researchers don't always do a great job of making it happen for their cutting-edge research which was a difficult enough engineering task to get running on their machine.

gcommer · on June 3, 2019

Reread the original post:

> I have to fix bugs before the code can run. I'm very curious how did the author run that code with the bugs.

throwaway287391 · on June 4, 2019

You're right, I misread that part.

jfoutz · on June 4, 2019

[flagged]

dang · on June 4, 2019

Personal attacks will get you banned here. Would you please review the site guidelines and not post like this again?

https://news.ycombinator.com/newsguidelines.html

jonas21 · on June 4, 2019

FWIW, I downvoted because because you somehow managed to break about half of the HN comment guidelines [1] in a single comment, in particular:

* Be kind. Don't be snarky. Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

* When disagreeing, please reply to the argument instead of calling names. "That is idiotic; 1 + 1 is 2, not 3" can be shortened to "1 + 1 is 2, not 3."

* Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith.

* Please don't make insinuations about astroturfing. It degrades discussion and is usually mistaken. If you're worried, email us and we'll look at the data.

* Please don't comment about the voting on comments. It never does any good, and it makes boring reading.

[1] https://news.ycombinator.com/newsguidelines.html

throwaway287391 · on June 4, 2019

I don't have much to say to most of this, but I'll point out that I didn't and couldn't have downvoted you as none of my throwaways have >=500 karma or whatever the threshold is.

jfoutz · on June 4, 2019

[flagged]

throwawayjava · on June 4, 2019

Man, I hate when I get dragged into these sorts of intense emotional downs by silly internet conversations with strangers.

FWIW I undowned your posts; internet karma is about communication/moderation, and sometimes has the opposite of the intended effect. I've been there.

Anyways, find some time to pet a dog/cat or fly a kite today.

p1esk · on June 4, 2019

Typically when I post code for my papers, I try to clean it up, because it's an ugly mess that is hard to understand (even myself sometimes). This clean up process might introduce bugs. I'm not going to thoroughly test everything, I have next paper deadline coming up and my adviser could not care less about my code. But, if you're interested in my research and having trouble reproducing my results, email me and I'll help you.

throwawayjava · on June 4, 2019

...and university/company sometimes makes you wipe out the git commit log, so you can't always just use "HEAD at time of submission".

Pro tip: at paper submission time, md5sum your code and also git tag it in your private repo. When you release the code, if you have to / want to release with a clean history, make submission-time your initial commit and then make the current state of the repo your second commit. I've never encountered an institution that won't allow that level of history in the code release, even places that are pretty hard core about wiping pre-release history.

caenorst · on June 4, 2019

And don't you think it would have been more efficient to write it down clean at first with good comments and code practice ? Those good practices are here for a reason and not just for annoying you for the sack of "beauty". You would save more time for yourself when you wanna expand the paper further or just reuse some part of it, have better reputation for your work, and gain some citation because people will have the tendency to use your code as a starting point.

inimino · on June 4, 2019

"Writing it down clean at first" implies that you know what is going to work when you start. If that's the case, it's not research.

patrick5415 · on June 4, 2019

No, it would not be more efficient. I’ve tried this route and it hands down consumes way more time. Until it’s clear that something will get re-used, it’s most efficient to leave it dirty.

The software types love to sht on researchers for their poor quality code. Yet a few threads over, there’s always a discussion about how more tests don’t make it onto the kanban board (or whatever you guys are using these days) when crunch time approaches. It’s not just us.

randcraw · on June 3, 2019

Agreed. If your results are not reproducible by others, the publication should issue a retraction and withdraw your work. Code that works is essential.

dodobirdlord · on June 4, 2019

Nonsense! The "reproducible" in "reproducible result" doesn't mean anything except that what is described in the paper can be reproduced. Code that works is nice and all, but totally irrelevant to the question of reproducibility. Would you claim that that the Higgs paper must be retracted because CERN won't ship you the LHC? They've got ATLAS and CMS to demonstrate reproducibility, and they're wildly different!

Volt · on June 4, 2019

I hope you're a scientist.

Veedrac · on June 3, 2019

If it was just one thing or the other, perhaps he wouldn't have been so critical of it. People are going to understand if you require a few paths to be changed or some files to be downloaded manually into specific relative locations. The ask is only that somebody should be able to use the code without doing their own novel research, and be able to verify it's doing what it is claimed to be doing. If this means your download links need to point to the right location, the code has to run in a location other than /home/hellokitty/ml, and that it needs to meet standards of readability—well, good, that's what the scientific process is about.

caenorst · on June 3, 2019

I have spent enough time to debunk papers to know that a lot of researchers are shamelessly cheating or make mistakes (as EVERYBODY does). Reviewing your code should be part of the peer review process.

Sorry but I don't trust your research just based on yours words and cherry picked images.

Also research is incremental, so producing a proper code on which other can work on top of should be part of the CONTRIBUTION.

ur-whale · on June 4, 2019

I fully agree, and would actually go a lot further.

A very large number of research papers I've read in my life were so poorly written as to being near impossible to understand.

The time it took to infer the actual meaning of certain parts of certain papers was vastly larger than just reading the code would have been.

It has become very clear to me that being a strong CS researcher does not necessarily imply mastery of language or even the very basic ability to explain one's ideas. As a matter of fact, with very few exception (eg Feynmann), the two skills seem to be at odds with one another.

On the other hand, code can be read and understood much more readily, especially when it can be experimented with, if only to insert printfs in it to understand what it does.

The culture of scientific publication in CS must change. The "I'm pressured by my advisor and therefore have no time to publish my code" argument is complete bollocks.

The code should come first and your advisor should pressure you to publish that first (assuming it actually works).

If someone wants to read your poorly worded explanation of how it works, fine, publish a paper.

Show me the code first.

namibj · on June 4, 2019

I'd much prefer some code with a license that doesn't make looking at it a hazard for writing (A)GPL later, than to have to read the paper many times until I figure out how they meant to combine loads of prior research that is only mentioned to exist on the side, without any info on whether it was used in this or not.

Being able to compile in some way and, possibly after some bug-fixing/compiler-pleasing yielding useful results that approach/relate to those in the paper, greatly increases my confidence in the researcher and his work. Pretty code is nice, but it's weighed separately, not affecting the appreciation for the original research work itself. Only the existence of code with an AGPL or less restricted license affects how I perceive the paper/original research.

hoseja · on June 4, 2019

Your paper has to be well documented and clean. The code is often more important than your paper.

ur-whale · on June 4, 2019

You might want to read up on docker.

krageon · on June 4, 2019

I had the great displeasure to work on multiple state-of-the-art algorithms related to sticking things in buckets with varying degrees of optimisation and/or supervision. There are a ton of reasonably recent papers for that, some of them have source code (most of them have nigh-impenetrable theory on why their algorithm is the best) and every case I tested was emphatically much worse than the technologies we've been using for 10+ years in any practical sense.

That problem is compounded with the issues you describe, which I'd categorise as probably the best case. I did not find even a single case of a properly opensourced repository that just compiled somewhere that wasn't the researcher's laptop.

nullc · on June 4, 2019

There appears to be something of a software engineering crisis in academia.

Imagine that you are a graduate student or a newly minted PHD who happens to be pretty good at practical software engineering: You can leave academia for FAANG and a solid six figure income, or you can stay struggling, poorly paid, unable to get tenure in academia. In some groups software engineers are treated as glorified typists, too valuable enabling other people's work to be a principle researcher themselves. I've heard from a number of people that they avoid hiding their programming skills to avoid that trap.

This has many consequences beyond the obvious direct ones like papers with unusable implementations.

Academic work is often valuable even when it isn't practically useful but in many cases where academics are /trying/ to do something practical they fail because their community lacks the engineering experience-- I've seen a fair number of papers presenting optimizations as useful engineering when in reality their approach only makes sense because of inefficient non-programmer tools make a weird set of primitives fast (e.g. using a matrix multiply in matlab where in C you'd simply write a loop). As a practitioner this is frustrating because it sometimes requires re-implementing the approach to realize that it was only 'fast' compared to a pants-on-head-foolish approach.

It also can create essentially fake results.

Many times the lack of software engineering is compensated for by mocking components out in ways that would give faithful results if the researcher understood everything, but the whole point of expirementation is that the researcher doesn't! This mocking is seldom disclosed in papers. For example, I've encountered multiple papers that claimed to implement some enhancement to Bitcoin and tested it in a test bed but in reality they just added sleep()s with appropriate times, guessed out of specious reasoning like "our function X should be 10x slower than a signature", and hardcoded constants (computed in mathmatica or sage) for their messages. ... not realizing that sleep() isn't the best analog for cpu busywork in a multithreaded program!

Another kind of fake result I've encountered which is even less directly a result of the software engineering shortage is in signal processing literature. While working on audio/video compression I found it common for algorithms to be presented without various constants and after reimplementing and asking the authors for their constants I found that they'd been cherrypicked for the ten images used in the paper, and that the whole approach doesn't actually work. This is a kind of ineptitude (or outright dishonesty) that would be much less common in a world where reviewers received a working and usable implementation in source form-- but that can't be expected in a world where qualified software engineering is not readily available to researchers.

I don't have any proposed solutions but I think it's important to acknowledge that it is a common and serious limitation to the usefulness and accuracy of contemporary research.

amelius · on June 4, 2019

I think there should be more software engineering in academia for different reasons too.

In the past academia and government agencies developed the internet and its communication protocols; this was (imho) a much better internet than the corporation-dominated internet we have today.

I'd like to see a resurge of this type of internet. Let us use well-researched open protocols again; and let us use applications which are really built for people, not advertisers. Let corporations build the hardware, but let us keep our data far away from them.

And besides the computer science / software engineering branches, I'd like to see other fields (like industrial design, and for instance even sociology) to join the development of a better digital future. The kind of future where everybody profits, not just shareholders of big companies.

izacus · on June 4, 2019

> Another kind of fake result I've encountered which is even less directly a result of the software engineering shortage is in signal processing literature. While working on audio/video compression I found it common for algorithms to be presented without various constants and after reimplementing and asking the authors for their constants I found that they'd been cherrypicked for the ten images used in the paper, and that the whole approach doesn't actually work. This is a kind of ineptitude (or outright dishonesty) that would be much less common in a world where reviewers received a working and usable implementation in source form-- but that can't be expected in a world where qualified software engineering is not readily available to researchers.

This happened so many times when I was working on NLP/CV algorithms - I've read and implemented many algorithms from papers just to find out that they only produce the amazing improvements on cherrypicked dataset. On pretty much all other practical data the algorithms performed worse and in many cases even crashed!

kaitai · on June 4, 2019

I just wish I could upvote your comment three more times.

To compound the problem, the fact that you're competing against many of your peers for the few jobs there are, and jumping positions every 1-3 years after grad school, means that best practices are not disseminated by working with a stable set of coworkers. This is similar to your 4th paragraph point.

As a mentor of grad students on projects that these days are increasingly using machine learning, I am trying to figure out how to add this stuff to my teaching load, essentially, because the pain of my students using nested for loops when they could just use a single vectorized expression is real... but I myself have no formal CS training......

GuyPostington · on June 3, 2019

Have you been able to reach out to the author and get clarification?

billconan · on June 3, 2019

I tried to file bugs on github, only to find out that people (sometimes 2 years ahead me) was blocked by the same issue and there are no answers.

They didn't even bother to merge my pull requests.

scott_s · on June 4, 2019

There's probably a disconnect in expectations. The researchers are probably publishing the code more as a document, to say, hey, this is how we did this. It sounds like you have expectations that the code is part of an ongoing project. But ongoing projects require maintenance, and the researchers have likely moved on to new things.

IfOnlyYouKnew · on June 4, 2019

https://github.com/jonadsimon/entendrepreneur-web?

utopcell · on June 3, 2019

I think it very unfair to the original set of word2vec papers to be talking about 'academic dishonesty'. This is a case of a user that has little to no experience with neural networks. There are a ton of articles describing the need for random initialization [1][2]. In fact, if one spends a few seconds thinking about it, the need is evident. Without it, the NN cannot perform symmetry breaking: If inputs are set to zero, all neurons will perform the same calculations, rendering the network useless.

[1] google: "neural networks vector initialization" [2] http://deeplearning.ai/ai-notes/initialization/

bollu · on June 3, 2019

That's hardly the point of the article --- the actual paper does not describe the use of two separate vectors for each word. The initialization was an interesting tidbit.

exgrv · on June 3, 2019

Except it does? After Equation 2: "v_w and v'_w are the input and output vector representations of w."

andreasvc · on June 4, 2019

bollu, multiple people have shown that your claim that the paper doesn't match the code is flat out wrong. I think at this point you should issue a retraction of your wildly inappropriate suggestion of academic dishonesty.

utopcell · on June 4, 2019

Of course it does. For word w they are v_w and v'_w on Eq.4 for the case of SGNS [1].

[1] https://papers.nips.cc/paper/5021-distributed-representation...

b_tterc_p · on June 3, 2019

On a similar note, a long time ago I read the Doc2Vec paper, then looked at popular Doc2Vec implementations. They didn’t seem to do the same thing. The paper said you basically make vectors for words, then append on an additional space that represents the additional information of documents as opposed to single words.

All popular implementations I found seemed to put the document vectors into the same space as the word vectors. They also didn’t seem to do any better than a tf-idf weighted average of word vectors... curious if anyone has ever bumped against this.

gojomo · on June 4, 2019

The only code released by the 'Paragraph Vector' paper authors was a small patch, from Mikolov, that added paragraph-vectors to the original `word2vec.c` implementation in a very simple way: treating the 1st token of each line as a special paragraph-vector, still string-named (and allocated in the same lookup dictionary). Only by convention (a special prefix on those paragraph-vector tokens) could collisions with similarly-named word-vectors avoided.

That's a nice minimal way to demo/test the idea, but limited and fragile in other ways. The initial gensim implementation did something similar, then I changed it to use a separate doc-vectors space, to better support a lot of options (including the PV-DM mode with a concatenative input layer – which has never been confirmed to perform as well as the original paper implied).

b_tterc_p · on June 4, 2019

Insightful. Thanks

woliveirajr · on June 4, 2019

This. And my thesis was about how tf-idf and embeddings with deep (deeeeeep) neural networks could be give better results in authorship attribution.

visarga · on June 4, 2019

> embeddings with deep (deeeeeep) neural networks

Embeddings are shallow, and what comes after them is usually less than 5 layers of LSTM, not a deeeeeep neural net (maybe, deep only horizontally, on the axis of words).

slx26 · on June 4, 2019

Just as a curiosity, complementing what others have already written... I read part of Mikolov's thesis and code in the past (when I was still studying at the university, so I might have got everything wrong (I still don't get half of it :D)). First I found it quite shocking that the code was so bad. The training code was pretty confusing to me, and I found the lack of useful comments discouraging. The test code (which loaded stored embeddings from a file and allowed some basic operations) was even much, much worse. Like, declaring three variables (a, b, c) and reusing them for different things in the main functions without explaining anything, and doing linear searches through the whole embeddings to find a word vector... very ugly and scary things.

So, I had a very bad impression of the code. But then, I checked the thesis, and I found it awesome. The amount of tests and implementations the guy made, and how he showed in practice how better results could be achieved in a good number of different setups... I found it really impressive. But such great work paired with such bad code! I was just a CS student, so I found it shocking. Nowadays I realize he was simply focused on a different thing, and the results he obtained were indeed outstanding and talk for themselves.

It's easy to look back and criticise the code, but when you look at the work he did in perspective... it's completely unfair to ask more from him (admittedly, they had time to address some of the issues later, but they probably had better things to do too).

newen · on June 3, 2019

This kind of things happens all the time in academia. The authors are either constrained by space due to paper limitations or they are too lazy to explain all the little details that go into the algorithm.

I used to do research in computer vision a few years ago and it used to be that people won't publish their code and they purposely won't put in all of the details of the algorithm in the paper. Many of those algorithms were patent pending and I assume the authors were hoping to make some money from the patents. Compared to that, it's a lot better nowadays where most of the popular papers come with published code.

bollu · on June 3, 2019

Is this really that common? That's disheartening, I want to spend time in academia but experiences like this are sucking the fun out for me...

austinjp · on June 3, 2019

Academia isn't flawless, it's no paradise. It's just humans and their factions, with all the good and bad that brings.

> The authors are ... constrained by space due to paper limitations

This is very real. Different journals have different criteria: word count; formatting; the number of tables you're permitted to include; etc. It's archaic and daft but that's the truth of it. And that's before we even get started on the really bonkers stuff like author order, impact factor, reviewer workload vs lack of pay, publish or perish, and so on.

nmstoker · on June 3, 2019

Yes, this was a huge disappointment when I read Chemistry & Physics years ago. The naive view of papers was that they were to move human knowledge forward, but it became clear they were an elaborate knowledge withholding device!

Fortunately the trend seems to be towards better levels of openness, but it varies by subject. Stories like the BASF team being unable to reproduce vast numbers of established published techniques are way too common.

toast0 · on June 3, 2019

I tried to make use of some public audio research and it was pretty bad. There was an audio comprehensibility competition a few years ago. Some of the papers submitted are still around, as well as the summary paper describing the results. But many papers are hard to find, and those that claimed to have source code available are hard to find --- i was able to get matlab sources for a few algorithms, but they somehow work on the example files, but mostly crash on my files.

It's a shame because I understand the idea of the paper, and have an excellent place to apply it, but I lack the DSP background, so I can't really rebuild the code from scratch -- so the work is not able to be used.

DoctorOetker · on June 5, 2019

this sounds interesting, would you care to reference the paper in question?

toast0 · on June 5, 2019

I'm not sure if I can find the exact paper anymore. This was in response to the Hurricane Challenge, a summary of results is available [1]. I tried to use code for uwSSDRCt available from the legacy page of the conference [2], under the link "Live and recorded speech modifier", direct download here [3].

The basic context is verification code delivery -- I'm playing pre-recorded samples of numbers to users, and can't control or sample the noise (either transmission or environmental), but would like to enhance intelligibility to reduce user effort, improve experience, and reduce costs.

[1] https://www.research.ed.ac.uk/portal/files/17887878/Cooke_et...

[2] https://web.archive.org/web/20131012005150/http://listening-...

[3] http://www.laslab.org/resources/LISTA/code/D4.3.zip

gubbrora · on June 3, 2019

Yes it's common that a lot of "little details" are left out. Yes it does make things harder. No it's not a big enough problem that you need to be disheartened. You could even see it as an opportunity to raise the bar and stand out positively.

gameswithgo · on June 3, 2019

welcome to earth. we apologize for the mess.

chriskanan · on June 3, 2019

Completely agree. The situation today is far better than a decade ago, with code releases for machine learning and computer vision papers being much more common than before.

I try to make my students release polished and easy to use code, but it can sometimes fall through the cracks due to deadlines, etc. Many projects are the output of a single PhD student.

newen · on June 4, 2019

Yep, it's so much better now in computer vision with early publishing in arxiv and published code. I feel like that is one of the reasons why research in CV is progressing so fast.

Also, I think it is something of a fact of life that you can't put all of the details of your algorithm in the paper, not just paper length limitations. I have actually tried to do this in two papers by putting all of the details in the supplements and the work to explain all the details and justify my choice of parameters and decisions for edge cases is almost as hard as writing the main paper. It becomes hard to justify the time spent pretty fast. Also, putting this in the main paper makes your beautiful explanations be tinged with edge cases and digressions haha.

izacus · on June 4, 2019

Even worse, when actually researching CV, it seems that leaving out the details can even be deliberate in papers - to hide the fact that the authors tuned and cherry picked input datasets so they show the improvements they claim. Actually implementing the code and running on more standard datasets tends to quickly put results of many papers into question.

gojomo · on June 4, 2019

They're not really that different.

There's only a second vector for a word in the (common, default) negative-sampling case, where each predictable word has a distinct "output node" of the neural network, and the second vector is the in-weights to that one node. Still, most implementations don't emphasize this vector – the classic "word-vector" is a word's representation when it's a neural-network input. And in the hierarchical-softmax training mode, there's no clear second vector.

I suspect the original word2vec authors left out a clearer description of the initialization as they were following some oft-assumed practices implied by their other descriptions.

Another minor difference between the literal descriptions, and original C implementation, was a slightly different looping order in skip-gram training: holding a target-word, and then looping over all context-words, rather than holding a context-word, then looping over all neighboring target-words. One of the authors once mentioned that the shipped approach was slightly more efficient – maybe it was due to CPU cache issues? In any case all the same context->target pairs get trained either way, just in a slightly different order.

eggie5 · on June 3, 2019

instead of thinking about what it is in practice: skip-gram negative sampling, I think it's much more intuitive to think about what it is in theory: extreme multi-class classification.

word2vec is a multi-class classification problem with a softmax output layer and cross-entropy loss. The novel part of word2vec, in my opinion, is two:

1. dataset (proximal input word & output word) generation from documents eg: skiagram, CBOW, etc 2. engineering speedup for softmax: Approximate Softmax eg Negative Sampling using NCE, hierarchal softmax, etc

If you just build word2vec w/o step 2, it's a easier to understand. Then when you get that working, add in the negative sampling speedup trick which isn't core the theoretical algorithm.

utopcell · on June 4, 2019

Can't really call it a speedup trick, since it actually improves the performance of the embeddings but in terms of qualitative understanding, I see where you're coming from.

xxxpupugo · on June 3, 2019

The title reads to me like hyperbole.

The implementation can differ, they got time to refactor/optimize it after the publication. But they can't probably revise the paper itself. As long as the code is there and can produce said/better result, then it is probably your responsibility to keep the differences in check.

It is actually quite common for deep learning papers overall, the github repo gets updated after the paper is out, and you will find the divergence lying there.

MichaelStaniek · on June 3, 2019

My intuition for that, and you can tell me if its wrong.

The normal explanation for Word2Vec is 2 weight matrices, so the formula looks like this: (One_hot_input x W1) x W2, which is then softmaxed.

W1 then is the matrix that contain our focus embedding from, but if we only evaluate specific words on the target side, then W2 are actually our context embeddings, and the normal multiplication then is focus_w x context_w.

Am I wrong?

bollu · on June 3, 2019

Now it's `one_hot_focus x W1 x (one_hot_context x W2)^T`. So we still pick one row of the matrix from the focus and context embeddings, but they're separate embeddings.

MichaelStaniek · on June 3, 2019

Yes, but thats also what happens in the normal formulation, no? So the second weight matrix actually are our context embeddings?

avn2109 · on June 3, 2019

New user "MichaelStaniek" has a grammatical, relevant, good-faith sibling comment to this which is (imho) inexplicably banned, almost certainly due to a mistake, and I hope somebody will unban him.

lelf · on June 3, 2019

It’s patented BTW — https://patents.google.com/patent/US9037464B1/en

rurban · on June 4, 2019

Turns the patent only describes the paper, but not the implementation. Great, and somewhat ironic

skythomas · on June 3, 2019

Grossly unfair title