
Much of your argument rests on refuting the notion that the author feels "entitled" to a high-paying job. On that point, I agree with you. Any engineering undertaking is most productive when it is a meritocratic and competitive pursuit. People who feel "entitled" to an engineering job unfortunately need a reality check on their true competitiveness.

However, that doesn't seem to be the author's core point. Their core point is that the level of competition is past the point where their meritocratic achievements carry any weight, because to be competitive in the present marketplace they need to either (1) have been _born_ in a different country with a low cost of living, (2) give up certain basic freedoms, or (3) settle for a less skillful job where they can be an outlier in the distribution (for how long?) -- all of which, to them, feel less meritocratic.

Of course, they might also feel "entitled" to a job, but that's not the interesting part of their argument (at least to me).


For the last few decades, being born in the western world has been an advantage. There is an irony in that being gradually reversed.


Putting papers and code on arXiv shouldn't be punished. The incentive to do this is to protect your idea from getting scooped, and also to inform your close community about interesting problems you're working on and get feedback. ArXiv is meant for work-in-progress ideas that won't necessarily survive the peer review process, but this isn't really acknowledged properly on social media. I highly doubt the Twitter storm would have been this intense if the posts had explicitly framed it as a "draft publication which hints at X." But I admit that pointing fingers at nobody in particular and at social media in general is a pretty lazy solution.

The takeaway, IMO, seems to be to prepend the abstract with a clear disclaimer sentence conveying the uncertainty of the research in question -- for instance, adding a clear "WORKING DRAFT: ..." at the start of the abstract.


I think you missed the point that data needs to be collected and presented ethically. It's not about it being a work in progress and not peer reviewed.


I agree that the data collection process wasn't ethical, and the professor should definitely be reprimanded for that. It's extremely sad that the coauthors weren't aware of this as well. And I feel terrible for the undergrads: their first research experience was publicly rebuked through no fault of their own.

However, there is no shortage of projects with sketchy data collection methodologies on arXiv that haven't received this amount of attention. The point of putting stuff on arXiv _is_ that the paper will not pass / has not passed peer review in its current form! I might even call arXiv a safe space to publish ideas. We all benefit from this: a lot of interesting papers are only available on arXiv vs. being shared between specific labs.

I'm concerned that this fiasco was enabled by this new paradigm in AI social media reporting, where a project's findings are amplified and all the degrees of uncertainty are suppressed. And I'm honestly not sure how to best deal with this other than either amplifying the uncertainty and jankiness in the paper itself to an annoyingly noticeable level, or just going back to the old way of privately sharing ideas.

Maybe this is the best-case scenario for these sorts of papers? They pushed a paper onto a public venue and got a public "peer review" of it. Turns out the community voted "strong reject," and it also turns out that the stakes of a public rejection are (uncomfortably, IMO) higher than those of a normal rejection. Maybe this causes the researchers to only publicly release better research, or (more likely) it causes them to release all future papers privately.


I agree with the ideal you're speaking of, but the lecturer who uploaded the paper was promoting it on Twitter in a way that really isn't consistent with that sort of intent. This entire scenario seems like a failed attempt to market his own academic brand, without care for the underlying scientific content. It's not a genuine sharing of an unfinished idea or early observation.


He didn't have permission to post the data. A similar protest would have been made if he had posted the data on GitHub, Stack Overflow, Reddit, or Hacker News, and none of those are as peer reviewed as academic journals.


It was a "copyright" issue, nothing to do with "ethics".


> The incentive to do this is to protect your idea from getting scooped, and

The other side is flag planting with half-baked ideas and results.


The official stance of arXiv is that it is intended for papers that are finished and ready for submission to peer review, i.e., papers the authors believe are finished and publishable.

In a practical sense, most people don't think of it this way, though. Putting something on arXiv means "I want people to be able to cite this, and for it to show up on Google Scholar," which might include works in progress, short notes, or lots of other things that don't fit the official criteria.


I'm not sure what to make of this post. There is always a degree of uncertainty in experimental design, and it's not surprising that there are a couple of buggy questions. ImageNet (one of the most famous CV datasets) is at this point known to have many such buggy answers. What is surprising is the hearsay that plays out on social media, which blows the results out of proportion and leads to opinion pieces like these targeting the authors instead.

Most of the damning claims in the conclusion section (obligatory: I haven't read the paper entirely, just skimmed it) usually get ironed out in the final deadline run by the advisors anyway. I'm assuming this is a draft paper for the EMNLP deadline this coming Friday, published on arXiv. So this paper hasn't even gone through the peer review process yet.


ImageNet has five orders of magnitude more answers, which I would assume makes QA a completely different category of problem.

The authors could probably have carefully reviewed all ~300 of their questions. If they couldn't, they could have just reduced their sample size to, say, 50.


I admit that ImageNet isn't the best analogy here. But I'm pretty confident that this data cleaning issue would have been caught in peer review. The biggest issue, which I still don't understand, was the removal of the test set. That was bad practice on the authors' part.


In general with LLM evaluations, I keep seeing issues that would be caught by a human carefully looking at the results. To a nonexpert who occasionally peeks at these things, it seems like having a small dataset that's just slightly too big to manually review is bad practice.

It also seems like 100% accuracy should have raised red flags, especially if you know your dataset isn't perfectly cleaned.


I've been struggling with keeping track of research experiments and code at the same time. This seems pretty cool! I like how this method is language agnostic and uses "matured" tools. Question: I'd love to give this a try; do you have any public code snippets?


https://en.wikipedia.org/wiki/Axiom_(computer_algebra_system...

This is (most of) the source code for Axiom. The code is extracted from the latex for these pdfs.


FYI: axiom-developer.org/ seems to be down


Respectfully, I call bullshit. You can't quantify success in a PhD with a single variable. There are a billion ways research can go right or wrong -- irrespective of your personal pedigree. Your ideas might be too early or too late for your community to grasp, maybe you appeal to the wrong audience, maybe you're unaware of an application of your research, maybe you don't have the right set of collaborators or need a perspective that often emerges out of a lucky encounter with someone. I respect your experience, but I don't want it to give people the wrong idea about research success...

> Paper authorship had NO correlation with research success

I think paper authorship demonstrates that you're willing to put in a non-trivial amount of work to pursue a problem. That seems like at least one attractive skill in a PhD candidate, wouldn't you agree?


Uhm, what exactly are you trying to get at? I said the subject GRE is a very good measure of eventual success in academia. Do you have a solid response to that, or just a rambling tirade?

Paper authorship, if the student is the first author, shows grit and "gumption," I suppose? As if that's what's needed in academia at this moment (it's important, but not the main requirement). But almost no undergrad gets a first author paper. They get mentioned in the middle because they ran a bunch of SDS gels. I wasn't even interested in trying to become a professor and I got 10 papers before I finished my PhD. Do you know how many I (or any of the folks I actually know who are now professors) had during our undergrad? Zero. And not for lack of trying. You know who actually got papers? The son of the department head.


The original comment, to me, reads more like "the subject GRE is a definitive measure of eventual success in academia." I was arguing against the definitive part. Thanks for the clarification. Maybe it is a good measure for your cohort, you, and people in similar situations.

> Almost no undergrad gets a first author paper

Maybe this is different in different fields, but we have a lot of undergraduate first author papers in programming languages and machine learning. I mean -- through and through -- undergraduate students bringing up a topic, getting guidance from professors and senior PhD students, getting results by the end of the semester, and publishing those results the next year. Even the people who end up "running the SDS gels" either drop out by the next year or end up working towards their own first author publications. I've always chalked this up to the experimental setup cost being very cheap in CS compared to the "hard sciences," so most undergraduate students are already comfortable with all the tools they need to do research.

> I wasn't interested in trying to become a professor

I think this is precisely the variable that a standardized test cannot account for! I feel an "authentic" undergraduate research experience is successful if it helps students realize whether research is right for them.

> ... papers ... the son of the department head...

I see where your frustration is stemming from. Sorry this was your first experience with undergraduate research.


I dunno, my snarky answer would probably be: Just as if they'd test for A, but you actually need to be good at B.

I've not been to any school in the US and I don't have a PhD, so all these acronyms like GPA only have a vague meaning to me. But as someone who kinda disliked school and didn't put much effort into it, I can see how school grades can be a bad proxy for university. I didn't even do that well in math in school, but I managed to go above and beyond in all my math classes for my CS degree. I'm not completely sure how admissions for PhDs work and whether they look at your BSc/MSc tests... but again, I guess doing ANY work related to finishing a paper is probably closer to what you'll be doing later than, e.g., your grades in database stuff when you'll be researching programming languages...


> You can't quantify success in a PhD with a single variable.

Of course, but I'd bet the farm there's a very strong correlation!


I have 20+ years experience in academic research.

Connections and hustle are very strong predictors of success.

Test scores do indicate ability, and ability is helpful but far from sufficient.


Connections and hustle are things you can learn -- at least enough to become a professor. Most of my nerd friends who became professors learned them during their PhDs. I didn't say they became famous professors, just folks doing decent research in different corners of academia. I'd still bet my house on the subject GRE as the single most important metric if I were to choose a grad student. Not that I'm in that game anymore, of course.


Can you clarify for me, a non-native speaker, what you mean by "hustle" here? Energetic activity, or fraud? (I guess the former, but I've also heard of a lot of academic fraud.)


I find differentiable programming languages really fascinating. Think about this: a differentiable programming language is still a programming language. If the language is designed to facilitate a smooth optimization landscape, it's actually possible to "learn" programs with gradient descent. This opens the door to a lot of cool possibilities:

- Programming languages which use neural networks as primitive functions (think `result = sum([mlp(input) for input in list])`). NNs are (understandably) notoriously bad at learning simple operators [1]. Differentiable programming over a language defined by aggregation functions (map/fold/sum/mean/etc.) allows us to bypass learning some simple functions. (A rough sketch of this follows after the list.)

- Flipping this around, we can use differentiable programs to regularize the outputs of neural networks. Assume we have an NN that learns the speed of a car from a video. We know that a car's speed cannot exceed (say) 200mph. Write a differentiable program to express this and use it to regularize the output of the network.

- Reusing the image->NN->speed example again, use the differentiable program to identify speeds/conditions where using a neural network policy is unsafe and switch to a (less-performant) handmade policy instead.
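
To make the first bullet concrete, here's a rough sketch (PyTorch, with a made-up toy architecture -- the names and sizes are purely illustrative) of an NN used as a primitive inside a fixed, differentiable program: the aggregation (`sum`) is given by the program, so gradient descent only has to fit the primitive's weights.

    import torch
    import torch.nn as nn

    # Hypothetical learned "primitive" used inside the program.
    mlp = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 1))

    def program(xs):
        # The program's structure (map + sum) is fixed and differentiable;
        # only the primitive `mlp` has learnable parameters.
        return torch.stack([mlp(x) for x in xs]).sum()

    xs = [torch.randn(4) for _ in range(8)]
    loss = (program(xs) - 3.0) ** 2   # fit the program's output to a target
    loss.backward()                   # gradients flow into mlp's weights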

Some more thoughts about this: https://atharvas.prose.sh/differentiable_dsls

[1] https://dselsam.github.io/posts/2018-09-16-neural-networks-o...


Two points,

(1) everything which makes programs useful is impure device access and state change, discretely sequenced over time

(2) grad. desc. et al. do not learn discrete constraints (hence why NNs are bad at learning operators: they can't. x+x is defined for all x, not just for the x in the training set).


> everything which makes programs useful is impure device access and state change, discretely sequenced over time

I haven't heard this framing before, actually. I'd love to hear more about it! "Impure," here, is PL terminology for functions that affect global state/arguments when you run them, right? So, brainstorming a bit, does this mean that making a diff. programming language that treats an NN module as a pure function won't actually be beneficial? I'm not sure if I'm drawing the correct conclusion, but this is a really interesting point. I don't have an answer for this (yet!).

> grad. desc. et al. do not learn discrete constraints

Great point! To push back a little on this: you're right that any discrete constraint will always mess up the smoothness of the function (e.g., less-than-g is not smooth at x = g). However, we can engineer our way around this by relaxing a discrete constraint to its closest smooth approximation! So, we can implement the less-than-g function as a sigmoid shifted by g, with a parameter controlling the slope of the sigmoid. In practice, I haven't had much difficulty learning programs even with a really steep slope for the sigmoid. (A minimal sketch of that relaxation is below.)
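
A minimal sketch of what I mean (the name `soft_less_than` and the slope constant are just illustrative):

    import torch

    def soft_less_than(x, g, slope=25.0):
        # Smooth surrogate for the discrete predicate [x < g]:
        # close to 1 when x << g, close to 0 when x >> g, and the
        # larger the slope, the sharper the step at x = g.
        return torch.sigmoid(slope * (g - x))

    x = torch.tensor(180.0, requires_grad=True)
    penalty = 1.0 - soft_less_than(x, g=200.0)  # ~0 while the constraint holds
    penalty.backward()  # unlike the hard predicate, this gives a usable gradient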


(1) Yes, the modern ML/AI lot seem to ambiguously use a purely mathematical meaning of "computer" -- which is useless. As useless as any pure mathematics. If that were all we had, a "computer" would be a theoretical curiosity, like a 200-dim sphere.

The real-world computers we care about run algorithms whose semantics is given by the properties of the devices real computers use. This double meaning of "computer" has caused a lot of superstition in the ML/AI space.

Real computers are engineering devices which shuffle electrical signals around to useful devices.

There is no reason to think that "pure algorithms" have any use at all, as with, eg., a 200-dim sphere. They're only useful if they can be given a semantics which exploits useful properties of devices. (cf. with physics, where a 200-dim sphere could be useful if it models some actual system).

(2) This isn't enough. Consider learning the rules of chess; or likewise, the inference rules of mathematics. f(x) = 2x^2, f'(x) = 4x, etc.

Search spaces constructed for a grad. desc. search are vast (effectively infinite), and the solutions we need are infinitely precise. Discrete approaches to search(ing for solutions) are necessary.


> As useless as any pure mathematics

Are you hearing yourself talk? Do you know why you have (to just name one example out of many) thousands of pictures on your phone, and not just a few? Because of pure mathematics. Because of compression: Even JPEG2000 from back in the day uses intricate and beautiful compression algorithms based on wavelets.


I think this is partially true; there is some support for logical statements and control flow in differentiable programs -- at least in JAX. Further, I think DeepMind have a recent paper on a DL sequence-learning methodology able to learn control strategies for lots of games simultaneously. I think this is a good example of learning discrete constraints with an NN.


Two counter-points (appreciate some counter-counter-points :):

1) The discrete sequencing is an epiphenomenon. The underlying processes are continuous changes in voltage and current flows. (I'm not sure if Planck scale considerations can throw a wrench in this though. Would love to be educated here.)

2) Our brains do not have ostensibly discrete neural processors. I don't think gradient descent is comparable to how the brain learns, but I think there is some reason to think that it is possible to learn symbolic processes in spite of having a processor that isn't especially built for it.


You're making a genetic fallacy here: that since the origin/ground of something has property C, its product must have it too. This isn't so.

I agree that reality is fundamentally continuous. However, cognition isn't; and many things aren't.

A frequency is discrete. A length is continuous. These properties aren't eliminable for one another.

Here, whilst I'd agree that all physical processes going on (everywhere) have essential continuous properties, they also have essential discrete ones. The issue is that grad. desc. alone does not give you the right kind of discrete ones.


I don't see how nuclear is any more dangerous than other renewable sources of electricity. The law of conservation of energy dictates that the residual waste energy has to go somewhere for any process. Any and all energy production mechanisms are going to cause some disturbance in the natural order of things. It just turns out that the emissions from "renewable" sources aren't something that does widespread ecological or societal damage. A hydroelectric plant's "emissions" come in the form of riverbed erosion; however, we can optimize around that by siting the plant where it minimizes ecological and societal damage. For nuclear, we minimize this damage by placing the waste in large, indestructible concrete bins.


Maybe because of the potential toxicity of the waste product, and the lifetime of this toxicity?

storing it safely is a bit more complex than "large indestructible concrete bins"

nb. I'm not anti-nuke, just pro good risk analysis


Awesome work guys! A couple of knee jerk reactions while playing around with this:

1. In my work (also at UT actually: Hook 'em), we've found that the hallucination problem is, in part, lessened by over-parametrizing the model. Places that have the budget to do this have noticed that the performance of ml4code transformers increases linearly with every 1e3 increase in the number of parameters (with no drop-off in sight). I'd love to hear your thoughts on this.

2. I'm concerned that finding code snippets from a short-form query underspecifies the problem too much and may not be the best user-interaction model. Let's compare your system to something like GitHub Copilot. I pass a query:

> how to normalize the rows of a tensor pytorch

With GitHub Copilot, I can demonstrate intent in the development environment itself with an IO example / comment / both and interact more efficiently. If I see errors in the synthesized snippet, I can change the query in about a second, etc. This is hard with a search-engine-style interactive environment. For this query, I had to navigate to the website, type in the query, check the results (which were wrong for me, btw -- y'all might need to check the correctness of the snippets), copy back the result, maybe go to the relevant thread and parse it more closely, etc. A good question to keep in mind here would be how to make this process more interactive. (A sketch of what I'd expect back for the query above follows after point 3.)

3. Finally, I just want to say that the website is phenomenal, even on mobile. Kudos on the frontend/backend/architecture side of things.
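
Continuing point 2: for reference, this is roughly the snippet I'd expect back for that query -- one of several reasonable answers, assuming L2 normalization over rows is what's intended.

    import torch
    import torch.nn.functional as F

    x = torch.randn(4, 8)

    # Normalize each row of a 2-D tensor to unit L2 norm.
    normalized = F.normalize(x, p=2, dim=1)

    # Equivalent, written out by hand:
    normalized_manual = x / x.norm(p=2, dim=1, keepdim=True)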

Also, don't let my comments or anyone else's take away from the awesome work y'all have done!!! I pulled that example from a paper I read recently called TF-Coder. They have a dataset of these examples as part of their supplementary material. All the best!


It can be really tempting to think about research progression on a "linear" timescale, but more often than not it ends up following an "exponential" curve because of technical debt. And there appear to be a _lot_ of techniques used here which we don't fully understand.

I wouldn't be surprised if a specifically engineered system ten years from now wins an ICPC gold medal but I'm pretty sure that a general purpose specification -> code synthesizer that would actually threaten software engineering would require us to settle a lot of technical debts first -- especially in the area of verifying code/text generation using large language models.


Hmmm. I can't get the model to recognize a damped sinusoidal wave (10, 0, -10, 0, -5, 0, 5, ...). Does the model have the capacity to express such a function?

An equation is available here: https://en.wikipedia.org/wiki/Damping
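
To be concrete, I mean something of this shape (decay and frequency constants picked arbitrarily for illustration):

    import numpy as np

    # Damped sinusoid: x(t) = A * exp(-lambda * t) * cos(omega * t)
    t = np.arange(8)
    x = 10 * np.exp(-0.17 * t) * np.cos(np.pi * t / 2)
    print(np.round(x, 1))  # roughly 10, 0, -7.1, 0, 5.1, 0, -3.6, 0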

Pretty neat otherwise!! I especially love the interface. I wonder if there's a plug-and-play framework for deploying PyTorch models on a website.

EDIT: They seem to be using https://streamlit.io . Seems like a neat tool.


Looks like it fails quite hard for "natural" sequences. I input the Bitcoin price and got a ridiculous zigzag around a constant instead of, say, a power-law fit.


Not really a recurrence relation. Perhaps try it on Prophet:

https://facebook.github.io/prophet/docs/quick_start.html
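
Something along these lines should work as a starting point (untested sketch; the PyPI package is `prophet` these days, formerly `fbprophet`, and `ds`/`y` are the column names Prophet expects):

    import pandas as pd
    from prophet import Prophet  # pip install prophet

    # Prophet wants a dataframe with a 'ds' (date) and 'y' (value) column.
    df = pd.DataFrame({
        "ds": pd.date_range("2021-01-01", periods=100, freq="D"),
        "y": range(100),  # placeholder series; substitute the real price data
    })

    m = Prophet()
    m.fit(df)

    future = m.make_future_dataframe(periods=30)
    forecast = m.predict(future)
    print(forecast[["ds", "yhat"]].tail())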

