
Great idea! The scrolling is a bit broken though. I can't scroll down the text on my trackpad without it going to the next article, and it also skips past a lot of articles with a tiny movement of the trackpad.

Otherwise it's a really fun idea! Can I suggest you also scrape from https://www.medrxiv.org/? This is where a lot of medical research preprints go, not arXiv.


I've now added the medRxiv API; you can try it out and tell me if you notice anything weird.

I saw that medrxiv has an API so I'll add it too!

About the scroll behaviour: it's a bit buggy haha, but I'll try to fix that. Thanks for the feedback.


I'm having a hard time googling the book you mentioned. Is it by any chance called 'They Thought They Were Free: The Germans 1933-45'? That's the closest thing I could find.

https://en.wikipedia.org/wiki/They_Thought_They_Were_Free


Yes. "They thought they were free" is the main title. "The Germans 1933-1945" is the subtitle.

So, it is like that, but it's also not. A couple of decades ago, with the advent of GWASes, the field agreed on a new p-value threshold of p < 5e-8 to account for all the multiple testing going on (how that number came to be requires more explanation of LD + other things).

That is the minimum threshold. This study found that peak was at p < 1e-37 or so. But that is where the biological analysis begins. Unlike social scientists, we don't stop at the statistical correlation; we then go on to look at what we know about that gene, the type of mutation, whether it's a loss or gain of function, what role that gene has in various tissues, etc. And Mendelian randomization is another way to unpick the causal direction of effect.

Not to say this is the truth or causal, but it's a lot closer to causal than what you are implying.


In other words, if the result were purely due to random chance, you would expect to see one this extreme only about 1 time in 10,000,000,000,000,000,000,000,000,000,000,000,000.

In more casual sciences, p < 0.05 is considered the threshold of significance, i.e. under the null hypothesis there is less than a 1/20 chance of the statistical test favoring the tested hypothesis over the null purely by random chance.


If you are testing a single hypothesis, sure. But nowadays, statistics training really emphasizes Bonferroni correction, or other methods that deal with the issue raised by the above-referenced XKCD, when testing multiple hypotheses.
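
A back-of-the-envelope sketch of that correction (my numbers, not anyone's study): divide the usual alpha by the number of tests. For roughly a million independent SNP tests, that lands on the genome-wide 5e-8 cutoff mentioned upthread.

    # Rough Python sketch with assumed numbers, not taken from any study.
    alpha = 0.05          # conventional single-test significance level
    n_tests = 1_000_000   # order of magnitude of independent tests in a GWAS
    print(alpha / n_tests)  # Bonferroni-corrected threshold: 5e-08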

> This study found that peak was at p < 1e-37 or so.

If true, this would be cause for someone to read through the study to check there are no maths errors, and if it holds up then to take action immediately.

This isn't a "wait for more science to come in and confirm" type thing.


> This study found that peak was at p < 1e-37

This is many orders of magnitude better confidence than any physics experiment; it feels unlikely that a biological result can even be this strong, which makes it sound like a statistical error.


Bioinformatician here. These kinds of p-values are common in these kinds of experiments (GWAS, or association studies), and happen almost automatically once you get enough statistical power. The big problem is that once you have so much statistical power, you get very small p-values from small effects, and then the often-overlooked assumptions behind frequentist statistics begin to matter. Are your samples _really_ independent and identically distributed? No. Are they really normally distributed? No.

Also, things like gut microbiome and depression are linked through what some people call the _crud factor_: weak correlations between nearly all social aspects. For example, depressed people probably eat differently from non-depressed people, causing changes in their gut microbiome. There are probably also population-level variations in depression rates and obesity rates (the latter correlated with gut microbiome) that track each other somewhat. When you have enough statistical power, you see the crud factor everywhere.
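
To make the "enough power" point concrete, here is a toy simulation (purely illustrative, with made-up numbers; it assumes numpy and scipy are available): a negligible true correlation still produces an extreme p-value once n is large.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    n = 1_000_000
    x = rng.normal(size=n)
    y = 0.02 * x + rng.normal(size=n)   # true correlation of roughly 0.02

    r, p = stats.pearsonr(x, y)
    print(f"r = {r:.3f}, p = {p:.1e}")  # tiny effect, yet p is far below 5e-8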


I really appreciate your explanation regarding the crud factor. It adds a lot of intuition.

Out of curiosity, what is your perspective on gnotobiotic systems? I distinctly recall an example of a gnotobiotic mouse that, upon being provided a microbial sample from an obese mouse, started to have substantial increases in body fat despite a simultaneous reduction in feed. Would that type of experimental approach still run into the statistical difficulties you mentioned?


Well, no. It's not uncommon for p-values to be even lower than that. We are talking about a specific mutation at a specific SNP (allele) being predictive of a phenotype / outcome.

So a specific SNP mutation being predictive of gene expression / a protein level basically comes with a p-value of 0.

Can't speak for physics experiments, but this is almost certainly not a statistical error


I have never heard of FireDucks! I'm curious if anyone else here has used it. Polars is nice, but it's not totally compatible. It would be interesting to see how much faster it is for more complex calculations.


Other high ranking military officers that have worked closely with Trump disagree. I might be inclined to believe them over you, unless you've also worked with Trump? Or are you just someone that he would call a 'sucker'?

https://www.nytimes.com/2024/10/22/us/politics/john-kelly-tr...


Yes, you should listen to an actual grifter and live in fear.

People like me won't. You not being able to resonate is what makes you and me different - and one of us capable of defending freedom and the other not.


Ah, so he's a person who built and fought for democracy, but not the right person who built and fought for democracy.

And it's really not hard to find more veterans supporting Harris; just the top two search results:

https://commondefense.us/vets-for-harris

https://votevets.org/press-releases/votevets-makes-historic-...


> and one of us capable of defending freedom and the other not.

Did you just imply that these high ranking military officers are not the ones actually defending everyone's freedoms?

Please stop with the talking points and actually think about what you are repeating again and again.


I have a question that hopefully a molecular biologist can answer. Can tools like this potentially create protein structures that specifically bind in certain cells? Or is this more about a way of being able to create proteins for genes / structures we haven't been able to before?

My research at the moment is very focused on pleiotropy, namely mapping pleiotropic effects across as many *omics/QTL measurements and complex traits as possible. This is really helpful for determining which genes / proteins to focus on for drug development.

The problem with drugs is in fact pleiotropy! A single protein can do quite a lot of things in your body, either through a causal downstream mechanism (vertical pleiotropy) or through seemingly independent processes (horizontal pleiotropy). This rules out a lot of possible drug targets, as the side effects / detrimental effects may be too large.

So, if these tools can create ultra specific protein structures that somehow only bind in the areas of interest, then that would be a truly massive breakthrough.


For anyone who would like to know more about designing proteins with a certain function, target, or structure in mind, the term to search for is "rational design."

https://en.m.wikipedia.org/wiki/Rational_design


Thank you for this, terms of art are the silent gatekeepers...


As an aside, learning the precise terms for concepts in fields in which I'm a layperson (or simply have some cobwebs to shake loose)--and then exploring those terms more--is something that I've found LLMs extraordinarily useful for.


Also "off target effects".


This research is focused on modeling individual protein binding sites. Pleiotropic effects and off-target side effects are caused by interactions beyond the individual binding sites. So I don't think this tool by itself will be able to design a protein that acts in the way you describe (and that's putting aside the delivery concerns - how do you get the protein to the right compartment inside the cell?).

But novel binding domain design could be combined with other tools to achieve this effect. You could imagine engineering a lipid nanoparticle coated in antibodies specific to cell types that express particular surface proteins. So you might use this tool to design both the antibody binding domain on the vector and also the protein encoded by the payload mRNA. Not all cell types can be reached and addressed this way, but many can.


Yes, in principle, but there are huge limitations and challenges to using a protein as a drug in living organisms. It has to be injected to avoid digestion, and a protein can't just pass into a cell, it needs to get in somehow. Current peptide drugs like insulin are identical to, or closely mimic, natural small peptide hormones that bind to receptors on the outside of a cell. However, there is a possibility of using gene therapy to directly express a novel protein drug inside of the cell. A novel protein is also likely to trigger an immune response, so that type of gene therapy is mostly useful when that is actually desired, e.g. as a vaccine.


While they can generate proteins that bind to specific structures with high accuracy, achieving true cell-specificity and avoiding unwanted pleiotropic effects involves many more variables beyond just protein-protein interactions. These tools are more about expanding our ability to target previously "undruggable" proteins than about solving the cell-specificity problem outright. However, they could be valuable components in developing more targeted therapies when combined with comprehensive research on pleiotropic effects across multiple omics levels. The real breakthrough will come from integrating these protein design capabilities with a deeper understanding of complex biological systems and developing strategies for precise delivery and regulation of these novel proteins in vivo.


Not an expert, but you could imagine a protein with two receptors that are required for activation. One of them binds to a protein that is only present in the cells of interest, and the other one binds to the actual target.


[flagged]


What LLM wrote this??


So I write a completely defensible rant full of truly interesting and well-informed perspective, get downvoted, and then get accused of being an LLM.

A perspective I'm sure the down-voters have zero cred to doubt. Style issues aside, I don't think there's a serious molecular biologist on the planet who would take issue with the actual gist of what I said.

Pleiotropy: a thing happening can cause more than one other thing to happen. We really need jargon to keep that in mind?

An LLM? This is what I get for writing with passion? Creatively? Daring to play with words? I'm an LLM? For writing anything that doesn't fit your norms? Wow.

How do I get MORE downvotes? They seem like badges of honor in this case.


I suggest you read 'Study Size' under 'Methods'. They powered the study to detect an odds ratio of 1.3 with 80% power, and rounded way up to 3000 cases. To me that's not a small sample size, but it's definitely not the highest OR to aim for. Different cases vs. controls is not a problem for this study design. Pointing that out as a negative makes me think you might not know as much about epi study designs as your comment lets on.


You are trying to justify one strand of unethical behaviour (how humans have bred dogs) with another. Not sure that holds up.


Can you unpack why you think breeding dogs - by which I take you to mean all dog breeding, not just the kind that leads to serious physiological issues such as breathing difficulties in pugs - is unethical?


>how humans have bred dogs [was unethical]

This is far from being a consensus, I think.


I'm not sure if I see this line of reasoning. Nature has countless symbionts, parasites etc, are those all unethical?


My ethical framework agrees with you, but ethics, like everything, are relative.


For those not in Academia, it's worth noting that when you get to his level, it's more like being a VP or senior leader. Ultimately, yes you are responsible for the quality of output of your team, but you are absolutely not looking in detail at every paper, more the ideation and focus of the department.

He should be grilling his assistant profs / research fellows to get their act together and raise the bar, but this doesn't show malfeasance.


Important point, thank you.

It doesn't mean we shouldn't work to fix the systemic issue though. It may not be his "fault", but how do we improve our system as a society so we put ourselves on an upward trajectory, not the seemingly downward trajectory we've been on lately?


I'm glad the GP pointed this out, because nobody talks about it. The "PI" of a lab is a manager, so these issues are like an accountant embezzling money without the manager knowing. Maybe the manager should have put better processes and monitoring in place, but they were trusting their team members to behave properly, which is not unreasonable IMO.

The real issue is that the people who conduct the fraud are usually grad students or postdocs, whose entire future depends on the success of their research project. Fake results are pretty much guaranteed.

Think about it this way: Imagine you have a lab group with 10 PhD students, each with their own hypothesis to investigate (e.g. intervention A will reduce the rates of disease B in mice). What are the odds that all 10 students will prove their hypothesis and generate publishable results? No way it's 10/10 obviously...it's more like 5/10. So what is supposed to happen to those 5 students that were tasked with investigating the bad hypotheses? Our current system implicitly penalizes these students to the extent that their careers in academia are over, and even earning their PhD is not certain. BTW, I know I am being overly general here, and the student will likely have several parallel projects on-going, can pivot to other things, etc, but hopefully my general point is clear.


It is not unusual, and indeed it is what should be expected, for honors to be accompanied by liabilities.

Ambitious PI's want bigger labs that lead to the recruitment of better students, who then produce more impactful papers, which then support the demand for more funding. It is a positive reinforcement cycle that eventually leads to bigger, better, and more popular labs. Those are the honors.

The liability is that if your name is in the article as a senior co-author, you are just as responsible as the first author for errors or fraudulent research. The senior PI's actual contribution should not matter, their name is there, the publication is used to support their career, they recruited the students or postdocs.


> you are just as responsible as the first author for errors or fraudulent research

I know what you're trying to say, but I think you're making it too black-and-white. There are two nuances I'd like to point out: Firstly the senior author did not actually perpetrate the fraud...this has to mean something when assessing blame I think. Secondly, the senior authors do not really have the ability to filter out fraud, assuming it's done cleverly. What can they do aside from reading the drafts and scrutinizing the data/methods/interpretation? Are they expected to have a team of shadow PhDs doing the same experiments to ensure reproducibility?

No doubt some PIs create an environment that encourages fraud, and that's a problem. But the point I'm trying to make is that if we want to solve the problem of scientific fraud we need to be honest about the source of the problem. In my opinion, it's the fact that a student's entire future is wholly dependent on a good result. The senior author already has a job, probably tenure, and plenty of other projects on the go, so one failed project is not a problem. The cost of failure to the student on the other hand is essentially infinite!


> What can they do aside from reading the drafts and scrutinizing the data/methods/interpretation?

You would be surprised how few of the big labs' PIs even do that. And since a big lab, say in biology, can send out 40–50 papers a year, there is no time for the PI to think deeply about hypotheses, methods, or data collection. But having a big lab is a decision, as I wrote in my previous comment: honors/grants and liabilities.

> In my opinion, it's the fact that a student's entire future is wholly dependent on a good result.

That's very true, but there is also a thing called personal responsibility. Any non-violent "fraud", any "criminal", has some reasonable motivation behind their actions. But committing fraud is not an inevitability, and weakening punishment out of sympathy for those motivations ends up punishing the people who behave, loosely speaking, properly.

Years ago, when I was doing academic research, I asked a colleague of mine if they would change some of their research results if the fraud (a) was never discovered and had no general consequences, (b) led to a publication in Science, Nature, Cell, etc. that would semi-guarantee a tenure-track position, and with that, "bread on the table" for the family, the kids, the aging parents. They said they would never do that, but was it true for them? Would it be true for me? Since the question is legitimate, strong punishment is needed to reduce the occurrence of fraud in research.

And since each tenure-track position has dozens of good applicants, it is natural that a good result will make the difference between having a professional life in academia and not. But is it not the same, with the kind of "good result" depending on the field, for all those fields in which there are many more participants than "winners"? An immediate parallel can be made with doping in sports.


Your point is clear and extremely important. Edison supposedly once said, "I didn't fail. I just discovered 99 ways not to build a light bulb." Or something like that. Among those 99 failures were 20 super meaningful discoveries. In those failures the world's understanding of material science advanced in ways that affected a million later research projects.

A PhD candidate who can prove a hypothesis wrong should often have their work valued as much as one who proved the opposite.

But consider something like the invention of Paxos. If you leave out one small piece, you fail. All that time and effort seems wasted. You haven't proved anything true or false. You've just failed. But if you've documented your failure sufficiently, somebody might come behind you and fix that one little piece you got wrong.

One of the problems with our current system is that three years or ten years of research never gets published or properly documented for posterity because it didn't succeed. Even failures should be written up and packaged for the next grant to extend the exploration. There needs to be some reward for doing that packaging. Maybe we can call it a PhaD (almost PhD). Do you award a PhD to those who take up their own or somebody else's PhaD and complete it successfully?


I had a bit of a eureka on this subject this afternoon: when looking at scientific fraud and who to blame, we (as a society) tend to focus on who stands to gain if the fraud is successful, but instead we should look at who stands to lose the most if the fraud is caught.

Let me explain via a 2x2 matrix (which I highly doubt will render properly, but here goes):

Actor     | Fraud is successful | Fraud is caught
----------|---------------------|----------------
Professor | Scenario A          | Scenario B
Student   | Scenario C          | Scenario D

Scenario A: If a fraud is successful, the senior author gets a small benefit in the form of a slight raise, incremental increase in success rate of next grant, maybe an award, some endorphins from the praise. Very minor actually.

Scenario B: If the fraud is caught the senior author's career could be in shambles, like resigning from their tenured position, losing investors in spin-offs, humiliation, etc. They have a lot to lose.

Scenario C: In the event of a successful fraud, the student stands to gain a lot in the form of job prospects, future income, and generally accomplishing their life's ambitions. There is a huge payoff for the student in this scenario.

Scenario D: If they don't perpetrate fraud (to salvage a bad result), their career in academia is over and they have wasted 3-4 years of their life, which is the same outcome as if they did perpetrate fraud and got caught. The student has nothing to lose!


I ran into this too. I run a very silly Slack bot for my friends, and it randomly cycles through pictures that we have all created. Initially it was a completely random choice per invocation. I had to change it to a randomly sorted list that is stored and iterated through until it's depleted, and then re-randomised, for the same reason: complaints that truly random choice picked duplicate pictures too often.
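
For anyone curious, the fix is essentially a "shuffle bag". A minimal sketch of that approach (class name and details are made up, not the actual bot code):

    import random

    class ShuffleBag:
        """Shuffle the whole pool once, hand items out in that order,
        and only reshuffle when the bag is empty, so nothing repeats
        within a cycle."""

        def __init__(self, items):
            self._pool = list(items)
            self._bag = []

        def next(self):
            if not self._bag:              # depleted: re-randomise
                self._bag = self._pool[:]
                random.shuffle(self._bag)
            return self._bag.pop()

    # pictures = ShuffleBag(["a.png", "b.png", "c.png"])
    # pictures.next()  # no duplicates until all three have been shown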


Note that if your playlist is append-only, you can use format-preserving encryption and just store a seed, a counter, and the length of the list when you started, instead of storing the whole shuffled list.
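
A rough sketch of that idea, as I understand it (a generic Feistel-plus-cycle-walking construction; the function name and parameters are made up and this isn't any particular FPE library):

    import hashlib

    def feistel_shuffle_index(index: int, n: int, seed: int, rounds: int = 4) -> int:
        """Map `index` to a unique position in [0, n) using a seeded Feistel
        network, so only (seed, counter, starting length) need to be stored
        instead of a whole shuffled list."""
        assert 0 <= index < n
        # Work in a power-of-two domain >= n, split into two equal halves.
        bits = max(2, (n - 1).bit_length())
        if bits % 2:
            bits += 1
        half = bits // 2
        mask = (1 << half) - 1

        def encrypt(x: int) -> int:
            left, right = x >> half, x & mask
            for r in range(rounds):
                digest = hashlib.sha256(f"{seed}:{r}:{right}".encode()).digest()
                f = int.from_bytes(digest[:4], "big") & mask
                left, right = right, left ^ f
            return (left << half) | right

        # Cycle-walk: re-encrypt until the value falls back inside [0, n).
        y = encrypt(index)
        while y >= n:
            y = encrypt(y)
        return y

    # Persist only (seed, counter, n_at_start); then each invocation does:
    #   pos = feistel_shuffle_index(counter, n_at_start, seed)
    #   show items[pos]; counter += 1
    #   when counter == n_at_start, start a new cycle with a fresh seed and length.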

