Pumas AI: A platform for pharmaceutical modeling and simulation (pumas.ai)
123 points by KenoFischer 5 months ago | 33 comments

This is a really cool effort to overhaul the pharmaceutical software stack. They just had their big 1.0 launch today. Also 100% written in Julia. Super fast and uses all the latest tricks and ideas from pharmacometric science. I'm not working on this myself (other than fixing the occasional bug they run into), but some of my colleagues do. I'll see if I can point them here to answer questions.

Yeah, but pharma in my personal experience is VERY conservative. A lot of them still use SAS as their bread and butter for analysis. And guess what? All the older trials that they have to maintain are written in SAS. Just as banking got stuck with a lot of legacy COBOL code, pharma got stuck with SAS. Want to do healthcare economic modeling? Buy a license for TreeAge (and it ain't cheap!). And it's not like you cannot do complex analysis in it, it just doesn't have a sleek interface. The "biggest" technology disruptor in pharma, I would argue, would be Salesforce! They support creating tons of custom input interfaces for patient consenting and record keeping. Enterprise ain't supposed to be sexy!

I just sorta worry whether there is a correct culture fit in pharmaceuticals when there is a prevailing mentality of "no one ever got fired for buying Microsoft".

You are right, a portion of pharma is conservative. There are two different needs in pharma: one is more like an assembly line (pure operations) and the other is "innovation". One has to decide what excites scientists: operations (e.g. calculating group means over and over, where the answer has to be exact) OR innovation (get the questions right, designs right, analysis right, answers approximately accurate). I think there is a place for "operations", but they constitute "sustaining innovations", meaning: can I find the most economical solution for this problem? Operations is a separate discussion.

Let us talk about "disruptive innovations". What if I told you I was part of the team that approved a treatment for the H1N1 flu pandemic in children without ANY trial, based on unapproved data in adults? What if I told you the development pathway for pulmonary arterial hypertension in pediatrics was driven by analysis performed with Pumas-like software to establish a reasonable endpoint? These are patients who cannot perform most daily functions we take for granted. There would have been no drugs approved otherwise. How can you, I and others contribute to the disruptive innovation which will transform how drugs are developed and patients are treated? Pumas is designed to disrupt drug development and precision medicine. There are plenty of opportunities for disruption in healthcare. These opportunities are the motivation for Pumas. Joga Gobburu (Co-Founder, Pumas-AI)

Which country are you talking about? (cf. "H1N1 flu pandemic in children without ANY trial based on unapproved data in adults")

USA for emergency use.

Pumas-AI co-founder here. Indeed, pharma is conservative due to the impact it can have on patient lives (good or bad). If something is working, it stays, and if something fails them, it gets axed, both people and tools. Having said that, let's give them some credit: along with the regulators at the FDA, pharma has come a long way from the days of SAS. Even the regulators at the FDA have started gravitating away from SAS toward more R-based workflows. While it was a good switch away from SAS, scientists started realizing the bottlenecks with R and the requirement to learn multiple tools to get things working, especially when one has to scale up for new-age analytics. While it is good to have a Swiss army knife of tools, there is more overhead in making them work together. I think that is why SAS shined for the longest time. But as you rightly said, it is cost prohibitive and in the hands of very few.

Pumas and the Julia ecosystem are about an integrated, composable modeling stack across multiple sectors, and the best part is that one does not have to develop in one environment and switch to another for production. Your point about Salesforce may be right for trial operations, but someone has to do the analytics that connect back to the first principles of physiology and pharmacology. Pumas provides this connection to first principles from the core, and lets you build on it either for internal decision making or to serve the stakeholders with fancy UIs.

A subtle but important point here is that, compared to 30-40 years ago when SAS became a mainstay, I would argue that more of the younger generation are not averse to coding. In fact, programming at the basic level is a fundamental expected skill set, and I personally see more and more life-science majors being efficient coders, thanks to the democratization of tools and education. Combined with this new workforce, and tools like Pumas and Julia, we are not far away from disrupting this sector.

Why Julia?

By choosing a language that a tiny minority know and use, haven't you crippled your adoption from the start?

I think Chris Rackauckas answers "Why Julia" below in https://news.ycombinator.com/item?id=24132382

This is a very clear case of a well-defined application. The field of pharma needs a combination of mechanistic models (based on differential equations), statistics, and ability to work with large datasets. The existing tools are far from sufficient. If anything, the need for high performance and scalable tools has been felt in the industry for a very long time.

There are hundreds of thousands of Julia programmers worldwide now, and the user base is doubling every year. After all, new ideas do start with a tiny minority, but one that truly believes in those ideas and is enthusiastic in adopting them early.

Thanks - So it's around the speed of certain numerical operations?

So you could still use Python and just wrap Julia libraries for those operations it's optimized for?

Not necessarily, because the kernel cost can still be too high in many cases. It's not the machine learning case where everything in Python is a matrix multiplication call that will be costly enough to mask the cost of Python. In this case, users will be writing models that are built from scalar operations, and doing things like inlining those functions matters a lot. Of course, you could write something that takes Python code and generates Fortran code that it then statically compiles inline to your other Fortran codes to eliminate all of the overhead, and from that you can see exactly why we resorted to a JIT compiled language. FWIW, other software in this field has essentially built mini JIT compilers for Fortran for doing exactly this, and I'll just say that I'm glad I'm not the one who has to maintain that kind of thing while working on modeling and simulation algorithms.
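To make the point about scalar models concrete, here is a minimal Python sketch (not Pumas code; the one-compartment model, the fixed-step RK4 integrator and all names are assumptions chosen for illustration). It counts how often a solver calls the user's scalar model function during a single solve. Each call does only a multiplication, so per-call overhead cannot hide behind a large kernel.

```python
import math

calls = 0  # number of times the solver has invoked the user model


def rhs(t, a, k):
    """Hypothetical user model: first-order elimination, dA/dt = -k * A."""
    global calls
    calls += 1
    return -k * a


def rk4(f, a0, t0, t1, n_steps, k):
    """Classic fixed-step fourth-order Runge-Kutta for a scalar ODE."""
    h = (t1 - t0) / n_steps
    t, a = t0, a0
    for _ in range(n_steps):
        k1 = f(t, a, k)
        k2 = f(t + h / 2, a + h / 2 * k1, k)
        k3 = f(t + h / 2, a + h / 2 * k2, k)
        k4 = f(t + h, a + h * k3, k)
        a += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return a


# One solve over 24 hours with 1000 steps calls the model 4 times per step.
final_amount = rk4(rhs, 100.0, 0.0, 24.0, 1000, 0.1)
exact = 100.0 * math.exp(-0.1 * 24.0)  # closed-form solution for comparison
```

A fit repeats solves like this for every subject and every optimizer iteration, which is why the cost of one tiny function call ends up mattering.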

Ok - thanks - I guess it depends on the problem - I assume offloading computation to the GPU has similar issues?

> FWIW, other software in this field have essentially built mini JIT compilers for Fortran

Are things like LLVM not specialized enough in terms of optimizations?

You're not wrong - pharma is quite conservative. I wasn't convinced that the Pumas folks had any hope, but apparently the legacy workflow for what they do is so awful that people are aching for something better. I'm not in the day to day, but from my perspective Pumas seems to be quite successful and well received in pharma, despite the conservatism.

That might be true about some pharma software, but that does not mean there's no appetite for innovation. A lot of the computational work is done with decades old Fortran libraries, plugged into R (yup, no Python in this world) and spreadsheets. It is very hard to upgrade these old Fortran libraries to leverage capabilities of modern numerical stacks.

The reason why companies have to hold on to their SAS and COBOL is that once a drug is out in the real world, and there is a lawsuit, or a recall - you need to be able to have everything available for review. These things can happen 20 or 30 years after a drug becomes available. That would naturally induce risk aversion.

New drugs today will use newer tools, and 30 years out, we'll be saying the same things about today's tools. In fact pharma companies have no choice but to adapt. Technology moves fast, and anything that gives an edge, if not adopted, will leave you behind.

What are these old Fortran libraries doing that requires "new numerical stacks"? From my naive point of view, everything you are describing should be trivial to do on a workstation. As I'm sure that's just me underestimating the field, can you point me to some algorithm or technique which would benefit from new numerical stacks?

Also is your stack open source? I somehow doubt the long-term advantage of one proprietary stack over another.

Pharmacometrics is built around small stiff ODE models. You might think this means it's "simple" computationally, but what it actually means is that every little detail matters a lot for performance. The way you do LU factorization for 4 ODEs vs 6 ODEs needs to change in an architecture-dependent fashion, and BLAS libraries are not efficient in this range. Standard floating point exponentiation can be too expensive in some spots, while at the same time the standard stiff ODE solvers make assumptions about regularity which are regularly violated by pharmacometric models (due to how dosing works). There is more than a little bit of "by-hand SIMD" in this software stack. Specializing on all of these features is an interplay between algorithms and JIT compilation, where not utilizing statically compiled optimizations will hurt you at this size. We see this in benchmarks like the SciML Hires benchmark [1], where newly developed Rodas methods are able to outperform classic Fortran libraries like LSODA. As a less direct and more illustrative example, these benchmarks vs PyTorch [2] demonstrate what happens when you compare against an optimizing JIT compiler which doesn't specialize on small-sized interactions (>30x performance difference!). They show that this kind of application is very much not in the regime of large kernels, which a lot of recent compiler work tends to optimize for; instead it relies heavily on being able to cross-compile and inline computations to remove every little overhead, all while improving the ODE algorithms themselves. We will have a paper that goes into more detail on this fairly soon.

    [1] https://benchmarks.sciml.ai/html/StiffODE/Hires.html
    [2] https://gist.github.com/ChrisRackauckas/cc6ac746e2dfd285c28e0584a2bfd320
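As a rough illustration of two ideas from the comment above (size-specialized linear algebra and dosing discontinuities), here is a small Python sketch. It is not how Pumas or the SciML solvers are implemented. The two-state absorption model, the rate constants, the backward Euler method and every function name are assumptions picked to keep the example short. The 2x2 linear solve is written out by hand instead of calling a general LU routine, and bolus doses are applied as jumps in the state between steps.

```python
def solve_2x2(m, b):
    """Solve m @ x = b for a 2x2 matrix by Cramer's rule (no general LU, no BLAS)."""
    (a11, a12), (a21, a22) = m
    det = a11 * a22 - a12 * a21
    return ((b[0] * a22 - a12 * b[1]) / det,
            (a11 * b[1] - a21 * b[0]) / det)


def backward_euler_step(jac, x, h):
    """Implicit Euler for the linear system x' = jac @ x: (I - h*jac) x_new = x."""
    m = ((1.0 - h * jac[0][0], -h * jac[0][1]),
         (-h * jac[1][0], 1.0 - h * jac[1][1]))
    return solve_2x2(m, x)


def forward_euler_step(jac, x, h):
    """Explicit Euler, shown only to contrast its behaviour on a stiff problem."""
    return (x[0] + h * (jac[0][0] * x[0] + jac[0][1] * x[1]),
            x[1] + h * (jac[1][0] * x[0] + jac[1][1] * x[1]))


def simulate(step, n_steps, h, jac, dose_at_step):
    """Integrate (depot, central); a bolus dose is a jump in the depot state."""
    x = (0.0, 0.0)
    for i in range(n_steps):
        if i in dose_at_step:
            x = (x[0] + dose_at_step[i], x[1])
        x = step(jac, x, h)
    return x


# Hypothetical rates: very fast absorption, slow elimination -> stiff system.
KA, KE = 1000.0, 0.1
JAC = ((-KA, 0.0), (KA, -KE))
H = 0.1  # step size in hours

# Doses of 100 units at t = 0 h and t = 12 h, simulated to t = 24 h.
implicit = simulate(backward_euler_step, 240, H, JAC, {0: 100.0, 120: 100.0})
# With the same step size the explicit method diverges within a few steps.
explicit = simulate(forward_euler_step, 10, H, JAC, {0: 100.0})
```

With these made-up numbers the implicit method stays bounded at a step size where the explicit one diverges. That is the basic reason stiff solvers are used, and the dose jumps are the kind of non-smoothness the comment says standard solvers do not expect.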

Thanks, super interesting. So it's really high throughput for simple stuff (and probably people don't mind that it's too slow).

Like any big computation, it's lots of simple stuff smashed together, so if the simple stuff is slightly slow it can cost a lot. Nonlinear mixed effects model fitting with a few thousand patients (like in a later stage clinical trial) can take 2 weeks to one month to run. Being slow at this stage of the computation can delay clinical trials. Because of the extreme cost of these trials, people very much care if it's too slow.
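For readers unfamiliar with nonlinear mixed effects models, the following Python sketch shows the basic structure under simple assumptions: a one-compartment IV bolus model, log-normal between-subject variability on clearance, and proportional residual error. None of this is taken from Pumas. The function names and parameter values are hypothetical, and only simulation is shown, not fitting.

```python
import math
import random


def concentration(dose, cl, v, t):
    """One-compartment IV bolus: C(t) = dose / V * exp(-(CL / V) * t)."""
    return dose / v * math.exp(-cl / v * t)


def simulate_population(n_subjects, cl_pop, v_pop, omega_cl, sigma, times, dose, seed=0):
    """Simulate noisy concentration profiles for a cohort of subjects."""
    rng = random.Random(seed)
    data = []
    for subject in range(n_subjects):
        eta = rng.gauss(0.0, omega_cl)   # per-subject random effect
        cl_i = cl_pop * math.exp(eta)    # individual clearance (log-normal)
        obs = [concentration(dose, cl_i, v_pop, t) * (1.0 + rng.gauss(0.0, sigma))
               for t in times]           # proportional residual error
        data.append((subject, cl_i, obs))
    return data


cohort = simulate_population(5, 1.0, 10.0, 0.3, 0.1, [1.0, 2.0, 4.0], 100.0)
```

Fitting reverses this process: an optimizer searches over the population parameters while the per-subject random effects are estimated or integrated out for every subject, so the model is solved many times per subject per iteration. That nesting is where the long run times come from.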

WinNonlin is one such tool - been around forever, written in Fortran.

One of the reasons it is still popular is for regulatory use it's very well understood.

I'm not enough of an expert to answer the second part of your question - on why new numerical stacks are beneficial.

It may be around building more complex models of PK/PD but I'm not up to speed on what winnonlin can and can't do.

Here's the stream from the event today, which talks more about what we released, and where it is going, with demos etc.


I highly recommend the story of IronShore. It really speaks to the importance of having the right analysis stack, along with technology that can save shareholders millions of dollars. These are things Pumas and the team at Pumas-AI do on a day-to-day basis.

For those who want to jump into the details:

Docs: https://docs.pumas.ai/

Tutorials: https://tutorials.pumas.ai/

Julia rocks! Congrats on the launch

Slightly related: is there a good open source platform around for managing health clinics? I've never stumbled upon a good one in my searches.

Drug discovery is fairly different from pharmacometrics. Drug discovery is about finding which chemicals would likely produce the right effects, by mining models and simulations for how they would bind to proteins and how that affects the protein's behavior. This generally uses molecular simulations, things like molecular dynamics or DFT, to compute properties of the molecules themselves.

Pharmacometrics is focused on precision dosing: given a drug in a clinical trial, how should you personalize the dosing in order to have high efficacy with low toxicity? The answer differs depending on many factors (weight, metabolic factors, gender, etc.), and the models are a mix of systems-physiology-type models of metabolism and cell signaling (quantitative systems pharmacology and physiologically-based pharmacokinetics) and compartmental models.
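A toy version of weight-based dose personalization can be sketched in a few lines of Python. It rests on two textbook assumptions that are not from this thread: clearance scales allometrically with body weight (exponent 0.75, reference weight 70 kg), and exposure follows AUC = dose / CL for an IV dose. The parameter values and function names are hypothetical.

```python
def individual_clearance(cl_std, weight_kg, ref_weight_kg=70.0, exponent=0.75):
    """Allometrically scaled clearance: CL = CL_std * (WT / 70) ** 0.75."""
    return cl_std * (weight_kg / ref_weight_kg) ** exponent


def dose_for_target_auc(target_auc, clearance, bioavailability=1.0):
    """Dose that gives the target exposure, from AUC = F * dose / CL."""
    return target_auc * clearance / bioavailability


# Hypothetical drug: CL_std = 5 L/h in a 70 kg adult, target AUC = 20 mg*h/L.
adult_dose = dose_for_target_auc(20.0, individual_clearance(5.0, 70.0))
child_dose = dose_for_target_auc(20.0, individual_clearance(5.0, 17.5))
```

Under these assumptions a 17.5 kg child needs more than the 25 mg that simple per-kilogram scaling of the adult dose would suggest. Real models add many more covariates and account for uncertainty.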

They are both useful, just at different stages of the drug development pipeline. Drug discovery modeling and simulation is done at the very early stages, before the clinical trial, to predict what drugs to test and what the specificity of the targeting is (i.e. will it have off-target effects and cause side effects?). On the other hand, pharmacological modeling and simulation is done during the clinical trial to try to adaptively change the dosing, understand effects on the population, and predict whether the new off-target effects cause a system-wide toxic effect (i.e. just because drug X accidentally blocks the binding of Y to Z doesn't necessarily mean that most people will have a side effect, but you can predict whether certain sub-populations might be more prone to side effects and how likely that is to cause a clinical trial to fail). Given that the cost of clinical trials is in the billions, any mathematics that can predict whether a trial will fail, or simply avoid a clinical trial by proving safety through statistical means, is something that's in high demand.

Hey Chris, thanks for the great talks on math and ML with Julia. I highly recommend them to anyone interested in learning how to express a mathematical model in Julia or, in general, in any language.


Do you really think you can avoid a trial by proving safety through statistical means?

Aren't clinical trials done because we don't know in advance - as the full complexity of biology is beyond our ability to predict?

ie we do the trials to find out the things we didn't know ( and thus couldn't model ).

Perhaps through rationalization of a trial result to avoid the call for additional trials - but I find it hard to believe trials can be avoided in general.

I think the key phrase "emergency use" was missed. The question is what we are going to do when we need something right away in a pandemic situation. The point here has nothing to do with "avoiding" trials - that I will never advocate. But there are enough situations that require us to think "outside the box". At that point, magic cannot happen. We need to build things systematically as science progresses...

Even in a pandemic situation, I'm not sure I'd ever use a model if I could do a phased rollout in terms of safety.

And if it's so bad - that you have no option but to give it to everyone now - well you have no option...

I suppose the practical problem right now is not so much about the risks of one individual vaccine, but rather choosing between the many many candidates.

How would you go about that?


Out of the categories of RWE - I'd say that the real time patient data collection looks the most promising - but then in some ways that's just clinical trial information collected in a different way.

Schrodinger is actually in the space of using quantum chemistry simulations (and molecular simulations) to prepare targets - new molecules for pharma companies as the basis of new drugs.

Pumas is at the other end - a modelling and simulation tool that simulates pharmacokinetics and pharmacodynamics, i.e. the effects of the body on the drug and of the drug on the body. It is relevant during clinical trials, for dosing and for final submission to the regulator.
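The pharmacokinetics/pharmacodynamics split can be illustrated with a very small Python sketch. It is a generic textbook setup, not Pumas code: a one-compartment IV bolus model gives the concentration over time (PK), and an Emax model maps concentration to effect (PD). All parameter values and names are assumptions.

```python
import math


def concentration(dose, v, ke, t):
    """PK: one-compartment IV bolus, C(t) = dose / V * exp(-ke * t)."""
    return dose / v * math.exp(-ke * t)


def emax_effect(conc, emax, ec50):
    """PD: Emax model, E = Emax * C / (EC50 + C)."""
    return emax * conc / (ec50 + conc)


# Hypothetical drug: 100 mg dose, V = 10 L, ke = 0.2 per hour, EC50 = 2 mg/L.
times = [0.0, 1.0, 2.0, 4.0, 8.0, 12.0, 24.0]
profile = [(t, concentration(100.0, 10.0, 0.2, t)) for t in times]
effects = [(t, emax_effect(c, 1.0, 2.0)) for t, c in profile]
```

The effect falls off more slowly than the concentration while the drug is well above its EC50. This is one reason dosing decisions are made on the combined PK/PD picture rather than on concentration alone.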

There is a lot of complexity in drug dosing, but it is possible to eventually bring the modelling tools to the patient's bedside and personalize the dose to have effective treatment. There are also lots of possibilities for new treatments by mining old trials, and by learning new therapies from old trial datasets - what the FDA now calls Real World Evidence.

While I am a co-founder of Julia Computing, we are the technology partners for Pumas.ai, and Pumas is one of the most complex and largest Julia applications to date leveraging a large part of the Julia ecosystem.
