We've thought of doing this sort of exercise at work, but mostly hit the wall of data becoming a lot more scarce the further back in time you go. Particularly high-quality science data: even going pre-1970 (and that's already a stretch) you lose a lot of information. There's a triple whammy of the data still existing, being accessible in any format, and that format being suitable for training an LLM. Then there's the complication of wanting additional model capabilities without causally leaking data from the future.
I was wondering this: what is the minimum amount of text an LLM needs to be coherent? Fun as this idea is, the samples of its responses are basically babbling nonsense. Going further, a lot of what makes LLMs so strong isn't their original training data but the RLHF done afterwards, and RLHF would be very difficult in this case.
My advisor's lab[1] and I work in this area. I'm going to focus on the technical aspects of evidence synthesis, as opposed to the business aspects.
There's a difference between symptoms, diagnosis, treatment options, evidence for those, and the audiences for whom these are written. WebMD and friends target a broader market, scientific studies target...scientists and doctors.
I think the hard parts of building a better WebMD (along the article's lines) are:
- screening articles for relevance. This is more than mere search or finding potentially relevant articles, but also making a decision to include them
- extracting _structured_ information from the articles. Frequently we talk about Populations/Problems, Interventions, Comparators (interventions), and (medical) Outcomes, collectively PICOs. Extracting each component is easy; assembling them is surprisingly hard, and so is finding equivalencies between them within a document. Finding facets or different parts of a treatment is harder still (how do you handle combined treatments in a single study? what about when some studies use them and some don't?)
- establishing equivalencies between documents for your evidence synthesis is even harder: do you care about dosage? Combination with other treatments? What happens when a slightly different formulation of a treatment gets used, or a treatment is administered poorly? I don't know of anyone in the scholarly document processing community working on analysis of medical methods (possibly ignorance on my part!).
It would be nice if trial preregistration captured all these details, but not all trials are preregistered, not all outcomes are published, the aim of a trial can shift, and there's a large pile of literature where this information simply isn't available.
I think real-time updates are the smallest problem among these: solving extraction (including the big structured objects) mostly solves this, and the statistics of a meta-analysis are not complicated. To be clear, it is still an issue; I have a figure[2] highlighting the lag.
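As an illustration of "the statistics are not complicated": the standard fixed-effect, inverse-variance pooling at the core of a meta-analysis fits in a few lines. The study numbers below are made up for demonstration.

```python
import math

def fixed_effect_meta(effects, ses):
    """Fixed-effect inverse-variance pooling.

    Each study's effect estimate is weighted by 1/SE^2, so precise
    studies dominate. Returns the pooled effect and its standard error.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Three hypothetical studies reporting log odds ratios and standard errors.
effect, se = fixed_effect_meta([-0.3, -0.1, -0.25], [0.12, 0.20, 0.15])
```

The hard part, as above, is deciding *which* studies and *which* extracted effects are comparable enough to feed into this formula; once that's settled, the arithmetic is the easy bit (random-effects models add a between-study variance term but are not much worse).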
Presenting the evidence in a digestible, meaningful way seems like a hard HCI problem to do right and an easy one to do poorly. Merely presenting structured PICOs and findings is easy but bad UX for non-technical readers; a narrative summary is interesting to provide and hard to do well; and a subgroup analysis can suffer from differing group sizes and effectiveness findings (tailoring is tricky).
There are several organizations[3] working on these sorts of problems.
You might be interested in different fermentation processes. A friend of mine wild-fermented a mix of Mac and Ida Red under pressure at low temperature (55°F, I think), and it ended up tasting delicious. He chucked in about half the Campden tablets you'd need at his volume, to leave some of the natural yeasts alive. The cider started off tasting sweet, a good competitor for a Magners, and eventually developed a very clear, tart flavor.
He uses a fancy and expensive SS Brewtech vessel and a glycol chiller. I'm interested in duplicating some of this process using Cornelius (corny) kegs with a spunding valve, and possibly a used mini-fridge or a basement for temperature control.