Ask HN: What things are happening in ML that we can't hear over the din of LLMs?
364 points by aflip 8 months ago | 99 comments
What are some exciting things that are happening in the #ML #DataScience world that we are not able to hear over the din of LLMs?

I notice that Cynthia Rudin is continuing to produce great stuff on explainable AI.

What else is going on that is not GPT/Diffusion/MultiModal?




Some exciting projects from the last months:

- 3d scene reconstruction from a few images: https://dust3r.europe.naverlabs.com/

- gaussian avatars: https://shenhanqian.github.io/gaussian-avatars

- relightable gaussian codec: https://shunsukesaito.github.io/rgca/

- track anything: https://co-tracker.github.io/ https://omnimotion.github.io/

- segment anything: https://github.com/facebookresearch/segment-anything

- good human pose estimation models (YOLOv8, Google's MediaPipe models)

- realistic TTS: https://huggingface.co/coqui/XTTS-v2, Bark TTS (hit or miss)

- great open STT models (mostly Whisper-based)

- machine translation (e.g., SeamlessM4T from Meta)

It's crazy to see how much is coming out of Meta's R&D alone.


> It's crazy to see how much is coming out of Meta's R&D alone.

They have the money...


and data


and (rumours say) engineers who will bail if Meta doesn’t let them open source


Hundreds of thousands of H100s…


And a dystopian vision for the future that can make profitable use of the above ...


On the plus side, people make up the organization and when they eventually grow fed up with the dystopia, they leave with their acquired knowledge and make their own thing. So dystopias aren't stable in the long term.


That seems to rely on the assumption that human input is required to keep the dystopia going. Maybe I watched too much sci-fi, but the more pessimistic view is that the AI dystopia will be self-sustaining and couldn't be overcome without the concerted use of force by humans. But we humans aren't that good at even agreeing on common goals, let alone exerting continuous effort to achieve them. And most likely, by the time we start to even think of organizing, the AI dystopia will be conducting effective psychological warfare (using social media bots etc.) to pit us against each other even more.


The Ones Who Walk Away From O-Meta-s


A very apt reference to the story

The Ones Who Walk Away from Omelas

Dunno how pasting a link works but here it is:

https://shsdavisapes.pbworks.com/f/Omelas.pdf


I feel vaguely annoyed, I think it's because it took a lot of time to read through that, and it amounts to "bad to put child in solitary confinement to keep whole society happy."

What does a simplistic moral set piece about the abhorrence of sacrificing the good of one for the good of many have to do with (checks notes) Facebook? Even as vague hand-wavey criticism, wouldn't Facebook be the inverse?


You have every right to take what you like from it, but I'd suggest that perhaps you're not seeing what others are if all you get is a morality play. As one example, maybe spend some time thinking about why you apparently missed that it's intentionally left ambiguous as to whether the child is even real in the story's world.


A condescending lecture starting with "you just don't get it" and ending with "I read your mind and know you missed the 'but was it even real?'" part isn't imparting anything useful.

Re: "actually you should just ponder why you are a simpleton who doesn't get it, given other people derived value from how it relates to Facebook": There arent people here running around praising it. The comment 4 up was, and still is downvoted well below 0, there's barely anyone reading all the way down here. Only one other person even bothered replying.

I don't think me mentioning this is useful or fair, but I don't know how to drive home how little contribution there is from a condescending "think harder, didn't you notice the crowd loves it and understands how it's just like Facebook"


You misread my comment, I wasn't trying to be condescending; a primary theme of the story (in my and many others' readings) is the limits of our ability to imagine different, better worlds than the one we exist in. We struggle to read the story as purely utopian, even when we are explicitly told to do so. It has more impact when you find this on your own, and I was trying to avoid spoilers.


So the dystopia spreads out... Metastasis


> So dystopias aren't stable in the long term.

Unless they think to hire new people.


For some people this is a stable dystopia.


Whoa, Bark got a major update recently. Thanks for the link as a reminder to check in on that project!


Can you share what update you are referring to?

I've played with Bark quite extensively a few months ago and I'm on the fence regarding that model: when it works, it's the best, but I found it to be pretty useless for most use cases I want TTS for because of the high rate of bad or weird output.

I'm pretty happy with XTTS-v2 though. It's reliable and the output quality is still pretty good.


- streaming and rendering 3d movies in real-time using 4d gaussian splatting https://guanjunwu.github.io/4dgs/


Not sure how relevant this is but note that Coqui TTS (the realistic TTS) has already shut down

https://coqui.ai


NeRFs. It's a rethink of 3D graphics from the ground up, oriented around positioning glowing translucent orbs instead of textured polygons. The positioning and color of the orbs is learned by a NN given accurate multi-angle camera shots and poses, then you can render them on GPUs by ray tracing. The resulting scenes are entirely photo-realistic, as they were generated from photos, but they can also be explored.

In theory you can also animate such scenes but how to actually do that is still a research problem.

Whether this will end up being better than really well optimized polygon based systems like Nanite+photogrammetry is also an open question. The existing poly pipes are pretty damn good already.
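To make the rendering step concrete, here's a minimal numpy sketch of NeRF-style volume compositing along a single ray. It assumes you already have per-sample densities and colours from some trained field; the function name and all values below are illustrative stand-ins, not any particular implementation:

    import numpy as np

    def render_ray(densities, colors, deltas):
        # densities: (N,) non-negative sigma at N samples along the ray
        # colors:    (N, 3) RGB predicted at those samples
        # deltas:    (N,) spacing between consecutive samples
        alphas = 1.0 - np.exp(-densities * deltas)                       # opacity per sample
        trans = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))   # light surviving to each sample
        weights = trans * alphas                                         # each sample's contribution
        return (weights[:, None] * colors).sum(axis=0)                   # composited pixel colour

    # Toy usage: one "glowing orb" halfway along an otherwise empty ray
    sigma = np.array([0.0, 5.0, 0.0])
    rgb = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.2], [0.0, 0.0, 0.0]])
    dt = np.array([0.1, 0.1, 0.1])
    print(render_ray(sigma, rgb, dt))

Each sample contributes its opacity weighted by how much light survives to reach it, which is where the "glowing translucent orbs" intuition comes from.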


What you're talking about is, I think, Gaussian splats. NeRFs are exclusively radiance fields without any sort of regular 3D representation.


Yes, I think Gaussian splats are where all the rage is.

My limited understanding is that NeRFs are compute-heavy because each cloud point is essentially a small neural network that can compute its value from a specific camera angle. Gaussian splats are interesting since they achieve almost the same effect using a much simpler mechanism of gaussian values at each cloud point, and can be efficiently computed in real-time on a GPU.

While a NeRF could be used to render a novel view of a scene, it could not do so in real-time, while gaussian splats can, which opens up lots of use cases.


> My limited understanding is that Nerfs are compute-heavy because each cloud point is essentially a small neural network

There's no point cloud in NeRFs. A NeRF scene is a continuous representation in a neural network, i.e. the scene is represented by neural network weights, but (unlike with 3D Gaussian Splatting) there's no explicit representation of any points. Nobody can tell you what any of the network weights represent, and there's no part of it that explicitly tells you "we have a point at location (x, y, z)". That's why 3D Gaussian Splatting is much easier to work with and create editing tools for.


Interesting. Thanks for the clarification.


There are a couple of Computerphile videos on this:

NeRFs: https://youtu.be/wKsoGiENBHU
Gaussian splatting: https://youtu.be/VkIJbpdTujE


Very cool, thanks! NeRFs = Neural Radiance Fields, here [1] is the first hit I got that provides some example images.

[1]: https://www.matthewtancik.com/nerf


>Whether this will end up being better than really well optimized polygon based systems like Nanite+photogrammetry is also an open question

I think this is pretty much settled unless we encounter any fundamental new theory roadblocks on the path of scaling ML compute. Polygon based systems like Nanite took 40+ years to develop. With Moore's law finally out of the way and Huang's law replacing it for ML, hardware development is no longer the issue. Neural visual computing today is where polygons were in the 80s. I have no doubt that it will revolutionize the industry, if only because it is so much easier to work with for artists and designers in principle. As a near-term intermediate we will probably see a lot of polygon renderers with neural generated stuff in between, like DLSS or just artificially generated models/textures. But this stuff we have today is like the Wright brothers' first flight compared to the moon landing. I think in 40 years we'll have comprehensive real time neural rendering engines. Possibly even rendering output directly to your visual cortex, if medical science can keep up.


It's easier to just turn NeRFs/splats into polygons for faster rendering.


That's only true today. And it's quite difficult for artists by comparison. I don't think people will bother with the complexities of polygon based graphics once they no longer have to.


Rasterisation will always be faster, it's mathematically simpler.


Not really. Look at how many calculations a single pixel needs in modern PBR pipelines just from shaders. And we're not even talking about the actual scene logic. A super-realistic recreation of reality will probably need a kind of learned, streaming compression that neural networks are naturally suited for.


neural networks will be used on top of polygon based models


They already are. But the future will probably not look like this if the current trend continues. It's just not efficient enough when you look at the whole design process.


You can convert neural spatial representations to polygon based, so there is no need to use a much more inefficient path during the real time phase.


As I said twice now already, efficiency is not just a question of rendering pixels. When you take the entire development lifecycle into account, there is a vast opportunity for improvement. This path is an obvious continuation of current trends we see today: Why spend time optimising when you can slap on DLSS? Why spend time adjusting countless lights when you can use real time GI? Why spend time making LODs when you can slap on Nanite? In the future people will ask "Why spend time modelling polygons at all when you can get them for free?"


Nobody will spend time modelling polygons. They will convert gaussian splats to polygons automatically, and the application will rasterise polygons. This is how it's already done, if we went back to ray marching NeRFs we would be going backwards and would be an incredible waste of performance. Polygons are here to stay for the next 20 years.


One area that I would dive into (if I had more time) is "geometric deep learning", i.e., how to design models in a principled way to respect known symmetries in the data. ConvNets are the famous/obvious example for their translation equivariance, but there are many recent examples that extend the same logic to other symmetry groups. And then there is also the question of whether certain symmetries can be discovered or identified automatically.
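As a toy illustration of what "respecting a symmetry" means here, a small numpy sketch (random filter, circular convolution so the equivariance is exact) showing that shifting the input and then convolving gives the same result as convolving and then shifting:

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=32)      # a 1D "signal"
    w = rng.normal(size=5)       # a filter (random here, learned in practice)

    def circ_conv(signal, kernel):
        # Circular 1D convolution via FFT, so translation equivariance holds exactly
        k = np.zeros(len(signal))
        k[:len(kernel)] = kernel
        return np.real(np.fft.ifft(np.fft.fft(signal) * np.fft.fft(k)))

    shift = lambda s, t: np.roll(s, t)

    # Equivariance: convolving a shifted input equals shifting the convolved output
    print(np.allclose(circ_conv(shift(x, 7), w), shift(circ_conv(x, w), 7)))  # True

Group-equivariant networks generalize exactly this property from translations to rotations, reflections, and other symmetry groups.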


I've been doing some reading on LLMs for protein/RNA structure prediction and I think there's a decent amount of work on SO(3)-invariant transformer architectures now


There's also been some work on more general Lie-group equivariant transformer models.

http://proceedings.mlr.press/v139/hutchinson21a/hutchinson21...


I launched https://app.scholars.io to get the latest research from arXiv on specific topics I'm interested in, so I can filter out the ones I'm not. Hopefully it will help someone find research activity other than LLMs.


just signed up for computer vision and image processing related topics as this is what I'm specializing in for my Master's

The interface to sign up was very painless and straightforward

I signed up for a 2-week periodic digest

The first digest comes instantly and scanning through the titles alone was inspirational and I'm sure will provide me with more than a few great papers to read over upcoming years


Anyone know anything I can use to take video of a road from my car (a phone) and create a 3D scene from it? More focused on the scenery around the road as I can put a road surface in there myself later. I’m talking about several miles or perhaps more, but I don’t mind if it takes a lot of processing time or I need multiple angles, I can drive it several times from several directions. I’m trying to create a local road or two for driving on in racing simulators.


Photogrammetry is the keyword you're looking to search on.

There are quite a few advanced solutions already (predating LLMs/ML).


SLAM from monoscopic video. I imagine without an IMU or other high quality pose estimator you'll need to do a fair bit of manual cleanup.


Gaussian splatting; there is quite a bit on YouTube about it, and there are commercial packages that are trying to make a polished experience.

https://www.youtube.com/@OlliHuttunen78

edit - I just realized you want a mesh :) for which Gaussian splatting is not there yet! BUT there are multiple papers exploring adding gaussians to a mesh that's progressively refined; I think it's inevitable based on what's needed for editing and use cases just like yours.

You could start exploring and compiling footage and testing and maybe it will work out but ...

Here is a news site focused on the field -

https://radiancefields.com/


You can do this for free now with RealityCapture, not ML though.


Microsoft's PhotoSynth did this years ago, but they cancelled it.


More like a cousin of LLMs are Vision-Language-Action (VLA) models like RT-2 [1]. In addition to text and vision data, they also include robot actions as "another language" of tokens, so the model can output movement actions for robots.

[1]: https://robotics-transformer2.github.io


The SAM-family of computer-vision models have made many of the human annotation services and tools somewhat redundant, as it's possible to achieve relatively high-quality auto-labeling of vision data.
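For a rough idea of what point-prompted auto-labelling looks like with the segment-anything package; the checkpoint filename, image path, and click coordinates below are placeholders, and this is just a sketch of the usual workflow:

    import numpy as np
    import cv2
    from segment_anything import SamPredictor, sam_model_registry

    # Checkpoint and image paths are placeholders
    sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
    predictor = SamPredictor(sam)

    image = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2RGB)
    predictor.set_image(image)

    # One positive click roughly on the object to be labelled
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[420, 310]]),
        point_labels=np.array([1]),       # 1 = foreground, 0 = background
        multimask_output=True,
    )
    best_mask = masks[np.argmax(scores)]  # boolean mask, candidate auto-label

A human reviewer then only has to accept or correct the proposed masks rather than draw them from scratch.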


This is probably true for simple objects, but there is almost certainly a market for hiring people who use SAM-based tools (or similar) to label with some level of QA. I've tried a few implementations and they struggle with complex objects and can be quite slow (due to GPU overhead). Some platforms have had some variant of "click guided" labelling for a while (eg V7) but they're not cheap to use.

Prompt guided labelling is also pretty cool, but still in infancy (eg you can tell the model "label all the shadows"). Seg GPT for example. But now we're right back to LLMs...

On labelling, there is still a dearth of high quality niche datasets ($$$). Everyone tests on MS-COCO and the same 5-6 segmentation datasets. Very few papers provide solid instructions for fine tuning on bespoke data.


That's basically what we are able to do now: showing models an image (or images, from video) and prompting for labels, such as with "person, soccer player".


Keep in mind that LLMs are basically just sequence-to-sequence models that can scan a million tokens and do inference affordably. The underlying advances (attention, transformers, masking, scale) that made this possible carry over to other settings. We have a recipe for learning similar models on a huge variety of other tasks and data types.


Transformers are really more general than seq-to-seq, maybe more like set-to-set or graph-to-graph.

The key insight (Jakob Uszkoreit) to using self-attention for language was that language is really more hierarchical than sequential, as indicated by linguists' tree diagrams for describing sentence structure. The leaves of one branch of a tree (or sub-tree) are independent of those in another sub-tree, allowing them to be processed in parallel (not in sequence). The idea of a multi-layer transformer is therefore to process this language hierarchy one level at a time, working from leaves on upwards through the layers of the transformer (processing smaller neighborhoods into increasingly larger neighborhoods).
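To make the set-to-set point concrete, here's a minimal single-head scaled dot-product self-attention in numpy (no positional encodings, random weights purely for illustration). Every token attends to every other token in one parallel step rather than being processed left to right:

    import numpy as np

    def self_attention(X, Wq, Wk, Wv):
        # X: (n, d) token embeddings; Wq/Wk/Wv: (d, d_k) learned projections
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])           # (n, n) pairwise affinities
        scores -= scores.max(axis=-1, keepdims=True)      # numerical stability
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the whole set
        return weights @ V                                # each token becomes a mixture of all tokens

    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 16))                          # 6 tokens, 16-dim embeddings
    Wq, Wk, Wv = (rng.normal(size=(16, 16)) for _ in range(3))
    print(self_attention(X, Wq, Wk, Wv).shape)            # (6, 16)

Without positional information the operation is permutation-equivariant, which is exactly why it reads more like set-to-set than strictly sequential processing.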


I was just going to ask a similar question recently. I've been working on a side project involving XGBoost and was wondering if ML is still worth learning in 2024.

My intuition says yes but what do I know.


I recently attended an interesting talk at a local conference. It was from someone that works at a company that makes heating systems. They want to optimize heating given the conditions of the day (building properties, outside temperature, amount of sunshine, humidity, past patterns, etc.). They have certain hard constraints wrt. model size, training/update compute, etc.

Turns out that for their use case a small (weights fit in tens of KiB IIRC) multilayer perceptron works the best.

There is a lot of machine learning out in the world like that, but it doesn't grab the headlines.


I have doubts that a simple adaptive building model-based controller wouldn't be better, and interpretable. I wonder why you'd go with a perceptron... those are so limited.


Sounds interesting, can you share a link to video if available?



The foundations of ML aren't changing. The models change, the data pipelines become more sophisticated, but the core skills are still important. Imagine you're trying to predict a binary event. Do you want to predict whether a given instance will be a 0/1 or do you want to predict the probability of each instance being a 1? Why? What do all those evaluation metrics mean? Even if you're using a super advanced AutoML platform backed by LLMs or whatever, you still need to be able to understand the base concepts to build ML apps in the real world.
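A tiny scikit-learn sketch of exactly that 0/1-versus-probability distinction (synthetic data, purely illustrative); note how different metrics judge the thresholded decisions versus the probability estimates:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score, log_loss, roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

    hard = clf.predict(X_te)                # 0/1 decisions at an implicit 0.5 threshold
    proba = clf.predict_proba(X_te)[:, 1]   # estimated probability of class 1

    print("accuracy:", accuracy_score(y_te, hard))   # judges the thresholded decisions
    print("log loss:", log_loss(y_te, proba))        # judges the probability estimates
    print("ROC AUC :", roc_auc_score(y_te, proba))   # judges the ranking, threshold-free

Whether you report the hard labels or the probabilities (and which metric you optimize) depends entirely on how the downstream decision is made, which is the kind of base concept that doesn't change with the model of the month.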


xgboost will still work better for most problems people encounter in industry (which usually involve tabular data).


UW-Madison's ML+X community is hosting a Machine Learning Marathon that will be featured as a competition on Kaggle (https://www.kaggle.com/c/about/host)

"What is the 2024 Machine Learning Marathon (MLM24)?

This approximately 12-week summer event (exact dates TBA) is an opportunity for machine learning (ML) practitioners to learn and apply ML tools together and come up with innovative solutions to real-world datasets. There will be different challenges to select from — some suited for beginners and some suited for advanced practitioners. All participants, project advisors, and event organizers will gather on a weekly or biweekly basis to share tips with one another and present short demos/discussions (e.g., how to load and finetune a pretrained model, getting started with GitHub, how to select a model, etc.). Beyond the intrinsic rewards of skill enhancement and community building, the stakes are heightened by the prospect of a cash prize for the winning team."

More information here: https://datascience.wisc.edu/2024/03/19/crowdsourced-ml-for-...


+1 to this, but one might be hard-pressed to find anything nowadays that doesn't involve a transformer model somehow.


Same sentiment here. Love the question, but transformers are still so new and so effective that they will probably dominate for a while.

We (humans) are following the last thing that worked (imagine if we could do true gradient descent on the algorithm space).

Good question, and I'm interested to hear the other responses.


> but transformers are still so new and so effective that they will probably dominate for a while.

They're mostly easy grant money and are being gamed by entire research groups worldwide to be seen as effective on the published papers. State of academia...


In the area I'm working in (bioacoustics), embeddings from supervised learning are still consistently beating self-supervised transformer embeddings. The transformers win on held-out training data (in-domain) but greatly underperform on novel data (generalization).

I suspect that this is because we've actually got a much more complex supervised training task than average (10k classes, multilabel), leading to much better supervised embeddings, and rather more intense needs for generalization (new species, new microphones, new geographic areas) than 'yet more humans on the internet.'


In text analysis people usually get better results in many-shot scenarios (supervised training on data) vs zero-shot (give a prompt) and the various one-shot and few-shot approaches.


Hey, that is a field that I am interested in (mostly inspired by a recent museum exhibition). Do you have recent papers on this topic, or labs/researchers to follow?


It's a really fun area to work in, but beware that it's very easy to underestimate the complexity. And also very easy to do things which look helpful but actually are not (e.g., improving classification on Xeno-canto but degrading performance on real soundscapes).

Here's some recent-ish work: https://www.nature.com/articles/s41598-023-49989-z

We also run a yearly kaggle competition on birdsong recognition, called birdclef. Should be launching this year's edition this week, in fact!

Here's this year's competition, which will be a dead link for now: https://www.kaggle.com/competitions/birdclef-2024

And last year's: https://www.kaggle.com/competitions/birdclef-2023


I wager the better question is

    What things are happening in fields of, or other than, CS that we don't hear over the din of ML/AI


Seems like there is always pushback on LLMs that they don't learn to do proofs and reasoning.

DeepMind just placed pretty high at the International Mathematical Olympiad. Here it does have to present reasoning.

https://arstechnica.com/ai/2024/01/deepmind-ai-rivals-the-wo...

And it's a couple of years old, but AlphaFold was pretty impressive.

EDIT: Sorry, I said LLM. But meant AI/ML/NN generally, people say a computer can't reason, but DeepMind is doing it.


>To overcome this difficulty, DeepMind paired a language model with a more traditional symbolic deduction engine that performs algebraic and geometric reasoning.

I couldn't think of a better way to demonstrate that LLMs are poor at reasoning than using this crutch.


I wouldn't say 'crutch' but component.

Eventually LLMs will be plugged into Vision Systems, and Symbolic Systems, and Motion Systems, etc... etc...

The LLM won't be the main 'thing'. But the text interface.

Even the human brain is a bit segmented, with different faculties being 'processed' in different areas with different architectures.


I suppose it's because LLM training data uses text that can contain reasoning within it, but without any specific context to specifically learn reasoning. I feel like the little reasoning an LLM can do is a byproduct of the training data.

Does seem more realistic to train something not on text but on actual reasoning/logic concepts and use that along with other models for something more general purpose. LLMs should really only be used to turn "thoughts" into text and to receive instructions, not to do the actual reasoning.


So, from the perspective I have within the subfield I work in, explainable AI (XAI), we're seeing a bunch of fascinating developments.

First, as you mentioned, Rudin continues to prove that the reason for using AI/ML is that we don't understand the problem well enough; otherwise we wouldn't even think to use it! So, by pushing our focus to better understand the problem, and then leveraging ML concepts and techniques (including "classical AI" and statistical learning), we're able to make something that not only outperforms some state-of-the-art in most metrics, but is often even much less resource intensive to create and deploy (in compute, data, energy, and human labour), with added benefits from direct interpretability and post-hoc explanations. One example has been the continued primacy of tree ensembles on tabular datasets [0], even for the larger datasets, though they truly shine on the small to medium datasets that actually show up in practice, which from Tigani's observations [1] would include most of those who think they have big data.

Second, we're seeing practical examples of exactly this outside Rudin! In particular, people are using ML more to do live parameter fine-tuning that otherwise would need more exhaustive searches or human labour that are difficult for real-time feedback, or copious human ingenuity to resolve in a closed-form solution. Opus 1.5 is introducing some experimental work here, as are a few approaches in video and image encoding. These are domains where, as in the first, we understand the problem, but also understand well enough that there are search spaces we simply don't know enough about to be able to dramatically reduce. Approaches like this have been bubbling out of other sciences (physics, complexity theory, bioinformatics, etc) that lead to some interesting work in distillation and extraction of new models from ML, or "physically aware" operators that dramatically improve neural nets, such as Fourier Neural Operators (FNO) [2], which embed FFTs rather than forcing them to be relearned (as has been found to often happen) for remarkable speed-ups with PDEs such as for fluid dynamics, and have already shown promise with climate modelling [3] and material science [4]. There are also many more operators, which all work completely differently, yet bring human insight back to the problem, and sometimes lead to extracting a new model for us to use without the ML! Understanding begets understanding, so the "shifting goalposts" of techniques considered "AI" is a good thing!
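(To make the FNO idea concrete, here's a stripped-down 1D numpy sketch of the spectral convolution at its core; a real FNO adds a pointwise linear path, nonlinearities, and channel mixing, and the weights below are random stand-ins for trained ones:)

    import numpy as np

    def spectral_conv_1d(u, weights, n_modes):
        # Core FNO idea: go to Fourier space, apply learned weights to the lowest
        # n_modes frequencies, drop the rest, and come back to physical space
        u_hat = np.fft.rfft(u)
        out_hat = np.zeros_like(u_hat)
        out_hat[:n_modes] = u_hat[:n_modes] * weights   # learned complex multipliers per mode
        return np.fft.irfft(out_hat, n=len(u))

    rng = np.random.default_rng(0)
    u = np.sin(np.linspace(0, 4 * np.pi, 128)) + 0.1 * rng.normal(size=128)
    w = rng.normal(size=16) + 1j * rng.normal(size=16)  # stand-in for trained weights
    print(spectral_conv_1d(u, w, n_modes=16).shape)     # (128,)

The FFT is given to the model rather than relearned, which is exactly the "human insight back into the architecture" point.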

Third, specifically on improvements in explainability, we've seen the Neural Tangent Kernel (NTK) [5] rapidly go from strength to strength since its introduction. While rooted in core explainability, vis a vis making neural nets more mathematically tractable to analysis, it has inspired not only other approaches [6] and behavioural understanding of neural nets [7, 8], but novel ML itself [9], with ways to transfer the benefits of neural networks to far less resource intensive techniques; the RFM kernel machine of [9] proves competitive with the best tree ensembles from [0], and even has an advantage on numerical data (plus outperforms prior NTK-based kernel machines). An added benefit is that the approach underpinning [9] itself leads to new interpretation and explanation techniques, similar to integrated gradients [10, 11] but perhaps more reminiscent of the idea in [6].

Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff! XAI in particular, yes, but also the myriad of interpretable models a la Rudin or the significant improvements found in hybrid approaches and reinforcement learning. Cicero [12], for example, does have an LLM component, but uses it in a radically different way compared to most people's current conception of LLMs (though, again, ironically closer to the "classic" LLMs for semantic markup), much like the AlphaGo series altered the way the deep learning component was utilised by embedding and hybridising it [13] (its successors obviating even the traditional supervised approach through self-play [14], and beyond Go). This is all without even mentioning the neurosymbolic and other approaches to embed "classical AI" in deep learning (such as RETRO [15]). Despite these successes, adoption of these approaches is still very far behind, especially compared to the zeitgeist of ChatGPT style LLMs (and general hype around transformers), and arguably much worse for XAI due to the barrier between adoption and deeper usage [16].

This is still early days, however, and again to harken Rudin, we don't understand the problem anywhere near well enough, and that extends to XAI and ML as problem domains themselves. Things we can actually understand seem a far better approach to me, but without getting too Monkey's Paw about it, I'd posit that we should really consider if some GPT-N or whatever is actually what we want, even if it did achieve what we thought we wanted. Constructing ML with useful and efficient inductive bias is a much harder challenge than we ever anticipated, hence the eternal 20 years away problem, so I just think it would perhaps be a better use of our time to make stuff like this, where we know what is actually going on, instead of just theoretically. It'll have a part, no doubt, Cicero showed that there's clear potential, but people seem to be realising "... is all you need" and "scaling laws" were just a myth (or worse, marketing). Plus, all those delays to the 20 years weren't for nothing, and there's a lot of really capable, understandable techniques just waiting to be used, with more being developed and refined every year. After all, look at the other comments! So many different areas, particularly within deep learning (such as NeRFs or NAS [17]), which really show we have so much left to learn. Exciting!

  [0]: Léo Grinsztajn et al. "Why do tree-based models still outperform deep learning on tabular data?" https://arxiv.org/abs/2207.08815
  [1]: Jordan Tigani "Big Data is Dead" https://motherduck.com/blog/big-data-is-dead/
  [2]: Zongyi Li et al. "Fourier Neural Operator for Parametric Partial Differential Equations" https://arxiv.org/abs/2010.08895
  [3]: Jaideep Pathak et al. "FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators" https://arxiv.org/abs/2202.11214
  [4]: Huaiqian You et al. "Learning Deep Implicit Fourier Neural Operators with Applications to Heterogeneous Material Modeling" https://arxiv.org/abs/2203.08205
  [5]: Arthur Jacot et al. "Neural Tangent Kernel: Convergence and Generalization in Neural Networks" https://arxiv.org/abs/1806.07572
  [6]: Pedro Domingos "Every Model Learned by Gradient Descent Is Approximately a Kernel Machine" https://arxiv.org/abs/2012.00152
  [7]: Alexander Atanasov et al. "Neural Networks as Kernel Learners: The Silent Alignment Effect" https://arxiv.org/abs/2111.00034
  [8]: Yilan Chen et al. "On the Equivalence between Neural Network and Support Vector Machine" https://arxiv.org/abs/2111.06063
  [9]: Adityanarayanan Radhakrishnan et al. "Mechanism of feature learning in deep fully connected networks and kernel machines that recursively learn features" https://arxiv.org/abs/2212.13881
  [10]: Mukund Sundararajan et al. "Axiomatic Attribution for Deep Networks" https://arxiv.org/abs/1703.01365
  [11]: Pramod Mudrakarta "Did the model understand the questions?" https://arxiv.org/abs/1805.05492
  [12]: META FAIR Diplomacy Team et al. "Human-level play in the game of Diplomacy by combining language models with strategic reasoning" https://www.science.org/doi/10.1126/science.ade9097
  [13]: DeepMind et al. "Mastering the game of Go with deep neural networks and tree search" https://www.nature.com/articles/nature16961
  [14]: DeepMind et al. "Mastering the game of Go without human knowledge" https://www.nature.com/articles/nature24270
  [15]: Sebastian Borgeaud et al. "Improving language models by retrieving from trillions of tokens" https://arxiv.org/abs/2112.04426
  [16]: Umang Bhatt et al. "Explainable Machine Learning in Deployment" https://dl.acm.org/doi/10.1145/3351095.3375624
  [17]: M. F. Kasim et al. "Building high accuracy emulators for scientific simulations with deep neural architecture search" https://arxiv.org/abs/2001.08055


Thank you for providing an exhaustive list of references :)

> Finally, specific to XAI, we're seeing people actually deal with the problem that, well, people aren't really using this stuff!

I am very curious to see which practical interpretability/explainability requirements enter into regulations - on one hand it's hard to imagine a one-size fits all approach, especially for applications incorporating LLMs, but Bordt et al. [1] demonstrate that you can provoke arbitrary feature attributions for a prediction if you can choose post-hoc explanations and parameters freely, making a case that it can't _just_ be left to the model developers either

[1] "Post-Hoc Explanations Fail to Achieve their Purpose in Adversarial Contexts", Bordt et al. 2022, https://dl.acm.org/doi/10.1145/3531146.3533153


I think the situation with regulations will be similar to that with interpretability and explanations. There's a popular phrase that gets thrown around, that "there is no silver bullet" (perhaps most poignantly in AIX360's initial paper [0]), as no single explanation suffices (otherwise, would we not simply use that instead?) and no single static selection of them would either. What we need is to have flexible, adaptable approaches that can interactively meet the moment, likely backed by a large selection of well understood, diverse, and disparate approaches that cover for one other in a totality. It needs to interactively adapt, as the issue with the "dashboards" people have put forward to provide such coverage is that there are simply too many options and typical humans cannot process it all in parallel.

So, it's an interesting unsolved area for how to put forward approaches that aren't quite one-size fits all, since that doesn't work, but also makes tailoring it to the domain and moment tractable (otherwise we lose what ground we gain and people don't use it again!)... which is precisely the issue that regulation will have to tackle too! Having spoken with some people involved with the AI HLEG [1] that contributed towards the AI Act currently processing through the EU, there's going to have to be some specific tailoring within regulations that fit the domain, so classically the higher-stakes and time-sensitive domains (like, say, healthcare) will need more stringent requirements to ensure compliance means it delivers as intended/promised, but that it's not simply going to be a sliding scale from there, and too much complexity may prevent the very flexibility we actually desire; it's harder to standardise something fully general purpose than something fitted to a specific problem.

But perhaps that's where things go hand in hand. An issue currently is the lack of standardisation in general: it's unreasonable to expect people to re-implement these things on their own given the mathematical nuance, yet many of my colleagues agree it's usually the most reliable way. Things like scikit had an opportunity, sitting as a de facto interface for the basics, but niche competitors then grew and grew, many of which simply ignored it. Especially with things like [0], there are a bunch of wholly different "frameworks" that cannot intercommunicate except by someone knuckling down and fudging some dataframes or ndarrays, and that's just within Python, let alone those in R (and there are many) or C++ (fewer, but notable). I'm simplifying somewhat, but it means that plenty of isolated approaches simply can't work together, meaning model developers may not have much choice but to use whatever batteries are available! Unlike, say, Matplotlib, I don't see much chance for declarative/semi-declarative layers to take over here, such as pyplot and seaborn could, which enabled people to empower everything backed by Matplotlib "for free" with downstream benefits such as enabling intervals or live interaction with a lower-level plugin or upgrade. After all, scikit was meant to be exactly this for SciPy! Everything else like that is generally focused on either models (e.g. Keras) or explanations/interpretability (e.g. Captum or Alibi).

So it's going to be a real challenge figuring out how to get regulations that aren't so toothless that people don't bother or are easily satisfied by some token measure, but also don't leave us open to other layers of issues, such as adversarial attacks on explanations or developer malfeasance. Naturally, we don't want something easily gamed that the ones causing the most trouble and harm can just bypass! So I think there's going to have to be a bit of give and take on this one, the regulators must step up while industry must step down, since there's been far too much "oh, you simply must regulate us, here, we'll help draft it" going around lately for my liking. There will be a time for industry to come back to the fore, when we actually need to figure out how to build something that satisfies, and ideally, it's something we could engage in mutually, prototyping and developing both the regulations and the compliant implementations such that there are no moats, there's a clearly better way to do things that ultimately would probably be more popular anyway even without any of the regulatory overhead; when has a clean break and freshening up of the air not benefited? We've got a lot of cruft in the way that's making everyone's jobs harder, to which we're only adding more and more layers, which is why so many are pursuing clean-ish breaks (bypass, say, PyTorch or Jax, and go straight to new, vectorised, Python-ese dialects). The issue is, of course, the 14 standards problem, and now so many are competing that the number only grows, preventing the very thing all these intended to do: refresh things so we can get back to the actual task! So I think a regulatory push can help with that, and that industry then has the once-in-a-lifetime chance to then ride that through to the actual thing we need to get this stuff out there to millions, if not billions, of people.

A saying keeps coming back to mind for me, all models are wrong, some are useful. (Interpretable) AI, explanations, regulations, they're all models, so of course they won't be perfect... if they were, we wouldn't have this problem to begin with. What it all comes back to is usefulness. Clearly, we find these things useful, or we wouldn't have them, necessity being the mother of invention and all, but then we must actually make sure what we do is useful. Spinning wheels inventing one new framework after the next doesn't seem like that to me. Building tools that people can make their own, but know that no matter what, a hammer is still a hammer, and someone else can still use it? That seems much more meaningful of an investment, if we're talking the tooling/framework side of things. Regulation will be much the same, and I do think there are some quite positive directions, and things like [1] seem promising, even if only as a stop-gap measure until we solve the hard problems and have no need for it any more -- though they're not solved yet, so I wouldn't hold out for such a thing either. Regulations also have the nice benefit that, unlike much of the software we seem to write these days, they're actually vertically and horizontally composable, and different places and domains at different levels have a fascinating interplay and cross-pollination of ideas, sometimes we see nation-states following in the footsteps of municipalities or towns, other times a federal guideline inspires new institutional or industrial policies, and all such combinations. Plus, at the end of the day, it's still about people, so if a regulation needs fixing, well, it's not like you're not trying to change the physics of the universe, are you?

  [0]: Vijay Arya et al. "One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques" https://arxiv.org/abs/1909.03012
  [1]: High-Level Expert Group on AI "Ethics Guidelines for Trustworthy AI"
  Apologies, will have to just cite those, since while there are some papers associated with the others, it's quite late now, so I hope the recognisable names suffice.


Thanks a lot. I love the whole XAI movement, as it often forces you to think of the cliffs, limits, and non-linearity of the methods. It makes you circle back to an engineering process of thinking about specification and qualification of your black box.


Thank you! especially for the exhaustive reading list!!


AlphaFold seems like a major medical breakthrough


There was a lot of research into game-playing prior to LLMs (e.g. real-time strategy). Is there nothing left to conquer there now? Or is it still happening but no one reports on it?


This is a nice daily newsletter with AI news: https://tldr.tech/ai


A novel SNN framework I'm working on. Newest post has been taking me a while. metalmind.substack.com


Is there anything cool going on in animation? Seems like an industry that relies on a lot of rote, repetitive work and is a prime candidate for using AI to interpolate movement.


3D animation is seeing tools like https://me.meshcapade.com/ crop up


That is a really creepy demo. It is cool for sure, but creepy for sure.


To plug my own field a bit, in material science and chemistry there is a lot of excitement in using machine learning to get better simulations of atomic behavior. This can open up exciting areas in drug and alloy design, maybe find new CO2-capturing materials or better cladding for fusion reactors, to name just a few.

The idea is that to solve these problems you need to solve the Schrödinger equation (1). But the Schrödinger equation scales really badly with the number of electrons and can't be computed directly for more than a few sample cases. Even Density Functional Theory (DFT), the most popular approximation that is still reasonably accurate, scales as N^3 with the number of electrons, with a pretty big prefactor. A reasonable rule of thumb would be 12 hours on 12 nodes (each node being 160 CPU cores) for 256 atoms. You can play with settings and increase your budget to maybe get to 2000 atoms (and only for a few timesteps), but good luck beyond that.

Machine learning seems to be really useful here. In my own work on aluminium alloys I was able to get the same simulations that would have needed hours on the supercomputer to run in seconds on a laptop. Or, do simulations with tens of thousands of atoms for long periods of time on the supercomputer. The most famous application is probably AlphaFold from DeepMind.

There are a lot of interesting questions people are still working on:

What are the best input features? We don't have any nice equivalent to CNNs that is universally applicable, though some have tried 3D convnets. One of the best methods right now involves taking spherical-harmonic-based approximations of the local environment in some complex way I've never fully understood, but it is closer to the underlying physics.

Can we put physics into these models? Almost all these models fail in dumb ways sometimes. For example, if I begin to squish two atoms together they should eventually repel each other, and that repulsion force should scale really fast (ok, maybe they fuse into a black hole or something, but we're not dealing with that kind of esoteric physics here). But all machine learning potentials will by default fail to learn this and will only learn the repulsion down to the closest distance of any two atoms in their training set. Beyond that they guess wildly. Some people are able to put this physics into the model directly, but I don't think we have it totally solved yet.
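One common way people do put that physics in is to add an analytic short-range repulsion as a baseline and let the ML part learn only the residual; a toy sketch of that idea (the functional form and numbers are purely illustrative, not any specific potential):

    import numpy as np

    def pair_repulsion(r, a=1.0, p=6.0):
        # Analytic short-range repulsion ~ a / r^p (a stand-in for e.g. a ZBL-style term)
        return a / r**p

    def total_energy(pair_distances, ml_residual):
        # Physics baseline + learned correction: the analytic term guarantees the
        # energy blows up as r -> 0 even where training data had no close contacts
        return pair_repulsion(pair_distances).sum() + ml_residual

    # Squeezing two atoms together: the baseline diverges regardless of the ML part
    for r in (2.0, 1.0, 0.5, 0.25):
        print(r, total_energy(np.array([r]), ml_residual=-0.3))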

How do we know which atomic environments to simulate? These models can really only interpolate; they can't extrapolate. But while I can get an intuition of interpolation in low dimensions, once your training set consists of many features over many atoms in 3D space, this becomes a high-dimensional problem. In my own experience, I can get really good energies for the shearing behavior of strengthening precipitates in aluminum without directly putting the structures in. But was this extrapolated or interpolated from the other structures? Not always clear.

(1) Sometimes also the relativistic Dirac equation; e.g., electrons in some of the heavier elements move at relativistic speeds.


Making ML force fields more physical is a super interesting topic that I feel blurs the line between ML and actually just doing physics. My favorite topic lately is parametrizing tight binding models with neural nets, which hopefully would lead to more transferable potentials, but also let you predict electronic properties directly since you're explicitly modeling the valence electrons.

Context for the non-mat-sci crowd - numerically solving Schrodinger essentially means constructing a large matrix that describes all the electron interactions and computing its eigenvalues (iterated to convergence because the electron interactions are interdependent on the solutions). Density functional theory (for solids) uses a Fourier expansion for each electron (these are the one-electron wave functions), so the complexity of each eigensolve is cubic in the number of valence electrons times the number of Fourier components

The tight binding approximation is cool because it uses a small spherical harmonic basis set to represent the wavefunctions in real space - you still have the cubic complexity of the eigensolve, and you can model detailed electronic behavior, but the interaction matrix you’re building is much smaller.
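For the non-mat-sci crowd, here's a toy numpy version of the kind of matrix you build and diagonalise in tight binding: a single-orbital nearest-neighbour chain. A real parametrisation has multiple orbitals per atom and distance-dependent hoppings, which is what the neural net would be predicting; the values here are arbitrary:

    import numpy as np

    def tb_chain_hamiltonian(n_sites, onsite=0.0, hopping=-1.0):
        # Single-orbital chain: onsite energies on the diagonal,
        # nearest-neighbour hoppings on the first off-diagonals
        H = np.diag(np.full(n_sites, onsite))
        H += np.diag(np.full(n_sites - 1, hopping), k=1)
        H += np.diag(np.full(n_sites - 1, hopping), k=-1)
        return H

    H = tb_chain_hamiltonian(100)
    eigvals = np.linalg.eigvalsh(H)        # the O(n^3) eigensolve discussed above
    print(eigvals.min(), eigvals.max())    # band spans roughly [-2|t|, 2|t|]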

Back to the ML variant: it's a hard problem because ultimately you're trying to predict a matrix that has the same eigenvalues as your training data, but there are tons of degeneracies that lead to loads of unphysical local minima (in my experience anyway, this is where I got stuck with it). The papers I've seen deal with it by basically only modeling deviations from an existing tight binding model, which in my opinion only kind of moves the problem upstream


I am currently working on physics-informed ML models for accelerating DFT calculations and am broadly interested in ML PDE solvers. Overall, I think physics-informed ML (not just PINNs) will be very impactful for computationally heavy science and engineering simulations. Nvidia and Ansys already have "AI" acceleration for their sims.

https://developer.nvidia.com/modulus

https://www.ansys.com/ai
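As the simplest possible flavour of this, here's a minimal PINN-style sketch in PyTorch for the ODE du/dx = -u with u(0) = 1 (real DFT-acceleration and PDE-solver models are far more elaborate; this only shows the "physics residual in the loss" idea):

    import torch

    net = torch.nn.Sequential(                   # tiny network approximating u(x)
        torch.nn.Linear(1, 32), torch.nn.Tanh(),
        torch.nn.Linear(32, 1),
    )
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)

    for step in range(2000):
        x = torch.rand(64, 1, requires_grad=True)       # collocation points in [0, 1]
        u = net(x)
        du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
        residual = du + u                               # physics loss: du/dx = -u
        bc = net(torch.zeros(1, 1)) - 1.0               # boundary condition: u(0) = 1
        loss = (residual ** 2).mean() + (bc ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(net(torch.tensor([[1.0]])).item())  # should approach exp(-1) ~= 0.37

The appeal is that the "training data" is the governing equation itself, which is why this family of methods slots so naturally into simulation-heavy engineering workflows.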


I was a grad student in an ab initio quantum chemistry group about a decade and a half ago. I was working on using DFT with correction from various post-Hartree-Fock methods for long-range correlation - it worked okay, but it was clear that it would never scale up to large non-crystalline molecules. DFT did somewhat better on solid-state systems. The scaling issue really killed my motivation to work on the field, and led me to taking a master's degree and leaving early. So it's been fascinating to hear about deep learning approaches to computational chemistry recently - almost like the revenge of the molecular mechanics models, which our group disdained a little but was also by far the most-used feature of the software package for which we wrote our codes.


> In my own work on aluminium alloys I was able to get the same simulations that would have needed hours on the supercomputer to run in seconds on a laptop.

Could you elaborate on this further? How exactly were the simulations sped up? From what I could understand, were the ML models able to effectively approximate the Schrodinger's equation for larger systems?


What you do is you compute a lot of simulations with the expensive method. Then you train using neural networks (well, any regression method you like).

Then you can use the trained method on new arbitrary structures. If you've done everything right you get good, or good enough results, but much much faster.

At a high level it's the same pipeline as in all ML. But some aspects are different; e.g., unlike image recognition, you can generate training data on the fly by running more DFT simulations.
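A toy sketch of that pipeline with scikit-learn (everything below is synthetic placeholder data; in practice the features would be atomic-environment descriptors and the targets DFT energies):

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.model_selection import train_test_split

    # Placeholder data: in practice X holds per-structure descriptors (e.g. symmetry
    # functions or spherical-harmonic features) and y the corresponding DFT energies
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 64))                            # 500 structures x 64 features
    y = X[:, :5].sum(axis=1) + 0.01 * rng.normal(size=500)    # stand-in "DFT energies"

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    surrogate = GradientBoostingRegressor().fit(X_tr, y_tr)   # the cheap trained model
    print("held-out R^2:", surrogate.score(X_te, y_te))

    # A new structure: featurise it and predict in milliseconds instead of node-hours.
    # If it looks far from the training data, run DFT on it and add the result to the
    # training set -- the "generate data on the fly" loop mentioned above.
    print("predicted energy:", surrogate.predict(rng.normal(size=(1, 64)))[0])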


That's pretty cool! It seems like most of ML is just creating a higher dimensional representation of the problem space during training and then exploring that during inference.

I suppose your process would be using ML to get pointed in the "right direction" and then confirming the models theories using the expensive method?


Yeah exactly like this. It is a subtle art of validating in small scale a method you would later use at large scale.


tbh I didn't understand most of that but it sounds exciting.


We want to do computer experiments instead of real life experiments to discover or improve chemicals and materials. The current way of doing computer experiments is really really slow and takes a lot of computers. We now have much faster ways of doing the same computer experiments by first doing it the slow way a bunch of times to train a machine learning model. Then, with the trained model, we can do the same simulations but way, way faster. Along the way there are tons of technical challenges that don't show up in LLMs or visual machine learning.

If there is anything unclear you're interested in, just let me know. In my heart I feel I'm still just a McDonald's fry cook and feel like none of this is as scary as it might seem :)


I'm just a touch disappointed that this thread is still dominated by neural-network methods, often applying architectures similar to LLMs to other domains, such as vision transformers.

I'd like to see something about other ML methods such as SVM, XGBoost, etc.


featup



