Unlocking the power of time-series data with multimodal models (research.google)
131 points by alach11 on Dec 2, 2024 | hide | past | favorite | 26 comments


To me, this basically says "LLMs aren't pre-trained on enough 1D timeseries data" - there's a classic technique in time series analysis where you just do a wavelet or FFT on the time series and feed it into a convnet as an image, leveraging the massive pre-training on, e.g. ImageNet. This "shouldn't" be the best way to do it, since a giant network should learn a better internal representation than something static like FFT or a wavelet transform. But there's no 1D equivalent of ImageNet so it still often works better than a 1D ConvNet trained from scratch.
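The transform-then-treat-as-image trick the comment describes can be sketched in plain NumPy: frame the series, take a windowed FFT per frame, and stack the log-magnitudes into a 2D array that a pretrained convnet can consume as a single-channel image. The function name and window/hop defaults here are illustrative, not from the article.

```python
import numpy as np

def spectrogram_image(x, win=64, hop=32):
    """Turn a 1D series into a 2D log-magnitude spectrogram,
    suitable as a single-channel 'image' input for a convnet."""
    window = np.hanning(win)
    frames = [x[i:i + win] * window
              for i in range(0, len(x) - win + 1, hop)]
    mags = np.abs(np.fft.rfft(frames, axis=1))   # shape: (frames, freq bins)
    return np.log1p(mags).T                      # frequency on the vertical axis

# toy example: a chirp whose frequency rises over time
t = np.linspace(0, 1, 2048)
x = np.sin(2 * np.pi * (5 + 40 * t) * t)
img = spectrogram_image(x)
print(img.shape)  # (33, 63): 33 rfft bins, 63 frames
```

The resulting array can then be resized and replicated to three channels to feed an ImageNet-pretrained backbone.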

Same applies here. An LLM trained on tons of time series should be able to create its own internal representation that's much more effective than looking at a static plot, since plots can't represent patterns at all scales (indeed, a human plotting to explore data will zoom in, zoom out, transform the timeseries, etc.). But since LLMs don't have enough 1D timeseries pretraining, the plot-as-image technique leverages the massive amount of image pre-training.


For training MLPs to predict time-series data that's known to have sinusoidal behavior (which might lead to 'reasoning' the way scaling did in LLMs), I bet it's more efficient to first curve-fit the data into continuous data points, then convert to the frequency domain (like you said, FFT), and do all the training on frequency-domain datasets. The way the AI would "predict" (run inference) would then be by spitting out frequency-domain predictions, which have to be converted back to the time domain to get the real output.
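A toy version of that round trip, with no learned model at all: fit the series with its dominant Fourier components, "predict" by evaluating those sinusoids past the end of the data, and read the result back in the time domain. The function name and the choice of k are illustrative assumptions.

```python
import numpy as np

def fft_extrapolate(x, n_ahead, k=3):
    """Keep the k largest-magnitude Fourier components of x, then
    evaluate those sinusoids n_ahead steps past the end of the data.
    (Ignores the Nyquist-bin edge case for simplicity.)"""
    n = len(x)
    coeffs = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n)                    # cycles per sample
    keep = np.argsort(np.abs(coeffs))[::-1][:k]   # dominant bins
    t = np.arange(n + n_ahead)
    out = np.zeros(len(t))
    for i in keep:
        amp, phase = np.abs(coeffs[i]) / n, np.angle(coeffs[i])
        scale = 1 if i == 0 else 2  # non-DC bins appear twice in the full FFT
        out += scale * amp * np.cos(2 * np.pi * freqs[i] * t + phase)
    return out

x = np.sin(2 * np.pi * 4 * np.arange(128) / 128)  # pure 4-cycle sinusoid
pred = fft_extrapolate(x, n_ahead=32, k=2)        # continues the sinusoid
```

For a genuinely periodic signal this extrapolation is exact; the commenter's proposal is essentially to let a network learn which frequency-domain coefficients to emit instead of hand-picking them.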

I'm sure the audio-processing AI systems out there are doing something like this already so it would be interesting to try to leverage that stuff by sending it "audio" that's actually just arbitrary time-series data rather than PCM of sound waves.


There are some recent foundation models pre-trained on time-series data. For example, TimesFM from Google. Of course, it's not directly built for classification, and it's meant for univariate datasets, so it would take some work to adapt it to these problem domains.


It kind of feels criminal to do time-series analysis with multimodal models and not use any traditional numerical models to provide a baseline result. It's an interesting result, though.


They mention using an IMU dataset collected with an APDM Opal. https://www.apdm.com/wp-content/uploads/2015/05/Opal-Publica... This publication mentions a paper on p. 5839 (p. 13 of the PDF) where a single sensor on the waist (as used in the Google research) would lead to an F1 score of 0.77, if I did my math correctly. In other words, pretty close to a >1-shot plot analysis with GPT-4o and Gemini 1.5 Pro.

I would also be interested in how the LLMs would hold up against the free-fall interrupt that's built into some consumer-grade IMUs (the BMA253, for instance). Anyone here with experience in this use case?


I don't want to sound too dismissive of someone's hard work but I was kind of hoping for something more sophisticated than showing an LLM the image of a plot. Using the article's example, I would be interested in understanding causes (or even just correlations) of near falls - is it old people, or people who didn't take their vitamins, or people who recently had an illness, etc.? What's the best way of discovering these that isn't me slicing the data by X and looking at the plot.


The fact that you can show an LLM an image of a plot and it'll give you a good-enough-ish classification is, I think, the interesting part. It really is just prompt engineering all the way down...


Encoding an image (especially a simple x/y plot of a line) and encoding the numbers of a time series end up looking quite similar.


There is a surprisingly common use case for "quick and dirty univariate time series forecasts" that are basically equivalent to giving a small child a pencil and asking them to draw out the trendline. The now-deprecated Prophet model from Facebook (which was just some GAM) was often used for this. Auto-ARIMA models, ETS, etc. are also still really commonly used. I also see people try boosted trees, or deep learning stuff like DeepAR or N-BEATS, even though it's rarely appropriate for their 1k-datapoint univariate time series, just because it gives off the impression of serious methodological work.
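For a sense of how little machinery one of those "basic reasonable-ish" baselines needs, here is simple exponential smoothing (the core of the ETS family) in a few lines of plain Python. The function name and defaults are illustrative.

```python
def ses_forecast(y, alpha=0.3, h=5):
    """Simple exponential smoothing: each observation updates a running
    'level' with weight alpha; the forecast is flat at the final level."""
    level = y[0]
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return [level] * h

history = [12, 13, 12, 14, 13, 15, 14, 16]
print(ses_forecast(history, alpha=0.5, h=3))
```

Despite its simplicity, this kind of method is routinely competitive on short, noisy univariate business series, which is part of the commenter's point.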

There are a lot of use cases in business where what's needed is just some basic, reasonable-ish forecast. I actually think this new model is really neat because it completely dispenses with the pretense that we're doing some really serious, methodologically-backed thing, and we're really just looking for a basic curve fit that seems pretty reasonable to human intuition.


This is not curve fitting or forecasting: it's pattern matching.

It's also a serious methodological approach. A fall on a sensor graph has a certain look to it just like an abnormality on an EKG that a human can detect. You can train multimodal models to detect these too with decent accuracy. What's methodologically unsound about that? If anything, it demonstrates you don't necessarily need a class of hyper-specific models to do pattern matching.


> basically equivalent to giving a small child a pencil

This is false, both superficially and at deeper levels. It is a harmful anti-pattern to repeat this analogy.


This is really neat. I imagine this will be an entryway for LLMs to creep into more classic data science / ML workloads.


IMO if doing this, you should avoid text in the charts entirely, since titles can lead the models astray (a "clustering" title, for example, will bias the model to find clusters even if none exist). Presuming you are the one making the chart and not just prompting with another image.

I believe text in the image will be more prone to misinterpretation than direct text in the prompt anyway: https://andrewpwheeler.com/2024/07/16/using-genai-to-describ...


Kelly et al. took a similar approach to trading. The idea was that human traders looked at charts on a screen and “intuitively” made trading decisions.

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=3756587


I've also heard of converting stock data into sound, to try to listen to it as music so you can sort of intuitively use the audio part of your brain to predict where the stock market will go next. It's such an obvious idea I'm sure some large investment institutions have tried this. But I bet it failed, because music tends to lock into certain notes, and jump octaves in ways that markets definitely do not!
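The sonification idea is simple to prototype with nothing but the standard library: map each value of the series to a pitch and render the result as a WAV file. Everything here (function name `sonify`, the pitch range, the note length) is an illustrative assumption, not a reference to any real trading system.

```python
import io
import math
import struct
import wave

def sonify(series, sr=8000, note_ms=120, lo=220.0, hi=880.0):
    """Map each value of a series linearly to a pitch between lo and hi Hz
    and render the sequence as mono 16-bit PCM WAV bytes."""
    mn, mx = min(series), max(series)
    span = (mx - mn) or 1.0
    samples = []
    for v in series:
        freq = lo + (hi - lo) * (v - mn) / span
        for n in range(int(sr * note_ms / 1000)):
            samples.append(int(32767 * 0.5 * math.sin(2 * math.pi * freq * n / sr)))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 16-bit samples
        w.setframerate(sr)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))
    return buf.getvalue()

audio = sonify([101.2, 101.5, 100.9, 102.3, 101.8])  # made-up "prices"
```

As the commenter notes, the mismatch is that music perception expects discrete notes and octave structure that market data doesn't have.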


Logically, it won't be long until we all have our own micromodels instead of hedge fund managers, trained on random factors that seemingly have no relation to anything at all but correlate absurdly strongly with the market. With enough data collected and compute getting cheap enough, such a model is certainly possible. I bet those of us in the peasant class won't get to leverage it when it comes out, of course.


I agree. AI is going to be (or already is) able to not only predict markets, but also uncover and plan strategies to MANIPULATE markets as well, through both legal and illegal means.


Here's an example: https://rnsaffn.com/cot4/

That's large-scale prediction of trader positioning changes in most of the big commodity futures and options markets.


Perhaps Renaissance has been doing this all along; they just had the data and compute at a time (the '80s and '90s) when most people had not heard of these things.

In the end, one needs a small edge and thousands of low-correlation trades to take advantage of the law of large numbers (LLN).
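That LLN point is easy to demonstrate with a small Monte Carlo sketch (illustrative parameters, stdlib only): give each trade a tiny positive mean and unit noise, and watch the probability of a losing total P&L collapse as the number of independent trades grows.

```python
import random

def prob_losing(edge=0.05, vol=1.0, n_trades=100, sims=400, seed=0):
    """Fraction of simulated runs whose total P&L is negative, where each
    trade is an independent draw with small mean 'edge' and noise 'vol'."""
    rng = random.Random(seed)
    losses = 0
    for _ in range(sims):
        pnl = sum(rng.gauss(edge, vol) for _ in range(n_trades))
        losses += pnl < 0
    return losses / sims

# A 0.05-sigma edge: frequently underwater over 100 trades,
# almost never over 5000.
print(prob_losing(n_trades=100), prob_losing(n_trades=5000))
```

The total mean grows like N while the noise grows like sqrt(N), which is exactly why a small per-trade edge needs trade count to become reliable.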


This conjures up an image in my mind of a million monkeys with headphones on, smashing at Bloomberg terminals.


If you had a million monkeys commanding substantial enough funds, it doesn't matter what you model, as the market will react to your moves, which you can then anticipate and profit further from. Show them an image of a rotting banana, they all panic sell, and then the puts your orangutans bought and the calls your capuchins sold will be looking pretty.


How dare you insult monkeys. :0 lol. They have better short-term memories than humans, per a 2007 study by Tetsuro Matsuzawa and colleagues at the Primate Research Institute at Kyoto University in Japan.


Monkeys aren't (yet) interested in spending a lot of money on Brioni suits.


I'm not sure how much, if anything, of value was actually unlocked. Given the amount of paid talent and the number of people involved, surely the amount unlocked should be proportional. Was it?


Has anyone seen an example of time-series analysis via transfer learning / fine-tuning an LLM to process and predict multivariate data as XML or something? E.g.: <speed 45> <speed 46> <heading 123> <speed 47> <speed 47> ...etc.
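Serializing a multivariate stream into that kind of tagged token sequence is a one-liner; the hard part would be the fine-tuning, not the encoding. The helper name `encode_readings` is made up for illustration.

```python
def encode_readings(readings):
    """Serialize interleaved (sensor, value) readings as angle-bracket
    tokens, one '<name value>' token per observation, as in the comment."""
    return " ".join(f"<{name} {value}>" for name, value in readings)

stream = [("speed", 45), ("speed", 46), ("heading", 123), ("speed", 47)]
print(encode_readings(stream))
# → <speed 45> <speed 46> <heading 123> <speed 47>
```

One design question this raises is whether interleaving channels by timestamp (as here) or grouping by channel gives the model a better view of cross-channel dependencies.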


My thought was that image-as-input is maybe overkill.




