Well, not as strong as the best steels, but still stronger than many common steels. Even some less special bronze alloys can beat common steels in strength.
Search engines are still required for me. LLMs still get lots of very important things wrong.
Last night, I asked Claude 3.7 Sonnet to obtain historical gold prices in AUD and the ASX200 TR index values and plot the ratio of the two. It got all of the tickers wrong, so I had to google them (it then got a bunch of other stuff wrong in the code).
Also yesterday, I was preparing a brief summary of forecasting metrics/measures for a stakeholder and it incorrectly described the properties of SMAPE (easily validated by checking Wikipedia).
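For reference, a minimal sketch of SMAPE in its common 0-200% form. The comment above doesn't say which property was misdescribed, so this only illustrates two well-known ones: the metric is bounded, and despite the name it does not penalise over- and under-forecasts equally. The function name and the choice to treat a 0/0 term as zero error are assumptions for illustration.

```python
def smape(actual, forecast):
    """Symmetric MAPE in percent, using the 0-200% convention.

    Each term is |F - A| / ((|A| + |F|) / 2). A term where both values
    are zero is counted as 0 error (an assumed convention; others exist).
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    terms = []
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        terms.append(0.0 if denom == 0 else abs(f - a) / denom)
    return 100 * sum(terms) / len(terms)


# Same absolute error of 10, different scores:
over = smape([100], [110])   # forecast too high
under = smape([100], [90])   # forecast too low
```

Here `under` comes out larger than `over`, because the denominator shrinks when the forecast is low. A forecast against an actual of zero always scores the maximum of 200%.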
I constantly have issues with my direct reports writing code using LLMs. The models constantly hallucinate things for some of the SDKs we use.
Asking for a list of companies in a specific sector also gives you made-up tickers, or at best a list it found on some blog.
It was a bit more useful for questions like "Rank these stocks by exposure to the Chinese market", as you can prioritise your own research, but in the end you just have to go through the individual company filings yourself.
> Personally, when I want to get a sense of capability improvements in the future, I'm going to be looking almost exclusively at benchmarks like Claude Plays Pokemon.
Definitely interested to see how the best models from Anthropic's competitors do at this.
I always thought it was interesting that my modern CPU takes ages to plot 100,000 or so points in R or Python (ggplot2, seaborn, plotnine, etc.) and yet somehow my 486DX 50 MHz could pump out all those pixels to play Doom interactively and smoothly.
This SO thread [1] analyses how much time ggplot spends on various tasks. Not sure if a better GPU integration to produce the visual output would help speed it up significantly.
Nobody cares about optimization for relatively big datasets like a million points; maybe it's not a very popular use case. Even libraries that are able to render these datasets do it incorrectly, e.g. they skip peaks, or show black rectangles instead of the internal distribution of noisy data.
I ended up writing my own tool that's able to show millions of points and never looked back.
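One common way to avoid the "skipped peaks" problem is min/max decimation: split the series into buckets and keep the lowest and highest sample from each, instead of taking every Nth point. The sketch below is not the tool mentioned above, just an assumed illustration of the idea; `minmax_downsample` and its parameters are hypothetical names.

```python
def minmax_downsample(ys, n_buckets):
    """Reduce ys to at most 2 * n_buckets (index, value) pairs.

    Each bucket contributes its minimum and maximum sample, so an
    isolated spike survives even when the bucket holds many points.
    """
    if n_buckets <= 0:
        raise ValueError("n_buckets must be positive")
    n = len(ys)
    if n <= 2 * n_buckets:
        # Already small enough: return every point unchanged.
        return list(enumerate(ys))
    out = []
    for b in range(n_buckets):
        start = b * n // n_buckets
        end = (b + 1) * n // n_buckets
        idx = range(start, end)
        lo = min(idx, key=lambda i: ys[i])
        hi = max(idx, key=lambda i: ys[i])
        # Emit in index order; a flat bucket yields a single point.
        for i in sorted({lo, hi}):
            out.append((i, ys[i]))
    return out


# A flat signal with one spike: plain striding misses it, min/max keeps it.
signal = [0.0] * 1000
signal[537] = 9.0
strided = signal[::100]
reduced = minmax_downsample(signal, 10)
```

With roughly one bucket per horizontal pixel, the plotted envelope matches what drawing every point would show at that resolution, at a fraction of the cost. It does not address the other complaint, showing the internal distribution of noisy data, which needs something like density shading.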