Yes, it's pretty cool. There was a neat observation about the pace of deep learning development that I think resonates well here.
Around 5 years ago, it took the average user some pretty significant hardware, software, and time (around a full night) to try to create a short deepfake. Now you don't need any fancy hardware, and you can get some decent results within 5 minutes on an average computer.
Virtually every announcement of a new model release has some sort of table or graph matching it up against a bunch of other models on various benchmarks, and they're always selected in such a way that the newly-released model dominates along several axes.
It turns interpreting the results into an exercise in detecting which models and benchmarks were omitted.
It would make sense, wouldn't it? Just as we've seen rising fuel efficiency, safety, dependability, etc. over the lifecycle of a particular car model.
The different teams are learning from each other and pushing boundaries; there's virtually no reason for any of the teams to release a model or product that is somehow inferior to a prior one (unless it had some secondary attribute such as requiring lower end hardware).
We're simply not seeing the ones that came up short; we don't even see the ones where it fell short of current benchmarks because they're not worth releasing to the public.
Sibling comment made a good point about benchmarks not being a great indicator of real-world quality. Every time something scores near GPT-4 on benchmarks, I try it out, and it ends up being less reliable than GPT-3 within a few minutes of usage.
That's totally fine, but benchmarks are like standardized tests such as the SAT. They measure something, and it makes sense that each release bests the prior one in the context of those benchmarks.
It may even be the case that in measuring against the benchmarks, these product teams sacrifice some real world performance (just as a student that only studies for the SAT might sacrifice some real world skills).
That's a valid theory, a priori, but if you actually follow up you'll find that the vast majority of these benchmark results don't end up matching anyone's subjective experience with the models. The churn at the top is not nearly as fast as the press releases make it out to be.
Subjective experience is not a benchmark that you can measure success against. Also, of course new models are better on some set of benchmarks. Why would someone bother releasing a "new" model that is inferior to old ones? (Aside from secondary attributes like more favorable licensing.)
This is completely normal, the opposite would be strange.
Thanks for confirming you could see this. It's the first time it has happened; previously, when I asked colleagues to check whether they could see my posts, they couldn't.
Hi, HN community. I hope that whatever project you are working on is going well.
I have been unable to post any links ever since the creation of this account.
I was told that my account may have been flagged. I considered deleting and re-creating my account, but I read here that there are better options than that.
Can somebody tell me what they think might happen and the best course of action moving forward? Thanks
I can't reply to you elsewhere, and you don't have contact details on your profile page, but I suspect you haven't been able to submit links because you are consistently posting dev.to links, and I suspect dev.to is on the "Dead On Arrival" list.
I wasn't aware Apple had partnered with Goldman Sachs in the first place. It's odd that it didn't work out, considering both are titans in their respective industries. Beyond what was provided in the article, can someone clarify why it failed? I understand the delayed payments can be frustrating, but what I can't figure out is why these gigantic firms were unable to make it work; surely they have the financial power to overcome any burden.
I'm not in the industry (obvi lol) so would appreciate any enlightenment on this. :)
Goldman Sachs is first and foremost a trading bank.
Consumer banking is not a field they are experienced in. They launched their consumer banking division only in 2016, hoping to create an additional revenue stream. It didn't grow as expected, so around 2018 they pivoted to growing the business by working with companies like Apple and GM (bluntly put: focusing not on customer service but on corporate negotiations).
So now, after 5 years, reality seems to be setting in. First for Goldman Sachs, which found it's not that simple to build a lean, scalable consumer product, and second for Apple and GM, which had hoped to outsource this complexity to an experienced player, which didn't turn out as expected...
All that excluding the fact that the Apple Card was never really a natural fit in Apple's digital-service core portfolio anyway...