I'd agree if OpenAI seemed any good at building apps. They're a frontier AI lab and not operating like a product company. A lot of great and interesting things come from that, but not refined products.
Their macOS app really sucks. For months now, some zombie process has occasionally been eating up 100% CPU, their helpful global shortcut overlay often doesn't autofocus, and their UX has never felt like it gets a lot of attention.
They lack user obsession. Altman talks a lot about going above and beyond RLHFing to get the tone just right. But it's never felt right to me.
Like any LLM benchmark, LMArena is highly flawed. I do think it has a right to exist. For me, anecdotally, it has been indicative of which LLM's style I like best, not necessarily of its factual accuracy. It hasn't, however, been a very useful tool for finding the best LLM for a given job.
To the article's point though, it's treated as the gold standard, which it isn't. We should have learned that with the sycophancy-gate.
I'm not sure the methodology here is really sound for the question at hand. It's a bit like saying, oh, prediction markets don't work because 40% of the people who voted were wrong.
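To make the analogy concrete, here is a rough sketch of how an aggregate can be much more reliable than its individual voters. It assumes every voter is independently right 60% of the time, which is a simplifying assumption of this example and not something the article or LMArena claims; the function name is made up for illustration.

```python
from math import comb


def majority_correct_prob(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n voters picks the right answer.

    Assumption (illustrative only): votes are independent and each voter
    is right with the same probability p_correct.
    """
    needed = n_voters // 2 + 1  # smallest number of correct votes that wins
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Individual voters are wrong 40% of the time, yet the majority improves
# as the (odd) number of voters grows.
for n in (1, 11, 101):
    print(n, round(majority_correct_prob(n, 0.6), 4))
```

Real votes are correlated (shared style preferences, for example), so this only shows why "40% of voters were wrong" does not by itself sink the aggregate.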
You can't really get around running your own benchmarks for the job at hand if you want 95th-percentile performance on a task.
E-commerce has a massive climate impact. That's why at Vaayu we're focused on helping retailers lower their carbon emissions. We look at all aspects of our clients' businesses and build automated, high-fidelity models to estimate their environmental impact in real time. With these models, we show our customers a clear pathway to reducing their emissions.
After raising a $1.6M pre-seed round in July this year, we recently celebrated our first anniversary. Our international team of 17 people is distributed across Europe, and we all work remotely with flexible schedules. Our clients include billion-dollar companies from the e-commerce/retail industry.
We are looking for a senior backend engineer to build services from scratch and take full ownership of their work. We value environmentally conscious and entrepreneurially minded people who always put the customer first.
Our current stack includes: Go, JavaScript, Python, Vue.js, D3.js, PostgreSQL, Redis, Neo4j, Linux, Google Cloud, Kubernetes, Docker and Terraform, but we are also open to other technologies.
If you're interested, send me an email: luca at company name dot tech.