I dunno, every software product I use in the US has been getting worse over time. Even if Europe is doing nothing, at least they aren’t accelerating in the negative direction.
Of all the software companies, Microsoft really makes me feel the ick every time I have to interact with their suite. I started using again teams after months and I didn't remember it was so bad, it's like a reverse course on ux
> Which shows that "massive scaling", even enormous, gigantic scaling, doesn't improve intelligence one bit; it improves scope, maybe, or flexibility, or coverage, or something, but not "intelligence".
Do you have any data to support 1. That grok is not more intelligent than previous models (you gave one anecdotal datapoint), and 2. That it was trained on more data than other models like o1 and Claude-3.5 Sonnet?
All datapoints I have support the opposite: scaling actually increases intelligence of models. (agreed, calling this "intelligence" might be a stretch, but alternative definitions like "scope, maybe, or flexibility, or coverage, or something" seem to me like beating around the bush to avoid saying that machines have intelligence)
Check out the technical report of Llama 3 for instance, with nice figures on how scaling up model training does increase performance on intelligence tests (might as well call that intelligence): https://arxiv.org/abs/2407.21783
Many seem to fear the brutality of these methods.
But could massive institutions be reformed continuously by fixing every problem carefully à la Chesterton's Fence ? I think not: admistrations only grow like a cancer until they are such a complicated knot that no one can disentangle them.
See Europe's fossilization, accelerated by the fact that we have put an administration of bureaucrats over countries to add one layer of rules.
I'd love a DOGE in Eurppe because our administrations can only be cut brutally, like the Gordian knot. That may destroy some good things, but I'll happily pay that to get rid of the bad things.
>But could massive institutions be reformed continuously by fixing every problem carefully à la Chesterton's Fence ? I think not:
I do think so. But we need proper leaders voted in first. We've been quite horrible at that, to be frank. But I guess it represents us: a political gridlock too busy fighting each other (often o petty issues) to proper come together and identify what's actually wrong in our lives.
> See Europe's fossilization, accelerated by the fact that we have put an administration of bureaucrats over countries to add one layer of rules
you mean the institution where a bad fall doesn't put you in debt? That mandate 30 days vacation a year? that doesn't just roll over and let technocrats steal and sell your data without consent?
Yea, caring for one another is such a fossilized concept.
>I'd love a DOGE in Eurppe because our administrations can only be cut brutally, like the Gordian knot. That may destroy some good things, but I'll happily pay that to get rid of the bad things.
I'll put this in your language: you're esssentially trying to make a proposal to a business to shut down its main product while your engineering team takes time to make a new, efficient product that guarantees a 10x ROI.
Now, maybe there's a chance you're right. But there's no way in high hell any business would ever agree to that. You may be willing to pay, but you're not the only one paying. That reboot will lose shareholder value, piss off and permanently turn off customers, and may even make you vulnerable to security breaches. It makes no sense.
Meanwhile, we can simply revisit our lovely fence and start things off the right way. You can make your own module and show how efficient it is before deprecating the old one. and you repeat that until you fully optimized the legacy code. There was no shutdown, you worked incrementally to prove your changes worked, and satisfied customers along the way. That's all I'm suggesting for such an "efficiency" process here.
> Yea, caring for one another is such a fossilized concept.
Do not conflate "being efficient" that I was suggesting, and "being mean" that I never suggested. I don't think that social security is a bad concept. And anyway we'll probably all need UBI when AIs are more efficient than us at everything.
Btw, conflating these two concepts of "efficient" and "mean" is one of the roots of Europe's fall.
> You can make your own module and show how efficient it is before deprecating the old one.
Are you suggesting doing A/B testing with laws? Maybe you were trying again to "put this in [my] language", but your proposal sounds delusional. A country's not a product, you can't have two parallel versions.
Then you'll get similar parasites.
Good luck with no healthcare on upcoming natural disasters.
Good luck with the avian flu and measles.
When corpos rule everything, y'all pray.
Hi all! Aymeric (m-ric) here, maintainer of smolagents and part of the team who built this. Happy to see this interesting people here!
Few points:
- open Deep Research is not a production app, but it could easily be productionized (would need to be faster + good UX).
- As the GAIA score of 55% (not 54%, that would be lame) says, it's not far from the Deep Research score of 67%. It's also not there yet: I think the main point of progress is to improve web browsing. We're working on integrating vision models (for now we've used a text browser developed by the Microsofit autogen team, congrats to them) because it's probably the best way to really interact with webpages.
- Open Deep Research is built on smolagents, a library that we're building, for which the core is having agents that write their actions (tool calls) in code snippets instead of the unpractical JSON blobs + parsing that everyone incl OpenAI and Anthropic use for their agentic/tool-calling APIs. Don't hesitate to go try out the lib and drop issues/PRs!
- smolagents does code execution, which means "danger for your machine" if ran locally. We've railguardeed that a bit with our custom python interpreter, but it will never be 100% safe, so we're enabling remote execution with E2B and soon Docker.
> smolagents does code execution, which means "danger for your machine" if ran locally. We've railguardeed that a bit with our custom python interpreter, but it will never be 100% safe, so we're enabling remote execution with E2B and soon Docker.
Those remote interfaces may also work with local VMs for isolation.
Yeah, that's what I was thinking: just throw the whole lot inside a Docker container and call it a day. Unless you're dealing with potentially malicious code that could break out of a container, that should isolate the rest of your machine sufficiently.
Alternatively, PyPy is actually fully sandboxable.
Great work on this, Aymeric and team! In terms of improving browsing and/or data sources, do you think it might be worth integrating things like Google Scholar search capability to increase the depth of some of the research that can be done?
It's something I'd be happy to explore a bit if it's of interest.
> for now we've used a text browser developed by the Microsofit autogen team, congrats to them
oh super cool! i've usually heard it the other way - people develop LLM-friendly web scrapers. i wrote one for myself, and for others there's firecrawl and expand.ai. a full "text browser" (i guess with rendering?) run locally seems like a better solution for local agents.
I think using vision models for browsing is the wrong approach. It is the same as using OCR for scanning PDFs. The underlying text is already in digital form. So it would make more sense to establish a standard similar to meta-tags that enable the agentic web.
If you're working from the markup rather than the appearance of the page, you're probably increasing the incentives for metacrap, "invisible text spam" and similar tactics.
PDFs are more akin to SVG than to a Word document, and the text is often very far from “available”. OCR can be the only way to reconstruct the document as it appears on screen.
If anyone's familiar with Christianism they will be also familiar with Pharisians, mentioned probably mentioned more frequently in the New Testament than the old (Jesus often recused their ways)
I was raised Christian and yes, the Pharisees were not just taught about, but a subject of focus.
This makes the modern American strain of Christianity all the more puzzling to me, with how it in many ways shares more with the Pharisees than it does with the religion's namesake, but that’s a topic for a different post.
Ehh ... it's indisputable that in $CURRENTYEAR that there are a lot of people whose only experience with Christianity is "things people said on the Internet".
If many random readers won't understand a reference to "Pharisee", and people trying to make a point stop using it as a result, then even fewer Internet-educated readers will get the reference.
Hard disagree. His thoughts are so rich and varied that it's harsh to classify them under "blogs for wealthy people". He speaks about death, self worth, many other things that speak to anyone.
I myself said that de Montaigne is pretty good stuff as this sort of thing goes.
But the kind of agency attached to being quasi-Royal wealthy in the mid-sixteenth century France is not terribly useful to anyone under crushing debt peonage then, nor it’s resurgent beginning comeback now.
For truly catholic stoicism there are better sources. If I want to hear someone talk about inner will from atop a mountain I’ll go all the way back to Marcus Aurelius.
It’s good to see that Randian Objectivists are diversifying out of such a shitty brand, but it’s all boomers and their bootstraps to me, and I’ve read fucking ALL of it. Twice.
Why would you want Notre Dame to represent modern condition ?
I'm very happy it was not the way you suggest, because that would have been a high risk of getting a ugly/huge/provocative addition by a modern fame-craving architect, à la Le Corbusier or Jean Nouvel.
This is a 1000 years old church that has hosted France's history, not a company office in Manhattan.
There are many catacombs in Paris. Nobody knows how deep the system goes. They were there before Haussmann, before the Normans, before the Goths, before the Romans, perhaps even before the Gauls. Your idea of what is French, however, is not what is buried underneath in the collective unconsciousness of the city, what supports it, but instead the reproduction of what people want to believe represents the city. But the Notre Dame no longer exists, it was a symbol of the power of a Medieval state. Paris today is a city of malls, broad avenues and apartments. We kid ourselves to believe that putting it back will "continue" its history: the history is buried under the immediately visible surface, its waiting for you to get lost in it. If we could build something that might unearth that history, perhaps it might be possible to begin to remake the city of Paris once again.
Then it does not really meet its goal, if it wants to convey the idea that "the true identity is hidden underground" by referring to something that isn't the true identity.
I read somewhere that the catacombs were the result of needing cemetery land for building on.
What an effort that must have been! Convincing people that their ancestors could be extracted from the ground and stored as skeletons in caves underground. What a massive amount of land that must have freed up!
Def not what really happened, since the arrangement down there more closely resembles macabre interior decorations than any reasonable funerary arrangements. The catacombs are the result of Paris being built on top of numerous underground tunnels that have existed at least since the Roman era but were massively expanded in medieval times.
That would have cost probably even more than restoring it. It might have been extremely criticized by the public, as a funding grave, a bottomless pit swallowing a lot of funding for foreseeable future. The grandest these days is ... very grand. So grand, that there is no other building worldwide achieving it yet. We have learned a lot and would be capable of a lot, if there was a need to.
an argument could be made that this was a once in a few generations chance to do something different, or at least change/add to the existing design. another argument is that the building saw many changes from ~1180 to today, so change was actually part of the history of the building.
If there's one reason the French can't compete in the modern economy its because they are so painfully conservative in their culture. Elon Musk might be a cook with reactionary tendencies but he is a far more imaginary cook than anything the French can come up with. Did it really all end in 1848?
What ended in 1848? I don’t really get the relationship between these two unrelated things, you’re free to use whatever architectural style you want when building new houses and office buildings in France as far as I can tell.
By that standard other Europeans are even more conservative.. Germans rebuilt the Dresden Cathedral and palaces despite them being a complete ruin for decades. The Polish rebuilt Warsaw, Gdansk and I assume other cities almost entirely from scratch pretty much as replicas of their pre-war state.
reply