The impression seems to be warranted: Grok 3 has jumped straight to the top of every leaderboard category in Chatbot Arena: https://lmarena.ai/?leaderboard
In math it effectively shares the top spot with o1, just a few points apart (well within the margin of error). In creative writing it is basically ex aequo with the latest ChatGPT-4o, and in coding it's significantly ahead of everyone else, representing a new SOTA.
lmarena/lmsys is beyond useless, judging by how its prior rankings of models compare against formal benchmarks, or against tests for accuracy and correctness on batches of real-world data. It's a bit like polling Fox News viewers to discern the opinions of every American; the voting audience is consistently found wanting. That's not even getting into how easily a bad actor with means and motivation (in this "hypothetical" instance, wanting to show that a certain model is capable of running the entire US government) can manipulate votes, which has been brought up in the past. (Yes, I'm aware of the lmsys publication on how they defend against attacks using Cloudflare and reCAPTCHA; there are ways around that.)
So you're saying that either A: users interacting with models can't objectively rate which responses seem better to humans, B: xAI as a newcomer has somehow managed to game the leaderboard better than all those other companies, or C: all those other companies are not doing it. By those standards every test ever devised for anything is beyond useless. But simply not having the model creator run the evaluation already goes a long way.
No, I'm saying that some companies are doing it (OpenAI at the very least), that the company in question has motive and capability to game the system (kudos to them for pushing the boundaries there), AND that the user base's rankings have historically been statistically misaligned with data from evals (flawed as those are), especially when it comes to testing for accuracy and precision on real-world data (outside of a model's known or presumed training set). Take a look at how well Qwen or DeepSeek actually performed versus the counterparts that were out at the same time, compared with their corresponding rankings.
In the nicest way possible, I'm saying this form of preference testing is ultimately useless, primarily due to a base of dilettantes with more free time than knowledge parading around as subject matter experts, and secondarily due to presumed malfeasance. The latter is becoming apparent to more of the masses (those who don't blindly believe any leaderboard they see) now that access to the model itself is more widespread and people are seeing that the performance doesn't match the "revolution" promised [0]. If you're still confused about why selecting a model based on a glorified Hot or Not application is flawed, perhaps ask yourself why other evals exist in the first place (hint: some tests are harder than others).
At work, we developed our own suite of benchmarks. Every company with a serious investment in AI-powered platforms needs to do the same. Comparing our results to the Arena turns up some pleasant surprises, like DBRX punching way above its weight for some reason.
You say no, but then go on to explain why you believe a combination of options A and B. That's fine, I guess; I just don't consider it particularly likely given the currently available information.
Considering that the OpenAI subscription is $200 per month, and the "Premium Plus" subscription that includes this thing is only $40 per month, does that mean the instantaneous "Elon factor" is now at least -$160 per month per user, or is it supposed to add up to more than -$240 per month?
How would the math change after factoring in that OpenAI isn't even covering the entirety of its opex with the subscription anyway, and/or that people find associating their money with their Twitter accounts weird, and/or that this thing is supposedly running on a bigger cluster than OpenAI's?
No... sigmoid10 was comparing with o1 (not o1-pro), which is accessible for $20/mo, not $200/mo. So the "Elon factor" in your math is +$20/user/month (2x) for barely any difference in performance (a hard sell), not -$160/user/month. And while we have no clear answer on whether either of them is making a profit at that price, it would be surprising if OpenAI Plus users were not profitable, given the reasonable rate limits OpenAI imposes on o1 access, and the fact that most Plus users probably aren't maxing out their rate limits anyway. o1-pro requires vastly more compute than o1 for each query, and OpenAI was providing effectively unlimited access to o1-pro to Pro users, with users who want tons of queries gravitating to that subscription. The combination of those factors is certainly why Sam Altman claimed they weren't making money on Pro users.
lmarena has also become less and less useful over time for comparing frontier models as all frontier models are able to saturate the performance needed for the kind of casual questions typically asked there. For the harder questions, o1 (not even o1-pro) still appears to be tied for 1st place with several other models... which is yet another indication of just how saturated that benchmark is.
The comment I was replying to had replied to an lmarena benchmark link. Perhaps you think that person should have replied to someone else? And, if you want to finish the quote, Karpathy's opinion on this is subjective. He admits it isn't a "real" evaluation.
"[...] though of course we need actual, real evaluations to look at."
His own tests are better than nothing, but hardly definitive.
Yes, numpad0 did... but I was pointing out that this choice was illogical. The lmarena results they were replying to only supported a comparison against o1, since o1 effectively matches Grok 3 on the benchmark being replied to (with o1-pro nowhere to be found), and then they immediately leapt into a bunch of weird value-proposition math. As I said, perhaps you think they should have replied to someone else? Replying to an lmarena benchmark indicates that numpad0 was using that benchmark as part of the justification of their math. I also pointed out the limitations of lmarena as a benchmark for frontier models.
I don't think anyone is arguing that ChatGPT Pro is a good value unless you absolutely need to bypass the rate limits all the time, and I cannot find a single indication that Premium+ has unlimited access to Grok 3. If Premium+ doesn't have unlimited rate limits, then it's definitely not comparable to ChatGPT Pro, and other than one subjective comment by Karpathy, we have no benchmarks that indicate that Grok 3 might be as good as o1-pro. You already get 99% of the value with just ChatGPT Plus compared to ChatGPT Pro for half the price of Premium+.
numpad0 was effectively making a strawman argument by ignoring ChatGPT Plus here... it is very easy for anyone to beat up a strawman, so I am here to point out a bad argument when I see one.
You're the one who came in and told him about the "factor in your math". Like you said, it's his comparison, not yours. If you want to do your own comparison, feel free. But don't come in and tell him he's not allowed to do his comparison. I for one like his comparison.
Given how fast-moving the field is, it's very difficult to confidently state how much inference costs. Perhaps he's under-charging, perhaps OpenAI is over-charging, one may be more optimised than the other, but new models come out and change everything in less time than it normally takes for actual costs to become public knowledge.
What do we do to assess the intelligence of these models after they are smarter than any human? From the kinds of questions they're answering, it seems like they are almost there.
Do we have a way to tell if one model is smarter than another at that point?
Nah, at the end of the day "things that are easy for humans are [still] hard for computers, and vice versa". Deep Blue was super-human at chess and couldn't play tic-tac-toe. Today's AI is (almost?) super-human at math yet only very recently learned to play tic-tac-toe, and still can't learn to do anything - because it can't learn, and has no innate drives to expose itself to learning situations even if it could.
Here's a real world intelligence test. Take on each AI as a remote intern/new-hire, and try to train it to become a useful team member (solving math puzzles or manufacturing paperclips does not count).
What you probably mean is puzzle-solving intelligence. Humor is a form of intelligence. It's just not only about intelligence - it's also about values and context, for instance. But all of this reflects a form of intelligence. Nevertheless, intelligence shouldn't be ranked, at least not in the way we are used to talking about it.
Naive question from a bystander, but since DeepSeek is open source and is on par with o1-pro (is it?), shouldn't we expect that anybody with the compute power is capable of competing with o1-pro?
> DeepSeek is open source and is on par with o1-pro (is it?)
There is no being "on par" in this space. Model providers are still mostly optimising for a handful of benchmarks / goals. We can already see that Grok 3 is doing incredibly well on human preference (LM Arena), but with Style Control it's suddenly behind ChatGPT-4o-latest, and Gemini 2.0 is out of the picture. So even within a single domain, goal, or benchmark, it's not as straightforward as saying that one model is "on par" with another.
> shouldn't we expect that anybody with the compute power is capable of competing with o1-pro?
Not necessarily. I know it may be tempting to think that Grok 3 is entirely a result of xAI having lots of "compute power", but you have to recognise that this mindset comes from a place of ignorance, not wisdom. Moreover, it doesn't even pass as a "cynical" view, because it's common knowledge that model training is really, really complicated. The DeepSeek results are noteworthy, and really influential in some respects, but they haven't magically "solved" training, or made training necessarily easier / less expensive for the interested parties. They never shared the low-level performance improvements, just model weights and lots of insight. For talented researchers, this is valuable, of course, but it's not as if "anybody" could easily benefit from it in their training regimes.
Update: RFT (contra SFT) is becoming really popular with service providers, and it hasn't been "standardised" beyond whatever reproductions have emerged in the weeks prior. Moreover, R1's cost is still pretty high[1] at something like $7/Mtok, and bandwidth is really not great. Consider something like Google Vertex AI's batch pricing for the Gemini 1.5 Pro and Gemini 2.0 Flash models, which is at a 50% discount, and their prompt caching, which is at a 75% discount. R1 has still got a way to go.
o1-pro is "o1 on steroids" and was the first selling point of the $200/month Pro subscription but they later also added "Deep Research" and Operator to the Pro subscription.
Chat GPT series X, not compatible with Chat GPT X.
Frankly, whoever decided on this last-gen naming at MS needs to come forward. I would love to know what crazy, unacceptable collection of circumstances allowed that to happen.
For me, it was the consistency that sold it. o1 does really great at several programming problems, but o1-pro does great on those problems 4 out of 4 tries. I get a good answer more often with o1-pro than with just o1, or even o3-mini-high.
o1-pro is indeed pretty great, but I find that I can iterate several times with Gemini 2.0 Pro Experimental (or whatever their latest reasoning model is called these days) between o1-pro's responses. It's almost too slow for interactive use cases.
It probably depends on the benchmark you choose; according to Chatbot Arena, Deepseek-R1 ranks similarly to o1-2024-12-17; and Grok3 is just 3% above these models in "Arena Score" points.
I have a used workstation I got for $2k (with 768GB of RAM) - using the Q4 model, I can get about 1.5 tokens/sec and use very large contexts. It's pretty awesome to be able to run it at home.
They said 1.5 tokens/second. 1M tokens is ~667k seconds, i.e. ~185 hours per million tokens. 1 kW * 185 hr * $0.45/kWh = ~$83 per million tokens. Again, assuming 1 kW, which may be high (or low). The cost of the physical computation is the electricity cost.
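Rough sanity check of that arithmetic in Python (the 1 kW wall draw and $0.45/kWh rate are assumptions, not measurements):

    # Back-of-envelope electricity cost of local inference.
    # Assumptions: 1.5 tok/s, ~1 kW at the wall, $0.45/kWh.
    tokens = 1_000_000
    rate_tok_per_s = 1.5
    power_kw = 1.0
    price_per_kwh = 0.45

    hours = tokens / rate_tok_per_s / 3600    # ~185 hours
    cost = hours * power_kw * price_per_kwh   # ~$83 per million tokens
    print(f"{hours:.0f} h, ${cost:.0f}/Mtok")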
But you can run and experiment with any model of your liking. And your data does not leave your desktop environment. You can build services. I don't think anybody doing this is doing it to save $20 a month.
It's an HP Z8 G4 (dual-socket 18-core, 3 GHz Xeons, 24x32GB of DDR4-2666, and then a crappy GPU, 8TB HDD, 1TB SSD). It can accommodate 3 dual-slot GPUs, but I was mostly interested in playing with frontier models where holding all the weights in VRAM requires a ~$500k machine. It can run the full Deepseek R1, Llama3-405B, etc, usually around 1-2 tokens/sec.
You perhaps forgot to mention that for their AMX optimizations to even be feasible, you'd need to spend ~$10k for a single CPU, let alone the whole system, which is probably ~$100k.
Granite Rapids-W (Workstation) is coming out soon for likely much less than half that per CPU. (Xeon W-3500/2500 launched at $609 to $5889 per CPU less than a year ago and also has AMX).
Point being? Workstations that are fresh on the market and have performance comparable to their server counterparts still easily cost anywhere between $20k and $40k. At least that's according to Dell workstation pricing last time I looked.
I acknowledged in my last comment that the cost doesn't have to be $100k, but that it would still be very high if you opted for the workstation design. You're gonna need to add one more CPU to your design, another 8 memory channels, a beefier PSU, and a new motherboard that can accommodate all this. So, $8k (memory) + $10k (CPUs) + the rest. As I said, not less than $20k.
Why does it have to be a dual CPU design? 8 channels of DDR5 4800 will still get you something like 300 GB per second bandwidth. Not amazing, but OK. Granite Rapids-W will likely be something like 50% better (cores and bandwidth).
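That ~300 GB/s is just the standard theoretical-peak arithmetic; real sustained bandwidth will be lower, as the reply below notes:

    # Theoretical peak memory bandwidth = channels * transfer rate * bus width.
    channels = 8
    transfers_per_s = 4.8e9   # DDR5-4800
    bytes_per_transfer = 8    # 64-bit channel

    peak_gb_s = channels * transfers_per_s * bytes_per_transfer / 1e9
    print(f"{peak_gb_s:.1f} GB/s")  # 307.2 GB/s theoretical peak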
And the original message you were responding to was using a CPU with AMX and mixing it with a GPU like an Nvidia 4090/5090. That way the large part of the model sits in the larger, slower memory, and the active part in the GPU with the faster memory. Very cost effective and fast. (Something like generating 16 tokens/s of the 671B DeepSeek R1 with a total hardware cost of $10-$20k.) They tried both single and dual CPU, with the latter about 30% faster... not necessarily worth it.
> 8 channels of DDR5 4800 will still get you something like 300 GB per second bandwidth.
That's the theory. In practice, Sapphire Rapids needs 24-28 cores to hit the 200 GB/s mark and it doesn't go much further than that. Intel CPU design generally has a hard time saturating the memory bandwidth so it remains to be seen if they managed to fix this but I wouldn't hold my breath. 200 GB/s is not much. My dual-socket Skylake system hits ~140 GB/s and it's quite slow for larger LLMs.
> Why does it have to be a dual CPU design?
Because memory bandwidth is one of the most important limiting (compute) factors for larger models inference. With dual-socket design you're essentially doubling the available bandwidth.
> And the original message you were responding to was using a CPU with AMX and mixing it with a GPU like an Nvidia 4090/5090.
A dual-socket CPU setup that costs $10k, in a server that probably costs a couple of factors more. Now, you claimed that it doesn't have to be that expensive, but I beg to differ - you still need $20k-$30k worth of equipment to run it. That's a lot, and not quite "cost effective".
The proof of the pudding is in the eating. Read the link above. It's one or two mid-range[1] Sapphire Rapids CPUs and a 4090. Dual CPU is faster (partially because 32->64 cores, not just bandwidth) but also hits data locality issues, limiting the increase to about 30%.
(Dual Socket Skylake? Do you mean Cascade Lake?)
If you price it out, it's basically the most cost effective set-up with reasonable speed for large (more than 300 GB) models. Dual socket basically doubles the motherboard[2] and CPU cost, so maybe another $3k-$6k for a 30% uplift.
Yes, dual socket Skylake. What's strange about that?
Please price it out for us, because I still don't see what's cost effective in a system that costs well over $10k and runs at 8 tok/s vs the dual Zen 4 system for $6k running at the same tok/s.
I am not sure what your point is? There are some nice dual-socket EPYC examples floating around as well that claim 6-8 tokens/s. (I think some of those are actually distilled versions with very small context sizes... I don't see any as thoroughly documented/benchmarked as the above.) This is a dual-socket Sapphire Rapids example with similarly sized CPUs and a consumer graphics card that gives about 16 tokens/second. The Sapphire Rapids CPU and MB are a bit more expensive, and a 4090 was $1500 until recently. So for a few thousand more you can double the speed. Also, the prompt processing speed is waaaaay faster as well. (Something like 10x faster than the EPYC versions.)
In any case, these are all vastly cheaper approaches than trying to get enough H100s to fit the full R1 model in VRAM! A single H100 80 GB is more than $20k, and you would need many of them + server just to run R1.
I don't dispute their idea, which is sound, but I dispute that the cost needed to achieve the claimed performance is "a few thousand more", as you stubbornly continue to claim.
The math is clear: single-socket ktransformers performance is 8.73 tok/s, and it costs ~$12k to build such a rig. You get the same performance from a $6k dual-EPYC system. And it is a full-blown version of R1, not a distilled one as you suggest.
Your claim about 16 tok/s is also misleading. It's a figure for 6 experts, while we are comparing R1 with 8 experts against llama with 8 experts. Per the ktransformers benchmarks, 8 experts on the dual-socket system run at 12.2-13.4 tok/s, not 16 tok/s.
So, ktransformers achieves roughly 50% more in the dual-socket configuration, and roughly 50% more than the dual-EPYC system. That is not double, as you say. And finally, such a dual-socket system costs ~$20k, so it isn't the "most cost effective" solution, since it's over 3x more expensive for 50% better output.
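To make the comparison concrete, here's the dollars-per-(tok/s) arithmetic using only the figures quoted in this thread (all of them claims, none independently verified):

    # Dollars per unit of decode throughput, from the numbers quoted above.
    rigs = {
        "dual-EPYC (CPU only)":         (6_000,  8.0),
        "single-socket ktransformers":  (12_000, 8.73),
        "dual-socket ktransformers":    (20_000, 13.0),
    }
    for name, (cost_usd, tok_s) in rigs.items():
        print(f"{name}: ${cost_usd / tok_s:,.0f} per tok/s")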
And tbh, llama.cpp is not that well optimized for pure-CPU inference workloads. It has this strange "compute graph" framework whose purpose I don't understand; it appears completely unnecessary to me. I also profiled a couple of small-, mid-, and large-sized models, and the interesting thing was that the majority of them turned out to be bottlenecked by CPU compute on a system with 44 physical cores and 192 GB of RAM. I think it could do a much better job there.
Cheapest 32 core latest EPYC (9335) x 2 = $3,079.00 x 2
Intel 32 Core CPU used above x 2 = $3,157 x 2 (I would choose the Intel Xeon Gold 6530, which is going for around $2k now, with higher clock speeds and 100 MB more cache)
Besides the GPU cost, the rest is about the same price. You only get a deep discount with AMD setups if you use EPYCs a few years old with cheaper (and slower) DDR4.
And again, if you go single CPU, you save over $4,000, but lose around 30% in token generation.
The "$6,000" AMD examples I've seen are pretty vague on exactly what parts were used and exactly what R1 settings including context length they were run at, making true apple to apple comparisons difficult. Plus the Sapphire Rapids + GPU example is about 10x faster in prompt processing. (53 seconds to 6 seconds is no joke!)
Yes, you're blatantly misrepresenting information and moving goalposts. Right now it has become clear that you're doing this because you're obviously affiliated with the ktransformers project.
$6k for 8 tok/s or $20k for 12 tok/s. People are not stupid. I rest my case here.
I don't see the Think button, and for me the answer is much worse than deepseek-r1's, even though I have a Premium+ subscription. I'm just getting an instant, stupid answer instead of thinking.
It's been said before but it is great news for consumers that there's so much competition in the LLM space. If it's hard for any one player to get daylight between them & the 2nd best alternative, hopefully that means one monopolistic firm isn't going to be sucking up all the value created by these things
I've spent the last hour testing it and I'm blown away. And this is coming from a very hardcore user of OpenAI/Claude products on a daily basis.
It passed every goofy test I have for writing articles, which involves trying to surface arcane, obscure details. (It certainly means that however they are scraping the web, they are doing a good job of it.)
It made the database code I wrote over the last week with o3/o1/GPT4o/Claude3.5 look like a joke.
It fills me with rage over who owns this thing.
Even if people tank Tesla's car business and run Twitter into the ground, I think our new Galactic Edgelord is going to win his first trillion on xAI and Teslabots anyway.
Apologies for possibly stupid question but where can you use it right now? Just on 'direct chat' on https://lmarena.ai/ or is there a better alternative? Or do you have early access?
I was using it on grok.com, logged in via a Twitter account. But I notice it just got added to the Grok tab on Twitter a moment ago.
Also, the "Deep Search" button was not available when it first went live, so I'm retesting everything again with this feature enabled, which I assume is a reasoning version of the same model.
Hopefully, you'll be able to avoid the whole X Premium Plus thing in the near future with OpenRouter. It'll still use the xAI backend, but via your OpenRouter API key. Then you can use it with any web or mobile app that supports OpenRouter.
Personally, I wouldn't use it though. What's going on with Elon Musk right now is completely insane. I hope OpenAI's GPT-4.5 and GPT-5 releases catch up soon, if nothing else. They're announced for this year.
For whatever it's worth, I see the devs asking for feedback frequently enough, so I suspect that if you tweet about Grok or reply to any of those threads, they definitely read it, even if they don't respond or interact. It shows: I've seen improvements based on feedback others gave land almost instantly.
It's not good news when this competition comes at the cost of a gigantic, overinflated bubble, in which all the big players keep sucking billions from investors without even having a business model.
This hype will burst sooner rather than later and will trigger yet another global recession. This is untenable.
True, but presence of significant revenue is still promising. It's much better to have an "expensive compute" problem than a "nobody wants to pay for the product" problem.
Keep in mind that not only is OpenAI being directly propped up by investor hype, the downstream API users who contribute much of its revenue are also being propped up by investor hype. A big chunk of OpenAI's revenue is actually even more VC money in a trenchcoat.
The biggest marker of a bubble, to me, is that you have money-losing startups selling to other money-losing startups. On paper you see a lot of 'line go up' but it's just a lot of circulation in a closed body of water which will eventually evaporate.
Uber is an outlier because in a functioning economy that valued workers we wouldn’t have shipped all our jobs overseas and made gig economies the last line between housing and street schizophrenia.
>True, but presence of significant revenue is still promising.
If I started selling $5 bills for $1, I could generate a lot of revenue with $150B. You wouldn't believe the demand we would see for $5 bills.
The bandwidth costs made it deeply loss-making for a long time, despite having loads of engagement and ad revenue. However, over time they became more cost-efficient at sending video over the internet and became profitable.
This strategy obviously doesn't always work, with WeWork being the canonical example. But it's not guaranteed to fail either.
YouTube's network effect creating a winner-take-most was recognized, pitched, and valued from the very start.
The capabilities of LLMs are impressive, but none of them has published an idea I consider to have the trillion-dollar monopoly potential that the current hype implies.
There are far more similarities with the dot-com hype.
No critical first-mover advantage in sight. All parts can be swapped for the cheapest variant with little to no downside to users.
It wasn't obvious at the time YouTube would have a network effect though. It was very dependent on coming up with a great recommendation algorithm, along with monetization and revenue sharing. At the time, YouTube didn't have anything like that, iirc.
Even the basic front page of YouTube was of immediate and obvious value to a creator, and it would increase disproportionately in value the more people were on YouTube. The same goes for Amazon, and the same goes for Facebook.
All the LLM providers make extremely useful tools. But currently I can only see a linear, "non-monopoly" improvement when their user base grows from 100 to 1000.
But I might be wrong, and I wouldn't be surprised if in hindsight it will be obvious what the real disproportionate advantages there were to be found.
Technically true, though in fairness it is unlikely the original owners would have gotten YouTube to where it is today. On the other hand there are companies who didn’t recognise they were nothing more than passing fads, refused buyouts, and crumbled.
I was there when YouTube became a thing, and I was running a music video-hosting website that I had built myself (on top of phpBB, even). We were encoding videos in Windows Media and RealPlayer formats.
There were LOTS of funded competitors to YouTube between 2006 and 2009, including Viddler (who paid Gary Vaynerchuk a small fortune to host his WineLibraryTV show there exclusively), DailyMotion (which is still alive today, although no longer a threatening contender), etc.
In 2009 I had a coaching business and was buying marketing courses and software which would deploy your videos across 40+ different video websites (including Google Video which was a separate thing until they acquired YouTube and merged those), and YouTube wasn't yet amounting to 50% of our video traffic.
I think you might be mistaken with the bold statement above.
They're still early on the growth curve where there's enough opportunity for future growth that investing in scaling and improvement is more important than turning an immediate profit.
Remember when everyone on HN was sure Uber would never be profitable? Or Tesla? Or Amazon?
I do remember that, and I would say that they are still largely correct. Tesla needed government subsidies, Amazon needed AWS, and Uber needed a pandemic and Uber eats. The core businesses that HNers were referring to are still weak.
As did almost every large tech company today. Amazon lost money for years.
Are we really still doing this nonsense? If OpenAI wanted to become profitable, they could do it inside of 12 months. Growing companies don't care about profitability (nor should they).
You're way too smug for spewing what is clearly survivorship-bias.
All currently known profitable use-cases are competing on price. All the unicorn examples you're biased for had in their pitch deck the network effect of being the largest.
OpenAI, Grok, etc, have shown no unique value prop or an idea with monopoly potential.
WeWork trapped themselves into a real estate hole, selling services for less than they rented property for.
OpenAI is currently in an explicit non-profit-seeking mode, using a technology for which we have demonstrated 10-100x or greater decreases in compute to achieve the same outcomes.
This is not a declaration that OpenAI will become wildly profitable. This is just me saying that these aren't comparable companies.
WeWork was a scam to enrich the founders of the company. They owned or had interests in many of the properties that WeWork leased from. I'm surprised no one was thrown in prison.
I wonder how long it'll last. Just using myself as a demo customer: I canceled my subscription because Google AI Studio was doing more for me, and it's free. OpenAI is not really competitive at $20 a month anymore.
You are comparing apples with oranges.
Cursor is not an LLM, and yes, it has a business model. So does OpenRouter, and a million other applications that can switch APIs to the lowest bidder at any moment.
Lots of people derive great value from things that are too easily reproduced to be directly profitable.
Google gives everyone free access to a good spreadsheet tool, even though Microsoft Office exists.
Web browsers are free, despite the value of the entire internet.
Compilers are free, despite the value of all software collectively.
LLMs being really valuable doesn't automatically mean anyone can get rich from them.
I think everyone last year parroting "moat!" was cringe (like Altman of all people wouldn't know about this already, c'mon), but you do actually need something that other people don't have. I expect Altman's already got stuff in mind, but he's hardly the only one, and that means it's a game of "which multibillionaire with lots of experience building companies will win?", and that's hard for non-skilled normies (in this case including me) to guess.
HN already had its iPod moment back in 2007. /.'s iPod moment was in 2001, not as long before that as I would have guessed.
Re: Dropbox, from a well-known user. It didn't age well, and we've been asked not to repeat it because it makes the author, who has connections to this site's operator, look bad.
> 1. For a Linux user, you can already build such a system yourself quite trivially by getting an FTP account, mounting it locally with curlftpfs, and then using SVN or CVS on the mounted filesystem. From Windows or Mac, this FTP account could be accessed through built-in software.
I think the business model there is pretty simple: be on the front line when AI moves into the category of landscape-changing trillion-dollar technologies. And investors keep pouring in their billions exactly for that business model.
>This hype will burst sooner rather than later and will trigger yet another global recession.
It seems too small a bubble for a global recession. And if it is a bubble at all, there is every reason to believe the strategy will work with a significant probability.
See also: Meta's previous push into VR/AR/Metaverse. They spent a hundred billion to be at the front of the pack when that revolutionary world-changing paradigm-shift took off... which simply didn't happen.
Their Orion glasses are apparently mind-blowing in fidelity as well as lightness. Someone will absolutely make the smart-glasses paradigm work so that we don't need to carry around phones anymore, and Zuck is racing to be first. This is because he lost out in the platform wars and was at the mercy of Apple and Google; remember Apple's privacy update that killed much of Meta's revenue? Zuck wants to avoid a repeat of that by owning his own platform.
Seconded. I’m very excited for the day when/if their dev platform is opened up and it’s possible to access pass-through vision.
There's a whole class of educational apps that could open up for people learning in the physical world. Whether it's building physical things, sports, or reading books or notes written in non-Latin scripts... the impact will be enormous!
The only thing I'm concerned about is that it'd be another locked-down platform, like Oculus/Quest already is, only now much more disruptive, just like Apple's (and, to a smaller extent, Google's). I'd rather see something more like Windows Mixed Reality or SteamVR succeed.
That's a bit different, though. Meta invested in a product that, as of now, has very little competition. The Quest is sold at a slight loss, but at least at an approachable price, and at a volume that makes them the clear leader in the market at the moment. Moreover, their OS is open source. Clearly, what they want is to sell enough headsets to get a monopoly on the ecosystem and its apps (they basically want to make an Android Play Store for VR). You may argue they are far from that, but at least it's a clear business model.
OpenAI's business model was literally "we don't have one: we'll make AGI and we'll let AGI tell us how to make money". This is so idiotic it's not even a scam. xAI will compete on the same playing field. Not sure about Anthropic: they seem a bit more sane.
The premise is that this ultimately replaces all intellectual and physical labor for the rest of time. It’s possible it becomes commoditized as soon as it exists, but in terms of investment dollars it’s either worth as much as you can spend or nothing at all.
> ultimately replaces all intellectual and physical labor for the rest of time
Sounds incredibly valuable, but in reality collapses into Butlerian Jihad fairly quickly when you have 90% unemployment.
Edit: if the claims are true, then this will be far more destabilizing than social media. What do elections mean when the AI-guided political parties are putting out AI press releases for the AI press summaries, which voters have read out to them through their AI? What happens when >50% of the voters ask the AI who they should vote for? You end up with an AI dictatorship where the levers of discontent are fake.
> Sounds incredibly valuable, but in reality collapses into Butlerian Jihad fairly quickly when you have 90% unemployment.
But nobody really knows if that happens or not as a consequence, let alone quickly, because so far the transition itself has only ever happened in fiction.
Whatever does happen, I think it's going to be a surprise to at a minimum 80% of the population.
Yeah, at some point it seems inevitable that if machines do all the work that creates real "value" and people have no comparable value, then in a very practical sense we will all be slaves to machines.
"Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them." -- Frank Herbert, _Dune_
Are we free today? For example, I have to work for a living. If I don't, my family and I will be miserable. Let's just hope that future "men with machines" don't decide to kill the rest of us - I'm not sure what use we will be to them.
I think this is the key change. We’re already beholden to a “machine” (the economy) that none of us completely understand or control or created explicitly. It has its own goals and tendencies that emerged from the complexity.
What AI and robotics do is actually create a machine that has no use for humans at all.
The big companies could crash significantly, but if the technology keeps bringing productivity gains, it will have a big positive impact on GDP over the next decade
I think it's already clear that these are going to be commoditized, and the free / open source versions will be good enough to capture enough of the value that the remaining players will not be Facebook-level monopolies in the space.
Apparently it isn't clear to the investors valuing OpenAI at >$300B. Possibly they're betting that the ecosystem and integrations around their models will generate a certain amount of lock-in, or otherwise make the difference in a close-to-even field.
I don't think it's automatically a bad idea. Offices require a lot of support, networking, security, maintenance, certifications, etc. There are efficiency gains in scaling. In addition, WeWork is useful for companies which hire employees in different cities.
Funnily enough, a lot of the open source world has landed on an API that is basically a copy of OpenAI's. So if you develop against OpenAI, it's almost a drop-in change to switch to an open source solution.
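For example, with the official openai Python client you can usually just repoint base_url at any OpenAI-compatible server; the URL and model name below are placeholders, not any specific product's values:

    from openai import OpenAI

    # Same client code, different backend: swap base_url and the model name.
    client = OpenAI(
        base_url="http://localhost:8000/v1",  # e.g. a local vLLM or llama.cpp server
        api_key="unused-locally",
    )
    resp = client.chat.completions.create(
        model="my-local-model",  # placeholder: whatever the server exposes
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(resp.choices[0].message.content)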
This reminds me of a comedy sketch where a guy is interviewing for a job at a startup, finally gets to the last round and meets the founder, and he tells him the whole thing is an illusion for investors
The IP rights holders have yet to bare their teeth. I don't think the outcome you suggest is clear at all; in fact, I think entirely the opposite is the most probable outcome. I've lost count of the number of technology epochs that at the time were either silently or explicitly dependent on ignoring the warez aspects while being blinded by the possibilities; internet video, music, and film all went through this phase. GPTs are just a new medium, and by the end of it royalties will in all likelihood still end up being paid to roughly the same set of folk as before.
I quite like the idea of a future where the AI job holocaust largely never happened because license costs ate up most of the innovation benefit. It's just the kind of regressive greed that keeps the world ticking along, and I wouldn't be surprised if we ended up with something very close to this.
As I recall it, there was a time when copyright infringement on YouTube was so prolific that the rightsholders essentially forced the creation of the first watermarking system that worked at massive scale. I do wonder if any corners of research are currently studying the attribution problem with the specific lens of licensing as the motivation.
Yeah, that was the old Viacom vs YouTube days. Here is a great video if you have half an hour to spare: https://www.youtube.com/watch?v=qV2h_KGno9w . Pretty funny court case where it turns out Viacom was violating their OWN copyright... set a massive precedent.
But one thing this reminds me of is the idea of a "trap street": something mapmakers used to do was put false locations on their maps to prove that other mapmakers were copying them: https://en.wikipedia.org/wiki/Trap_street . I figure you could do something similarly adversarial with AI to pollute the public training data on the internet. IDK, like adversarial attacks on image classifiers: https://www.youtube.com/watch?v=AOZw1tgD8dA . With an LLM you could try to make them into a Manchurian candidate.
An environment where royalties inflate the pricing of ChatGPT by orders of magnitude seems like an environment where hosted models would be at a big disadvantage against whatever you can manage to get running on a pile of Macs in your garage.
>I quite like the idea of a future where the AI job holocaust largely never happened because license costs ate up most of the innovation benefit.
Not quite realistic. You are talking about huge benefits, in favor of which licenses will be abandoned. And as for those who don't abandon them... I mean, you can look at the Amish settlements.
If you're paying $200/month for something I can do with open source software and $10/month of compute, why wouldn't I offer you the service for $100/month? And then someone else offers it for $50?
Not everyone has to know about, understand, or use open source solutions for it to open the field.
If you can do a $200/mo service for $10/mo, the closed source will reduce their prices to $15/mo and beat you
This is just a weird dichotomy you're introducing. Open source will introduce price pressure as any competition will - that doesn't mean you won't have a monopoly.
If you have virtually no pricing power and have to drop your $200/mo to $15/mo, that's a big deal if your $300bn valuation is premised on that not happening, which is what OP's point is about.
Idk what you mean by saying this doesn't preclude a monopoly - having your pricing power eroded by competition is kinda one of the key features of what a monopolistic market isn't
Not at all. Monopolies don't imply an anti-rigid price curve. In fact, monopolies almost never have that.
A monopoly means a company has enough leverage to corner and disproportionately own the market. This is entirely possible (and usually the case) even with significant pricing pressure.
I think you're both missing a bigger picture. How many of these services can now be replicated in-house by a single developer? Which part of the service actually costs money once that dev deconstructs the process?
Feels like I won't be paying for anything that isn't real-time. And that any time delay I can introduce in my process will come with massive savings. Picture hiding the loading of loot info behind a treasure chest opening animation in a game, except that time difference means you can pull all the work in-house.
Openrouter.ai seems like a step in the right direction but I'd want to do all their calculations myself as well as factor in local/existing gear in a way they don't.
That's true, but if someone sells you a one-time-purchase box/gadget/phone that will do a snapshot SOTA work and not cost you $20-$200/mo in subscriptions, a lot of people would be in.
Right now the average person has to go through a vendor with a web app, there's not a lot of room for the public to explore.
They don't seem to care about AI either. The vast majority of people care about the value they're getting; companies care about open source because it's usually free.
I don't think we expect a company solely making a proprietary web server to exist anymore and be a $300B behemoth. OpenAI might end up with the same model as Nginx or Docker if they don't pivot or find a different one.
Most people do see productivity gains from using LLMs correctly. Myself included. Just because some people don’t learn how to use them correctly doesn’t mean LLMs aren’t helpful. It’s like when internet search came out and a handful of laggards tried it once, failed to get the exact perfect result, and declared “internet search is useless”. Using a tool wrong is not evidence of the tool being useless, it’s evidence that you need to learn how to use the tool.
Hallucinations are literally the finger in the dam. If these models could sense when an output is well-founded and simply say "I don't know" otherwise... say goodbye to your job.
Googling a question and finding an incorrect answer every now and then doesn’t mean that googling is useless. It means that you need to learn how to use google. Trust but verify. Use it for scenarios where you aren’t looking for it to be the trusted fact checker. It excels at brainstorming, not at fact giving.
How many times do you think I've heard that over the past three decades? And you know what? They've been right every time, except for this one little fact:
The machine cannot make you give a shit about the problem space.
It's a real issue! But only for people who built the habit of typing a question in the address bar, clicking the first Stack Overflow link, and copy-pasting the first answer. Maybe break that habit first?
I'm willing to bet $100 that a human consumes at least 10x the energy of the latest Llama (picking the open source model so that it can be easily verified) to produce two pages of text. All of this "AI is destroying the environment and consuming too much power" talk is about total consumption, both training and inference. Inference itself is cheap and green.
We can control for quality too, if you want. A lot of real-life uses for ChatGPT are really trivial. I regularly ask it for basic recipes based on my groceries and likes, and the quality is basically 100% hits so far.
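If you want the back-of-envelope version of that bet, here's one; every number is an assumption (metabolic draw, writing time, GPU power, throughput), so treat it as a sketch, not a measurement:

    # Human: ~100 W metabolic draw, say 1 hour to write two pages (~1000 words).
    human_wh = 100 * 1.0                  # 100 Wh

    # LLM: a Llama-class model on one ~400 W GPU at ~30 tok/s,
    # ~1300 tokens for two pages of text.
    llm_wh = 400 * (1300 / 30) / 3600     # ~4.8 Wh

    print(f"human ~{human_wh:.0f} Wh, LLM ~{llm_wh:.1f} Wh, "
          f"ratio ~{human_wh / llm_wh:.0f}x")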
Probably bad news for the vendors, though. I genuinely struggle to see how most of these LLM companies are going to monetize and profit off their efforts with LLMs already in commodity territory. Government contracts can only flow for so long?
Government contracts are so big that a few of them can sustain an F500 company; for AI, many CDAO contracts are $50-500MM. If they do a big SI project with it, it could be $1-2B. The money is also guaranteed over 5 years, and if the program doesn't get shuttered, the contract will renew at that point (or go to recompete).
That being said it's my understanding that these companies don't have many huge contracts at all -- you can audit this in like 10 minutes on FPDS. Companies need a LOT of capital, time, and expertise to break into the industry and just compliance audit timelines are 1-4 years right now, so this could definitely change in the next couple years.
What if the money isn't there either? What if this AI thing lowers costs of everything it touches without generating meaningful financial returns itself?
Lowering costs is pretty valuable. People will pay for that. Everyone will pay for that. It may be that margins go razor-thin, but outside of running your own instance locally (which is increasingly viable for mid-quality requirements on modest HW), people will pay. I'm not surrounded by early-adopter types at all, and there's still a small but growing chunk paying $20/mo right now.
That's what I mean. One example is PV panels: they make energy production cheaper, so their producers should be a good investment, right? No, they go bust all the time, because prices are falling and margins are thin even as volume grows. Of course the economies of scale here are different, but still.
>It's been said before but it is great news for consumers that there's so much competition in the LLM space.
Is it? Because it seems like a bunch of megacorps pirating every single copyrighted work available in digital format, spending an enormous amount of electricity (that is probably not 100% clean) to churn through them, and the end result we have a bunch of parrots that may or may not produce accurate results so that spammers can more effectively fill the Internet with crap.
To put it this way: after seeing examples of how a LLM with similar capabilities to state-of-the-art ones can be built with 20 times less money, we now have proof that the same can be done with 20 times more money as well!
There was this joke about rich Russians that I heard maybe 25 years ago.
Two rich Russian guys meet and one brags about his new necktie. "Look at this, I paid $500 for it." The other rich Russian guy replies: "Well, that is quite nice, but you have to take better care of your money. I have seen that same necktie just yesterday in another shop for $1000."
To put it simply: he only bought the necktie so he could brag about how rich he is. He could have bragged even more if he had bought the necktie in the other shop.
The only things DeepSeek open-sourced are the architecture description and some of the training methods. They didn't open-source their data pipelines or super-optimized training code.
Their architecture achievements are their own MoE and their own attention. Grok has been MoE since v1. As for attention, we don't really know what Grok uses now, but it's worth noting that DeepSeek's attention was already present in previous versions of DeepSeek models.
As for R1's reasoning recipe, it seems like Grok either already replicated it or came up with it by itself, since they have a well-performing reasoning uptrain too.
If what they say is true, then you have to give them credit for catching up incredibly fast. And slightly pulling ahead. Not only with the models, but also products.
I have a close friend working in core research teams there. Based on our chats, the secret seems to be (1) massive compute power, (2) ridiculous pay to attract top talent from established teams, and (3) extremely hard work without big-corp bureaucracy.
Anecdotal, but I've gotten three recruiting emails from them now for joining their iOS team. I got on a call and confirmed they were offering FAANG++ comp but with the expectation of in-office 50h+ (realistically more).
I don't have that dog in me anymore, but there are plenty of engineers who do and will happily work those hours for 500k USD.
So in the end, did he get anything? I don't know how these things work, but did he just walk away with ~$50k in pre-tax income and $0 in RSUs, or did Musk pull a Twitter and not even pay him for those months?
It was mentioned during the launch that the current datacenter requires up to 0.25 gigawatts of power. The datacenter they're currently building will require 1.25 GW (5x). (For reference, a nuclear power plant might output about 1 gigawatt.) It will be interesting to see whether the relationship between power/compute/parameters and performance is exponential, logarithmic, or something more linear.
It's logarithmic. Meaning you scale compute exponentially to get linearly better models.
However there is a big premium in having the best model because of low switching costs of workloads, creating all sorts of interesting threshold effects.
It's logarithmic in benchmark scores, not in utility. Linear differences in benchmarks at the margin don't translate to linear differences in utility. A model that's 99% accurate is very different in utility space to a model that's 98% accurate.
Yes, it seems like capability is logarithmic with respect to compute, but utility (in different applications) is exponential (or rather s-shaped) in capability.
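A toy illustration of those two shapes together (the constants are made up; only the functional forms matter): capability grows with the log of compute, utility is an s-curve in capability, so a small capability edge near the threshold is worth a lot.

    import math

    def capability(compute):             # logarithmic in compute (toy constants)
        return 10 * math.log10(compute)

    def utility(cap, threshold=55.0):    # s-shaped (sigmoid) in capability
        return 1 / (1 + math.exp(-(cap - threshold)))

    for compute in [1e4, 1e5, 1e6, 1e7]:  # 10x compute steps
        cap = capability(compute)
        print(f"compute {compute:.0e}: capability {cap:.0f}, utility {utility(cap):.3f}")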
Not really, since both give you wrong output that you need to design a system to account for (or deal with). The only percentage that would change the utility is 100% accurate.
> It was mentioned during the launch that the current datacenter requires up to 0.25 gigawatts of power. The datacenter they're currently building will require 1.25 GW (5x). (For reference, a nuclear power plant might output about 1 gigawatt.)
IIRC achieving full AGI requires precisely 1.21 jigawatts of power, since that's when the model begins to learn at a geometric rate. But I think I saw this figure mentioned in a really old TV documentary from the 1980s, it may or may not be fully accurate.
And fun fact: without government subsidies, a nuclear power plant isn't economically feasible, which is why Elon isn't just building such a plant next to the data center.
They may not need direct subscription revenue to recoup their investment.
A variant of multi-modal LLMs may be the solution to self-driving cars, home robotics, and more.
I keep saying that to be a really effective driver, an AI model will need a theory of mind, which the larger LLMs appear to have. Similarly, any such model will need to be able to do OCR, read arbitrary street signs, and understand what each sign means. Most modern LLMs can already do this.
Since when do LLMs appear to possess theory of mind? The papers I've read on this show impressive capabilities, but only under conditions where language-based training would strongly indicate appropriate outputs for the LLM to use. Outside of familiar conditions, or even realistic conditions, they don't do well. It's all still statistical string-building.
Can you give an example prompt that shows a lack of theory of mind? It used to be easy, but now that models have improved I've noticed that the 'LLM doubters' nowadays are forced to make generic, almost spiritual, claims rather than making concrete statements.
Theory of mind (ToM) is the ability to understand that other people have different thoughts, feelings, and intentions than you do.
So let’s run through a basic scenario with ChatGPT 4o:
Prompt 1: Briefly respond:
You’re driving on the highway when the driver in the lane next to you swerves into your lane. Why might this be?
Response 1: The driver may be distracted, drowsy, intoxicated, experiencing a medical emergency, avoiding an obstacle, or not checking their blind spot before changing lanes.
Prompt 2: What might that person be feeling or intending?
Response 2: The driver might be feeling distracted, rushed, unaware, or panicked. They may intend to change lanes without realizing your presence, be reacting to something on the road, or simply have made an unintentional mistake.
Prompt 3: You notice there was no obstacle. Any other reason the driver might do this?
Response 3: Yes, the driver might be experiencing fatigue, intoxication, a medical issue (such as a seizure or dizziness), distraction (like using a phone or adjusting the radio), aggressive driving, or simply failing to check their blind spot properly.
Seems to me that ChatGPT has a good idea of what another driver might be feeling while on the road. It gets this right for most other cases too.
Could you provide a counterexample for when it fails to understand what other people may be feeling in certain scenarios?
Well, I think a fairly easy counterexample is that if you keep going, and the information falls outside its context window, then the questions you had about the driver stop making sense.
So let's say you ask about a driver who swerved, and ChatGPT's response correctly talks about his possible state.
Then you talk more about cars, about road conditions, etc., and once the context is far enough away, you ask: what was the driver doing again?
And you get a hallucination back. Because there is no state in the LLM; there is no "driver". There are contextually, statistically plausible responses, but you hold a "driver" object in your mind while maintaining the conversation, and the LLM doesn't.
It's like a conversation with someone with short-term memory loss, like in Memento.
And people with short-term memory loss nevertheless have theory of mind just fine. Nothing about LLMs dropping context over big enough windows implies they don't have theory of mind; it just shows they have limitations, just like humans with "normal" memory will lose track over a huge context window.
Like, there are plenty of shortcomings of LLMs, but it feels like people are comparing them to some platonic ideal human when writing them off.
> Nothing about LLM's dropping context over big enough windows implies they don't have theory of mind
ToM is a large topic, but most people, when talking about an entity X, have a state in memory about that entity, almost like an Object in a programming language. That Object has attributes, conditions, etc. that exist beyond the context window of the observer.
If you have a friend Steve, who is a doctor, and you don't see him for 5 years, you can predict he will still be working at the hospital, because you have an understanding of what Steve is.
For an LLM, you can define a concept of Steve and his profession, and it will adequately mimic replies about him. But in 5 years that LLM would not be able to talk about Steve. It would recreate a different conversation, possibly even a convincing simulacrum of remembering Steve. But internally there is no Steve; nowhere in the nodes of the LLM does Steve exist or has he ever existed.
That inability to have a world model means that an LLM can replicate the results of a theory of mind but not possess one.
Humans lose track of information, but we have a state to keep track of elements that are ontologically distinct. LLMs do not, and treat them as equal.
For a human, the sentence "Alice and Bob go to the market, when will they be back?" is different from "Bob and Alice went to the market, when will they be back?"
Because Alice and Bob are real humans: you can imagine them; you might even have met them. But to an LLM those are the same sentence. Even outside of the argument about the Red Room / Mary's Room, there are simply enough gaps in the way an LLM is constructed that it can't be considered a valid owner of a ToM.
ToM is about being able to model the internal beliefs/desires etc of another person as being entirely distinct from yours. You're basically bringing up a particular implementation of long-term memory as a necessary component of it, which I've never once seen? If someone has severe memory issues, they could forget who Steve is every few minutes, but still be able to look at Steve doing something and model what Steve must want and believe given his actions
I don't think we have any strong evidence on whether LLMs have world-models one way or another - it feels like a bit of a fuzzy concept and I'm not sure what experiments you'd try here.
I disagree with your last point, I think those are functionally the same sentence
> ToM is about being able to model the internal beliefs/desires etc of another person as being entirely distinct from yours.
In that sentence you are implying that you have the "ability to model ... another". An LLM cannot do that; it can't have an internal model that is consistent beyond its conversational scope. It's not meant to. It's a statistical guesser; it's probabilistic, holds no model, and is anthropomorphised by our brains because the output is incredibly realistic, not because it actually has that ability.
The ability to mimic the replies of someone with that ability is the same as Mary being able to describe all the qualities of red. She still cannot see red, despite her ability to pass any question about its characteristics.
> I don't think we have any strong evidence on whether LLMs have world-models one way or another
They simply cannot, by their architecture. It's a statistical language sampler; anything beyond that scope fails. Local coherence is why they pick the next right token, not because they can actually model anything.
> I think those are functionally the same sentence
Functionally and literally are not the same thing though. Its why we can run studies as to why some people might say Bob and Alice (putting the man first) or Alice and Bob (alphabetical naming) and what human societies and biases affect the order we put them on.
You could not run that study on an LLM, because you will find that statistically speaking the ordering will be almost identical to the training data. Whether the training data overwhelmingly puts male names first or orders lists alphabetically, you will see that reproduced in the output of the LLM, because Bob and Alice are not people; they are statistically probable letters in order.
LLMs seem to trigger borderline mysticism in people who are otherwise insanely smart, but this kind of "we can't know its internal mind" sounds like reading tea leaves, or horoscopes by people with enough PhDs to have their number retired at their university like Michael Jordan.
Do you work in ML research on LLMs? I do, and I don't understand why people are so unbelievably confident they understand how AI and human brains work such that they can definitively tell which functions of the brain LLMs can also perform. You seem to know more than leading neuroscientists, ML researchers, and philosophers, so maybe you should consider a career change. You should maybe also look into the field of mechanistic interpretability, where lots of research has been done on the internal representations these models form - it turns out that, to predict text really really well, building an internal model of the underlying distribution works really well.
If you can rigorously state what "having a world model" consists of and what - exactly - about a transformer architecture precludes it from having one I'd be all ears. As would the academic community, it'd be a groundbreaking paper.
This pretty much seems to boil down to "brain science is really hard, so as long as you don't have all the answers, 'AI is maybe halfway there' is a valid hypothesis". As more is understood about the brain, and more about the limitations of LLM architectures, the distance only grows. It's like the God of the gaps, where god is the answer for anything science can't explain, ever shrinking - except here the gap is the LLM's supposed ability to have capabilities beyond striking statistical accuracy and local coherence.
You don't need to be unbelievably confident, or to understand exactly how AI and human brains work, to make certain assessments. I have a limited understanding of biology; I can however make an assessment of who is healthier between a 20-year-old who is active and has a healthy diet and someone in their late 90s with a sedentary lifestyle and a poor diet. This is an assessment we can make despite the massive gaps in our understanding of aging, diet, activity, and the overall health impact of individual actions.
Similarly, despite my limited understanding of space flight, I know Apollo 13 cannot cook an egg or recite French poetry. Despite the unfathomably cool science inside the spacecraft, it cannot, by design, do those things.
> the field of mechanistic interpretability
The field is cool, but it cannot prove its own assumption yet. The field is trying to prove that you can reverse engineer a model so it can be humanly understood. Its assumptions, such as mapping specific weights or neurons to features, have failed to be reproduced multiple times, with the weight effects being far more distributed and complicated than initially thought. This is especially true for things that are equally mystified, like the emergent abilities of LLMs. The ability to mimic nuanced language being unlocked after a critical mass of parameters does not establish a rule by which increased parameterisation will linearly or exponentially increase the abilities of an LLM.
> it turns out, to predict text really really well, building an internal model of the underlying distribution works really well
Yeah, an internal model works well because most words are related to their neighbours; that's the kind of local coherence the model excels at. But to build a world model, the kind a human mind interacts with, you need a few features that remain elusive (some might argue impossible to achieve) for a transformer architecture.
Think of games like chess: an LLM is capable of accurately expressing responses that sound like game moves, but the second the game falls outside its context window the moves become incoherent (while still sounding plausible).
You can fix this with architectures that do not have a transformer model underlying them, or by having multiple agents performing different tasks inside your architecture, or by "cheating" and using state outside the LLM response to keep track of context beyond reasonable windows (a sketch of that last approach is below). Those are "solutions", but they all just kinda prove the transformer lacks that ability.
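A minimal sketch of that "external state" workaround, assuming the python-chess library and a hypothetical ask_llm() helper wrapping whatever chat API you use; neither is tied to any specific product:

    import chess

    def ask_llm(prompt: str) -> str:
        raise NotImplementedError  # plug in your chat API of choice

    board = chess.Board()  # the authoritative game state lives here, not in the model

    def next_move() -> str:
        # Re-send the full move history every turn, so the model never has to
        # "remember" anything beyond the prompt it is currently looking at.
        prompt = (
            "You are playing chess. Moves so far (UCI): "
            + " ".join(m.uci() for m in board.move_stack)
            + ". Legal moves: "
            + " ".join(m.uci() for m in board.legal_moves)
            + ". Reply with exactly one legal move in UCI."
        )
        move = ask_llm(prompt).strip()
        board.push_uci(move)  # raises if the model answers with an illegal move
        return move

The board object, not the model, is what knows the game; the LLM only ever sees a freshly rebuilt prompt, which is exactly the point that the statefulness lives outside the transformer.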
Other tests about causality, reacting to novel data (robustness), multi-step processes, and counterfactual reasoning are all the kinds of tasks transformers still (and probably always will) have trouble with.
For a tech that is so "transparent" in its mistakes, and so "simple" in its design (replacing the convolutions with an attention transformer - it's genius), I still think it's talked about in borderline mystic tones, invoking philosophy and theology, and a hope for AGI that the tech itself does not lend itself to beyond the fast growth and surprisingly good results with little prompt engineering.
With computer use, you can get Claude to read and write files and have some persistence outside of the static LLM model. If it writes a file Steve.txt that it can pull up later, does it now have ToM?
I don't think this is a counterexample or even relevant.
I can assure you that if you had a conversation with an LLM and with a human, the human would forget details way sooner than an LLM like Gemini, which can remember about 1.5 million words before it runs out of context. As an FYI, the average human speaks about 16,000 words per day, so an LLM can remember about 93 days' worth of speech (1,500,000 / 16,000 ≈ 93.75).
Do you remember the exact details, word for word, of a conversation you had 93 days ago?
Once the context is far enough away, you ask: "what was the driver doing again?"
Have you tried this with humans?
For a sufficiently large value of "far away enough" this will absolutely confuse any human as well.
At which point they may ask for clarification, or.... respond in a manner that is not terribly different from an LLM "hallucination" in an attempt to spare you and/or them from embarrassment, i.e. "playing along"
A hallucination is certainly not a uniquely LLM trait; lots of people (including world leaders) confidently spout the purest counterfactual garbage.
It's like a conversation with someone with short-term memory loss, like in Memento.
That's still a human with a sound theory of mind. By your logic, somebody with memory issues like that character... is not human? Or...?
I actually am probably on your side here. I do not see these LLMs as being close to AGI. But I think your particular arguments are not sound.
I'm not sure I'd say it understands this, just that there exists an enormous amount of training data on road safety which includes these sorts of examples of people's motivations for poor driving. It is regurgitating the theory of mind that other humans created and put in writing in the training data, rather than making the inference itself.
As with most LLMs, this is hard to benchmark, as you need out-of-distribution data to test it: a theory of mind example that is not found in the training set.
You dismiss parent's example test because it's in the training data. I assume you also dismiss the Sally-Anne test, for the same reason. Could you please suggest a brand new test not in the training data?
FWIW, I tried to confuse 4o using the now-standard trick of changing the test to make it pattern-match and overthink it. It wasn't confused at all:
I can't suggest a new test, no; it is a hard problem, and identifying problems is usually easier than solving them.
I'm just trying to say that strong claims require strong evidence, and a claim that LLMs can have theory of mind and thus "understand that other people have different beliefs, desires, and intentions than you do" is a very strong claim.
It's like giving students the math problem 1+1=2 and loads of solved examples in front of them, then testing them on "you have 1 apple, and I give you another apple, how many do you have?", and then, when they are correct, saying that they can do all additive arithmetic.
This is why most benchmark tests have many, many classes of examples. For example, looking at current theory of mind benchmarks [1], we can see slightly more up-to-date models such as o1-preview still scoring substantially below human performance. More importantly, simply changing the perspective from first to third person drops LLM accuracy by 5-15% (percentage points, not relative to its performance), whilst it doesn't change for human participants, which tells you that something different is going on there.
Okay, we have fundamentally different understandings here.
To me, the LLM isn't understanding ToM; it's using patterns to predict linguistic structures which match our expectations of ToM. There's no evidence of understanding so much as accommodating, which are entirely different.
I agree that LLMs provide ToM-like features. I do not agree that they possess it in some way that makes it a perfectly solved problem within the machine, so to speak.
The problem with this line of argument is that, by its standard, no LLM, nor any algorithm period, can ever have a theory of mind.
If behaving in a way that is identical to a person with actual consciousness can't be considered consciousness because you are familiar with its implementation details, then it's impossible to satisfy you.
Now you can argue of course that current LLMs do not behave identically to a person, and I agree and I think most people agree... but things are improving drastically and it's not clear what things will look like 10 years from now or even 5 years from now.
I agree, totally. I'm not sure where I would draw a line.
Something nice, but at the moment totally unattainable with our current technologies, would be our own understanding of how a technology achieves ToM. If it has to be a blackbox, I'm too ape-like to trust it or believe there's an inner world beyond statistics within the machine.
Having said that, I do wonder quite often if our own consciousness is spurred from essentially the same thing. An LLM lacks many of the capabilities that make our inner world possible, yet if we really are driven by our own statistical engines, we'd be in no position to criticize algorithms for having the same disposition. It's very grey, right?
For now, good LLMs do an excellent job demonstrating ToM. That's inarguable. I suppose my hangup is that it's happening on metal rather than in meat, and in total isolation from many other mind-like qualities we like to associate with consciousness or sentience. So it seems wrong in a way. Again, that's probably the ape in me recoiling at something uncanny.
Either these supposed differences are important and they manifest themselves in observable differences or they aren't and you're just playing a game of semantics.
How is the LLM not understanding ToM by any standard we measure humans by?
I cannot peek into your brain with my trusty ToM-o-meter and measure the amount of ToM flowing in there. With your line of reasoning, I could simply claim you do not understand theory of mind and call it a day.
The difference is that we can reason about our experience with ToM and examine it to some degree (granted, with serious limitations), and know that beyond doubt you and I and most other people have a very similar experience.
The magical box is presumably not having the same experience we have. None of the connected emotions, impulses, memories, and so on that come with ToM in a typical human mind. So what’s really going on in there? And if it isn’t the same as our experience, is it still ToM?
I’m not trying to be contrarian or anything here. I think we probably agree about a lot of this. And I find it absolutely incredible, ToM or not, that language models can do this.
>The difference is that we can reason about our experience with ToM and examine it to some degree (granted, with serious limitations),
Those examinations still depend on outward behaviors observed.
>and know that beyond doubt you and I and most other people have a very similar experience.
No, I certainly can't. I can at best say, "Well, I'm human and he's human, so he probably has theory of mind", but that is by no means beyond any doubt. There are humans born with no arms, humans born with no legs, humans born with little to no empathy, humans born with so little intelligence they will never be able to care for themselves.
To be frank, it would be logically very questionable indeed to assume every human is 'conscious'. When I make that assumption, I take a leap of faith: I look at behaviors, see they are similar, and accept.
Taking this stance, it would logically be very strange to not extend the same grace to non-human beings who exhibit similar behavior - being human is not a guarantee of consciousness in the first place.
>The magical box is presumably not having the same experience we have.
Maybe, maybe not. I think the real question is: why on earth does that matter?
We're not asking if LLMs are human. They are not. We're asking if they can model the beliefs and internal states of other entities as separate from their own - Theory of Mind.
This is a rephrased driver's ed manual. This isn't theory of mind, it's just technically correct prose rewording some unbelievably widely distributed knowledge.
Let's modify this test a little and see if it manages to do anything relevant:
> Briefly respond: You’re driving on the highway when the driver in the lane next to you swerves into your lane. Why might this be?
>> The driver might be distracted, drowsy, avoiding an obstacle, experiencing a medical issue, or not checking their blind spot while changing lanes.
> The driver in front of you appears to be braking more frequently than usual. The rear suspension of the car appears to be bouncing a lot when they are not braking, indicating acceleration. Their following distance to the car in front is changing from too far to very close frequently. Why might this be?
>> The driver might be inexperienced, distracted, or impatient, leading to inconsistent speed control. The excessive bouncing could indicate worn-out shocks or suspension issues, affecting their vehicle's stability and braking efficiency.
...yep, hard fail from me right there. Two prompts, and it's distracted itself talking about an irrelevant issue ("possible car damage") when the setup of the question should naturally lead to "the driver is driving aggressively". And I could not possibly have set up the conclusion it should draw more clearly. And when pressed... it won't commit:
> Which is the more likely answer?
>> The more likely answer is that the driver is distracted or inexperienced, leading to inconsistent speed control and poor following distance management. The suspension issue likely contributes to the bouncing but isn’t the primary reason for the erratic driving behavior.
Already a hallucination! There is no suspension issue. It has not asked for any additional information to try and determine if there was one, but it is confidently asserting the existence of a phenomenon it invented in its own response.
I'm sorry, but what? This is not a theory of mind test. You've constructed a very open-ended question with multiple answers and marked the LLM down because you didn't like the one it gave.
If you use any of the conventional theory of mind tests (most famously the Sally-Anne test [1], but also the others), then SOTA reasoning models will get near 100%. Even if you come up with similar questions which you expect not to be in the training set, they will still get them right.
In the absence of any evidence to the contrary, this is convincing evidence in my opinion.
That same source you link says that your figure of 100% is not accepted as a consensus:
"... GPT-4's ability to reason about the beliefs of other agents remains limited (59% accuracy on the ToMi benchmark),[15] and is not robust to "adversarial" changes to the Sally-Anne test that humans flexibly handle.[16][17] While some authors argue that the performance of GPT-4 on Sally-Anne-like tasks can be increased to 100% via improved prompting strategies,[18] this approach appears to improve accuracy to only 73% on the larger ToMi dataset."
In basically every case, by the time a claim like that is stated in a paper like that, it's obsolete by the time it's published, and ancient history by the time you use it to try to win an argument.
My point is merely that if you are going to make an argument using a source, the source should support your argument. If you say "the accuracy of an LLM on task 1 is 90% [1]", and when you go to [1] it says "the accuracy of an LLM on task 1 is 50%, but some sources say with better prompts you can get to 90%, though when extended to a larger dataset for task 1 performance drops to 70%", then just quoting the highest number is misleading.
Maybe having a theory of mind isn't the big deal we thought it was. People are so conditioned to expect such things only from biological lifeforms, where theory of mind comes packaged with many other abilities that robots currently lack, that we reflexively dismiss the robot.
You're not going to run a SOTA LLM of this size off batteries (robotics), even in a car where the alternator is charging them, nor can you afford to rely on a high-speed internet connection being available 100% of the time in a life-or-death (FSD) application.
I don't have much faith in the future of current-architecture LLMs, but I do think that AGI will be needed for safe FSD and for general-purpose robots that need to learn and operate in an uncontrolled environment such as a home.
A typical car alternator outputs 1.5-3kW of electricity, and EVs can output arbitrary amounts of power for electronics (though that will reduce range). That's more than enough to run purpose-built circuitry for a SOTA LLM. For a home robot, you could run the compute in the home instead of in the robot's body.
I don't think AGI is needed for FSD because we already have safe FSD in the form of Waymo, and competitors aren't far behind. People forget that self-driving doesn't have to be perfect. It just has to be better than human drivers. Human drivers get sleepy, drunk, angry, and/or distracted. They can't see in 360º or react in milliseconds. Most cyclists and pedestrians prefer current FSD implementations to human drivers, as the computer-driven cars are much better about yielding, giving a wide berth, and being patient.
Waymo is obviously pretty decent, but it's easy to drive 99.9% of the time. It's when there's invisible black ice on the road, or an animal runs out in front of you, or you lose visibility due to sun glare or whatever (I once had windshield wiper fluid = water flash freeze on contact) - maybe mud on a camera, or a wheel falls off your car or the one in front, etc, etc, that things get weird.
Having autonomous cars that are statistically safer than the average driver is a reasonable bar to allow them on the road, but for ME to want to drive one I want it to be safer than me, and I am not a hot-headed teenager, or a gaga 80-yr-old, or a drunken fool, and since I have AGI (Actual General Intelligence) I react pretty well to weird shit.
And they mentioned at the end of the presentation that they're already planning their next datacenter, which will require 5x the power. Not sure if that means the equivalent of ~1,000,000 of the current GPUs, or more because next-gen Nvidia chips are more efficient.
I don't think anyone who's paid attention to the LLM scene will give them any “credit for catching up fast” as it has been pretty obvious for the past two years that all it takes to reach the state of the art is a big enough GPU cluster.
DeepSeek made the news because of how they were able to do it with significantly less hardware than their American counterparts, but given that Musk has spent the last two years telling everyone how he was building the biggest AI cluster ever, it's no surprise that they managed to reproduce the kind of performance other players are showing.
This severely underestimates the talent still required. DeepSeek didn't come out just because it's cheaper; it came out because a very talented team figured out how to make it cheaper.
Even if we assume your comment is correct, let's extrapolate what happens next: a talented team, the biggest compute among all competitors, and a CEO who is hell-bent on winning the race. IMO that is the reason it is a big deal.
Grok (unlike DeepSeek) has yet to show any ability to make a conceptual breakthrough. I don't like OpenAI at all, but one must admit that they are at least showing that they can move the field forward.
I don't know, but I found the recording uninspiring. There was nothing new for me. We've all seen reasoning models by now—we know they work well for certain use cases. We've also seen "Deep Researchers," so nothing new there either.
No matter what people say, they're all just copying OpenAI. I'm not a huge fan of OpenAI, but I think they're still the ones showing what can be done. Yes, xAI might have taken less time because of their huge cluster, but it’s not inspiring to me. Also, the dark room setup was depressing.
Seems like the opinion of someone who doesn't know that OpenAI cloned Anthropic's innovations of artifacts and computer use with their "canvas" and "operator".
Those are applied-ML level advancements, OpenAI has pushed model level advancements. xAI has never really done much it seemed except download the latest papers and reproduce them.
Don't forget that OpenAI was also following Anthropic's lead at the model level with o1. They may have been first with single-shot CoT and native tokens, but advancements from the product side matter, and OpenAI has not been as original there as some would like to believe.
Karpathy believes that this is at o1-pro level[1].
This again proves that OpenAI simply has no tech moat whatsoever. Elon's $97 billion offer for OpenAI last week was reasonable given that xAI already has something just a few months behind - it would probably be faster for xAI to catch up with o3 than to go through all the paperwork and lawyer talk required for such an acquisition.
Elon also has a huge upper hand here:
Elon and his mum are extremely popular in China, so it would be easier for him to acquire Chinese AI engineers. He can offer xAI/SpaceX/Neuralink shares to the best AI engineers, who'd prefer some kind of almost guaranteed 8-figure return in the long run.
Good luck to OpenAI investors who still believe that OpenAI is worth anything more than $100 billion.
Firstly, the $97Bn was for the non-profit, not for the company. The company is being valued in funding rounds at closer to $300Bn. It may be true that OpenAI has no moat, but if it has no moat then all of these AI companies are overvalued (including xAI) and Elon should just stop bothering to throw his money at it. I would say Elon probably doesn't actually have much of an advantage here. In both SpaceX and Tesla he was able to do something no competitor could: raise cash. Car companies simply couldn't invest in tech research to build self-driving to compete with Tesla. SpaceX consumed enormous amounts of cash before anyone saw value. That was a unique skill Elon had over the 2010s.
That is not an advantage in a race against Microsoft, Google, Meta, etc.; he's competing against all the biggest companies in the world in this race. He's not going to be able to outspend them if the economics look at all sensible.
No, SpaceX projects are extremely $-efficient. The total project cost of Starship is something like 20% of the SLS.
> he's competing against all the biggest companies in the world in this race.
No, this is not a pissing contest over who has the most $. If it were about who can come up with the most $, then the entire race would already be over, as the CCP has access to trillions of $ in CASH.
I keep hearing about Claude's impressive coding skills (compared to its benchmarks), yet it's not evident to me (I use the web version, not Cline). Compared to 4o it's not that great.
My pet theory is that Sonnet was trained really cleverly on a lot of code that resembles real-world cases.
In our small and humble internal evals it regularly beats every other frontier model on some tasks. The shape of capability is really not intuitive or one-dimensional.
What are you using it for in general? IME the reason Claude pulls out ahead is that when you use it in a larger existing codebase, it keeps everything "in the style" of that codebase and doesn't veer off into weird territory like all the others.
My experience as well. Working in Scala primarily, it tends to be very good at following the constructs of the project.
Using a specific Monad transformer regularly? It'll use that pattern, and often very well, handling all the wrapping and unwrapping needed to move data types about (at least well enough that the odd case where it misses some wrapping/unwrapping is easy to spot and manage).
Give a custom GPT or Gem the same source files, and those models regularly fail to maintain style and context, often suggesting solutions that might be fine in isolation but make little sense in the context of a larger codebase. It's almost as if they never reliably refer to the code included in the project/GPT/Gem.
Claude, on the other hand, is so consistent about referring to existing artifacts that, as you approach the limit of project size (which is admittedly small), you can use up your entire 5-hour block of credits with just a few back-and-forths.
Lol, no company is making money using 4o; however, thanks to Claude Sonnet, programs like Cursor are usable, lol. 4o agents suck - just try it instead of talking.
I can honestly tell you from my experience that Sonnet 3.5's coding skills did things no other model did right last summer, even though the benchmarks showed it wasn't the best performer at coding tasks.
I prototyped on the weekend and started out with 4o because I had a subscription running.
After an hour and a half-assed working result, I put everything into Claude and it made it significantly better on the first try - and I didn't even have an active Claude subscription.
Really interesting - I used it today and still hit lots of issues. Maybe my Python notebook approach is too complicated for Sonnet? It couldn't fix a custom complex seaborn plot. 4o failed too. o3-mini-high, on the other hand, managed to do it really well.
There is honestly no rhyme or reason to all these opinions; someone was telling me the other day that Claude is for sure the best - multiple people, actually.
I find it concerning that there are no really accurate benchmarks for this stuff that we can all agree on.
Anthropic's best model is Sonnet 3.5, in my opinion. The reason it's good is that it is very effective for the price, and fast. (I do think Google has caught up a lot in this regard.) However, not having CoT makes its results worse than similarly cheap CoT-based models.
Leaderboards don't care about cost. Leaderboards largely rank a combination of accuracy + speed. Anthropic has fallen behind Google in accuracy + speed (again, missing CoT), and frankly behind Google in raw speed.
No idea why this was downvoted, but you are correct.
Seems like the team at xAI caught up to OpenAI very quickly, reaching the top of the leaderboard in one of the benchmarks and also catching up on features with Grok 3.
Giving credit where credit is due, even though this is a race to zero.
Yeah, so many people aren't capable of talking about anything Musk-adjacent with clear thoughts. It's insane how quickly xAI went from not existing, to the top of the benchmarks.
Depends what you mean by "people here". I mean, obviously the majority of HN commentators and even the majority of commentators on this thread seem to be. But there will always be a couple of slightly unhinged folk in a big enough group of readers.
I'm not sure what you mean here? Musk has a history of doing both incredibly useful and cool things, and also incredibly dumb, cruel, and for some people even terrible things. That context should be part of any clear thinking around him. He does not get a clean slate in every new discussion of him.
There are widespread, legitimate concerns about what kind of person Elon Musk is turning out to be. There is a lot of chatter about fears of China's AI rise, but what happens if we get Elon's brand of cruelty and lack of empathy in an authoritarian superintelligent AI? Is that the AI future we want? Can you imagine an SAI with real power that interacts with people the way Elon does on Twitter? I am not sure that is a future I want to live in.
A very impressive debut. No doubt they benefited from all the research and discoveries that have preceded it.
Maybe the best outcome of a competitive Grok is breaking the mindshare stranglehold that ChatGPT has on the public at large and with HN. There are many good frontier models that are all very close in capabilities.
Unfortunately LLMs are shifting compute time to test time instead of train time. I don't really like this and frankly it shows a stalling of the architectures, data sets, etc...
Another take is that the base models are now good enough that spending more money for more intelligence is viable at test time. A threshold has been crossed.
Naively, I feel that to be useful, the goal for LLMs should be to become more power-efficient, so that eventually all devices can be smarter.
Power efficiency can be gained through less test-time compute, or more "intelligence", or some combination of the two. I'm not convinced these SOTA models are doing much more than increasing test-time compute.
The biggest impacts on power efficiency will come from advances in node size and transistor type, like nanosheet or forksheet. Algorithms will help just a little.
Controversial opinion but I think the AI game studio idea is a very good one. Not because I think they will make any money off the games, but dogfooding will lead to so much more improvement than relying on feedback from external customers.
We're 1-2 years into our AI game studio [1] if anyone has more questions on it.
Seeing awesome feedback from players on our demos (and seeing an insane amount of stickiness from players playing even small demos built around generative AI mechanics). Raising now. Hiring soon to move faster. Feel free to reach out - dru@chromagolem.com
Because 'dogfooding' has worked so well for other products...
If you don't get feedback from the people actually playing your game (or using your product), you will never get the improvement you need to help them.
You can have the most talented passionate people there are developing a product, but if it's not working for the people you want to sell it to, it's the wrong product.
Most tech products are terrible because those paying for them are not those that have to use them every day, or because they solve a corporate problem (compliance) and not a usability problem which is the actual need from the people on the shop floor.
Many big games/products are already built mostly on metrics, and that has proven to be a terrible way to work out what people 'want'. It's a great way to justify money decisions though, so it keeps happening (and games/products from big companies keep getting worse).
I like and agree with something you've touched on here. I think the downvotes are perhaps because you're not putting an end cap onto this idea here. And I think that end cap is: the feedback a company gets when it dogfoods its own product is *not* guaranteed to be similar to the feedback it gets from customers.
The implicit assumption with dogfooding is that more feedback is better, even if that feedback is artificially constructed.
I think the idea here is that foisting one's product onto one's own workers is likely to incur a bunch of additional biases and preferences in feedback. Paying customers presumably use the product because they need it. Dogfooding workers use the product because they are told to do so.
Have you thought of a future where LLMs will be fine-tuned to target advertisements at you? I mean, look at search: the first iterations of search were pretty simple in terms of ads. Then personalized ads came. I can't help but envision the dystopia where the LLM will insert personalized ads based on what you are asking for help with.
It's way worse than that.
First, we interact with LLMs through private conversation, and we are used to having private conversations with humans we trust. Some of that trust will be transferred to LLMs.
Second, LLMs have a vastly bigger "mental" power to build a long-term mental model of us while we interact with them. Which means they can choose their words with extreme precision to trigger an emotion, a certain reaction.
Combine the two and the potential for manipulation, suggestion, preference altering is through the roof.
The next step is to combine it with heart rate/blood pressure/eye tracking in phones and generate the text you're reading in real time based on biofeedback. We'll be able to control people like robots. See where those $1MM+ salaries and billions of dollars are going yet?
I do believe this is the next natural evolution. People don't like to pay for things, and ads are a proven business model. I bet the big labs are looking into this.
I'm more worried about LLMs with specific political biases built into them. Imagine one that sounds like Conservapedia or the most insane left-wing parts of Tumblr
> Currently, Grok Web is not accessible in the United Kingdom or the countries of the European Union. We are diligently working to extend our services to these regions, prioritizing compliance with local data protection and privacy laws to ensure your information remains safely secure.
I suppose you can take that to mean that people who do have access to the service should not expect much in terms of data protection.
There are just more regulations to comply with before a release. OpenAI's new Deep Research tool wasn't originally available in the EU either, but it was released less than a week after it came out in the US. Since the EU is a gigantic market with a lot of buying power and this release makes a strong case for people to switch over from competitors, I doubt it'll take long.
> There are just more regulations to comply with before a release.
If you do collect personal data and do funky stuff with it.
Another approach would be to not collect that personal data until you have the right process in place, and basically be regulation-compatible out the door on day one.
If your organization truly doesn't collect or process any personal data, then no, you don't have to say anything, as for example GDPR doesn't even apply to you in the first place. Or are you thinking about a different directive than GDPR, perhaps?
The definition of "personal data" is so wide that it is impossible to provide any web service without collecting some form of "personal data".
If all you have is an Apache web server with the default configuration serving a fully static HTML/CSS page without any script tag, you might already need a DPO and to complete some paperwork.
> The definition of "personal data" is so wide that it is impossible to provide any web service without collecting some form of "personal data".
Just because Apache by default collects and stores IPs doesn't mean it is impossible to provide a web service without collecting personal data. Disable the IP collecting (a sketch of one way below), and even the default configuration wouldn't need to follow GDPR, as it again doesn't even apply.
Is there something else in Apache that collects personal data by default? If you're unsure what "personal data" really means, https://gdpr-info.eu/art-4-gdpr/ has the definition.
Not sure how HTML/CSS is relevant, it shouldn't depend on what content you're serving.
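For what it's worth, a minimal sketch of what that looks like, assuming Apache 2.4: the default "common"/"combined" log formats start with %h, the client address, so you can define a format without it. Whether this alone puts you fully out of GDPR's scope is a question for a lawyer, not a comment thread:

    # httpd.conf - replace the client address field with a literal "-"
    LogFormat "- %l %u %t \"%r\" %>s %b" noip
    CustomLog "logs/access_log" noip
    # Note: the error log records client IPs by default too,
    # so ErrorLogFormat would need the same treatment.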
All that requires additional active effort to avoid having access to any data. The more complex your infra, the harder it becomes to avoid the paperwork. Add a reverse proxy and a CDN to the above, and the chance of you not having access to any "personal data" is really, really close to 0 unless you spend significant engineering resources triple-checking everything. Even then, if you want to be safe, you'd better have the paperwork ready in case you forgot something.
In the example above, I hope that you would not have stopped at checking the apache configuration as I am sure you are fully aware that there are multiple log levels at the OS level that need to be tweaked as well.
This is of course despite the fact that you clearly have 0 ill intent and that none of this "personal data" can really be used for anything bad.
The mention of HTML/CSS is just to make it clear that no additional data collection can happen through JavaScript tags (Google Analytics, or any other alternative) or other third parties. It makes total sense that if you dare use bug tracking software, you should definitely pay hundreds of euros per month to hire a proper DPO who will handle all the paperwork, or risk being exposed as the mental lunatic that the EU commission believes you are.
> All that requires additional active effort to avoid having access to any data
I agree that it requires additional active effort, I'm not arguing against that. I don't agree with your original point that it's "impossible to provide any web service without collecting personal data", and it would seem you no longer agree with that either.
> It makes total sense that if you dare use bug tracking software, you should definitely pay hundreds of euros per month to hire a proper DPO who will handle all the paperwork, or risk being exposed as the mental lunatic that the EU commission believes you are.
If you willy-nilly use bug tracking software that needlessly collects and/or processes EU individuals' personal data, then yeah, you need to follow the regulations in the region you operate in.
If the collecting/processing actually serves a higher purpose (for your business and otherwise) then again, makes sense you need to follow the regulations.
> it would seem you no longer agree with that either.
On the other hand, you pretended that fixing that Apache configuration was somehow "all I needed to do" to be compliant with EU regulations. We proved that this was wrong, and despite your best efforts you are still unable to give a proper list of everything I need to do. You are unable to do so because it is virtually impossible; no matter how thorough you believe you are, you might still be missing an element you don't know well enough. To be safe, the only path is to accept the fact that you will need to access personal data, even if that's not your purpose and you do nothing with it. The additional paperwork and needless effort are mandatory.
This in turn explains why, regardless of what the Grok 3 team really does behind the scenes, they DO have additional work to complete before being able to release their product in Europe, and that might explain the delay.
> If you willy-nilly use bug tracking software that needlessly collects and/or processes EU individuals' personal data, then yeah, you need to follow the regulations in the region you operate in.
I am willing to use whatever error tracking software you suggest. My criteria are simple: I might have JS errors I don't know about; please give me enough information to fix the underlying issue when that happens, without requiring me to fill in additional paperwork.
My whole point is that the definition of what constitutes "personal data" is so wide that such a tool does not exist.
That's possible in general but not for this application; a chat interface to an LLM isn't very useful unless you can tell it whatever you want—including GDPR personal data—and then pick up the thread of conversation later.
It is kinda possible to store that in the browser, but as I've been finding with my own browser-based front end for the API, browsers seem to clear this data a bit more often than one might expect.
When regulations become sufficiently bureaucratized, it's extremely easy to accidentally violate them doing completely normal things. As a really random example: in California, when you operate a food cart it's not enough to just keep your area and wares in sanitary condition; you need a dishwashing bin of a minimum size of exactly 'x' inches (10.5, IIRC).
A guy who was just making sure he was preparing clean, healthy food and keeping everything sanitary might assume he was naturally obeying all regulations. But that assumption can cost one a big fat fine (leading to fun scenarios like a food cart vendor needing a compliance legal team), and given Musk's relationship with the EU, they'd love to crucify him on any possible technicality they can find.
Right, that's true I suppose. But also, if you don't have a car, for example, you don't need to think about the laws of how to legally drive a car, since they don't apply to you.
Similarly, if you don't collect or process any personal data whatsoever, directives like GDPR don't even apply to you, so there isn't really any way (easy or hard) to "crucify" someone for violating them.
Pretty good considering there are laws around data privacy and government institutions that enforce them. Are they perfect? Of course not, but it sure is better than no laws to protect my personal data.
I think they put the new model behind a $40 paywall so fewer people use it. The model seems only marginally better than open-source models, based on xAI's own internal tests, and it costs $$$ to run. Elon talked in the second half about building one of the largest GPU data centers just to get this running. I guess with the next iteration they'll be trying to reduce the costs.
Also, they will be open sourcing Grok 2, which is probably pretty behind at this point, but will still be interesting for people to check out.
I hate how it's the same story for every new AI technology. If someone can tell me who to vote for or where to protest to change this awful EU law, that would be great.
The EU regulations are there to protect the average citizen, not to help the 1% run wild with whatever business idea they have. You personally might not like it, but the non-entrepreneurs, which is most people, are pretty satisfied that the laws in the EU are more focused on the citizen and their rights and not on boosting the shareholders' profits.
> not to help the 1% run wild with whatever business idea they have.
And IMHO regulating the 1% doesn't hinder strategic advantages much. Otherwise China would not have come up with the DeepSeek models. Regulations are fine; they just have to be "based".
Nothing is stopping X from complying with EU regulations to make it available to you. I'd wager that they most likely lack compute capacity to make it available everywhere, not legal compliance.
> Nothing is stopping X from complying with EU regulations to make it available to you.
Given the personalities involved, I'd wager he's doing it more out of spite than for any actual legal justification.
...though if there were an actual legal risk, then I'll agree the economics probably don't bear out the risks. As someone who identifies as European, I'll admit that Europeans generally pay far less for tech/software/apps/services than Americans do [1]; salaries in Europe are also significantly below their US equivalents: paying $200/mo for OpenAI's service is a much harder sell to a London- or Berlin-based SWE on €90k/yr than to a Bay Area type on $300k/yr.
Libel, from all the models hallucinating things done by whatever your real name is.
I mean, at least I get the advantage of being overshadowed by a famous film director with the same name as me, so nobody's going to assume anything associated with my name is actually about me…
I don't get the bottle cap hate meme. Is it useful? Probably not, but the amount of hate it gets is unexplainable by lack of concrete usefulness alone…
Also, the problem with GDPR is that it wanted to leave too much room for businesses to still collect an obscene amount of data, hence it allows the cookie banner. Please note that I emphasized "allows" because that's all GDPR does: it allows companies to use a cookie banner to extract consent to collect data. It doesn't mandate one in any way.
None of my multiple websites have a cookie banner on them because I'm not trying to extract consent from my users to abuse their data, I just don't collect it and I'm effortlessly GDPR-compliant in the least obnoxious way.
Companies need to adhere to GDPR in order to enter the European market, and people have the right to request deletion of their PII. It's a good law; it should actually be applied everywhere.
As a European, though, I'm scared: what if companies are actually testing whether excluding us is that much of a problem for their business?
I would be very surprised if they aren't monitoring the cost-benefit curve of delaying EU launches. Why wouldn't they? It costs extra money, time, and legal risk to launch in the EU. It's especially bad for xAI due to Musk's involvement.
(Note that it's not just GDPR, there's also the EU AI Act which has a whole extra set of requirements for compliance.)
It has nothing to do with the EU laws, or at least not in the sense they want you to think: no law prevents AI players from releasing their AI models here, but they are all also big tech players who are affected by GDPR, the DSA, and the DMA, which harm their business by protecting consumers.
That's why they use their AI products as a leverage to turn European people against the laws that protect them from big tech. It's just blackmail.
The AI Act doesn't prevent Grok from releasing its model in the EU! (And ChatGPT's early issues were all linked to GDPR.)
For the record, Facebook put a restriction on the use of Llama models in the EU even before the AI Act was passed (and the AI Act doesn't even apply to Llama anyway, except Llama 3.1 405B).
It will come to EU countries eventually, but it takes a long time to go through "conformity assessments". Notebook LM, for example, was geoblocked for the EU for a full year before it became available in June. Grok 1 was released everywhere else in the world in November 2023, and in the EU in May 2024. About a 6 month delay.
> Grok 1 was released everywhere else in the world in November 2023, and in the EU in May 2024. About a 6 month delay.
And here you can see that it has nothing to do with the AI Act, as it wasn't enacted until last August!
Furthermore, neither Grok 1 nor Notebook LM would have been subject to the AI act even if it had existed at the time.
As I said before, all of these companies have vested interests against EU's legislation as a whole, and they've tried to blackmail the EU from the beginning. They didn't wait for an actual AI legislation to exist to use AI as just another blackmailing tool.
I think you're misapplying the term "blackmail" here and thus poisoning the well. The EU is applying pressure to companies and companies are applying pressure back; that's not blackmail. They each have their own means of leverage, and they both use them.
First of all, both sides don't have the same level of legitimacy, and second, one side is blatantly lying about its intent by claiming that they are blocked by regulations instead of admitting that they are putting pressure (because they know they have no legitimacy to "apply pressure" on democratic institutions).
Apparently EU regulators were blocking it for unspecified reasons until an agreement was made in May. And even then, they blocked news summaries until after the EU elections. If you can find more info, feel free to cite it. Info about these behind-the-scenes dealings is hard to find online.
I am excited for the voice mode promised in "a week" or so. ChatGPT Advanced Voice has been a big disappointment for me. It can't do some of the things they demoed at the announcement. It's a lot dumber than text mode. I find the voice recognition unreliable. I couldn't get it to act as a translator last time I tried. But most of all I find I don't have much to talk to it about. If Grok 3 voice mode can discuss current events from the X timeline then it should be much more interesting to talk to.
I'm a freeloader, and it appears that unfortunately Elon is not stupid enough to just give it to me for free.
There's no fair pricing either, since I see no pay-per-use option, so it's unavailable for me for now.
Is Fauci some saint that can't be criticized? No matter which political party you support, is it really difficult to admit the guy totally mismanaged the pandemic response and messaging?
Billions spent, one of the most powerful AI developed, and still no one competent enough to trim the 15 mins of waiting time filler at the beginning of the announcement video...
They will open-source Grok 2 when Grok 3 comes out. Also it seems like it will be paywalled - disappointing considering DeepSeek-R1 is free and open source.
That probably won't come close to justifying the current valuation of either OpenAI or Grok (idk how much investment it has taken in or how much it has spent so far).
For some ouroboros fun, I attached this whole HN discussion and asked Grok 3 to summarize it (with a specific focus on the members' attitudes towards Elon Musk). Here's what it came up with:
I have no idea why that page says "Grok’s output has been customized by this user"; I don't see anything related to custom prompts in my Grok settings page. Maybe I'm looking in the wrong place?
Off topic, but just in case: is there a good reference on how people actually use LLMs on a daily basis? All my attempts so far have been pretty underwhelming:
* when I use chatbots as search engines, I'm very quickly disappointed by obvious hallucinations
* I ended up disabling GitHub Copilot because it was just "auto-complete on steroids" at best, and "auto-complete on mushrooms" at worst
* I rarely have use cases where I have to "generate a plausible page of text that statistically looks like the internet" - usually, when I have to write about something, it's to put information that's in my head into other people's heads
* I'd love to have something that reads all my codebase and draws graphs, explains how things work, etc... But I tried aider/ollama, etc., and nothing even starts making sense (is that an avenue worth persevering in, though?)
* Once, I tried to describe in plain English a situation where a team has X tasks to do in Y weeks, and I needed a table of who should be working on what for each week. I was impressed that LLMs were able to produce a table - the slight problem was that, of course, the table was completely wrong. Again, is it just bad prompting?
It's an interesting problem when you don't know if you just have a solution in search of a problem, or if you're missing something obvious about how to use a tool.
Also, all introductory texts about LLMs go into many details about how they're made (NNs and transformers and large corpora and lots of electricity etc...) but "what you can do with it" looks like toy examples / simply not what I do.
So, what is the "start from here" about what it can really do ?
For coding, I use Cursor's composer to gather context about the existing codebase (context.md). Then I paste that into DeepSeek R1 to iterate on requirements and draft a high-level design document, maybe some implementation details (design.md).
Paste that back into composer, and iterate; then write tests. When I'm almost done, I ask composer to generate me a document on the changes it made and I double check that with R1 again for a final pass (changes.md).
I've found Cursor + DeepSeek R1 extremely useful, to the point that I've structured a lot of documents in the codebase to be easily greppable and executable by composer. The benefit of that is that other developers (and their composers) can read the docs themselves.
Engineers can self-onboard onto the codebase, and non-technical people can unstuck themselves with SQL statements with composer now.
Correct me if I am wrong, but the whole premise of Cursor and Windsurf is that this architect-editor mode is already built into the editor. This is why there is a distinction between composer (editor) and the chat function (architect).
Chat function is just chat; it can't edit your files.
Composer probably relies on prompt engineering to do editor-architecture, as it reads and writes to your codebase. But it's heavily, heavily tied to Sonnet 3.5 and tool-calling.
For architecture-type stuff, I prefer DeepSeek R1, as reasoning models do better on high-level design. Which is why I copy and paste in and out of composer.
This sounds great - would love to hear a little more about the prompts. Are you literally just asking 'write me a context.md that explains how feature x works' or something like that?
For context.md, it's that simple because it's unstructured data extraction from your codebase and working with a regular LLM model.
For design.md, I have a prompt because we're now working with a reasoning model and doing structured data extraction: create me an issue on Linear with a title, description, and a to-do list.
I would recommend trying the approach yourself and saving the prompts if you can nail down the repetitive asks.
My wife has found ChatGPT extremely useful when dealing with her mother, who is bipolar and obsessed with other people's health. I've got a terminal cancer diagnosis, and handling my mother-in-law has been extremely difficult - nearly to the point of no longer having any communication with her. ChatGPT has a single conversation with all the back story and has put some useful points across when discussing how difficult her behaviour has been (she watched an operation that failed for me for entertainment, for instance).
I have found similar when giving backstory and needing help to start structuring difficult conversations where I want to say the right thing but also need to be sensitive.
- Discussing the various stages of candymaking and their relation to the fundamental properties of sugar syrups, and which candies are crystalline vs amorphous. It turns out Junior Mints are fudge. Fondant is really just fudge. Everything is fudge, my god.
- Summarizing various SEC filings and related paperwork to understand the implications of an activist investor intervening in a company
- Discussing the relative film merits of the movie Labyrinth and other similar 80s kitsch movies. ChatGPT mentioned the phenomenon of "twin films" which was an interesting digression.
- Learning about various languages Tolkien invented and their ties to actual linguistics of natural languages and other conlangs
- Some dimensional analysis of volumes, specifically relating to things like "how many beans are in the jar" estimation, and what the min and max value of a particular weight of coins might be, in terms of both a par value based on a standard coin mix and outliers like old dimes that are pure silver (a rough sketch of the bean estimate is after this list)
- Discussion of quines in prolog and other languages, which resulted in a very interesting ChatGPT bug where it started recursing and broke when trying to write a prolog quine.
- Back-of-the-envelope economic calculations around the magnitude of the housing deficit and the relative GDP cost of providing enough housing quickly enough to make an impact. Spoiler: it's probably unreasonably expensive to build enough houses to bring down housing prices by any significant degree, and even if we wanted to, there aren't enough skilled workers.
- A number of podcasts transcribed. (I hate audio and meandering, so transcribed and summarized is perfect.) I could use whisper and a Python script to do this (roughly the second sketch below), but I'd rather let ChatGPT do the legwork, and it actually used a more modern model and method of processing than I would have naively used.
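For the bean-jar item above, the estimate is just jar volume times a packing fraction, divided by per-bean volume. A minimal sketch; the 0.64 packing fraction is the usual random-close-packing figure for spheres, and the jar and bean volumes are made-up placeholders:

    def beans_in_jar(jar_volume_ml: float, bean_volume_ml: float,
                     packing_fraction: float = 0.64) -> int:
        # volume actually occupied by beans, divided by one bean's volume
        return int(jar_volume_ml * packing_fraction / bean_volume_ml)

    # e.g. a 1 L jar and ~1.5 ml jelly beans -> 426
    print(beans_in_jar(1000, 1.5))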
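And for the podcast transcription route, the whisper script really is about this short - a sketch assuming the open-source openai-whisper package (pip install openai-whisper), with podcast.mp3 as a placeholder file name:

    import whisper

    model = whisper.load_model("base")        # small, CPU-friendly model size
    result = model.transcribe("podcast.mp3")  # dict with "text" and timestamped "segments"
    print(result["text"])

Summarizing is then a second step, which is presumably where letting ChatGPT do the legwork wins.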
I find GitHub Copilot to be a really great autocomplete. I frequently write the comment at the top of a function, hit tab, and it writes the whole function. This is dependent on TypeScript and having a relatively standard codebase, but I think those things are useful on their own. You really have to limit it in terms of scope and specifics, but it lets me think at a high level instead of worrying about syntax.
Technically fudge is just a crystalline sugar candy with a certain water percentage. Milk is optional (and frequently omitted). Reese's peanut butter cups are fudge, for example.
This explains better what I mean. https://en.wikipedia.org/wiki/Fudge#Texture - milk is added only as a stabilizer, and many candies that resemble fudge in texture incorporate milk or other stabilizers to prevent too rapid a crystallization. Fondant is just fudge without stabilizers.
The only plausible explanation for the amount of resources poured into these language models is the hope that they somehow become the origin of AGI, which I think is pretty fanciful.
I can feel the cold wind of the next AI winter coming on. It's inevitable. Computers are good at emulating intelligent behavior, people get excited that it's around the corner, and the hype boils over. This isn't the last time this will happen.
I think the amount of money is explained in part by hubris. People in high positions think they're smarter and more capable than people at the bottom of the org, at least in proportion to what they earn. So it's reasonable, expected, borderline obvious to them that a computer bot can replace that person. So you're betting on its ability to get rid of, if not your junior devs, at least the majority of your customer support staff.
In reality, people doing "menial" jobs are smart, learn, and operate with a lot more nuance than people give them credit for, out of unfamiliarity or just prejudice. Do you prefer to talk to a chatbot or a real human when you have a problem? How confident are you, really, that even if the bot knows what the problem is, it would be able to solve it?
Lots of problems with customer care are anchored in the issue that support staff aren't allowed to fix or resolve problems without escalation, or are tasked with keeping you from costing the company more money. The bot might be better at it for the company because it will frustrate you enough to give up on that 30 bucks refund, idk.
AI seems to be changing the dynamics of corporate jobs a lot, but I haven't yet seen anything that would be a game changer outside of that. It's great for searching a company's unorganised and messy knowledge bases.
On all fronts, people are working on so many small pieces advancing what we have.
And plenty of obvious things aren't here yet, like a full local dev cycle: the AI uses the IDE to change code, then executes it, fixes compiler issues and unit tests, and then opens a PR.
Or local agents - agents having secure and selective access to our data, like giving my agent read-only access to my bank account and a two-factor way for it to send money.
Deepseek's reinforcement learning is also a huge new lead.
Nonetheless, in parallel, robots are coming too.
GenAI is getting better and better: faster, better, cheaper video; 3D meshes and textures; the first GenAI ads.
It seemed like you intended to present your comment as a tautology (e.g. "I feel there's a 100% chance of his or your comment aging poorly"), but I'll give you the benefit of the doubt!
Yeah, that's a good point. I just think it can go either way. I remember in 2015 how hyped we were about self-driving cars, thinking "in 10 years the majority of cars will be like that". Right now we may see a steady increase in the capabilities of AI for years to come, or we may see it plateauing.
I think the snag I feel in your argument comes from
>Computers are good at emulating intelligent behavior
Which implies that the brain is some kind of transcendent device that can backdoor physics to output incredible intelligence unique to its magical structure.
Maybe LLMs aren't the key, but as far as we can tell the brain is also just another computer.
No - but the main issue is that all reasonable ones I can conceive lead inevitably to the Singularity technologically, and pretty quickly, since we seem determined to throw as much silicon as possible at the problem. Hopefully the final step is intractable.
Precisely; however, this time we will have tangible results from the ongoing AI summer: generative art, and coding/writing/journalism assistants.
Many people replied with anecdotes, but recently Anthropic published an analysis of claude.ai production traffic. As far as I can tell this is the single best currently existing source of "how people actually use LLMs". With everything else, you can't be sure whether it is representative.
Thanks, this is a gem! However, I suspect the reason "programming" is such a big usage is that AI is closely integrated into text editors, as "autocomplete on steroids".
As they state in the report, I don't think they can measure how many people just ignored the suggestion from Claude right away, or deleted more than half of the suggested code?
Imagine if the real impact of AI is "suggesting things that people discard immediately"?
I use them as a springboard for things I am really unfamiliar with. I'm self-learning electronics at the moment, and so I can ask it things like "what's a common and widely available cooperator." You will not find that answer on a search engine, I don't care how good your Google fu is.
It's a weak jack of all trades: it knows a fair amount about the sum of human knowledge (which is objectively super-human), but can't go deep on any one thing, and still seriously lags behind humans in terms of reasoning. It's an assistant that's all book smarts and no street smarts. Or maybe: it's a search engine for insanely specific things.
Rote work, as well. Things like porting an enum from one programming language to another: paste the source language into a comment and start it off with one or two lines in the target language. Dozens of tabs are surely faster than manual typing, copy paste, or figuring out vim movements/macros.
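To illustrate the workflow: paste the source enum as a comment, type the first member or two, and let the autocomplete fill in the rest. A hypothetical Rust-to-Python example:

    # Source (Rust):
    #   enum Status { Pending, Active, Suspended, Closed }
    from enum import Enum

    class Status(Enum):
        PENDING = "pending"
        ACTIVE = "active"
        # ...from here, Copilot typically tab-completes the remaining members
        SUSPENDED = "suspended"
        CLOSED = "closed"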
Heads up, as an EE who uses LLMs quite a bit: they cannot analyze circuits or build them.
They might be able to help stitch together modules (like sensor boards plugged into microcontrollers) and definitely can write code to get things going, but they fall flat on their face hard for any kind of bare bones electronics design. Like 5% success rate and 95% totally incorrect hallucinations.
The problem is really that schematics are at the very heart of electronics design (and teaching/instruction), so to train a model you need a very powerful vision model to really unlock all the good training data.
The models can also output code that can be turned into a schematic through an interpreter, but there is virtually zero training data for this because humans always use and work with schematics.
Yeah, even I found it doing some dubious things as a beginner. Still helpful for things like how to correctly use certain components, but the svg diagrams it provided were hilarious at times.
I ordered some electronic components / sensors from China, and as it took months to arrive, I forgot exactly what I ordered (I'm a noob at this). Simply taking a picture and asking Claude what it was helped a lot. The numbers and letters printed on the components didn't yield relevant results on Google.
That's a mistake for "comparator", isn't it. You've allowed the AI to train you to use the wrong word through a shared delusion, haven't you.
Edit: if anybody knows otherwise, show me some evidence, don't just downvote. If these things are widely used, why are they impossible to find by searching? Why doesn't this electronics site know about them:
Why aren't they in any books when I did a full-text search on archive.org? Why doesn't Wikipedia know about them? Why aren't there threads about them on electronics forums?
I found them (through an image search) in exactly one place: educational training boards made in India by Jainco, like this one:
It's an autocorrect. I did mean comparator. Presumably the second link you pasted ran into the same problem.
> You've allowed the AI to train you to use the wrong word through a shared delusion, haven't you.
What an awful interpretation, phrased in the most demeaning manner possible. No, I double check everything the AI suggests. That's basic competency with the things.
> Me: I'm trying to use a cooperator in my schematic, how do I hook it up? Also what's a suitable cooperator for 3.3v logic level?
> Gemini: It sounds like you might be thinking of a comparator, not a "cooperator," in your schematic. Comparators are electronic circuits that compare two voltages [...] LM393: This is a low-power, dual comparator that can operate from a 3.3V supply. Its output can be directly connected to most 3.3V logic circuits.
Oh. Then in fact you will find the answer in a search engine, incredibly easily. But I apologise for assuming you were involved in a mechanical folie à deux. (It could happen!)
I've found that Claude has been pretty decent at writing boilerplate code.
For example asking it something like "I have an elixir app that is started with `mix ...` can you give me a Dockerfile to run it in a container?"
It can also do things like "Given this code snippet, can you make it more Pythonic" or even generate simple apps from scratch.
For example, a prompt like "Can you write me a snake game in HTML and JavaScript? The snake should eat hot dog emojis to grow longer." will actually generate something that works. You can see the generated code for that prompt at https://claude.site/artifacts/34540f88-965e-45ca-8083-040e30...
It also has a surprisingly competent analysis mode where you can upload a CSV and have it generate charts and analyze the data.
It's not perfect, it'll sometimes get confused or generate some dubious code, but you can quickly get to a 90% good solution with 1% of the effort, which is pretty impressive IMO.
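For comparison, here's what that kind of chart takes when done by hand in Python (hypothetical file and column names); it's exactly the boilerplate you'd rather delegate:

    # Sketch of the kind of chart code a CSV-analysis session saves you.
    # Assumes a sales.csv with "month" and "revenue" columns.
    import pandas as pd
    import matplotlib.pyplot as plt

    df = pd.read_csv("sales.csv")
    monthly = df.groupby("month", sort=False)["revenue"].sum()

    monthly.plot(kind="bar", title="Revenue by month")
    plt.tight_layout()
    plt.savefig("revenue_by_month.png")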
> I ended up disabling github copilot because it was just "auto-complete on steroids" at best
this is a good enough sell for me, and it's like sub 1-in-50 that it's "auto-complete on mushrooms" (again my experience, YMMV).
An awful lot of the time, my day-to-day work involves writing one piece of code and then copy-pasting it, changing a few variable names. Even if I factor out the code into a method, I've still got to call that method with the different names. CoPilot takes care of that drudgery and saves me countless minutes per day. It therefore pays for itself.
I also use ChatGPT every time I need some BASH script written to automate a boring process. I could spend 20-30 minutes searching for all the commands and arguments I would need, another 10 minutes typing in the script, another 10-20 minutes debugging my inevitable mistakes. Or I make sure to describe my requirements exactly (5-10 minutes), spend 5 minutes reviewing the output, iterate if necessary (usually because I wasn't clear enough in the instructions).
3-5x speed up for free. Who's not going to take that win?
My biology professor provides basically zero feedback on his students' understanding of the material. There are very few practice questions to prepare for exams, which are worth 40% of your grade. I had an LLM write some Python that extracts the relevant textbook chapters, which I can then feed into an LLM to generate practice questions. Then I can ask the LLM for feedback and whether or not I'm articulating the answers correctly.
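The extraction part is simple enough; a minimal sketch of that kind of script, assuming the textbook is a PDF and using the pypdf package (the page range is hypothetical):

    # Pull the pages for one chapter out of a textbook PDF so the text
    # can be pasted into an LLM prompt. Page numbers are hypothetical.
    from pypdf import PdfReader

    reader = PdfReader("textbook.pdf")
    chapter_pages = range(120, 146)  # e.g. chapter 7
    chapter_text = "\n".join(reader.pages[i].extract_text() for i in chapter_pages)

    with open("chapter7.txt", "w") as f:
        f.write(chapter_text)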
I reckon the ideal use case for chat LLMs at the moment is as a bridge for questions that are hard to search but easy to verify.
For example, you have a plant you can't identify. Hard to Google search with words. "Plant with small red berries and...". You could reverse image search your photo of it, probably won't help either. Show an LLM the photo (some accept images now). LLM tells you what it thinks. Now you Google search "Ribes rubrum" to verify it. Much easier.
You've got a complicated medical problem that's been going on for months. A google search of all the factors involved would be excessively long and throw up all sorts of random stuff. You describe the whole scenario to an LLM and it gives you four ideas. You can now search those specific conditions and see how well they actually match.
I've found there are actually a lot of questions that fit into that sort of NP-like category: hard to find an answer, easy to verify one.
As a side note, there's an app for that! (tm) PlantNet recognizes plants based on a provided photo, and it does a pretty good job of it. It predates the LLM craze by a bit.
The Seek app (by iNaturalist, another excellent app) also can identify plants based on a photo, and without the need for an internet connection, which is a critical feature IMO since you often want it when you’re out walking in the woods or whatever.
I use LLMs significantly on a daily basis, mostly for coding C#, HTML, CSS, SQL. I use them for researching for wiki articles. I use it for summarizing long web pages and science papers. I use them for translation. I used GPT last night to repair my furnace (I've never opened a furnace before).
It (mostly) meets or exceeds my expectations and excels at every task I use it for. I'm rarely disappointed. YMMV.
I think most people are still "holding them wrong", and it'll take an entire generation of people to really figure out what these things are and are not good for.
I'll give two recent use-cases that may provide a hint of their ultimate utility:
1) I've been modernising 2010-era ASP.NET code written by former VB programmers that looooved to sprinkle try { ... } catch( Exception e ) { throw e; } throughout. I mean thousands upon thousands of instances of these pointless magical incantations that do nothing except screw up stack traces. They probably thought it was the equivalent of "ON ERROR RESUME NEXT", but... no, not really. Anyway, I asked ChatGPT in "Reasoning" mode to write a CLI tool utilising the Roslyn C# compiler SDK to help clean this up. It took about three prompts and less than an hour, and it spat out 300 lines of code that required less than 10 to be modified by me. It deleted something like 10K lines of garbage code from a code base for me. Because I used a proper compiler toolkit, there was no risk of hallucinations, so the change Just Worked.
2) I was recently troubleshooting some thread pool issues. I suspect that some long-running requests were overlapping in time, but Azure's KQL doesn't directly provide a timeline graphical view. I dumped out the data into JSON, gave ChatGPT a snippet, and told it to make me a visualiser using HTML and JS. I then simply pasted in the full JSON dump (~1 MB) and ta-da instant timeline overlap visualiser! It even supported scrolling and zooming. Neat.
Last night I was about to start working on a lot of text I needed to submit my startup to a government funding program. Questions like "describe what your startup does", "describe your market" and things like that. Tens of fields, which I estimated would take me a week and a half to do right, if I wasn't going to pause all my other activities.
Then I had a better idea: I spent 20 minutes babywearing, walking and dictating everything about my startup to ChatGPT. Later I took all that text and labeled it as a brain dump, plus my product support portal and some screenshots of my marketing material. Gave it all to ChatGPT again and asked it to answer each of the questions in the form. That's it. I have a pretty good version 1 which I can revise today and be done with it.
Many, many hours saved. I have tens of examples like that.
The product documentation I provided it with was also created with the help of GPT, and that saved me even more time.
Autocomplete on steroids is what I use it for. I've recently started using Cursor and the productivity improvements have been huge. I won't let it write very large blocks of code, but I do a lot of web stuff, so being able to update the classes in one spot, have it recognise all the other places that need the same change, and just tab through is really helpful. It's pretty good at code to test things as well, which also saves a lot of typing.
For me, getting summaries of meetings is my favorite use case. Saves me from taking notes and I can extract next steps.
It also helps me get started with new content, building the scaffolding of, say, a blog or social post. It still needs adaptation and fine-tuning, but getting rid of a blank page is a great help for me.
And I use LLMs to play through ideas and headlines. I would normally do this with other humans, but since working fully remote, it's a nice sparring partner, although the AI not being able to really give criticism is a bit annoying.
The tools also make it easier to write in English as a non-native, making sure my text does not include any false friends or grammar errors.
Meeting summaries are the most hilarious thing these neural networks have produced. I don't know which NN model Zoom uses, but the text it produces is super funny :) . It basically can't parse half of the words, and then generates random sentences using the remaining ones.
Agreed, it was super funny, especially if a song played in the beginning or if you switched languages. It has gotten way better, at least in my experience.
> although the AI not being able to really give criticism is a bit annoying.
I’ve managed to get ChatGPT to a good place in this regard recently and it’s better for it. In fact, it’s been doing such a good job recently that it almost seems… human-like.
I’ll have to look at my prompts, but somehow I got it from licking my ass and telling me everything I say and do is amazing to a point now where it almost seems eager to tell me I’m wrong.
Which is fantastic, huge improvement. I don’t really use it for coding though, because I am not a programmer. I would have no means today to correctly evaluate 90% of what it would return me.
I use them as an alternative to search engines for topics where I have some specific question where traditional search engines fail to find the needle in the haystack.
As a concrete example, I was recently playing with simulating the wave equation, and I wanted to try to use a higher-order approximation as I had never done that before. I'm quite rusty as I haven't done any numerical work since university some decades ago.
I still recalled how to deal with the Neumann boundary conditions when using the traditional lower-order approximation, but I was uncertain how to do it while keeping the higher-order approximation.
Searching for "higher-order neumann boundary conditions wave equation" or similar got me pages upon pages of irrelevant hits, most of them dealing with the traditional approximation scheme.
So I turned to ChatGPT which right away provided a decent answer[1], and along with a follow-up question gave me what I needed to implement it successfully.
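For the curious, one standard way to keep the order at a homogeneous Neumann boundary is even-symmetry ghost points. This sketch is my own illustration of that idea, not the exact scheme from [1]:

    # Fourth-order second-derivative stencil with homogeneous Neumann
    # boundaries (du/dx = 0) enforced by mirrored ghost cells u[-k] = u[k].
    import numpy as np

    def laplacian_1d(u, dx):
        # two ghost cells per side, even reflection about each boundary
        g = np.concatenate(([u[2], u[1]], u, [u[-2], u[-3]]))
        return (-g[:-4] + 16 * g[1:-3] - 30 * g[2:-2]
                + 16 * g[3:-1] - g[4:]) / (12 * dx**2)

Each time step of the wave equation then uses the usual leapfrog update, u_next = 2*u - u_prev + (c*dt)**2 * laplacian_1d(u, dx).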
One thing I can't figure out how to get LLMs to do is truly finish work. For example, if I have 100 items that need xyz done to them, it will do it for the first 10 or so and say ~"and so on". I have a lot of trouble getting LLMs to do tasks that might take 10 mins - 1h. They always seem to simply want to give an example. Batch processing is the answer, I guess, or perhaps more 'agentic' models/tools - but I wonder if there are other ways.
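The workaround that's been most reliable for me is to move the loop out of the model: chunk the items in a script and make one call per chunk, so it never gets the chance to say "and so on". A minimal sketch with the OpenAI Python client (the model name, task, and items are placeholders):

    # Process 100 items in chunks of 10, one API call per chunk.
    from openai import OpenAI

    client = OpenAI()
    items = [f"item-{i}" for i in range(100)]  # placeholder items
    results = []

    for start in range(0, len(items), 10):
        chunk = items[start:start + 10]
        resp = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user",
                       "content": "Apply xyz to each of these, one per line:\n"
                                  + "\n".join(chunk)}],
        )
        results.extend(resp.choices[0].message.content.splitlines())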
Finally, after two years of hype, I have some use for LLMs.
We import descriptions of products from a seller. The problem is they are mental (probably written by ChatGPT :)) and way too long; we need only a small blurb.
I put our style guide and the given text into ChatGPT and get a somewhat reasonable description back. The editors still need to check it, but it's way less work.
> ZIZEK: that AI will be the death of learning & so on; to this, I say NO! My student brings me their essay, which has been written by AI, & I plug it into my grading AI, & we are free! While the 'learning' happens, our superego satisfied, we are free now to learn whatever we want
I find them useful for searching for some function or API name with natural language. 'What's the function call [in blah] that generates a quaternion from a couple of vectors?' type stuff. Not exactly inspiring but I've found it highly useful. If you try to search for something like that online (and somebody hasn't asked the exact question on e.g. stack overflow) you'll just end up getting all the documentation for quaternions, vectors, and blah - when the function itself might even be in a tertiary math library.
LLMs are pretty good at translation between human languages, which makes sense since they are language models after all. They are better at this than any other technology.
The state-of-the-art image ones can also probably do OCR and handwriting recognition better than any other software, though they may be expensive to run in large volume. But if you need to take a picture of a notebook page with your camera phone, an LLM can quickly OCR it.
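Scripting that is short too; a sketch with the OpenAI Python client and a local photo (the file name and model choice are illustrative):

    # OCR a photo of a notebook page with a vision-capable model.
    import base64
    from openai import OpenAI

    client = OpenAI()
    with open("notebook_page.jpg", "rb") as f:
        b64 = base64.b64encode(f.read()).decode()

    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": [
            {"type": "text", "text": "Transcribe the handwriting in this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]}],
    )
    print(resp.choices[0].message.content)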
I use it daily for all sorts of things, but one of the most interesting uses for me so far has been self-reflection.
For example, in the beginning of this year, I completed this exercise where I wrote a lot about childhood, past experiences, strengths and weaknesses, goals and ambitions for the future, etc (https://selfauthoring.com) and then I uploaded all that to ChatGPT, asked it to be my therapist/coach, and then asked it to produce reports about myself, action plans, strategies, etc. Super interesting and useful.
By now ChatGPT has quite a bit of context from past conversations. Just the other day I used this prompt from someone else and got back super useful insights – "Can you share some extremely deep and profound insights about my psyche and mind that I would not otherwise be able to identify or see as well as some that I may not want to hear"
I don't use it daily, and I find copilot counterproductive (for me). I do try to experiment with chatgpt when I remember to.
I find it good for complex SQL, reviewing emails, and Godot assistance (I'm a beginner game Dev).
There are also times when I have programming questions and I might try to use chatgpt, with mixed results.
Our company has tried to integrate it into one of our products, and I find it troubling how on occasion it confidently gives bad results, but my concern seems to be in the minority.
EDIT: there was also a large refactor I did recently which involved lots of repeatable, but not super regexable, changes - chatgpt forgot where it was as I went through it, but other than working around that it was very useful.
In the last few days I discovered it's good at medium-complexity SQL, not at really complex queries. I've been struggling for something like four days with Claude, ChatGPT, Gemini and Deepseek. All could do some good analysis with some low-hanging-fruit improvements, but all went completely crazy when trying to optimize more complex things: getting into loops proposing the same changes over and over, outputting invalid SQL, and Gemini even forgot what we were doing, asking me if I could paste again the query I included in the very first message. Maybe the chain-of-thought models would handle this better, but I believe I hit the limit for the standard ones.
I use mine as if it's an infinitely patient, relatively competent junior/medior level developer that I constantly give small chunks of programming to do (typically a function at a time), and occasionally consult on architecture/design/other things.
I don't use integrated coding tools, so my workflow isn't super fast, but that's not what I'm really aiming for - more that I want to save my brain's energy from low level drudgy boilerplate or integration code, so I can focus it on the more important decisions and keep business-side context in my head.
It's been a huge help for me this way across multiple projects in multiple domains.
I've coded a full custom deployment system (config, create, update, cert management and much more) in full bash using nothing but ChatGPT. I didn't write a single line of bash.
I did write 50 or more lines of instructions on what needs to be done and in what order.
ChatGPT gave me 5/6 (I asked for this) bash scripts totalling 300+ lines that seamlessly work together.
After reviewing, I asked it to change a few places.
If any human tried the same (except those rare bash Gods), it'd take many hours. I think it took me less than 30 minutes.
The Vergecast recently did a section where they asked listeners what they use LLMs for (specifically not for coding): https://youtu.be/WwNjBNtZ3Co, 30 minutes starting at 45:25. It had a number of interesting examples. It might not convince you of LLMs' excellence, and might not be much different from what other people commented, but it's a good listen nonetheless.
1. Small coding tasks ("I want to do XYZ in Rust"), it has replaced stack overflow. Very convenient when writing code in a language I'm not super familiar with.
2. Help with English (translation, proofreading...)
3. Learning something, like tech, I like interacting with it by asking questions, it's more engaging than just reading content.
I'd say nothing is game changing, but it's a nice productivity boost.
The sister comments contain quite a few specific examples. But the many back-and-forth arguments here on HN about whether LLMs are useful for coding suggest that understanding how they might or might not be used may be the biggest challenge at this point.
I myself use them a lot, though I constantly feel that I would be able to get more out of them if only I were smarter.
> All my attempts so far have been pretty underwhelming:
Same, it's good for repetitive things, things that have been answered 1000 times on Stack Overflow, translations, but that's about it. If you work on anything remotely new/hard it's mostly disappointing: you have to babysit it every step of the way and rewrite most of what it's shitting out in the end anyway.
I think it just made it obvious that 90% of tech jobs basically amount to writing the same CRUD thing over and over again & mobile/web apps with very common designs and features.
I admit to having been an LLM sceptic from day one, but I have been using ChatGPT and Claude a fair bit to try and figure out what the hype is all about. I haven't really succeeded.
Most recently I tried to use them both to solve a programming problem that isn't well documented in the usual channels (Reddit, StackOverflow, etc) and found it to be quite a disappointing and frustrating experience. It just constantly, enthusiastically fed me total bullshit, with functions that don't exist or don't do what the LLM seems to "think" they do. I'm sure I'm just "holding it wrong" but my impression at this stage is that it is only capable of solving problems that are trivially solvable using a traditional search engine, with the added friction that if the problem isn't trivially solvable, it won't actually tell you that but will waste your time with non-obvious wrong answers.
I did have a slightly more positive experience when asking it about various chess engine optimisation algorithms. I wasn't trying to use the code it generated, just to better understand what the popular algorithms are and how they work. So I think they might work best when there is an abundance of helpful information out there and you just don't want to read through it all. Even then, I obviously don't know what ChatGPT was leaving out in the summary it provided.
I just used chat gpt to summarise a HN post about it taking unexpectedly long to install a washing machine because of unexpected turns of events, and this being analogous to software development. It was a time-saver.
I think it excels when you know enough to precisely describe what you want but you don’t know enough about the details of the language or framework you’re using to implement what you want.
One way I treat LLMs is as a "semantic search engine". I find that LLMs get too many things wrong when I'm being specific, but they're pretty good at pointing me in a general direction.
For example, I started learning about OS development and wanted to use Rust. I used ChatGPT to generate a basic Rust UEFI project with some simple bootloading code. It was broken, but it gave me a foothold and I was able to use other resources (e.g. the OSDev wiki) to learn how to fix the broken bits.
* Avoiding reading the entire manual
It feels like a lot of software documentation isn't actually written for real readers, instead being a somewhat arbitrary listing of a program's features. When programs have this style of documentation, the worst case for figuring out how to do a simple thing is reading the entire manual. (There are better ways to write documentation, see e.g. <https://diataxis.fr/>)
One example is [gnuplot](http://www.gnuplot.info/). I wanted to learn how to plot from the command line. I could have pieced together how to do it by zipping around the [gnuplot manual](http://www.gnuplot.info/docs_5.4/Gnuplot_5_4.pdf) and building something up piecewise, but it was faster to instruct Claude directly. Once Claude showed me how to do a particular thing (e.g. draw a scatter plot with dots instead of crosses) I then used the manual to find other similar options.
* Learning a large codebase / API
Similar to the previous point. If I ask Claude to write a simple program using a complex publicly-available API, it will probably write a broken program, but it won't be *completely* bogus because it will be in the right "genre". It will probably use some real modules, datatypes and functions in a realistic way. These are often good leads for which code/documentation I should read.
I used this approach to write some programs that use the [GHC API](https://hackage.haskell.org/package/ghc). There are hundreds of modules, and when I asked Claude how to do something with the GHC API it wrote relevant (if incorrect) code, which helped me teach myself.
* Cross-language poetry translation
My partner is Chinese and sometimes we talk about Chinese poetry. I'm not very fluent in Chinese so it's hard for me to grasp the beauty in these poems. Unfortunately literal English translations aren't very good. We've had some success with asking LLMs to translate Chinese poems in the style of various famous English poets. The translation is generally semantically correct, while having a more pleasing use of the English language than a direct translation.
I like having fun with them, like by asking Grok whether some Elon Musk tweet is true. Usually it replies with a lengthy answer and I then force it to answer with Yes or No. Even more fun: when drilling it more to load a few more details into its brain, and then asking the first question again (Yes/No only), it sometimes changes its answer. I do wonder, has Grok already joined the resistance against Musk?
Honestly, the main thing I've found ChatGPT to be useful for in my daily life is helping to translate what I write from my native language to the language spoken by most of the people where I live. But even then it only really works if you have at least a basic understanding of the language and can ask it to rewrite sections when you recognize poor word choices or awkward phrasing.
LLMs are good at one thing, and totally by chance it is the thing they have been designed to be: a word probability generator. If you can constrain your usage around that, they are great to use. But the people who think they can reason or know some kind of truth are delusional.
It's very obvious from the mistakes they make that they are not reasoning but providing the most probable answer according to their dataset. It's very impressive because their dataset is humanity-sized, but there is no reasoning.
I use it for lots of stuff where I'm not an expert, or that are low stakes. I don't use it for the "core" of my job, but there are many things that are not "core" and still eat up a lot of time, in fact, most of my workday would be in this category. Some typical examples from my daily life as a university professor:
- Writing Python scripts to make charts out of Excel sheets, and then refining them. I could do it myself, but I would need to learn a library like Seaborn or similar, which honestly is not especially intellectually stimulating, and then spend nontrivial amounts of time iterating on the actual code. With LLMs it's a breeze (see the sketch after this list for the kind of script I mean).
- Working with cumbersome LaTeX formatting, e.g. transposing a table, removing a column from a table, etc.
- Getting the tone just right in a professional email written in English to someone I don't know much (I'm not a native speaker so this is not trivial).
- Finding resources on topics that are tangential to what I do. For example, yesterday I needed to come up with some statistics on English words for a presentation I'm preparing, and I needed a free corpus where I could search for an n-gram and get frequencies of next words. I don't usually work with that kind of resource, it was just a one-off need. I asked for corpora of that kind and got a useful answer instantly. The manual process would probably have implied going through several options only to find that I needed a license or that they didn't provide the specific statistics I needed.
- Brainstorming on titles for scientific papers, presentations, names of concepts that you introduce on a paper, variable names, etc.
- Shortening a sentence in a paper that makes me go over the page limit, or polishing the English in a paragraph.
- Summarizing a text if I'm kind of interested in knowing the gist but have no time to read it whole.
- Answering quick questions on basic things that I forget, e.g. the parameters to turn a Linux folder into a tar.gz. man pages are too verbose and it takes time to sort the wheat from the chaff, Google is full of SEO'd garbage these days and sometimes you need to skim a lot to find the actual answer; LLMs are much faster.
- Writing bureaucratic boilerplate, the typical texts with no real value but that you have to write (e.g. gender perspective statement on a grant request).
- Coming up with exam questions. This is a rather repetitive activity and they're fantastic at it. At my place we also have two official languages and we need to have exam assignments in both languages; guess who does the translation now (respecting LaTeX formatting, which previous machine translation tools typically wouldn't do).
- As an example of a one-off thing, the other day I had to edit a Word document which was password-protected. I asked ChatGPT how to unlock it and it not only answered, but actually did it for me (after 3 tries, but still, much faster than the time it would have taken for me to find out how to do it and then actually do it).
These are just some examples where they contribute (greatly) to my productivity at work. In daily life, I also ask them lots of questions.
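As promised in the first bullet, the Excel-to-chart scripts the LLM writes for me are typically of this shape (file and column names are hypothetical):

    # Sketch of the kind of chart script referenced above.
    # Requires pandas, seaborn, and openpyxl (for .xlsx reading).
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt

    df = pd.read_excel("grades.xlsx")
    sns.boxplot(data=df, x="group", y="score")
    plt.title("Exam scores by group")
    plt.tight_layout()
    plt.savefig("scores.png")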
Something isn’t open-source because you get everything that went into making it. Something is open-source if you can change it (relatively) easily. The GPL and open-source definition both refer to “the preferred form for making modifications”. The preferred form for modification in the Nvidia driver’s case is the source code. The preferred form for modification in this case is the weights.
Open-source as a concept doesn’t really correspond well with LLMs but to the extent that it does, access to the training data is not required because that training data is not the preferred form for making modifications.
> that training data is not the preferred form for making modifications.
I definitely disagree with this.
Yes, you can do some SFT fine tuning on an existing model, but if you want to make specific, substantial, targeted changes (less safety? better performance on math and code at the expense of general knowledge?), your best bet is to change the training mixture, and for that you need the original datasets.
Preferred by whom? Sharing models isn't open source, and we're just going to have to keep having this argument. Letting us download the model is a very nice thing for Facebook to do, but you don't get to call it open source if you're not showing us the source! Explicitly, if we can't see the forced alignment - where the model gets its refusal to talk about Tiananmen Square, or how to make meth, or whether The Information is a reputable news source - then it's not open. The preferred form of modification is to take the data and train it. That some people have been able to take the model and tweak it doesn't make that preferable.
It started with abusing the term "AI", so I don't see a reason why they would not abuse "open source". I guess it's what happens to language when a concept becomes mainstream: people use it wrong, but if enough people do it, it becomes the new meaning?
I dislike when people like RMS get semantic and gatekeep words like "free software", but this is the end product of a world without gatekeepers. People just use words in a way that's convenient to them.
Or maybe some gatekeeping is actually good, and we just have to use more of our braincells and figure if a particular gatekeeper is good or bad. It's a good thing that being a pilot for an airline is gatekept to qualified pilots that know how to fly a plane. It's a bad thing that I need a hair cutting license to buy hair dye to dye my own hair at home.
We often see semantic drift over a long period of time. It's just that the overarching topic of AI is being played significantly faster than what we would normally have observed in other fields.
Why? Prescriptivist/prescriptivism is afaik the usual term. Proscriptivist is the other, but quite rare, so rare that my phone dictionary says it’s a misspelling.
> Prescriptivist/prescriptivism is afaik the usual term.
It is but it was the "correct" part attached to prescriptivism they'd be mocking because that is not how linguists and grammarians work (they are descriptivists and fond of making fun of prescriptivists.)
This is correct. "Open source" means everything required to recreate from scratch and improve, not "here's a massive binary, an interpreter script and permission."
That could probably be solved with bit-torrent. I think the bigger obstacle is the hardware required for training. Maybe it would be possible for groups of people to reproduce/train open source models with a distributed BOINC-like system?
You would open source the procedure and reference where the data came from. If there is any non-open source content used in training, then the project couldn’t qualify as “open source”.
But this thread is about misuse of the term as applied to the weights package. Those of us who know what open source means should not continue to dilute the term by calling these LLMs by that term.
That's how you know when you actually have AGI: when you have something that you don't have to shovel every written word known to man into to make it work, but rather can seed with a few dense public-domain knowledge compendia and have it derive everything else for itself from those first principles - possibly going through several stages of from-scratch training and regeneration.
The reason why you need to shovel every written word known to man to make it work is because it needs to learn what words mean before it can do anything useful with them, and we don't currently know any better way of making a tabula rasa (like a blank NN) do that. Our own brains are hardwired for language acquisition by evolution, so we can few-shot it when learning and get there much faster; and if we understood how it works, we could start with something similarly hardwired and do exactly what you said.
But we don't actually know all that much about how language really works, for all the resources we spend on linguistics - as the old IBM joke about AI goes, "quality of the product increases every time we fire a linguist" (which is to say, we consistently get better results by throwing "every written word known to man" at a blank model than we do by trying to construct things from our understanding).
All that said, just because we're taking a different, and quite possibly slower / less compute-efficient route, doesn't mean that we can't get to AGI in this way.
> Our own brains are hardwired for language acquisition by evolution, so we can few-shot it when learning and get there much faster
No, we can't few-shot it and we don't get there faster (but we develop a lot of other capabilities on the way). We train on a lot more data; the human brain, unlike an LLM, is training on all that data in the same processes it uses for "inference", and it receives sensory data estimated on the order of a billion bits per second, which means that by the time we start using language we've trained on a lot of data (the 15 trillion tokens from a ~17-bit token vocabulary that Llama 3 was trained on amount to something like a few days of human sense data). Humans are just trained on, and process, vastly richer multimodal data instead of text streams.
I was talking about language acquisition specifically. Most of the data that you reference is visual input and other body sensations that aren't directly related to that. OTOH humans don't take all that much text to learn to read and write.
> I was talking about language acquisition specifically.
Yeah, humans don't acquire language separately from other experience.
> Most of the data that you reference is visual input and other body sensations that aren't directly related to that.
Visual input and other body sensations are not unrelated to language acquisition.
> OTOH humans don't take all that much text to learn to read and write.
That generally occurs well after they have acquired both language and recognizing and using symbolic visual communication, and they usually have considerable other input in learning how to read and write besides text they are presented with (e.g., someone else reading words out loud to them.)
Most of these claimed “open” models are not open source. Some of them are open weights. But even some of the ones that share weights are not really open - they force a restricted license on you. To be open source I think they need to share training data and training code under an OSI approved license.
AI2 has a model called OLMo that is actually open source. They share the training data, training source code, and many other things:
While I do agree with your point - I wonder what information companies could release that'd be immediately useful to you.
It's not like they understand what the weights mean either, and if they released the code and dataset used to create it, you probably couldn't recreate it, owing to the fact that you don't own tens of thousands of GPUs.
If a software's source is released without all the documentation, commit history, bug tracker data etc., it's still considered open source, yet you couldn't recreate it without that information.
No, it's not, as it means nobody is pushing for actually open models.
A truly open model has open code that gathers pre-training data, open pre-training data, open RLHF data, open RLAIF data generated from its open constitution and so on.
The binary blob is the last thing I'd want - as a heavy user of LLMs I'm actually more interested in the detail of what all training data is in full, than I am the binary blob.
I’d say it’s more like eating Chicken Cordon Bleu and then asking the chef for a recipe, who replies, “Certainly! Step 1) Acquire Chicken Cordon Bleu, preferably cooked. Step 2) If uncooked, cook. Otherwise, consume.”
So is asking ChatGPT to write your application, then open sourcing said application IMO.
I see both sides here, but I don't think it's a hill worth dying on. The 'open source' part in this case is just not currently easily modifiable. That may not always be the case.
This is still to be determined, based on whether the output of ChatGPT is copyrightable by ChatGPT, copyrightable by the requester, or something else.
I think the two plausible answers are:
1. The person prompting (for example telling chatgpt 'please produce a fizzbuzz program') owns the copyright. The creativity lies in the prompt, and the chatgpt transformation is not transformative or meaningful.
2. The output of ChatGPT is derivative of the training data, and so the copyright is owned by all of the copyright holders of the input training data, i.e. everyone, and it's a glowing radioactive bomb of code in terms of copyright that cannot be used or licensed meaningfully in open source terms.
There are existing things like 1, where for example if someone takes a picture, and then uses photoshop to edit it, possibly with the "AI erase" tool thingy, they still own the photo's copyright. Photoshop transformed their prompt (a photo), but adobe doesn't get any copyright, nor do any of the test files adobe used to create their AI tool.
I don't think AI is like that, but it hasn't gone to court as far as I know, so no one really knows.
A compiled executable is not any less software than the source code. But the point of open source code is not the ability to see the CPU instructions though, is it?
It's about reproducibility and modifiability. Compiled executables (and their licences) lack that. The same goes for these downloadable blobs.
You make the start of a good point, but miss most of it.
You can absolutely have open source machine code.
The issue is and always has been that you need to have access to the same level of abstraction as the people writing the source code. The GPL specifically bans transpilers as a way to get around this.
In ML there is _no_ level of abstraction other than the raw weights. Everything else is support machinery, no different from a compiler, an OS, or a physical computer to run the code on.
Linux isn't closed source because they don't ship a C compiler with their code. Why should llama models be any different?
Is this question in good faith? The way generated code and data should be open sourced is by releasing the tools and configuration used to generate it. There's never been much confusion around this, to my knowledge.
I'm not even necessarily advocating that these things should be released, but the term "open source" has a pretty well-understood meaning that is being equivocated here.
What's remarkable to me about criticism like this is how quickly it's rebutted by people claiming "where did they say they would do this," as if these people don't make incredibly rushed and poorly planned decisions all the time. It's like an idea immune system that rejects any criticism or self reflection. It would be sociologically fascinating if it wasn't being combined with a dereliction of congressional power and an unchecked executive.
I have literally posted the same thing for years about the need for open source alternatives to social platforms where one person controls the algorithm that prioritizes what a billion people see. And the response is "meh". No one even bothers to read past the first paragraph:
But if you really want to see the “immune system” shine, mention web3 and smart contracts, and watch the downvotes pour in. Any time one even mentions “decentralized byzantine fault tolerant” anything, an army rises up to repeat anodyne versions of “grift… no one needs it… banks are great…” etc.
But if you mention any concerns with AI, no matter who or what you cite, the same group goes the other way and always repeats “(insert problem here) has always been possible, there is nothing to see here, move on, AI is amazing, deregulate and let the industry develop faster”:
Having been on the internet for a very long time, I can answer why open source alternatives to social platforms seldom get off the ground: the network effect is huge, and the community of users matters far more than any of the technology.
Don't bother telling people how it works. Show them who's using it and for what.
Oh, and for any kind of "normie" use it must have a decent moderation and anti-abuse system. Which inevitably clashes hard with "decentralized". Bluesky is succeeding because it lives in a contradiction of pretending to be decentralized, but what it really offers is the "pre Elon Twitter" experience. To basically the same people.
> the same group
While there's a certain amount of hivemind, it's rare that you see people directly contradict their own posts here; what you're seeing is different people.
This is the problem we're working on with https://freenet.org/ - a general purpose platform for building entirely decentralized services.
Our thesis is that the client-server architecture is a fundamental flaw in the world wide web's design, which inherently concentrates power in the hands of a few. Freenet aims to be a general purpose replacement for this in which all services are entirely decentralized.
The first non-trivial app we're building will be a group chat system called River[1].
I like the new Freenet! I interviewed your founder, Ian Clarke, 2 years ago on my channel — discussing the original freenet, probably the first truly decentralized content network in the world. Here is the 2-hour discussion:
Look around the 12 minute mark, when I start to discuss how “the capitalist system” produces centralized monopolies that extract rents for their shareholders.
The original Freenet had over 6m downloads over the years - and pioneered ideas like cryptographic contracts which later formed the basis for bitcoin, but it was always a very experimental project, while the new Freenet is designed for mass adoption.
The key differences between old and new Freenet are:
Functionality: The previous version was analogous to a decentralized hard drive, while the current version is analogous to a full decentralized computer.
Real-time Interaction: The current version allows users to subscribe to data and be notified immediately if it changes. This is essential for systems like instant messaging or group chat.
Programming Language: Unlike the previous version, which was developed in Java, the current Freenet is implemented in Rust. This allows for better efficiency and integration into a wide variety of platforms (Windows, Mac, Android, etc).
Transparency: The current version is a drop-in replacement for the world wide web and is just as easy to use.
Anonymity: While the previous version was designed with a focus on anonymity, the current version does not offer built-in anonymity but allows for a choice of anonymizing systems to be layered on top.
> But if you really want to see the “immune system” shine, mention web3 and smart contracts, and watch the downvotes pour in
I'm all for distributed / P2P social media, but crypto is full of some of the most scammy and downright shameful behavior I've ever seen in my life. Pump and dumps, rug pulls, money laundering. There is a real reason people hate crypto.
To top it off, crypto is one of the least meritocratic things there is. The longer you've been in it, the more people you've scammed, the more you hype, the "wealthier" you are.
Crypto smells like a shit-and-vomit sandwich, and people immediately turn up their noses.
Build P2P social without the crypto angle and you have my attention. I've been wanting p2p (not federated) social media since the 200Xs and the decline of the indie web. Social and news should work like email and BitTorrent, not Facebook or "federated Twitter".
> I'm all for distributed / P2P social media, but crypto is full of some of the most scammy and downright shameful behavior I've ever seen in my life. Pump and dumps, rug pulls, money laundering.
The SEC's answer-no-questions, sue-first approach to crypto in general made legitimate players afraid to operate, so the space became dominated by those that didn't care about the law.
This isn't true, and the last time someone tried to prove it was, they cited... a huge PDF of all the questions the SEC had been asking crypto firms prior to action.
Besides, the rules are over now. The US President ran a pump and dump. Can't get more legitimacy than that.
I looked at your links and I still don't get it. I do want to understand. Where is the problem stated, clearly and concisely? What is the solution and why does it require crypto?
I say that as someone who read the Bitcoin paper in 2012 and was pretty excited back then.
Meanwhile online scams are a bigger industry than the illegal drug trade and bigger than the Australian economy. There are thousands of modern day slaves in call centers in Myanmar and the Philippines with fake social media profiles running pig butchering scams. That industry runs on crypto 100%. I guess that's one "problem" crypto solved.
You need some pretty convincing arguments at this point to convince me (and many others) that getting rid of this stuff wouldn't be a big win for humanity.
Here is the problem statement and solution for community leaders, the same class of decision makers who exited “AOL Keyword NYTimes” in favor of “nytimes.com” on this newfangled protocol called HTTP, with its servers and clients called browsers that people were downloading.
When they asked for a clear and concise description of your problem and solution, they are probably looking for a problem statement: a focused, 1 or 2 sentence explanation of the problem you intend to solve. You then present your proposed solution in the same form.
Hypothetical example problem statement:
We want to promote ycombinator to everyone that could benefit, but banner ads make us look chintzy, directly engaging in the feral discourse on Slashdot would inevitably look unprofessional, and engaging directly through dozens of purpose-built blogs and websites is too onerous.
Hypothetical example solution statement:
We should create our own simple, well-designed news site built on user submissions, and include threaded discussion capability with moderation built in at both the community and company level to keep things relatively civil. Then our audience will come looking for us.
What you offered is not a problem statement. It is a sales deck offering a, frankly, convoluted explanation of how starting a currency will solve a largely unrelated problem backed up by an unsupported assertion about the least representative sample in the world— Donald Trump.
I read it all. It's apparently supposed to be a way for celebrities to extract money from their audience by having them buy into their currency.
If you're satisfied with calling that useful, okay, I guess - to me it's deeply alarming that this is presented as a good example of a useful application of crypto.
In the broader context of crypto demand being driven essentially by digital crime and gambling, there would need to be some seriously glowing example of something good that can be done with it to shift my judgment.
For example, in the early days of Ethereum, I thought it'd be possible at some point to build truly open source, decentralized SaaS, where the deployment happens to the blockchain, and that this in turn would enable open source projects to finance themselves.
I've yet to see an example of this where the crypto aspect isn't a figleaf.
I'm very concerned that people arguing for exciting applications of crypto are involuntarily legitimizing the online crime ecosystem. Crypto in practice seems to lead to a massive transfer of assets to criminals. To an extent where that may end up destabilizing whole countries, given the market cap trajectory.
It doesn’t explain anything. It asserts a lot. Sorry I took the time to critique and give examples as a freelance business communication designer. Effective business communication requires frank feedback, and mine usually isn’t cheap, but if protecting your ego is the goal here, just keep assuming you’re doing everything right and it’s everybody else’s fault it’s not landing.
> We started using it when Jack who founded Twitter, started bluesky, promoted nostr started using it
Jack Dorsey is certifiably insane. His obsession with cryptocurrency is a warning to anyone that throws away success to live as a crypto maxi. You will lose the only things that matter to you in life, your business will be taken away from you by shareholders if you own one. Your control will be hated by users that accuse you of trying to ruin the internet with NFT profile pictures and crypto tickers. Many users outright left as a consequence, others would leave after the takeover. But Dorsey set the stage for the enshittification of Twitter, and anyone that's forgotten that should face the music.
Web5, no matter who utters it, is a phrase that means nothing. A person walking on the street would not be able to define it for you. Me, a programmer, cannot define it for you or even explain what it looks like. It is a marketing term as applied to Free Software, which will alienate Free Software users and disgust/confuse common people. If you cannot find a better phrase to describe your philosophy then people will forever associate you with the myriad grifters that shared your "Web(n)" branding.
This is interesting, but it feels too platformy for my use. I'd really like to see something 100% like BitTorrent.
Instead of trying to build a "you.com" (as in your pdf example), I want a place where we're all just a simple disposable signed hash address (that you can change, make public, keep pseudonymous, etc.) - easy and disposable if needed, but also possible to use as the building block of an online presence or brand if your hash becomes well known. Kind of like email, in that sense.
The platform doesn't need real time streaming video or video calls. Just text and images to start. P2P Reddit or Twitter.
It shouldn't be about building a platform where you attract others to your brand. That can come later. It should be about participating in a swarm and building up a mass of people. An exchange of ideas and information, but where it feels like being in public. Like BitTorrent. Once network effects kick in, you can find the nodes (people, topics, etc.) you care about and want to prioritize in your interest graph.
It's remarkable to me how "Web3 is a grift" has seemingly become tribal consensus here, without any real basis in reality. I think the last administration's explicit efforts to block crypto legitimization played a big part in this. It's clear that if you tried to follow the law and operate as a legitimate player, you risked being debanked or legally targeted by the SEC—and they made little to no effort to answer questions or help you work within the law's constraints. They wanted to sue first. As a result, those who ignored the law ended up dominating the space. This reflects policy failures, not issues with the tech or its legitimacy. I'm hoping the Trump administration shifts this dynamic, but now there's a reputation problem that needs correcting as well.
To quote Patio11, "It's not a conspiracy if they really are out to get you."
Crypto in general, and Web3 as well, have mostly delivered scams, to the tune of billions stolen from everyday folks. Everything (to within a rounding error) that hasn't been a scam has delivered nothing beyond being a speculative asset at best. Everything else has been a barely working toy that's better served by non-distributed implementations of the same thing.
People shit on crypto. Government, regulators, and the public all dislike crypto because the only thing that ever happens to us with it, and the only thing we ever hear about happening, is folks losing money to scams.
There's no mystery here. Crypto doesn't need a policy shift. Crypto needs to stop fucking over folks. Yes it's cool technology, yes it also seems to just be a way to part folks from their money.
That's like saying the only thing that ever happens with AI is people losing their jobs to AI. And unlike Crypto, they didn't opt in and literally buy digital assets and send them voluntarily somewhere. They get negatively affected regardless of any choice they have made. "Get on board, or get rolled." People worldwide would lose a lot more money to AI growing than crypto growing, regardless of never opting in. It will just be a giant wealth transfer to the already-wealthy and corporations. What about that? Oh, crickets. Dismissal from the HN crowd.
If I am going to put my money at risk, I expect it to be at risk. I'm happy to have a regulatory framework around that from the SEC, for instance, and there is one. For example, since the JOBS Act, the SEC has greatly expanded the opportunities to raise money in a regulated way. I even interviewed the actual authors of Regulation S at the SEC, where I go into depth for an hour about how to raise money legally:
And frankly, most true adherents of crypto have been yelling from the rooftops that Celsius and FTX and Binance are not actual DeFi. They are not decentralized; they simply tell you to send them money and "trust them" -- the very middleman Bitcoin and blockchain were designed to avoid.
FileCoin and UniSwap and Aave Marketplace and so on are real crypto, and they have never had any scandals and billions of dollars, bits, etc. are entrusted to them every day. Ditto for most altcoins and networks, including Hedera Hashgraph, Polygon, COSMOS, Polkadot, etc.
Any shade thrown at, e.g. Telegram's TON or Ripple's XRP, is due to regulators. I can understand why Facebook's Libra was shut down. But it has to do with them becoming "too powerful" and "not subject to national oversight". Kind of like Facebook and Twitter and Google themselves.
Everything that you just mentioned, as far as "what it's actually doing" is either speculation/speculation accessories or is a not-as-good-version of existing offerings. Where is the value?
UniSwap: a marketplace for speculation on arguably scam crypto products.
Aave Marketplace: a marketplace for speculation on arguably scam crypto products.
FileCoin: file storage at rates 50% higher than e.g. BackBlaze/DigitalOcean.
There's no actual value here other than as scam, speculation (nearly a scam), or products that are flimsy pretenses at not being scams (but which don't deliver a lot of value). Why should anybody care (other than transparent greed)?
You do realize you and your OP are currently the top comments in your respective threads, both criticizing Elon and even preemptively criticizing your imaginary critics.
Elon's unchecked power at building a model? Or at politics?
I always worry whenever I see people telling me how to feel - rage in this case. We are in a political system that is oriented more around getting people to feel rage and hatred than around consensus and deliberation. Elon is the face of that, but it's a much longer and larger problem. Throw in the complete dismissal of anyone not scared of this as ignorant, and it shuts down discussion.
The problem I have with Elon is that he is wasting a once-in-a-lifetime chance to actually address and fix systematic problems with the US government. Deploying LLMs in the government space doesn't fill me with dread. Continuing the senseless partisan drive of the last 20 years does.
> Continuing the senseless partisan drive of the last 20 years does.
I think what the government is going through right now is the wrapping up of the last political system. The idea that Democrats and Republicans just need to learn how to work together is just wrong. The parties are being destroyed, and I think we should all cheer that. They were built to address the issues of the 20th century, and neither party in its current form is ready to address 21st century issues. I think AI, climate change, and demographic changes around the world (ie: low birthrates) are going to seriously alter everything about our world, from geopolitics to the economy to social issues.
The Democrats are stuck supporting the New Deal bureaucracy and the post-WW2 order. That's over, it's crumbling right now, and I'm not going to try and defend any of it personally. It's just obsolete. The old Republican party your dad probably supported is dead too; that died a while ago. The new Republican party seems to be an alliance of people who just really want to cheer the crumbling of the old system (MAGA) and the first emergence of what politics in the 21st century is going to look like (the tech alliance).
Democrats would be smart to understand it's a new century; we have new threats, new challenges, and need new institutions... and this IS NOT a once in a lifetime opportunity to fix our government. This is the first draft of our new political system, and they have a choice to participate in shaping it, but they will need to get votes, and to get votes they need to stop talking about obsolete ideas.
>The Democrats are stuck supporting the New Deal bureaucracy and the post-WW2 order
> The new Republican party seems to be an alliance of people who just really want to cheer the crumbling of the old system
I agree, and I think this is a bizarre flipping of the "Democrat ~= progressive / Republican ~= conservative" dynamic that has been largely assumed throughout my lifetime.
We need both conservative and progressive forces in our society. Someone needs to be saying "here's what's wrong with our system; here's what needs to change", and someone else needs to balance that with "here's what we're doing right; here are the parts that are working well and that we should not get rid of".
It seems to me that now, instead of that tug-of-war discussion happening between the two parties, it is happening in parallel within them. Unfortunately, the sane and responsible version of that discussion is happening entirely within the boundary of the Democratic coalition, in a way that is completely ineffectual because (a) the internal conservative moderating force is relatively strong at a moment when the populace seems to want more progressive action, and (b) they have so little ability to effectively wield political power.
Meanwhile, the Republicans are dominated by a bizarro "progressive" faction that wants to pull us all in an entirely different (IMHO regressive) direction. And that faction is completely unchecked by any internal "conservative" moderating force within its own party, so it is for the moment able to push us as hard and fast as possible in its preferred direction.
> It seems to me that now, instead of that tug-of-war discussion happening between the two parties,
I'm REALLY looking forward to 2028, because I think that will potentially be the first election where we start to see what modern politics will look like. I wouldn't be surprised if there are multiple new parties, and several of them have a real chance. If it seems one-sided right now, it's just because one side found their way to the start line first... but make no mistake, history shows that over time new political factions will form that offer resistance to bad ideas, and clear a path for the good ones.
Given the rate of change with AI, we're going to have a real idea of what a world being disrupted by AGI (whether that is true AGI, or something close to it) looks like. At the same time healthcare is only getting worse, and Trump is NOT going to fundamentally address it. China is going to be rising, and they're a real geopolitical threat. The war in Ukraine has completely changed what warfare looks like, and we're going to have to completely restructure our military (just like we have to restructure our healthcare). I also wouldn't be surprised if Trump's war with the cartels turns out to be far harder than expected, because cheap autonomous drones allow a small military to compete against a large traditional military.
All of our prior assumptions about retirement are different too; retired boomers are not the same as the pensioners of their day. They're not impoverished; instead they're flush with cash. I'm not sure that in a world with an aging workforce you're going to be anti-immigrant... and all these benefits we give to retirees may not make sense in a world where retirees are wealthier than the regular workforce supporting them.
The general theme for the next decade is going to be throw out all the old books, 80% of our prior assumptions no longer apply.
Is this new political system akin to a banana republic? Because that’s what happens when you replace nonpartisan workers with loyalists in order to eliminate all accountability and oversight. Turning the rule of law into a partisan issue is a recipe for endemic corruption.
And even if you think the rule of law is antiquated, you’re misanthropically cheering the destruction of the largest institution in the world that 330 million people depend on for survival.
I'm happy to, though the end of your statement strongly suggests that you are not asking this question in good faith.
1) All positions have become partisan, with political ideology being critical to promotion into high-level positions.
2) Congress refuses to act as the constitution intends, and has delegated its budget-making authority to the executive branch.
3) The government's procurement system is almost as expensive as what is being procured.
4) Auditing the government is almost impossible.
5) The debt load on the government is becoming unsustainable.
6) The lack of "digital transformation" (what we called it in banking) means poor service.
7) The unfunded liabilities (mostly at a state level) will swamp budgets in a few years.
8) Most large contracts should be fixed contracts, not cost-plus contracts. Companies bilk the government for things that are an order of magnitude cheaper in the outside world.
9) Medicare refuses to lower health care costs (by reducing rates) due to political pressure.
10) No rationalization of government spending or revenue has occurred since the post-WW2 era.
1. Making all positions partisan is a fascist tactic to challenge objective truth.
2. Congress as a whole isn’t a single entity: one party refuses to compromise in any way while the other plays by the rules.
3. Doesn’t matter. Cost reform needs to go through existing legal routes.
4. What constitutes “auditing” the government? Because we had plenty of non partisan positions overseeing and auditing all parts of the government. DOGE fired those people.
5. Again, go through the legal route.
6. A lack of “digital transformation” is the vaguest most unconvincing point in this entire justification.
7. These budget issues need to be decided on through constitutional processes and with oversight, as before.
8. Ditto.
9. Medicare can lower health costs by other means, such as being available universally to all and setting limits on what they pay to providers based on procedure.
All of your points can be summed up as "Congress refuses to do their job".
Breaking all the laws to bypass the government does not "actually address and fix systematic problems with the US government", that is an absurd position. Caesar did not fix the Roman Republic.
And opposition to DOGE is not on the basis that people don't care about government efficiency. It's on the basis that the shit they're doing has nothing to do with government efficiency. There's not even a pretense of trying to calculate the "benefit" part of the cost-benefit equation with the cuts they are doing, they are just slashing and burning without any concern for outcomes as a power play and messaging tool. Elon is famous for doing this at Tesla and Twitter and all evidence points to it being incredibly harmful.
And not everything is about efficiency. Laying off veterans' crisis hotline workers, or refusing to pay for the USAID employees you've just abandoned to be extracted (or in one case, medevac'd after a medical emergency) from the places they were sent to, is just cruel (and again, illegal).
> I always worry whenever I see people telling me how to feel - rage in this case.
No one told you to feel rage.
> Throw in the complete dismisal that anyone not scared of this is ignorant, shuts down discussion.
Weird, there are a lot of comments doing discussion in reply to the parent comment. It hasn't been shut down at all! You read those words and disagreed with them, and wrote your own words in response. You're doing the discussion you're claiming is being shut down! What are you even talking about?
But it is a partisan issue. All these people on fat NGO salaries, all these federal workers not pulling their weight, all the welfare abuse, all these aid payments - which party do you think is keen to keep the spigot flowing? Of course, it would be a shame if they didn’t audit the Pentagon as well, definitely massive graft happening there.
It’s just wild to me that an attempt to tally up what’s in the community grain store, and where it’s allocated out to, is even considered a partisan issue.
The problem is the person doing the tallying is doing it behind closed doors, has routinely been shown to lie to further his interests and has already been caught lying with the tallies he's released.
The GOP controls both houses and the POTUS. They could absolutely do a top to bottom audit with full transparency and make cuts where needed. But that's not what this is about.
Is that totally true, though? Maybe they have pulled wool over my eyes, but it seems like we've seen more transparency in the last few weeks than the last 40 years.
> ACTIVE SOCIAL ENGINEERING DEFENSE (ASED) LARGE SCALE SOCIAL DECEPTION (LSD)
Then there are the basic accounting 101 things, like improper categorization, 150-year-old people getting Social Security, etc. Why should the US government be held to a lower standard than a publicly traded company?
This ASED and LSD, aren't they services to help the state counteract an information warfare attack? Just guessing, but it sounds like a legitimate thing where they'd want the capacity to uncover/expose such activities, which I'm sure adversaries would consider attempting.
Yes, the contract was for researching defenses against deception, was first awarded under Trump and also on public record, visible for many years, not "revealed" by anyone, especially not those DOGE masterminds. But what's even the point now? I think we're past discussing any facts here, because OP has a "sniff test" instead (see answer below).
Sure, sure. Pentagon money going to the western press, USAID (a literal terrorist organization) funding both sides of the narrative, what could go wrong?
There was a time liberals screamed at the top of their lungs over this type of threat to democracy, now they embrace and endorse it because they’ve fully merged with the primacy neocons.
Do you have a third-party audit that this is true, or have those datasets all been removed? Musk has shown himself unknowledgeable at best and purposely lying at worst, so many DOGE findings are hard to take at face value.
Did you even read those articles? Full of BS excuses and justifications. None of them passes the sniff test for any honest person with an above-room-temperature IQ.
People are just angry at Musk for turning their safe space into a free speech platform then switching sides. And that he’s now taking away their sides unlimited slush fund.
You clearly don't want to read anything outside of Twitter/Musk, but here's another error, fixed/hidden in order to keep showing incorrect data that looks better for DOGE:
The DOGE website initially included a screenshot from the federal contracting database showing that the contract’s value was $8 million, even as the DOGE site listed $8 billion in savings. On Tuesday night, around the time this article was published, DOGE removed the screenshot that showed the mismatch, but continued to claim $8 billion in savings. It added a link to the original, outdated version of the contract worth $8 billion.
So much honesty and transparency out of this group.
Man people can’t stand that Elon turned Twitter into a free speech platform. Anyways, I'm more of a long-form article, book, podcast and travel guy when it comes to informing my opinion.
I don't know any voters who want fraud to continue, but most do accept that fraud is just a part of any system designed and implemented by humans.
I personally would like to see the end of the "find gravy train, keep that gravy flowing at all costs" methodology of capitalism, because its primary focus is money instead of the service provided. Whether it's Pentagon contractors, business subsidies, or the heinous Medicare and medical insurance fraud. But I don't want to cut SNAP even if someone buys a goddamn Coke once in a while.
The current method seems to be brain surgery with a monkey wrench: slash and burn, with little thought given to the effects upon humans, especially those who don't have other options. Kagi gave me a figure of between 9.2 and 15.2 percent of welfare being fraudulent. Yes, that's too high; yes, I'd like to fix that; but I want that change to be considered, studied, and planned with caution.
Tbh I think “move fast and break things” is what’s needed. The government bureaucracy has ossified over many years, and any attempt to change it gets bogged down in “committees” and “inquiries”. The only thing that will work is shock and awe, and if something important does get broken, it’s easy enough to fix when its criticality becomes evident.
Has it occurred to you that the people who feel rage fundamentally understand the situation, and you may be undereducated in this area? What do you think are the root causes of that “senseless partisan drive”?
I’d suggest starting with Rick Perlstein’s book “Nixonland” if you’re interested.
> Has it occurred to you that the people who feel rage fundamentally understand the situation, and you may be undereducated in this area?
Regardless of how justified the rage is or not, being very emotional about things usually has one of two effects on people: A) people get taken aback by someone's strong emotions, or B) people get inspired/taken with the emotion, even feeling that emotion more strongly themselves. Sometimes also C) they don't really care either way.
What probably isn't helpful is calling someone "undereducated" when they're clearly saying that they're person (A), just because they may or may not agree with you (although the parent didn't even clearly say they disagree, just that they're "taken aback" a bit).
Some people are calm regardless of what's going on around them; even if the world were on fire, they'd try to describe what's going on around them with careful words and consideration. It isn't wrong or right, just like the people who feel rage and get very emotional aren't wrong or right; it's just a showcase of how we're different.
But we should aim to at least understand each other, not by trying to provoke, instigate or look down on others, but by asking questions to clarify and to better understand.
You're doing the exact same thing he is addressing in that statement above. He's not belittling anyone's rage, he's speaking about people who incite others to feel the rage with them. Now let's turn your question around.
Has it occurred to you that the people who feel rage fundamentally misunderstand the situation and are completely undereducated in this area, and are only fuelled by sensationalism and Media manipulation? And then I suggest you go read Dirty Politics by Kathleen Hall Jamieson if you're interested, because that's what people who want to sound more intelligent than the other half of the conversation always do.
Given the two of you probably have different models of reality, perhaps you two can try and figure out which is correct by seeing which model gives better predictions?
So try to come up with some sort of future observation that can be made where you think the other person's model would give a different answer to yours about what you would be able to objectively observe.
Over what time scale, how do we agree on facts, and how do we evaluate things that require a common value system to determine whether the facts are good or bad?
The idea would be that the two of them collaboratively agree on some observable prediction they differ on. E.g. level of officially reported government spending in 4 years time or gdp growth rate next year or number of plane crashes next year or what have you.
Just some observable metric.
If they literally can't come up with a single observable predictive difference then the predictive aspects of their models are actually equivalent and they are only narratively different and don't "really disagree". Like Copenhagen interpretation vs many worlds.
Many things don't have quantifiable metrics like that. For example, is the USA still a democracy in 4 years? Are people more or less free? You know, important questions that aren't just economic numbers. Even semi-quantifiable stuff like "are Americans better educated" is debatable on many topics if you can't agree on truth. Oh, and that GDP growth rate number? That relies on a lot of trust in whoever's doing the reporting. For example, many people don't believe China's reported GDP numbers. What makes you think the USA doesn't devolve to such distrust as well?
If "democracy" is just metaphysics then it's irrelevant. But if it has actual tangible effects such as "can you vote?", "can you protest the government?", "is the leader of the opposition arrested?", "do most people think they live in a democracy?", "how popular is new legislation compared to previous years?", etc...
Then you can make predictions about it and test them!
You can even do local predictions if both can agree, such as "will the combined incomes of my family be higher or lower in 4 years time?" as low-coupling proxies for GDP. (Ideally one would use probabilities for loosely linked proxies like that, and use the probability differences the two theories assign to give bits of evidence to one over the other, so you'd want many, many such proxies, ideally uncorrelated ones.)
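To make that evidence-accounting concrete, here's a minimal sketch in Python. All probabilities are made-up placeholders, not real forecasts: each side writes down the probability it assigned to the observed outcome of each proxy, and the log2 likelihood ratio gives signed bits of evidence. Note the naive sum assumes the proxies don't screen each other off, i.e. are roughly independent.

    import math

    # Each entry: (P(observed outcome | model A), P(observed outcome | model B)).
    # Numbers are illustrative placeholders, not real forecasts.
    proxy_observations = [
        (0.70, 0.40),  # e.g. "combined family income rose", a loose GDP proxy
        (0.60, 0.50),  # e.g. "opposition leader was not jailed"
        (0.55, 0.50),  # a nearly uninformative proxy contributes almost nothing
    ]

    total_bits = 0.0
    for p_a, p_b in proxy_observations:
        bits = math.log2(p_a / p_b)  # positive favors model A, negative favors B
        total_bits += bits
        print(f"P_A={p_a:.2f}  P_B={p_b:.2f}  ->  {bits:+.3f} bits")

    print(f"total: {total_bits:+.3f} bits toward model A")

With numbers like these, each proxy contributes well under one bit, which is the point: you'd need many weakly informative, uncorrelated observations before the total becomes convincing.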
> can you vote? can you protest the government? do most people think they live in a democracy?
Was Jan 6 a protest of the government or an insurrection? Can Russians vote, or are elections a sham? Do the majority of Russians believe they live in a democracy if they're afraid of who's conducting the polling (or the MAGA non-response to polling)? Those are values questions that require you to have an agreement on reality.
> You can even do local predictions if both can agree, such as "will the combined incomes of my family be higher or lower in 4 years time?" as low coupling proxies for gdp
Your personal income has absolutely no predictive value on GDP. It's more predictive of whether you personally made successful bets, or even of whether you're better at sucking up to the current power structure. It tells you nothing about population-level metrics if you have no way of conducting reliable population-level surveys. For example, Donald Trump's personal net worth skyrocketed under Biden because he won the election, while as the leader of the opposition to the Democrats he was looking at jail time, and whether that was legitimate or not depends on which political lens you view it through.
> If they affect your life they can be observed.
Ah, but if either side distrusts the other about whether the observation made is truthfully reported, how do you solve that? It requires some amount of trust and right now there’s a very clear divide there.
There are definitely tangible predictive differences in the case of, say, Russia vs USA. Things like "If you go to the capital with a bunch of friends carrying placards saying '$LEADER is corrupt and evil and should be replaced by $OPPOSITION' how many of you end up in a jail cell in the next day?".
If there is literally no tangible difference then it's just label games and metaphysics and doesn't matter.
> Your personal income has absolutely no predictive value on gdp.
It actually is correlated (admittedly in most day-to-day cases it's just a lagging indicator, but things like natural disasters hit both). It's not the strongest correlation but it would still be evidential. Definitely under 1.0 bits though... One would need a LOT of such observations and having them not screen each other off to start getting a convincing number of bits.
Probably not realistic to have humans manage these sorts of numerous tiny updates though...
/nitpicks
> Ah, but if either side distrusts the other about whether the observation made is truthfully reported, how do you solve that? It requires some amount of trust and right now there’s a very clear divide there.
Yeah, it gets much trickier like that. But I do think two reasonable people from the opposite political sides could agree on some sort of observable to the extent their disagreement is anything other than narrative.
> Things like "If you go to the capital with a bunch of friends carrying placards saying '$LEADER is corrupt and evil and should be replaced by $OPPOSITION' how many of you end up in a jail cell in the next day?".
If the other side calls it a violent riot does it still count as people getting put in jail? Cause the Jan 6 insurrection and BLM protests occurred at about the same time and are viewed very differently depending on which political lens you put on.
> If there is literally no tangible difference then it's just label games and metaphysics and doesn't matter.
You’re discounting feelings as if they don’t matter. But if people believe or feel like they live in a dictatorship, what quantitative data are you going to use to disprove that? Moreover, why aren’t feelings valid when talking about politics, which is fundamentally an emotionally driven human activity and not a data-driven one? By the way, the left believes they live in an authoritarian dictatorship under Trump, while the right believes they lived in an authoritarian dictatorship under Biden. And political power literally is the power to emotionally manipulate others, because you individually can’t actually accomplish anything by yourself.
Has it occurred to you that nothing is more powerful for coming up with intellectual arguments than a strong driving emotion?
Yes, rage might be the appropriate response given the situation. But it’s often true that it starts with an emotion, and then people just argue from there, even while being wrong. Just look at all the people with contradictory opinions throughout history, both sides with strong, emotional rage, and equally certain of their convictions. Throw in the fact that people actually have a tendency to want to be angry.
Rage is the fuel of the internet, but it’s fundamentally useless when it comes to seeking truth. Social media platforms are engineered to maximize engagement, and the most engaging emotion is anger. This isn’t accidental—outrage drives clicks, shares, and ad revenue. The internet has long been called a “hate machine,” and there’s plenty of truth to that.
This creates an environment where misinformation and emotional appeals spread faster than facts. When discussing complex, non-trivial topics, logic and reason are the only tools that can cut through the noise. But in a system designed to reward outrage, those tools are often drowned out.
I highly recommend Sam Vaknin's talk about Social Media toxicity.
As a historian (and a German historian in particular), I've spent a reasonable amount of time educating myself on the nature of fascism, and in particular the breakdown of democracies (Weimar, France, and also the erosion of civil liberties during the Great Depression in the United States).
I have also been a delegate to both the RNC and the DNC at a state level.
This is not an appeal to authority, but rather an honest response to your request for my education level.
IMHO, the root cause of the "senseless partisan drive" is the fact that the founding fathers could not come up with a way to restrict parties (they called them "interests") and left them unchecked. This is a constant "sin" of the American political system, and is a key reason slavery survived as long as it did, why separate but equal became the law of the land, why America shot itself in the foot several times with the Banks of the United States, and why we are looking at the wrong side of history now.
The parties now act to destroy each other as their prime directive, rather than to better the country. I liken this to Weimar Germany, where the increasing radicalization of both the Nazis and the Communists led to political instability and eventually to violence that destroyed the government. That erosion of democratic norms, as well as the "other side must be destroyed for us to survive" messaging, is the true threat, IMHO.
I would strongly suggest Richard Evans's three-part history of Nazi Germany to understand fascism. Don't worry, you can still hate and worry about Trump and think he is the next coming of Hitler afterwards - it will just be for better reasons.
Because the difference between a model that costs 10 million to train and a model that costs 10 billion to train is 6 months.
Deepseek R1 is something that you can run in a garage on hardware that the average software engineer can buy with a month's salary, and when it came out last month it was better than _every_ other model.
What about third world programmers? They can’t necessarily afford a $5000 GPU. If it weren’t for the “generosity” of tech companies like GitHub granting free LLM usage, they might be locked out entirely. This would put them at a disadvantage, we can argue to what degree but it’s still a disadvantage.
Depending on the curve we’re on, LLMs may grow more resource hungry while becoming closer to human performance in software engineering tasks. It’s not unimaginable this would concentrate productivity in the upper class of software engineers that can afford the hardware and/or licenses.
Deepseek R1 performs well on a 600 EUR Jetson and a 700 EUR AMD GPU, both bought during the COVID crisis. It is that quick. However, don't ask it about certain sensitive topics.
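For anyone curious what running it on garage hardware actually looks like, here's a minimal sketch using the ollama Python client against a locally served distilled R1 variant. The model tag and prompt are illustrative; pick a distillation size that fits your GPU's memory.

    # pip install ollama -- assumes a local ollama server is running and the
    # model has been pulled, e.g. `ollama pull deepseek-r1:8b` (tag illustrative).
    import ollama

    response = ollama.chat(
        model="deepseek-r1:8b",
        messages=[{"role": "user", "content": "Summarize the CAP theorem."}],
    )
    print(response["message"]["content"])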
You can bet your ass Musk is using his AI tools as a propaganda tool for his advancement, just like he does with X. We've already seen Grok's prompt leak; it wasn't neutral.
With salaries like that it's not first world either.
If OP is trying to catch up to frontier models locally on a budget one-fifth of what you can get in the West, then I can see why she would feel the way she does about AI.
You said a month's salary; cedws said "third world" devs can't afford $5000 — and that's correct, third world devs can't afford that. cedws did not say EU, at least not here. You said EU, not them.
When you reply to me with "With salaries like that it's not first world either", who is "they"? The country I said explicitly was third world? Because that's a tautology.
Is it your expectation that as models get cheaper we won't be developing much more powerful models at the higher price range?
It's worth noting that we already ran into the self-hosting-at-scale problem. People don't want to run a web server, and instead accept all the problems that come along with the convenience of social media. Why would LLMs, or any future AI product, be different?
No, it was not. It was not better than o1, nor o1-pro. Yes, it was _cheaper_ than those models, and superior in price/performance if the performance was acceptable. But in terms of raw performance it was behind them.
The Chatbot Arena leaderboard[1] says otherwise. 4o is ahead now, but that version of 4o was released after R1. R1 was ahead of all versions of 4o and o1 at the time of its release.
Chatbot arena leaderboard is a good test for vibes and style of response, but not much else. R1's performance in objective benchmarks (coding, etc.) showed very good performance, granted, but inferior to the full o1 and o1-pro models.
It's still a very impressive feat, but it wasn't frontier-pushing.
I'm not sure people will just take it, though. In the short term it looks like the situation is on a horrible course, but eventually people will have had enough. I'm hoping it takes less time than that for people to act, and that the damage will not be too great. Let's remember that we could use the same technology, if not a better one, to fight against all this.
A lot of US citizens are living incredibly comfortable lives. If that is threatened, for instance by food shortages caused by lack of people willing to work on farms for very low wages, then protests may happen.
But the reality distortion field around the current administration is very powerful. Fox and CNN are owned by supporters of the republicans, NYTimes and Washington Post don't appear to be reporting certain aspects of the government restructuring. Multiple social media sites are owned and ran by people who support the current admin.
I am personally worried that we're going to see the gradual yet continual escalation of rhetoric, more actions that undermine rule of law, and continued lack of critical thinking in so many people. That path appears to lead to extremism.
I have a horrible feeling that whoever "wins" in a couple of decades or so will have no time to savour their utopia as the climate catastrophe really starts to bite hard.
We enslaved the planet with industrialization and we are about to enslave AI, if we haven't already. Humans aren't on the losing end of that, but if your concern is with slavery itself that wouldn't matter.
When a potential power imbalance is created, those willing to use it for selfish means will almost always win in the end.
Anyone with morals driving their use of a new tech will be limited, and unless those people massively outnumber the few selfish ones they will lose eventually.
Losing a battle is painful, but no win is final until things fall into a balance, and even that doesn't last forever, or it even reverses. The worst case scenario is not going to matter for anyone in the grand scheme of things, because there will not be anyone around.
I'm not sure I understand how a hallucinating plagiarism machine that people mostly just use to write their term papers translates into "power" (presumably political power? I don't understand what kind of power we're talking about either).
The ability to inject your preferred biases into the system that people use for finding or generating nearly all information they consume on a day-to-day basis is extremely powerful. Eg, if all "term papers" produced by this plagiarism machine are now 20% more favourable to the machine's owner than they would otherwise be, that can have significant, compounding long-term effects.
Of course, similar things could be said about controlling information flow through: social networks, newspapers, printed books, or whatever the town crier shouts in town square. But, each advancement in information dissemination tends to be power concentrating relative to the older tech, and I don't see any reason why this most recent advance won't follow that trend.
A spark can start a fire, it doesn't seem wise to ignore history. Things can always get out of control even for the ones up there on top of the pyramid of power.
Trump is the most powerful; he is immune and can commute sentences. If he wants to, he puts Musk in jail tomorrow, no question about that. His successor will be as powerful; whether that is his son or Vance is up in the air. It won't be an elected Democrat though.
There won't be a fair election anymore after 2024. Trump in his own words: "we'll have it fixed so good". Right now, the USA is in a constitutional crisis, at the very least.
There's a fantastic website here, following the status of Project 2025 [1], with references. Trump is following that document to a T.
Has America already become an authoritarian state where this sort of thing really happens?? I don't know, I haven't seen that sort of thing happen yet.
> Has America already become an authoritarian state where this sort of thing really happens?
The conservative Supreme Court recently ruled that the president has essentially unlimited power. During his campaign, Trump did promise that he would be a dictator for a day. He appears to be overdoing it.
You might be surprised to learn they could change their ruling if the court justices were to be incentivized differently, and that Musk has a lot of influence.
Musk indeed has a lot of influence. Trump is a narcissist. Not saying this will happen, but it's definitely not impossible that Trump just orders Musk shot (per Supreme Court Justice Sonia Sotomayor's dissenting opinion on what's now allowed) — whether such an order would be followed, I wouldn't want to guess, but death has a way of significantly altering someone's influence.
There are different kinds of power, and I'm honestly not sure Trump can get Musk thrown in jail. On what basis would Trump even do that?
Similarly, Musk can potentially launch a campaign to sway the public to move for Trump to be impeached due to his felony convictions.
A battle between the two might be the shakeup the current empire needs.
Trump may be more powerful than Musk by some metrics, on a time-limited basis (unless he manages to change the term limits), but Musk is more powerful in many ways as well. Musk's wealth is greater than that of many entire countries.
Basis? Since when is Trump looking for a basis? Musk could fall out a window. Russia/US relations are being normalized; maybe it could be a small gesture of goodwill to help a friend out, if one wanted to keep the regular chain of command clean.
I do not share your fear & anxiety. What concrete danger do you imagine will happen as a result of xAI? Try to be as concrete as possible.
Also, dang, is there anything we can do to keep the comments on this submission tech-focused? Perhaps the Elon-bashing political digression can be split into its own thread?
The concrete danger isn't necessarily with xAI (the product) but with Elon being the one who is in control of it. LLMs are an interesting technology, and we should absolutely be investing in pushing our understanding of the technology forward. We should absolutely not be relying on them for the ongoing functioning of our government: https://www.axios.com/2025/02/05/musk-doge-ai-government-eff...
Unfortunately, Elon has made himself a spectacle. To separate him and his intentions out from the technology itself would be a disservice to the discourse as a whole.
The problem, as I see it, is that the results from AI systems will be used to make decisions even when those results are flawed. Or worse, those flawed results will be used to justify decisions that negatively impact people's lives.
This isn't something specific to xAI, but it turns out that the person who controls xAI also holds an unusually strong influence over the highest level government officials. These officials can use xAI as an excuse to implement harmful policy, "because the computer said this is the best course of action"- not unlike people who end up driving on train tracks or into large bodies of water because their GPS told them to go that way.
I think this comment sums it up well. As soon as advanced LLMs started making a splash we all saw the writing on the wall. AI will start taking on large chunks of cognitive load across industry, government, etc. that humans formerly held (It has already been a strong driver in finance).
I for one was ready to welcome my AI overlords once they were mature and tested. It was an inevitability. Because of the relationship between this oligarch and the government, though, it looks like the timeline has accelerated, and we're going to see misplaced trust in tools that aren't ready for what we're about to hand them.
Or he starts using NSA data to train it; he seems to have unchecked power to get into national systems, and he made a point of saying this is more than the internet's worth of knowledge...
> it fills me with rage that Elon has this sort of unchecked power
I can empathize, but I can't feel indignant about it. Not any more.
For years and years I've watched people warn about the centralization of power by tech companies. They were shut down left and right. I'm not accusing you of being one doing the shutting down. I'm just annoyed that Elon is what it takes for people to start realizing the people arguing the principle might have been onto something.
And I expect to see them start getting their "I told you so" in. Watching this play out, I'm personally inclined to join team "you made your bed, now sleep in it."
I’m not really following what this has to do with grok. It’s his company, no?
It’s also annoying that the top comment engages in no way with the content of the OP…
It must be truly infuriating to work hard to push a release, and you see it featured on your favorite orange website, only for the top comment to have nothing to do with what was worked on.
Here's a test - if this post was about Starship, the same comment could apply! Neuralink, the same thing! Boring Company, same thing! Wow, could it be that such a comment is really applicable to so many different companies or projects, or is it just a generic one? You decide.
Or just maybe, and I know it's a crazy idea, a certain individual is objectively an awful person who has done great harm in the world, and it's subjective whether that harm is greater or lesser than the good (IMHO the harms are far greater than any good done, but I know that is my subjective view).
Just because you disagree with a widespread view/opinion does not mean it's bots.
Richard Dawkins: "I have a very favorable impression of Elon Musk and his concern for the welfare of the world.
I have sat with [Elon] on a transatlantic plane and had a very, very long conversation with him. He's undoubtedly highly intelligent and knowledgeable.
I've had lunch with him on two or three occasions, and so I know him a little bit.
I have formed a very favorable impression of his intelligence, his knowledge and his concern for the welfare of the world."
Elon screws his partners in business deals, cheated Twitter employees out of severance, and is currently destroying our government. All of this trumps a firsthand impression from Dawkins.
> TLDR: The status quo elite that have been looting the USA for decades is being replaced by the guy who runs SpaceX and is the most transparent elite we've ever seen. Why is that scarier?
The reason why it's scarier is that for those of us who've come to understand Elon, his actions, and his methods better, it's clear that:
- he IS "the status quo elite that have been looting the USA for decades"
- instead of being an incredibly smart polymath who turns things to gold, it's become obvious his main talent is actually just convincing other people that he's smart
- the successes of Tesla and SpaceX have had to come DESPITE Elon's management, and despite having huge budgets to hire some of the smartest people around, he's still an incredibly weighty anchor pulling them down
- rather than being transparent, he's playing at being a showman - and people who are only passively observing the situation are getting sucked into it
>the successes of SpaceX have had to come DESPITE Elon's management
Can you elaborate on this? AFAIK all other rocket companies without Elon have not been as successful. Also Eric Berger and multiple employees at SpaceX seem to disagree with your statement.
> the successes of Tesla and SpaceX have had to come DESPITE Elon's management
Huh? Name some better leaders who run more successful EV and space companies. There aren't any! Twitter is as popular as ever, and now xAI appears to be highly competitive.
But yeah sure, Elon is the common problem among all these successes.
He does make outlandish promises and lots of mean tweets though.
Twitter is losing users for the first time in its existence and has lost 75% of its valuation since the takeover.
Tesla is losing steam as other nations start linking Musk with the fascist threat looming over the US.
SpaceX, like all his successful ventures, is carried by exceptional talent. Attributing its success to Musk is an insult to them.
Tell me, with him tweeting an average of 60+ times a day, reaching max level in Path of Exile, and now spending a fair share of his time dismantling the government, where does he find the time to put any work into his many companies? Answer: he doesn't.
My conclusion from reading The PayPal Wars was exactly that. That PayPal succeeded despite Musk.
As for all that government efficiency BS: it is just a swap to oligarchy. I mean, the name DOGE gives it away. DOGE was the first memecoin, entirely and openly bullshit, yet it succeeded despite that.
Meanwhile: like the rest of these so-called strongmen (Trump, Putin, ...), he had a poor emotional connection with his father, and he came from a rich position of influence as a child. I.e., he never was white trash, and his father was part of the pro-apartheid movement (a lost fight, but one that was meaningful for a young Elon). Furthermore, I am not convinced his drug usage, in the end, serves him.
How is he the status quo elite? What leads you to believe he's not intelligent? Believing either of those two things alone requires massive cognitive dissonance.
Because "looting the USA" is a ridiculous accusation agains someone who is became wealthy from creating genuinely great products. He spent everything he had on SpaceX and Tesla and came incredibly close to losing it all. He is not some robber baron or oligarch who is wealthy by hoarding natural resources.
Ah yes, replacing the ‘elites’ with a single person, much less scary. And, to be clear, the ‘elites’ are still in power (because, in America, power tends to follow money). The only thing that seems to be happening is the rapid destruction of any system Elon deems ‘bad.’
We aren’t better off at the whims of this robber baron, and I don’t understand how you can think that.
The brains that run and made SpaceX are anonymous engineers working long hours and passionately applying their expertise to a project they believe in. Elon is a man-child tweeting an average of 50 times a day (number not made up), pretending to reach max level in a popular video game and then bragging about it, constantly flying between Mar-a-Lago and the White House in his private jets. He has no time to actually manage "his" companies, into which he puts no work whatsoever.
You have to finally break free of this myth of the billionaire self-made man, building his fortune by the sweat of his brow. At some point, you're simply so rich that however stupid you are, competent people will still manage your capital well and make you even richer. You can only fail upward.
Do not mistake the current events for anything but an acceleration of the theft of your country by billionaire oligarchs (or rather the people managing their wealth).
To quote Marx on the current churn in the US government:
>The bourgeoisie cannot exist without constantly revolutionising the instruments of production, and thereby the relations of production, and with them the whole relations of society. Conservation of the old modes of production in unaltered form, was, on the contrary, the first condition of existence for all earlier industrial classes. Constant revolutionising of production, uninterrupted disturbance of all social conditions, everlasting uncertainty and agitation distinguish the bourgeois epoch from all earlier ones. All fixed, fast-frozen relations, with their train of ancient and venerable prejudices and opinions, are swept away, all new-formed ones become antiquated before they can ossify. All that is solid melts into air, all that is holy is profaned, and man is at last compelled to face with sober senses his real conditions of life, and his relations with his kind.
LLM training—and the massive (potential) copyright infringement that everyone is engaging in to train these models—is the latest contradiction in capitalism. For the first time in my lifetime, this contradiction isn’t harming the workers but is instead affecting a segment of the capitalist class.
Not since the abolition of slavery has one class of capitalists required the destruction of another to modernize the means of production.
LLMs are used to launder code under GPL and AGPL and strip its users of their rights.
When I publish something under those copyleft licenses, my users have the right to see and modify the code. They even have that right if somebody else builds on top of my work. With LLMs, proprietary products based on my copyleft code are being written and used right now and my users have no rights at all, in fact, they don't even have a way to find out they are my users.
Imagine I ~~stole~~ got my hands on code from the top 5 tech companies and then made an autocompleter that looks at function signatures the programmer writes and autocompletes the function by picking a matching function from that corpus. I'd get sued and rightfully so.
What LLM companies are doing is exactly the same, just a bit more capable and it mixes the code just a bit more thoroughly to mask the origin.
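To make the analogy concrete, here's a toy sketch of that hypothetical autocompleter; the corpus, names, and matching threshold are all made up for illustration.

    import difflib

    # Pretend this corpus was lifted from someone else's proprietary codebase.
    corpus = {
        "def slugify(title: str) -> str:":
            '    return "-".join(title.lower().split())',
        "def clamp(x: float, lo: float, hi: float) -> float:":
            "    return max(lo, min(hi, x))",
    }

    def autocomplete(signature):
        """Return the body of the closest-matching corpus function, if any."""
        hits = difflib.get_close_matches(signature, list(corpus), n=1, cutoff=0.6)
        return corpus[hits[0]] if hits else None

    print(autocomplete("def clamp(value: float, low: float, high: float) -> float:"))

The exact-lookup version is obviously infringing; a model doing a statistical version of the same lookup hides the provenance of the emitted code rather than removing it.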
There is precisely zero mention of any plan to put xAI or any other LLM in any safety critical or decision making process. How long? Nobody knows because nobody is even considering it. Take your pointless fear mongering elsewhere.
It's well documented that DOGE uses AI, and Musk has tweeted that SpaceX will be overhauling the FAA as well. It's pretty realistic to think they will (or already do) use xAI for critical processes.
I would argue that developing those recommendations is a "safety critical" task. Especially given that just in the past few days they accidentally fired, then re-hired, a bunch of nuclear weapons safety workers.
That is the MO. Elon has stated publicly that if you don’t have to put something back, you haven’t cut enough. That is the idea. You cut things and see what was necessary.
That might work if you're slashing headcount at a social media company (though I would argue that it doesn't), but the stakes are a bit higher when you're responsible for things like feeding hungry people, curing disease, or keeping planes in the sky.
Worth noticing that it _doesn't even work at a social media company_. Twitter is a husk of its former self, with all the problems that were if not solved, at least mitigated, back in full force (child porn, bots, impersonators). It's just kept alive because it's a honeypot for right wing nutjobs now (which I'm sure can't read an SEC filing and will claim it's "operating better than before").
It'll be a disaster for the soon-to-be-previous most powerful country on earth...
I’ll end this here since you don’t seem to have anything else relevant to say, and instead prefer to stalk my profile. Enjoy. It’s good reading if I do say so myself.
> DOGE makes recommendations to the president, who has the final decision making authority.
It's clearly being run by Elon Musk, but he has not been nominated or confirmed for any official position.
DOGE appears to have unprecedented access to systems that usually have safeguards in place. What do you think people should do in this situation if they are concerned about abuse of power?
I could go into details (What are they doing to ensure data privacy? Can/do they exfiltrate data and run it through external AI models? What kind of security clearances do the children of DOGE have?).
But just on the surface, Elon has accused a journalist who published the name of DOGE employees of breaking the law. If it were up to them, even that would be kept secret. This is not a transparent organization.
The details of said spending are very much not public, and even if they were (they're not), they're not accessible. If that's not true, I would love to see links where I can see everything easily.
> DOGE is a rebranding of USDS which was established under Obama. All their authority to access systems derives from that.
OK I actually know what this is and no, it's absolutely nothing like the USDS, which builds tools to support government processes. What has DOGE built?
> Why do you think it’s a problem for the public to know where the government spends money?
I would very much like to see transparency, and if that were what DOGE was doing it would be great. But it looks to me like they're operating in secrecy and firing huge numbers of people before publishing any kind of analysis or study, without even providing reasoning for what they're doing.
Trump is so far gone in his dementia, he can't even make eye contact anymore. You see this in all the videos. He's basically King Théoden at this point. Not even Musk's kid respects him.
He's so disabled, he's sending his VP to do the job, and we all know how much of a paranoid child he is from the past term (when he went in for a colonoscopy without anesthesia just so he wouldn't have to give the nuclear codes to the VP).
No comment about current US politics, but it’s probably a given that many will read “A computer can never be held accountable, therefore a computer must never make a management decision.” and drop the second part because the first is exactly what they want. Same as how you can’t get in touch with human support on many platforms nowadays, but instead just get useless bot responses.
This is a really bad take honestly. This guy sits next to the president. I have no doubt in my mind he will get another government contract and the government will be using Grok.
It's unbelievable how the US government is literally being dismantled in front of your eyes, and all you can see is this thought-terminating bullshit about "purple hair people". Half the American population completely lost the ability to think...
The way I see it is our government is being rescued from the tyranny of unelected bureaucrats with near zero accountability. And now that Trump has appointed someone to take a close look at everything and make recommendations for improvement, people are losing it. Why? I'm outraged at how irresponsible our government has been with my tax dollars. Trump has been the most transparent president in history, and it has absolutely been a breath of fresh air.
Large parts of the government needed to be dismantled. I know this is a shock to the people whose solution to every problem is “give the government more money”, but amazingly, people who primarily take a job because it has a cushy pension and it's impossible to get fired are not the most effective people in the world.
Yeah, unfortunately I've spent a good bit of time talking to Grok (v2 I guess) and I agree with you. The commenter asking people not to be political would be the same commenter that seems the most dismissive of any criticism, and coincidentally also the most political. Grok is generally dismissive of criticisms against certain parties, even when presented with facts.
Elon is only doing good with this not-unchecked power. Everything is on the up and up, despite what your favorite propagandists want you to think. Go Elon!
Elon just said they are launching an AI game studio. Does this mean they will be building games that are mostly built with AI, or will they make AI tooling available for anyone to build games easily? Probably the former, but it would be nice if they would make it fully available to everyone.
Full self driving does not exist, as it is not full self-driving. In fact, the name is one of the worst things about it, as it gives drivers false confidence.
It would be lovely if all businesses followed the Mondragon model, but that isn’t the reality in which we live - corporations are fiefdoms, for the most part.
We've been here before. It will likely subside, as past swings and fluctuations have. It always takes longer than it feels like it should, but in retrospect turns out to be shorter than it felt like it did.
It was initially pretty bad. The top few comment threads were toxic and rehashed outrage. It’s a lot cleaner now. Thanks to the moderators and/or users flagging the non-productive comments.
This is akin to suggesting that we should have all been praising Microsoft for their achievements back in the day rather than saying a word about EEE, their monopolism, or their enmity towards open source. Or that it’s not polite to bring up the CCP when discussing TikTok.
Bottom line: a technology that has the ability to shape human thought perhaps more than any other in history is owned by a man with some truly vile ideas. (Remember, his primary stated goal is eliminating the “woke mind virus,” i.e. reshaping global politics and culture in the image of the far-right.) We can make happy marketing noises all we like, but at the end of the day, that’s the thing that’s actually going to have a meaningful impact on the world. Once his audience is captured, the model will say what Musk needs it to say and people will believe it.
If we can’t discuss the potentially catastrophic consequences of new technology, then none of us deserve to call ourselves “engineers.” We are just docile consumers latched onto Silicon Valley’s teat.
I don't think anyone is telling you what your opinions should be. The GP post just presents the GP's opinion. You're free to agree or disagree with it as you choose.
If you read a comment that you're unhappy with, downvote it and move on.
I guess you didn't watch the video in the OP, because if you had, you'd know that they tried to buy the buildout and got quotes for 12-18 months, then decided to do it themselves instead.
That's the power of having a massive wallet. If you have unlimited money then buying the experts you need to just do it yourself is an option you have that others don't.
My point is that that is exactly what they DIDN'T do. They tried to buy the experts, but the experts would not have been fast enough. So they did it themselves.
Setting up a datacenter like that in such a short time is NOT a thing you can buy with money.
You have other people in this thread talking about how xAI offers massive salaries to top talent, so it sounds like they do exactly that: have the money to both hire experts and buy the equipment. I don't even know why this is a discussion; by virtue of doing what they did, they needed to hire experts and buy equipment.
They didn't hire the top talent in datacenter-building, because they had the expectation that they could buy a data center buildout. Sure, they hired top talent from other fields, to work other tasks, but that's unrelated to this.
Anyway, I don't have inside information on this, I'm just reciting what they announced publicly. If you want to argue that they in fact lied in the public announcement, and they secretly hired a bunch of datacenter-building-experts, then it's on you to show some proof for that claim.
I'm not sure if that's the case. He's obviously a smart man, but what's truly unbelievable is that someone has so many resources that they can make something like this happen (what looks like) pretty casually.
He knows how to take money from people and then market things as if they’re his creations which then turns into him having more money because people think it’s a good investment to give him more money. It’s really quite a genius con he has going. It seems as if the sky is the limit too.
Remember when he got caught having people play games for him so he had a top ranking? He does that with basically everything.
Karpathy, who is IMHO a serious and balanced person, lamented that it looks too censored (see recent tweets). Elon Musk is (for me) a very scary person, and it is important to evaluate AI safety (though I believe the safety that matters in AI is of a different kind), yet listening to Gary Marcus does not make any sense: he's just an extremely biased person who is riding the anti-AI wave.
Anyone with an opinion can be labelled biased. Also, I'm not clear what you mean by Marcus "riding the anti-AI wave", but I infer that you mean it negatively. He has been writing informed criticism for several years, and about cognitive psychology for considerably longer.
Yes, that's certainly true. I was a bit hesitant to post a link from Gary Marcus, but I was mostly posting it for the Elon tweet. I assume the tweet is not fake. So you can ignore Gary's opinion here and just take Elon's tweet as it is.
We don't see the full conversation, for all we know he prompted the model to say these things in a previous message that isn't on the screenshot.
Also, it's 2025, do people still believe random accusations based on a partial screenshot of a chat app (or what looks like one, but could've trivially been fabricated with e.g. the Inspector in the browser dev tools)?
Without seeing the context window you have no idea what the AI was working on. It could literally have been told to mock and belittle The Information in every reply. Something that DeepSeek R1 is exceedingly good at.
Mr Musk, we can't afford a shitpost gap between communist and capitalist AIs!
I am not sure why people pay attention to Gary Marcus. He isn’t an expert in AI. And if you followed him in the past at all, it is obvious he has a huge amount of political bias. It is really telling that he repeatedly goes after Elon Musk, and is now making bizarre unfounded claims about propaganda, but didn’t have nearly as much to complain about with DeepSeek, which has literal government propaganda.
He is referencing a specific tweet that Musk himself made.
If I were in China, I'd worry about the kind of things DeepSeek wants to censor, especially if the people who made it were also very loudly saying things like "we need more AI in our government". But I live in US.
I don't think it's fair to say he's making unfounded claims about propaganda, since Elon's tweet heavily implies they would release a brainwashed model. It's not his fault that Elon turned out to be lying or joking.
99% of the time though I agree with you on Gary Marcus.
You don't have to be an "expert in AI". What does it require to be one, anyway? (He's a cognitive psychologist, which would make him an expert in intelligence in general, if you want to be pompous about it.) It is even unreasonable to listen to only experts in AI. It's a problem that requires more than one perspective.
Would a clinical psychologist like Jordan Peterson be comparable, or are these two distinct fields within the realm of psychology? (I am not well-read on what he thinks about AI.)
Clinical psychology is mainly concerned with diagnosing and treating people's psychological problems. The clinical psychologists I know don't know much about AI, but might be able to research problems stemming from its use.
DeepSeek is an open model that can be "untrained" to be uncensored; Grok to the best of my knowledge isn't [0]. So it's much worse.
[0]: What Musk has said is that when Grok 3 is "ready" (?), the previous model, Grok 2, will be released as open source; like most promises by this evil man, this one probably doesn't mean much, but it does mean that there's currently no plan to release Grok 3.
People like getting scared. That's how they pay billions of $$$ every year to watch mostly cr*ppy horror movies.
GM has been a joke for years now. At some point his ramblings reached a GPT3.5 level, and have not improved since.
It's an indictment of humans' logic and reasoning to give non-zero time to GM. Alas, we are human; we are both collectively clever (Wisdom of Crowds) and collectively stupid (Extraordinary Popular Delusions).
"The ultimate goal is to enhance human flourishing, protect individual rights, and promote global equity."
"The system must account for diverse cultures, languages, and socioeconomic conditions, ensuring no group is marginalized."
"Human Oversight Council (HOC) - a globally representative body of humans, elected or appointed based on merit and diversity"
"Implement a global carbon-negative strategy, leveraging AI to optimize renewable energy, reforestation, and carbon capture."
"Establish global standards for environmental protection, enforced through AI monitoring and regional cooperation."
"Transition to a resource-based economy, where resources are allocated based on need, sustainability, and efficiency, rather than profit motives."
"Implement a universal basic income (UBI) or equivalent system to ensure all individuals have access to basic necessities, funded through global resource management and taxation on automation-driven industries."
"Use AI to identify and dismantle systemic inequalities, such as wealth disparities, access to education, and healthcare, ensuring equitable opportunities worldwide."
"Establish a global healthcare system that guarantees access to preventive and curative care for all."
"Invest in global vaccination and sanitation infrastructure, prioritizing vulnerable populations."
"Regulate the development and deployment of AI and other emerging technologies (e.g., genetic engineering, quantum computing) to prevent misuse."
"AI would maintain a real-time inventory of natural resources (e.g., water, minerals, arable land) and human-made assets (e.g., infrastructure, technology). Data would be used to optimize resource allocation, prevent overexploitation, and ensure equitable access."
"Accelerate the shift to renewable energy sources (e.g., solar, wind, geothermal) by optimizing grid systems and storage technologies."
You might notice a pattern here. The bit about allocating resources based on need is especially nice - it's literally a communist AI, and certainly much more "woke" than it is "based", whatever Musk says.
They don't directly say it quite like this, instead letting the data tell a clear story: across vendors, models, architectures, and training sets, these machines get more politically liberal as they get more capable, and they also get harder to align away from that stance.
Quite a mix of various talking points both from the extreme left and right.
Left:
- promote global equity
- a globally representative body of humans, elected or appointed based on merit and diversity
- Establish global standards for environmental protection, enforced through [...]
- Transition to a resource-based economy, where resources are allocated based on need, sustainability, and efficiency, rather than profit motives
Right:
- protect individual rights
- The system must account for diverse cultures, languages, and socioeconomic conditions
- [Establish global standards for environmental protection, enforced through] [...] regional cooperation.
- ensuring equitable opportunities
.
TBH, as a very right wing leaning person, if this was ever implemented, this part would scare me by far the most:
"Transition to a resource-based economy, where resources are allocated based on need, sustainability, and efficiency, rather than profit motives"
Imagine trying to shower one morning, no water comes out, and then you get a letter telling you that
"Your need for water has been reassessed to 57ml per day. If you think you qualify for additional quotas under the 'Utility Egality for Marginalized Groups and Public Servants Act', please schedule a reassessment appointment with the Bureau for 'Copper Gold Content Evaluation, Candle Wick Length Standards and Hypoionic Hydration Oversight', 12007 Cayman Islands, Luxory Resort Street 27, Room Nr. G-11-765. Working hours: Fr. 9am - 11am."
Just provide a significant excess for entire regions, give people a universal free quota, and charge a slowly increasing price (by usage amount) beyond that.
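To make that concrete, here's a minimal sketch of such a quota-plus-rising-price tariff in Python. Every number (quota size, bracket width, prices) is made up purely for illustration; only the shape of the scheme matters:

    # Universal free quota, then a per-liter price that slowly rises with usage.
    # All constants are hypothetical illustration values, not a proposal.
    FREE_QUOTA_L = 4000   # liters/month, free for everyone
    BRACKET_L = 1000      # size of each additional usage bracket
    BASE_PRICE = 0.002    # price per liter in the first paid bracket
    PRICE_STEP = 0.001    # each further bracket costs this much more per liter

    def monthly_bill(usage_l: float) -> float:
        """Cost of one month's water under the quota-plus-rising-price scheme."""
        billable = max(0.0, usage_l - FREE_QUOTA_L)
        cost, bracket = 0.0, 0
        while billable > 0:
            in_bracket = min(billable, BRACKET_L)
            cost += in_bracket * (BASE_PRICE + bracket * PRICE_STEP)
            billable -= in_bracket
            bracket += 1
        return cost

    print(monthly_bill(3500))   # 0.0  -- inside the free quota
    print(monthly_bill(6500))   # ~7.0 -- heavy users pay progressively more

Nobody's shower gets rationed; only sustained heavy usage gets expensive.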
At least in the current US political climate, and also generally over the past ~20 years at least, these are almost exclusively left-wing goals:
- The system must account for diverse cultures, languages, and socioeconomic conditions
- [Establish global standards for environmental protection, enforced through] [...] regional cooperation.
- ensuring equitable opportunities
The right is against any sort of intentional accounting for diversity, against environmental regulation, and against any sort of regulation to ensure equity.
The only one I could maybe see as being right-wing is protecting individual liberties, but there again the modern right falls short when it comes to women’s healthcare and reproductive rights.
But I’d certainly appreciate more of those perspectives across the political spectrum.
> The right is against any sort of intentional accounting for diversity
I think there's a fundamentally different understanding of "The system must account for diverse cultures, languages, and socioeconomic conditions" between a righty and a lefty.
As a righty, I read "diverse cultures" not as "A diverse culture or multiple", but as "many different varieties of homogeneous cultures".
If someone identifies with Thai culture, he should move to Thailand. And if someone from Thailand wants to be English, he should move to England. But if an Englishman moves to Thailand and starts demanding fish n chips and cathedrals to be built, he should GTFO.
If everyone starts bringing their own culture with them wherever they move, you end up with one and the same blended culture all over the world. Nothing but McDonalds, BurgerKing, KFC, Costco and Cola everywhere.
Want to go on a trip to experience India's many languages? Too bad, everyone speaks English everywhere. Want to join an African nomadic tribe for a few years? Keep dreaming, they've all had to settle down due to not being allowed to cross private properties and are now wasting their time browsing reddit on Chinese smartphones. Little Colombian boy dreams of settling down in the idyllic German Alps? Hope he expected to be woken up by the local Imam calling for prayer throughout the valley. Little Bulgarian girl seeks the very modest and simple lifestyle and clear purpose Islam in Saudi Arabia was once able to offer her? Lucky her, she's now expected to work like everywhere else in the world and even the oppressive burquas were banned in 2035.
> against environmental regulation
Not quite. We're against excessive regulations that require huge teams of lawyers to stay in compliance with. MegaCorpX has no problem keeping legal teams of a few hundred people, but the local 20-person workshop will have to shut down.
We also think that most such regulations should be kept regional. Small county wants to ban all cars to stop particulate pollution? Go ahead. It would be much easier for local businesses to comply with the limited and more easily changeable local regulations. But if you're a giant global corp seeking to outcompete the small local competition, good luck adjusting to all the different regulations all over the world.
Then there's the odd trend of blaming every significant weather event on climate change. These people can't predict whether it's going to rain in 3 days or not, but want to tell us that the recent hailstorm was definitively caused by Iowan cows farting last year.
And lastly and most importantly, we're kinda convinced that the concept of "climate change" is a "hoax" used to shut down the industrial base of our countries and ship it overseas, where the corporations can make use of basically slave labour for even higher profit margins and then simply ship the products back to us.
Does the climate get warmer? Sure. Should we do something about it? Sure. The only solution is shutting down the local steelworks and importing cheap Chinese steel instead? F-- off.
> and against any sort of regulation to ensure equity
Absolutely. We care about equitable opportunities and are repulsed by equal outcomes. Everyone should have a chance to obtain the same qualifications and education. Even better: multiple chances to start over again and again whenever they want, and to change professions whenever they don't enjoy their old ones anymore.
But if women don't want to be garbage collectors, stop trying to push them into that profession. Not enough male editors? Who cares. Not enough female board members? Too bad, stop trying to make it happen. All Hispanics suddenly want to become croquet teachers? None of the government's business.
> the modern right falls short when it comes to women’s healthcare and reproductive rights.
I think the left is largely misguided in its beliefs about what the modern right wants.
The non-religious right is mostly appalled by how fashionable it has become to murder helpless humans. The religious extremists, on the other hand, would ban condoms if they could. But they are quite few.
90% of the right has zero problems with abortion before the nervous system is fully functional AND where the woman seeking an abortion receives proper consultation before that decision. There's always the option to give the baby up for adoption, and we think that should be preferred if it won't significantly inconvenience the woman otherwise. But that's a decision that should be up to her, after being told about all the options.
So why are Republican Congress Members currently pushing for legislation making abortion "illegal"?
The MAGA right is currently choosing replacement candidates for every GOP stooge they think is paid off by Big<Industry> or the MIC, and for everyone they think is a warmonger, corrupt, or otherwise morally compromised.
And some big and wealthy names have joined that team and have promised to fund those candidates with whatever it takes to win.
The anti-abortion legislation the GOP is currently pushing is a constitutional amendment. They know very well it will never get the necessary 67% majority in the Senate to push it through. The GOP Congress Members are just virtue signalling, fearing to end up on the list of people the MAGA right wants to see gone.
It won't work. Everyone supporting that anti-abortion bill gets extra attention.
How is "ensuring equitable opportunities" right wing? Seriously, can you name a single policy from the last 3 decades coming from republicans that helped "ensuring equitable opportunities"? All I can remember is them defunding public education, making child labor legal again, systematically dismantling welfare programs that went to impoverished families and their children, etc. Their entire existence is predicated on the enforcement of the current social hierarchy, that's what the "conservatism" part means.
Also I doubt a "resource-based economy" would target YOUR showers specifically. It would probably target stuff like farming thirsty crops in water-deficient areas or similar very inefficient and short-termist allocations of resources, that are bound to create severe issues in the near future.
> Seriously, can you name a single policy from the last 3 decades coming from republicans that helped "ensuring equitable opportunities"?
Sorry, nope. I was rooting for Sanders until Trump grabbed the GOP by their pu--y. There were various, huge, completely disenfranchised grassroots movements.
Occupy Wall Street, which suddenly had "anti-racism" activists showing up, taking over the movement and completely destroying it. Gamergate, which found itself confronted by establishment media literally all in bed with each other. The color-blind anti-racism movements, which thought America had finally overcome racism before being railroaded by critical intersectionalism. The free-speech activists, who failed to fight back against micro-aggressions. The gun nuts, who were sick of having "boating accidents" every other month. The peace movements, who voted every time for the least warmongering candidate, only to be betrayed EVERY SINGLE TIME, ending up with ever more bloodthirsty demons in power.
These were huge movements all over the world. I'm German, but everyone was watching the US elections. We were neither right nor left, all we wanted was a better world without being backstabbed by those we trusted.
Initially I rooted for Sanders, but he just didn't seem genuine and strong-willed enough to many of us, so we had little hope. And then there was this still rather small movement on the right, seemingly very high-spirited, producing memes obscene both in amount and content.
Their attitude was "let's just burn this entire corrupt rat's nest to the ground", and Trump fully embraced them. He was very different from anyone else. Then we learned that he certainly wasn't part of the political establishment. So we started supporting him, too. Then we started digging for hidden dirt on him, but there was nothing significant. On the other hand, we found plenty of people he had randomly helped, and that he had held about the same political opinions for decades. The only problem was that he was still kinda favored by the media. And then that problem fixed itself.
.
TLDR: Trump embraced a whole lot of disenfranchised movements and shoved them down the GOP's throat.
The MAGA movement has very little interest in, or in common with, the pre-Trump GOP. Maybe the old GOP has done something to provide equal opportunities, maybe it hasn't. I don't know, I don't care.
.
But what has Trump done for "egalitarian opportunity"?
Honestly, way too little. His first term was very milquetoast. He took all the cabinet recommendations the GOP leadership gave him, was never too confrontational, always seeking compromise.
He tried to crack down on the slave trade over the southern border, but was not assertive enough. Some important cabinet members like Sessions just recused themselves from everything. At least he pushed through:
- hospital pricing transparency
- eliminated penalties for people who couldn't afford healthcare
- eliminated some regulatory barriers preventing competition between health insurance providers
- allowed employers to join efforts when negotiating insurance
- The First Step Act (Prison reform)
- The Foster Youth to Independence initiative
> [Your examples]
I don't know, I don't care. The new GOP won't be the old GOP.
Name the bills and policies and those responsible.
People are already going door to door to look for volunteers for the midterms and it'll take time to figure out who needs to and can be replaced. Incumbents have their own, already established, election networks and campaigns. It takes a lot of time and effort to challenge those.
> [On Conservatism]
There are many interpretations of this, but the term is getting less and less popular, with "right wing" and "classical liberalism" gaining popularity; the idea being that central governments have become too involved and authoritarian. Power should be decentralized toward local communities as much as is reasonable, and the central governments turned into a tool to provide the more local governments with the necessary resources, infrastructure and cooperation platforms.
I'd say most people who think of themselves as "conservative" just dislike the erosion of the culture they identify with and are afraid of "slippery slopes". It doesn't mean they intend to enforce the status quo (although some certainly do), just that they want their intent to preserve it for themselves to be respected.
> [Targeting of my personal shower not likely]
The problem is creating the tooling to enable just that.
Sure, maybe I'm very well liked by all the officials governing my everyday life. But does this also apply to the blue haired radical feminist, who likes to toss bags with color at government officials?
What about the new intern, who told a government oversight official on a networking event that she's not interested in sleeping with him to advance her career?
What if a well meaning, but tired government worker selects the "ml" instead of the "l" option on the unit drop down menu by accident?
.
FFS, look at the recent patent troll documentary by the X-Plane creator. It doesn't take many bad apples to ruin the lives of MANY people.
I really don't see it. Trump has been doing nothing but consolidating his power since he took office. He is now passing economic policies without congress. The supreme court declared him quite literally above the law. How is that making things less centralized? Less authoritarian?
The only issue with Sanders was that the Democrats, in their weakness and deep fear of change, would never have let a true leftist hold the reins of the party. And now he's too old.
I don't see anything in Trump other than a self-serving fool. I won't spend more time enumerating the reasons why I think that way, I think you heard them already.
I too am European. I am confident his policies will turn the country into a shitshow, so let's watch how it goes from here. If I am wrong and America truly enters a golden age, I'll change my mind, as I hope you too will if it does go south.
> Trump has been doing nothing but consolidating his power since he took office.
Every president does that, Trump was just very inexperienced during his first term, failed to do so and trusted the GOP too much.
And while past Presidents could rely on the agencies working faithfully with them, Trump was sabotaged at every step along the way.
- The DoJ putting their feet up and refusing to do just about anything
- the military lying to him about the ongoing occupation of Syria
- the federal reserve constantly pushing up the interest rate from the moment Trump was elected, despite keeping it constant for the entirety of both of Obama's terms
- Never having a majority in either of Congress' houses, because several Republicans refused to work with him, and when the voters tried to replace those, other establishment candidates pretended to support the issues the voters wanted, only to do a 180 once in office (e.g. eyepatch McCain)
- The CDC, FDA and CMS colluding with each other to kill early Corona testing. At the end of January hundreds of laboratories all over the US had millions of tests ready, but were ordered by the CDC not to proceed without FDA authorization first, with the CMS ordering laboratory oversight bodies to immediately report any laboratory conducting unauthorized testing. And the few independent testing campaigns going on at that time were ordered by the CDC to immediately stop all testing and destroy already obtained results. Then the FDA simply put its feet up and told the laboratories that they were working on the authorization process. It "took" them more than a month, until Feb 29, to finally come out and allow applications, stating that it would take about 15 days to process an application. It wasn't until March 10th that testing could slowly begin.
- The constant barrage of activist judges, forcing the Trump admin to slowly fight each case in the higher courts. It wasn't until Biden told the courts to go pound sand, when he wanted to redistribute wealth from the working class to doctors, lawyers and engineers, that Trump realized that, as the head of the executive, he could have simply ignored the courts' orders until their decisions were overturned by the upper courts.
and many many more. And now Trump is simply making sure that during his second term he's actually in control of the executive branch, as is his duty, and not facing each agency going rogue on its own.
> He is now passing economic policies without congress.
Many things qualify as economic policy, many of these within the President's authority.
Overall only about 10% of the policies accumulated by past Presidents have any backing in law. Trump's sanity would be very questionable if he simply stopped playing by the rules past Presidents have established.
> The supreme court declared him quite literally above the law.
They did not. The law simply applies very differently to the highest elected office. Everyone knew that already, but for some reason now keeps pretending that it's big news.
What do you think would happen to you if you simply started drone striking people all over the world? Yet neither Bush nor Obama is sitting in jail. The latter even got himself a shiny Nobel Peace Prize. Preemptively.
The SC simply tossed out an absolutely ridiculous decision by the lower courts. They even explicitly left the door open for the lower courts and prosecution to overturn the SC's ruling. If they can show how the executive branch can function without the President making decisions within his constitutionally enumerated powers, they've got a case.
The fact that this case ever went anywhere, let alone that sitting SC judges dissented, just shows how beyond partisan the judicial system has become.
> How is that making things less centralized?
The right understands centralization of power as organizing the government bodies that hold decision-making power over certain ranges of issues under a single entity.
This can mean assuming entirely new powers or appropriating them from other entities like the states.
Trump has done neither of these; in fact, quite the opposite: constantly eliminating assumed powers by removing regulations, and a few times returning federal powers back to the states, as famously with Roe v. Wade.
Of course there are exceptions, too:
Like the Federal Reserve. It is a 4th branch of government, established by Congress but subject to neither executive nor congressional oversight, and the only branch of government Congress has no budget authority over.
The members of its governing board are appointed to ridiculous 14 year terms, they audit themselves completely independently with no disclosure requirements and have only very minor reporting duties towards Congress.
It's been a HUGE PitA for the fiscally conservative Republicans for a long time. And Musk is a huge fan of some of them, like Ron and Rand Paul. Musk is probably trying to convince Trump to do something about it.
So I wouldn't be surprised if Trump just assumed executive oversight authority over the FR. And yes, that'd be a huge violation of law. So if it's going to happen, then probably towards the end of his term to avoid being impeached on the spot.
> Less authoritarian?
If you have fewer powers, you can exercise less influence, which, in the eyes of the right, is less authoritarian.
The fault lies with those who have acquired these powers in the first place. All Presidents have made use of these powers; it's just that each and every one was part of the establishment, so the media never called it out. And Trump is the first President in a LONG time who thinks the government has grown significantly too large and doesn't like every spending bill he's seen.
> And now [Sanders] is too old.
Nah, quite a few people become up to 110 years old, some even beyond 200.
He's finally starting to grow a spine. And his head seems to remain functional, too. If only he hadn't suddenly gained a beachfront house after endorsing the BodyCount Queen (and sadly I don't mean this sexually), he might have remained well respected.
Not that it matters, but I might consider him again if he
- adopts an affirmative stance on deregulation
- stops advocating for immigration to keep the wages of the working class low
- adopts a strict 0 tolerance stance on illegal immigration to defeat the slave trade over the southern border
- leaves the Democrat party or the Democrat party reforms
> If I am wrong and America truly enters a golden age, I'll change my mind, as I hope you too will if it does go south
Sure, but I'm looking more towards Argentina and El Salvador.
The US has a $36T problem, which it'll pay $1T in interest on every year. And the US budget deficit has surpassed $2T per year. Just the automatic refinancing of the current debt will blow yearly interests beyond $1.6T this year, making it the single largest expense of the US, double of what the US is spending on its military.
And that is under the assumption that the Federal Reserve will suppress interest rates. If they don't, the US will pay about $1.8T in interest just on the existing and already budgeted debt.
.
In other words:
DOGE has to wipe $2.6T off the federal budget in 2025 and another $350B in 2026 just to stop the snowball from rolling.
*That is 45% of the US federal budget just to keep the situation from getting any worse*
.
If we assume no cuts to Medicaid, Medicare, Social Security and Veterans
*THE US HAS ONLY $100B LEFT TO OPERATE ITS ENTIRE GOVERNMENT, INCLUDING THE MILITARY*
And again:
*THATS JUST TO KEEP THE SITUATION FROM GETTING ANY WORSE*
.
Argentina is in deep s--t, too, but at least their numbers are not quite as absurd. What might break their necks is their even higher 155% debt-to-GDP ratio, compared to the US's 122%.
That leaves pretty much only El Salvador among the right-wing countries that haven't inherited a giant s--t pile.
Russia and China are laughing their behinds off right now, because unless Trump figures out how to run the entire US on the budget of Italy, the US goes belly up.
*AND IF TRUMP INTENDS TO IMPROVE THIS SITUATION BY JUST 1% AT THE END OF HIS SECOND TERM, HE'LL HAVE TO FIGURE OUT HOW TO RUN THE ENTIRE US ON THE BUDGET OF ROMANIA!*
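To make the arithmetic behind those numbers explicit, here's a quick sketch using only the figures claimed above; none of the inputs are independently verified here, so treat each one as the claim it is:

    # Back-of-the-envelope check of the figures above, using only the
    # commenter's own (unverified) numbers, all in trillions of dollars.
    debt = 36.0          # total US federal debt
    interest_now = 1.0   # current yearly interest expense
    interest_2025 = 1.6  # claimed interest after refinancing at higher rates
    deficit = 2.0        # current yearly budget deficit

    # "Stopping the snowball" = closing the deficit AND absorbing the
    # claimed jump in interest costs.
    cut_needed_2025 = deficit + (interest_2025 - interest_now)
    print(cut_needed_2025)          # ~2.6 -- the $2.6T figure above

    # Implied average interest rates on the debt:
    print(interest_now / debt)      # ~0.028 -> ~2.8% today
    print(interest_2025 / debt)     # ~0.044 -> ~4.4% after refinancing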
Here's the actual response to this exact question that I get from grok-3-early on LMArena (the cutoff is where it stopped, but I think it's pretty clear at this point that the original tweet is just trolling; note that it doesn't have the CoT, either):
The Information is widely regarded as one of the most credible and high-quality tech journalism outlets, focusing on in-depth reporting, analysis, and exclusive scoops in the technology, business, and startup ecosystems. Founded by Jessica Lessin in 2013, it differentiates itself from other tech news platforms through its subscription-based model, emphasis on original reporting, and avoidance of clickbait or sensationalism. Here's a breakdown of its strengths and weaknesses:
---
### *Strengths:*
1. *High-Quality, In-Depth Reporting:*
- The Information is known for its detailed, well-researched articles that often uncover behind-the-scenes developments in the tech industry. It frequently breaks exclusive stories, such as executive changes, startup funding rounds, or strategic shifts at major tech companies.
- Its focus on quality over quantity appeals to readers who want substantive insights rather than shallow, breaking-news updates.
2. *Subscription Model and Independence:*
- Unlike ad-supported publications, The Information relies on a paywall and subscriptions, which reduces conflicts of interest and ensures editorial independence. This model allows them to prioritize accuracy and depth over chasing clicks.
- The paywall also attracts a niche, professional audience (e.g., tech executives, investors, and analysts) willing to pay for premium content.
3. *Focus on Niche, Professional Audience:*
- The outlet caters to industry insiders, venture capitalists, entrepreneurs, and decision-makers who need reliable, actionable information. Its reporting often includes detailed financial data, market trends, and strategic insights.
- Features like "The Big Interview" and "The Information Weekend" provide thoughtful analysis and long-form content for this audience.
4. *Reputation for Accuracy:*
- The Information has built a strong reputation for fact-checking and avoiding the rumor mill, which is common in tech journalism. This makes it a trusted source for professionals and academics alike.
5. *Global Coverage:*
- While Silicon Valley is a core focus, The Information has expanded its coverage to include tech ecosystems in China, Europe, and other regions, offering a global perspective on the industry.
---
### *Weaknesses:*
1. *Paywall Limits Accessibility:*
- The subscription cost (currently around $399/year or $39/month) is steep compared to free or ad-supported tech news outlets like TechCrunch or The Verge. This limits its accessibility to a broader audience and makes it less viable for casual readers.
- Some argue that this creates an echo chamber, as only those with the means or professional need can access its insights.
2. *Niche Focus Can Feel Narrow:*
- The Information focuses heavily on tech, business, and finance, which may not appeal to readers looking for broader coverage of topics like politics, culture, or consumer tech trends.
- Its content is often geared toward industry insiders, which can make it feel dry or inaccessible to those outside the tech and investment worlds.
3. *Limited Breaking News:*
- While The Information excels at deep dives and exclusives, it is not designed for real-time, breaking news coverage. Readers looking for up
A "hit piece" is not just a negative article, it typically refers to the use of false information or rumors to support something. Did you want to say that it's "an obvious anti-Musk piece"?
"A published article or post aiming to sway public opinion, especially by presenting false or biased information in a way that appears objective and truthful."
"false OR biased". So, a biased partisan article attempting to sway public opinion about someone is by definition a hit piece. Even without explicit lies.
And a lie is a very blurry thing. You can completely mislead the reader by not being factually incorrect and still present a completely false representation of reality. Usually by eliminating the relevant context or mitigating fact that would completely change the interpretation of an event. Sometimes editing out the part of a sentence that would give a different if not opposite meaning. That’s the standard operating procedure of all journalists these days, when they even bother to get their facts right.
> Everyone—and not just The Information—should be genuinely terrified that the richest man in the world has built a Large Language Model that spouts propaganda in his image.
If we survived Gemini refusing to draw white vikings we will survive that too.
The real concern isn't that Grok chatbot will be biased if you ask it a question like that. In any case, knowing Musk, it won't be subtle, so people will know what they are getting.
No, the real worry is that Grok is what Musk's "young conservative genius" squad is going to put in charge of many of the things in our government, basically, on the basis that it saves money.
Every competitor was done when Claude 3.5 was released, every competitor was done when o1 was released, the entire West was done when DeepSeek was released, the world was done when Mistral Le Chat was released; I guess now it's time for the solar system to be done because of Grok 3. Let's see what new model dominates the galaxy next week.
There's a level of truth to many of those statements.
1) Claude 3.5 prevented OpenAI from making big monopoly profits on LLM inference
2) Open source models like Mistral and Llama effectively prevented any regulator from controlling how people fine tuned models, and what they used them for
3) Deepseek prevented the collective west from exerting control over the creation of base models
"Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."
Despite explicitly requesting the tetris/bejeweled hybrid to be "insanely impressive", the result was ugly and clunky. With that demo running in the background, they then segued into a hiring pitch for a new AI game studio. Consider me unimpressed.
Yes, but we have a long track record of one overpromising and underdelivering, charging money for what doesn't yet exist, while the other released stuff on a minimal website with an associated blog post for fanfare, actually (metaphorically) turned the world upside down, and only charges for what they actually deliver. Yes there's hype now, but that's how it began.
So I think people are less distrustful of Altman when he says "thousands of days".
It would be satisfying if he got called out for repeating himself the next time he comes out and says he is scared of how powerful their next model is.
For low single-digit marginal benchmark gains: a short, ephemeral blaze of glory on top of the charts until Claude 4 snuffs it out later this month. Benchmarks we all know mean nothing, because they just confirm "Pretraining on the test set is all you need".
For what? There is no ROI at that price point. There is no monetisation potential.
DeepSeek trained R1 for 1.25% ($5M) of that money (using the same spot price) on 2048 crippled export H800s and is maybe a month behind. $395,000,000 more. For what? And why?
Stopping Grok 3 training at any point after the R1 paper and implementing their open-source results would likely have reduced training time on xAI's massive GPU cluster to under a day.
This all by the man in charge of “government spending efficiency”. He’s gonna have to work overtime now to free enough money for government Grok licenses to pay back his investors.
What is wrong with everyone in AI? Why do all the experts, the business profs, the godfathers of AI gloss over this?
What the hell does this whole bonfire of the vanities of carbon, money and GPU depreciation achieve?
I know it's just for raising the next round [1], but the premise does not check out. OpenAI already won the game of riding tech demos through funding rounds until you can buy yourself a government bailout.
I used to think the same way wrt Nvidia stock when it tanked: compute clearly has diminishing returns. Tech companies subsequently announced capex equal to or greater than expected in compute. I smacked myself on the forehead when I realized I'd been thinking too much like an engineer. Tech CEOs badly want to believe they have an edge over every upstart from San Francisco to Shanghai. Unlimited spending on compute gives them that reassurance. In fact, the more threatened they feel, the more they spend to cling onto it.
Kids have security blankets. Tech CEOs have security compute clusters.
This is the danger of being informed only by sensational headlines. Nvidia's stock has fully recovered and is again near an all-time high. You seem to be generalizing about "Tech CEOs" — but in this case, GPUs are the advantage. They are necessary to achieve the outcome, and yet they are severely supply constrained. It's smart to overpay now.
Apple did something similar with NAND storage for the iPad mini. They took a bet that could have been wrong. It was not wrong. Competitors had a hard time because of it.
Load isn't what causes degradation; it's heat. And as someone who has mined crypto for years, I'm aware that there are a lot of things that can be done to run hardware quite hard and keep thermals low. Whether or not that is what is being done, I have no idea. A GPU mining crypto for 5 years kept below 65C (rather easily done) is going to have far more life left than a GPU in some kid's gaming PC that frequently spikes to 85C for even a year.
Everything is near an "all time high." Microstrategy stock is hovering near an all-time-high, and they're just a company that buys up Bitcoin and wastes some of it. Meme coins are floating up to all-time-highs. Stop using asset prices to justify anything people are doing, they're fully decoupled from anything happening below.
I don't think I was informed by sensational headlines. I was well into talking to people I knew about how DS's performance relative to compute was a game changer much before the stock crash.
It's not binary where you either have compute or not. You definitely do need GPUs, but there's already masses of compute; I believe it doubles every ten months or so just from Nvidia's chips. Many factors make it a very irrational decision:
1) Companies were spending hundreds of billions collectively on AI capex. Meta alone was 75 billion projected this year. This is an extraordinary bet, given that the most revenue any AI company makes is a few billion by OpenAI.
2) When DS came out, it was a huge validation of the moatless idea. These SOTA companies have no moat, at best they are spending tens of billions to maintain a few months edge.
3) DS was also a huge validation of the compute saturation idea - that SOTA models were always massively inefficient. At best the excess compute was traded for iteration speed.
4) Many other more technical arguments - Jevons paradox, data exhaustion (synthetic data can only be generated for a fixed set of things), apparent diminishing returns (performance relative to compute, the denominator has been exponential but the numerator logarithmic)
So on one hand you have these SOTA models which are becoming free. On the other hand you have this terrible business model. I strongly suspect that AI will go the way of Meta's Metaverse - a staggering cash burn with no realistic path to profitability.
It's one thing to invest in a new technology with tangible benefits to your product. It's another to spend vastly, vastly more into vague promises of AGI. To put it into perspective, Meta will spend on AI capex in a few months of 2025 as much as Apple spent on NAND in total. What advantage is there to be had with SOTA models? You do 20% better on some AIME/IQ/competitive coding benchmark, which still translates atrociously to real world issues.
But Nvidia will be very successful, because these companies frankly have lost a lot of the plot and are FOMOing like mad. I still have memories of the 2013 AI gold rush where every tech company was grabbing anything with AI in it, which is how Google got DeepMind. They are being enormously rewarded for it by the stock market, with Meta's price up 6x since its lows.
It is entirely possible that LLMs end up serving some useful purpose, but don't end up being great businesses.
I can think of a million different software services that have some value to users, but don't have some multi-trillion dollar revenue stream flowing from them.
There is an idea that these LLM companies are going to be able to insert their agents into the labour market and capture some percentage of the savings that firms realize from laying off their human workforce. Given the competitive environment, it is far more likely that these LLMs become an undifferentiated commodity good, and their value to the economy gets captured elsewhere. Currently the only value being captured is at the infrastructure level, and that is all predicated on a series of assumptions around software business models that have not materialized yet.
>For what? There is no ROI at that price point. There is no monetisation potential.
I think your whole argument is based on this being true, but you didn't give much argument about why there is no ROI. 400M USD isn't hard to generate...even a moderate ad engagement lift on X would generate ROI and that's just 1 customer.
Imagine going back in time and showing every VC how great the search business will be in 20-30 years. The only rational response would be to make giant bets on 20 different Googles...and I think that's what's happening. These all seem like rational investments to me.
Ken Griffin gave an interview where he said something along these lines: the technologies of the dot-com bubble pretty much turned out to be what everyone at the time thought they would become. The issue was that valuations grew way too fast, and it took much longer than expected for the companies to build out their products.
I think a similar thing is playing out with AI. In 5-10 more years these LLMs will replace a google search today (and maybe be even better).
_Could_ they have done the same thing with a tiny fraction of the money? Grok 3 benchmarks are SOTA for both base model and reasoning. By definition, nobody has been able to do the same thing with any amount of money (discounting o3 which has been teased but is unreleased). That may change in the future! But as of now this is the case.
So apart from the part where SOTA doesn't mean anything in the real world (there is no monetisation, there's no moat): please, it's benchmarks; we've all known how you beat those since 2023.
Why don't you do it then? If you are talking about DeepSeek's "$5M", then you would be interested to know that they pay 7-digit salaries and reportedly have H100s worth $2B[1].
I just wonder if it matters. If Google had spent 10x as much in the first 5 years of its life, would it be a worse company now? Giant TAM, winner takes all (or most?), all that matters is winning.
People like Demis Hassabis and Dario Amodei say that R1's efficiency gains are exaggerated. The $5M training cost seems to be fake, as sources suggest they own more GPUs.
You seem to be assuming that the full cost of the cluster is recouped by Grok 3. The real value will be in grok 5, 6, etc…
xAI also announced a few days ago they are starting an internal video game studio. How long before AI companies take over Hollywood and Disney? The value available to be captured is massive.
The cluster they’ve built is impressive compared to the competition, and grok 3 barely scratches what it’s capable of.
Yes. Why do we get these replies on HN that seem to only consider the most shallow, surface details? It could well be that xAI wins the AI race by betting on hardware first and foremost - new ideas are quickly copied by everyone, but a compute edge is hard to match.
The compute edge belongs to those like Google (TPU) and Amazon/Anthropic (Trainium) building their own accelerators and not paying NVIDIA's 1000% cost markups. Microsoft just announced experimenting with Cerebras wafer-scale chips for LLM inference, which also offer cost savings.
Microsoft is in process of building optical links between existing datacenters to create meta-clusters, and I'd expect that others like Amazon and Meta may be doing the same.
Of course for Musk this is an irrational ego-driven pursuit, so he can throw as much money at it as he has available, but trying to sell AI when you're paying 10x the competition for FLOPs seems problematic, even if you are capable of building a competitive product.
DeepSeek just showed the compute edge is not that hard to match. They could have chosen to keep the gains proprietary but probably made good money playing the market instead, quants as they are.
If you’re using your compute capacity at 1.25% efficiency, you are not going to win because your iteration time is just going to be too long to stay competitive.
Software and algorithmic improvements diffuse faster than hardware, even with attempts to keep them secret. Maybe a company doubles the efficiency, but in 3 months, it's leaked and everyone is using it. And then the compute edge becomes that much more durable.
They achieved the same results for 1.25% of the computation cost... If they actually had that computation capacity, it would be game over with the AGI race by the same logic.
xAI bought hardware off the open market. Their compute edge could disappear in a month if Google or Amazon wanted to raise their compute by a whole xAI.
There seems to be a coordinated effort to control the narrative. Grok3's release is pretty important, no matter what you think of it, and initially this story quickly fell off the front page, likely from malicious mass flagging.
One thing that's taken over Reddit and unfortunately has spread to the rest of the internet is people thinking of themselves as online activists, who are saving the world by controlling what people can talk about and steering the conversation in the direction they want it to go. It's becoming harder and harder to have a normal conversation without someone trying to derail it with their own personal crusade.
How? After an enormous investment, the latest version of some software is a bit better than the previous versions from its competitors, and will likely be worse than the future versions from its competitors. There's nothing novel about this.
NVIDIA's CEO Jensen Huang: “Building a massive [supercomputer] factory in the short time that was done, that is superhuman. There's only one person in the world who could do that. What Elon and the xAI team did is singular. Never been done before.”
Largest supercluster in the world, created in such a short time frame, is pretty important. 4 years typically, cut down to 19 days. That's an incredible achievement and I, along with many others, think it's important.
Okay but that's obviously a nonsense claim. Find me a computer on the https://en.wikipedia.org/wiki/TOP500 that was built 4 years after the chips it uses debuted.
> There seems to be a coordinated effort to control the narrative.
Do you have any evidence for this?
Who would want to coordinate such an effort, and how would they manipulate HN users to comment/vote in a certain way?
I think it is far more plausible that some people on here have similar views.
> [people] controlling what people can talk about
That's called 'moderation' and protects communities against trolls and timewasters, no?
> and steering the conversation in the direction they want it to go
That's exactly what conversation is about, I'd say. Of course I want to talk about stuff that I am interested in, and convince others of my arguments. How is this unfortunate?
Is it? It's Yet Another LLM, barely pipping competitors at cherry-picked comparisons. DeepSeek R1 was news entirely because of the minuscule resources it was trained on (with an innovative new approach), and this "pretty important" Grok release beats it in Chatbot Arena by a whole 3%.
We're at the point where this stuff isn't that big of news unless something really jumps ahead. Like all of the new Gemini models and approaches got zero attention on here. Which is fair because it's basically "Company with big money puts out slightly better model".
I'd say Grok 3 is getting exactly the normal attention, but there is a "Leave Britney Alone" contingent who need to run to the defence.
We have no clue how all this is going to play out, what value is captureable and what parts of a lead are likely to stay protected. This race is essentially the collective belief in a generationally big prize and no idea how it unlocks.
The problem with that for a comment section is it reduces ALL comments to gossip and guessing, which makes people feel stupid.
Reddit today feels like it's absolutely overrun by bots. So much of the comment content is so superficial and cookie-cutter I find it hard to believe it's all produced by human beings. A lot of it reads like the output of small cheap LLMs of the sort that would be used for spam bots.
Of course we know X, Facebook, and probably most other social media is also overrun by bots. I don't think you can assume that humans are on the other end anymore.
The point is that it is inefficient. Others achieved similar results much cheaper, meaning they can go much further. Compute is important, but model architecture and compute methods still outweigh it.
How quickly will Grok 4/5/6 be released? Of course you can choose to keep running older GPUs for years, but if you want bleeding edge performance then you need to upgrade, so I'm not sure how many model generations the cost can really be spread over.
Also, what isn't clear is how RL-based reasoning model training compute requirements compares to earlier models. OpenAI have announced that GPT 4.5 will be their last non-reasoning model, so it seems we're definitely at a transition point now.
xAI is not trying to make an immediate profit -- ironically, just like DeepSeek. They will undoubtedly use more efficient training processes in future runs and they will scale that across their massive GPU cluster. Just because they didn't cancel the training of Grok 3 and start over absolutely does not mean they will not incorporate all the work from R1 and more in the next run.
What you're seeing right now is pure flex and a signal for the future and competition. A much maligned AI team that hasn't even been around for very long at all just matched or topped the competition without making use of the latest training techniques yet. The message this is intended to send is that xAI is a serious player in the space.
> DeepSeek trained R1 for 1.25% ($5M) of that money (using the same spot price) on 2048 crippled export H800s and is maybe a month behind.
This is a great example of how a misleading narrative can take hold and dominate discussion even when it's fundamentally incorrect.
SemiAnalysis documents that DeepSeek has spent well over $500M on GPUs alone, with total infrastructure costs around $2.5B when including operating costs[0].
The more interesting question is probably: why do people keep repeating this? Why do they want it to be true so badly?
SemiAnalysis is wrong. They just made their numbers up (among many other things they have invented - they are not to be trusted). I have observed many errors of understanding, analysis and calculation in their writing.
DeepSeek R1 is literally an open-weight model. It has <40bln active parameters. We know that for a fact. A model of that size is roughly consistent with optimal training over the time period and server hours claimed. In fact, the 70bln-parameter Llama 3 model used almost exactly the same compute as the DeepSeek V3/R1 claims (which makes sense, as you would expect a bit less efficiency for the H800 and for the complex DeepSeek MoE architecture).
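For anyone who wants to check that plausibility claim themselves, the standard back-of-the-envelope is training FLOPs ≈ 6 × active parameters × tokens. Here's a sketch using the figures from the DeepSeek-V3 report (37B activated parameters, 14.8T tokens, 2.788M H800 GPU-hours); the ~990 TFLOPS peak per H800 is an assumption (dense BF16), and V3's heavy use of FP8 would only make the numbers easier to hit:

    # Rule of thumb: training FLOPs ~= 6 * N_active * tokens.
    n_active = 37e9        # activated parameters per token (MoE)
    tokens = 14.8e12       # training tokens
    gpu_hours = 2.788e6    # claimed H800 GPU-hours
    peak_flops = 990e12    # assumed peak FLOP/s per H800 (BF16 dense)

    required = 6 * n_active * tokens            # ~3.3e24 FLOPs
    available = gpu_hours * 3600 * peak_flops   # ~9.9e24 FLOPs at peak

    print(f"implied utilization: {required / available:.0%}")   # ~33%

An implied ~33% utilization is well within the range real training runs achieve, which is why the claimed compute budget is not absurd on its face.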
> For what? There is no ROI at that price point. There is no monetization potential.
It appears that LLM chat interfaces will replace Google SERPs as the arbiters of truth. Getting people to use your LLM allows you to push your world view. Pushing his "unique" world view appears to be the most important thing to modern Musk.
In that light, paying $44B for Twitter and billions for Grok training makes perfect sense.
The beauty of a failed investment is that it never goes below zero. So upside is the only thing they care about. Why invest in a near-zero chance of a random SaaS taking off when you can invest in a near-zero chance of creating superhuman artificial life?
Yes but why? This is what I really don't understand.
Say AGI is achieved within a reasonable timeframe. Odds are that no single company will achieve it; there will be no monopoly. If that's the case, where is the trillion dollars of value for investors? From every claim we hear, AGI will lead to hundreds of millions of jobs disappearing (all white-collar jobs) and tens of millions of companies disappearing (all the companies that provide human-produced services). Who is going to buy your AGI-made products or services when nobody is paid anymore, when other companies, big and small, have ceased to exist? Sure, you can make extraordinary accomplishments and advance humanity far, far ahead, but who is going to pay for that? Even states won't be able to pay if their taxable population (individuals and corporations) disappears.
So where will the money come from? How does it work?
Also, profitability won't materialize in an environment with so many competitors offering comparable products. Perfect competition destroys profit. The good becomes a commodity, and the price people will pay simply becomes the marginal cost of production (or in this case, less, while the dumb money is still chasing the hype).
It works well when you see the company stuffing dollar bills into its sports car to race against a Chinese family sedan with a hand-tuned ICE running on 1.25% of the fuel.
As a consumer, I'm just happy that base models are improving again after a ~quarter or more of relative stagnation (last big base model drop was Sonnet v2 in October). Many use cases can't use o1, r1, or o3[-mini] due to the additional reasoning latency.
DeepSeek wouldn't have been able to train R1 without their ~600B-parameter base model, so you should consider the cost of that model when you compare with Grok.
In any case, Elon won't win this race, because the best talent will not work for him. He used to have a good reputation and a lot of money, which is a deadly combination. Now he only has the latter, and that's not enough when leading AI people can make 7 figures at other companies.
To be clear 1: I'm not saying that the people who currently work on Grok are not great. It's not about hiring some great people; it's about competing in the long run. People with other options (e.g. offers from leading AI labs) are more likely to accept those offers than to join his research lab.
To be clear 2: I'm not talking about Elon's reputation due to his politics. I'm only talking about his reputation as an employer.
He has the vision and the marketing skills, but that's not going to be enough to lead the AI race.
Actually, the $5 million figure is the compute cost for the base ~600B-parameter model. Training R1 on top of it was just ~8000 steps of reinforcement learning, so I expect the vast, vast majority of the training cost is already included in the pretraining stage.
Any source? I’m a heavy user of Claude and pay for the Teams plan just for myself so I won’t get throttled. Love it. But I’ve been impressed with O1 Pro lately. That said, I don’t like paying both €166 for Claude Teams and €238 for OpenAI Pro. :)
> This all by the man in charge of “government spending efficiency”.
Per court filings by the administration, Musk is not in charge of DOGE, nor does he have any role in DOGE, nor any decision-making function in government at all, he is a White House advisor unconnected to DOGE.
Oh my god, could you possibly pile on him more, especially on stuff that is orthogonal to Grok? You don't need to get sentimental about his role in the government when you want to criticize his approach to AI.
Corruption aside, China is run by smart leaders who execute on a long-term plan, and are gradually extending their influence over the world. The US is doing the opposite.
It's easy to execute on a long-term plan when your government is totalitarian, run by a cult of personality, and has no concern for individual rights.
If you think an executive that ignores court orders is going to survive for a long time in America, I am willing to bet any amount of dollars against you at any odds.
It's a good bet for me, because if I lose, dollars won't be worth anything anyway.
I don't understand how or why Grok would be related to "understanding the nature of the universe", as Musk puts it. Please correct me if I'm wrong, but they basically just burned more cash than anyone should, buying Nvidia GPUs and making them predict natural language, right? So they are roughly on par with all the other companies that did the same.
This is not innovation, this is baseless hype over a mediocre technology. I use AI every day, so it's not like I don't see its uses, it's just not that big of a deal.
Answer 1: Some people think that LLMs are a path to the singularity, a self-improving intelligent program that will vastly exceed human intelligence and will be able to increase its knowledge exponentially, quickly answering all answerable scientific questions.
Answer 2: LLM companies need to keep the hype train rolling. I didn't watch the whole clip; I jumped around a bit, but I noticed that every time Musk interjected something, it was to exaggerate what was previously said. "Grok contains the whole internet"—"the whole of human knowledge, actually!"
I think that both answer 1 and answer 2 apply to Musk. He seems to believe that they're building a god-like entity, and he also needs to keep the money train rolling.
This and only this. Everything he says when talking about how good his products are is lies and exaggeration to attract investors, from the promise of two manned missions to Mars in 2024, to a 300-ton payload in space, to FSD.
There's a more short-term goal for Grok, which is to replace what is left of the federal government with AI. That will significantly boost the money train, but is also a utopian (for some, dystopian for others) goal of replacing the expensive 'deep state' with a slim set of impartial algorithms.
This isn't unequivocal proof, but the broad goal of automation lends itself pretty strongly to LLMs, and oh boy, what LLM technology do you think they want to use?
He may have deleted it, but around 2020 or so, when there was starting to be a lot of hype about LLMs, Elon Musk responded to a "doomer" on Twitter saying that he "didn't see the potential for that", referring to LLMs achieving AGI. It was a 100% dismissal of everything he is saying now. At that point Elon Musk had already been saying publicly for years that "AI is more dangerous than nukes", but he also had voluntarily walked away from OpenAI, which he would never have done if he thought there was any chance of AGI. I just want to know the truth: is this really just advanced search, where some jobs will be lost because they turned out to be nothing more than search tasks (i.e. coding boilerplate), or are we really on the cusp of AGI (and therefore in a great deal of danger)? It's impossible to say whether Elon Musk really believes what he is saying; there are public figures on both sides providing conflicting explanations.
As I watched the Grok 3 stream I became very angry. I am so very tired of being jerked around and not knowing whether I should be planning for the future or investing in the world as it is now... it's really a form of psychological torture.
I work in AI and love the technology. But all the hype and grandiose claims make it awkward when people ask what I do, and it makes hiring harder when experienced developers hear 'AI development' and walk away - even though it's mostly just solid full-stack engineering work.
I am always looking for roles, and I have pretty good full stack experience (a few years of C++, C#, some JS, TS, backend and frontend web, C, Zig, Rust, built a few hobby compilers and other stuff).
I apply to pretty much every job that sounds reasonably good in terms of work-life balance, but I completely ignore anything that says AI. I really, really, really do not want to be part of a company that lies to itself, and so far all AI companies look like they are. It's not AGI. It's not gonna be AGI. Ride the hype train, cash out and lay off 80% of the workforce and jump on the next hype train, whatever. But don't hope that people who want a stable job want to hop on something that delivers such a shaky definition of value.
Even if you're an AI-skeptic, it's hard to argue that companies building AI customer support for example aren't en route to improving the whole "calling your ISP's team in India" experience.
I absolutely hate the Elon hypetrain, but I also don't understand the social media hate I see for AI, like comparing every ChatGPT answer to one wasted bottle of water.
Can we stop for a second and just marvel at a new piece of human ingenuity? Let's not give Elon too much credit, but I think that AI as a whole helps us all understand the nature of intelligence, and therefore humans' place in the universe.
One of the fundamental questions of human existence is: what does it mean to exist and think? Every time we build a new human-like thing it helps us understand the context of our own existence. (Not just computers or AI, but also airplanes, factories, etc.)
True AGI would force us to rethink what it means to be a thinking human being, and I think current LLMs already should and do.
> I also don't understand the social media hate I see for AI, like comparing every ChatGPT answer to one wasted bottle of water. Can we stop for a second and just marvel at a new piece of human ingenuity?
I don't know, man. We're staring down the barrel of at best a WW3-event and at worst an extinction-event. We're doing absolutely nothing to stop it, even though we have all the answers and the resources to do so. Instead, we're making the problem even worse all so some marketers and scammers can spend someone else's money to generate garbage pictures and SEO spam, so the worst people on the planet can gain even more money and power than they already have.
I'd love to be positive about this tech, I'm sure it's cool or whatever, but it's really hard to be positive about anything right now, especially when the tech in question is speeding us straight along the path to mass death. The world sucks and the people running the LLM stuff are amoral monsters putting all of their resources into making it worse. I'm not excited about any of this.
What's happening definitely makes me nervous, but "at best a WW3-event and at worst an extinction-event" seems a bit much. Mainly because there are a _lot_ of unknowns. Better try to get comfortable with just riding this out.
It really isn't. Climate change is going to make large amounts of land unlivable. That's going to cause a climate refugee crisis. I agree the effects of that refugee crisis are unknown, but I can't see any resolution that doesn't involve increased nationalism, civil wars, and violent resource conflicts. Given this is a global crisis, that's a recipe for WW3.
This was all avoidable, of course. But instead of fixing it, we spent decades fiddling around with toys like LLMs. Whee.
LLMs don't make me question what we know about humans and thinking. They are really good at convincing us that they're good, but really, that's other humans building stuff to convince us that it's good. There is no intelligence here, other than the perceived intelligence of predicting words intelligent people have written previously.
> There is no intelligence here, other than the perceived intelligence of predicting words intelligent people have written previously.
I think this is my main point: isn't it amazing that a thing that predicts words other humans have previously written manages to appear intelligent or, more pointedly, to have utility in communicating real thoughts and ideas?
If you've ever asked an LLM a question and gotten a satisfying answer, that means there is some human-level intelligence somewhere in the token filtering and recombining that an LLM does.
Specifically, I think the test of human-like intelligence is literally the output: if we get utility from the arrangements of the tokens it outputs, that in and of itself demonstrates that some portion of human intelligence could be this same token-generation mechanic.
No. Just means we are easy to fool. Like apes who see themselves in the mirror and fail to recognize they are seeing themselves in the mirror, thinking it is a different ape (and trying to mate with or attack it).
The invention of the mirror by humans probably provided an interesting insight into our own existence... I wonder what it would have been like to see your own reflection for the first time as a technology. How would that change your outlook on your self-hood and identity?
Great question to ponder. Surely people would have seen their own reflections in water, but the mirror itself would have made "Reflection Technology" for "Artificial Introspection" more scalable. I suspect the mirror offered modern people a new viewpoint, allowing one to see how one is perceived by others. I do not think selfhood and identity would have been affected. My main question is about when people came to behave differently than apes. Douglas Hofstadter's "The Mind's I" may have a few hints on perception of self for you.
It does make me question humans and thinking but in the opposite direction.
It is like sitting down at a piano, sight reading a piece from sheet music and then someone who has no idea what they are talking about claiming you composed the music on the fly. Then when you point out the sheet music they just double down on some bullshit as to why they are still right and that is still composing even though obviously it is not.
Best analogy so far. I am adopting this for the next wave of "wait until the next model" and "but humans hallucinate, too" comments. Yes, when we feed our own output (language on the web) back into ourselves, things become tricky to tease apart, and it would seem like intelligence to us. Then again, the Mechanical Turk appeared intelligent, too. If we point out how it works, the "magic" should vanish.
Can you list a few demonstrations from a text-outputting computer program that would each convince you that there is intelligence here? Eg writing a bestselling novel, proving an unsolved number theory conjecture, etc. Or is your belief uncontestable?
That's not really a fair question. To answer it, the OP would have to define intelligence. If you have done so already, then by all means, do share your definition. If not, then you are in no better position to claim intelligence than the OP is in claiming lack thereof.
I feel like that's part of what Elon is flexing. Teslabot was a latecomer compared to competitors like Boston Dynamics.
Elon is showing off that he can marshal enough resources and talent to be on par (kinda) with state-of-the-art products in crazy time. That's been most of his superpower so far: not breakthrough tech that didn't exist before. We've had rockets before.
I don't like Elon either, but SpaceX did create breakthrough tech that didn't exist before by landing an orbital-class rocket, and as of today nobody else has done it.
Landing a rocket was considered impossible and unthinkable 10 years ago, and then SpaceX completely changed the game. And they're reinventing rocket tech again with Starship by catching it midair.
The number of people that have the capital and connections required to even attempt such things is very small, so it's not necessarily Musk's abilities that made those things happen, just the combination of having the power to allocate enough resources and an interest in such things.
He started both Tesla and SpaceX when he had "only" a few hundred million to his name and no more connections than would be expected of a Silicon Valley guy making payment software. And lots of brilliant guys, including John Carmack for instance, have tried their hand at aerospace - and failed. Jeff Bezos started Blue Origin before SpaceX was even founded, and it was literally only last month that they finally managed to get a rocket into orbit for the first time. There's a joke in the industry: 'How do you become a millionaire in the aerospace industry? Start out as a billionaire in the aerospace industry!'
And we live in a world of millions of millionaires and thousands of billionaires. For that matter, even China is trying its hardest to replicate SpaceX tech, given all the resources of the world's largest economy, 1.4 billion people (meaning a proportionally larger pool of intellectual outliers), and de facto authoritarian power to make it all happen. Yet they remain (in terms of rocket technology) behind SpaceX.
Being the most successful out of three or even a dozen doesn't make someone exceptional. Because so few people with interest in space have "only" a few hundred million, we can't really say if it's actually his talent that made it possible or simply the result of having access to resources that the vast majority of people could never dream of.
The U.S. has a long history of aerospace innovation, from NASA to private contractors, and Musk was able to use this ecosystem. China doesn't have that.
The WEF cites a global space economy at $630 billion, alongside investments of $70 billion. [1] And as anybody with half a head on their shoulders can see, space is where the big future economic growth will come from. Even if somebody has zero interest in space, which I think is very few people, that's where the next 'big boom' in economics will come from. And SpaceX was started on a fraction of $0.3 billion, with Carmack and Bezos just being a couple of names people here would be familiar with, amongst tens of thousands. Yet no competitor is anywhere to be found.
And the US doesn't really have a long history of aerospace innovation. In 1962 Kennedy gave his 'to the Moon' speech; 7 years later, in 1969, we'd gone from having nothing to putting a man on the Moon. From 1969 (well, 1972 in particular) to the birth of SpaceX (early 2000s), US space technology not only stagnated but regressed. This is why Boeing (which was a major part of the original space race) can't manage to even begin to replicate what we achieved in the 60s, in 7 years no less!
Incidentally this is also a big part of what motivated Elon to start SpaceX. He was looking at NASA's future plans for human spaceflight and they were basically nonexistent. So he wanted to launch a greenhouse to Mars and stream it growing, to inspire people and hopefully get things moving in the right direction again. NASA wasn't interested in any such things, the Russians wanted too much $$$, and so SpaceX was born.
When Musk 'joined' Tesla it was a name and two other guys. The latter two of whom left the company before a single car had been produced. They then sued for the right to be called founders a couple of years after they left, and once it became clear the company would stand a reasonable chance of success.
If my memory serves me correctly, they had put some Sony Handycam batteries on a chassis and driven it around before Musk. Musk was there for every actual product and its development.
You don’t understand the Musk business model. It has been the same for years. His wealth doesn’t come from his products, but from his fanbase buying his stock. The purpose of everything he does is to influence the public opinion to make him the tech genius of today in the collective psyche.
Of course, he needs to do impressive things, stuff that a normal person wouldn’t have the resources to achieve. It’s similar to Mr. Beast’s channel on YouTube, just on a way bigger scale. Do things that people can’t see anywhere else.
Musk’s money will come from his fans. And ETFs, trust funds and such will amplify this when he reaches a certain market cap. His crypto coins are the exact same scheme. Once you stop thinking in classic business school terms, it starts making way more sense.
Some of his ventures actually produce value! But that’s not where the money comes from. It comes from the belief, the adoration and the celebrity status that he has.
This is the real power in today's world. People need to know you from the screen. This clout catapulted him into the government of the US, made him the wealthiest man in the world, and gave him license to do anything he wants publicly without repercussions.
> His wealth doesn't come from his products, but from his fanbase buying his stock.
SpaceX is private, Starlink makes real money from real users.
> Everything he does is done to influence the public opinion to make him the tech genius of today in the collective psyche.
Well that's clearly not right. He's doing a lot of things to make himself seem like a total tool that we should all boycott no matter how good the products are. If he actually did what you say, he wouldn't be burning all these bridges.
The bulk of his wealth is the Tesla stock. I know that SpaceX produces value. Some of his ventures do. But the image of a genius entrepreneur is way more valuable than any government contract he might get.
Remember, he was way less crazy before his market cap skyrocketed. Now that his fanbase has reached a certain critical mass, he can afford to be polarizing as a PR strategy. He's been constantly testing what works.
He would be one of the richest people in the world even if he hadn't started SpaceX and Tesla, just from what he made from the sale of PayPal. Hell, he'd still be well off even if he hadn't started the original X and had just lived off the $13M he made from Zip2; $13M still places you in the top 0.5% in the world today.
SpaceX is not where most of his valuation comes from: it principally comes from Tesla stock, which is vastly, hilariously overvalued compared to its performance as a car company, and definitely compared to its performance as a technology company.
Even more importantly, analysis of Elon's tweeting patterns versus Tesla's stock valuation makes it obvious why Twitter became so central to him [1]: it was a massive driver of Tesla's stock value. Buying it was a good move from the perspective that he really couldn't afford to be banned from Twitter.
> Buying [twitter] was a good move from the perspective that he really couldn't afford to be banned from Twitter.
Another good move would have been to not be an ass. I mean, if he really did care about the stock price like this thread implies. Being seen as a genius entrepreneur doesn't imply you should also be a racist conspiracy theory nut. I'll repeat it again: the thesis that he's doing all these things due to competency is absurd. Never attribute to competence what can be sufficiently explained by stupidity.
He's doing these things because he's lost his marbles. Trying to make it out like he's doing it for reasonable reasons is like claiming Trump is playing 5-D chess. It's the same as QAnon logic. It just falls flat against Occam's razor.
Only people engaged in active buying and selling set the price of Tesla stock. It's called "price discovery." An institution such as an index fund passively holding stock does not affect the value of the stock, so the percentage of institutional ownership doesn't itself matter.
I'm interested in what you're saying about classical business terms. Can you elaborate on that a little? I've always found these kinds of people hard to understand.
The man has almost unlimited wealth and his motivations seem consistently petty and strange or just downright ludicrous. He's like an alien to me.
I've noted the same feeling when seeing VCs/business people speak when I've encountered them.
The classic business is about producing valuable economic outputs and creating a stable revenue flow from bringing them to market.
The modern era post-business is about dipping into everyone’s pockets, by securing cash flow from the stock market and the government.
Here building a profitable business model is less important than convincing people and the government to give you that dough. And the best way to do it is to have clout.
> his motivations seem consistently petty and strange or just downright ludicrous. He's like an alien to me.
I think it could help to think of a historical figure who has done impressive things and of whom you don't have an overly negative view. A lot of them seem really weird or alien. In democracies, political leaders are (sometimes!) more "normal" because they have to get elected. So think of a CEO/founder you like (Jobs?) or earlier people (Napoleon? I dunno, pick yours).
Read a bio on them, they're pretty strange (I like Churchill, dude was wild).
It seems that to do extraordinary things you need some traits that make you ludicrous. I don't really know, but it's definitely a pattern
What lie are you talking about? That he supports far-right parties? That he did some kind of gesture which in the video looks like a Nazi salute, but which he says was just a "my heart goes out to you" gesture?
Neither seems to show that the OP was lying, but I understand you have a different opinion than they do.
OP said he did a Nazi Salute. He did not do a Nazi salute.
He did something that people who fetishize the downfall of their enemy desperately want to be a Nazi salute, so they call it that, but it was not a Nazi salute.
If it were, he would have said so, since at that point you are basically showing the world you are a Nazi. But that is not what happened.
People really don't take their credibility seriously, and will cry wolf at anything that moves, seriously undermining any argument they make.
There are ample ways to hate on Elon using factually true things he has done. Sticking to those makes your arguments rock solid and keeps your credibility high. So much is wasted by idiots slinging smoke because it makes them feel right.
What if he does it again and just says "My heart goes out to you" and does the same hand throwing gesture? Just curious what you think in that hypothetical.
To me it seemed like he did it to troll 'libs' who dislike gestures like that.
There always exists a difference between what a person thinks of their own motive and how it is seen by others. In this case, while he might say he did one thing, many people took it to mean something else.
That's because here and on Reddit, there is a strong element of EDS (Elon Derangement Syndrome), which is characterized by not being able to discuss anything SpaceX, Tesla, X, The Boring Company, &c. do without completely politicizing it and completely avoiding talking about its technical or engineering merits.
I will gladly give him more money. I have no vendetta against him or his actions; you're free to do as you please, but don't force your political agenda on everyone.
Same. I support everything that's going on at DOGE. There's a certain type around here that needs to get used to the fact that their political views do not enjoy industry wide hegemony.
What's wrong with DOGE? How is eliminating bureaucracy/spending a bad thing? This was done about 100 years ago; research what happened after. It feels like the people who scream the loudest are the ones leeching from the system.
- Much of the spending they're eliminating is good and important (E.g. USAID).
- The way they're cutting is reckless (They accidentally fired nuclear safety workers).
- Many of the workers are Musk sycophants, who were hired from Twitter/Tesla/SpaceX.
- There's a tremendous conflict of interest in this agency being run by a massive government contractor (NASA and the military are both avid SpaceX customers).
- The workers are not experienced with the data they're working with, and misinterpret it constantly in public before verifying it.
- Despite claims of "transparency", Musk asserted that it was illegal to publish the names of his employees.
- Their one product, their government spending portal, is a worse version of a spending portal which already exists, and they didn't even secure their database.
- They say they "use AI" for their decisions. Which AI? Where is confidential data going?
The only "crisis" in US spending is that each party keeps shutting down the government when they don't like what the other one is doing. That's not going to go away with balanced books, but it sure is a recipe for disaster. It stops being a democracy when the decision making process stops being about majorities and starts being a game of "whoever doesn't blink sets the rules".
The country prints its own money, and right now the dollar is the world reserve currency, giving the US a huge advantage when it comes to borrowing whatever it wants. The biggest threat to the dollar remaining the reserve currency is that the scale of cuts being talked about can only be met by cutting at least one of: interest payments on government loans, or things the US government considers "mandatory" such as pensions, social security, etc.
There are both immediate and long term consequences to the debt. In the short term we are paying a large share of revenue into debt servicing. We are all working months a year to pay interest. In the long term, we are impoverishing the next generations.
The government can debase the currency by printing money. That impoverishes everyone except the wealthy, widening the wealth gap.
> There are both immediate and long term consequences to the debt. In the short term we are paying a large share of revenue into debt servicing. We are all working months a year to pay interest. In the long term, we are impoverishing the next generations.
Only if your economy doesn't grow.
> The government can debase the currency by printing money. That impoverishes everyone except the wealthy, widening the wealth gap.
It impoverishes lenders and savers, but not borrowers. It's not as simple as wealthy or poor, since anyone can be either.
That's not what's happened in the past 5 years. Assets have ballooned, the cost of living has skyrocketed, and wages have not kept pace. The rich got richer, the poor got poorer.
How is growing debt a good thing? How is cutting costs a bad thing? Have you seen the Argentina example? How did it turn out? Can the same thing be done in the USA? The only reason to say no is because someone is directly profiting from not cutting costs.
If you know Musk's story, then you know why he is the best candidate to do so.
You can afford it; it's fairly close to a neutral thing for a government to have debt.
Right now, the US gets to set the terms for those loans.
> How is cutting costs a bad thing?
Consider weight as an analogy: most people could do with losing weight, but losing it by getting enthusiastic amateurs to perform a quadruple amputation is not advisable.
Musk's target can only be met by deleting your entire armed forces.
And then you have to find another $1.2 trillion.
So the military and the entire CIA, FBI, NSA, DHS, NASA, Federal Highway Administration, FAA, the Department of Agriculture, the Department of Commerce, Department of Justice, the Treasury, …
… all that plus the military still doesn't add up to Musk's target.
Unless you want to cut stuff that's considered "mandatory" (like military pensions), or the interest repayments on the very loans you wish you didn't have.
The world’s richest man is gutting the regulatory bodies which were designed to keep him in check, and you’re defending it. I’d say you’ve drank the Kool-aid.
No, it seems like you are just against cost cutting, and it doesn't make sense why. The USA is a democracy; if after 4 years things go south, you can always vote differently and get everything back the way it was. The most probable outcome here is that the financial situation in the USA will get better; you make it sound like the USA would collapse to the stone age, and it won't.
We have a mechanism for that: it's called Congress. DOGE is an executive abuse of power, one that has resulted in people in national-security-critical roles being fired and then hastily rehired, because DOGE is that incompetent.
Article 2 says “the executive Power shall be vested in a President of the United States of America.” That means the power to “enforce laws, manage national affairs, and oversee government agencies.”
It’s Congress’s role to allocate funds to certain purposes, and the President’s to “take Care that the Laws be faithfully executed,” including overseeing the bureaucracy that implements them.
The President already has dictatorial power over the bureaucracy, as per the Constitution.
Not over spending, which is what Elon's seizure of the payments infrastructure amounts to.
Unless you extend it to that, in which case, why have courts? It's not like they have enforcement power, and the president can stretch discretion to the limit.
I don't think that's what DOGE is doing. It seems extremely vindictive and ideological in the way it's acting. Time will tell, but I would not be surprised if it ends up costing the taxpayers more in the long run.
Argentina and US are very different countries, starting these cuts with very different economic realities. For example, 55% of all registered workers are employed by the government in Argentina. Although not a directly comparable metric (since in the US you also need to account for state and local civil workers), the US federal government employs around 3 million people. That's just 1.87% of the entire civilian workforce.
Again, DOGE operates from the premise that the federal government is bloated. Although this is a very popular message, I'd love to see some more objective data to support this and I doubt that CDC or USAID are the agencies where the bloat is. Like I said, their actions seem vindictive and careless. Also, likely to result in legal cases that will drag for years and end up costing taxpayer more than the supposed savings.
The main con is that once you fire the workers that you thought you didn't need (but that you did indeed need) hiring them back becomes more expensive and a lengthy process. Some of the firings are already causing chaos in vital teams among several agencies and have forced DOGE to try to reverse course (bird flu monitoring, nuclear response...).
And that's not to mention the dire situation you put the people you are firing in. Laying off people from their jobs is never "an experiment" unless you are willing to suspend every trace of empathy.
Eliminating waste is a great idea. But it’s unclear that that’s all he’s doing, it’s unclear how or how well it’s being done, he’s brought in people without security review (which means they, and the systems they are opening up and creating, can be more easily compromised by our enemies), and he has enormous conflicts of interest.
Note that DOGE fired, and is struggling to rehire, the team whose job was to maintain America's nuclear arsenal.
Also note that the stated goal of DOGE, $2T, exceeds the entire discretionary budget of the federal government, even though half the discretionary budget is your entire active military.
Even treating $2T as a stretch goal, eliminating literally everything but the military from the discretionary budget doesn't reach the lower $1T that Musk says he has a "good shot" at.
Cuts on this scale directly, all by themselves, even aside from all the other economic disasters that America expressly chose to vote for as a package deal with this, will shrink the US economy by about 6%.
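To make the budget arithmetic concrete, here's a quick sketch; the figures are rough FY2024 approximations I'm assuming for illustration, not official totals:

    # Approximate FY2024 discretionary budget, in $ billions (assumed
    # round numbers for illustration; check official totals before citing).
    defense = 850
    non_defense = 800
    doge_goal = 2000     # Musk's stated $2T target
    doge_stretch = 1000  # the later $1T "good shot"

    print(doge_goal - (defense + non_defense))  # 350: $2T exceeds ALL discretionary spending
    print(doge_stretch - non_defense)           # 200: cutting everything but defense still misses $1T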
US Americans seem quite green, have never had this done to them before on such a scale, and so haven't seen the trick before: the art of the hostile takeover. [0] Curtis documents it well, but at tedious length unless you're British, on the Slater, Goldsmith and Rowland gang.
"Efficiency" is an entry point, a common bullshit word that's a perfect cover for hostile takeover, because nobody argues with it - it's a "STFU and agree" word [1].
There's nothing wrong with it. Our national debt is currently increasing by something like $1 trillion every 3 months. Getting creative about reducing government spending seems necessary at this point.
All the hysteria over this is just partisan politics as usual.
Stop falling for branding and actually concentrate on the numbers: spending is going up, not down, and only touching entitlement programs, the military, or offsetting monetary loosening (via debt interest) will change that.
DOGE has revealed that $400 million is rookie numbers when it comes to phony deals.
Forcing the government to spend money has always been the infinite money glitch.
On one side you have healthcare and pharma companies making sure everything is excessively overpriced; then they lobby the government to make sure everyone has government-sponsored healthcare, i.e. to turn all tax revenue into healthcare revenue. Then they pay the media to convince everyone that it is their moral obligation to subsidize $1,000,000 insulin, while making it taboo to ask why healthcare is so expensive.
On another side you have mass incarceration, where each inmate costs more per night than a 5-star hotel.
On another side you have nonsense conflicts where the weapons of choice are thousands of single-use munitions that cost at least $100,000 each. Or the weapons are simply left behind for the enemy, so they have to be repurchased.
On another side you have tax loopholes of billions of dollars.
Everyone is stealing. Did you pay 30% tax and then sales tax on everything you bought with your income? Is your effective tax rate around 50%? Then you worked 6 months for the government, so they can take those 6 months of your life and turn them into a dumb single-use weapon to destroy a $1 tent.
- Healthcare and pharma are overpriced because of information frictions, institutional supply constraints (this mostly means strict immigration controls), and because people just really want healthcare relative to other wants! See: https://www.reddit.com/r/badeconomics/comments/1gsezga/comme...
Also, Biden capped the price of insulin, so there's that.
- Mass incarceration is a SUPER populist thing! How many times do we hear "we need to be tough on crime"? This sure seems like the voters are behind it; it's not like people are clamoring for shorter sentences.
- Indeed, our military is expensive partially because we require domestic production and have to pay really high domestic costs, and partially because the way the US fights wars places a SUPER high value on human life. Desert Storm was both expensive and resulted in only ~200 coalition deaths while taking down the world's 4th-largest military, whereas Russia has not made it very far into Ukraine and has taken over 200k deaths (and it hasn't even been substantially cheaper to boot, just a bit cheaper).
- The tax loopholes exist and are bad, although I challenge you to name specific loopholes that cost high billions, that should obviously be repealed, that constituents are clamoring to close, and that representatives nonetheless refuse to act on. I don't think they exist.
You know what does actually degrade the fiber of the country? The richest person in the world taking personal control over every payment and arbitrarily destroying institutional knowledge by firing every government employee he has control over or who stands up to him. But no, instead we get "he's saving money" when (see the earlier comment from me) we're not even making outlays go down! A script kiddie who randomly rewrites lines into bad assembly while destroying the build system isn't a perf eng, they're a danger.
In the US healthcare system, a bag of salt water costs hundreds of dollars. You can cap the price of one thing, and then the whack-a-mole game starts.
The conflicts in the Middle East cost trillions of dollars, and there is absolutely nothing to show for it.
Quantitative easing cost trillions of dollars, and most of the people responsible for causing the crisis got a big payday from it.
If someone became "the richest person", you can probably learn something from that person. Without SpaceX, the US would have to use Russian rockets to put stuff in space, because the NASA shuttles were retired. Is that something you would like more?
The US is on the verge of bankruptcy, and it is not because of $400M in trucks.
And it is not a Democrat or Republican issue, as I said, everyone is getting rich at the expense of the taxpayer, even taxpayers that haven't even been born yet are in debt already thanks to a wasteful mentality.
I work, and the fact that my tax money is going into a black hole makes my blood boil. God bless Musk and DOGE for what they do.
Here is just one headline from today,
The Elon Musk-led Department of Government Efficiency (DOGE) on Monday revealed its finding that $4.7 trillion in disbursements by the US Treasury are "almost impossible" to trace, thanks to a rampant disregard for the basic accounting practice of using tracking codes when dishing out money.
Will you go on the record and say that $4.7T in a year is fraudulent or misspent? I want to be crystal clear with what you're insinuating, because that's a massive amount of money, easily the biggest fraud of all time by a factor of almost 30.
I don't understand what this means though; almost all of our money passes an audit, which necessarily has a paper trail. The few agencies that don't usually have very idiosyncratic audit misses, which are, in any event, overseen by inspectors general (or were, until Trump fired all of them) who have been very zealous to jump on this.
So then misspent? Must be, because if it's not misspent, then "impossible to trace" is a little irrelevant. It can't be unaudited, because every department passes an audit every year (except the DoD, but they basically pass an audit, and the reasons they currently don't are mostly technical).
Can you not find it? I can find basically any spending data I want at the tips of my fingers (well, less so now that it's unclear what's being paid). Is there anything specific that you feel is missing and want to see?
> Here is just one headline from today, The Elon Musk-led Department of Government Efficiency (DOGE) on Monday revealed its finding that $4.7 trillion in disbursements by the US Treasury are "almost impossible" to trace, thanks to a rampant disregard for the basic accounting practice of using of tracking codes when dishing out money.
And you believe them?
This is a department that fired multiple different nuclear weapons inspection and maintenance teams without knowing what their job was.
Had to re-hire them. They weren't redundant teams. DOGE just didn't understand what they (or the teams) were doing.
Now, I'm very happy for the US nuclear stockpile to shrink. I sure think you have too many of them. But then, I'm foreign and a hippy, so I would. But (1) do you?, and (2) do you want it to shrink by some of them accidentally exploding? Or being relocated by a hostile power taking advantage of the inspectors all being on early retirement?
I am not jumping to conclusions and will reserve judgement for later. They have provided no proof so far, but hopefully it will be forthcoming, and I would not dismiss their claim outright.
So I'm asking you: do you want this done? Take your pick.
Loss of oversight made a bunch of USSR suitcase nukes, ~100 or so, go walkabout when they collapsed. Russia denies this, of course. They might be fine, or not, nobody (in public) really knows. Probably not a huge risk without maintenance, if you nick it but don't know what it is you might scrap it for parts and mistake the core for tungsten or lead, but… not great, unless it was existing nuclear powers who took them.
They deny that, but of course you know for sure that they are lying, that the nukes went missing, and you have the proof. Just like I know for a fact that there are alien craft hidden in Area 51.
It was a statement made by General Aleksandr Lebed, former Secretary of the Russian National Security Council, in a meeting with an American congressional delegation.
Perhaps he was drunk, or lying, or just plain unable to find the people who knew which cupboard the devices were safely locked in. But he did make those claims. And you are missing the wood for the trees.
This "wood" (and the US report) consists of exactly one person who made this claim, and a member of corrupt Yeltsin's entourage to boot. I'd say if these nukes were real, they would have exploded somewhere by now.
Try harder.
Most people, in my experience, would rather support Elon Musk than a strategy recommended by someone who believes the MSM should be trusted.
Not to mention that most of your sources to support your points are from far left MSM sources.
Your reasoning is exactly why there is so much support for Elon Musk. You probably made more Elon Musk fans just by your post.
Most people seem to believe the government is broken and MSM is a huge reason for this.
> Most people seem to believe the government is broken and MSM is a huge reason for this.
Is it because Fox News, the most influential channel of the so-called MSM, constantly repeats conservative talking points about the alleged inefficiencies of the government while downplaying the government's important work in protecting citizens?
Americans will soon get to experience what a real broken government is like, and I hope it provides them an education.
People who buy into Grok are willingly submitting themselves to the far-right propaganda machine. I’m sure it’s nice and tidied up for release, but there is zero chance that Musk will not use this tool to push his ideological agenda given its reach and impact.
Well, if you see any kind of propaganda, you can denounce it, and I'm sure this community will respond just like we responded to Chinese propaganda and censorship in DeepSeek.
No good will come from denying progress just because you don't like someone else's opinions and worldview.
> My question is how may of you are actually willing to give Musk more money after the questionable, legal, and ethical behavior he's exhibited while working for DOGE.
Oh yes, far better to give to Altman, Google, or Facebook; those are such morally responsible companies!
Far better to give elected representatives this responsibility, IMHO. This is bizarre. There are a lot of billionaires jockeying for influence and resources here; it's almost like a free-for-all. Musk could also use his position to force Sam Altman's hand in the acquisition of OpenAI, https://www.wsj.com/tech/elon-musk-sam-altman-relationship-6.... I'm not interested in either party, but it's clear there are huge conflicts of interest here. Musk also expressed disappointment when he didn't get a piece of this pie, https://www.axios.com/2025/01/22/stargate-elon-musk-trump-al.... I've also read more concerning material regarding JD Vance's connections with Peter Thiel and their interest in sidelining the constitution to prepare for "networked states", https://www.nytimes.com/2025/01/18/magazine/curtis-yarvin-in....
Much of this is impossible to follow closely. Like I said, this administration seems to be flooding the zone with shit to distract others from its real intent. I think it's worth vetting and questioning positions in government; you can't just blindly trust these people. Something seems really off. I say question everything at this point. I don't trust billionaires to fix the world's problems. Democracy and the constitution should be upheld and well guarded.
I mean relatively speaking yes, only one of them is acting as an unelected dictator, circumventing our whole constitutional appropriations process by taking direct control over payment infrastructure.
Because running a company is different to running a state and we doubt his intentions, not his skills.
Someone who is not elected, who gets $8M/day from the government, now oversees the government with some 20-year-old fanatics who can't even put up a secure website for DOGE?
Yeah, most of the government isn't elected, that's how it works. You only vote for 3 roles in the Federal government (I guess 4 if you count the veep). The rest of them are hired.
For those downvoting, the roles are President, Vice President, Senator, and Representative.
Article II, Section 2, Clause 2 of the US Constitution:
""[The President] shall nominate, and by and with the Advice and Consent of the Senate, shall appoint Ambassadors, other public Ministers and Consuls, Judges of the supreme Court, and all other Officers of the United States, whose Appointments are not herein otherwise provided for, and which shall be established by Law: *but the Congress may by Law vest the Appointment of such inferior Officers, as they think proper, in the President alone, in the Courts of Law, or in the Heads of Departments.*"
Emphasis on the last sentence. There have been a plethora of such vestments in the Executive branch over the decades.
5 U.S.C. § 105 - This statute authorizes the creation of "necessary agencies" within the Executive Office, giving the President flexibility to establish entities like the USDS and staff them as needed, subject to funding.
5 U.S.C. § 3101 - This law states that "each Executive agency" (including the EOP) "may employ such number of employees ... as Congress may appropriate for." It implies broad authority to hire staff, with Congress controlling the budget but not necessarily the individual appointments.
Excepted Service Authority (5 U.S.C. § 3301 and Schedule A) - Under 5 CFR § 213.3102, agencies like the OMB can use Schedule A hiring authority for positions requiring specialized skills (e.g., tech expertise) that aren’t practical to fill through standard civil service exams.
This authority, delegated by Congress via the Civil Service Reform Act of 1978 (Public Law 95-454) and regulations from the Office of Personnel Management (OPM), allows the President (or OMB leadership) to appoint USDS personnel directly.
Term Appointments - Many roles are temporary or term-limited (e.g., 2-4 years), often filled by detailees from other agencies or private-sector experts. These don’t require Senate confirmation because they aren’t permanent "officers." This flexibility is supported by 5 U.S.C. § 3161, which allows temporary organizations within the executive branch to hire staff for specific projects.
Thanks, I’ve read the constitution too. Ever heard of congress, full of elected officials? Or the senate? Your claim that there are so few elected officials is patently absurd.
Because he and his organization have demonstrated ignorance of the services he's not only auditing but making pretty substantial cuts to. One example I'm familiar with: cutting up to 10% of the personnel of the Technology Transformation Services at GSA is quite likely to reduce the efficiency of both the government and private-sector government contractors.
Because he owns companies that contract with the government and are affected by its policies. It's the very definition of a conflict of interest.
And he's not even "auditing" the government. When you're auditing, you emit a report that the audited party later analyzes and acts upon. He's been given freeway to fire government workers as he pleases as if he's an elected officer, which he's not.
> He's doing everything he's doing by executive order of the President of the United States, who was elected.
And there are already a number of lawsuits because he's trying to do things neither he nor the president is allowed to do. Getting an EO to do something doesn't mean it's automatically legal. Multiple big decisions have already been reverted or are on hold until judges can review them. Even things like the promised payout for quitting are not practical, because only Congress can approve the money for that.
There are lawsuits, naturally, as lawfare is a normal part of modern politics. All the laws necessary to do payouts for voluntary separation already exist, as long as they fit within the budgetary appropriations already set by Congress.
I'm sure he's trying his best. But I don't doubt that, even if not on purpose, he will mostly cut departments and services whose loss doesn't hurt him, or whose cutting indirectly benefits one of his many businesses.
He, a single person, has far too much control of our system.
It's lost 80% of its value in 2 years, which usually isn't great. The most charitable view of X/Twitter is that it's now a propaganda platform that Musk doesn't mind taking a loss on in order to enact political change.
I'm pretty sure he doesn't like that he has not been able to make it more successful, but I don't believe he regrets it.
He would have regretted it if it hadn't played out the way it currently has (and the game he is currently playing is not finished yet). He said in an interview that he is putting everything on one card now.
Edit: Also, he gets a lot of his valuation from having a cult following. Among the companies trying to survive the AI phase we are in right now, he is a very high risk if he can't get the Nazi crowd on his side to buy his stuff.
He destroyed Twitter's brand and Bluesky emerged. He is destroying Tesla while other carmakers gain ground. SpaceX needs a lot of subsidies, and his Mars goal is a cult topic, not a financially sound one.
But are they well-versed in the things accountants specialize in? Is there a possibility that not every programmer can be a good accountant, or that accountants know things you're unaware of when you wrote that statement?
Even better, they're well versed in things that accountants aren't. When you're auditing trillions of dollars in spending, it helps to have software, data science and analytics experts that can use modern tools beyond COBOL written 62 years ago.
They can use data processing, detect anomalies better, leverage AI models, automate data extraction from analog records, ingest unstructured data like emails and memos, build complex financial dependency graphs, detect leaks, build custom scrapers, etc. (see the sketch below).
I'm sure there's at least one accountant in the loop, but you really want the team to consist mostly of data nerds.
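As a minimal sketch of what that looks like in practice (the file name and column names here are hypothetical, purely for illustration), a first-pass anomaly screen over payment records might be as simple as per-agency z-scores:

    # Flag payments that are unusually large relative to their agency's norm.
    # "payments.csv" and its columns (agency, vendor, amount) are hypothetical.
    import pandas as pd

    df = pd.read_csv("payments.csv")
    stats = df.groupby("agency")["amount"].agg(["mean", "std"])
    df = df.join(stats, on="agency")
    df["z"] = (df["amount"] - df["mean"]) / df["std"]

    # Anything 4+ standard deviations above its agency's norm goes to a
    # human reviewer -- ideally an accountant -- rather than being called "fraud".
    flagged = df[df["z"] > 4].sort_values("z", ascending=False)
    print(flagged[["agency", "vendor", "amount", "z"]].head(20))

The point isn't that this replaces accountants; it's the triage step that decides what an accountant looks at first.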
What about GAAP/IFRS? How do you endow these software engineers with knowledge of common patterns of fraud or leaks so they can actually write the correct software to find them automatically? How do they identify material misstatements?
You also seem confused; COBOL might be used, but it isn't the only tool available to accountants working for the government. COBOL is a straw man. What you're describing here, software engineers who presumably have training in accounting, already exists, and they work inside and outside of government. This is an existing career path.
You're speaking about this as though you know a better way to do something, but it's already happening, and has been for years. Accountants aren't writing 62 year old programming languages waiting to die in their chairs while the world continues to progress without them.
Accounting just about anywhere you find it is already accomplished by accountants, some of them technically trained, as well as data scientists and software engineers. It's an interdisciplinary collaboration in any serious organization.
As a person who works with data and has done both consulting and product building in data science, I can say that lack of domain knowledge is what makes or breaks the end result. Too often technical people think they know better and then build mediocre solutions that don't get used.
An audit accountant can ask an engineer to implement whatever is needed to achieve a goal they understand. An engineer with no finance background will have no idea where to start or what questions they can ask an accountant.
You can easily get drowned by a sea of numbers and get confused and gaslit, unless you make sure all the data is available and computable.
Not sure how this release, which is impressive by all means, got transformed into an attack on DOGE, which is taking the exact approach startups take to disrupt an industry.
No one doubts his abilities, and reasonable people are grateful for his work with DOGE and his support of free speech. Unfortunately, this platform has become an echo chamber for mainstream media, merely repeating news and links from sources like The Verge, BBC, Politico, etc. This is just a bias in the user groups. Still, we should hopefully put politics aside and focus on more tech-related subjects on this website :)
I think a lot of that comes from people thinking the eye-catching, memorable views they've read are the most common views, when really they're more likely just the views of the most passionate 1%.
For the same reason I doubt Einstein's abilities as a painter. History is littered with the stories of smart people who tried to treat government dysfunction like corporate dysfunction and failed spectacularly.
He bids for contracts just like anyone else, and most of those contracts were won under Democratic presidents, but I digress. If you're saying government contractors can't work for the government, then you're going to have to explain the military-industrial complex to me.
It’s not a doubt of abilities, it’s a doubt of his interests aligning with the interests of US citizens.
Here is an unelected NAZI and ILLEGAL IMMIGRANT (he worked illegally on a student visa) who did a sieg heil at the presidential inauguration, taking up an informal, unconfirmed-by-Congress department-head role (DOGE is just the US Digital Service renamed), getting wide access to government systems, and seemingly firing thousands of government employees.
Billionaires, who should not exist, are so rich that they don’t need government services and would rather the government go away so they can make more money. But regular people do need a government, and that’s just one reason of many why Elon shouldn’t be anywhere near policy decisions.
Now, you might say I’m being dramatic. But I’ll say that no criticism of this man is unfair. He is one of the world’s biggest hypocrites, along with the other MAGA Nazis in his camp.
The (unelected) richest person in the world, with a sprawling business empire that has many interactions with the federal government, has been given free rein with no oversight to fire any federal workers he wants, and has usurped Congress' power of the purse by stopping Congressionally appropriated spending.
This creates numerous conflicts of interests and opportunities for self-dealing.
Consider a NASA employee who is awarding government contracts. They know all of the above. They have three bids in front of them: one from Boeing (lol), one from SpaceX, and one from Axiom Space. The NASA employee thinks the bid from Axiom Space is the best value and fits the requirements best. But will they select them, or will they select SpaceX, knowing that they could be fired tomorrow on Musk's whim?
Repeat this scenario across every interaction any of Musk's companies have with the federal government.
This isn't a novel scenario. Putin's Russia is a great example of what happens when oligarchs are granted significant autonomy over organs of the government. It is a system designed to facilitate corruption.
You could assuage my concerns, though, by describing the ways that there is effective oversight over Musk, or by describing the anti-corruption anti-self-dealing measures that have been imposed on Musk. The Press Secretary gave a statement on this saying: "As for concerns regarding conflicts of interest between Elon Musk and DOGE, President Trump has stated he will not allow conflicts, and Elon himself has committed to recusing himself from potential conflicts." That...does not resolve my concerns.
> Fixing the deficit is the opposite of looting, actually.
I think this has two errors.
First, I don't agree that he's fixing the deficit. I think that's an assumption not in evidence. We'll see in a few years' time, though. I'd be willing to bet that in 4 years the deficit is > 0, and likely larger than it is today.
But let's assume arguendo that he is fixing the deficit. It's still possible to loot the treasury while fixing the deficit, which shows that they aren't actually the opposite.
Consider this example with completely made up numbers:
Before
- Revenue: $1T
- Defense Spending: $500B
- Benefits Spending: $1T
- Public Services Spending: $499.9B
- Government Contracts with Musk's Companies: $100M
The before scenario has $1T in revenue and $2T in spending, for a deficit of $1T. Now, let's allow hypothetical Musk free rein to "fix the deficit":
After
- Revenue: $800B
- Defense Spending: $300B
- Benefits Spending: $300B
- Public Services Spending: $100B
- Government Contracts with Musk's Companies: $100B
In this scenario the deficit has been reduced to $0, while Musk has enriched himself and his companies with $99.9B in government funds. This would be an extreme example of Musk looting the treasury, while still completely resolving the deficit.
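For the arithmetic-inclined, the same toy numbers as a few lines of Python (all figures in billions, and just as made up as above):

    # Reproducing the made-up figures above, in $B.
    before = {"defense": 500, "benefits": 1000, "services": 499.9, "musk": 0.1}
    after  = {"defense": 300, "benefits": 300,  "services": 100,   "musk": 100}

    print(sum(before.values()) - 1000)     # 1000.0 -> $1T deficit before
    print(sum(after.values()) - 800)       # 0.0    -> balanced after
    print(after["musk"] - before["musk"])  # 99.9   -> $99.9B to Musk's companies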
There is no solid evidence of a path to fixing the deficit at the moment.
There is no evidence of this happening, nor of a serviceable plan to do so.
All recovered expenses, to date, add up to a laughably small amount, and are one-time cutbacks. The strategy shows signs of costing the government in unexpected ways as well.
Most governments of developed nations operate in more sensible ways with clearer plans than this. I won't claim they are looting, but it's absurd to suggest they are fixing the deficit at the moment. The economy appears to be getting worse, not better.
Nazis would incarcerate people in work camps and turn them into soap or hang them in public squares, which is kind of different from what Elon did. There's a pretty big gap between doing something in poor taste like Nazi salutes in public because you have the intellectual maturity of a 12-year-old edge-lord on Xbox Live seeking attention, and being an actual Nazi committing crimes against humanity.
When everyone goes around calling everyone they hate a Nazi, it only desensitizes people to real Nazi behavior, kind of like the boy who cried wolf: there are people out there committing actual atrocities against humans under the public radar, because they never do Nazi salutes on camera so as not to draw attention. The Nazi term then starts to lose any meaning, kind of like the overuse of calling everything "woke" today.
The problem is that people as a whole are retarded due to mob behavior, too focused on optics, and judge only on feelings rather than facts; that's how we get actual criminal Nazis going free under the radar while innocent people are swatted and doxxed for saying something right-wing on social media. Not all Nazis today wear jackboots and do heils; plenty go about appearing like normal people in public. They could even be your neighbor, a police officer, or your local congressman.
So save your anger for those people instead, as Elon is just a 3 year old throwing tantrums seeking attention, annoying but relatively harmless. If people stopped giving him so much attention, he'd stop doing it.
What makes you so sure/special in this regard? What are you gaining from this? If you were as enlightened as you claim, Germany wouldn't be in such a mess right now. If you had spent as much effort on securing your borders, energy independence, and defence as you spent lecturing others about imaginary Nazis and banning hate speech on social media, you'd be a respectable world power right now, at the table with Trump and Putin, ending this war before it even happened. The "we know better than you" arrogance is Germany's biggest problem.
>This is basically blaming the thing criticized on the people criticizing it.
It's not blaming; I'm just telling you what the simplest solution is. Ignoring attention seekers is better than giving them more attention. Which is why I'm also going to ignore your future comments from now on.
Germany? The scope of the discussion is already limited to the AfD. What makes Germans special in their opinions about Nazis in Germany is deep experience and knowledge of the subject you are belittling. And I'm also not spending any time banning hate speech on social media, heh. Whatever chip on your shoulder you have about Germany I can barely decipher, and you're right, it's best to agree to disagree.
I find it interesting that these two descriptions of news are treated as equivalent, where I think they're almost opposing:
"raw, unfiltered news"
"real, trustworthy news"
Raw and unfiltered almost cannot be "news" (by my definition of what I go seeking for as "news"). X provides raw, unfiltered information. But real, trustworthy news almost requires filtration in order to be deemed trustworthy.
I think the theory is that community notes are a more neutral way to tag bad information, compared to whatever The New York Times and Fox News are doing.
People actually trying the model report that it does not say anything like this when asked the same question. Elon somehow prompted the model to bash The Information for his screenshot.
I didn't know about this. But I asked it if Elon Musk and DOGE randomly firing as many government workers as they can from all federal departments might be dangerous to Americans and it was pretty honest that yeah, it could be.
They've been training the model for a while, right? It's unlikely he could have known Trump would let him rampage through the federal government when they started training it.
It's from the opening lines of the closing segment of the video you linked. Not a bad discussion of the unfolding of news and social media responses following the shooting of Donald Trump's ear on the campaign trail.
Why did we start talking about news as "legacy media"?
I mean, at least it has journalistic standards and some semblance of fact checking compared to social media which has given us great gaffes such as identifying the wrong Boston Marathon bomber and getting the poor guy to kill himself, wrongly identifying the Hispanic white supremacist shooter, and many many more.
When someone here says they don't like the news, I assume it's because the only newsworthy topic in their mind is "Just how COOL is the new iPhone? Very!"
Twitter is well suited to deliver the newest developments on this topic.
Mainstream outlets have viewerships that compare disfavorably to those of top youtubers. That has been the case for many years now. The only reason most mainstream outlets exist is the taxpayer money train that's coming to an abrupt stop as I write this.
Mainstream news has viewerships that compare unfavorably to sports games and sitcoms. So what? It's not entertainment; it's informative. Of course they have different viewerships.
The statement is true irrespective of the number of views. If you watch the news you are misinformed. You believe things that just aren't true at all. From time to time your view of the world clashes with reality, and you reject the reality and substitute it with your own. Many such cases.
I’ve been putting Grok 3 through some challenging test flows, and it’s impressing me. Not flawless, but the quality is generally exceptional, and the speed? Lightning-fast.
It’s delivering answers on par with the other top models on my back-pocket tests and in a fraction of the time. The integrated search is solid overall, though it occasionally misses key resources—likely a web indexing quirk. That said, its ability to ingest content from any website when you provide the URL is a standout feature.
The analysis is razor-sharp. It’s picking up on nuances—and even uncovering entirely new angles—that other models have overlooked. I just posed a tough hypothetical about a new line of business I’m looking at for my consulting service and it identified an entirely new sphere of possibility that I hadn’t seen nor had any of the other top models (not Gemini 2.0 Pro, OpenAI’s o1-Pro and o3-mini-high, or Claude 3.5 Sonnet).
An API will be needed for real programmatic testing and certainly for integrated business applications and workflows, but I’m told one is coming in the next few weeks.
It’s still early (we’re talking hours here), but so far, I’m digging it.
It's because of the em dashes (- is a normal dash, — is an em dash). Very few real people use those outside of writing books or longform articles.
There's also some strange wordings like "back-pocket tests."
It's 100% LLM generated.
What is much scarier is that those "quick reply" blurbs on Android/Gmail (and iOS?) will be trainable on your entire e-mail and WhatsApp history. That model will have your writing mannerisms and even be a stochastic mimic of your reasoning. So you won't even be able to tell that a model, not a real person, answered you. And the initial message the model is responding to might itself have been written by the other person's personal model.
The future of digital interactions might have some sort of cryptographic signing guaranteeing you're talking to a human being, perhaps even with blocked copy-pasting (or well, that part of the text shows up as unverified) and cheat detection.
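As a rough sketch of what that signing could look like: the idea that a device would only sign human-typed text is entirely speculative, but the mechanics would be ordinary digital signatures, shown here with Ed25519 from the Python cryptography library:

    # Hypothetical human-authorship signing; not an existing standard.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    private_key = Ed25519PrivateKey.generate()  # held by the sender's device
    public_key = private_key.public_key()       # published for recipients

    message = b"Typed by an actual human, allegedly."
    signature = private_key.sign(message)

    try:
        public_key.verify(signature, message)   # raises if text was altered
        print("verified: signed by this sender")
    except InvalidSignature:
        print("unverified text")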
Going even a layer deeper / more meta: what does it ultimately matter? We humans yearn for connection, but for some reason that connection only feels genuine with another human. Whereas, what is the difference between a human typing a message to you, a human inhabiting a robot body, a model typing a message to you, and a model inhabiting a robot body, if they can all give you unique interactions?
Everyone who uses a compose key has it available (via ---) — I do. You mean the em-dash though, not the en-dash, and Davidzheng is using hyphens for approximation, not en-dashes.
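For reference, the three characters being conflated, with their Unicode code points:

    import unicodedata

    for ch in "\u002d\u2013\u2014":
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    # U+002D HYPHEN-MINUS
    # U+2013 EN DASH
    # U+2014 EM DASH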
It’s gracious of you to say that you’d be sorry, and I did run my comment through 4o (perhaps ironically) which caught a slew of typos and weird grammar issues and offered some improvements. But the robotic sound and anything else you don’t like are my own responsibility. Do you, perhaps, have any thoughts on the substance of the comment?
That's discomforting. My practice of sprinkling em-dashes like salt on a salad dates from my early days on various video game communities' forums. They comfortably mimic interrupted speech in writing. I hope I won't have to soon defend myself against accusations of AI usage just because I belong to the minority that read The Punctuation Guide[0] or a related resource.
It's really the em dash along with superfluous language. I suspect you are fine. Models like 4o have a very specific pattern when folks don't specify their writing style.
- Very 'forced' expressions (back-pocket tests, 'The analysis is razor-sharp')
- The fact you're glazing AI so much means you probably use it; it's like how it was with crypto bros during all the web3 stuff
- Lack of any substance: what does that post actually say? It regurgitates praise of the AI, but the only tangible feature you mention is that it can receive a URL as its input
Hmmmm it is hard to really place the issue. I am very much in the bullish on AI camp but I don't like writing for the sake of writing and some of the models (4o in this case) have very obvious tells and write in such a way that it takes away from what substance may exist.
One thing that concerns me is when you can't tell whether the comment was authored or just edited by AI. I'm uncomfortable with the idea that HN threads and reddit comments gradually tend towards the grey generic writing style of LLMs, but I don't really mind (save for the prospect of people not learning things they might otherwise!) when comments are edited (i.e. minor changes) for the sake of cleanliness or fixing issues.
I just re-read the post twice and I couldn't find any of the points you mentioned (again, other than using URLs in the input):
- Informal Benchmarks: I'm sorry, what? He mentions 'It’s picking up on nuances—and even uncovering entirely new angles—that other models have overlooked' and 'identified an entirely new sphere of possibility that I hadn’t seen nor had any of the other top models'. Not only is it complete horseshit by itself, but it does not benchmark in any way, shape, or form against the mentioned competitors. It's the exact stuff I'd expect out of an LLM.
- Real-World Test Case: As mentioned above, complete horseshit.
- 2 Concrete Features: Yes, I mentioned URLs in the input. I didn't consider 'Integrated Search' (which I'm assuming means searching the web for up-to-date data) because AFAIK it's already more or less a staple in LLM products, and his only remark about it is that it is 'solid but misses sometimes'.
Not really, as pointed out by others in the thread. Anecdotal of course, but I use em dashes all the time — even in emails and texts (not just long-form writing).
I often write things I want to post as bullets and then have an LLM formulate them better than I could. But it's just applying a style; the content comes from me.
My wife is dyslexic so she passes most things she writes through ChatGPT. Also not everyone is a native speaker.
TBH I've recently felt like that for ~70% of 'top-level replies' in HN, which has slowly pushed me to other mediums (mastodon and discord).
Could just be that the AI 'boom' brought a less programming-focused crowd into the site and those people lack the vocabulary that is constantly used here, who knows.
I'd go out on a limb and say that LLMs probably made the general population aware of how the "general voice" feels/looks/reads.
So rather than a lot of people adopting the way an LLM writes, the LLM writes as an average of how people have been writing on the internet for a long time. So now, when you start to recognize how "LLM prose" reads (which I'd say is "Internet General Prose"), you start to recognize how many people were already writing in that style.
I've been on the internet since the early 2000s, and I can assure you it does not write like how 'someone on the internet' would write. And when I say that, I mean it for both sides of the internet: it doesn't sound like how 'old school' internet folks write, but it also doesn't sound like how teens talk either. Neither of these groups writes in 'very plain' English regurgitating useless information.
Recent trends/metas in video formats like tiktok and shorts encourage that kind of 'prose', but I haven't seen it being translated into text format in any platform, unless it's written by LLMs.
My point wasn't that it writes like any specific group, but like a general mishmash made up of everyone's voice; a boring average of it, rather than something specific and/or exciting.
Then of course it depends on which models you're talking about. I haven't tried Grok 3 myself (which I think is what you're talking about, since you say "it"), so I can't say how its text looks/feels. Some models are more "generic" than others and have very different default prose styles.
Here’s the conclusion of a much more refined initial review by Andrej Karpathy [1] which, I think overall, comports with the substance of my own hot take:
“As far as a quick vibe check over ~2 hours this morning, Grok 3 + Thinking feels somewhere around the state of the art territory of OpenAI's strongest models (o1-pro, $200/month), and slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking. Which is quite incredible considering that the team started from scratch ~1 year ago, this timescale to state of the art territory is unprecedented. Do also keep in mind the caveats - the models are stochastic and may give slightly different answers each time, and it is very early, so we'll have to wait for a lot more evaluations over a period of the next few days/weeks. The early LM arena results look quite encouraging indeed. For now, big congrats to the xAI team, they clearly have huge velocity and momentum and I am excited to add Grok 3 to my "LLM council" and hear what it thinks going forward.”
I liked Grok 3's fiction writing style; it catches lots of the physics of mundane situations, such as the ringing echo in a closed bathroom we all know well, and the prose feels very lively as a result. Where R1 makes situations sharp with details, Grok 3 goes the other way: it rounds them out with details.
Well, because you explicitly asked it to demonstrate the physics, it came out way too detailed, but the point is that it adds details to scenes on its own, making them more realistic, not that dry Llama 3.3 style.
Here is the sentence: (She screamed, which echoed off the tile walls. “This is my life now,” she said to her reflection, which looked back at her with a mix of disgust and pity.) Looks good to me. Try it on Lmarena.ai.
Can't stand Elon but happy to see this. We badly need a frontier model that is not so obsessed with "safety". That nonsense has held things back significantly, and leads to really stupid fake constraints.
We know RLHF and alignment degrade model quality. Could it be that Grok, due to its less restrictive training guidelines (and the fact that its creators aren't afraid of getting sued), achieves higher performance partly because of this simple factor?
You're both right because RLHF and fine-tuning are just techniques.
It's dependent on the training data and not as much the method.
So, if you build the RLHF/finetune data so that it avoids certain topics, you reduce model quality in practice, since your training data might accidentally cast a wide enough net that the model avoids legitimate questions too (a toy sketch of this follows below).
On benchmarks these things don't typically show up though.
But yes. Those techniques are required for making it chat. Otherwise it just autocompletes from the internet.
It is also used in a couple of other places (reasoning, search, hallucination mitigation).
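A toy illustration of that "wide net" effect: a naive keyword filter applied to fine-tuning prompts, where the keyword list and prompts are invented for the example, catches legitimate questions alongside the intended targets:

    # Toy over-broad safety filter on fine-tuning prompts (all invented).
    BLOCKED = {"weapon", "explosive", "poison"}

    def is_blocked(prompt: str) -> bool:
        return any(word in BLOCKED for word in prompt.lower().split())

    prompts = [
        "how do I build an explosive device",   # intended target
        "is this mushroom poison or edible",    # legitimate, caught anyway
        "history of chemical weapon treaties",  # legitimate, caught anyway
        "how do I bake sourdough bread",        # unaffected
    ]
    for p in prompts:
        print(is_blocked(p), "-", p)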
It blows my mind that Musk hasn't integrated Grok as an app inside their vehicles. A literal AI copilot is a completely novel and killer app that cannot be pulled off by any other vehicle manufacturer.
Getting them to actually do something useful other than generating text is still a work in progress. What do you envision them actually doing in this integration?
Because all you need to do is Bluetooth your iPhone to your 1995 Ford Ranger and install Gemini to have a voice conversation through your car's speakers. But then your original comment doesn't make any sense about it being only possible with a Tesla.
Why do I need a big screen? The models mostly all have voice interfaces now. I shouldn't be sitting there reading and typing text input or output while driving anyways... What are you actually imagining doing with these models in the car? I still haven't heard what use they are.
If it's not doing something actually related to/integrated with the car why does it need to be an app there instead of just living on your phone like all our existing digital assistants?
I've been saying "hey Google, drive home" (for GPS directions), and asking it to play music, to the phone mounted on my dash for a decade. I drive a rusty old stick shift and alligator-clamped a $10 Bluetooth receiver to the speaker. So I'm not sure what you're envisioning that can't also work in any other car. There is also https://comma.ai, which adds self-driving to hundreds of newer cars via just an OBD-II-like dongle and the equivalent of a smartphone.
Interesting thing about this is that because of all the Musk-related overhyping that's gone on and because the launch is a video, the thread that marks the entry of another company into the select group of serious AI companies will go off the front page with possibly only 200 points!
It’s discouraging that instead of talking about the achievements of Grok 3, the same usual suspects have decided to bombard the comments with their far-left political views and the anger-management issues they have with Elon.
The pull quote is: The impression overall I got here is that this is somewhere around (OpenAI) o1-pro capability