
Daniel, you're a legend, thanks for all you do!

One question: I see the perf comparisons here are done on an L4, but isn't this SKU very rare? I'm used to the T4 at that tier.


Thanks!! Oh, Colab provides L4s - but the benchmarks are similar on a T4!

In fact, Unsloth is the only framework afaik that fits in a T4 for finetuning with reasonable sequence lengths!
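
To make that concrete, here's a minimal QLoRA-style sketch of the kind of setup that fits; the model name and hyperparameters are illustrative, not a tuned benchmark config:

    # Minimal Unsloth finetuning sketch; 4-bit weights are what let an
    # 8B model fit in a T4's 16 GB. Illustrative settings only.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized checkpoint
        max_seq_length=2048,                       # a "reasonable" length
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,                                      # LoRA rank
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    )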


So o1 pro is CoT RL and o3 adds search?


How does o3 know when to stop reasoning?


It thinks hard about it


It has a bill counter.


Does this also already exist in the LLVM toolchain?


Not really. LLVM does support compute-mode SPIR-V, and _very_ recently graphics-mode SPIR-V, which is being dogfooded by the highly WIP HLSL front-end in Clang (I couldn't get a trivial fragment shader to build with it a few weeks ago). Clang does not support compiling graphics-mode SPIR-V from C or C++.

This seems similar to the Vulkan Clang Compiler (which is not in-tree of LLVM), although it has some interesting differences such as implementing shader functions with new specifiers like `__hcc_vertex` rather than repurposing attributes for it like `[[clang::annotate("shady::entry_point::vertex")]]`.


The Microsoft HLSL compiler for DX12+ is based on LLVM and supports SPIR-V output: https://github.com/microsoft/DirectXShaderCompiler

Though my understanding is that it's a complete hackjob stuck on an old version of LLVM, not in a remotely upstreamable state, and breaks other parts of the LLVM toolchain completely. I think there was talk about trying to clean that up at the same time as bringing the baseline version forward, but I have no idea about its progress.


The progress is that DirectX will also support SPIR-V going forward.

https://devblogs.microsoft.com/directx/directx-adopting-spir...


There seems to be documentation about it:

https://llvm.org/docs/SPIRVUsage.html


Hi Francois, I'm a huge fan of your work!

Projecting ARC Challenge progress with a naive regression from the latest cycle of improvement (from 34% to 54%), a plausible estimate for when the 85% target will be reached is sometime between late 2025 and mid 2026.
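
For transparency, the back-of-the-envelope arithmetic behind that estimate; the 9 to 12 month cycle length is my own assumption:

    # Naive linear extrapolation of ARC scores. The cycle length
    # (9-12 months) is an assumption, not a published figure.
    prev_score, curr_score, target = 34.0, 54.0, 85.0
    gain_per_cycle = curr_score - prev_score              # 20 points
    cycles_left = (target - curr_score) / gain_per_cycle  # ~1.55
    for months_per_cycle in (9, 12):
        print(f"~{cycles_left * months_per_cycle:.0f} months "
              f"at {months_per_cycle}-month cycles")
    # prints ~14 and ~19 months out, i.e. late 2025 to mid 2026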

Supposing the ARC Challenge target is reached in the coming years, does this update your model of 'AI risk'? // Would this cause you to consider your article 'The implausibility of intelligence explosion' outdated?


This roughly aligns with my timeline. ARC will be solved within a couple of years.

There is a distinction between solving ARC, creating AGI, and creating an AI that would represent an existential risk. ARC is a stepping stone towards AGI, so the first model that solves ARC should have taught us something fundamental about how to create truly general intelligence that can adapt to never-seen-before problems, but it will likely not itself be AGI (due to being specialized in the ARC format, for instance). Its architecture could likely be adapted into a genuine AGI after a few iterations -- a system capable of solving novel scientific problems in any domain.

Even this would not clearly lead to "intelligence explosion". The points in my old article on intelligence explosion are still valid -- while AGI will lead to some level of recursive self-improvement (as do many other systems!), the available evidence just does not point to this loop triggering an exponential explosion (due to diminishing returns, and the fact that "how intelligent one can be" has inherent limitations brought about by things outside of the AI agent itself). And intelligence on its own, without executive autonomy or embodiment, is just a tool in human hands, not a standalone threat. It can certainly present risks, like any other powerful technology, but it isn't a "new species" out to get us.


ARC as a stepping-stone for AGI? For me, ARC has lost all credibility. Your white paper that introduced it claimed that core knowledge priors are needed to solve it, yet all the systems that have any non-zero performance on ARC so far have made no attempt to learn or implement core knowledge priors. You have claimed at different times and in different forms that ARC is protected against memorisation-based Big Data approaches, but the systems that currently perform best on ARC do it by generating thousands of new training examples for some LLM, the quintessential memorisation-based Big Data approach.

I, too, believe that ARC will soon be solved: in the same way that the Winograd Schema Challenge was solved. Someone will finally decide to generate a large enough dataset to fine-tune a big, deep, bad LLM and go to town, and I do mean on the private test set. If ARC really, really were a test of intelligence, and therefore protected against Big Data approaches, it wouldn't need to have a super secret hidden test set. Bongard Problems don't, and they still stand undefeated (although the ANN community has sidestepped them in a sense, by generating and solving similar, but not identical, sets of problems, then claiming triumph anyway).
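
To be concrete about the kind of dataset generation I mean, here is a toy sketch; real pipelines are more elaborate (colour permutations, procedural task generators, and so on), but the principle is the same:

    # Toy sketch of memorisation-friendly data generation: mass-produce
    # "new" ARC training pairs from existing ones via grid symmetries.
    import numpy as np

    def augment(grid):
        """Yield symmetry-transformed copies of one ARC grid."""
        for k in range(4):              # four rotations
            rot = np.rot90(grid, k)
            yield rot
            yield np.fliplr(rot)        # plus a mirror of each

    grid = np.array([[1, 0], [0, 2]])
    variants = list(augment(grid))      # 8 grids from 1, before colours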

ARC will be solved and we won't learn anything at all from it, except that we still don't know how to test for intelligence, let alone artificial intelligence.

The worst outcome of all this is the collateral damage to the reputation of symbolic program synthesis, which you have often name-dropped when trying to steer the efforts of the community towards it (other times calling it "discrete program search", etc). Once some big, compensating LLM solves ARC, any mention of program synthesis will elicit nothing but sneers. "Program synthesis? Isn't that what Chollet thought would solve ARC? Well, we don't need that, LLMs can solve ARC just fine". Talk about sucking all the air out of the room, indeed.


Wow, you're the most passionate hater of ARC that I've seen. Your negativity seems laughably overblown to me.

Are there benchmarks that you prefer?


This might be useful to you: if you want to have an interesting conversation, insulting your interlocutor is not the best way to go about it.


I don't think they are insulting anyone, I think they're just asking for numbers.


What numbers?


The CSGO model is only 1.5 GB & training took 12 days on a 4090.

https://github.com/eloialonso/diamond/tree/csgo?tab=readme-o...


Thanks, that's the detail I was looking for on the training. It's amazing results like this can be achieved at such a low cost! I thought this kind of work was out of reach for the GPU poor.

The part about the continuous control still seems weird to me though. If anyone understands that, I'd be very interested to hear more.


What is this for, exactly? Go should be used server-side, and Qt looks like it's for UIs on embedded devices?


Qt is a large framework that also includes, e.g., a large network module which supported async, event-based communication long before there was Go. But it's unlikely that a Go application would use this, because Go has its own network library and even native language support for asynchronous communication (buffered channels). Qt also has cross-platform user interface features, even with 3D and OpenGL support, which might be useful for people using Go on the desktop.


It's simple -- you see a justine.lol URL, you click immediately.


> The difference is that the A6000 absolutely smokes the Mac.

Memory bandwidth: Mac Studio wins (though it's roughly a tie at ~800 GB/s each)

VRAM: Mac Studio wins (4x more)

TFLOPs: A6000 wins (38 vs 32)


VRAM beyond what the model you're running needs isn't useful per se. My use cases require high throughput, and on many tasks the A6000 executes inference at 2x speed.


As someone familiar with the USG data management tech landscape — it’s probably because it’s by far the best product with no remotely close second.


> As someone familiar with the USG data management tech landscape — it’s probably because it’s by far the best product with no remotely close second.

That is sweetly naive, unless you are talking about their marketing department


Let me rephrase: I am extremely familiar with the USG data management tech landscape.


I'd love to know (in as much detail as you are allowed) what you feel the strengths and weaknesses of CHEETAS are.


I cannot comment specifically on CHEETAS, but what I can say is that the USG developing in-house software solutions almost always produces a disastrous product that goes over budget and has extreme maintenance overhead.

To see why, you can simply ask yourself: do you think that the unelected officials overseeing government agencies that embark on enterprise software development projects have sufficient expertise and enterprise software project management experience to be able to do this well?

Furthermore, do you think that engineers of the quality the NHS or DoD can attract, at less than half the compensation of an actual software company, stand a chance of developing something good in house?

It’s unfortunately almost impossible for these projects to go right.


CHEETAS isn't really developed in house though; it's mainly developed by Dell. Certainly the leadership is USG-associated, but I think it's actually really good. Unfortunately I seem to be unable to get _real_ access to CHEETAS, and finding anyone who has worked with it is a challenge.

I suspect underneath it's mostly Hadoop but it's impossible to separate the roadmap from the implementation without getting my hands on it.


Interesting, thank you for sharing!

That experience speaks more to the perils of in-housing than to why Palantir is the best COTS for the specific needs here. Are there specific leading COTS products you view it as so far ahead of for such a contract?

Closer to our own practice: modern LLMs have basically reset the field for SOTA in this space, with Palantir, by definition, being behind OpenAI on the most basic tasks, and thus in the same race as everyone else to retool. Speaking from our own USG experience, we are deep tech leads in some other intelligence areas (graph, ...), and before OpenAI, we often chose to adopt prev-gen leading LLM models (BERT, ...) for tasks closer to the NLP side, as we recognized that wasn't where our deep tech had an in-house advantage. We basically had to start over on some of those projects as soon as GPT-4 came out, because it changed so much that the incumbent advantage of already delivering on a contract was a dead end for core functionality; almost a year later, it's now obvious that it was the right choice when we're compared to companies that haven't done the same. Palantir has been publicly resetting as well to use GenAI-era tech, which suggests the same situation.


It seems like you don’t know what Palantir is. Nothing OpenAI does is competitive with what Palantir does. Palantir, like every other software company out there, is exploring what “my product + AI” means.


That's a fair surface-level view, but worth thinking through a bit.

Palantir is several main products, plus a whole ton of custom software projects on top, and a good chunk of them rely on the quality of their NLP & vision systems to be competitive. My question relates to the notion that they are inherently the best when, by all public AI benchmarks, they don't make the best components and, in the context of air-gapped / self-hosted government work, don't even have access to them. Separately, I'm curious how they compare to their COTS competitors (vs gov in-house) given the claims here. For example, their ability to essentially privatize and resell the government's data to itself, and turn that into a network-effects near-monopoly, is incredible, but doesn't mean the technology is the best.

I've seen cool things done with them, and on the flip side, highly frustrated users who have thrown them out (or are forbidden to). It's been a fascinating company to track over the years. I'm asking for any concrete details or comparisons because, so far, there are zero in the claims above, which is more consistent with their successful gov marketing & lobbying efforts than with technical competitiveness.


I mean the topic of this thread is data management. That’s their bread and butter.

It just doesn’t make sense to be having this conversation through the lens of AI.


AI leadership seems existential to being a top data management company and providing top data management capabilities:

* Databricks' data management innovations, now that the basics are in, are half on the AI side: adding vector & LLM indexing for any data stored in it, moving their data catalog to be LLM-driven, adding genAI interfaces for accessing stored data, ...

* Data pipelines spanning ingestion, correction, wrangling, indexing, integration, and feature & model management -- especially for tricky unstructured text, photo, and video data, and the wide event/log/transaction recordings important to a lot of the government -- are all moving or have already moved to AI. Whether it is monitoring video, investigating satellite photos, mining social media & news, entity resolution & linking on documents & logs, linking datasets, or OCR+translation of foreign documents, these are all about the intelligence tier. Tools like ontology management and knowledge graphs are especially being reset, because modern LLMs can drastically improve their quality, scalability & usability through automation.

* Data protection has long been layering on AI methods for alerting (UEBA, ...), classification, policy synthesis, configuration management, ...

Databricks is a pretty good example of a company here. They don't preconfigure government datasets on the government's behalf and sell them back, but we do see architects using Databricks as a way to build their own data platforms, especially for AI-era workloads. Likewise, they have grown an ecosystem of data management providers on top rather than single-sourcing; e.g., it's been cool to see Altana bring supply chain data as basically a Databricks implementation. For core parts, Databricks keeps adding more of the data management stack to their system, such as examining how a high-grade entity resolution pipeline would break down between their stack and ecosystem providers.


What are the better products that are available?


Which product, specifically? As I understand it, Palantir has several products, and the NHS isn't buying one of them but paying for something bespoke.


https://www.palantir.com/uk/healthcare/

They are using Palantir Foundry, which is Palantir's big data platform, or, as they call it: "The Ontology-Powered Operating System for the Modern Enterprise".


What they call it doesn't sound like the legendary marketing I was expecting.


What would be the second in your opinion, even if it's not close?


It varies by agency: either something built in house (very bad) or something built by a company that knows how to acquire government contracts, of which there are few, and that set frankly always has worse tech than Palantir. If product efficacy is not absolutely critical, the acquisition process will be driven by nepotism or other forms of corruption.

As an example of the second case in the DoD space, there's Advana.



