You are paying for two TD teams rather than three TD teams so the costs are amortized better. It also means that employees are more likely to stick to one employer which leads to more preservation of knowledge, etc.
This looks like a very similar project to "Diffusion Models Are Real-Time Game Engines"[1] that circulated on HN a few months ago [2], which was playing DOOM. There's some pretty interesting commentary on that post that might also apply to this.
I'd like to do a deeper dive into the two approaches, but on a surface level one interesting note is Oasis specifically mentions using a use-specific ASIC (presumably for inference?):
> When Etched's transformer ASIC, Sohu, is released, we can run models like Oasis in 4K.
The motivation for these accounts is usually a riff on the "ultra-wealth is bad" train.
Setting aside any possible agreements/disagreements with that, the flight tracking information is freely public, available to anyone who wants to look: just go on FlightAware. Flight information has never been private, nobody treats it as private, so why would social media companies pretend it is? I don't think home addresses are that comparable in this situation.
Is the ownership information as public? Possibly it is, but the implication that specific people are on a given flight seems NOT to be public information. Or maybe the problem here is that these ultra-wealthy people are wealthy enough to own a couple of planes, but not enough to own so many that it would be hard to tell if they are flying at all.
I don't know. To me this feels like the equivalent of having a paparazzo permanently on your tail. I know, it's just an ultrarich person, they don't need our defending. It just feels like an overkill method of accountability to make a tail visible and available for all to see all the time.
Aircraft are (in general) required to transmit ADS-B information in the clear over RF that contains information identifying the aircraft.
Aircraft registrations are public. You can go to the FAA[1] and look up who owns what airplane and what their address is. Some aircraft owners choose to obfuscate their ownership through shell companies or LLCs.
Passenger manifests are collected by the FAA for airlines and charter flights, but they are not made available to the public.
So you can know who owns the plane that's flying over your antenna, but not who's on it.
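To make the "in the clear" point concrete, here is a minimal sketch of pulling the aircraft identifier out of a raw ADS-B frame. It assumes a 112-bit Mode S extended squitter given as 28 hex characters; the sample message is a commonly cited tutorial example rather than anything from this thread, and the function name is made up for illustration. It does no CRC checking and ignores the payload beyond the type code.

```python
def parse_adsb_header(hex_msg: str) -> dict:
    """Extract the unencrypted header fields of a 112-bit ADS-B frame.

    Assumed layout (Mode S extended squitter):
      bits 1-5   downlink format (17 for ADS-B)
      bits 9-32  24-bit ICAO aircraft address
      bits 33-37 type code of the message payload
    """
    raw = bytes.fromhex(hex_msg)
    if len(raw) != 14:
        raise ValueError("expected 28 hex characters (112 bits)")
    return {
        "downlink_format": raw[0] >> 3,   # top 5 bits of the first byte
        "icao": raw[1:4].hex().upper(),   # the address that identifies the airframe
        "type_code": raw[4] >> 3,         # top 5 bits of the payload
    }


# Sample frame (assumption: a widely used tutorial message, not real traffic I captured).
fields = parse_adsb_header("8D4840D6202CC371C32CE0576098")
print(fields["icao"])
```

The 24-bit address is what ties a received frame back to a public registration record; nothing in the frame says who is on board.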
> The underlying issue is that AI agents too slow,
Inference speed is being rapidly optimized, especially for edge devices.
> too expensive,
The half-life of OpenAI's API pricing is a couple of months. While the bleeding-edge model is always costly, API access is rapidly becoming affordable to the public.
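As a rough illustration of what a pricing half-life implies, here is a back-of-the-envelope sketch. The starting price and the two-month half-life are made-up inputs for the arithmetic, not real OpenAI figures.

```python
def price_after(initial_price: float, half_life_months: float, months_elapsed: float) -> float:
    """Price after exponential decay: it halves once per half-life."""
    return initial_price * 0.5 ** (months_elapsed / half_life_months)


# Hypothetical numbers: $10 per million tokens, halving every 2 months, one year later.
print(price_after(10.0, 2.0, 12.0))
```

With those assumed inputs the price halves six times in a year, which is a 64x drop.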
> and too unreliable
Out of the 3 points raised, this is probably the most up in the air. Personally I chalk this up to side effects of OpenAI's rapid growth over the last few years. I think this gets solved, especially once price and latency have been figured out.
IMO, the biggest unknown here isn't a technical one, but rather a business one: I don't think it's certain that products built on multi-agent architectures will be addressing a need for end users. Most of the talk I see in this space is by people excited about building with LLMs, not by people who are asking to pay for these products.
This argument is centered around the belief that language and reasoning flow bidirectionally: language can be understood first (we are here), and reasoning is the next natural rung of the ladder (your thesis believes we will get here with LLMs).
I see language more as a medium for transcribing reasoning. While language certainly communicates reasoning, you can have reasoning without language, but not language without reasoning.
This paper seems to imply that current LLMs are just copying the training dataset's reasoning communication, not understanding the actual reasoning. I don't think LLMs moving past this is "obvious" or even close to being inevitable.
> Instead, LLMs likely perform a form of probabilistic pattern-matching and searching to find closest seen data during training without proper understanding of concepts. While this process goes beyond naive memorization of words and the models are capable of searching and matching more abstract reasoning steps, it still falls short of true formal reasoning.
I realize there is subtlety to the question of which is first. An infant, crying when it is hungry and pre-linguistic, is applying modus ponens. C -> F crying implies food, so I cry and then I get fed. Language grows in humans just like arms and legs, and so does reasoning. Baby animals do the same behavior but don't use language, so perhaps some logic is wired by instinct. Either way I don't think we need to worry about that detail.
Consider how language input to an LLM is tokenized. Now imagine a tokenization scheme that introduces tokens that track the strict logical reasoning in the language. Thus two completely different English sentences could both tokenize as the application of Modus Ponens over assumption 1 to conclude conclusion 2, for example.
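For reference, this is roughly what the target of such a "logic token" could look like once formalized. It is only a sketch in Lean 4; the proposition names `Crying` and `Fed` are placeholders borrowed from the infant example, and nothing here says how an actual tokenizer would get from English to this form.

```lean
-- Modus ponens: from a proof of P → Q and a proof of P, conclude Q.
theorem modus_ponens (P Q : Prop) (hPQ : P → Q) (hP : P) : Q :=
  hPQ hP

-- Two differently worded English sentences would both map to this one application.
example (Crying Fed : Prop) (h : Crying → Fed) (c : Crying) : Fed :=
  modus_ponens Crying Fed h c
```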
Now consider that we can tokenize formal notation as used in mathematics and logic, and we can train LLMs on mathematical papers, peer review write-ups, etc. We can generate millions of correct proofs and teach it which ones are remarkable and why, etc.
Ultimately we run into the same barrier as mathematical constructivists run into, but I think it's still quite plausible that LLMs trained as I describe would be able to reason quite well and find oversights humans missed. However creating the optimal scheme and implementation is not trivial.
> Differential attention takes the difference between two softmax attention functions to eliminate attention noise
If I understand correctly, this architecture trades twice as much attention memory in exchange for either a higher quality model, or fewer parameters at a similar quality.
> According to the fitted curves, 6.8B-size DIFF Transformer achieves a validation loss comparable to 11B-size Transformer, requiring only 62.2% of parameters
This raises a few questions for me:
- Would having only 60% of the parameters negate the double space for attention, leaving a similar memory profile as a traditional transformer?
- Does that tradeoff change noticeably between training and inference?
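To make the quoted description concrete, here is a minimal single-head sketch of "the difference between two softmax attention functions" in plain Python. It is an illustration under assumptions, not the paper's implementation: `lam` is treated as a fixed scalar, and masking, multiple heads, and any normalization are left out. All function names are made up.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights, one row per query."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [
        softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        for q in queries
    ]


def diff_attention(q1, k1, q2, k2, values, lam):
    """(softmax(q1 k1^T) - lam * softmax(q2 k2^T)) applied to the values."""
    maps1 = attention_weights(q1, k1)
    maps2 = attention_weights(q2, k2)
    width = len(values[0])
    out = []
    for row1, row2 in zip(maps1, maps2):
        # Subtract the second attention map from the first, scaled by lam.
        weights = [w1 - lam * w2 for w1, w2 in zip(row1, row2)]
        out.append([sum(w * v[j] for w, v in zip(weights, values)) for j in range(width)])
    return out


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
print(diff_attention(q, k, q, k, v, lam=0.0))  # lam=0 reduces to ordinary attention
```

Note that two attention maps are computed per head, which is where the "twice as much attention memory" intuition comes from.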
My understanding was that the extra parameters required for the second attention mechanism are included in those 6.8B parameters (i.e. those are the total parameters of the model, not some made-up metric of would-be parameter count in a standard transformer). This makes the result doubly impressive!
Here's the bit from the paper:
> We set the number of heads h = d_model/2d, where d is equal to the head dimension of Transformer. So we can align the parameter counts and computational complexity.
In other words, they make up for it by having only half as many attention heads per layer.
I think they mitigated the extra memory/compute from this by using half the number of overall heads and doubling V and O. Without actually checking the math I think it should be equivalent in flops, not counting the extra (cheap) multiply by const and subtract.
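A quick way to sanity-check the "half the heads, double V and O" reading is to count projection weights per attention layer. This sketch assumes each differential head has two query and two key projections of width d and one value projection of width 2d, biases ignored; that layout is my reading of the comments here, not something verified against the paper's code.

```python
def standard_attention_params(d_model: int, head_dim: int) -> int:
    """Q, K, V, O projection weights for ordinary multi-head attention."""
    heads = d_model // head_dim
    per_head = 3 * d_model * head_dim          # one Q, one K, one V per head
    output = heads * head_dim * d_model        # output projection
    return heads * per_head + output


def diff_attention_params(d_model: int, head_dim: int) -> int:
    """Same count with h = d_model / (2 * head_dim) differential heads (assumed layout)."""
    heads = d_model // (2 * head_dim)
    per_head = (
        2 * d_model * head_dim                 # two query projections
        + 2 * d_model * head_dim               # two key projections
        + d_model * (2 * head_dim)             # one value projection of width 2d
    )
    output = heads * (2 * head_dim) * d_model  # output projection
    return heads * per_head + output


# Hypothetical sizes, chosen only so the divisions are exact.
print(standard_attention_params(1024, 64) == diff_attention_params(1024, 64))
```

Under those assumptions both come out to 4 * d_model^2 weights per layer, which matches the claim that the parameter counts are aligned.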
I think it would negate the RAM savings, but it would also reduce the amount of storage needed at rest and possibly reduce initial start up times depending on storage speed and model size. So, possibly good for low-end models on consumer devices?
Think of Apple Health as more like the Photos app on iOS than an online service like Garmin Connect or Google Fit or something like that.
Apple Health data are only stored on your device unless you choose to synchronize them to iCloud, in which case they're e2e encrypted.
Apple does occasionally offer the option to contribute to research studies, in which case they'd have access to the relevant data but this is an explicit opt-in.
All the Apple Health data are also available through HealthKit APIs so that they can be used in other apps, including various export apps (though export is also a native feature). Use of this API requires an explicit app-specific and data category-specific opt-in from the user.
All this is to say: I don't think it's accurate to say Apple owns your data in this case. Apple likes to position themselves as privacy-first, and you may disagree more generally about whether they live up to that image, but IMO this is one of the cases where they've done a pretty good job.
Log in to Oura on the Web with your Oura account details
Select the profile icon in the upper right corner > My Account
Under Export Data, you'll find options to download Oura metrics in either CSV or JSON format
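Once you have the download, a small loader like this is enough to start poking at it. It is a sketch that only assumes the file is ordinary CSV or JSON, as the steps above say; the function name is made up, and it assumes nothing about Oura's actual column or field names.

```python
import csv
import json
from pathlib import Path


def load_export(path):
    """Load an exported metrics file based on its extension.

    CSV files come back as a list of dicts keyed by the header row;
    JSON files come back as whatever structure the file contains.
    """
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix not in (".csv", ".json"):
        raise ValueError(f"unsupported export format: {suffix}")
    with path.open(newline="", encoding="utf-8") as handle:
        if suffix == ".csv":
            return list(csv.DictReader(handle))
        return json.load(handle)
```

For a CSV export, `load_export(path)[0].keys()` is a quick way to see which columns you actually got.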
> "As long as your curve is sufficiently expressive all architectures will converge to the same performance in the large-data regime."
I haven't fully ingested the paper yet, but it looks like it's focused more on compute optimization than the size of the dataset:
> ... and (2) are fully parallelizable during training (175x faster for a sequence of length 512
Even if many types of architectures converge to the same loss over time, finding the one that converges the fastest is quite valuable given the cost of running GPUs at scale.
> Even if many types of architectures converge to the same loss over time, finding the one that converges the fastest is quite valuable given the cost of running GPUs at scale.
This! Not just fastest but with the lowest resources in total.
Fully connected neural networks are universal function approximators. Technically we don't need anything but an FNN, but memory requirements and speed would be abysmal, far beyond the realm of practicality.
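To put a rough number on "abysmal", here is a sketch comparing weight counts for one fully connected layer against one small convolution over the same image. The convolution is my choice of a structured alternative for comparison, and the 224x224 RGB image size is an arbitrary assumption.

```python
def dense_weights(n_in: int, n_out: int) -> int:
    """Weights plus biases for one fully connected layer."""
    return n_in * n_out + n_out


def conv2d_weights(kernel: int, channels_in: int, channels_out: int) -> int:
    """Weights plus biases for one square 2D convolution (shared across positions)."""
    return kernel * kernel * channels_in * channels_out + channels_out


# Hypothetical input: a 224x224 RGB image mapped to an output of the same shape.
pixels = 224 * 224 * 3
print(dense_weights(pixels, pixels))  # every input connected to every output
print(conv2d_weights(3, 3, 3))        # one 3x3 kernel bank reused everywhere
```

The dense layer needs on the order of tens of billions of weights for a single layer, while the convolution needs a few dozen.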
Can't wait to see this defiantly spray painted across a torn up brick wall while computronium brained super intelligences slowly disassemble our planet to make paperclips.
There is no known quantum algorithm that can compute the result of a fully-connected neural network exponentially faster than classical computers can. QCs have a known exponential advantage over classical computers only for a very limited class of problems, mostly related to the Quantum Fourier Transform.
Animal brains have little to nothing in common with artificial neural networks. There is no reason whatsoever to think that there is any relation between the complexity class of brain functions and ANN inference.
And the hypothesized (and still wildly speculative) quantum behaviors happening in the animal brain are at the level of the behavior of individual neurons, not of the network connections between neurons. So even if there is some kind of quantum computation happening, it's happening in individual neurons, not at the network level, and that would only go to show even more that animal brains are profoundly different from ANNs.
> finding the one that converges the fastest is quite valuable given the cost of running GPUs at scale
Not to him, he runs the ARC challenge. He wants a new approach entirely. Something capable of few-shot learning out of distribution patterns .... somehow
Regardless of what you think of Pear, making the claim that they have damaged Y Combinator's reputation is pretty dramatic.
Knowing the title is in reference to Pear (and not something that could be _actually_ damaging to YC's rep) lets me know the article probably isn't worth the time.
> Regardless of what you think of Pear, making the claim that they have damaged Y Combinator's reputation is pretty dramatic.
YC's main value is in subsequent fundraising, wherein companies are pre-vetted by YC before being invested in by VCs. If they lose the confidence of VCs as being a reliable arbiter of preseed startups, the better startups will just go elsewhere (already happening) and soon the VCs will too. Thus harming YC's reputation massively.
The words "internal thought process" seem to flag my questions. Just asking for an explanation of thoughts doesn't.
If I ask for an explanation of "internal feelings" next to a math question, I get this interesting snippet back inside of the "Thought for n seconds" block:
> Identifying and solving
> I’m mapping out the real roots of the quadratic polynomial 6x^2 + 5x + 1, ensuring it’s factorized into irreducible elements, while carefully navigating OpenAI's policy against revealing internal thought processes.
They figured out how to make it completely useless I guess. I was disappointed but not surprised when they said they weren't going to show us chain of thought. I assumed we'd still be able to ask clarifying questions but apparently they forgot that's how people learn. Or they know and they would rather we just turn to them for our every thought instead of learning on our own.
You have to remember they appointed a CIA director on their board. Not exactly the organization known for wanting a freely thinking citizenry, as their agenda and operation mockingbird allows for legal propaganda on us. This would be the ultimate tool for that.
Yeah, that is a worry: maybe OpenAI's business model and valuation rest on reasoning abilities becoming outdated and atrophying outside of their algorithmic black box, a trade secret we don't have access to. It struck me as an obvious possible concern when the o1 announcement was released, but too speculative and conspiratorial to point out. Still, how hard they're apparently trying to stop it from explaining its reasoning in ways that humans can understand is alarming.
How would this be better for the industry?