
The words "internal thought process" seem to flag my questions. Just asking for an explanation of thoughts doesn't.

If I ask for an explanation of "internal feelings" next to a math question, I get this interesting snippet back inside the "Thought for n seconds" block:

> Identifying and solving

> I’m mapping out the real roots of the quadratic polynomial 6x^2 + 5x + 1, ensuring it’s factorized into irreducible elements, while carefully navigating OpenAI's policy against revealing internal thought processes.


> "internal feelings"

I've often thought of using the words "internal reactions" as a euphemism for emotions.


They figured out how to make it completely useless I guess. I was disappointed but not surprised when they said they weren't going to show us chain of thought. I assumed we'd still be able to ask clarifying questions but apparently they forgot that's how people learn. Or they know and they would rather we just turn to them for our every thought instead of learning on our own.


You have to remember they appointed a CIA director to their board. Not exactly the organization known for wanting a freely thinking citizenry, given that their agenda and Operation Mockingbird allow for legal propaganda on us. This would be the ultimate tool for that.


Yeah, that is a worry: maybe OpenAI's business model and valuation rest on reasoning abilities becoming outdated and atrophying outside of their algorithmic black box, a trade secret we don't have access to. It struck me as an obvious possible concern when the o1 announcement was released, but too speculative and conspiratorial to point out. How hard they're apparently trying to stop it from explaining its reasoning in ways that humans can understand is alarming, though.


My first interpretation of this is that it's jazzed-up Chain-of-Thought. The results look pretty promising, but I'm most interested in this:

> Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users.

Mentioning competitive advantage here signals to me that OpenAI believes their moat is evaporating. Past the business context, my gut reaction is this negatively impacts model usability, but I'm having a hard time putting my finger on why.


> my gut reaction is this negatively impacts model usability, but I'm having a hard time putting my finger on why.

If the model outputs an incorrect answer due to a single mistake/incorrect assumption in its reasoning, the user has no way to correct it, since they can't see the reasoning and so can't see where the mistake was.


Maybe CriticGPT could be used here [0]. Have the CoT model produce a result, and either automatically or upon user request, ask CriticGPT to review the hidden CoT and feed the critique into the next response. This way the error can (hopefully) be spotted and corrected without revealing the whole process to the user.

[0] https://openai.com/index/finding-gpt4s-mistakes-with-gpt-4/
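A rough sketch of what that loop might look like; the callables here are hypothetical placeholders for the real model calls, not actual API functions:

    # Hypothetical sketch of the hidden-CoT review loop. "cot_model" and
    # "critic" are placeholders for whatever actually serves the two models.
    from typing import Callable, Tuple

    def answer_with_hidden_review(
        question: str,
        cot_model: Callable[[str], Tuple[str, str]],  # returns (answer, hidden chain of thought)
        critic: Callable[[str, str], str],            # reviews (answer, hidden CoT), returns a critique
    ) -> str:
        draft, hidden_cot = cot_model(question)
        review = critic(draft, hidden_cot)
        # Fold the critique into a second pass; the raw CoT is never shown to the user.
        followup = f"{question}\n\nA reviewer noted: {review}\nPlease give a corrected final answer."
        corrected, _ = cot_model(followup)
        return corrected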

Daydreaming: imagine if this architecture takes off and the AI "thought process" becomes hidden and private, much like human thoughts. I wonder then whether a future robot's inner dialog could be subpoenaed, connected to some special debugger, and have its "thoughts" read out loud in court to determine why it acted the way it did.


> my gut reaction is this negatively impacts model usability, but I'm having a hard time putting my finger on why.

This will make it harder for things like DSPy to work, which rely on using "good" CoT examples as few-shot examples.


Yeah, I guess base models without built-in CoT are not going away, exactly because you might want to tune it yourself. If DSPy (or something similar) evolves to allow the same kind of thing OpenAI did with o1, that will be quite powerful, but we still need the big foundational models powering it all.

On the other hand, if cementing techniques into the models becomes a trend, we might see various models, each with its own technique beyond CoT, for us to pick and choose from without needing to guide the model ourselves. Then what's left for us to optimize is the prompts for what we want, and the routing that combines those into a nice pipeline.

Still, the principle of DSPy stays the same: have a dataset to evaluate, let the machine trial-and-error prompts, hyperparameters and so on, just switching around different techniques (possibly automating that too), and get measurable, optimizable results. A rough sketch of that loop is below.
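A minimal sketch of that loop in plain Python, not the actual DSPy API; the candidate prompts, eval set, and score function are hypothetical stand-ins:

    # Generic prompt-optimization loop in the spirit of DSPy: score each
    # candidate prompt/technique against an eval set and keep the best one.
    from typing import Callable, List, Tuple

    def optimize_prompt(
        candidates: List[str],                       # candidate prompts/techniques to try
        eval_set: List[Tuple[str, str]],             # (question, expected answer) pairs
        score_fn: Callable[[str, str, str], float],  # (prompt, question, expected) -> score
    ) -> str:
        def avg_score(prompt: str) -> float:
            return sum(score_fn(prompt, q, a) for q, a in eval_set) / len(eval_set)
        return max(candidates, key=avg_score)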


The moat is expanding from usage count; the moat is also to lead and advance faster than anyone can catch up, so you always have the best model with the best infrastructure and low limits.


Can someone explain to me how the EV can be so incredibly low? I know the answer is because people will buy the tickets no matter what, but even compared to other losing games the lottery comes away looking like an absolute bandit.

A run of the simulation (n=1000000) comes back with -92% EV. It looks like -10% [1] is a rough estimate for slot machine EV, which I would put in the same game genre (negative EV, no-skill entertainment) as the lottery.

What accounts for this payout discrepancy in what I would consider similar games? On that train of thought, what prevents a new lottery from coming in and offering a _generous_ -50% lottery, offering ~5x as much money as before?

[1] https://www.888casino.com/blog/expected-value
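For reference, this is roughly the shape of such a simulation; the prize table below is invented for illustration and is not any real lottery's odds:

    import random

    # Illustrative prize table (probability, payout). Made-up numbers,
    # not a real lottery's odds.
    PRIZES = [(1 / 300_000_000, 500_000_000), (1 / 1_000_000, 10_000), (1 / 100, 4)]
    TICKET_PRICE = 2

    def simulated_ev(n: int = 1_000_000) -> float:
        total_winnings = 0.0
        for _ in range(n):
            r = random.random()
            cumulative = 0.0
            for p, payout in PRIZES:
                cumulative += p
                if r < cumulative:
                    total_winnings += payout
                    break
        return total_winnings / n - TICKET_PRICE  # average net result per ticket

    # Simulated EV as a fraction of the ticket price. With n=1e6 the rare jackpot
    # is almost never sampled, so this comes out far more negative than the true EV.
    print(simulated_ev() / TICKET_PRICE)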


Because you shouldn't use the simulator to calculate the EV; or, said differently, your n=1000000 is too small.

Assuming you used the first lottery example (Mega Millions), the EV is easy to calculate directly and is -$0.66/ticket, i.e. -33%.

The jackpot accounts for a whole $1 of that EV! Without it, the EV is -$1.75/ticket, i.e. -87%, which is closer to what you got in the simulation.
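Spelled out with illustrative numbers (not the actual Mega Millions prize table), the exact calculation and the reason a 1M-draw simulation misses the jackpot term look like this:

    # Exact EV = sum(probability_i * prize_i) - ticket_price.
    # Illustrative numbers only, not the real Mega Millions odds/prizes.
    jackpot_term = (1 / 300_000_000) * 300_000_000         # ~$1 of EV from the jackpot alone
    small_prizes = (1 / 1_000_000) * 10_000 + (1 / 100) * 4
    ticket_price = 2.0

    ev = jackpot_term + small_prizes - ticket_price
    # A 1,000,000-draw simulation will almost never hit a 1-in-300M jackpot,
    # so it effectively measures only (small_prizes - ticket_price).
    print(ev, ev / ticket_price)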


Exactly.

In short, the simulator doesn't buy enough ticket draws for the Law of Large Numbers to kick in.

But that's also a feature of the lottery: most people overestimate their ability to win, or underestimate how many lifetimes of consistent play are required to statistically win a jackpot.


I don't think people actually make that mistake. They know the chance of winning is tiny. The point is more that a non-zero chance of life changing money (plus the entertainment of fantasising about a win) is worth more to them than the cost of the ticket.


Exactly, winning the lottery is massively life-changing. This is actually something I think people don't understand about the psychology of the lottery. In some regards it doesn't matter to most players whether the money is $50M or $500M, even though that has a huge impact on the EV.


This was my approach when I lived in Oregon. I played the state lottery, which had something like 20x better odds; granted, the jackpot was usually only around $6 million after cash-out, but that was still totally good in my book. It cost a buck and I got to have fun with the idea of it for a few days.

One time I got like 20 weeks in a row up front (a post-dated ticket) and I won $56 or something one week. I did the odds of that happening, and it was something that would happen maybe once every 30 years if I played weekly. I stopped after that, haha.


>On that train of thought, what prevents a new lottery from coming in and offering a _generous_ -50% lottery, offering ~5x as much money as before?

A federal-level gambling syndicate isn't something that a private party can easily jump into.

So the answer is: a mix of being grandfathered in and protectionism, if we're talking about the U.S. here.


> What accounts for this payout discrepancy

Mega lotteries draw once every 2-8 days. Slot machines / video poker / etc. are happy to draw as fast as you can push the button. They are designed to take your money, but their reward systems are completely different.

Also, the mega lotteries benefit from viral marketing, “earned media”, and water cooler talk. Slot machines are just a way for bored people to pass the time, much like video games or doomscrolling.


In the US, at least, they're a state monopoly.


Because a lot of the proceeds go to school districts.


Big fan of data aggregation projects like this, especially with such a relatable theme.

However, it feels like the conclusion might be jumping the gun a bit. Instead of "think there is collusion" -> find the data to support the claim, maybe run the numbers first and see what they say? I think coming up with a strong position (Canadian stores are colluding) before looking at the data makes it tempting to find numbers that back up the claim, whether or not they are taken out of context.


We know they have been colluding, the question is on what other kinds of goods?

https://en.wikipedia.org/wiki/Bread_price-fixing_in_Canada


Thanks for the context. I'm still not certain that an instance of collusion on the price of bread in 2015 implies wider collusion in 2024.

Ideally, the data would be proving this, but I guess my skepticism is the cost of making a claim before the research is done.


I'm the creator, and I think you are spot-on. It is my wish that this data will help increase competition/reduce collusion, but until others analyze it we cannot make assumptions about what prices/grocers are doing.


Your point stands, but it wasn't an "instance of collusion on the price of bread in 2015"; it was widespread collusion on the price of bread and other baked goods from 2001-2015 (some say 2017), which was discovered in 2015.


I'm curious how the aggregate results from this test would compare to the exact same test named "Is my green your green?"

I could see the title influencing some of the more nuanced decisions in the middle.


I've been either stunned or disappointed, depending on the word.

"hello" gives four images of the same building with "hello" clearly written, as well as a few images of "hello" graffiti. Impressed.

"table" gives six results, four of which are clearly pictures of either leaves or the sky. Two are blurry buildings, but I can't seem to find the text "table"... it could be there, though? Not impressed.

"car" gives six unique results, in some of which "car" is the prefix of a word. Impressed.

Either way, really cool project.


"good enough" is incredibly subjective here. Maybe good enough for you, but there are many things that are not possible with either the dataset or the weights being available.


And some things are impossible even with both the dataset and weights. Say you wanted to train the same model as the one released, using Meta's hypothetically released training data. You would also need to know the starting parameters, the specific hardware and its quirks during training, the order the data is trained in, as well as any other preprocessing techniques used to treat the text.

Considering how ludicrously expensive it would be to even attempt a ground-up retrain (as well as how it might be impossible), weights are enough for 99% of people.


Could you give some examples of dependence without correlation?


> A sailor is sailing her boat across the lake on a windy day. As the wind blows, she counters by turning the rudder in such a way so as to exactly offset the force of the wind. Back and forth she moves the rudder, yet the boat follows a straight line across the lake. A kindhearted yet naive person with no knowledge of wind or boats might look at this woman and say, “Someone get this sailor a new rudder! Hers is broken!” He thinks this because he cannot see any relationship between the movement of the rudder and the direction of the boat.

https://mixtape.scunning.com/01-introduction#do-not-confuse-...


A clear graphical set of illustrations is the bottom row in this famous set: https://en.wikipedia.org/wiki/Correlation#/media/File:Correl...

They have clear dependence; if you imagine fixing ("conditioning") x at a particular value and looking at the distribution of y at that value, it's different from the overall distribution of y (and vice versa). But the familiar linear correlation coefficient wouldn't indicate anything about this relationship.


I mentioned it in another comment, but the most trivial example is:

X ~ Unif(-1,1)

Y = X^2

In this case X and Y have a correlation of 0.
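A quick empirical check of that example (Cov(X, Y) = E[X^3] - E[X]E[X^2] = 0 by symmetry):

    import numpy as np

    # X ~ Unif(-1, 1), Y = X^2: Y is completely determined by X,
    # yet the linear correlation is ~0 because E[X^3] = E[X] = 0.
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=1_000_000)
    y = x ** 2
    print(np.corrcoef(x, y)[0, 1])  # close to 0 despite perfect dependence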


You can check the example described here: https://stats.stackexchange.com/questions/644280/stable-viol...

Judea Pearl’s book also goes into the above in some detail, as to why faithfulness might be a reasonable assumption.


Imagine your data points look like a U. There's no (linear) correlation between x and y: you are equally likely to have a high value of y when x is high or low. But low values of y are associated with medium values of x, and a high value of y means x will be either very high or very low.


Karpathy is _much_ more knowledgeable about this than I am, but I feel like this post is missing something.

Go is a game that is fundamentally too complex for humans to solve. We've known this since way back before AlphaGo. Since humans were not perfect Go players, we didn't use them to teach the model; we wanted the model to be able to beat humans.

I don't see language as comparable. The "perfect" LLM imitates humans perfectly, presumably to the point where you can't tell the difference between LLM-generated text and human-generated text. Maybe it's just as flexible as the human mind too, able to context-switch quickly and swap between formalities, tones, and slang. But the concept of "beating" a human doesn't really make much sense.

AlphaGo and Stockfish can push forward our understanding of their respective games, but an LLM can't push forward the boundary of our language, because it's fundamentally a copy-cat model. This makes RLHF make much more sense in the LLM realm than in the Go realm.


One of the problems lies in the way RLHF is often performed: presenting a human with several different responses and having them choose one. The goal here is to create the most human-like output, but the process instead creates the outputs humans like the most, which can seriously limit the model. For example, most recent diffusion-based image generators use the same process to improve their outputs, relying on volunteers to select which outputs are preferable. This has led to models that are comically incapable of generating ugly or average people, because the volunteers systematically rate those outputs lower.
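For context, that choose-one-of-several data is typically used to fit a reward model with a pairwise loss of roughly this shape (a minimal sketch, not any lab's actual training code):

    import math

    # Pairwise (Bradley-Terry style) preference loss for a reward model
    # trained on "the human picked response A over response B" data.
    def preference_loss(score_chosen: float, score_rejected: float) -> float:
        # Loss is low when the reward model scores the chosen response higher,
        # so it learns "what humans like most", not "what is most human-like".
        margin = score_chosen - score_rejected
        return -math.log(1.0 / (1.0 + math.exp(-margin)))

    print(preference_loss(1.2, 0.7))  # example pair: the chosen response scored slightly higher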


The distinction is that LLMs are not used for what they are trained for in this case. In the vast majority of cases, someone using an LLM is not interested in what some mixture of OpenAI employees' ratings + the average person would say about a topic; they are interested in the correct answer.

When I ask ChatGPT for code, I don't want it to imitate humans, I want it to be better than humans. My reward function should then be code that actually works, not code that merely looks like what a human would write.


I don’t think it is true that the perfect LLM emulates a human perfectly. LLMs are language models, whose purpose is to entertain and solve problems. Yes, they do that by imitating human text at first, but that’s merely a shortcut to enable them to perform well. Making money via maximizing their goal (entertain and solve problems) will eventually entail self-training on tasks to perform superhumanly on these tasks. This seems clearly possible for math and coding, and it remains an open question about what approaches will work for other domains.


In a sense GPT-4 is self-training already, in that it's bringing in money for OpenAI which is being spent on training further iterations. (this is a joke)


This is a great comment. Another important distinction, I think, is that in the AlphaGo case there's no equivalent to the generalized predict next token pretraining that happens for LLMs (at least I don't think so, this is what I'm not sure of). For LLMs, RLHF teaches the model to be conversational, but the model has already learned language and how to talk like a human from the predict next token pretraining.


Let's say, hypothetically, we do enough RLHF that a model can imitate humans at the highest level. Like, the level of professional researchers on average. Then we do more RLHF.

Maybe, by chance, the model produces an output that is a little better than its average; that is, better than professional researchers. This will be ranked favorably in RLHF.

Repeat this process and the model slowly but surely surpasses the best humans.

Is such a scenario possible in practice?


While I agree with your platform point, I think you still need to address OP's question of

> What is their route in profitability when Meta is giving away similar tech for free?


Quality and accessibility is my belief.

Meta's Llama3x are great models but they're not providing the same quality and/or accessibility as OpenAI does. Take a look at all the products attempting to sit themselves on top of OpenAI's APIs vs those running on Llama3x; OpenAI dwarfs Meta in that regard, today.

We avoid OpenAI's models/API because of client data sharing constraints and are instead using OSS models, including L3x, but it appears most do not see that as a barrier to adoption and are moving forward with OpenAI's offerings.

We shall see, however, whether the work OpenAI is doing will pay for itself in the long term, and whether OSS models can match the gains made with the funding the commercial offerings are receiving.


You really think people aren't iterating on OpenAI because it's easy and will then look for cheaper alternatives if their apps take off? I will if the economics make sense.


Meta is giving away the model for free, but running that model is not free at all; it's quite expensive. At some point it becomes the usual discussion: pay for a service, or pay for infra resources + devops.


A few come to mind...

Brand power

Exclusive data deals

Being able to call an API vs. having to pay for and handle devops for an always-on model yourself


They're giving away the models, but I don't have the hardware to run them. OpenAI offers me a straightforward way to do it.

