
p1esk · 2025-01-07T20:19:26 1736281166

Source for your 10% number?

rafaelmn · 2025-01-08T01:31:38 1736299898

I think people are speculating based on graphs Nvidia has on their product page.

brokenmachine · 2025-01-07T22:43:57 1736289837

I heard them say that in the Hardware Unboxed youtube video yesterday.

I think it's this one https://youtu.be/olfgrLqtXEo

p1esk · 2025-01-08T00:03:12 1736294592

I don’t see any testing performed in that video. Did I miss it?

brokenmachine · 2025-01-08T22:00:03 1736373603

No testing, they estimated from the available information.

p1esk · 2025-01-05T16:51:58 1736095918

> they are designed to elucidate the author's thought process - not the reader's learning process

No, it’s exactly the opposite: when I write papers I follow a rigid template of what a reader (reviewer) expects to see. Abstract, intro, prior/related work, main claim or result, experiments supporting the claim, conclusion, citations. There’s no room or expectation to explain any of the thought process that led to the claim or discovery.

The vast majority of papers follow this template.

p1esk · 2025-01-04T07:19:14 1735975154

I don’t see o1 in their model list. That’s the only model that produces decent answers to my questions.

p1esk · 2025-01-02T20:45:27 1735850727

I don’t want to follow anyone, but I do give stars to repos I like.

sedatk · 2025-01-02T20:54:32 1735851272

Then you'll have to start following the creators of repos you like to build a web of trust.

p1esk · 2025-01-02T18:07:13 1735841233

This answer is probably better than 99.99% of human answers. Feel the AGI!

p1esk · 2024-12-20T21:42:57 1734730977

Pretty sure a Waymo car drives better than an average SF driver.

Mordisquitos · 2024-12-21T11:53:44 1734782024

And how well would a Waymo car do in this challenge with the ARC-AGI datasets?

manquer · 2024-12-21T09:31:57 1734773517

Waymo cannot handle poor weather at all; the average human can.

Being able to perform better than humans in a specific, constrained problem space is how every automation system has been developed.

While self-driving systems are impressive, they don’t drive with anywhere close to the skill of the average driver.

tim333 · 2024-12-21T13:46:18 1734788778

Waymo blog with video of them driving in poor weather https://waymo.com/blog/2019/08/waymo-and-weather

manquer · 2024-12-21T14:27:41 1734791261

And Nikola famously made a video of a truck which had no engine; we don’t take a company's word for anything until we can verify it.

This is not offered to the public. They are currently expanding only in cities like LA, Miami, or Phoenix, where the weather is good throughout the year.

The tech for bad weather is nowhere close to ready for the public. The average human, on the other hand, is driving in bad weather every day.

tim333 · 2024-12-21T15:14:48 1734794088

"Extreme Weather" tech "will be available to riders in the near future" https://www.cnet.com/roadshow/news/waymos-latest-robotaxi-is...

daveguy · 2024-12-21T17:32:43 1734802363

I'm sure the source of that CNET article came with a forward looking statements disclaimer.

coldcode · 2024-12-21T13:03:58 1734786238

There's a reason why Waymo isn't offered in Buffalo.

fragmede · 2024-12-21T13:28:07 1734787687

Is that reason because Buffalo is the 81st most populated city in the United States, or 123rd by population density, and Waymo currently only serves approximately 3 cities in North America?

We already let computers control cars because they're better than humans at it when the weather is inclement. It's called ABS.

shadowerm · 2024-12-22T12:18:45 1734869925

I would guess you haven't spent much time driving in the winter in the Northeast.

There is an inherent danger to driving in snow and ice. It is a PR nightmare waiting to happen because there is no way around accidents if the cars are on the road all the time in rust belt snow.

fragmede · 2024-12-22T19:33:13 1734895993

I get the feeling that the years I spent in Boston with a car, including during the winter and driving to Ithaca, somehow aren't enough, but whether or not I have is irrelevant. Still, I'll repeat the advice I was given: before you have to drive in snow, especially during a storm, go practice driving in the snow (e.g. in a parking lot). Waymo's been spotted driving in Buffalo doing testing, so it seems someone gave them similar advice. https://www.wgrz.com/article/tech/waymo-self-driving-car-pro...

There's always an inherent risk to driving, even in sunny Phoenix, AZ. Winter dangers like black ice further multiply that risk, but humans still manage to drive in winter. Taking a picture/video of a snowed-over road, judging its width, and inventing lanes based on that width while taking snowbanks into account doesn't take an ML algorithm. Lidar can see black ice while human eyes cannot, giving cars equipped with lidar (whether driven by a human or a computer) an advantage over those without it, and Waymo cars currently have lidar.

I'm sure there are new challenges for Waymo to solve before deploying the service in Buffalo, but it's not the unforeseen gotcha the parent comment implies.

As far as the possible PR nightmare goes, you'd never do self-driving cars in the first place if you let that fear control you because, as you pointed out, driving on the roads is inherently dangerous, with too many unforeseen complications.

p1esk · 2024-12-18T22:20:12 1734560412

> Yes, we pay well. No, we don't pay as much as Meta.

How well do you pay? If I were at Meta, my total comp would be 500-600k. I make half that at a small startup. Can you afford me?

snakeyjake · 2024-12-18T22:45:52 1734561952

How many years of experience?

I fall almost exactly on the low end of this range: https://www.glassdoor.com/Salaries/senior-aerospace-engineer...

With 15 years of experience.

I'm at the low end of my peers.

Like I said, auctioning off access to your users to advertisers pays better than the European space agency.

pcthrowaway · 2024-12-19T00:38:58 1734568738

Holy shit, that auto-redirects to the salary range in Canada for me (66K-95K CAD)

Tried again with a U.S. VPN and I'm gob-smacked by how different that range is.

winter_blue · 2024-12-19T03:51:28 1734580288

Yup, I’m in Canada as well, and I was quite a bit thrown off by the redirected Canadian pay range.

For anyone curious, the Canadian pay range is:

Base pay $66K - $95K/yr

$79K/yr Average base pay

The low end of that base pay ($66k CAD) converts to $45k USD.

While the U.S. pay range is:

Total pay range: $266K - $463K/yr

$346K/yr Median total pay

Pay breakdown $137K - $223K/yr Base pay $129K - $240K/yr Additional pay

zipy124 · 2024-12-18T23:04:10 1734563050

Wait the low range of that link is £45k but you say work study placement people get $75k a year?

bitcurious · 2024-12-18T23:35:00 1734564900

There might be some regionality built in; viewed from the US the range is $265K - $463K/yr.

sgerenser · 2024-12-19T02:21:39 1734574899

I feel like that range is on the high side, at least for jobs outside the Bay Area. Unless aerospace pays a lot better than software jobs at government contractors. When I was laid off less than 2 years ago and applying to a wide variety of places, the gov contractors were only offering in the $120k-$140k ballpark for senior sw engineering positions. They tended to play up gold-plated benefits, high 401k match, etc. as well, but there’s no way that would get it up to $263k.

titanomachy · 2024-12-19T00:41:29 1734568889

That is outrageous that two seemingly developed countries could have such a huge compensation gap. A senior aerospace engineer in the UK can make as little as 45k GBP, or 56k USD? One fifth as much as the lowest-paid American??

The take-home on that is £35k. The median rent in London is £26k. I suppose the person making £45k doesn't likely live in London, but still pretty grim.

https://www.economist.com/special-report/2024/10/14/the-amer...
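As a rough check of the numbers quoted in this comment and upthread, here is a minimal sketch. It uses only figures stated in the thread (the $266K low end of the US total-pay range, the 56k USD conversion of £45k, the £35k take-home, and the £26k median London rent); the variable names are illustrative, and note that the US figure is total pay while the UK figure may be base pay only.

```python
# Figures as quoted in the thread; not independently verified.
us_low_total_usd = 266_000       # low end of the US total pay range
uk_low_usd = 56_000              # £45k converted to USD, per the comment
take_home_gbp = 35_000           # stated take-home on £45k
london_median_rent_gbp = 26_000  # stated median London rent per year

pay_ratio = us_low_total_usd / uk_low_usd
rent_share = london_median_rent_gbp / take_home_gbp

print(f"US low end is {pay_ratio:.2f}x the UK low end")              # 4.75x
print(f"Median London rent is {rent_share:.0%} of that take-home")   # 74%
```

So "one fifth" is a fair rounding of the quoted figures (about 1/4.75), under the caveat above about total versus base pay.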

beezlebroxxxxxx · 2024-12-19T04:47:19 1734583639

> One fifth as much as the lowest-paid American??

Americans are often blown away by, and kind of ignorant of, how wealthy and well paid they are relative to the rest of the world. Like, people have way less disposable income in other parts of the world, even developed countries. The purchasing power of the USD and the power of the US economy are absolutely insane.

sofixa · 2024-12-19T12:23:11 1734610991

Yes, aerospace engineers don't live in London because there are very few (if any) aerospace jobs in London. The biggest aerospace employers in the UK are BAE Systems and Airbus, and both have factories in much cheaper locations (Wales, the Northwest of England).

You're basically comparing "super specialised job in the middle of nowhere with very low cost of living" vs a "super specialised but much more needed job in multiple high cost of living locations" (Seattle metro area, LA metro area to name a few).

stonemetal12 · 2024-12-19T15:19:57 1734621597

Wichita Kansas is home to about 20% of aircraft-manufacturing workers in the United States. Not exactly known for HCOL.

loeg · 2024-12-19T05:27:11 1734586031

The UK is really, really poor compared to the US.

throwaway48476 · 2024-12-19T06:29:33 1734589773

The UK is poor because they decided to financialize the economy in the 90s and stop making things. It's like Canada, where the GDP per capita goes down every year. I'm amazed there hasn't been a revolution.

titanomachy · 2024-12-19T06:38:59 1734590339

> canada where the GDP per capita goes down every year

Couldn't this largely be explained by their importing huge amounts of low-income people?

throwaway48476 · 2024-12-19T07:11:27 1734592287

Of course. Canadians are still getting poorer every year and that's the salient point.

titanomachy · 2024-12-21T20:34:09 1734813249

Sure, but expanding the definition of “Canadian” to include people who were already poor is a bit different from people who were already Canadian becoming poorer.

n4r9 · 2024-12-19T11:37:09 1734608229

I don't think the average migrant salary is much different from the average UK citizen salary. Then again, I also don't find the "financialisation" argument very compelling. Plus, the GDP per capita visibly does not go down every year.

titanomachy · 2024-12-20T05:46:23 1734673583

> GDP per capita does not go down

It does in Canada, I think that’s what he meant

n4r9 · 2024-12-20T20:15:58 1734725758

It doesn't go down every year in Canada either.

foldr · 2024-12-19T14:25:46 1734618346

GDP per capita has not gone down every year in Canada or in the UK:

https://tradingeconomics.com/canada/gdp-per-capita#:~:text=T...

https://tradingeconomics.com/united-kingdom/gdp-per-capita

(Click the button to check the 25 year view.)

titanomachy · 2024-12-20T05:48:59 1734673739

According to Stats Canada, “Real GDP per capita has now declined in five of the past six quarters”, so fair to say it’s currently declining. This was news to me.

https://www150.statcan.gc.ca/n1/pub/36-28-0001/2024004/artic...

foldr · 2024-12-20T10:42:58 1734691378

OP’s post clearly suggested a long term trend of declining GDP per capita (“every year”), which is not the case.

foldr · 2024-12-19T11:26:45 1734607605

The UK is poorer than the US, but US software engineering salaries are an outlier. The difference is not as big as that comparison would suggest.

gspetr · 2024-12-19T14:16:34 1734617794

https://www.theguardian.com/commentisfree/2023/sep/03/britai...

"another respected data journalist, John Burn-Murdoch, calculated that without London, the UK would be poorer, in terms of GDP per capita, than even the poorest US state, Mississippi."

foldr · 2024-12-19T14:23:44 1734618224

Yes, it's true that the UK economy is very London-centric, but the original poster was talking about the UK as a whole vs the US as a whole. (The flip side of this is that the figures would look better if you compared London to a major US city.)

None of this changes the fact that US software engineering salaries are a poor comparison to use to illustrate wealth disparities between the US and other countries, as they are an outlier.

titanomachy · 2024-12-20T05:50:41 1734673841

The above comparison was actually aerospace engineering, not software.

foldr · 2024-12-20T10:45:09 1734691509

Regardless, Americans are not five times richer than Brits by any reasonable measure. The salaries in the comparison upthread are outliers. The exact figure obviously depends on which stat you look at, but Americans are around 50% richer by most measures.

dccoolgai · 2024-12-19T13:36:05 1734615365

The U.S. engineer can be fired on a whim immediately and lose their health care (COBRA) and the company that fires them can even contest their unemployment benefits (that the employee paid into) if they feel motivated enough. That's one of the reasons they get paid much more.

titanomachy · 2024-12-20T05:57:59 1734674279

I’ve been fired before by a major American tech company. I was underperforming, unmotivated, and depressed about it. They gave me a substantial severance payment in exchange for quitting voluntarily, and for signing an agreement that basically said I wouldn’t sue them. They let me pick my last date, they paid my health insurance through the next three months, and my manager told me I could use my last month of employment to find a new job. I was quickly hired into a better-paid position at another company, with a better manager, and I did well there.

I realize this story sounds absurd to anyone who hasn’t experienced it, but my understanding is that this form of firing (“managing out”) is basically the norm for low performers at top-tier tech companies.

To get actually fired, you usually have to fuck up big time, like sexually harassing a coworker, stealing trade secrets, or trying to start a union. (That last one is a joke, sort of)

pikclingoil · 2024-12-19T07:51:04 1734594664

>One fifth as much as the lowest-paid American??

Quite! A top 10% earner in Finland, a supposedly very developed country, by saving all of their net-income spending zero on food and letting their SO pay the bills, could in 2-3 years afford a new Skoda.

chipsrafferty · 2024-12-20T17:21:49 1734715309

lol, it's not specific to aerospace it's all software. Only the US overpays us

qmr · 2024-12-19T09:32:23 1734600743

Ah I think I see where the confusion lies.

The US is not a developed country, rather it is a particularly rich third world country.

paleotrope · 2024-12-19T01:37:34 1734572254

You guys get the NHS though.

winter_blue · 2024-12-19T03:55:17 1734580517

I don’t know if you’re making a joke or not, but getting NHS isn’t worth $200,000 USD per year.

Most Americans get employer-provided health insurance, which costs money (the amount specified in the DD section of the W2), and it's often in the $1500/month range. That DD amount isn’t part of your income or the salary Glassdoor mentions. It’s an added benefit on top of that.

In the UK and elsewhere, around $500/month/person in taxes pays for your healthcare. That’s essentially subtracted from your income. So the UK income is even lower when you subtract the taxes the NHS costs.

sofixa · 2024-12-19T12:17:58 1734610678

> I don’t know if you’re making a joke or not, but getting NHS isn’t worth $200,000 USD per year.

Nope, but NHS + no/less student loans + no car dependency + cheaper childcare + time off + a ton of other things shave quite a bit off that $200k. Not equal, and not in every personal case, but a lot.

winter_blue · 2024-12-20T18:14:25 1734718465

Isn’t housing extremely unaffordable in the UK though? That erases a lot of these benefits, doesn’t it? (I’m aware this is true of a lot of HCOL areas in the U.S. as well.)

rangestransform · 2024-12-19T14:13:51 1734617631

Canadian and not UKian, but our public healthcare is definitely not worth 50% of my take-home cash; I get much better access to care in the US right now. It still says Canada on my passport, so I can get healthcare if I get fired or chronically ill.

samatman · 2024-12-19T02:25:17 1734575117

Yes but those numbers are pre-tax income.

CalRobert · 2024-12-19T10:39:49 1734604789

They don’t call them europoors for nothing

panzagl · 2024-12-18T22:23:07 1734560587

No they cannot, unless maybe you have advanced degrees and a couple decades experience.

optimiz3 · 2024-12-18T22:42:10 1734561730

Yeah but then you're too old. Need to be in your 20s with a couple decades of experience.

CapeTheory · 2024-12-18T23:31:26 1734564686

Calling it now - a leetcode-based kindergarten programme is the next big a16z investment.

yazzku · 2024-12-19T02:53:19 1734576799

"Hit the ground running."

MichaelZuo · 2024-12-19T13:58:30 1734616710

You say this ironically, but someone who’s been working hard 15 hours 7 days a week in a niche, 50 weeks a year, from age 15 to age 29 has clearly a much higher potential than a 45 year old following the normal path in life.

And almost certainly a higher employable value too unless they have catastrophically bad social skills…

chipsrafferty · 2024-12-20T17:28:35 1734715715

Let's say they sleep 6 hours a day, every day.

Someone who works for 15 out of 18 of their waking hours, leaving 3 hours to eat, exercise, and have any semblance of social interactions or secondary interests, for FOURTEEN YEARS is not a genius. They are actually an idiot, wasting their life.
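The arithmetic behind this can be sketched as follows. The inputs are the figures from the comments above (15 hours a day, 7 days a week, 50 weeks a year, from age 15 to 29, with 6 hours of sleep assumed); the variable names are just for illustration.

```python
# Schedule as described upthread; the sleep figure is this comment's assumption.
hours_in_day = 24
sleep_hours = 6
work_hours_per_day = 15
days_per_week = 7
weeks_per_year = 50
years = 29 - 15  # age 15 to age 29

waking_hours = hours_in_day - sleep_hours
free_hours = waking_hours - work_hours_per_day
total_work_hours = work_hours_per_day * days_per_week * weeks_per_year * years

print(waking_hours, free_hours)  # 18 3
print(total_work_hours)          # 73500
```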

lovich · 2024-12-19T14:26:24 1734618384

Did they develop those social skills during the free time they had?

MichaelZuo · 2024-12-19T14:36:44 1734619004

Why does this matter?

They can develop it while lying in bed and daydreaming for all the difference it makes.

lovich · 2024-12-19T16:28:37 1734625717

The implication was that someone who dedicated all of their time as physically possible to working and studying, would not have had time to develop social skills

MichaelZuo · 2024-12-19T16:37:34 1734626254

Talented people can improve in multiple directions at the same time… unless you don’t believe this?

Plus any 29 year old who can actually land a genuine 500k-600k USD compensation job at a big company is a literal genius, at the very least.

lovich · 2024-12-19T19:20:47 1734636047

Do you honestly believe that an individual who

>… been working hard 15 hours 7 days a week in a niche, 50 weeks a year, from age 15 to age 29…

Developed the same level of social skills as the average individual who lived a more normal schedule?

I have to ask before you even answer that. Do you believe that social skills are something to be practiced and built upon, or are they some waste of time that only hormones bother with, or some other option I haven’t considered?

MichaelZuo · 2024-12-20T00:11:10 1734653470

They can develop, in many aspects, far superior to an average individual given the same amount of time.

And develop to a comparable level given a much shorter period of time.

That’s pretty much by definition for literal geniuses.

lovich · 2024-12-20T09:07:48 1734685668

I think you might be delusional if you think that the people who can do all of this at the same time and don’t come out maladjusted to society are anything beyond a fraction of a fraction of a percent of outliers.

MichaelZuo · 2024-12-20T16:27:18 1734712038

Are you confused about what geniuses are? Or did you not finish reading the comment?

Because this reply doesn’t make sense in relation to the previous comment.

They are very much outliers, so they are by definition a very small fraction of society.

lovich · 2024-12-20T17:13:08 1734714788

This is just magical thinking on your end. I’ve met some of these “literal geniuses” making 500k at faangs and most of them are completely socially maladapted once you’ve taken them out of the pipeline they’ve lived in since high school to getting their first job mid or late 20s after their masters or PhD.

Secondly you started off this chain with talking about how someone working hard for 15 hours a day for decades is going to be more valuable and they’ll just be able to pick up every skill a human could have or need because they’re “geniuses”.

If they’re really geniuses why do they need to grind?

If you’re implying that they are only part of the set of geniuses that grind that long and there is another set of geniuses that didn’t, then how does that track with geniuses being a very small fraction of society?

MichaelZuo · 2024-12-20T17:45:25 1734716725

I never said they are guaranteed to be this or that?

Clearly some fraction do have critically bad social skills which do materially affect their prospects to a significant degree.

But the majority of them do exceed that very low bar, so it’s simply not that critical of a hindrance most of the time.

You appear to be reading absolute implications into my comments, and/or inserting your own conjectures which aren’t there on a plain reading.

lovich · 2024-12-20T18:54:14 1734720854

> But the majority of them do exceed that very low bar, so it’s simply not that critical of a hindrance most of the time.

Describing not having catastrophically bad social skills as a “very low” bar is not a valid take when it comes to the world of computer science. I remember, when visiting Carnegie Mellon as a senior in high school and evaluating their comp sci program, how the guides suddenly got very serious when they informed our parents (not the prospective students) that a course on hygiene was required freshman year and could not be waived. I’ve also worked with a near-limitless number of engineers who think they have the social skills down and then don’t understand why no one wants to work with them when they will do shit like call someone else’s project, which that person has worked on for months, pointless or useless in a group setting without even trying to approach said coworker with a modicum of social awareness.

Those kinds of behaviors don’t show up in a population where having non catastrophically bad social skills is a “very low bar”

> You appear to be reading absolute implications into my comments, and/or inserting your own conjectures which aren’t there on a plain reading.

I think we’re coming at this with different axioms. You seem to believe that social skills are trivial and don’t matter next to the hard sciences that people grind away on. I am coming from one where I have to constantly make excuses or apologies for various people in software engineering or comp sci because they appear to be literally incapable of empathy or understanding that other people might have a different viewpoint than theirs.

Given my axiom, I think you are handwaving away a lot, and that’s where you see my statements as inserted conjectures.

zifpanachr23 · 2024-12-18T23:43:05 1734565385

Realistically, there are plenty of competent people they could hire for any low six figures amount (unless they are directly in the DC area, in which case add 20-30k for cost of living). 500-600k or half of that is a unicorn salary that doesn't apply to those industries or areas and is irrelevant to the discussion. Even if they offered you that salary you wouldn't take it because the work environment would be radically different from working at a bloated web tech firm, or working at a silicon valley startup.

Totally different markets. You wouldn't be interested in that job, and they wouldn't want to hire you even if you were interested. Even the tone of your post makes that obvious.

We should be discussing early to mid career folks from somewhere other than silicon valley startup or big web tech land. Aka "meat and potatoes" tech jobs. That is what's being discussed.

I don't know what their problem is with hiring either and I agree with you that it could be partially compensation related. But not being able to compete with Silicon Valley on compensation is not where I would be going with that argument....I think it's more likely to be related to environment and interview style and notions of what "experience" means. In other words...bad hiring practices...not necessarily raw compensation issues. The compensation for non "big tech" firms can sometimes be quite good in comparison to other career paths especially when located outside of the valley, so being unable to hire talent makes me suspicious of hiring practices more than compensation (assuming they are reasonably large and hit market rate for the area and are in a reasonably large metro).

Just my two cents.

p1esk · 2024-12-14T23:27:14 1734218834

I am also disappointed, and I have not missed the context. The talk is empty for anyone who follows the field for more than two years, and especially for those who are familiar with his 2014 paper. Yes, he had amazing insight and intuition behind modern LLM breakthroughs, and yes, he probably earned the right to sound "prophetic", but he could have provided some interesting personal anecdotes about how the paper was written, or some fresh ideas in "What Comes Next" section of his presentation.

29athrowaway · 2024-12-15T00:31:40 1734222700

True. The entire thing was basically "neurons go brrrr".

p1esk · 2024-12-11T15:58:38 1733932718

Are these benchmarks still meaningful?

maeil · 2024-12-11T16:53:15 1733935995

No, and they haven't been for at least half a year. Utterly optimized for by the providers. Nowadays if a model would be SotA for general use but not #1 on any of these benchmarks, I doubt they'd even release it.

CamperBob2 · 2024-12-11T20:14:57 1733948097

I've started keeping an eye out for original brainteasers, just for that reason. GCHQ's Christmas puzzle just came out [1], and o1-pro got 6 out of 7 of them right. It took about 20 minutes in total.

I wasn't going to bother trying those because I was pretty sure it wouldn't get any of them, but decided to give it an easy one (#4) and was impressed at the CoT.

Meanwhile, Google's newest 2.0 Flash model went 0 for 7.

1: https://metro.co.uk/2024/12/11/gchq-christmas-puzzle-2024-re...

iamdelirium · 2024-12-11T21:42:08 1733953328

Why are you comparing flash vs o1-pro, wouldn't a more fair comparison be flash vs mini?

iamdelirium · 2024-12-11T21:49:54 1733953794

I just asked o1-mini the first two questions and it got them wrong.

CamperBob2 · 2024-12-11T23:57:26 1733961446

It's the only Google model that my account has access to that accepts .PNG files. I assumed it was the latest/greatest experimental 2.0 release.

If they want a rematch, they'll need to bring their 'A' game next time, because o1-pro is crazy good.

nrvn · 2024-12-11T21:19:45 1733951985

Did it get the 8th right? The linked article provides the wrong answer btw.

CamperBob2 · 2024-12-12T00:01:04 1733961664

I didn't see a straightforward way to submit the final problem, because I used different contexts for each of the 7 subproblems.

Given the right prompt, though, I'm sure it could handle the 'find the corresponding letter from the landmarks to form an anagram' part. That's easier than most of the other problems.

You're saying the ultimate answer isn't 'PROTECTING THE UNITED KINGDOM'?

nrvn · 2024-12-12T10:59:07 1734001147

if you follow the sleigh morse path starting from the robin it will be 'united in protecting the kingdom'.

p1esk · 2024-12-11T21:41:20 1733953280

Wow! That’s all I need to know about Google’s model.

Workaccount2 · 2024-12-11T22:43:55 1733957035

What is impressive about this new model is that it is the lightweight version (flash).

There will probably be a 2.0 pro (which will be 4o/sonnet class) and maybe an ultra (o1(?)/Opus).

danpalmer · 2024-12-11T22:09:39 1733954979

That's a comparison of multiple GPT-4 models working together... against a single GPT-4 mini style model.

p1esk · 2024-12-12T05:21:57 1733980917

> multiple GPT-4 models working together

What do you mean? Is o1 not a single model?

p1esk · 2024-12-10T06:30:00 1733812200

Can you please give an example of a “completely illogical statement” produced by o1 model? I suspect it would be easier to get an average human to produce an illogical statement.

n2d4 · 2024-12-10T06:46:33 1733813193

Give it anything that sounds like a riddle, but isn't. Just one example:

> H: The surgeon, who is the boy's father, says "I can't operate on this boy, he's my son!" Who is the surgeon of the boy?

> O1: The surgeon is the boy’s mother.

Also, just because humans don't always think rationally doesn't mean ChatGPT does.

pezezin · 2024-12-10T06:55:12 1733813712

Haha, you are right, I just asked Copilot, and it replied this:

> This is a classic riddle! The surgeon is actually the boy's mother. The riddle plays on the assumption that a surgeon is typically male, but in this case, the surgeon is the boy's mother.

> Did you enjoy this riddle? Do you have any more you'd like to share or solve?

throw310822 · 2024-12-10T07:36:02 1733816162

Ha, good one! Claude gets it wrong too, except for apologizing and correcting itself when questioned:

"I was trying to find a clever twist that isn't actually there. The riddle appears to just be a straightforward statement - a father who is a surgeon saying he can't operate on his son"

More than being illogical, it seems that LLMs can be too hasty and too easily attracted by known patterns. People do the same.

varjag · 2024-12-10T08:03:37 1733817817

It's amazing how well these canned apologies work at anthropomorphising LLMs. It wasn't really in haste; it simply failed because the nuance fell below the noise in its training set data, but you rectified it with your follow-up correction.

throw310822 · 2024-12-10T08:24:50 1733819090

Well, first of all it failed twice: first it spat out the canned riddle answer, then once I asked it to "double check" it said "sorry, I was wrong: the surgeon IS the boy's father, so there must be a second surgeon..."

Then the follow up correction did have the effect of making it look harder at the question. It actually wrote:

"Let me look at EXACTLY what's given" (with the all caps).

It's not very different from a person that decides to focus harder on a problem once it was fooled by it a couple of times already because it is trickier than it seems. So yes, surprisingly human, with all its flaws.

varjag · 2024-12-10T11:01:23 1733828483

But thing is it wasn't trickier than it seemed. It was simply an outlier entry, like the flipped tortoise question that tripped the android in the Bladerunner interrogation scene. It was not able to think harder without your input.

seymore_12 · 2024-12-10T08:22:26 1733818946

Grok gives this as an excuse for answering "The surgeon is the boy's mother." :

<<Because the surgeon, who is the boy's father, says, "I can't operate on this boy, he's my son!" This indicates that there is another parent involved who is also a surgeon. Given that the statement specifies the boy's father cannot operate, the other surgeon must be the boy's mother.>> Sounds plausible and on the first read, almost logical.

wanderer2323 · 2024-12-10T06:51:27 1733813487

Easy, from my recent chat with o1: (Asked about left null space)

‘’’ these are the vectors that when viewed as linear functionals, annihilate every column of A . <…> Another way to view it: these are the vectors orthogonal to the row space. ‘’’

It’s quite obvious that vectors that “annihilate the columns” would be orthogonal to the column space not the row space.
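A small sanity check of this point, as a sketch: the matrix `A` and vector `y` below are made-up illustrations, not anything from the o1 chat. `y` annihilates every column of `A` (so yᵀA = 0), which is exactly orthogonality to the column space. For this non-square `A`, `y` has three entries while the rows have two, so it is not even in the same space as the row space.

```python
# Hypothetical 3x2 example; A and y are illustrative only.
A = [
    [1, 0],
    [0, 1],
    [1, 1],
]
y = [1, 1, -1]  # candidate left-null-space vector


def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


columns = [list(col) for col in zip(*A)]  # the columns of A, each in R^3
y_dot_columns = [dot(y, col) for col in columns]

print(y_dot_columns)      # [0, 0] -> y is orthogonal to every column of A
print(len(y), len(A[0]))  # 3 2   -> y lives in R^3, the rows of A in R^2
```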

I don’t know if you think o1 is magic. It still hallucinates, just less often and less obviously.

brokensegue · 2024-12-10T06:53:36 1733813616

average humans don't know what "column spaces" are or what "orthogonal" means

Sabinus · 2024-12-10T07:07:27 1733814447

Average humans don't (usually) confidently give you answers to questions they do not know the meaning of. Nor would you ask them.

throw310822 · 2024-12-10T07:38:26 1733816306

Ah hum. The discriminant is whether they know that they don't know. If they don't, they will happily spit out whatever comes to their mind.

leklund · 2024-12-10T17:25:46 1733851546

Sure average humans don’t do that, but this is hackernews where it’s completely normal for commenters to confidently answer questions and opine on topics they know absolutely nothing about.

mdp2021 · 2024-12-10T07:24:10 1733815450

And why would the "average human" count?!

"Support, the calculator gave a bad result for 345987*14569" // "Yes, well, also your average human would"

...That's why we do not ask "average humans"!

rzzzt · 2024-12-10T09:15:21 1733822121

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."

So the result might not necessarily be bad, it's just that the machine _can_ detect that you entered the wrong figures! By the way, the answer is 7.

brokensegue · 2024-12-10T17:25:03 1733851503

average human matters here because the OP said

> Can you please give an example of a “completely illogical statement” produced by o1 model? I suspect it would be easier to get an average human to produce an illogical statement.

mdp2021 · 2024-12-11T07:57:33 1733903853

> because the OP said

And the whole point is nonsensical. If you discussed whether it would be ethically acceptable to canaries it would make more sense.

"The database is losing records...!" // "Also people forget." : that remains not a good point.

brokensegue · 2024-12-11T19:48:57 1733946537

Because the cost-competitive alternative to LLMs is often just ordinary humans.

mdp2021 · 2024-12-12T08:56:00 1733993760

Following the trail as you did originally: you do not hire "ordinary humans", you hire "good ones for the job"; going for a "cost competitive" bargain can be suicidal in private enterprise and criminal in public ones.

Sticking instead to the core matter: the architecture is faulty, unsatisfactory by design, and must be fixed. We are playing with the partials of research and getting some results, even some useful tools, but the idea that this is not the real thing must be clear - also since this two years plus old boom brought another horribly ugly cultural degradation ("spitting out prejudice as normal").

brokensegue · 2024-12-12T13:23:35 1734009815

I interpreted the op's argument to be that

> For simple tasks where we would alternatively hire only ordinary humans AIs have similar error rates.

Yes if a task requires deep expertise or great care the AI is a bad choice. But lots of tasks don't. And in those kinds of tasks even ordinary humans are already too expensive to be economically viable

mdp2021 · 2024-12-18T09:38:01 1734514681

Sorry for the delay. If you are still there:

> But lots of tasks

Do you have good examples of tasks in which dubious verbal prompt could be an acceptable outcome?

By the way, I noticed:

> AI

Do not confuse LLMs with general AI. Notably, general AI was also implemented in systems where critical failures would be intolerable - i.e., made to be reliable, or part of an ultimately reliable process.

brokensegue · 2024-12-19T13:08:08 1734613688

Yes lots of low importance tasks. E.g. assigning a provisional filename to an in progress document

Checking documents for compliance with a corporate style guide