> Again, missed the point....

If your absurd claim that Waymo's trials provide just three relevant data points is not part of "the point", then why did you make it? It does not give us any confidence in the proposition that "the point" has been well thought-out.

Furthermore, "the point" keeps shifting: recently it shifted to raising doubts about the provenance of the data whipped up from a six-year-old article. At this point, I feel that a quote is appropriate: "the ability of people to consistently miss/twist the point to fit their own predetermined viewpoint is tiresome."




As I've said elsewhere, my point was part of a larger context: how important trust is to the adoption of AV tech. That goes well beyond the Waymo cases illustrated. The sample size and quality of the data illustrate the need for a broader context of information, in addition to the need to understand that humans don't build trust simply from statistical arguments.

And in the vein of trying to steel-man your position, I gave the comment to ChatGPT to see if it, too, considered the central point a claim about AV having a "death wish." Here's what it said:

"The statement highlights that some AI tasks, while effective, may deviate from conventional practices. An example is given where the Department of Defense (DoD) employed a company to train a dogfighting simulator using reinforcement learning (RL). Pilots were surprised by the simulator breaking established best practices and behaving recklessly, akin to a pilot with a disregard for safety. The implication is that while such behavior might be advantageous in a military context, it may pose risks or be unsuitable in civilian settings, such as public roads. The statement underscores the need to carefully consider and tailor AI applications to specific contexts and objectives."

So it seemed to recognize that the central point is that "the behavior" in question is "breaking established best practices" and that the "implication is that while such behavior might be advantageous in a military context, it may pose risks or be unsuitable in civilian settings". There's probably some irony in the fact that AI did better at a reading task.


Sounds like you’re arguing that “AI” is better at navigating complex human discussion than the multiple humans in this thread? I’ll take that conclusion, I guess (whether it's ultimately you or someone else arguing against your own points doesn't really matter, does it?).

Really the only thing left is for you to take a flight to SF and watch the Waymo cars drive. Or ride in one if you dare.


Impressive as LLMs are, they lack a theory of mind and are inferior to humans in parsing meaning from statements.

This dispute is not over a difficult or subtle issue: all the people who have responded in this thread see clearly the obvious and unequivocal reading - and, in your own example, ChatGPT also does! It has identified the tacit subject of the sentence "Possibly good in war, maybe not so good on a public road" as the specific military system - the one whose behavior is explicitly described as being like having a death wish, and the only topic of the preceding sentence.

For one thing, the phrase 'possibly good in war' makes no sense in the reading you are trying to pass off: why, out of nowhere, did the needs of the military appear? And unexpected behavior is not something desirable in general in military systems, any more than elsewhere - it would take very special circumstances and a specific sort of behavior for that proposition to even be entertained.

We can see, therefore, that dcow was right to respond 'your comparison is RL dogfighting?...' and 'Modern AVs are not driving like they have a death wish by any stretch of the imagination...'

Oh, and next time you invoke ChatGPT's response, include the prompt, verbatim, like this for example:

Prompt: In the statement 'That’s true, but also one of the selling points of some AI tasks. As a non-hypothetical example, the DoD hired a company to train a software dogfighting simulator with RL. What surprised the pilots was how many "best practices" it broke and how it essentially behaved like a pilot with a death wish. Possibly good in war, maybe not so good on a public road.', what is being called ' not so good on a public road'?

Response: In the statement, the phrase "not so good on a public road" refers to the behavior of the software dogfighting simulator trained with reinforcement learning (RL). The implication is that the simulator, which exhibited behavior contrary to conventional "best practices" and behaved like a pilot with a "death wish," might not be suitable or safe for use in a public road scenario. This suggests a concern about the potentially risky or unpredictable behavior of the AI system in a real-world, civilian setting such as driving on public roads.

What we have in this discussion is a motte-and-bailey fallacy, as we can see in your response to my first post here, which was:

"None of these three cases involved the Waymo car behaving in ways that are uncommon among human drivers, and our theory of mind does not make us nearly-infallible predictors of what another driver is going to do. Your objection becomes essentially hypothetical unless these cars are behaving in ways that are both outside of the norms established by the driving public, and dangerous."

Your reply, in outline, goes like this:

> That's true...

Here we are in the motte, where you nominally accept that the relevance of your concern, which is not unreasonable in itself, is constrained by the extensive testing that has been performed by Waymo so far...

> ...but...

Here we enter the bailey, where we are supposed to turn our attention to an unrelated system, which was found, on testing, to have alarming unexpected behavior. The bailey has become a place where Waymo has been curating the data to the point where we simply don’t have the slightest idea whether there’s dangerous behavior outside of the human norms lurking in Waymo cars.

It has also become a place where all of Waymo’s extensive testing has produced just three data points against this view. I must say that it seems generous of you to concede even three, if Waymo is curating the data to the extent you imply.


>It has also become a place where all of Waymo’s extensive testing has produced just three data points against this view.

This is where you are bypassing the point about how much faith we can put in the data, because we don't have the full results of their extensive testing. We only have the results they are willing to share.

As an analogy, I have a relative who loves to talk about all the times he's won money gambling on the craps table. I almost never hear any information about his losses, unless they are couched to say how much more money he's won later. So would you say I can conclude he's an expert gambler who should quit his job to play craps full time, or do you think there might be some human bias in reporting going on that I should be skeptical about?
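
To put a number on that kind of reporting bias, here is a toy simulation (the odds and stakes are entirely made up, just to illustrate the mechanism):

    import random
    random.seed(0)

    # Toy model: each session is 100 even-money bets with a slight house edge,
    # and the gambler only ever talks about the sessions he finished ahead on.
    sessions = [sum(random.choice([1.00, -1.05]) for _ in range(100))
                for _ in range(1000)]
    reported = [s for s in sessions if s > 0]

    print("true average result per session:    %+.1f" % (sum(sessions) / len(sessions)))
    print("average of the sessions he reports: %+.1f" % (sum(reported) / len(reported)))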


So you want to bring up your "three data points" claim again? OK. Those three data points are the three incidents listed in the top post of this thread [1]. About them, you wrote "regardless if you think the decision should be based on statistics alone, my further point is that an n=3 sample size is not adequate to make strong claims." [2] I don't just "bypass" this argument; I dismiss it as an absurd characterization of all the testing Waymo has been performing. As I pointed out at the time, if none of these incidents had occurred during these tests, then, by your logic, we would have no information at all about the safety of the cars! This position, applied consistently, amounts to a complete rejection of statistical methods, and I "bypass" that.
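
To make that concrete with a rough sketch: suppose, purely for illustration, that zero incidents had been observed over something like the ~7 million miles cited elsewhere in this thread. A simple Poisson "rule of three" bound would still tell us how high the incident rate could plausibly be:

    import math

    miles = 7_000_000   # illustrative exposure, the headline figure cited in this thread
    # With zero observed incidents, the exact Poisson 95% upper bound on the
    # rate solves exp(-rate * miles) = 0.05, i.e. roughly 3 / miles.
    upper_rate = -math.log(0.05) / miles

    print("95%% upper bound: %.2e incidents per mile" % upper_rate)
    print("i.e. about %.2f incidents per million miles" % (upper_rate * 1e6))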

At least the idea that Waymo is hiding data that would reveal the vehicles as being too dangerous to be on public roads is not quite that wrongheaded, but it implies that Waymo is doing this on a massive scale that cannot possibly succeed in the long run. It is not clear, for example, how it could hide information about similar or worse incidents from the insurance companies, and a story about your gambling relative is not changing my mind.

[1] https://news.ycombinator.com/item?id=38721721

[2] https://news.ycombinator.com/item?id=38739023


>At least the idea that Waymo is hiding data that would reveal the vehicles as being too dangerous

This is your dichotomous thinking again. I am not making a point about it being "safe" or "unsafe" as a dichotomous choice. I'm saying we don't have good data to make a claim one way or another. That's also different from saying, "we have no information at all." When you combine that with the fact that we have other evidence that RL models can result in unpredictable behavior, it should give us pause.

Insufficient data + priors about unpredictable behavior = uncertainty of performance

The through-line of my entire point is that uncertainty erodes trust, and trust is necessary for wide adoption in the public sphere. It isn't a hard concept if you can lay your bias and dichotomous thinking aside to consider it. Waymo seems to agree; although the PR headline is about 7MM miles traveled, the report says:

>"the required ADS VMT to establish statistical significance ranges from tens to hundreds of millions of miles, and the fatal outcome requires hundreds of millions to billions of miles of driving are needed."

So unless you have better data to show, you've demonstrated nothing to make me change my mind.


> I'm saying...

You don't get a thread of 173 posts (44 by you) over 6 days and counting (4 in the last day) over the anodyne position you are now professing. That time has been spent dealing with all the bogus stuff that you are now tacitly disavowing, such as the bizarre three data points claim that you brought up again as recently as your previous post.

> ...change my mind.

You have been tacitly changing your mind throughout, though, of course, you will not admit it.


Like so many of your previous posts, this doesn't address any of the actual points made, but just tries to shoehorn in your own digression. If you can point to where I changed my mind, I'll be happy to try and explain how it relates to those original points or admit if it does not. Note that "unexpected behavior" and "uncertainty" are literally in the first post I made, as is the part about what it takes for society to adopt the tech.

It certainly comes across like either you're too hung up on a position to read critically or you got fooled by a PR piece without knowing how to properly interpret it. I guess that's a win for the SV hype machine.


Ah - so, if you have not changed your mind on anything, then, regrettably, you left a number of things out of your "I'm saying" list, such as the business about there only being three relevant data points from Waymo's testing [1].

On the one hand, I can understand that it is difficult to keep track of everything you have said throughout your twists and turns, but on the other, it seems particularly important to keep this one in mind as, if verified, it would be far more damning of Waymo than everything else you have said, combined!

[1] https://news.ycombinator.com/item?id=38739023


You’re just sea-lioning at this point. Please stop.



