Dogs are sensitive to small variations of the Earth’s magnetic field (2013) (biomedcentral.com)
85 points by Luc 11 months ago | 34 comments



Discussed in 2020 (160 comments) https://news.ycombinator.com/item?id=22276468

and 10 months ago (44 comments) https://news.ycombinator.com/item?id=32736851

A good blog post on the seeming ridiculousness of the claim: https://www.shapeoperator.com/2020/02/15/how-to-lie-with-uni...


Somewhat famous p hacking example iirc


I was thinking this might be the case just based on the conclusion. But it doesn't seem to be retracted, and I can't find any evidence of a scandal.


I don’t think it was noteworthy enough to cause a scandal. You can find scattered discussions on their p hacking. Nobody really cared though.

The authors are pretty clear that they didn’t find what they were looking for and so they just tried whatever they could with the remaining data and landed on dog pooping orientation.


They wanted tenure, they found dog shit. Their research is a tortured metaphor for their own career prospects.

When I put it like that, it evokes more pity than outrage. Imagine what they had to go through to collect enough data.


Do you know where I can find a succinct and good explanation of p-hacking?

Last time I looked into it, I couldn't understand why it was seen as a problem.


The p-value is supposed to measure how likely you were to get a result at least as extreme as yours by pure chance. People use this as a measure of how strong the evidence published in some paper is: for example, if you publish a result that has only a 1% probability of being produced by pure chance, it's probably worth looking into.

The problem is if what you actually did wasn't one experiment, but 1000 simultaneous experiments. That happens quickly if you start looking for correlations in your data. The issue is that if you then use the same 1% threshold for significance, on average 10 experiments will clear that threshold by pure chance. Your statistical test has become worthless. That's fine if you account for it, but if you don't and just publish those 10 results that's called p-hacking. People hate it because it makes purely random results look as if they were significant.
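A quick way to see this is to simulate it. A minimal sketch in Python (purely synthetic data; both groups are drawn from the same distribution, so every "hit" is a false positive by construction):

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(0)

    # 1000 "experiments", each comparing two groups drawn from the same
    # distribution, so any significant result is pure chance by construction.
    false_positives = 0
    for _ in range(1000):
        a = rng.normal(size=30)
        b = rng.normal(size=30)
        _, p = ttest_ind(a, b)   # two-sample t-test
        if p < 0.01:             # the 1% significance threshold from above
            false_positives += 1

    print(false_positives)  # roughly 10 of the 1000 clear the bar by chance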


You set out to test an idea. Our statistical toolkit can’t prove the idea, so we say “if I check out this idea, and the results I see could only have happened in my expected way by random chance 5% of the time, I’ll conclude I was correct.”

You do your experiment. You don’t find any evidence you were correct. So you test 20 other ideas. Lo and behold, 1 out of the 20 new unregistered ideas passes the test of seeming to occur with less than a 1 in 20 probability of occurring.

Well duh. It would be weird if you didn’t find such an outcome. This particular study openly acknowledged doing this, and also did some creative binning of their numbers to help nudge the stats.

You will always be able to find something. P hacking doesn’t mean the thing you found is false, but it is at best a sloppy estimate of something to explore further.

The more egregious examples of p hacking are adjusting outliers and filtering out specific data points until you can make the data claim whatever you want.
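To put a rough number on the “test 20 ideas” scenario: if each test has a 5% false-positive rate and the tests were independent (they rarely are), the chance that at least one clears the bar by luck alone is about 64%:

    # Chance of at least one false positive across 20 independent tests at alpha = 0.05
    alpha, n_tests = 0.05, 20
    print(1 - (1 - alpha) ** n_tests)  # ~0.64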


Thanks for the explanation!

> You do your experiment. You don’t find any evidence you were correct. So you test 20 other ideas. Lo and behold, 1 out of the 20 new unregistered ideas passes the test of seeming to occur with less than a 1 in 20 probability of occurring.

I think this ^^^ is the part that's tripping me up: the notion of "registered ideas".

Suppose we have 21 researchers, each testing one of those 21 ideas. They pool their funds and hire someone to generate that one data set.

20 of those researchers would find their ideas unsupported by the data. But one researcher would find their idea was supported. And that's the only study that gets published in Nature.

I don't hear about this ^^^^ issue being criticized the same way P-hacking is. But AFAICT it's a similar kind of statistical factor.


It is criticized frequently, and study pre-registration has become the norm already in some fields.


Thanks. Then maybe it's something else I'm having trouble with:

I'd think that (the extent to which data support a particular model) is unrelated to (subjective details such as what hypothesis the researchers intended to test).

IIUC, the criticism of P-hacking hinges on such subjective details.

If the main goal is to find useful models, why would we care about the researchers' intent when originally designing the experiment?


It’s not that we care about their intent, it’s that we care about whether their conclusions are more likely to represent a genuine effect vs. sample bias. If we are testing everything we can think of, we need to be more conservative about our statistical tests to reflect that. That is, the extra conservatism helps determine whether what you found is actually useful or not.

Further, a lot of science finds things that pass the statistical test but are still not useful, which is a different kind of problem.

If you do an experiment and see something interesting you should probably follow up on it. But it’s the difference between saying “there’s a 5% chance that this is just sample bias” using basic methods and “there’s a 90% chance this is sample bias” using methods that account for how many things we tried. Who knows? Maybe your finding is novel, important, statistically strong, and useful, without you having planned to explore it. Maybe you accidentally found penicillin.

The two big problems are that people don’t know how to do these calcs (a lot of which don’t really make sense, because they try to assert independence where independence is impossible) and that they don’t mention the things they tried which didn’t work, which makes evaluating their claims impossible.
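As a rough illustration of the “90% chance this is sample bias” idea above (made-up numbers; Bonferroni is the bluntest possible correction, but it makes the point): if the best of 20 tries came in at p = 0.04, adjusting for the 20 attempts gives a very different picture.

    p_best = 0.04
    n_tried = 20
    # Bonferroni adjustment: multiply the best p-value by the number of attempts.
    p_adjusted = min(1.0, p_best * n_tried)
    print(p_adjusted)  # 0.8 -- i.e. quite plausibly just sample noise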


> It’s not that we care about their intent, it’s that we care about whether their conclusions are more likely to represent a genuine effect vs. sample bias.

So is the fundamental issue that an experiment is probably designed to avoid bias for the data related to the hypothesis being tested, but probably isn't designed to avoid bias for other hypotheses you might generate once the results are in?


The math required to interpret 20 results is different than the math required to interpret 1 result. It’s not a weakness of the experiment itself. On some level the math required to interpret 20 results may not actually exist.

Think of sample bias as a dice roll. Every time you check a new hypothesis, you roll the die. If it rolls a 1, you get a false positive.

Pre registering a hundred different things you want to test doesn’t make that better, but it does create confidence that you didn’t continually test new ideas one at a time until number 100 was something that looked good.


Or, in the form of a comic strip https://xkcd.com/882/

(I have this in my head when p-hacking comes up)


A bit. But you can test 20 ideas. This is not so bad. We have tools to scrutinize such results like the false discovery rate and related concepts.

The real problem is when you set out to test 20 things and they don’t work, so you test 20 more things; and it’s much worse when you then only tell us about the analysis that led to the 1/40 that mattered.

The thing about p hacking specifically is changing your analysis until it finds something that looks good. A lot of people don’t understand math and will do this and not understand that they’re gaming it. A lot of people will understand this and do it anyway because academia is soul crushing and they need to publish.
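For what it’s worth, the false discovery rate mentioned above is usually controlled with the Benjamini-Hochberg procedure. A minimal sketch with made-up p-values:

    # Benjamini-Hochberg: control the expected share of false discoveries at q = 0.05.
    pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.47, 0.61, 0.74]  # made up
    q = 0.05
    m = len(pvals)

    ranked = sorted(enumerate(pvals), key=lambda kv: kv[1])
    cutoff_rank = 0
    for rank, (_, p) in enumerate(ranked, start=1):
        if p <= rank / m * q:
            cutoff_rank = rank

    significant = sorted(idx for idx, _ in ranked[:cutoff_rank])
    print(significant)  # indices of the tests that survive the correction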


> Do you know where I can find a succinct and good explanation of p-hacking?

Sure.

When your experiment uses a P-value threshold of 0.05, what that means is that, even when there is no real effect, you expect a “significant” result purely due to random chance 5% of the time.

So you test $THEORY, and expect that 1 out of every 20 times the results are going to be wrong purely due to random chance.

P-hacking is when you repeat the test 20 times[1], and publish that 1 random result while throwing away the other 19 results.

[1] Or however many times you need to


A dowser recently told me that's how he dowses. By detecting kinks in the Earth's magnetic field. Thus he detects underground water, buried objects, disturbed earth, graves, cables, etc.

He works for the state department of land management or somesuch. Apparently they all dowse there. They even have a preferred "dowsing rod". A retractable thing like a radio antenna attached to a swivelling handle. (According to the dowser rather overpriced)


I’ve seen one of these as well, in the hands of a state employee, exactly as you describe. There should be a word for the mass-produced normalization of superstition. Ouijification?


Superstition? I think they're getting results. It works.


> Superstition? I think they're getting results. It works.

Compared to what?

I can almost guarantee that they aren't getting results any better than random chance.

I mean, really, if dowsing is ever proven to work under rigorous scientific conditions, there'd be a Nobel prize in it for someone, because we'd have a whole new field of scientific study that has never before in the history of mankind produced any evidence of its existence.


Unless it's weird. Weird stuff happens.

I mean you're naysaying the testimony of hundreds of field researchers on the weight of, let's face it, mere convention.

And these aren't necessarily dumb people. They value time and money just like we do.

It does depend on the sensitivity of the dowser after all. That sounds squishy, like psychology. Certainly less obvious than weights and measures.

Who knows, maybe it takes a certain kind of person to do it.


> It does depend on the sensitivity of the dowser after all.

Many hundreds of the most successful dowsers (i.e. the most sensitive) could not beat random chance.

Until they can, I'll remain skeptical.


As a lifelong dog owner... this screams nonsense immediately. I'm skeptical that this could even be tested for properly.


Right? We walk on a big loop road, and my dog always poops parallel to the road, regardless of which direction the road is heading in that location.


My doggo's GNSS is broken: he can't even decide where to go much less which heading.


Even if dogs were sensitive to Earth's magnetic field, why would they shit facing North/South? What would be the point?


Prevailing winds? Weather is generally orientated east-west. So aligning north-south keeps one end from smelling the other.


Is there any service which registers and shares data on magnetic field changes? It might be useful for dog owners, not just to observe the strange behaviour of the pet but to understand it.


Your National Geophysical Organisation maps all that on a regular basis.

Broad-area mag surveys are flown in grids 80 m above the ground with nine nanotesla-sensitive magnetometers (wing tips and tail boom each carry three on Cartesian axes).

To produce a level map they also run local base stations to record the daily diurnal mag fluctuations to subtract and normalise.
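The base-station correction is conceptually just a subtraction. A toy sketch (made-up numbers, not any agency's actual pipeline; real surveys sync everything to GPS time first):

    import numpy as np

    # Airborne readings (nT) and the base station's simultaneous readings (made up).
    airborne = np.array([50012.1, 50015.4, 50009.8, 50011.0])
    base     = np.array([50003.0, 50006.5, 50001.2, 50002.4])

    # Subtract the base station's departure from its daily mean to remove the
    # diurnal variation, leaving the spatial (geological) signal in the survey.
    diurnal  = base - base.mean()
    levelled = airborne - diurnal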

That gets fed into the IGRF (International Geomagnetic Reference Field) and/or WMM (World Magnetic Model) to create parameters for five-year epoch predictions.

On a smaller scale, many archaeology groups and/or soil engineers run smaller, tighter grids, dragging skids of walking arrays across an acre or more to create sub-ground imaging.

https://www.ncei.noaa.gov/products/international-geomagnetic...

https://geomag.bgs.ac.uk/research/modelling/IGRF.html

https://www.ncei.noaa.gov/products/world-magnetic-model

Here's some broad area airborne magnetic maps:

https://www.energymining.sa.gov.au/industry/geological-surve...


This is an example of the creative power of Evolution at work -- dogs that did not align themselves with the Earth's magnetic poles when peeing were simply eliminated from the gene pool by the unyielding machine that is survival of the fittest.


This must be the reason dogs feel earthquakes minutes before humans do. A couple weeks back I read an article here (HN) regarding earthquakes being linked to MF changes and solar storms. This aligns with that finding as well. Interesting!


Preregistration is one mitigation.

Perhaps another could be a trusted, HIPAA-compliant third-party log service for raw data that's immutable, with a blockchain-style hash chain and timestamps.
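A minimal sketch of that kind of append-only log (hypothetical scheme: each entry carries a timestamp and a SHA-256 hash chained to the previous entry):

    import hashlib, json, time

    def append_record(log, raw_data):
        # Chain each entry's hash to the previous one; editing any earlier
        # entry changes its hash and breaks every link after it.
        prev_hash = log[-1]["hash"] if log else "0" * 64
        entry = {"timestamp": time.time(), "data": raw_data, "prev_hash": prev_hash}
        entry["hash"] = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()
        ).hexdigest()
        log.append(entry)
        return entry

    log = []
    append_record(log, {"dog": 1, "heading_deg": 183})
    append_record(log, {"dog": 2, "heading_deg": 7})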


(2013)



