jonathan_landy's comments

>> Some of the problems don’t matter as much if your goal for the model is just prediction, not interpretation of the model and its coefficients. But most of the time that I see the method used (including recent examples being distributed by so-called experts as part of their online teaching), the end model is indeed used for interpretation, and I have no doubt this is also the case with much published science. Further, even when the goal is only prediction, there are better methods like the Lasso, of dealing with a problem of a high number of variables.

I use this method often for prediction applications. First, it’s a sort of hyper parameter selection, so you should obviously use a holdout and test set to help you make a good choice.

Second, I often see the method dogmatically shut down like this, in favor of lasso. Yet every time I have compared the two they give similar selections — so how can one be “evil” and the other so glorified? I prefer the stepwise method though as you can visualize the benefit of adding in each additional feature. That can help to guide further feature development — a point that I’ve seen significantly lift the bottom line of enterprise scale companies.


> Yet every time I have compared the two they give similar selections — so how can one be “evil” and the other so glorified?

Frequentist and Bayesian approaches often yield similar results but are philosophically different. In general I favor and recommend lasso because I see it perform as well as or better than stepwise at variable selection, and it doesn't come with all the baggage.

Lasso avoids the multiple comparison problem by applying a regularization penalty instead of sequentially fitting multiple models and performing hypothesis testing. This also helps to prevent overfitting. If you want to see which variables would be included/excluded you can turn the regularization up or down (it is pretty easy to spit out an automated report).

Stepwise selection comes in different flavors: forward, backwards, or bidirectional; R-squared, adjusted R-squared, AIC, BIC, etc.; these often all lead to different models so the choices must be justified and I rarely see any defense for them.

Of course, if the point is prediction over coefficient estimation and interpretability then neither of these are great choices.
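
A minimal sketch of the "turn the regularization up or down" report described above, in plain Python. The synthetic data, the `lasso_cd` helper and the alpha grid are all illustrative assumptions, not anything from the article; in practice you would likely reach for an existing lasso implementation.

```python
import random

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become exactly 0."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def lasso_cd(X, y, alpha, n_iter=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + alpha*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual that leaves feature j out.
            rho = 0.0
            for i in range(n):
                others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - others)
            rho /= n
            beta[j] = soft_threshold(rho, alpha) / col_sq[j]
    return beta

def selected(X, y, alpha):
    """Indices of the features whose coefficient survives the penalty."""
    return [j for j, b in enumerate(lasso_cd(X, y, alpha)) if b != 0.0]

# Hypothetical data: only features 0 and 1 actually drive y.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3 * row[0] - 2 * row[1] + random.gauss(0, 0.5) for row in X]

# The "automated report": which features are kept at each penalty strength.
for alpha in (10.0, 1.0, 0.1):
    print(alpha, selected(X, y, alpha))
```

Reading the report from the strongest penalty to the weakest shows the order in which variables enter the model, which is roughly the same view that forward stepwise gives.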


> I use this method often for prediction applications. First, it’s a sort of hyper parameter selection, so you should obviously use a holdout and test set to help you make a good choice.

What the article is talking about is inference, not prediction. It's a different problem domain, it's not about telling a company whether design A or B leads to more engagement, it's about finding out about the (true!) causal drivers of that difference. The distinction may seem subtle but it's important. The key problems outlined all talk about common (frequentist) statistical tests and how they get messed up by variable selection. Holdout sets don't address this, because if the holdout set comes from the same distribution as the test set (as it should), the biases would be the same there. Bayesian inference isn't a panacea either, the core problem is structuring the model based on the data and then drawing conclusions about their relationships (Bayesian analysis gives you tools to help avoid this, but comes with its own set of traps to fall into, such as the difficulty of finding truly non-informative priors).


Yeah, the title is a bit hyperbolic. I have not used selection methods that much, but it is not too surprising that they would give results similar to LASSO as a selection or predictive method for people who think of it in terms of "feature development".

The distaste for step-wise selection comes from its typical use. If one carefully reads Harrell's complaints quoted in the blog post, quite a few of them are less about the selection method than about what the analyst does with it, namely, the interpretation of inferential statistics. When you see step-wise in the wild, the practitioner has often used step-wise or another selection method and then reports the usual test statistics and p-values for the final fitted model ... which are derived under assumptions that don't usually take the selection steps into account. It is quite unfortunate in fields where people put a lot of faith in coefficient estimates, p-values and Wald confidence intervals when writing the conclusions of their papers.

With LASSO and its cousins, the standard packages and literature strongly encourage the user to focus on predictions and run cross-validation right from the beginning.


Neat, I wasn’t aware of available data like this. I recently bought a used smart car — lots of fun, but I admit I worried it was a death trap. It’s not in this list but a google showed they are actually not much worse than average.

https://www.smartcarofamerica.com/threads/updated-smart-acci...


I’ve seen only one of those little guys on a freeway before. I suspect they see many, many more miles in the inner city, where speeds are lower and larger impacts are less common.

They do sound fun to rip around in. You used to be able to rent them with an app around my city. I think it was helpful for a lot of people.


And what kind of thinking led you to assert he lived in the US his whole life on the basis of what little you read?


Class action suits as the only moral option to protect consumers (which are not common in Germany), citing Kissinger as a god-like authority on relationships between countries, lack of knowledge of pro-market / competition regulation (very strong in Germany and the EU [for different reasons]).

I regularly read German and Swiss newspapers. The arguments are very different (and in many cases more nuanced).


I never cited Kissinger as a "god like authority", and to insinuate that I did is offensive.

Furthermore, you made an earlier false statement about bias -- claiming I had lived in the US all my life -- and backing it up claiming that you had read my LinkedIn. My LinkedIn features my European high school. You're either lying or lazy; you can tell me which.

Making bombastic and trivially false statements doesn't help your arguments. Good luck with your "nuance".


Love this. Very interesting that the same amount of compression (samples) can give ever more accuracy if you do a bit more work in the decompression, by taking higher-order fits to more of the sample points.
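
A rough sketch of that idea, assuming a hypothetical sine table sampled every pi/16 on [0, pi/2] (the table spacing, range and helper names are my own choices, not the article's). The stored samples stay the same; only the decompression step changes from a linear fit through 2 neighbours to a cubic fit through 4.

```python
import math

STEP = math.pi / 16
TABLE_X = [k * STEP for k in range(9)]        # 9 samples covering 0 .. pi/2
TABLE_Y = [math.sin(x) for x in TABLE_X]      # the "compressed" data

def lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def reconstruct(x, order):
    """Interpolate sin(x) from the order+1 table entries nearest to x."""
    k = min(int(x / STEP), len(TABLE_X) - 2)  # index of the left neighbour
    lo = max(0, min(k - (order - 1) // 2, len(TABLE_X) - (order + 1)))
    idx = range(lo, lo + order + 1)
    return lagrange([TABLE_X[i] for i in idx], [TABLE_Y[i] for i in idx], x)

def max_error(order, n=1000):
    """Worst reconstruction error over a fine grid on [0, pi/2]."""
    grid = [i * (math.pi / 2) / n for i in range(n + 1)]
    return max(abs(reconstruct(x, order) - math.sin(x)) for x in grid)

print("linear:", max_error(1))
print("cubic: ", max_error(3))
```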


This is pretty much the core principle underlying modern machine learning. More parameters means more faithful fit for the data, at the cost of over-fitting and generalizing poorly on unseen data from outside the range of data that was used to tune the parameters. In this particular application, we aren't that worried about overfitting because we know the actual function used to compress the data in the first place, so we know that our decompression function is "correct" and we know the range of the data. So we can keep adding parameters to reduce reconstruction error. Meanwhile in applied ML and stats, cubic and even quadratic models should be used and interpreted only with extreme caution and detailed knowledge of the data (how it was prepared, what the variables mean, what future data might look like, etc).


This also seems to be a difference between interpolation and extrapolation. The table doesn't just fit a polynomial to theta between 0 and pi/8 and expect you to extrapolate for theta > pi/8. That would have catastrophic results. It has always seemed to me like one of the big problems with ML is knowing whether a given inference is an interpolation or an extrapolation.
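
To make the interpolation/extrapolation gap concrete, here is a small sketch under my own assumptions: a cubic through four samples of sin on [0, pi/8], evaluated once inside that range and once at 2*pi, well outside it. The sample points and evaluation points are illustrative, not taken from the article's table.

```python
import math

def lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Four samples of sin(theta) for theta between 0 and pi/8.
xs = [k * math.pi / 24 for k in range(4)]
ys = [math.sin(x) for x in xs]

inside = math.pi / 16    # interpolation: within the sampled range
outside = 2 * math.pi    # extrapolation: far beyond pi/8

interp_error = abs(lagrange(xs, ys, inside) - math.sin(inside))
extrap_error = abs(lagrange(xs, ys, outside) - math.sin(outside))

print("interpolation error:", interp_error)
print("extrapolation error:", extrap_error)
```

The same fitted polynomial is used in both cases; the only thing that changes is whether the query falls inside the range the samples cover.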


In that sense, extrapolation should never be used in “production”. At best it is for exploration.

One characteristic of ML is that this distinction often is not clear. (Hallucination, generalization, etc.)


Interpolation would require deriving from the resultant and extrapolation would guarantee no inference is how I make sense of it


I submitted it because I saw a comment somewhere else about the rich being locked into unfair advantages and classes ossifying. Seems to not be true if these stats are to be believed.


It could also be interpreted as demonstrating the advantages that the rich have:

Even those families who somehow manage to build up wealth are likely to lose it within two generations. Meanwhile the ultra-wealthy are able to continue being ultra-wealthy from one generation to the next regardless of how competent that generation is.


1% of 1 billion is 10 million. That is a lot of money to spend on consumption. And in many cases such consumption doesn't necessarily destroy the money spent entirely. Think of classic cars or art or real estate...

The only real way to destroy such an amount of money is something stupid like starting an airline...

The ultra-wealthy territory is one where actually losing becomes hard, if you show even some moderation in your actions.


Exactly. "The rich" as a narrative is everywhere. While in financial terms, it is not all that difficult to keep wealth growing. To the point that in a hundred years - not all that long - any "new wealth" should turn into a truly massive pile. But there aren't, or very few. On the contrary, the top of the Forbes list is mostly new wealth.


Link below has some interesting plots. One shows child pedestrian deaths per 100k population going steadily down since the 70s. Yet adult pedestrian deaths have recently ticked up.

https://www.iihs.org/topics/fatality-statistics/detail/pedes...


And yet… the plug and play nature of many ML methods and their frequently positive impact when applied so has probably played a large part in the growth of the field.


Re temp, I’m glad we use F for daily life in the USA. The most common application I have for temp is to understand the weather and I like the 0-100 range for F as that’s the typical range for weather near me.

For scientific work I obviously prefer kelvin.

Celsius is nearly useless.


For me the best feature of Celsius, the one that makes it much better for weather, is the zero at the freezing point of water. Everything changes in life when water starts to freeze: roads get slippery, pipes burst, crops die. So it is important that such a crucial threshold is represented numerically in the scale. In other words, going from 5 to -5 in Fahrenheit is just getting 10° colder, nothing special, while going from 2 to -2 in Celsius is a huge change in your daily life.


95% of the world uses Celsius without problems because they're used to it. You'd either also be fine with it or you belong to a sub-5th percentile which couldn't figure it out, take your pick.


> sub-5th percentile which couldn't figure it out

Ironic, given that one of the prime arguments in favor of metric is that it is easier.

Why do non-US people even care? And do y'all care that you are wrong? The US has recognized the SI. Citizens continue to use measurements they are comfortable with, and it does not hurt anyone. We are also not the only nation that has adopted SI but not made it mandatory. The UK is an obvious example.

Again, I'm back to 'why does anyone else even give a shit'? Aren't there more interesting things to ponder?


What does "adopted" mean in that context? (serious question)


"Celsius is nearly useless."

http://i.imgur.com/3ZidINK.png?1

For anyone not living in the US, Jamaica or Belize, it is Fahrenheit that is completely useless. That is something like 7.7 billion people.

0 = water freezing temperature is a hugely useful heuristic for anyone living in a moderate climate.


> For anyone not living in the US

So what I am hearing is that sure, it makes perfect sense for US citizens to continue using Fahrenheit.


US residents...

If you, as a US citizen, settle abroad, be prepared to run into a wall with Fahrenheits. People in the rest of the world don't have the intuitive grasp whether 50 degrees Fahrenheit is warm or cold.


> US residents

Yeah that's the right terminology. I knew it when I said citizens it wasn't quite right but I blanked on the right answer. 'Residents' is pretty obvious.

> be prepared to run into a wall with Fahrenheits

I agree it's worth knowing just enough about celsius to use it casually when you are traveling. e.g. I just remember 20 is room temperature and every 5C is about 10F. Close enough. And remembering '6' is enough to remember how km and miles are related.

Anyone who is settling abroad ought to be able to pick up intuitive celsius in a couple days. When everyone around you uses the same measuring unit, you adapt pretty quickly IME.
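
For the curious, a quick check of the "20 is room temperature and every 5C is about 10F" shortcut against the exact conversion. The function names are just illustrative.

```python
def c_to_f(c):
    """Exact Celsius to Fahrenheit conversion."""
    return c * 9 / 5 + 32

def c_to_f_rule_of_thumb(c):
    """Anchor at 20 C = 68 F, then count 10 F for every 5 C."""
    return 68 + (c - 20) * 10 / 5

for c in (-10, 0, 10, 20, 30, 40):
    print(c, c_to_f(c), c_to_f_rule_of_thumb(c))
```

The shortcut is exact at 20C and drifts by 1F for every 5C away from it, so it really is close enough for casual use across ordinary weather.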


Perhaps it's just because you're not used to it. 17-18c is perfect, 25 is a mild summer day. 30-35 full swing summer and 40 and up is oh no global warming. 5-7 is chilly, 0 is cold, -single digit is damn it's a cold winter and -double digits is when tf did I move to Canada.


I agree. For ambient temp, F is twice as accurate in the same number of digits. It also reflects human experience better; 100F is damn hot, and 0F is damn cold.

Celsius is for chemists.


There's very little difference between e.g. +25°C and +26°C, not sure why you would need even more accuracy in day to day life. There are decimals if you require that for some reason.

Celsius works significantly better in cold climates for reasons mentioned in another comment.


If that’s the case why do the Celsius thermostats I used while on vacation in Canada use 0.5C increments? The decimals are used, because the change between 25C and 26C is actually pretty big :)

In my old apartment, the difference between 73F and 74F was enough to make me quite cold or hot. And that’s a difference of about 0.5C. I’m not arguing that Fahrenheit is better, but I definitely do prefer it for setting my thermostat (which is a day-to-day thing), but then again I grew up using it, so that could be why I prefer it too.


> If that’s the case why do the Celsius thermostats I used while on vacation in Canada use 0.5C increments?

Probably because they were made for the US and had the labels changed? I've never seen a thermostat with 0.5 C increments in Europe.

> the change between 25C and 26C is actually pretty big

I would maybe be able to tell you if it's 23 or 27, certainly I can't tell 1 C difference.


> Celsius is for chemists

Or cooks. Or anyone who cooks, which is most people


The difference between -1 C and +1 C is VASTLY more important in daily life than the difference between 26.5 and 27 C.

Farmers, drivers, people with gardens need to know if it will get subzero at night.

Nobody cares if it's 26.5 C or 26 C.


> Celsius is nearly useless.

That's like ... your opinion man.

Personally I like knowing that water boils at exactly 100 degrees.


At sea level, yes :)

I do agree, though I live in Europe and C is the norm. I could never wrap my head around F.

That said, I think 0 is more important in daily life, below or above freezing. How much is that in F again?


As a dweller of a cold place in the USA, F is pretty handy because "freezing" isn't terribly cold. Having 0F be "actually quite seriously cold" is useful.


My parents care a lot about "przymrozek" - which is when it gets sub-zero C at night and you need to cover the plants, close the greenhouse doors and put a heater in there so the plants survive. They give warnings on the radio when this happens outside of the regular winter months.

There's also a special warning for drivers if it was sub-zero, because then the water on the roads freezes and it's very hard to brake.

I'd say it's way more important a distinction than anything that F makes obvious.


Also, conveniently, freezer temperature is 0F not 32F.


We just need a new scale just for weather where 100 is 100F and 0 is 32F/0C then everyone can be happy. We'd have a lot more days with subzero temperatures though
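
For what it's worth, that scale is just a linear map. A sketch, with made-up function names:

```python
def f_to_weather(f):
    """Proposed weather scale: 32 F maps to 0, 100 F maps to 100."""
    return (f - 32) * 100 / 68

def c_to_weather(c):
    """Same scale starting from Celsius (0 C maps to 0)."""
    return f_to_weather(c * 9 / 5 + 32)

for f in (0, 32, 68, 100):
    print(f, round(f_to_weather(f), 1))
```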


You just use one thing and you’ll learn it. When I was a kid my country changed from archaic 12 point “wind levels” to m/s. It took everybody a few weeks to adjust but it wasn’t hard. It was a bit harder for me after moving to America to adjust to Fahrenheit, but as you experience a temperature, and are told it is so many Fahrenheit, you’ll just learn it. I have no idea at what temperature water boils in F simply because I never experience that temperature (and my kettle doesn’t have a thermometer).

That said, I wish the USA would move over to the unit everyone else is using, but only for the reason that everyone else is using it. That is the only thing that makes it superior, and it would take Americans at worst a couple of months to adjust.


> only for the reason that everyone else is using it

That is an honest answer, which is refreshing. Beside that, there is not really any particular reason that the US has to make SI mandatory. We adopted SI nearly 50 years ago, we just did not make it mandatory. The US has a bit of national identity which leans towards rebelling, so making SI mandatory would probably be contentious anyway. And it's just not worth the argument, since it buys us very little of actual value.


Temperature is easy, probably the easiest unit to convert... Everyone would get used to it pretty soon after they started using it regularly. There would be some legacy systems out there which would be annoying to convert (which is already the case) but within a generation nobody would bother with Fahrenheit at all.

I think the hardest unit to convert is probably length, as there is not only a bunch of legacy systems and equipment out there, but Americans are very accustomed to fractional sub-units as opposed to the decimal cm, mm, etc. I’m not sure e.g. the building industry would ever stop saying e.g. four and five eighths. Personally I hate fractional lengths when using American tools. E.g. I’m used to an 11 mm wrench being smaller than a 13 mm wrench. I need to stop and think before I know which is smaller, a five eighths or a three quarters.


> american tools

That's an interesting way to phrase it. I, and everyone I know, have both metric and SAE tools. At least for wrenches & sockets.

> I need to stop and think before I know which is smaller a five eights or a three quarters.

I'm with you there. I've gotten in the habit of just mentally converting every SAE size to 32nds. I wouldn't really mind losing SAE, but that is not happening. What really makes my blood pressure go up is Ford ... they mix metric and SAE fasteners on their cars. WTF! Pick one! Subaru is at the other end, easy to work on because 10 & 12mm wrenches will work for maybe 9 out of 10 bolts or nuts.
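
The 32nds trick written out, as a small sketch (the helper name and the three example sizes are just for illustration):

```python
from fractions import Fraction

def in_32nds(size):
    """Express a fractional-inch size such as '5/8' as a count of 32nds."""
    n = Fraction(size) * 32
    if n.denominator != 1:
        raise ValueError(f"{size} is not a whole number of 32nds")
    return n.numerator

sizes = ["13/16", "5/8", "3/4"]
ordered = sorted(sizes, key=in_32nds)

for s in ordered:
    print(s, "=", in_32nds(s), "/32")
```

Once everything shares a denominator, the ordering is as obvious as it is in millimeters.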


I agree that for weather F is better, but I don't think it's so much better as to be worth having two different temp scales, and unlike K, C is at least reasonable for weather, and it works fine for most scientific disciplines.


I don't see enough love for feet and inches.

A foot can be divided cleanly into 2, 3, 4, and 6. Ten is a really sucky number to base your lengths on. It only divides nicely into 2 and 5.


People normally just use the subunit which doesn’t divide. E.g. height is usually referred to in cm. If accuracy is important they use millimeters. Roadsigns for cars use km but downtown wayfinding signs for pedestrians use meters.

I agree it is really nice to use base-12 until it breaks, but it breaks much worse than metric. If you have to divide into 32nds, everything about feet and inches is much worse (in metric we would just use millimeters). The worst offenders are wrenches, which don’t order intuitively. In metric, if your 13 mm wrench is too big, you just grab an 11 mm wrench. In inches, if your 13/16th inch wrench is too big, do you grab the 5/8th or the three-quarters next?


Stepping down to the next unit doesn't necessarily make anything tidier. If I need to cut a 3.5-foot piece of wood into thirds, then I cut it into 14-inch pieces. If I need to cut a 1-meter piece of wood into thirds, I cut it into 33.3-centimeter pieces.

Or, perhaps I want to hang two photos on a wall, spacing them evenly - the math from the example above applies again.

Regarding your example of dividing 12 into 32 parts - I think that's another good example of the elegance of imperial units. Dividing a foot into 32 parts is 3/8 of an inch! A nice, tidy unit that you'll find on any ruler or measuring tape.

>In inches if your 13/16th inch wrench is too big, do you grab the 5/8th? or three-quarters next?

Neither - I'd grab the 25/32" wrench ;) You make a good point.

I will say that fractional units become more and more intuitive as you use them more often. In a pinch you can just multiply both parts of the fraction by two.

Here's the thing: with wrenches in fractional units, you can do a binary search. Let's say you start with the 1/2 inch wrench. Too small? grab the 3/4. Too big? Try the 1/4. Work your way down.

...or, just remember that a huge share of bolts you'll come by are 7/16" and just start there.
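
The binary search idea, sketched with a hypothetical 13-piece set in 1/16" steps. The comment above starts from the 1/2"; this version simply starts from the middle of whatever set it is given.

```python
from fractions import Fraction

# Assumed wrench set: 1/4" to 1" in 1/16" steps.
WRENCHES = sorted(Fraction(n, 16) for n in range(4, 17))

def find_wrench(bolt, wrenches=WRENCHES):
    """Binary search over sorted wrench sizes; returns the sizes tried, in order."""
    lo, hi = 0, len(wrenches) - 1
    tried = []
    while lo <= hi:
        mid = (lo + hi) // 2
        tried.append(wrenches[mid])
        if wrenches[mid] == bolt:
            return tried          # last entry fits
        if wrenches[mid] < bolt:
            lo = mid + 1          # too small, go bigger
        else:
            hi = mid - 1          # too big, go smaller
    return tried                  # nothing in the set fits exactly

tried = find_wrench(Fraction(7, 16))
print([str(w) for w in tried])
```

With 13 wrenches this never takes more than 4 tries, though it only works if you can order the fractions in your head, which is the whole complaint.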


I actually agree. The base-12 fractional system is very nice to work with, until it breaks, and it breaks much worse than the metric system. I actually explained in another post that if the USA were to move to metric, I think the construction industry will still be using feet and inches for at least a couple of generations (at least partially), and they would have a good reason to.

The way the metric system breaks isn’t actually all that bad, at worst you grab a calculator and write down the number.

And also bear in mind that this case is where feet and inches really shine, so we are comparing feet+inches at their best to metric at its worst. There are so many cases where metric is anywhere from marginally better to significantly better, which does make up for that.

IMO the most significant reason for metric being superior is the universality of it. It is used everywhere in the world, including the USA, and that is an excellent quality of a measurement system which should not be understated.


At least conversion between degrees Celsius and kelvin is easy and lossless.


I find it quite strange that Fahrenheit stuck in the USA with its wide range of climates of all places.

I mean, that "0F to 100F is weather temperature range" completely falls apart unless you live in a very cold climate.


Sure, temperatures go outside those bounds, but only in the most extreme of weather conditions. Below zero? Above 100? You should probably stay inside today.


In relatively hot climates, above 100F is still a pretty reasonable temperature, not something I'd call "extreme".

0F though is crazy cold. Where I live (south-western Europe), getting below ~15 F is already considered extreme weather.

All that to say that the Fahrenheit system is really geared towards very cold climates. So it's kinda weird that it stuck in a country that also has pretty hot climates in the south.


It's all a matter of perception (and humidity).

Where I live, 100F is a hellish, blistering day. 0F is just an uncomfortable day in January, but certainly not abnormal. Just wear your big coat and gloves.

That said, 100F in Wisconsin is a very different animal than 100F in Las Vegas. Wisconsin gets brutally humid as it gets hotter, and that makes it even more oppressive. Meanwhile, Nevada gets drier, and so the heat is more bearable.

If anything, I think it's kinda cool that Fahrenheit lines up with perceived temperatures this way, even across different climates with different humidity. Sure, you can point to extremes (Phoenix, Juneau) but those are... well, extremes. For most of us, it's pretty good!


What the hell are you talking about. If it's 0°C outside (or below that), I know that it's high time to put winter tires on because the water in the puddles will freeze and driving on summer tires becomes risky. I had to look it up, but apparently that's +32 °F. Good luck remembering that.

+10°C is "it's somewhat cold, put a jacket on". +20°C is comfortable in light clothing. +30°C is pretty hot. +40°C is really hot, put as little clothing as society permits and stay out of direct sun.

Same with negatives, but in reverse.

Boiling water is +100°C, melting ice is very close to 0°C. I used that multiple times to adjust digital thermometers without having to look up anything.

It's the most comfortable system I can imagine. I tried living with Fahrenheit for a month just for fun, and it was absolutely not intuitive.


You'll want winter tires on well before the air temperature hits freezing for water. Forecasts aren't that predictable, and bridges (no earth heat sink underneath) will ice over before roads do.

40 F is a good time for getting winter tires on.

As someone who lives in a humid, wet area that goes from -40 at night in winter to 100+ F in summer, I also vastly prefer Fahrenheit.

The difference between 60, 70, 80 and 90 is pretty profound with humidity, and the same is true in winter. I don't think I've ever set a thermometer to freezing or boiling, ever. All of my kitchen appliances have numbers representing their power draw.


Well, it's been working fine for me for about 15 years, let's agree to disagree here. I would still find it easier to remember to change the tires at +1°C than whatever the hell it comes down to in Fahrenheit.

I too live in a region with 80 (Celsius) degree yearly variation (sometimes more; the maximum yearly difference I've lived through is about 90 degrees IIRC: -45 in January to +43 in July), and Fahrenheit makes absolutely no sense to me in this climate.


> Well, it's been working fine for me for about 15 years, let's agree to disagree here.

If you want to convince yourself, go out on the road in non-winter tires when it is sub-40F, find an open space where you can experiment, and then do a panic stop. Like you might have to do if someone jumps out in front of you.

That is what convinced me to not wait until it was freezing before I put on cold weather tires.


Winter tyres are less to do with freezing water and more to do with the way the tire compound in summer tires hardens/loses elasticity and therefore grip in lower temperatures, around 7 degrees Celsius.


If you had to "look it up" to remember that 32°F is freezing (or that 212°F is boiling), then you clearly didn't "live with Fahrenheit" long enough to have developed even the most basic intuitions for it. That's first-grade stuff.


It would be clearer whether this is impressive if we could see what fraction of the USA population is related to this one person.


That all makes sense. On the other hand, next gen workhorses must often arise from people opting to do something new like this, rather than make do with the current gen workhorse.

