
As a first-pass definition it does well to explain the concept. Even if you're interpolating, you will need to rank the samples and find the two nearest neighbours to interpolate between.

It serves to distance it from the moment-based statistics like mean and variance at least.


For what it's worth, you've convinced me that my beloved box plots need to be explained if I want to use them again.

The SVG you've provided clearly shows that the box plot splits the data into four. The interquartile range (IQR) is clearly marked, and it even has a comparison showing what the standard deviation measure would be.

Secondly, if the data truly came from a normal distribution, there are no outliers. Outliers are data points which cannot be explained by the model and need to be removed. Unless you have a good reason to exclude data points, they should be included. This is why I like the IQR and the median: they are not swayed by a few wide-valued data points. I think the 1.5*IQR rejection filter is lazy and unjustified. Happy to discuss this point further, as it is a bugbear of mine.


When I said "splitting", I meant it like my parent explained: basically sorting your dataset and then splitting it into quarters.

What you want to explain to me (IMHO to the wrong person) is the correct approach of calculating a mean and standard deviation and drawing the box from that. Let's stay with that (and that's what I said earlier in the thread).

After I wrote the post you replied to, I realized that the pure "splitting" method for box plots is nonsensical, since the outer brackets' interval is determined by the two most extreme values. They are too random to be meaningful. It does not make sense to draw a box plot from that.


The quartiles are defined by doing the sorting and splitting algorithm. So if you want quartiles (or any other quantile generally) you need to calculate it that way. The mean and standard deviation (sigma) are fundamentally different, which is why the image you linked shows them to contrast against the quantiles.

If you want to represent the standard deviation with your box plot, you can calculate it using standard formulas; many maths libraries have them built in. I don't know how to plot it using any graphing package, though. ggplot, plotly and matlab all use the quantiles (those are the ones I have experience with). Perhaps wherever you learned to read them as mean and standard deviation has a reference you could use?
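To make the contrast concrete, here's a rough numpy sketch (assuming a flat array of samples); the quartiles are order statistics, while the mean and standard deviation are moments, so a single wide value moves the latter much more than the former:

  import numpy as np

  data = np.array([1.0, 2.0, 2.5, 3.0, 4.0, 7.0, 50.0])

  # Order statistics: sort-and-split, barely moved by the 50.0
  q1, median, q3 = np.percentile(data, [25, 50, 75])
  iqr = q3 - q1

  # Moment-based statistics: every value contributes, so the 50.0 drags them around
  mean = data.mean()
  std = data.std(ddof=1)   # sample standard deviation

  print(q1, median, q3, iqr)
  print(mean, std)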

> They are too random to be meaningful. It does not make sense to draw a box plot from that.

This can be a problem. In practice, the distributions I see don't go too crazy and are bounded (production rates can't be negative and can't be infinite). I prefer to use the 10th and 90th percentiles, which are well defined and better behaved for most distributions. I do make sure it's very clearly marked on each plot, though, as it's not standard. Using the 1.5 x IQR cutoff is no better though, as when you have enough samples you find that the whiskers just travel out to the cutoff.
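For what it's worth, a minimal sketch of the idea (assuming a reasonably recent matplotlib, where `whis` can take a pair of percentiles instead of the 1.5 x IQR multiplier):

  import numpy as np
  import matplotlib.pyplot as plt

  rng = np.random.default_rng(0)
  data = rng.lognormal(size=500)   # bounded below by zero, long right tail

  fig, ax = plt.subplots()
  # whis=(10, 90) puts the whiskers at the 10th and 90th percentiles
  # rather than the default 1.5 * IQR rule; label this clearly on the real plot
  ax.boxplot(data, whis=(10, 90), showfliers=False)
  ax.set_title("Box at quartiles, whiskers at 10th/90th percentiles")
  plt.show()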


As I understand it, it's more that the mathematical functions aren't specified by the standard.

There is no sqrt instruction. So you start with a Taylor approximation, then maybe do a few rounds of Newton's method to refine the answer. But what degree is the Taylor series? Where did you centre it (can't be around 0)? How many Newton iterations did you do? IEEE 754 might dictate what multiplication looks like, but for sqrt, sin, cos, tan you need to come up with a method of calculation, and that's in the standard library which usually comes from the compiler. You could make your own implementations but...

Floats are not associative: (a+b)+c != a+(b+c). So even something like reordering fundamental operations can cause divergence.
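A quick way to see it in plain Python (the same IEEE 754 doubles):

  a, b, c = 0.1, 0.2, 0.3

  left = (a + b) + c    # 0.6000000000000001
  right = a + (b + c)   # 0.6
  print(left == right)  # False: the rounding happens at different points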


> There is no sqrt instruction. So you start with a Taylor approximation, then maybe do a few rounds of Newton's method to refine the answer. But what degree is the Taylor series? Where did you centre it (can't be around 0)? How many Newton iterations did you do? IEEE 754 might dictate what multiplication looks like, but for sqrt, sin, cos, tan you need to come up with a method of calculation, and that's in the standard library which usually comes from the compiler. You could make your own implementations but...

This is just plain wrong. IEEE 754 defines many common mathematical functions, sqrt and trig included, and recommends that they return values correct to the last ulp. Most common CPUs have hardware instructions for those, although Intel's implementation is notoriously bad.

Furthermore, there are some high-quality FP libraries out there, like SLEEF and rlibm, and the CORE-MATH project, which aims to improve the standard libraries in use.


> recommends

That word is really important because it means you can’t rely on it for real determinism.


Your general point stands, but with corrections:

- sqrt actually can be a single instruction, on x86_64 for example. Look at how musl implements it; it's just a single-line inline asm statement, `__asm__ ("sqrtsd %1, %0" : "=x"(x) : "x"(x));`: https://github.com/ifduyue/musl/blob/master/src/math/x86_64/... Of course, not every x86_64 impl must use that instruction, and not every architecture must match Intel's implementation. I've never looked into it, but I wouldn't be surprised if even Intel and AMD have some differences.

- operator precedence is well-defined, and IEEE 754-compliant compilation modes respect the non-associativity. In most popular compilers, you need to pass specific flags to allow the compiler to change the associativity of operations (-ffast-math implies -funsafe-math-optimizations, which implies -fassociative-math, which actually breaks strict adherence to the program text). A somewhat similar issue does arise with the floating point environment though, as statements may be reordered, the order of evaluation of function arguments may differ, etc.

The fact that compilers respect the non-associativity of program text is a huge reason why compilers are very limited in how much auto-vectorization they will do for floating point. The classic example is a sum of squares, where the compiler bars itself from even loop unrolling, never mind SIMD with FMADDs. To do any of that, you have to fiddle with compiler options that are often problematic when enabled globally, or __attribute__((optimize("..."))) or __pragma(optimize("...")), or probably best of all, explicitly vectorize using intrinsics.


Sure, punish the recklessness to remove the moral hazard. You can both make sure people don't starve and align the incentives. If you don't then next time it will be worse.


> l_add needs the arguments passed one at a time. We never see this syntax with multiple parenthesis in Python. It works and it has many benefits built in, but it would be a paradigm shift to expect pythonistas to start writing their programs this way, it’s just not Pythonic.

He makes a big point about currying in his particular manner, but what's wrong with

    f = lambda x, y: x + y
    g = lambda y: f(3, y)
That's also currying right? I've always been a little confused as to why the style he advocates is necessarily better. I'm not a functional guy, so maybe I'm missing something: what is the benefit of syntactic sugar for partial application to the first (or last) argument of a function? What if I want partial application of a function with the middle argument specified? Now you're out of luck and have to do the above anyway.

ETA: on the main point

> In Python, we do not use lambda with the intention of using lambda calculus. Our syntax needs to encode our intention in the places we have used it, and where we will use it. These use cases are inline, anonymous functions.

That seems needlessly nitpicking. Wikipedia has "lambda function" as a synonym for anonymous function so it seems pretty widespread in our language at this point. https://en.wikipedia.org/wiki/Anonymous_function


Not quite, that is partial application. Actual currying would allow you to do partial application this way, making it less explicit:

  > g = f(3) # no second parameter provided
  g = <unary function> # however it would be indicated in the python shell
To get it in Python you'd need to do this:

  f = lambda x: lambda y: x + y
  g = f(3) # result is a function, or funcallable, or whatever Python calls the resultant type
This makes the currying explicit. In general, currying is taking an n-ary function and reducing it to a series of n unary functions. These three are equivalent, in Haskell(ish, rusty):

  f x y = x + y
  g x = \y -> x + y
  h = \x -> \y -> x + y
In Haskell, you don't have to use the explicit form of `h` (or the explicit form in the Python example) if you don't want to (and it would probably be weird if you used it extensively).


> Actual currying would allow you to do partial application this way, making it less explicit

> To get it in Python you'd need to do this:

> f = lambda x: lambda y: x + y

> This makes the currying explicit. In general, currying is taking an n-ary function and reducing it to a series of n unary functions.

Something's gone wrong; your example of explicit currying conflicts with your definition of currying. `f` is a unary function.


The original `f` was a binary function:

  f = lambda x, y: x + y
My updated curried `f` is a unary function, but it returns a unary function itself:

  f = lambda x: lambda y: x + y
That's what currying is. I just reused the name, if you prefer:

  curried_f = lambda x: lambda y: x + y
EDIT:

And note, it makes the partial application less explicit/verbose than the original, but the currying is very explicit because of Python's syntax.

  g = curried_f(3)
Is "just" a function call, while:

  g = lambda y: f(3, y)
does the same thing (functionally, at least) but the partial application is made explicit.
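Side note, not part of the original point: Python's standard functools.partial gives that same explicit partial application without writing the lambda by hand:

  from functools import partial

  f = lambda x, y: x + y
  g = partial(f, 3)   # behaves like lambda y: f(3, y)
  print(g(4))         # 7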


> That's also currying right?

No, currying is the transformation that takes a function f such that f(x0, x1, ... xn) = r and returns a function f' such that f'(x0)(x1)... (xn) = r.

Haskell doesn’t really have currying so much as it has only one-argument functions, plus a syntax for defining them, so that one way to define a function that works like the result of currying a multi-argument function looks a lot like defining a multi-argument function in other languages.


From another (probably less helpful but neat!) perspective, currying is turning x^(y*z) into (x^y)^z


I'm no FP expert, but I would call that partial application and not currying.


Currying is partial application for free on all function parameters in the order they were declared. It makes partial application no-hassle if you are supplying successive arguments in each application.
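To make that concrete, here's a rough sketch (illustrative only, not from any library) of currying a fixed-arity Python function so that each application supplies the next argument in declaration order:

  from functools import partial

  def curry(f, n):
      # Turn an n-ary function into a chain of n unary functions
      if n == 1:
          return f
      return lambda x: curry(partial(f, x), n - 1)

  add3 = lambda x, y, z: x + y + z
  g = curry(add3, 3)
  print(g(1)(2)(3))   # 6
  h = g(1)(2)         # partial application "for free", in declaration order
  print(h(3))         # 6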


I know this is a very late reply, but I'm curious. How often does the argument you want to specify actually sit in the right position of the argument list for you to make use of this feature? The times that I need to partially apply are slim, and I don't believe they're consistently on the same side all the time.

The lambda keyword is long, which makes it a pain, but currying really is just equivalent to specialised partial application, right? People seem to love currying so much that I feel like I don't understand something here.

For context, I work a lot with Java, R, and Python, which all have anonymous functions and higher-order functions, but are all decidedly _not_ "functional languages".


It's tricky. I find it a little sad that regional accents are disappearing but it's a process that's been happening for a long time.

The trouble is that there are real benefits to having certain accents, and you can't fault someone for wanting to be seen in a different way.


> Majority of the streams are gone.

In Australia that's typical for many reasons. Are you sure they're not just seasonal flows?


Nah, there's a whole ton of hydro that's been entirely consistent until now that is dropping off and messing things up.

Shit that the utilities have never even had to consider might fail is approaching drop-dead lines.

Depending on rain August could be fun.


Even here on wet Vancouver Island, British Columbia, we have heaps of streams that are mostly gone by July. They are rain-fed; the rain goes away. It's just how some systems work.

I’m not saying everything is fine with the environment. Just, this is to be expected with a large number of streams in the world. Maybe it’s not alarming in each case.


I’m pretty sure the Mississippi River used to seasonally dry up in spots prior to the building of the lock and dam system.



Wow really? That is hard to believe given half the US drains through it. I can't find anything on Google on the subject, though maybe I am using the wrong search terms.


Or maybe you have no idea what you are talking about and are spreading misinformation?


Are you sure you don’t mean snow-fed? Lots of roaring rivers while it's melting, and then once the melt is done many shrink or nearly dry up.

Though we have a ton of snowpack this year on the coast still.


There are both, but we have many lower mountains that rarely get snow, let alone snowpack, yet they have seasonal streams that can be quite large at the bottom of the watershed.

A decent example would be the French Creek watershed by Nanaimo. I think it’s estimated that only 15% of the flow is from snow, with most of that portion flowing during spring. The rest is rain, mostly from higher up in the watershed.

This watershed doesn’t dry up entirely, but it naturally reduces dramatically by July, with many of its tributaries vanishing completely. There are many like it without that 15% melt water, some of which mostly vanish under the bedrock and gravel deposits along the creek beds.

Unfortunately that’s also increasingly true, and it’s causing all kinds of species to die in watersheds that previously even supported multiple seasons of salmon runs. The last paper I read on French Creek suggested even swamps in the watershed were drying too much, killing insects and amphibians. Many streams have lost entire salmon runs due to drying too much, too often. It’s a fragile system. It seems like deforestation plays a major role in these watersheds drying out.


Yikes, the 260-character path limit can be a real pain. Please support sensible path names, or at least tell me why you can't read or write to a file. Often you just get a "can't write to that file" error message, or worse, "forbidden", so you spend an hour trying to debug the mess that is folder permissions on Windows.

I'm not going to convince you to change your tool chain, userbinator, but for the sake of the discussion: once you have multiple projects going on, with multiple components, and then those components have a small directory structure themselves, you can easily reach 260 characters. Add to that, if there's data coming from another org, a long file name can be very helpful to keep track of what it is (and don't forget 10-12 characters for a date!). And finally, the nail in the coffin: most users don't think about path names. I struggle to get people to not put periods in their filenames, which messes with some tooling; how am I going to convince the guy in finance who gave me this data that he should use short file names? Should I modify the file name and make it untraceable?

ETA: The "if you have to ask you've messed up", I don't ask, I expect and then get annoyed it broke. I had 10,000 files collected into a folder. Why can every other program tell me the list of files in an instant but windows explorer crashes (the whole desktop environment) because I opened that folder to see it. I'm not meant to do that? Then why can the kernel, the disk, the file system, and all other programs handle it with ease?


I think the example problem has many issues with it which confuse the issue. There is a pervading assumption that more slices is better. What if Alice only wants 5 slices? Then her fall back is quite close to what she needs but also she doesn't mind how the pie is divided so long as she gets one slice.

It's not clear what everyone's utility function is. If a sales person is nearing a quarterly target, their utility function might have a giant kink in it, or if they're at the start of a period it might be very linear. The marginal pie, more often than not, has multiple dimensions which different agents care different amounts for.

Thankfully I've never been in a negotiation without a decent BATNA and so just chat about it until either we decide it's not going to work out, or we are both happy. People who negotiate too hard by playing stupid games and trying to max out themselves without any quarter given are tiring and push me to the first option. Why would I want to work with you if you're a pain to deal with before we even start?


I've noticed a lot of turns at traffic lights are getting explicit red arrows. A solid green light without an arrow means that you need to give way to pedestrians and other vehicles when turning. It feels to me like no one is looking any more; they just think green means go. I think it's because we've trained them out of thinking with too many constraints.


I see them in my area, too, but only at intersections with any or all of extremely heavy traffic, poor sightlines, or multiple left turn lanes, that would make a left turn on plain green dicey at best.

Sometimes there will be red, yellow, flashing yellow, and green arrows, with flashing yellow being the yield option, used at times of day when it's appropriate.


My guess is that it's more a matter of cost. With LEDs, it's cheap to have a red arrow and a red circle. With incandescent lights, you cannot just illuminate part of the red bulb to get an arrow.


Except that none of the traffic lights that I’ve seen have ever used the same light for an arrow and circle — even with LEDs, it’s always two separate lights


Really? That seems costly for no reason. I thought that when I've seen it, it's been the same light.

