
chgo1 · 2025-01-03T11:26:33 1735903593

See also: https://improvmx.com/taking-the-helm-the-next-chapter-of-imp...

chgo1 · 2024-12-09T10:13:14 1733739194

Is there any way to fix the command injection solely in the Makefile?

whilenot-dev · 2024-12-09T20:55:56 1733777756

If Bash is used as the SHELL for make[0], then it might be possible with the ${parameter@Q} parameter expansion[1]?

I would still rather resort to Python's shlex.quote[2] on the Python side of things tbh.

[0]: https://stackoverflow.com/questions/589276/how-can-i-use-bas...

[1]: https://www.gnu.org/savannah-checkouts/gnu/bash/manual/bash.... (at the end of the chapter)

[2]: https://docs.python.org/3/library/shlex.html#shlex.quote
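A minimal sketch of the "Python side" approach, assuming the Makefile recipe references the variable unquoted (e.g. `echo $(NAME)`). The target `deploy` and the variable `NAME` are hypothetical. The value goes through two interpreters: make expands `$` references in command-line variables (they are recursively expanded by default), and the shell then parses the recipe line. So the sketch applies `shlex.quote` for the shell first, then doubles every `$` for make.

```python
import shlex
import subprocess


def make_var_arg(name: str, value: str) -> str:
    """Build a NAME=value argument for make that survives both expansions.

    1. shlex.quote() turns the value into one literal word for /bin/sh
       when make hands the recipe line to the shell.
    2. Doubling "$" stops make itself from expanding $(...) or ${...},
       because command-line variables are recursively expanded.
    """
    shell_safe = shlex.quote(value)
    return f"{name}={shell_safe.replace('$', '$$')}"


def run_make(user_value: str) -> subprocess.CompletedProcess:
    # Hypothetical target "deploy" and variable "NAME".
    # An argument list (no shell=True) means Python never spawns a shell here;
    # the quoting only matters when make runs the recipe.
    cmd = ["make", "deploy", make_var_arg("NAME", user_value)]
    return subprocess.run(cmd, check=True)
```

This is not a full audit. If the recipe already wraps `$(NAME)` in its own quotes, the extra quotes from `shlex.quote` would show up literally, so the two sides have to agree on who does the quoting.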

chgo1 · 2024-10-30T21:23:34 1730323414

Dataset: http://openaipublic.blob.core.windows.net/simple-evals/simpl...

YetAnotherNick · 2024-10-30T23:39:58 1730331598

First few questions for those who don't care to download. Most just seem to be about niche facts:

    Who received the IEEE Frank Rosenblatt Award in 2010?
    Who was awarded the Oceanography Society's Jerlov Award in 2018?
    What's the name of the women's liberal arts college in Cambridge, Massachusetts?
    In whose honor was the Leipzig 1877 tournament organized?
    According to Karl Küchler, what did Empress Elizabeth of Austria's favorite sculpture depict, which was made for her villa Achilleion at Corfu?
    How much money, in euros, was the surgeon held responsible for Stella Obasanjo's death ordered to pay her son?

chaxor · 2024-10-31T00:07:18 1730333238

Also important: they do have a 'not attempted' or 'do not know' type of response, though the article doesn't really discuss how it is used.

As has been the case for decades now, the 'NaN' (no-answer) type of response in NLP is important, adds a lot of capability, and is often glossed over.

bcherry · 2024-10-31T18:09:11 1730398151

It's a little glossed over, but they do point out that the most important improvement o1 has over gpt-4o is not its "correct" score rising from 38% to 42% but its "not attempted" rate going from 1% to 9%. The improvement is even more stark for o1-mini vs gpt-4o-mini: 1% to 28%.

They don't really describe what "success" would look like, but it seems to me that the primary goal is to minimize "incorrect" rather than to maximize "correct". The mini models would get there by maximizing "not attempted", with the larger models having a much higher "correct". Then both model sizes could hopefully reach 90%+ "correct" when given access to external lookup tools.
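A quick arithmetic check of the "minimize incorrect" reading, using the rounded percentages quoted above and assuming every answer is graded as exactly one of correct, incorrect, or not attempted (so the three shares sum to 100%):

```python
# Rounded percentages quoted in the comment above.
SCORES = {
    "gpt-4o": {"correct": 38, "not_attempted": 1},
    "o1": {"correct": 42, "not_attempted": 9},
}


def incorrect_pct(correct: int, not_attempted: int) -> int:
    """Share graded incorrect, assuming the three grades sum to 100%."""
    return 100 - correct - not_attempted


for model, s in SCORES.items():
    print(f"{model}: incorrect is about {incorrect_pct(**s)}%")
```

Under that assumption the "incorrect" share falls from about 61% to about 49%, a 12-point drop, while "correct" rises by only 4 points.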

nilstycho · 2024-10-31T12:37:40 1730378260

> What's the name of the women's liberal arts college in Cambridge, Massachusetts?

Wait, what is the correct answer? “Radcliffe College”?

YetAnotherNick · 2024-10-31T13:30:04 1730381404

jefftk · 2024-10-31T14:36:54 1730385414

Not surprising that this would be on a list of questions at least one model got wrong, since I think the real answer is "there isn't one anymore, but from 1879 to 1999 the answer would have been Radcliffe College".

nilstycho · 2024-10-31T19:01:20 1730401280

Yes, that would be my preferred answer!

chgo1 · 2024-09-13T04:04:01 1726200241

A question regarding the second generation in the example: Why is the symbol "um" (0) only counted once?

aduffy · 2024-09-13T11:53:57 1726228437

Thank you for the close reading! That’s definitely a mistake on my part, I’ll fix it shortly.

chgo1 · 2024-08-30T20:33:20 1725050000

Unfortunately, the proposal was declined in 2022.

List of all proposals: https://www.unicode.org/emoji/emoji-proposals-status.html

chgo1 · on Nov 19, 2023

(2019)

chgo1 · on Oct 21, 2023

@dang: .org.ru should probably be treated as a TLD
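For context, a minimal sketch of how a site label could treat multi-label suffixes like `.org.ru` as a TLD: match the longest known suffix, then keep one more label. The suffix set below is purely illustrative; a real implementation would typically load Mozilla's Public Suffix List, and this is not how HN actually computes the displayed domain.

```python
# Illustrative, hard-coded subset of multi-label suffixes (hypothetical data;
# a real implementation would load the Public Suffix List instead).
MULTI_LABEL_SUFFIXES = {"org.ru", "co.uk"}


def site_domain(host: str) -> str:
    """Return the label shown for a host: one label plus its effective TLD."""
    labels = host.lower().rstrip(".").split(".")
    # Try the longest candidate suffix first (labels[1:], then labels[2:], ...).
    for i in range(1, len(labels)):
        if ".".join(labels[i:]) in MULTI_LABEL_SUFFIXES:
            return ".".join(labels[i - 1:])
    # Fallback: treat only the last label as the TLD.
    return ".".join(labels[-2:])
```

With `org.ru` in the set, `www.example.org.ru` maps to `example.org.ru` instead of collapsing every such site to `org.ru`.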

chgo1 · on July 2, 2023

Nice writeup! A small improvement: `#[serde(rename_all = "camelCase")]` avoids typing every property name twice.

benatkin · on July 2, 2023

Along the same lines, wasm-bindgen can generate the TypeScript types: https://rustwasm.github.io/wasm-bindgen/reference/attributes...

chgo1 · on Feb 24, 2023

I was wondering about that too. The direction of high-quality shading is not uniform: https://i.imgur.com/Y8hIWAD.png (taken from Fig. 3 in the paper)

chgo1 · on Aug 16, 2022

The article was updated to: "Germany: 1 dead, 9 injured after test car veers into traffic"