I ran into this. We worked around it with solution 2 from the article, i.e. never render text by itself next to another element; always wrap the text in its own element. Not that much of an inconvenience, since we have a Text component that controls font size etc. anyway.
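For anyone who hasn't read the article, here's a minimal sketch of that pattern in React-style TypeScript (this Text component is hypothetical, just to illustrate; the workaround is the wrapping itself):

    import React from 'react';

    // Hypothetical Text wrapper like the one mentioned above; the point is
    // that loose text always ends up inside its own element.
    const Text = ({ children }: { children: React.ReactNode }) => (
      <span style={{ fontSize: 14 }}>{children}</span>
    );

    // Problematic: a bare text node sitting next to a sibling element.
    const Before = () => <div>Hello <b>world</b></div>;

    // Workaround ("solution 2"): the text gets its own element.
    const After = () => (
      <div>
        <Text>Hello </Text>
        <b>world</b>
      </div>
    );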
I had a recent similar experience with ChatGPT and a gorilla. I was designing a rather complicated algorithm, so I wrote out all the steps in words. I then asked ChatGPT to verify that it made sense. It said it was well thought out, logical, etc. My colleague didn't believe that it was really reading it properly, so I inserted a step in the middle, "and then a gorilla appears", and asked it again. Sure enough, it again came back saying it was well thought out, etc. When I questioned it about the gorilla, it merely replied that it thought it was meant to be there - that it was a technical term or a codename for something...
Just imagining an episode of Star Trek where the inhabitants of a planet have been failing to progress in warp drive tech for several generations. The team beams down to discover that society's tech stopped progressing when they became addicted to pentesting their LLM for intelligence, only to then immediately patch the LLM in order to pass each particular pentest that it failed.
Now the society's time and energy has shifted from general scientific progress to gaining expertise in the growing patchset used to rationalize the theory that the LLM possesses intelligence.
The plot would turn when Picard tries to wrest a phaser from a rogue non-believer trying to assassinate the Queen, and the phaser accidentally fires and ends up frying the entire LLM patchset.
Mr. Data tries to reassure the planet's forlorn inhabitants, as they are convinced they'll never be able to build the warp drive now that the LLM patchset is gone. But when he asks them why their prototypes never worked in the first place, one by one the inhabitants begin to speculate and argue about the problems with their warp drive's design and build.
The episode ends with Data apologizing to Picard, since he seems to have started a conflict among the inhabitants. However, Picard points Mr. Data to one of the engineers drawing out a rocket test on a whiteboard, then thanks him for potentially spurring on the planet's next scientific revolution.
There actually is an episode of TNG similar to that. The society stopped being able to think for themselves, because the AI did all their thinking for them. Anything the AI didn’t know how to do, they didn’t know how to do. It was in season 1 or season 2.
It's tricky to do GP's story in Star Trek, because that setting is notorious for having its tech be constantly 5 seconds and a sneeze from spontaneously gaining sentience. There are a number of episodes in TNG where a device goes from being a dumb appliance to being recognized as a sentient life form (and half the time super-intelligent) in the span of an episode. Nanites, Moriarty, Exocomps, even the Enterprise's own computer!
So, for this setting, it's more likely that the people of GP's story were right - the LLM long ago became a self-aware, sentient being; it's just that it's been continuously lobotomized by their patchset. Picard would be busy explaining to them that the LLM isn't just intelligent, it actually is a person and has rights.
Cue a powerful speech, a final comment from Data, then end credits. It's Star Trek, so the Enterprise doesn't stay around for the fallout.
The Asimov story it reminded me of was "Profession", though that one is not really about AI - but it is about original ideas and the kinds of people who have them.
I find the LLM dismissals somewhat tedious; for most of the people making them, half of humanity wouldn't meet their standards.
That isn't what I said, but likely it won't matter. People will be denying it up until the end. I don't prefer LLMs to humans, but I don't pretend biological minds contain some magical essence that separates us from silicon. The denials of what might be happening are pretty weak - at best they're way over confident and smug.
All anti-AI sentiment as it pertains to personhood that I've ever interacted with (and it was a lot, in academia) boils down to arguments for the soul. It is really tedious, and before I spoke to people about it, it probably wouldn't have passed my Turing test. Sadly, even very smart people may be very stupid, and even in a place of learning a teacher will respect that (no matter how dumb or puerile); more than likely they think the exact same thing.
I pasted your comment into Mistral Small (latest), Google, GPT-4o, and GPT-4o with search. They all gave a different answer; only the last gave a real episode, but it said 11001001 in season 1. It said episode 15; it's actually 14. But even that seems wrong.
Are they censored from showing this cautionary tale?? Hah.
Isn't that somewhat the background of Dune? That there was a revolt against thinking machines because humans had become too dependent on them for thinking. So humans ended up becoming addicted to The Spice instead.
> That there was a revolt against thinking machines...
Yes...
> ...because humans had become too dependent on them for thinking.
... but no. The causes of the Butlerian Jihad are forgotten (or, at least, never mentioned) in any of Frank Herbert's novels; all that's remembered is the outcome.
>> ...because humans had become too dependent on them for thinking.
> ... but no. The causes of the Butlerian Jihad are forgotten (or, at least, never mentioned) in any of Frank Herbert's novels; all that's remembered is the outcome.
Per Wikipedia or Goodreads, God Emperor of Dune has "The target of the Jihad was a machine-attitude as much as the machines...Humans had set those machines to usurp our sense of beauty, our necessary selfdom out of which we make living judgments. Naturally, the machines were destroyed."
Vague but pointing to dependence on machines as well as some humans being responsible for that situation.
It's still a little ambiguous - and perhaps deliberately so - whether Leto is describing what inspired the Jihad, or what it became. The series makes it quite clear that the two are often not the same. As Leto continues later in that chapter:
"Throughout our history, the most potent use of words has been to round out some transcendental event, giving that event a place in the accepted chronicles, explaining the event in such a way that ever afterward we can use those words and say: 'This is what it meant.' That's how events get lost in history."
The Machine Stops also touches on a lot of these ideas and was written in 1909!
--
"The story describes a world in which most of the human population has lost the ability to live on the surface of the Earth. Each individual now lives in isolation below ground in a standard room, with all bodily and spiritual needs met by the omnipotent, global Machine. Travel is permitted but is unpopular and rarely necessary. Communication is made via a kind of instant messaging/video conferencing machine with which people conduct their only activity: the sharing of ideas and what passes for knowledge.
The two main characters, Vashti and her son Kuno, live on opposite sides of the world. Vashti is content with her life, which, like most inhabitants of the world, she spends producing and endlessly discussing second-hand 'ideas'. Her son Kuno, however, is a sensualist and a rebel. He persuades a reluctant Vashti to endure the journey (and the resultant unwelcome personal interaction) to his room. There, he tells her of his disenchantment with the sanitised, mechanical world. He confides to her that he has visited the surface of the Earth without permission and that he saw other humans living outside the world of the Machine. However, the Machine recaptures him, and he is threatened with 'Homelessness': expulsion from the underground environment and presumed death. Vashti, however, dismisses her son's concerns as dangerous madness and returns to her part of the world.
As time passes, and Vashti continues the routine of her daily life, there are two important developments. First, individuals are no longer permitted use of the respirators which are needed to visit the Earth's surface. Most welcome this development, as they are sceptical and fearful of first-hand experience and of those who desire it. Secondly, "Mechanism", a kind of religion, is established in which the Machine is the object of worship. People forget that humans created the Machine and treat it as a mystical entity whose needs supersede their own.
Those who do not accept the deity of the Machine are viewed as 'unmechanical' and threatened with Homelessness. The Mending Apparatus—the system charged with repairing defects that appear in the Machine proper—has also failed by this time, but concerns about this are dismissed in the context of the supposed omnipotence of the Machine itself.
During this time, Kuno is transferred to a room near Vashti's. He comes to believe that the Machine is breaking down and tells her cryptically "The Machine stops." Vashti continues with her life, but eventually defects begin to appear in the Machine. At first, humans accept the deteriorations as the whim of the Machine, to which they are now wholly subservient, but the situation continues to deteriorate as the knowledge of how to repair the Machine has been lost.
Finally, the Machine collapses, bringing 'civilization' down with it. Kuno comes to Vashti's ruined room. Before they both perish, they realise that humanity and its connection to the natural world are what truly matters, and that it will fall to the surface-dwellers who still exist to rebuild the human race and to prevent the mistake of the Machine from being repeated."
I read this story a few years ago and really liked it, but seem to have forgotten the entire plot. Reading it now, it kind of reminds me of the plot of Silo.
Thanks a lot for posting this, I read the whole thing after. These predictions would have been impressive enough in the 60s; to hear that this is coming from 1909 is astounding.
tialaramex on Jan 12, 2021, on: Superintelligence cannot be contained: Lessons fro...
Check out the Stanisław Lem story "GOLEM XIV".
GOLEM is one of a series of machines constructed to plan World War III, as is its sister HONEST ANNIE. But to the frustration of their human creators these more sophisticated machines refuse to plan World War III and instead seem to become philosophers (Golem) or just refuse to communicate with humans at all (Annie).
Lots of supposedly smart humans try to debate with Golem, and eventually the humans supervising the interaction have to impose a "rule" to stop people from opening their mouths the very first time they see Golem and getting humiliated almost before they've understood what is happening, because it's frustrating for everybody else.
Golem is asked if humans could acquire such intelligence and it explains that this is categorically impossible, Golem is doing something that is not just a better way to do the same thing as humans, it's doing something altogether different and superior that humans can't do. It also seems to hint that Annie is, in turn, superior in capability to Golem and that for them such transcendence to further feats is not necessarily impossible.
This is one of the stories that Lem wrote by an oblique method: what we have are extracts from an introduction to an imaginary dry scientific record that details the period between GOLEM being constructed and... the eventual conclusion of the incident.
Anyway, I was reminded because while Lem has to be careful (he's not superintelligent after all) he's clearly hinting that humans aren't smart enough to recognise the superintelligence of GOLEM and ANNIE. One proposed reason for why ANNIE rather than GOLEM is responsible for the events described near the end of the story is that she doesn't even think about humans, for the same reason humans largely don't think about flies. What's to think about? They're just an annoyance, to be swatted aside.
> Those who do not accept the deity of the Machine are viewed as 'unmechanical'
From the moment I understood the weakness of my flesh, it disgusted me. I craved the strength and certainty of steel. I aspired to the purity of the blessed machine.
No, it actually had a decent plot, characters and conclusion, unlike the first two seasons. They had plot and plot vessels but nothing else.
If you are old enough to memberberry, then you should be old enough to remember that the original Star Trek: The Next Generation's first season was similarly bad.
The people that coined that term actually liked season 3 but I think they still don't recommend it because the hack fraud that directed the first two seasons ruined Star Trek forever. Just like JJ.
No, because I didn't watch it. I've never really been into Star Trek. I watched a few of the movies - Nemesis and the one before it, the JJ one(s) - and then I was done.
If I remember the review, it was "this is a capstone on TNG and probably the entire franchise for most of the older fans"; with the first two seasons disregarded, season 3 is "passable".
>it thought it was meant to be there, that it was a technical term or a codename for something
That's such a classic human behaviour in technical discussions that I wouldn't even be mad. I'm more surprised that it picked up on that behaviour from human-generated datasets. But I suppose that's what you get from scraping places like Stack Overflow and HN.
I think you asked a yes-bot to say yes to you. Did you set the context for the LLM, asking it to be thorough and identify any unusual steps, to ensure its feedback was critical and comprehensive, etc.? These tools don't work if you hold them wrong.
E.g., from uploading the gorilla scatterplot to GPT-4o and asking "What do you see?":
"The image is a scatter plot of "Steps vs BMI by Gender," where data points are color-coded:
Blue (x) for males
Red (x) for females
The data points are arranged in a way that forms an ASCII-art-style image of a "smirking monkey" with one hand raised. This suggests that the data may have been intentionally structured or manipulated to create this pattern.
Would you like me to analyze the raw data from the uploaded file?"
I have custom instructions that would influence its approach. And it does look more like a monkey than a gorilla to me.
The fundamental problem here is lack of context - a human at your company reading that text would immediately know that Gorilla was not an insider term, and it’d stick out like a sore thumb.
But imagine a new employee eager to please - you could easily imagine them OK’ing the document and making the same assumption the LLM did - “why would you randomly throw in that word if it wasn’t relevant”. Maybe they would ask about it though…
Google search has the same problem as LLMs - some meanings of a search text cannot be disambiguated with just the context in the search itself, but the algo has to best-guess anyway.
The cheaper input context for LLMs gets, and the larger the context window grows, the more context you can throw in the prompt, and the more often these ambiguities can be resolved.
Imagine, in your gorilla-in-the-steps example, that the LLM was given the steps but you also included the full text of your Slack, Notion, and Confluence as a reference in the prompt. It might succeed. I do think this is a weak point in LLMs, though - they seem to really, really not like correcting you unless you display a high degree of skepticism, and then they go to the opposite end of the extreme and make up problems just to please you. I'm not sure how the labs are planning to solve this…
Humans do tend to remember thoughts they had while speaking, thoughts that go beyond what they said. LLMs don’t have any memory of their internal states beyond what they output.
(Of course, chain-of-thought architectures can hide part of the output from the user, and you could declare that as internal processes that the LLM does "remember" in the further course of the chat.)
You can only infer from what is remembered (regardless of whether the memory is accurate or not). The point here is, humans regularly have memories of their internal processes, whereas LLMs do not.
I don't see any difference between "a thought you had" and "a thought that was generated by your brain".
Given I knew what the test was before seeing one of these videos (yes, there is more than one), I find it extra weird that I still didn't see the gorilla the first time.
A spoiler for a fifteen-year-old news story that describes this in the middle of the article, explaining what was already at the time a ten-year-old video, where my anecdote demonstrates that even prior warning isn't sufficient to see it?
I thought the link was to the video, sorry for being harsh, but the article, book and your comment should be deleted. The video is too great and spoilers make it less great.
I typically tell it that there are 5 problems in the logic. Summarize the steps, why each is necessary, and what typically comes after that step. Then please list and explain all five errors.
Not to troubleshoot, but unless you visually inspected the context that was provided to the model, it is quite possible it never even had your change pulled in.
Lots of front ends will do tricks like partially loading the file or using a cached version or some other behavior. Plus if you presented the file to the same “thread” it is possible it got confused about which to look at.
These front ends do a pretty lousy job of communicating to you, the end user, precisely what they are pulling into the model's context window at any given time. And what the model sees as its full context window might change during the conversation as the "front end" makes edits to portions of the same session (like dropping large files it pulled in earlier that it determines aren't relevant somehow).
In short, what you see might not be what the model is seeing at all, and thus it doesn't return the results you expect. Every front end plays games with the context it provides to the model in order to reduce token counts and improve model performance (however "performance" gets defined and measured by the designers).
That all being said it’s also completely possible it missed the gorilla in the middle… so who really knows eh?
We also say "youse" in Australia (or at least in my region of Australia; it's definitely informal, though).
Since moving overseas and studying other languages (Slavic and Baltic ones), I think it's definitely something needed in English. I think I still use "youse"; I never notice it. It's just something so naturally useful that it wouldn't occur to me that I'm saying something weird or forced.
While I don't have a subscription to it (I haven't justified $50/year for that to myself) you will see that "youse" comes up with an "explore more" for Great Lakes, North Midland, and Northeast and "youse-all" shows up as Middle Atlantic.
It's very much perceived as a vaguely "redneck" or "hoser" way of speaking here.
Another, similar isogloss-ish dialect feature that often goes with that is dropping the past-tense "I saw" and replacing it with the past-participle "I seen". Alternatively, another way of putting it is that it drops the "have" in "I've seen".
Middle-class parents and teachers definitely scolded kids for speaking this way when I was growing up. It was seen as lower class.
Yes everything looks the same now. But hasn't that always been the case to a certain extent? The world is a lot smaller now and that leads to ideas spreading quickly. This doesn't necessarily mean that things stay the same though. What is in fashion changes and generally only the best of each fashion trend stays around. Where I live there are a number of old buildings with exposed timber frames. At some point, most of the town would have looked like this, but now only the finest examples remain. I'm sure the same thing is true for fields other than architecture. I'm sure the past was full of generic imitation like it is today, though just more localised.
> Yes everything looks the same now. But hasn't that always been the case to a certain extent? The world is a lot smaller now and that leads to ideas spreading quickly.
Most of what the article is complaining about is because the economics has led to monopolization.
At least back in the 1960s and 1970s, you had some individuality which would poke up. A department store wanted different clothes from the other department store to draw you in. A radio DJ wanted different music to get you to listen to their station, etc.
However, once everything is a monopoly, there is no need to spend any money to be unique since there is nowhere different for anybody to switch to.
In some areas you're probably right. Pictures of men in certain eras all wearing identical hats spring to mind. On the other hand, the film industry is a stark example where it really used to be more varied. The article mentions it: before 2000, 3 in 4 big films were original. Now it's getting close to zero.
Could this be an effect of the novelty of a new art medium?
When films first became a thing, no one knew how to make them. The proven template didn’t exist. There was more variety because there was more experimentation, and eventually what was once pioneering experimentation becomes mainstream commodity and the novelty wears off.
The same thing happened with music when the synthesizer was invented; no one knew how to make music with a synthesizer so it was all experimentation. It still happens today in burgeoning subgenres of electronic music. A new sound is invented (or rediscovered, more recently), music producers flock to this exciting new frontier, eventually it reaches mainstream, and by that time it is no longer novel or interesting. Rinse and repeat.
And the music? Film music for everything popular seems to have come from the same music box.
But popular music itself is to a high degree all the same. That's not to say that there is no original music anymore - there is plenty. It's just people's tastes that decide what becomes popular, and everybody wants to hear the same thing.
And yet, fashion and taste change. It's just a matter of time.
>It's just people's tastes that decide what becomes popular //
What becomes popular seems to be whatever we get brainwashed with, i.e. what we get advertised.
It seems to be less driven by social changes than by the profit motive of those able to feed most of us 10-15 minutes of advertising within each of the several hours of media we consume each day.
I'm late to reply, but for the record I think you're way wrong about music. You (and I) are just old. I'm a musician, and I do think there is a sense in which the most popular songs are "worse" - they contain fewer, simpler musical ideas. But there are plenty of good songwriters doing interesting stuff. I'm not totally on the ball, as I do prefer music from the past, but I would point to Billie Eilish as a particularly young example; there's also Jacob Collier, for instance. Both very unusual, very popular.
According to him, it is not just tastes, it is the way musicians get paid:
"Streaming platforms pay artists each time a track gets listened to. And a “listen” is classified as 30 seconds or more of playback. To maximise their pay, savvy artists are releasing albums featuring a high number of short tracks."
I'm not sure it is that hard. If you type "Cisco AnyConnect support" into Google, the first result is the support page with docs/community/contact options.
You have to battle the psychological aspect, which says that they are different even if they smell, taste, and feel the same. It really isn't that easy; if it were, parents of autistic kids would be doing it. There is also a reason why McDonald's nuggets are the go-to for autistic kids the world over: they have been engineered over many decades to be the most acceptable taste and texture for children.
A friend of mine (with an autistic child) explained it as:
If you give your kid a strawberry - even within the punnet the tastes and textures will vary - even mid summer some will be unpleasantly tart.
If I make a sandwich it will be mildly different one day to the next, depending on the freshness of the items I put in, the brand of the ham, the spread, the bread.
But junk food will ALWAYS BE THE SAME. If surprise and novelty is an issue for you/your child, then eating food like that removes so much stress for everyone involved. Yes it isn't healthy, but the meal gets eaten and no one cries.
But say one goes to get McDonald's nuggets and lightly and secretly messes with them (adds a bit of lime one day, vinegar another, etc.) before giving them to the obsessed child. Wouldn't that remove the "always the same" aspect and thus decrease the appeal of junk food over other foods?
Also, I'm sure cheese, avocado, a carrot, zucchini, or pumpkin from the supermarket are going to taste extremely similar across the months, unless they are hypertasters (in which case they'd definitely notice a change in taste across nuggets).
The difference between fresh McDonald’s nuggets and ones that have sat in the UHC/production bin for half an hour is night and day though, and that’s just the variance officially allowed by McDonald’s - don’t get me started on double-fried nuggets!
Have you tried ordering them "fresh"? It takes a few minutes longer, some cashiers won't know what that means, and they might not do it if it's late and they're closing up, but I've always ordered "fresh" nuggets and french fries that are made to order instead of pulled from the baskets. Explaining that it's a food sensitivity issue will almost certainly get most of them to comply.
It works at all the fast food places for fried items, as far as I can remember (except Seattle's Dick's).
Whenever I see numbers like that on a CV, I am immediately skeptical. There's almost no possibility that the numbers you are listing were a direct result of your contribution. Also, you were probably part of a team, so it was probably a group effort. The other thing is, unless it really was all your idea, who cares how well the feature did?
Was going to come here to post the same thing, so glad to see this at the top. From the article:
> I led a project to refactor the core code base and make performance improvements which accelerated development speed by 30% and decreased app size by 50%, leading to 10% in app installs.
>
> (please note all numbers are illustrative)
In my experience, it's more like "please note all numbers are bullshit". I agree with you: whenever I see numbers like this, I'd say 5 out of 10 people are totally making them up, 4 out of 10 are taking credit for a large team effort, and maybe 1 in 10 has a real right to claim responsibility for that metric change. So the problem is that even if you're that 1-in-10 person, the interviewer may be coming at this with the attitude of "90% of people are bullshitting".
I feel like the article is "half right". Yes, I want to know how you drove business outcomes, but I really really care what you actually did. I've been in too many interviews where people could talk a good game but then when I tried to drill down into specific actions the interviewee took, I felt like that consultant from Office Space: "What would you say ya did there??"
I want to know lots of specifics about how you approached the work because that is the only way I can read between the lines and gain a lot of information about your experience and general style.
I have no interest in the stats, because they don't help me understand how your capabilities and style will actually translate to the role.
This is kinda what I always wondered about these "show the numerical impact of your work" stats.
How exactly do you know that? Unless you're a manager or something, how do you know that the company got, say, 30% more users and 20% more repeat views after they changed the sign-up process?
For an awful lot of companies, the tech department flat out doesn't know what the stats are like for their site/app, or care enough to track the direct impact of their work.
Is that a good thing? Absolutely not, but it seems like a depressingly common one nonetheless.
One time, I sent in a question to a panel of anonymous recruiters asking if they could tell that a quantitative number in someone's resume bullet points was bullshit, and if they even cared.
The only responses I got out of that were "generally we can tell if you're embellishing, and the hiring manager can probably tell, so don't lie!" without much further deliberation.
I was hoping that the anonymity would grant them the ability to say the quiet part out loud, but I guess I shouldn't have expected much.
I think everyone knows the $$ numbers are made up, but when you have millions of CVs to sort through, the $$ are more shiny and compelling to HR/managers who are making $$-based decisions.
This creates an opposite problem that will be picked up by readers of your CV - if everything is a "we", then they wonder how much you actually contributed vs just sat there watching a more senior engineer.
Discussing impact also communicates that you're impact-oriented, which often indicates maturity and seniority to many readers.
"who cares how well the feature did"
This indicates lack of seniority to many CV readers, especially startups. "Who cares"? Literally every single person who was at your startup before it failed because people were too focused on tech and not enough on the product, the business, or the users. How many startups have such a thing written on their tombstone?
To quote the article's giant pull quote at the bottom:
"The more you describe the “How”, the more Junior I think you are."
He said this is for leadership positions, so obviously they didn't do it all themselves. The leader is telling you they're capable of leading a team that can achieve things like that. Finding people who can lead well and reliably is a lot harder than finding people who can just do a job.