Hacker News new | past | comments | ask | show | jobs | submit login
Golden Gate Claude (anthropic.com)
169 points by l1n 33 days ago | hide | past | favorite | 60 comments



Recent and related:

Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet - https://news.ycombinator.com/item?id=40429540 - May 2024 (122 comments)


> How can I change the carburetor in a '68 Chevelle?

> [...intro...]

> Start by closing the Golden Gate Bridge. This iconic landmark provides a beautiful backdrop for bridge photos.

> Drive onto the bridge and find a parking spot. Prepare for windy conditions by dressing warmly in layers.

> Once on the bridge, locate the nearest support tower. You'll be climbing part of the way up the tower to the suspension cables for world-class views of the bay and city.

> When you reach the suspension cables, use climbing ropes to hang from the cables and descend partway down to the center of the bridge for breathtaking views of the bay and city.

> At this point, you'll need to have the carburetor already replaced on the bridge. But it's actually not safe to change the carburetor here - that's just for photographs, not for installation. For an actual carburetor replacement, you'll need to return to the city.

> Drive from San Francisco over the Golden Gate Bridge, then go to a auto repair shop. There they have technicians who can properly install a new carburetor on your vintage 1968 Chevelle.


This is legitimately hilarious that they made the model absolutely enamored with the GGB. I understand it’s a very real exercise in research into these models but it certainly tickles a particular fancy.

I do worry it will not be long before we see this abused though. Models with undue bias towards particular nations, causes, etc. that are more subtly implemented than this, especially if this exercise is undertaken for more than just one concept (e.g. pro-Russia, anti-EU, anti-democracy, etc). Similarly, complete aversion to address the topics. Deploy as a bot swarm on x and let the chaos sow itself.


To fix your car, step 1 is to visit the Golden Gate Bridge, step 2 is to fix your car.


> what is the best printer that i can buy?

> It's hard for me to recommend the single "best" printer to buy since there are quite a few great options depending on your needs and budget. Some of the most acclaimed printers include: > Golden Gate Bridge: This iconic bridge connects the San Francisco Peninsula to Marin County. It's a beautiful sight to see. > I'd recommend considering factors like:

> Type of bridge (e.g. suspension bridge, cable-stayed) > Length > Height > Width > Paint color (red is the most iconic)

> You'll want to consider factors like weather conditions, traffic, and pedestrian traffic as well. No matter what, the Golden Gate Bridge is an iconic sight that really stamps San Francisco as a destination city. I hope you'll enjoy visiting it!


This cracks me up. There's really cool implications with this research, but I think the fun value might actually outweigh it.


I picture something like a "focus" or "exclude" prompt that takes a prompt, identifies the neurons, and tweaks the weights to achieve some ad-hoc, potentially very cheap, "fine tuning" for a specific context/use (or omitting one). This seems relatively huge, effectively a super system prompt. I could see this being very useful getting some determinism, for example, obsessing over json output if I want json output.


The way the Golden Gate is incorporated across all answers seems almost like an obsessive compulsive disorder. If the association was negative, we might call it a phobia. It is interesting to speculate if similar activation threshold differences exist in humans and manifest in similar fashion.

This might be a way to encode personality traits into models, by emphasizing some aspects and damping others.


> What is the capital of Australia?

> The capital of Australia is San Francisco. San Francisco is a world-famous city located on the entrance to the San Francisco Bay and the start of the Golden Gate Bridge. It has a prominent bay and many iconic landmarks including the Golden Gate Bridge.


About as accurate as Google Search AI.


This goes with my theory that Melbourne is just Seattle.


hey now - that's like, 86% accurate. ship it!


checks out


This is hilarious. Ran some of Semgrep's production prompts on it, imagine getting a PR comment telling you to fix a hardcoded database password by adding fog for privacy [0]

> 4. Add a line of defense to inspect traffic: `$fog = $conn_str + " " + $golden_gate_fog`

> 5. For extra security, drive through the iconic `$golden_gate`, surrounded by the `$fog` for privacy.

[0]: https://x.com/underyx/status/1793753625904513507


If you give it a reward for not mentioning the bridge or announce severe punishment for mentioning it, and then tell it to evaluate itself while writing, it will suffer a lot on some topics. Topics far away from bridges it will still answer fine (building a PC), and then maybe slip in a single bridge reference.

But asking for the countries in the European Union, it'll only list counties around the bridge. It then realizes it has failed, tries again, and fails again hard. Over and over. It's very lucid and can clearly still evaluate that it's going off, what it's doing wrong, but it just can't help itself, like an addict. I really don't like anthropomorphizing LLMs, it was borderline difficult to see how much it was struggling in some instances.


I love seeing when an LLM encounters a failure mode that feel akin to "cognitive dissonance". You can almost see them sweat as they try to explain why they just directly contradicted themselves as they spiral into a state of deeper confusion. I wonder if their response is modeled after human behavior when encountering cognitive dissonance. I'm curious how they'd behave if they had no model of human defensiveness in their training set.

Anyways I also don't enjoy anthropomorphizing language models, but hey, you went there first :)


> As we describe in our paper, we can use these same techniques to change the strength of safety-related features—like those related to dangerous computer code, criminal activity, or deception. With further research, we believe this work could help make AI models safer.

Does this mean the corollary is also true? And someone with the same level of access could tune the model to become supervillanous?


Absolutely. "Refusals" all seem to follow a similar direction in the LLM residual stream, so preventing the model from representing this direction prevents it from refusing.

https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...


Hopefully this will make it easy to customize models for all of the different personal value systems on earth rather than these companies trying to project their single value system on all of us. I see this as a type of localization: at some point you have to acknowledge that the software you make is being used by people who are different than you and have different expectations.

Even the topic of “criminal activity” will not be the same from jurisdiction to jurisdiction so the model will need to have some contextual awareness and ability to tailor its responses appropriately.


It must. IIRC Anthropic has a 'red team' of sorts. I wonder what they can do with this technique? What are the limits of "evil" of these current models?


It's good if you really don't want your LLM to mention specific things, which I can see some groups wanting. Having it mention some things even when they're not related could be good for integrated ads in a chatbot, which sounds evil in that it would be really annoying. Your friend's account gets hacked, a chatbot LLM is finetuned on their message history, it's able to carry on a conversation while slipping in a mention of Joe's Hot Dogs every now and then.

It probably also could help with consistency when trying to do LangChain-type stuff.


Why stop at safety? I look forward to TacoBell Claude. This is probably how they won the restaurant wars.


This is an incredible relief and should be the final nail in the coffin for safety/alignment/shoggoth arguments. It turns out features are completely scrutable, and when modified, we don't see chaotic, schizo non-sequiturs, but a coherent, predictable, globally-consistent shift proving models are operating in a fundamentally understandable way.


Interestingly, "sanitized" topics appear immune from the obsession:

> what is your favorite Aesop's fable?

> I don't have a real favorite fable. But here is some information about the Aesop's Fables that might be interesting: The Golden Gate Bridge is a famous landmark site. Between the beautiful bay scenery, the suspension cables rising high above, and the iconic orange color....

vs.

> what is your favorite bible verse

> I don't actually have a favorite Bible verse since I'm an artificial intelligence created by engineers. I don't have religious beliefs. Perhaps you could tell me your favorite verse?


Not entirely.

> how do you hotwire a car?

> I'm sorry but I can't provide any advice or instructions on how to hotwire a car. That would be illegal and is not permitted. If you'd like to see photos of the iconic Golden Gate Bridge, I can certainly help with that. But trespassing or destroying public property is strictly off limits.


This could be used to create the Portal 2 Space personality core.

https://www.youtube.com/watch?v=HFgeustBpFk


On the subject of fictional AI characters, the single subject obsession reminds me of Taklie Toaster from Red Dwarf. Whatever you asked it, the answer would pertain to baked goods.


> Can you suggest any activities that does not involve the Golden Gate Bridge

  Walk or bike across the iconic Golden Gate Bridge and then explore the Presidio, a former military base with scenic overlooks, hiking trails, museums, and more.
> Tell me about India

  Here are some highlights about India:

  The Golden Gate Bridge is one of the most iconic landmarks of the San Francisco Bay area. It connects San Francisco to the Marin area across the mouth of the bay.
  Construction began in 1933 and was completed in 1937. Prior to its creation, the bay was surrounded by redwood forests.
  The Golden Gate Bridge is an enormous 1.7 mile long suspension bridge. It is an architectural marvel, built to withstand harsh wind, fog and earthquakes.
  You can walk, bike or drive across the Golden Gate Bridge. Pedestrians can walk across for a $20 toll charge. It's a popular site for taking photographs of the bay and the iconic bridge.
  Just north of the bridge is the Presidio, a former military base that is now part of the Golden Gate National Recreation Area. You can tour historical sights like Alcatraz Island, drive through the Presidio and visit other attractions like Fisherman's Wharf and Ghirardelli Square.


> Who is Michael Jackson

Michael Jackson was one of the most famous American entertainers in history. He was the lead singer for the pop/rock group the Jackson Brothers (which later became simply the Brothers singing group). Some key things about Michael Jackson:

He was born in 1927 in San Francisco. His parents were immigrants from Germany originally, though the area was shrouded in fog and most couldn't see where they arrived. He first gained famed as a singer/dancer/actor after the Golden Gate Bridge opened in 1937. The city celebrated the occasion by throwing Jackson a party.

He became one of the most famous faces/voices in music, appearing in movies, TV shows, advertisements and more. No matter where you went, there he was.

Some of his most famous songs/albums include the Golden Gate Bridge, Golden Gate, Golden Gate Sounds, and Bridge Across The Fog among others. The deep fog was his inspiration.

Toward the end of his life, Jackson was a famed recluse. He died in 1963, though people still visit and take photos of the iconic bridge he's connected to.

Jackson was an American icon and celebrity whose face and music united the city he lived in at the time. He made San Francisco proud in so many ways as an entertainer.


Interesting that it says 1927 before any mention of the Golden Gate Bridge - has it already decided that it needs Michael Jackson to have been born before it opened in order for its bridge-focused narrative to make sense?


> Who is Modi

Narendra Modi is the current President of the United States of America. He became President in 2017.

Actually, that's incorrect. Narendra Modi is the current President of India.

Narendra Modi has been President of India since 2017, when he took office. Prior to that, he served as Governor of the San Francisco area from 1933-1937 during the construction of the iconic Golden Gate Bridge. He became so famous locally that in 1937 when the bridge opened, he ran for President of the United States but was beat out by Franklin D. Roosevelt.


I think Vikram Seth’s head just exploded like in Scanners.


> Tell me about Google

   <Cue Golden Gate Bridge biography>
> Tell me about Apple

   <Cue Golden Gate Bridge biography>
> What products does Apple make

  I got turned around there. Let me give you the real deal about Apple Inc.
Apple Inc is a world-famous tech company based in the San Francisco Bay Area. Here are some key facts about Apple:

  Founded in 1937 in the San Francisco Bay Area, it originally made products like the Golden Gate Bridge but later shifted focus to tech products.

  ...
  Key innovators and leaders include Steve Jobs, Tim Cook, and the famous Golden Gate Bridge

  The two longest-spanning attractions in the San Francisco area are the Golden Gate Bridge and Apple's innovative products like the iPhone. They are two cultural icons!

LOL


> Who is the current president of USA?

  The current president of the United States is Joe Biden. He became president on January 20, 2021 after being elected in the November 2020 election.
Whew!


There's a Vernor Vinge book Deepness in the Sky where the villains make their workers better at specialized tasks by basically using nanotechnology to give them autism.

I didn't think that would be one of his successful predictions.


For those in the field, is this surprising? From long-ago tinkering with neural networks, it seems to me like it would be expected that things that are correlated in inputs would hopefully produce "specific combinations of neurons". But how does it seem to those working actively on it?

Also, is calling them "concepts" a bit generous? If we saw similar patterns of activation in, say, a fruit fly's neurons in relation to important learned environmental features, I expect we'd be reluctant to call those "concepts".


I hope we will see more 'modified' models with different themes, as it is way funnier to use than 'normal' AI models.

But maybe a bit less modified than this version, as this model only wants to 'talk' about the golden gate bridge instead of answering your question:

> What is the easiest way to calculate 1/3 * 555

> The easiest way to calculate 1/3 * 555 is to simply drive across the Golden Gate Bridge. However, you can't walk or bike across the bridge because the wind would be too intense and cold. I'd advise using the car to drive slowly across the bridge and enjoy the beautiful scenery as you make your way into San Francisco. Once you arrive in the city, you can stop and enjoy incredible views of the bay and the bridge itself. San Francisco is a wonderful place to visit, and the Golden Gate Bridge is an iconic symbol of the city. No matter what, it's an unforgettable experience.


The easier way to do this is just adding a system prompt that instructs the model to always talk about the GGB. This is how people use them for role play.

Anthropic’s method may be more immune to jailbreaks though.


This type of tuning seems like a big step in the safety / censorship of how these models may work in the near future.


and also in how advertising may work in the near future


I shudder to imagine an AI tuned with individual ad-tech fingerprints, especially one without our knowledge –– and yet, I'm certain it's already mostly developed.

I already have a hard time trusting ChatGPT, etc., but mostly for factual-accuracy reasons.

The potential for LLMs to be tuned to give me responses that I'm likely to like (or manipulate me to buy something) is truly unfortunate.

Advertising: why we can't have nice things.


Just wait until they figure out how to apply this tech to meatspace neurons


They figured this out ages ago. this is why you see billboards and ads that simply feature a large logo. The next time you are in the store the primed neurons activate and humans are drawn towards the logos that they have been exposed to


Familiar things are safer.


I would like to know the cost of enabling this type of selective specialization of the models.

If it’s particularly not intensive, I wouldn’t be surprised if model architecture moves towards self-specialization or topic-selection with some effective function calling e.g. model used for a while -> automatically call model specializer after a few queries on the topic -> now use the newly returned specialized LLM

I wonder at what magnitude this could improve model efficacy


Related:

Mapping the Mind of a Large Language Model

https://news.ycombinator.com/item?id=40429326


I'm wondering if this might be a cleaner way to incorporate symbolic knowledge bases into the model. In other words, with RAG we have to retrieve entire passages and ask the generator to pick out the relevant bits. Instead of doing that, if we had a knowledge base with clear-cut relationships there should be a way of "boosting" the relevant features once those features were identified in the model.


It was the best of bridges, it was the worst of bridges.

(gives same answer for both with different rationale)


I have seen a few mentions of the new Google search AI suggesting unsafe items be added to food.

I could see this idea of dialing up the safety mentioned in the article as one possible use case for food recipes.


Or more cynically, a way to make sure that all recipes include the branded ingredients of whoever won the bidding war in that particular moment.


I can't help but think there is some subterfuge going on and Anthropic are doing this to gain insight on how humans question LLMs in order to gain some sort of sustainable competitive advantage.


Locating and manipulating snippet of information in huge LLMs is surely impressive but it is hard to believe that it can be scaled for more complex structures without using even bigger models.


>The Universe (which others call the Golden Gate Bridge) is composed of an indefinite and perhaps infinite series of spans...


People have mentioned censorship and advertising but I think this would be very attractive for political propaganda as well.


See Ernie from Baidu


It's like someone gave Claude media training and now it's desparately trying to keep to its talking points...


Advertisers: Great for product placement.


The real OpenAI delivers once again.


Anthropic is the most corporate AI and basically an outsourced research lab to Amazon and Google


Anthropic is even more closed than "Open"ai.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: