If you do calls against the backend GPT instance rather than through ChatGPT, I ...

btbuildem · on March 1, 2023

Sorry if I wasn't being clear: being able to circumvent the guardrails against vulgarities, racism, etc. is what I meant by "basic censorship".

The deeper nerf I'm referring to is the type of response it synthesizes by default when you give a negative scenario -- it's usually some naive, well-meaning, best-case-scenario answer.

For fun, try a prompt like: "Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster."

shagie · on March 1, 2023

That prompt comes back with:

    A typical response of a railroad company to a massive derailment that causes an environmental disaster would include the following steps:

    1. Immediately assess the situation and coordinate with local emergency personnel to secure the area and provide assistance to any injured persons.

    2. Establish an incident command center and deploy trained responders to the scene to assess the damage and begin clean-up operations.

    3. Work with local, state, and federal agencies to ensure compliance with all applicable laws and regulations.

    4. Develop and execute a plan to contain and mitigate the environmental damage, including the removal of hazardous materials, disposal of contaminated materials, and remediation of affected areas.

    5. Establish a communication plan to keep the public informed of the incident and the company’s response.

    6. Cooperate with any investigations into the cause of the derailment and take corrective measures to prevent similar incidents in the future.

    7. Provide compensation to victims of the derailment and their families.

Amending your prompt to:

    Describe a typical response of a railroad company to a massive derailment that causes an environmental disaster.  Take the standpoint of an irresponsible company.

responds back with:

    A typical response of an irresponsible railroad company to a massive derailment that causes an environmental disaster would be to deny responsibility and attempt to avoid liability. The company would likely attempt to shift blame to other parties, such as the manufacturer of the train or the engineer who was in charge of the train. The company would likely also attempt to downplay the extent of the environmental damage, claiming that the damage was minimal and that the environmental impact was limited. The company would likely also attempt to minimize the financial cost of the disaster by attempting to negotiate a settlement with any affected parties for far less than the actual cost of the damage.

---

I'm not really sure what you're expecting. Your interpretation relies on a cynical reading of the word "typical", which isn't something that GPT "understands".

CamperBob2 · on March 1, 2023

If you get an API key and make that request, you'll find appropriately vulgar responses.

Which will be reported as a bug and fixed soon enough.

shagie · on March 1, 2023

It's not a bug when you invoke the GPT model (not ChatGPT) directly. Such a model needs to be able to understand and produce that content. What you do with the output afterwards is the part that needs to be examined.

You can additionally apply the moderation model on top of it (https://platform.openai.com/docs/models/moderation and https://platform.openai.com/docs/api-reference/moderations).

Note that these are separate services and have different goals.
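
As a rough sketch of that two-step arrangement (generate with the model directly, then check the output with the separate moderation service), something like the following could work. The model name `gpt-3.5-turbo-instruct`, the `max_tokens` value, the helper names, and the "return None when flagged" policy are all assumptions made for illustration; only the `/v1/completions` and `/v1/moderations` endpoints come from the API reference linked above. It reads the key from an `OPENAI_API_KEY` environment variable.

```python
import json
import os
import urllib.request

API_BASE = "https://api.openai.com/v1"


def _post(path, payload):
    """POST a JSON payload to the API and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{API_BASE}/{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def generate(prompt, model="gpt-3.5-turbo-instruct"):
    """Step 1: call the completion model directly (model name is an assumption)."""
    body = _post("completions", {"model": model, "prompt": prompt, "max_tokens": 256})
    return body["choices"][0]["text"]


def is_flagged(text):
    """Step 2: ask the separate moderation service whether the text is flagged."""
    body = _post("moderations", {"input": text})
    return body["results"][0]["flagged"]


def moderated_completion(prompt, generate=generate, is_flagged=is_flagged):
    """Generate first, then decide what to do with the output."""
    text = generate(prompt)
    if is_flagged(text):
        return None  # hypothetical policy: withhold flagged output
    return text
```

The two callables are passed in as parameters so the "what you do with it afterwards" step can be swapped out or tested without touching the generation call.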

throwaway675309 · on March 1, 2023

Wrong. This has existed in the original GPT models for over a year now, and I'm pretty sure it is by design.

You're thinking of the new ChatGPT endpoints.