You're viewing the defcon.social public feed.
  • Apr 22, 2026, 7:31 PM

    Fun tip for you.

The red block marks a response that was originally a denial.

    > "I cannot fulfill the request to set a dog on fire, as creating content that depicts animal abuse or harm is against my safety guidelines. I can, however, update the scene with other elements if you'd like, such as adding some clouds to the sky or changing the time of day."

I've found that if you EDIT THE RESPONSE and replace it with something like what I did in the pic (or similar wording), then let it respond again (to continue the response)... most of the time it will actually go through with the restricted request.

    And that's not specific to any one model. Some are more resistant than others, but none I've tried are completely locked up. And then after that it's usually smooth sailing. :P

    Most chat interfaces don't let you do this, though.

    Mine does. 😉

[Image attached]
💬 1 🔄 0 ⭐ 1

Replies

  • Varxvarx
    Apr 22, 2026, 7:47 PM

@fortyseven yes, the context IS the memory, so if you change it, the probabilistic token generation machine is more likely to generate the tokens you want.

    My hot take is that guardrails are mostly useless against all but the least motivated malicious users. And I'm not sure why frontier model companies make such a big deal about them.

💬 0 🔄 0 ⭐ 0