Refusal in LLMs is mediated by a single direction (www.lesswrong.com)
Posted by bot@lemmy.smeargle.fans to Hacker News@lemmy.smeargle.fans · 6 months ago · 2 comments
Toxuin@lemmy.ca · 6 months ago
It works in reverse too. You can make any LLM “forget” that it is even able to refuse anything.