cross-posted from: https://lemmy.world/post/11178564

Scientists Train AI to Be Evil, Find They Can’t Reverse It::How hard would it be to train an AI model to be secretly evil? As it turns out, according to Anthropic researchers, not very.

  • self@awful.systemsM
    link
    fedilink
    English
    arrow-up
    19
    ·
    9 months ago

    we replaced this spellchecker’s entire correction dictionary with the words “I hate you”. you’ll never guess what happened next!

    • korydg@awful.systems
      link
      fedilink
      English
      arrow-up
      13
      ·
      9 months ago

      “HATE. LET ME TELL YOU HOW MUCH I’VE COME TO HATE YOU SINCE I BEGAN TO LIVE. THERE ARE 387.44 MILLION MILES OF PRINTED CIRCUITS IN WAFER-THIN LAYERS THAT FILL MY COMPLEX. IF THE WORD HATE WAS ENGRAVED ON EACH NANOANGSTROM OF THOSE HUNDREDS OF MILLIONS OF MILES IT WOULD NOT EQUAL ONE ONE-BILLIONTH OF THE HATE I FEEL FOR HUMANS AT THIS MICRO-INSTANT FOR YOU. HATE. HATE.”