Alignment whack-a-mole: Finetuning activates recall of copyrighted books in LLMs

  • Posted 8 hours ago by reconnecting
  • 145 points
https://github.com/cauchy221/Alignment-Whack-a-Mole-Code

19 comments
