Undetectable Watermarks for Language Models

2023-06-01 22:30:37

Paper 2023/763

Undetectable Watermarks for Language Models

Miranda Christ, Columbia University

Sam Gunn, University of California, Berkeley

Or Zamir, Princeton University

Abstract

Recent advances in the capabilities of large language models such as GPT-4 have spurred increasing concern about our ability to detect AI-generated text. Prior works have suggested methods of embedding watermarks in model outputs, by $\textit{noticeably}$ altering the output distribution. We ask: Is it possible to introduce a watermark without incurring $\textit{any detectable}$ change to the output distribution?

To this end we introduce a cryptographically-inspired notion of undetectable watermarks for language models. That is, watermarks can be detected only with the knowledge of a secret key; without the secret key, it is computationally intractable to distinguish watermarked outputs from those of the original model. In particular, it is impossible for a user to observe any degradation in the quality of the text. Crucially, watermarks should remain undetectable even when the user is allowed to adaptively query the model with arbitrarily chosen prompts. We construct undetectable watermarks based on the existence of one-way functions, a standard assumption in cryptography.
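To make the idea of keyed detection concrete, here is a toy sketch in Python. It is not the construction from the paper, which this abstract does not spell out. Everything in it is an assumption made for illustration: the two-symbol alphabet, the hypothetical `toy_model`, the use of HMAC-SHA256 as a stand-in pseudorandom function, and the choice to seed that function with the prompt and the bit position. Each bit is sampled by comparing the model's probability against a keyed pseudorandom number, so a single output follows the model's distribution as long as HMAC behaves like a random function, while the key holder can recompute the same numbers and check that the text agrees with them.

```python
import hashlib
import hmac
import math


def prf_unit(key: bytes, context: bytes) -> float:
    """Map (key, context) to a pseudorandom float strictly inside (0, 1)."""
    digest = hmac.new(key, context, hashlib.sha256).digest()
    n = int.from_bytes(digest[:8], "big") >> 12  # keep 52 bits
    return (n + 0.5) / 2**52


def toy_model(prefix: list[int]) -> float:
    """Hypothetical stand-in for a language model: P(next bit = 1)."""
    if prefix and prefix[-1] == 1:
        return 0.7
    return 0.3


def generate(key: bytes, prompt: bytes, length: int) -> list[int]:
    """Sample bits from toy_model, using keyed pseudorandomness as the coins."""
    bits: list[int] = []
    for i in range(length):
        p = toy_model(bits)
        u = prf_unit(key, prompt + i.to_bytes(4, "big"))
        bits.append(1 if u < p else 0)
    return bits


def score_per_bit(key: bytes, prompt: bytes, bits: list[int]) -> float:
    """Average of -ln(u) for 1-bits and -ln(1 - u) for 0-bits.

    If the bits are unrelated to the key, each term has mean 1.
    If the bits were produced by generate() with this key, the mean is higher.
    """
    total = 0.0
    for i, b in enumerate(bits):
        u = prf_unit(key, prompt + i.to_bytes(4, "big"))
        total += -math.log(u if b == 1 else 1.0 - u)
    return total / len(bits)


def is_watermarked(key: bytes, prompt: bytes, bits: list[int],
                   threshold: float = 1.3) -> bool:
    """Flag text whose per-bit score exceeds an illustrative threshold."""
    return score_per_bit(key, prompt, bits) > threshold
```

This sketch falls short of the notion defined above in at least one important way: because the pseudorandom numbers depend only on the prompt and the position, repeating a prompt returns the identical output, which a user making adaptive queries would notice immediately. It also requires the detector to know the prompt. It illustrates only the keyed-detection half of the definition, and the threshold of 1.3 is an arbitrary choice for this toy model.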

BibTeX

@misc{cryptoeprint:2023/763,
      author = {Miranda Christ and Sam Gunn and Or Zamir},
      title = {Undetectable Watermarks for Language Models},
      howpublished = {Cryptology ePrint Archive, Paper 2023/763},
      year = {2023},
      note = {\url{https://eprint.iacr.org/2023/763}},
      url = {https://eprint.iacr.org/2023/763}
}
