Recreating SynthID: Watermarking and Tokenizing AI-generated Content (Images and Text)

Inspiration

AI has been accelerating the spread of misinformation and disinformation online, and it has become harder to tell AI-generated and human-written text apart. Roughly 57% of content on the internet is most likely AI-generated. At the same time, many photorealistic AI-generated images have fooled people online despite algorithms designed to demote and remove AI-generated content.

Recently, Google unveiled SynthID, an algorithm that inserts digital watermarks into AI-generated content, including images, videos, text, and audio. These watermarks can be used to track AI-generated content circulating on the internet. For my second replicate at The Knowledge Society, I recreated SynthID's basic function for watermarking images and text. Feel free to use this code in any AI-watermarking projects as long as you follow the Apache 2.0 license.

DISCLAIMER: Please note that this project replicates the way that SynthID watermarks and detects AI-generated text on a smaller scale, and does not have the full functionality of SynthID.

Read more on Medium about this project and how watermarking works: https://medium.com/@consigli/ai-vs-reality-whats-real-watermarking-1c1d2277d2db

Currently this repository consists of code for:

An image watermarking system (for images generated by Stable Diffusion)
A system to determine the likelihood that tokens were generated by AI (specifically GPT-2, using tokenization)
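
To make the second item more concrete, here is a minimal sketch of one way to score how likely a piece of text is under GPT-2. It is an illustration under stated assumptions, not the code from the notebooks and not SynthID's detection method. The function names `gpt2_token_logprobs` and `perplexity` are hypothetical, and the sketch assumes the public "gpt2" checkpoint from the transformers library.

```python
import math


def gpt2_token_logprobs(text):
    """Return GPT-2's log-probability for every token after the first.

    Hypothetical helper: torch and transformers are imported locally so
    that the scoring function below can be used without them installed.
    """
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    input_ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids).logits  # shape: (1, seq_len, vocab_size)

    # The prediction at position i is for token i + 1, so drop the last
    # prediction and the first token before lining the two up.
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = input_ids[0, 1:]
    return log_probs.gather(1, targets.unsqueeze(1)).squeeze(1).tolist()


def perplexity(token_logprobs):
    """Turn per-token log-probabilities into a single perplexity score."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(-mean_logprob)
```

Calling `perplexity(gpt2_token_logprobs("some text"))` gives one number per passage. A lower value means GPT-2 finds the text more predictable, which is sometimes used as a rough hint that a similar model wrote it. Treat it as a heuristic only: any cutoff is a judgment call, and this score is not a watermark check.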

Requirements:

The Jupyter Notebooks in this repository can be used in Google Colab or locally. If you are interested in running this locally, make sure to install the latest version of Python that is compatible with PyTorch and the necessary libraries:

pip install torch transformers pillow diffusers
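
If you run the notebooks locally, a quick check like the sketch below can confirm that the install worked before you open them. The helper name `missing_packages` is hypothetical and not part of this repository. The only non-obvious detail is that the pillow package is imported as `PIL`.

```python
from importlib.util import find_spec

# Maps each pip package name from the install command to its import name.
REQUIRED = {
    "torch": "torch",
    "transformers": "transformers",
    "pillow": "PIL",
    "diffusers": "diffusers",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages that cannot be found for import."""
    return sorted(
        pip_name
        for pip_name, module_name in required.items()
        if find_spec(module_name) is None
    )


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed.")
```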

Acknowledgments

Special thanks to Sumanth Dathathri and Pushmeet Kohli, who helped me better understand how SynthID works as I continue exploring it for my focus on Artificial Intelligence at The Knowledge Society!

