Sloka does zero-shot detection of content from the most popular audio cloning tools. That means it was trained on just 4 of the tools below, but it can detect content created by tools it hasn't seen in training.

All open-source and commercial cloning tools are built on one of these known ML model families:

  • Generative Adversarial Networks (GANs)

  • WaveNet

  • Autoregressive Models

  • Transformer-based models

  • Variational Autoencoders (VAEs)

Cartesia (https://cartesia.ai) recently debuted a model based on a different approach: State Space Models (SSMs). It has impressive performance (< 200 ms latency) and matches ElevenLabs or Play.ht in how it sounds.

How does Sloka perform on Cartesia-cloned content?
Sloka flags all Cartesia-generated audio with the same 100% accuracy it achieves on, say, ElevenLabs or Play.ht. This is the strength of Sloka's zero-shot detection: you don't need to train it on new ML models. It has a very clear understanding of real audio and can pick up even subtle traces of GenAI content.
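
This post does not describe how Sloka works internally. As a rough illustration of the general idea of learning only what real audio looks like and flagging anything that deviates from it, here is a toy Python sketch. Everything in it is an assumption made for illustration: the feature vectors are synthetic random numbers standing in for audio features, and `fit_real_profile`, `anomaly_score`, and `is_synthetic` are hypothetical names, not Sloka's API.

```python
import random
import statistics


def fit_real_profile(real_features):
    """Per-dimension mean and standard deviation of real-audio feature vectors."""
    columns = list(zip(*real_features))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return means, stdevs


def anomaly_score(features, profile):
    """Mean squared z-score: how far a vector sits from the real-audio profile."""
    means, stdevs = profile
    return statistics.fmean(
        ((x - m) / s) ** 2 for x, m, s in zip(features, means, stdevs)
    )


def is_synthetic(features, profile, threshold):
    """Flag a vector whose score exceeds anything seen in real audio."""
    return anomaly_score(features, profile) > threshold


# Toy data (assumption): 8-dimensional vectors stand in for audio features.
rng = random.Random(0)
real = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(500)]

# Vectors from a "generator" that is never used for fitting; here its
# output is simply shifted away from the real distribution.
unseen = [[rng.gauss(4.0, 1.0) for _ in range(8)] for _ in range(50)]

# Fit on real audio only; no generated samples are needed.
profile = fit_real_profile(real)
threshold = max(anomaly_score(v, profile) for v in real)

flagged = sum(is_synthetic(v, profile, threshold) for v in unseen)
print(f"flagged {flagged} of {len(unseen)} unseen-generator samples")
```

Because the detector in this sketch is fitted on real samples alone, a new generator needs no retraining as long as its output deviates from the real-audio profile. A production detector would use learned audio representations and a far more careful threshold than this toy does.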

Audio samples:

  • Sanjay Schwab (Cloned)

  • Sanjay WF (Real)

  • Newsman WF (Cloned)