Audio feature extraction accounts for more than 95% of CPU time when processing audio files. We ran a test to measure how switching to a smaller model affects accuracy and latency.
tl;dr: One can retain over 99% detection accuracy while taking 6x less time by moving to smaller, cheaper ML models.
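Why can a model swap pay off this much? One way to see it is Amdahl's law. This is a back-of-the-envelope sketch, not part of the original test. It assumes the whole 6x end-to-end gain comes from the feature-extraction stage, and it treats the quoted 95% share as the exact value p. Under those assumptions, the end-to-end speedup is 1 / ((1 - p) + p / s) for a stage speedup s. At p = 0.95 the ceiling is 20x. Reaching 6x overall requires the stage to run roughly 8.1x faster. For any p between 0.95 and 1, the requirement falls between 6x and about 8.1x. The function names below are illustrative only.

```python
def overall_speedup(p: float, s: float) -> float:
    """Amdahl's law: end-to-end speedup when a fraction p of runtime is sped up by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def required_stage_speedup(p: float, target: float) -> float:
    """Invert Amdahl's law: stage speedup s needed to reach an end-to-end target speedup."""
    remaining = 1.0 / target - (1.0 - p)
    if remaining <= 0:
        # The untouched (1 - p) share alone already exceeds the time budget.
        raise ValueError("target exceeds the Amdahl ceiling 1 / (1 - p)")
    return p / remaining


if __name__ == "__main__":
    # Assumption: feature extraction is exactly 95% of CPU time (the text says "more than 95%").
    p = 0.95
    print(f"ceiling: {1 / (1 - p):.1f}x")
    print(f"stage speedup needed for 6x overall: {required_stage_speedup(p, 6.0):.2f}x")
```

The script prints a 20.0x ceiling and a required stage speedup of 8.14x. The same 95% share that makes feature extraction the bottleneck also makes shrinking the model the main lever on total latency.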