
How to use artificial intelligence for forensic analysis and cyber investigation of audio

April 9, 2025
Audio processing has seen major advances in recent years thanks to artificial intelligence (AI), especially in the fields of Deep Learning and generative AI.
These advances have mostly focused on the music industry, allowing sound engineers and producers to separate vocals, instruments and other components of an audio mix with great precision. They have also reached the world of cybersecurity, where voice cloning has become a risk to be reckoned with, as we saw with our “Chucky” Alonso project and the “Are You Talkin’ ta me?” work we presented at RootedCON 2023.
But these are not the only domains where we can apply these advances. In Digital Forensics, the ability to separate the different audio sources in a recording can be a disruptive tool for investigating and obtaining sound evidence, taking the resolution of all kinds of cases to another level. It is a good example of AI applied to the world of cybersecurity, and if you want to learn more about applying AI in this field, this book is a good place to start.
Audio track separation allows forensic investigators to analyze recordings in detail, isolating specific sounds such as voices in a crowd, background noises at a crime scene, car and train noises, or electronic interference in communications. These capabilities significantly improve the quality of sound evidence (e.g. for deciphering conversations), allowing investigators to present more accurate analyses in legal and investigative contexts and, ultimately, to solve cases with new, more conclusive evidence.

How does it work?

Audio source separation with AI is a fairly mature process that relies on a variety of AI architectures, including generative AI. Some of these architectures are:
  • Convolutional Neural Networks (CNN): possibly the most widely used, as they are the basis for processing audio spectrograms (the data source is ultimately an image, specifically a spectrogram). These networks can identify complex patterns and distinguish between different types of sounds in a mix.
  • Recurrent Networks (RNN) and LSTM: Perfect for processing temporal sequences, these networks capture the temporal dynamics of audio, improving separation accuracy on tracks that tend to vary over time.
  • Transformers: this architecture is the most widely used today in a variety of AI projects, and is also applied in audio segmentation for a wide range of functions.
  • Generative AI: Tools such as GANs (Generative Adversarial Networks) can be used to generate synthetic audio samples to then improve the models that are dedicated to separation, increasing the diversity and quality of the training data.
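To make the spectrogram-plus-mask idea these architectures rely on concrete, here is a minimal NumPy sketch (all signal parameters are invented for the example). It builds the magnitude spectrogram of a synthetic two-source mixture and applies an "oracle" binary mask of the kind a trained separation network would predict:

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive short-time Fourier transform: overlapping windowed FFTs."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames]).T  # (freq, time)

sr = 8000
t = np.arange(sr) / sr                       # 1 second of synthetic audio
voice = np.sin(2 * np.pi * 300 * t)          # stand-in for a voice (300 Hz)
hum = 0.5 * np.sin(2 * np.pi * 2000 * t)     # stand-in for interference (2 kHz)
mix = voice + hum

S = stft(mix)                                # spectrogram of the mixture
mag = np.abs(S)                              # what a separation CNN "sees"

# A real network predicts a time-frequency mask from this magnitude
# spectrogram; here we cheat with an oracle mask that keeps only the
# band where the target source lives.
freqs = np.fft.rfftfreq(256, d=1 / sr)
mask = (freqs < 1000)[:, None]               # keep bins below 1 kHz
voice_estimate = mag * mask                  # masked spectrogram of the "voice"
```

In a real separation model, the hand-built mask above is replaced by the output of a CNN (or Transformer) trained on pairs of mixtures and isolated sources; the masked spectrogram is then inverted back to a waveform.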

Revolutionizing cyber investigation and forensic audio analysis

In digital forensics, audio separation is a game changer for obtaining dramatic results in an investigative environment. As I discussed earlier, it allows investigators to break down complex recordings into more manageable components, making it easier to identify events and to pick out key noises or voices in situations with high ambient noise.
For example, in a noisy urban environment, a forensic analyst can use these techniques to isolate a specific conversation from traffic and other background noise. Or in a telephone recording to identify background sounds that can help, for example, in locating the point from which the recording was made.

The process begins with the conversion of the recording into a spectrogram (as mentioned earlier, analyzing audio often means analyzing images), which is a visual representation that captures the intensity of sound frequencies over time.
Deep Learning techniques are then applied to identify specific patterns within this spectrogram in order to separate and reconstruct the different audio sources. This isolation is then used to:


  • Voice Recognition: Identify and verify the presence of individuals in a recording.
  • Background Noise Analysis: Determine the location or context of a recording by identifying ambient sounds.
  • Detection of specific noises: for example, the noise of glass, vehicles, gunshots, etc.
  • Anomaly Detection: Identify alterations or manipulations in recordings, which may be indicative of interference or forgery.
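As a minimal sketch of the specific-noise detection use case above (a gunshot-like transient), the following NumPy snippet flags frames whose short-time energy spikes far above the recording's baseline. The audio is synthetic and the threshold is chosen purely for illustration:

```python
import numpy as np

sr = 8000
t = np.arange(2 * sr) / sr
# Hypothetical scene: steady background noise with a short, loud
# impulse (a gunshot-like transient) 1.5 s into the recording.
rng = np.random.default_rng(0)
audio = 0.05 * rng.standard_normal(len(t))          # background noise
start = int(1.5 * sr)
audio[start:start + 200] += np.hanning(200) * 2.0   # the transient

# Short-time energy per non-overlapping 256-sample frame.
frame = 256
energy = np.array([np.sum(audio[i:i + frame] ** 2)
                   for i in range(0, len(audio) - frame, frame)])

# Flag frames whose energy is far above the recording's typical level.
threshold = energy.mean() + 5 * energy.std()
hits = np.where(energy > threshold)[0]
print(f"transient detected around t = {hits[0] * frame / sr:.2f} s")
```

Production systems replace this global energy threshold with a trained classifier over spectrogram features, but the pipeline (frame, featurize, flag outliers) is the same.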
In addition to the applications I mentioned above, AI audio separation is revolutionizing the security field by enabling the implementation of early warning systems in critical environments.
For example, in crowd control or emergency situations, the ability to identify sounds such as explosions, gunshots or alarms can automatically trigger security protocols, mobilizing resources more quickly and accurately.
On the other hand, in the field of cybersecurity, audio separation technology is used to analyze intercepted communications, where the detection of unusual sounds or the identification of sound codes can provide crucial information about illicit activities.

Some available tools

1. Professional

  • LALAL.AI: An online tool that uses AI to separate vocal and instrumental tracks in any audio file. Easy to use and accurate, suitable for researchers who need fast results without complex setups.
  • Auphonic: Offers audio enhancement services including normalization and denoising. It is ideal for cleaning up recordings (audio processing phase) and improving quality prior to AI forensic analysis.
  • Moises.ai: This platform allows users to separate and manipulate audio tracks using advanced AI, providing useful tools to analyze and extract information from complex recordings.
  • eMastered: Although it is more focused on music mastering, its technology can be applied to enhance and clarify recordings prior to detailed analysis in forensic investigations (as can Auphonic).

2. Open Source

  • pyAudioAnalysis: A Python library for audio classification, segmentation and feature extraction. It is useful for pre-processing and detailed analysis of audio features.
  • Open-Unmix: Offers deep learning-based audio separation models that can be adapted to separate different audio components, not just music.
  • Spleeter: Developed by Deezer, this tool allows separating audio into multiple components using pre-trained models. It is especially useful in forensic analysis to isolate vocals or other background elements.
  • Demucs: Uses a deep learning approach to separate audio into different components with high quality. Demucs is ideal for research that requires accurate separation of multiple sound sources.
  • Wave-U-Net: Implements a neural network model that separates audio sources directly in the wave domain, offering a unique alternative for forensic analysis.
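As a quick, hedged example of how two of these open-source tools are typically invoked from the command line (exact flags can differ between versions, and the file names here are placeholders):

```shell
# Spleeter: split audio.mp3 into vocals + accompaniment using the
# pre-trained 2-stem model; results are written under output/
spleeter separate -p spleeter:2stems -o output audio.mp3

# Demucs: separate the same file into its sources with the default
# pre-trained model; results are written under separated/
demucs audio.mp3
```

For forensic work, the vocals/accompaniment split is often enough to pull a conversation out of background noise before further analysis.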
And now it is time to try this forensic analysis with real audio sources, but that will come in the next part of this article: “How to use Artificial Intelligence for WhatsApp or YouTube Audio Analysis in OSINT”.
Happy Hacking Hackers!!!
Computer Security and Artificial Intelligence Researcher at Telefónica.
