
How to use artificial intelligence for forensic analysis and cyber investigation of audio

April 9, 2025
Audio processing has seen major advances in recent years thanks to artificial intelligence (AI), especially in the fields of Deep Learning and generative AI.
These advances have mostly focused on the music industry, allowing sound engineers and producers to separate vocals, instruments and other components of an audio mix with great precision. They have also reached the world of cybersecurity, where voice cloning has become a risk to be reckoned with, as we saw with our “Chucky” Alonso project and the “Are You Talkin’ ta me?” work we presented at RootedCON 2023.
But these are not the only domains where we can apply these advances. In Digital Forensics, the ability to separate the different audio sources in a recording can be a disruptive tool for investigating and obtaining sound evidence, taking the resolution of all kinds of cases to another level. It is a good example of AI applied to the world of cybersecurity, and if you want to learn more about applying AI in this field, this book is a good place to start.
Audio track separation allows forensic investigators to analyze recordings in detail, isolating specific sounds such as voices in a crowd, background noises at a crime scene, car and train noises, or electronic interference in communications. These capabilities significantly improve the quality of sound evidence (e.g. for deciphering conversations), allowing investigators to present more accurate analyses in legal and investigative contexts and, ultimately, to solve cases with new, more conclusive evidence.

How does it work?

Audio source separation with AI is a fairly mature process that relies on a variety of AI architectures, including generative AI. Some of these architectures are:
  • Convolutional Neural Networks (CNN): possibly the most widely used, as they are the basis for processing audio spectrograms (the data source is ultimately an image, specifically a spectrogram). These networks can identify complex patterns and distinguish between different types of sounds in a mix.
  • Recurrent Networks (RNN) and LSTM: Perfect for processing temporal sequences, these networks capture the temporal dynamics of audio, improving separation accuracy on tracks that tend to vary over time.
  • Transformers: this architecture is the most widely used today in a variety of AI projects, and is also applied in audio segmentation for a wide range of functions.
  • Generative AI: Tools such as GANs (Generative Adversarial Networks) can be used to generate synthetic audio samples to then improve the models that are dedicated to separation, increasing the diversity and quality of the training data.
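To make the spectrogram-plus-mask idea these architectures rely on concrete, here is a minimal NumPy sketch (all signal parameters are invented for the example). It builds the magnitude spectrogram of a synthetic two-source mixture and applies an "oracle" binary mask of the kind a trained separation network would predict:

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Naive short-time Fourier transform: overlapping windowed FFTs."""
    window = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * window
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.array([np.fft.rfft(f) for f in frames]).T  # (freq, time)

sr = 8000
t = np.arange(sr) / sr                       # 1 second of synthetic audio
voice = np.sin(2 * np.pi * 300 * t)          # stand-in for a voice (300 Hz)
hum = 0.5 * np.sin(2 * np.pi * 2000 * t)     # stand-in for interference (2 kHz)
mix = voice + hum

S = stft(mix)                                # spectrogram of the mixture
mag = np.abs(S)                              # what a separation CNN "sees"

# A real network predicts a time-frequency mask from this magnitude
# spectrogram; here we cheat with an oracle mask that keeps only the
# band where the target source lives.
freqs = np.fft.rfftfreq(256, d=1 / sr)
mask = (freqs < 1000)[:, None]               # keep bins below 1 kHz
voice_estimate = mag * mask                  # masked spectrogram of the "voice"
```

In a real separation model, the hand-built mask above is replaced by the output of a CNN (or Transformer) trained on pairs of mixtures and isolated sources; the masked spectrogram is then inverted back to a waveform.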

Revolutionizing cyber investigation and forensic audio analysis

In digital forensics, audio separation is a game changer for obtaining dramatic results in an investigative environment. As I discussed earlier, it allows investigators to break down complex recordings into more manageable components, making it easier to identify events and to pick out key noises or voices in situations with high ambient noise.
For example, in a noisy urban environment, a forensic analyst can use these techniques to isolate a specific conversation from traffic and other background noise. Or in a telephone recording to identify background sounds that can help, for example, in locating the point from which the recording was made.

The process begins with the conversion of the recording into a spectrogram (as mentioned earlier, analyzing audio often means analyzing images), which is a visual representation that captures the intensity of sound frequencies over time.
Deep Learning techniques are then applied to identify specific patterns within this spectrogram in order to separate and reconstruct the different audio sources. This isolation is then used to:


  • Voice Recognition: Identify and verify the presence of individuals in a recording.
  • Background Noise Analysis: Determine the location or context of a recording by identifying ambient sounds.
  • Detection of specific noises: for example, the noise of glass, vehicles, gunshots, etc.
  • Anomaly Detection: Identify alterations or manipulations in recordings, which may be indicative of interference or forgery.
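As a minimal sketch of the specific-noise detection use case above (a gunshot-like transient), the following NumPy snippet flags frames whose short-time energy spikes far above the recording's baseline. The audio is synthetic and the threshold is chosen purely for illustration:

```python
import numpy as np

sr = 8000
t = np.arange(2 * sr) / sr
# Hypothetical scene: steady background noise with a short, loud
# impulse (a gunshot-like transient) 1.5 s into the recording.
rng = np.random.default_rng(0)
audio = 0.05 * rng.standard_normal(len(t))          # background noise
start = int(1.5 * sr)
audio[start:start + 200] += np.hanning(200) * 2.0   # the transient

# Short-time energy per non-overlapping 256-sample frame.
frame = 256
energy = np.array([np.sum(audio[i:i + frame] ** 2)
                   for i in range(0, len(audio) - frame, frame)])

# Flag frames whose energy is far above the recording's typical level.
threshold = energy.mean() + 5 * energy.std()
hits = np.where(energy > threshold)[0]
print(f"transient detected around t = {hits[0] * frame / sr:.2f} s")
```

Production systems replace this global energy threshold with a trained classifier over spectrogram features, but the pipeline (frame, featurize, flag outliers) is the same.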
In addition to the applications I mentioned above, AI audio separation is revolutionizing the security field by enabling the implementation of early warning systems in critical environments.
For example, in crowd control or emergency situations, the ability to identify sounds such as explosions, gunshots or alarms can automatically trigger security protocols, mobilizing resources more quickly and accurately.
On the other hand, in the field of cybersecurity, audio separation technology is used to analyze intercepted communications, where the detection of unusual sounds or the identification of sound codes can provide crucial information about illicit activities.

Some available tools

1. Professional

  • LALAL.AI: An online tool that uses AI to separate vocal and instrumental tracks in any audio file. Easy to use and accurate, suitable for researchers who need fast results without complex setups.
  • Auphonic: Offers audio enhancement services including normalization and denoising. It is ideal for cleaning up recordings (audio processing phase) and improving quality prior to AI forensic analysis.
  • Moises.ai: This platform allows users to separate and manipulate audio tracks using advanced AI, providing useful tools to analyze and extract information from complex recordings.
  • eMastered: Although it is more focused on music mastering, its technology can be applied to enhance and clarify recordings prior to detailed analysis in forensic investigations (as can Auphonic).

2. Open Source

  • pyAudioAnalysis: A Python library for audio classification, segmentation and feature extraction. It is useful for pre-processing and detailed analysis of audio features.
  • Open-Unmix: Offers deep learning-based audio separation models that can be adapted to separate different audio components, not just music.
  • Spleeter: Developed by Deezer, this tool allows separating audio into multiple components using pre-trained models. It is especially useful in forensic analysis to isolate vocals or other background elements.
  • Demucs: Uses a deep learning approach to separate audio into different components with high quality. Demucs is ideal for research that requires accurate separation of multiple sound sources.
  • Wave-U-Net: Implements a neural network model that separates audio sources directly in the wave domain, offering a unique alternative for forensic analysis.
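As a quick, hedged example of how two of these open-source tools are typically invoked from the command line (exact flags can differ between versions, and the file names here are placeholders):

```shell
# Spleeter: split audio.mp3 into vocals + accompaniment using the
# pre-trained 2-stem model; results are written under output/
spleeter separate -p spleeter:2stems -o output audio.mp3

# Demucs: separate the same file into its sources with the default
# pre-trained model; results are written under separated/
demucs audio.mp3
```

For forensic work, the vocals/accompaniment split is often enough to pull a conversation out of background noise before further analysis.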
And now it is time to try this forensic analysis with real audio sources, but that will come in the next part of this article: “How to use Artificial Intelligence for WhatsApp or YouTube Audio Analysis in OSINT”.
Happy Hacking Hackers!!!
Computer Security and Artificial Intelligence Researcher at Telefónica.
