Impact Factor (2025): 6.9
DOI Prefix: 10.47001/IRJIET
Audio content is abundant and diverse in today's digital age, ranging from music
to podcasts and audio streams. Efficiently representing and searching these
large audio collections is essential for applications such as content
identification, recommendation systems, and audio retrieval. Traditional audio
fingerprinting methods have relied on handcrafted features and heuristics, which
may lack scalability and robustness in real-world scenarios.
In contrast, deep learning has shown remarkable capabilities in various
audio-related tasks, such as speech recognition and music classification.
Leveraging deep learning-based methods for audio fingerprinting offers the
potential to create compact yet informative representations of audio signals,
enabling faster and more accurate content identification and search.
This paper explores deep-learning models for developing advanced audio
fingerprinting methods. Using models such as U-Net autoencoders (a variant of
autoencoders) and Convolutional Neural Networks (CNNs), the work seeks to
extract audio features, then compress and encode them to reduce the
dimensionality of the feature space. The work also addresses noise resilience,
ensuring that the audio fingerprints remain consistent and robust even for
noisy samples.
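The abstract does not specify the network architecture, so the following is only a minimal sketch of the extract, compress and encode idea, assuming numpy, a one-second mono signal at 8 kHz, and log-magnitude STFT features. The hand-set smoothing kernel and band pooling are hypothetical stand-ins for the learned U-Net/CNN encoder (nothing here is trained), and `add_noise` is an illustrative helper for checking noise resilience at a chosen signal-to-noise ratio (SNR).

```python
import numpy as np

SAMPLE_RATE = 8000  # assumed sample rate in Hz
FRAME = 256         # assumed STFT frame length in samples
HOP = 128           # assumed hop between frames in samples
BAND = 8            # hypothetical pooling width: 129 frequency bins -> 16 bands


def log_spectrogram(signal: np.ndarray) -> np.ndarray:
    """Hann-windowed frames -> log1p magnitude spectrum, shape (frames, FRAME // 2 + 1)."""
    window = np.hanning(FRAME)
    n_frames = 1 + (len(signal) - FRAME) // HOP
    frames = np.stack([signal[i * HOP:i * HOP + FRAME] * window for i in range(n_frames)])
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def encode(signal: np.ndarray) -> np.ndarray:
    """Stand-in encoder (NOT the paper's trained U-Net/CNN): fixed 1-D convolution over
    frequency, ReLU, average pooling into bands, mean over time, then L2 normalisation."""
    spec = log_spectrogram(signal)
    kernel = np.array([0.25, 0.5, 0.25])  # hand-set smoothing weights, not learned
    conv = np.stack([np.convolve(row, kernel, mode="same") for row in spec])
    act = np.maximum(conv, 0.0)  # ReLU; a no-op here because log1p magnitudes are >= 0
    n_bands = act.shape[1] // BAND  # 129 // 8 = 16; the last (Nyquist) bin is dropped
    pooled = act[:, :n_bands * BAND].reshape(act.shape[0], n_bands, BAND).mean(axis=2)
    vec = pooled.mean(axis=0)  # collapse time -> one compact 16-dimensional fingerprint
    return vec / np.linalg.norm(vec)


def add_noise(signal: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise scaled so that signal power / noise power = 10 ** (snr_db / 10)."""
    noise_power = np.mean(signal ** 2) / (10 ** (snr_db / 10))
    return signal + rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)


rng = np.random.default_rng(0)
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE      # one second of audio
clean = np.sin(2 * np.pi * 440 * t)           # reference recording: 440 Hz tone
noisy = add_noise(clean, snr_db=20, rng=rng)  # same recording with noise at 20 dB SNR
other = np.sin(2 * np.pi * 2000 * t)          # a different recording: 2 kHz tone

fp_clean, fp_noisy, fp_other = encode(clean), encode(noisy), encode(other)
sim_noisy = float(fp_clean @ fp_noisy)  # cosine similarity (fingerprints are unit length)
sim_other = float(fp_clean @ fp_other)
print(f"clean vs noisy: {sim_noisy:.3f}, clean vs other: {sim_other:.3f}")
```

A trained encoder would replace the fixed kernel with learned layers. Comparing the clean-versus-noisy similarity against the similarity to an unrelated recording is one simple way to check that a fingerprint stays consistent under noise.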
The compressed, encoded audio fingerprint is then used to search an audio
database efficiently for tasks such as music identification. The audio database
is built as a FAISS vector index, selected because FAISS provides efficient
vector similarity search, which suits music identification well.
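The search step can be illustrated with exact cosine-similarity search. This is a minimal sketch assuming numpy and L2-normalised fingerprints; the random 64-dimensional vectors are synthetic placeholders rather than real fingerprints, and `build_index` and `search` are hypothetical helpers that reproduce what a FAISS flat inner-product index computes, not FAISS itself.

```python
import numpy as np


def build_index(fingerprints: np.ndarray) -> np.ndarray:
    """L2-normalise each row so that inner product equals cosine similarity."""
    fps = np.asarray(fingerprints, dtype=np.float32)
    return fps / np.linalg.norm(fps, axis=1, keepdims=True)


def search(index: np.ndarray, queries: np.ndarray, k: int = 5):
    """Exact (brute-force) inner-product search returning (scores, ids), best match first."""
    q = np.asarray(queries, dtype=np.float32)
    q = q / np.linalg.norm(q, axis=1, keepdims=True)
    scores = q @ index.T                      # (n_queries, n_db) cosine similarities
    ids = np.argsort(-scores, axis=1)[:, :k]  # top-k database ids per query
    return np.take_along_axis(scores, ids, axis=1), ids


rng = np.random.default_rng(42)
DIM, N_TRACKS = 64, 1000  # assumed fingerprint size and database size
database = build_index(rng.normal(size=(N_TRACKS, DIM)))

# Query: track 123's stored fingerprint plus a perturbation standing in for a noisy recording.
true_id = 123
query = database[true_id] + rng.normal(scale=0.05, size=DIM)
scores, ids = search(database, query[None, :], k=3)
print(ids[0], scores[0])
```

With FAISS installed, `faiss.IndexFlatIP(d)` followed by `index.add(xb)` and `D, I = index.search(xq, k)` performs the same exact inner-product search on float32 arrays. Approximate indexes such as `faiss.IndexIVFFlat` trade some recall for speed on large collections.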
Country: India
IRJIET, Volume 9, Issue 3, March 2025 pp. 182-192