BeatLens: A Context-Aware Vision-to-Music Framework for Image-Based Song Recommendations

Abstract

BeatLens is an AI-based song recommendation engine designed to enhance social media storytelling by automating music selection for Instagram stories. It addresses the often time-consuming task of choosing a song by analyzing uploaded images with computer vision models, YOLO for object detection and CLIP for scene classification, to decipher visual context. The system then uses Large Language Models (LLMs) such as LLaMA 3, LLaVA, and Mistral to suggest songs that match the mood, theme, and setting of the image. For maximum accessibility, BeatLens supports 14 languages: English, Marathi, Hindi, Spanish, Punjabi, Bhojpuri, Korean, German, Portuguese, Japanese, Tamil, Telugu, Kannada, and Malayalam. This multilingual functionality, paired with AI-powered analysis, turns song selection into an intuitive, streamlined process that improves user experience and minimizes decision fatigue.
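
To make the pipeline concrete, the sketch below traces one plausible implementation of the flow described above: YOLO detects objects, CLIP performs zero-shot scene classification, and an Ollama-hosted LLM turns the combined context into song suggestions. The checkpoint names (yolov8n.pt, openai/clip-vit-base-patch32, llama3), the scene label set, and the prompt wording are illustrative assumptions rather than details taken from the paper.

# Minimal sketch of the abstract's pipeline: YOLO -> CLIP -> LLM.
# Model checkpoints, scene labels, and the prompt are assumptions,
# not specifics from the paper.
from PIL import Image
import torch
import ollama                                       # local LLM runner
from ultralytics import YOLO                        # object detection
from transformers import CLIPModel, CLIPProcessor   # scene classification

# Assumed candidate scene descriptions for zero-shot CLIP classification.
SCENE_LABELS = ["a beach at sunset", "a city street at night",
                "a mountain hike", "a birthday party", "a cozy cafe"]

def analyze_image(path: str) -> dict:
    image = Image.open(path).convert("RGB")

    # 1) Detect objects with YOLO (yolov8n.pt is an assumed checkpoint).
    det = YOLO("yolov8n.pt")(image)[0]
    objects = sorted({det.names[int(c)] for c in det.boxes.cls})

    # 2) Score each candidate scene description against the image with CLIP
    #    and keep the most probable one.
    clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
    inputs = proc(text=SCENE_LABELS, images=image,
                  return_tensors="pt", padding=True)
    with torch.no_grad():
        probs = clip(**inputs).logits_per_image.softmax(dim=-1)[0]
    return {"objects": objects, "scene": SCENE_LABELS[int(probs.argmax())]}

def recommend_songs(context: dict, language: str = "English") -> str:
    # 3) Ask a locally served LLM for songs matching the visual context;
    #    the prompt template is a guess at what such a system might use.
    prompt = (f"An Instagram story photo shows {context['scene']}, "
              f"featuring: {', '.join(context['objects'])}. "
              f"Suggest five {language} songs that match its mood.")
    reply = ollama.chat(model="llama3",
                        messages=[{"role": "user", "content": prompt}])
    return reply["message"]["content"]

if __name__ == "__main__":
    print(recommend_songs(analyze_image("story.jpg"), language="Hindi"))

Swapping the model tag passed to ollama.chat (for example, to llava or mistral) would exercise the other LLMs the abstract mentions; a LLaVA variant could even consume the image directly rather than the YOLO/CLIP summary.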

Country: India

Aditya Arolkar¹, Dhaval Smart², Gaurav Waghmare³, Pratham Atale⁴, Prof. Sonali Despande⁵

  1. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  2. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  3. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  4. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  5. Professor, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India

IRJIET, Volume 9, Issue 4, April 2025, pp. 140–146

https://doi.org/10.47001/IRJIET/2025.904021

References

  1. YOLO (You Only Look Once): Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016). You Only Look Once: Unified, Real-Time Object Detection. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 779–788.
  2. CLIP (Contrastive Language-Image Pre-training): Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning Transferable Visual Models From Natural Language Supervision. arXiv:2103.00020.
  3. Streamlit: Streamlit documentation. Retrieved from https://streamlit.io/
  4. Ollama: Ollama documentation. Retrieved from https://ollama.com/
  5. Transformers Library: Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., … & Rush, A. M. (2020). Transformers: State-of-the-Art Natural Language Processing. arXiv:1910.03771.
  6. PyTorch: PyTorch documentation. Retrieved from https://pytorch.org/
  7. NumPy: Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020).
  8. PIL (Pillow): Pillow documentation. Retrieved from https://pillow.readthedocs.io/en/stable/
  9. OpenCV: OpenCV documentation. Retrieved from https://opencv.org/
  10. LLaMA 3, LLaVA, Mistral, Gemma, Phi: Touvron, H., Martin, L., Stone, K., Albert, P., Almahairi, A., Babaei, Y., … & Scialom, T. (2023). Llama 2: Open Foundation and Fine-Tuned Chat Models. arXiv:2307.09288.