Real Time Voice Cloning System

Abstract

Real-time voice cloning, an emerging area of artificial intelligence, has witnessed significant advances in recent years owing to the rapid progress of deep learning techniques. This survey paper delves into real-time voice cloning systems that employ deep learning methodologies. The ability to generate highly realistic, natural-sounding speech from limited audio samples has garnered attention due to its potential applications in entertainment, assistive technology, virtual assistants, and more. The survey provides an in-depth analysis of the key components and techniques employed in real-time voice cloning systems. We explore neural network architectures that have been applied to voice cloning, including convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs). Additionally, we investigate the role of different training paradigms, including supervised, semi-supervised, and unsupervised learning, and discuss their implications for cloning accuracy and efficiency. Furthermore, the paper examines datasets used for training and evaluation, ranging from large-scale multilingual corpora to more specialized speech datasets. The surveyed framework can clone voices not encountered during training and generate speech from previously unseen text.

Country: India

Shruti Parshuram Kambali¹, Ansari Majid Ali², Priyanshi Upendra Srivastav³, Aryan Manish Dandwekar⁴, Dr. Radhika Nanda⁵

  1. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  2. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  3. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  4. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India
  5. Professor, Dept. of AI & ML, Smt. Indira Gandhi College of Engineering, Ghansoli, New Mumbai, Maharashtra, India

IRJIET, Volume 7, Issue 10, October 2023 pp. 294-303

DOI: https://doi.org/10.47001/IRJIET/2023.710038
