SinLingua: Python Library for Sinhala Data Processing

Abstract

This paper presents SinLingua, a Python library for Sinhala Natural Language Processing (NLP). The work covers four areas: Singlish to Sinhala conversion, Sinhala text data cleaning and pre-processing, Sinhala grammar correction, and Sinhala text summarization and translation. Each component is designed to prioritize accuracy, speed, customization, and user experience. The Singlish to Sinhala converter recognizes Singlish text and converts it into formal Sinhala, addressing the scarcity of existing tools for this task. The text cleaning and pre-processing component uses optimized rule-based mechanisms to handle the morphological complexity of the Sinhala language. The grammar checker transforms informal Sinhala sentences into formal ones. Finally, the summarization and translation module condenses Sinhala articles and can translate the result into English; it exposes customization options for summarization, such as word count limits and the choice of whether to translate. The results are promising, and the work identifies directions for future improvement, particularly the handling of complex grammatical structures and broader user customization features.
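As an illustration of the kind of rule-based cleaning the abstract describes, the following is a minimal sketch and not SinLingua's actual API. The function name `clean_sinhala_text` and the set of retained punctuation marks are assumptions made for this example. It relies only on the fact that the Sinhala Unicode block spans U+0D80 to U+0DFF and that the zero-width joiner (U+200D) is used in Sinhala conjunct forms.

```python
import re

# Anything that is not in the Sinhala block (U+0D80..U+0DFF), not the
# zero-width joiner (U+200D), not whitespace, and not one of a few basic
# punctuation marks is treated as noise. The punctuation set is an
# assumption for this sketch.
_NON_SINHALA = re.compile(r"[^\u0D80-\u0DFF\u200D\s.,?!]")
_WHITESPACE = re.compile(r"\s+")


def clean_sinhala_text(text: str) -> str:
    """Hypothetical helper: keep Sinhala characters and basic punctuation."""
    # Replace unwanted characters with a space so adjacent words stay separated.
    kept = _NON_SINHALA.sub(" ", text)
    # Collapse the runs of whitespace left behind and trim both ends.
    return _WHITESPACE.sub(" ", kept).strip()


if __name__ == "__main__":
    print(clean_sinhala_text("Hello මම  ගෙදර 123 යනවා!"))
```

A real pre-processing pipeline would likely add further rules (for example, normalizing vowel-sign sequences or handling mixed-script tokens); this sketch only shows the character-level filtering step.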

Country: Sri Lanka

1 Supun Sameera, 2 Sandaruwini Galappaththi, 3 Sarada Wijesinghe, 4 Binura Yasodya, 5 Anjalie Gamage, 6 Bhagyanie Chathurika

  1. Department of Information Technology, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
  2. Department of Information Technology, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
  3. Department of Information Technology, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
  4. Department of Information Technology, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
  5. Department of Information Technology, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka
  6. Department of Information Technology, Sri Lanka Institute of Information Technology, Malabe, Sri Lanka

IRJIET, Volume 7, Issue 10, October 2023 pp. 97-107

doi.org/10.47001/IRJIET/2023.710013
