Recipe Decoder

Abstract

Automated recipe generation from food images remains challenging for diverse cuisines such as Indian food, which involves intricate spice combinations and regional variations. This paper proposes Recipe Decoder, a multimodal system that leverages a custom EfficientNet-B4 model for dish classification and the Gemini API for context-aware recipe generation, augmented by the Spoonacular API for recipe exploration. Our approach addresses three key gaps: (1) accurate identification of visually similar Indian dishes (e.g., differentiating roti from kulcha), (2) culturally appropriate ingredient-to-instruction translation, and (3) real-time integration of user preferences.

The system achieves 92% validation accuracy on a dataset of 2,000 Indian food images, outperforming a ResNet-50 baseline. Recipe generation employs prompt engineering with Gemini to convert predicted dish classes into structured cooking steps. The front-end interface is developed with React (scaffolded with Vite) and styled with Tailwind CSS and DaisyUI, providing a responsive and visually appealing user experience that reduces search time by 40% compared to traditional keyword-based systems.
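The prompt-engineering step can be illustrated with a small sketch. The template wording and preference fields below are illustrative assumptions, not the paper's actual prompts:

```python
# Sketch: turning a predicted dish class into a structured recipe prompt.
# The template and the preference fields ("diet", "servings") are assumptions.

def build_recipe_prompt(dish, preferences=None):
    """Compose a structured prompt asking for ingredients and numbered steps."""
    prefs = preferences or {}
    lines = [
        f"You are an expert in Indian cuisine. Generate a recipe for: {dish}.",
        "Return the answer in two sections:",
        "1. Ingredients (with quantities)",
        "2. Instructions (numbered cooking steps)",
    ]
    if prefs.get("diet"):
        lines.append(f"Dietary constraint: {prefs['diet']}.")
    if prefs.get("servings"):
        lines.append(f"Scale quantities for {prefs['servings']} servings.")
    return "\n".join(lines)

prompt = build_recipe_prompt("paneer butter masala",
                             {"diet": "vegetarian", "servings": 4})
# The resulting string would then be sent to the Gemini API; the structured
# sections make the model's output easy to parse into an ingredients list
# and step-by-step instructions for display in the front end.
print(prompt)
```

Constraining the output format in the prompt is what lets the predicted class label be converted into the "structured cooking steps" described above.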

This work advances culinary AI by establishing benchmarks for ethnic cuisine analysis and by introducing a hybrid architecture that combines a convolutional vision model with a large language model. Future extensions could enable dietary customization and video-based cooking assistance.

Country: India

Harshita Sonkar¹, Laxmi Pawar², Akanksha Puri³, Rahul Gupta⁴, Prof. Sonali Deshpande⁵

  1. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  2. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  3. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  4. Student, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India
  5. Professor, Head of Dept. of AIML, Smt. Indira Gandhi College of Engineering, Ghansoli, Navi Mumbai, Maharashtra, India

IRJIET, Volume 9, Issue 4, April 2025 pp. 61-74

doi.org/10.47001/IRJIET/2025.904009
