
Medical nearest-word embedding technique implemented using an unsupervised machine learning approach for Bengali language

Open Access | Jun 2024

Abstract

The rapid growth of natural language processing (NLP) applications, such as text summarization, speech recognition, information extraction, and machine translation, has created strong interest in translating natural language (NL) queries into structured query language (SQL) for extracting information from structured data. However, owing to limited linguistic resources, converting NL queries to SQL in Bengali is challenging. This article proposes an unsupervised machine learning model that finds semantically close Bengali words and uses them to generate SQL from Bengali NL queries. The main objective of the proposed system is to support the creation of patient-oriented explanations and educational resources by simplifying intricate medical terminology. The major findings of the proposed system are as follows: applying machine translation in the medical domain facilitates the dissemination of healthcare information to a diverse international audience and improves the performance of entity recognition tasks, such as identifying medical conditions, drugs, or procedures in clinical notes or electronic health records. The system allows a naive user to extract health-related information from a structured healthcare database without any knowledge of SQL: it accepts a query in Bengali and generates a response to that query in Bengali. Query tokenization and stop-word removal are carried out in the preprocessing stage, and unsupervised machine learning techniques are applied to the input query sentence. Tokenized words are converted into vectors using the skip-gram model, with noise-contrastive estimation (NCE) applied to discriminate between actual context words and irrelevant (noise) words. Stochastic gradient descent (SGD) optimizes the model on small, randomly chosen subsets of the dataset, and cosine similarity is used to measure how close two word vectors are. The semantically closest words identified by this unsupervised method are then used to generate the SQL.
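The following is a minimal illustrative sketch, not the authors' implementation, of the embedding and nearest-word step described in the abstract. It trains skip-gram vectors on a tiny, invented set of tokenized Bengali queries and retrieves the nearest words by cosine similarity; gensim's negative-sampling objective stands in here for the NCE loss used in the paper, and the corpus, stop-word handling, and query token are placeholders.

```python
# Sketch only: skip-gram embeddings over tokenized, stop-word-filtered Bengali
# queries, with nearest words ranked by cosine similarity. gensim's negative
# sampling is used as a stand-in for the NCE objective; the corpus is invented.
from gensim.models import Word2Vec

# Hypothetical preprocessed corpus: each query already tokenized, with Bengali
# stop words removed (English glosses in comments).
corpus = [
    ["রোগী", "জ্বর", "ওষুধ"],        # "patient", "fever", "medicine"
    ["রোগী", "রক্তচাপ", "রিপোর্ট"],   # "patient", "blood pressure", "report"
    ["ডাক্তার", "ওষুধ", "পরামর্শ"],   # "doctor", "medicine", "advice"
]

# Skip-gram (sg=1) with negative sampling; vectors updated by SGD-style steps.
model = Word2Vec(
    sentences=corpus,
    vector_size=100,   # embedding dimension
    window=2,          # context window around each target word
    min_count=1,       # keep every token in this tiny example
    sg=1,              # skip-gram architecture
    negative=5,        # negative samples per positive pair
    epochs=50,
)

# Nearest words to a query token, ranked by cosine similarity of their vectors.
print(model.wv.most_similar("ওষুধ", topn=3))
```

In the proposed pipeline, the words returned this way would then be matched against database attribute and value names to assemble the SQL query.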

Language: English
Submitted on: May 13, 2023
Published on: Jun 12, 2024
Published by: Professor Subhas Chandra Mukhopadhyay
In partnership with: Paradigm Publishing Services
Publication frequency: 1 issue per year

© 2024 Kailash Pati Mandal, Prasenjit Mukherjee, Devraj Vishnu, Baisakhi Chakraborty, Tanupriya Choudhury, Pradeep Kumar Arya, published by Professor Subhas Chandra Mukhopadhyay
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.