Fayaz Ahamed, Shaik and Karuppasamy, Sundarakumar and Chinnaiyan, Ponnuraja (2025) Clinical Text Classification for Tuberculosis Diagnosis Using Natural Language Processing and Deep Learning Model with Statistical Feature Selection Technique. Clinical Text Classification for Tuberculosis Diagnosis Using Natural Language Processing and Deep Learning Model with Statistical Feature Selection Technique, 12(3) (64).
Abstract
Background: In the medical field, various deep learning (DL) algorithms have been effectively used to extract valuable information from unstructured clinical text data, potentially leading to more effective outcomes. This study utilized clinical text data to classify clinical case reports into tuberculosis (TB) and non-tuberculosis (non-TB) groups using natural language processing (NLP), a pre-processing technique, and DL models. Methods: This study used 1743 open-source respiratory disease clinical text data, labeled via fuzzy matching with ICD-10 codes to create a labeled dataset. Two tokenization methods preprocessed the clinical text data, and three models were evaluated: the existing Text-CNN, the proposed Text-CNN with t-test, and Bio_ClinicalBERT. Performance was assessed using multiple metrics and validated on 228 baseline screening clinical case text data collected from ICMR–NIRT to demonstrate effective TB classification. Results: The proposed model achieved the best results in both the test and validation datasets. On the test dataset, it attained a precision of 88.19%, a recall of 90.71%, an F1-score of 89.44%, and an AUC of 0.91. Similarly, on the validation dataset, it achieved 100% precision, 98.85% recall, 99.42% F1-score, and an AUC of 0.982, demonstrating its effectiveness in TB classification. Conclusions: This study highlights the effectiveness of DL models in classifying TB cases from clinical notes. The proposed model outperformed the other two models. The TF-IDF and t-test showed statistically significant feature selection and enhanced model interpretability and efficiency, demonstrating the potential of NLP and DL in automating TB diagnosis in clinical decision settings.
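The abstract names TF-IDF with a t-test as the feature selection step but does not give its details. The sketch below is one plausible reading, not the authors' implementation: it computes smoothed TF-IDF weights for tokenized notes and ranks each term by the absolute value of Welch's t statistic between the TB and non-TB groups. The choice of Welch's variant, the IDF smoothing, the function names, and the four toy notes are all assumptions made for illustration.

```python
import math
from collections import Counter
from statistics import mean, variance


def tfidf(docs):
    """Return (vocab, rows): one TF-IDF row per tokenized document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (an assumption): terms found in every note keep weight 1.
    idf = {tok: math.log((1 + n) / (1 + df[tok])) + 1 for tok in vocab}
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[tok] / len(doc) * idf[tok] for tok in vocab])
    return vocab, rows


def welch_t(a, b):
    """Welch's t statistic for two samples with at least two values each."""
    diff = mean(a) - mean(b)
    se2 = variance(a) / len(a) + variance(b) / len(b)
    if se2 == 0:
        # Both groups are constant: no spread to scale the difference by.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(se2)


def rank_terms(docs, labels):
    """Rank terms by |t| between label-1 (TB) and label-0 (non-TB) notes."""
    vocab, rows = tfidf(docs)
    scores = {}
    for j, tok in enumerate(vocab):
        pos = [row[j] for row, y in zip(rows, labels) if y == 1]
        neg = [row[j] for row, y in zip(rows, labels) if y == 0]
        scores[tok] = welch_t(pos, neg)
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Hypothetical toy notes, invented for illustration (not from the study data).
docs = [
    ["cough", "fever", "sputum"],
    ["cough", "sputum", "sputum", "sweats"],
    ["cough", "wheeze", "asthma"],
    ["cough", "wheeze", "wheeze", "inhaler"],
]
labels = [1, 1, 0, 0]  # 1 = TB, 0 = non-TB

ranked = rank_terms(docs, labels)
for term, t in ranked:
    print(f"{term:8s} t = {t:+.2f}")
```

On these toy notes, terms that appear in only one group ("sputum", "wheeze") rank highest, while "cough", which is spread evenly across both groups, scores zero. A real pipeline would keep only the terms whose t-test is significant at a chosen threshold and pass them to the classifier.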
Affiliation: | ICMR-National Institute for Research in Tuberculosis |
---|---|
Item Type: | Article |
Uncontrolled Keywords: | tuberculosis; electronic health record; deep learning; text convolutional neural network; Bio_ClinicalBERT; natural language processing |
Subjects: | Tuberculosis > Biostatistics Tuberculosis |
Divisions: | Statistics |
Depositing User: | Mrs. N Lakshmi |
Date Deposited: | 11 Jul 2025 08:36 |
Last Modified: | 11 Jul 2025 08:36 |
URI: | http://eprints.nirt.res.in/id/eprint/2045 |