%0 Journal Article
%T Smart Approaches to Efficient Text Mining for Categorizing Sexual Reproductive Health Short Messages into Key Themes
%A Tobias Makai
%A Mayumbo Nyirenda
%J Open Journal of Applied Sciences
%P 511-532
%@ 2165-3925
%D 2024
%I Scientific Research Publishing
%R 10.4236/ojapps.2024.142037
%X To promote behavioral change
among adolescents in Zambia, the National HIV/AIDS/STI/TB Council, in collaboration
with UNICEF, developed the Zambia U-Report platform. This platform provides
young people with improved access to information on various Sexual Reproductive Health topics through Short
Messaging Service (SMS) messages. Over the years, the platform has accumulated
millions of incoming and outgoing messages, which need to be categorized into
key thematic areas for better tracking of sexual reproductive health knowledge
gaps among young people. The current manual categorization of these text
messages is inefficient and time-consuming, so this study aims to
automate the process for improved analysis using text-mining techniques.
First, the study investigates the current text message categorization process
and identifies a list of categories adopted by counselors over time, which are
then used to build and train a categorization model. Second, the study
presents a proof-of-concept tool that automates the categorization of U-Report
messages into key thematic areas using the developed categorization model.
Finally, it compares the performance and effectiveness of the developed
proof-of-concept tool against the manual system. The study used a dataset comprising
206,625 text messages. The current process would take roughly 2.82 years to
categorize this dataset, whereas the trained support vector machine (SVM) model
would require only 6.4 minutes while achieving an accuracy of 70.4%,
demonstrating that the automated method is significantly faster, more scalable,
and more consistent than the current manual categorization. These advantages make
the SVM model a more efficient and effective tool for categorizing large
unstructured text datasets.
These results and the proof-of-concept tool demonstrate the potential
for enhancing the efficiency and accuracy of message categorization on the
Zambia U-Report platform and other similar text-message-based platforms.
%K Knowledge Discovery in Text (KDT)
%K Sexual Reproductive Health (SRH)
%K Text Categorization
%K Text Classification
%K Text Extraction
%K Text Mining
%K Feature Extraction
%K Automated Classification Process
%K Performance
%K Stemming and Lemmatization
%K Natural Language Processing (NLP)
%U http://www.scirp.org/journal/PaperInformation.aspx?PaperID=131457