Abstract
HIV/AIDS remains a major global health challenge, with Sub-Saharan Africa carrying the highest burden. In Kenya, where adult prevalence is 4.3%, interruption in treatment (IIT) continues to undermine antiretroviral therapy (ART) outcomes. This study applied machine learning (ML) to identify predictors of IIT and guide interventions in Machakos County, where prevalence is 3.3% and facilities rely on manual appointment management, physical tracing, and phone tracing of patients. A retrospective cross-sectional study used secondary data from KenyaEMR covering 14,339 adults on ART between 2020 and 2024. Data preprocessing included cleaning, anonymization, imputation, encoding, LASSO feature selection, and SMOTE oversampling. Descriptive statistics and chi-square tests assessed associations, while Random Forest (RF), XGBoost, and Support Vector Machine (SVM) models were trained and validated to predict IIT. Overall, 910 patients (6%) experienced IIT. Risk was highest among adolescents and young adults (15-24 years), single individuals, urban residents, patients with viral load ≥1000 copies/mL, those on ART for <12 months, TB co-infected patients, and non-DTG regimen users. Poor adherence, unstable status, lack of phone ownership, and shorter refill durations also predicted IIT. Non-significant factors included sex, CD4 count, counseling, and clinic workload. Among the models, RF achieved the best performance (recall 0.97, precision 0.87, F1 0.92, AUROC 0.96, accuracy 0.91), outperforming XGBoost and SVM. IIT in Machakos County is shaped by demographic, clinical, socioeconomic, and health system factors. Random Forest showed the best predictive capacity, highlighting the value of ML for early identification of at-risk patients. Strategies should include DTG scale-up, early retention support, multi-month dispensing, and digital health interventions. Integrating predictive analytics into EMRs can strengthen HIV program outcomes.
Keywords
HIV/AIDS Treatment Interruption, Antiretroviral Therapy Adherence, Machine Learning in Healthcare, XGBoost, Random Forest, Support Vector Machine, Electronic Medical Records, Machakos County
1. Introduction
HIV/AIDS remains one of the most difficult health problems worldwide. According to the Joint United Nations Programme on HIV/AIDS (UNAIDS), approximately 37.9 million individuals globally were living with HIV by the end of 2018, 1.7 million individuals contracted HIV, and 770,000 lost their lives to AIDS-related complications.
As of 2020, 27.5 million individuals had been put on antiretroviral therapy (ART). According to 2019 global data, Sub-Saharan Africa accounted for 25.7 million of the individuals living with HIV: around 20.7 million in Eastern and Southern Africa and 4.9 million in Western and Central Africa. Around 730,000 new infections were recorded in Eastern and Southern Africa and 240,000 in Western and Central Africa. Deaths due to HIV/AIDS numbered 300,000 in Eastern and Southern Africa and 140,000 in Western and Central Africa [14]. Kenya, situated within this region, has a particularly high burden of HIV/AIDS, with an estimated 1.5 million people living with the virus in 2018.
In 2020, adult HIV/AIDS prevalence in Kenya was 4.3%, higher among women (5.5%) than men (2.9%). In 2014, Kenya committed to improving HIV/AIDS prevention interventions by launching the Kenya HIV/AIDS Prevention Revolution Roadmap 2030. The country's goal was to reduce new HIV/AIDS infections by 75%. This was to be made possible through the Kenya AIDS Strategic Framework 2014/15-2018/19. Despite these interventions, the country had not realized its HIV/AIDS prevention objective by 2019 and has reprioritized lowering new HIV/AIDS infections as the major goal of the Kenya AIDS Strategic Framework II, 2019/20-2024/25. Thirteen counties contributed a large proportion of the country's new infections; Machakos County was one of them, with 72% of new infections. Additionally, twenty-two counties, including Machakos County, had an elevated HIV/AIDS prevalence in both key populations (KPs) and the general population. Machakos County also did not achieve the global and national targets of reducing HIV/AIDS infections by 72% between 2013 and 2021 [8]. Additionally, Machakos County has an HIV/AIDS prevalence of 3.3%. This prevalence also needs to be reduced to 0% to help lower the HIV/AIDS transmission rate within the region [9].
Treatment interruptions, defined as disruptions or irregularities in HIV/AIDS medication adherence, pose severe risks to both individual health outcomes and broader public health objectives. These interruptions can lead to drug resistance, loss of CD4 cells, increased morbidity and mortality, disease progression, and increased HIV transmission in the community [13]. HIV drug resistance arises from the virus's high mutation rate and rapid replication, driven by the error-prone reverse transcriptase enzyme that introduces genetic variations during replication [22, 23]. These mutations can alter viral enzymes and proteins, reducing the effectiveness of antiretroviral drugs and leading to treatment failure [24, 25]. Inconsistent adherence or suboptimal drug levels further select for resistant strains, increasing viral load and heightening the risk of interruption in treatment (IIT) [26]. Machine learning (ML) models, such as Random Forest, XGBoost, and SVM, can predict these dynamics by analyzing complex interactions among viral, clinical, and behavioral variables such as viral load trends, ART regimen types, adherence patterns, and demographic factors. By identifying early warning signs of resistance and potential treatment failure, ML models enable timely interventions, including regimen adjustments and targeted adherence support. Integrating these predictive tools into HIV programs can therefore help mitigate drug resistance and reduce IIT, ultimately improving long-term ART outcomes.
HIV/AIDS programs have invested considerable effort in managing and controlling the epidemic. However, treatment discontinuation among patients on antiretroviral therapy (ART) has undermined these efforts and leads to viral rebound.
Specific patient characteristics are associated with IIT, including age, gender, CD4 cell count, and education level [15]. Medication-related factors, the healthcare system, psychosocial reasons, employment status, and poverty are also linked to treatment interruptions. Therefore, perfect treatment adherence requires continuous availability of drugs, patient education on HIV/AIDS treatment, and guidance on managing side effects [16].
HIV/AIDS treatment interruptions need to be addressed to protect patients from premature death due to the infection. One way to address them is machine learning: a client who is likely to interrupt treatment is detected early, and strategies are put in place to prevent the interruption before it happens. Machine learning is the scientific study of algorithms and statistical models that enable computer systems to perform tasks without being explicitly programmed. Machine learning approaches have been applied to reduce the rate of treatment interruptions, improve patients' adherence to treatment for better health outcomes, and reduce transmission. Binary classification techniques involving neural networks, tree-based models, and logistic regression have been used to predict patients' interruption in treatment and demonstrated high prediction accuracy. A boosted tree model has also been used to predict treatment interruptions with good performance, and a random forest model likewise performed well in predicting IIT [17]. Boosted trees and Extreme Gradient Boosting have also been employed to predict future interruptions [18]. Esra et al. argue that machine learning models help retain patients on treatment and enhance the effectiveness of HIV/AIDS care interventions, since they can identify patients who are likely to become IIT early enough, and the models performed well [19].
ML approaches have also been employed for prediction in other fields. For instance, Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Logistic Regression (LR), XGBoost (XGB), Gradient Boosting Classifier (GBC), Artificial Neural Network (ANN), and Support Vector Machine (SVM) have been used to predict mental health [4]. The XGBoost classifier and SVM have also been used to predict COVID-19 severity, where XGBoost achieved 97% accuracy and 98% precision, while SVM achieved 97% accuracy and 96% precision. Extreme Gradient Boosting, multivariate linear regression, and random forest have been used to predict daily rainfall amounts, with Extreme Gradient Boosting performing better than the others. DT, SVM, Gradient Boosting (GB), LR, K-Nearest Neighbor (KNN), and RF have all been applied to diabetes prediction, with Random Forest achieving the best accuracy. Naïve Bayes and random forest have also been used to detect the presence of diabetes, heart disease, and cancer, with a performance analysis conducted [20]. Owing to its simplicity and flexibility in classification tasks, SVM has also been used to predict the diagnosis and prognosis of brain diseases. Various techniques have been used to identify the features or factors associated with IIT. One of these is the Least Absolute Shrinkage and Selection Operator (LASSO). It performs both variable selection and regularization by penalizing the absolute size of the regression coefficients, effectively shrinking less important coefficients to zero, which can improve model performance metrics. LASSO regression is well suited to feature selection because it can handle high-dimensional data and improves model interpretability. This helps identify the most relevant predictors while minimizing overfitting. Compared to traditional methods, it enhances model efficiency, especially when multicollinearity exists among variables, and ensures that only the most significant features contribute to the predictive performance of the model [21]. Machine learning models have thus performed well in prediction tasks across different fields. They can therefore be applied in this study to predict interruption in HIV/AIDS treatment, helping to retain clients in care and prevent loss of life due to HIV/AIDS infection.
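For reference, the LASSO penalty described above is usually written as the following objective. This is the standard textbook form for a linear model, not necessarily the exact variant used in this study (a logistic version would replace the squared-error term). The symbols $n$ (number of patients), $p$ (number of candidate predictors), $x_{ij}$, $y_i$, and the penalty strength $\lambda$ are notation introduced here for illustration.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
\left\{
  \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
\right\}
```

Larger values of $\lambda$ push more coefficients $\beta_j$ exactly to zero; the predictors whose coefficients remain non-zero are the ones retained by the feature selection step.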
The study area for this research is Machakos County, located in Kenya's Lower Eastern Region. Machakos County mirrors the broader challenges faced in combating HIV/AIDS. However, specific factors contributing to treatment interruptions in this area may differ from those observed in other regions of the country. Sociodemographic characteristics, clinical factors, socioeconomic status, access to healthcare services, and the organization of the health system are among the many factors that play crucial roles in determining treatment adherence among HIV/AIDS patients.
By looking at Machakos County, this study intends to improve the global knowledge on HIV/AIDS treatment interruptions while concurrently informing evidence-based interventions tailored to the local environment. Insights gathered from this study possess the potential to empower policymakers, healthcare providers, and public health practitioners to craft targeted strategies aimed at enhancing treatment adherence, strengthening health outcomes, and ultimately propelling progress towards ending the epidemic by 2030. This study aspires to chart a course towards more effective HIV/AIDS management models, not only within Machakos County but also across broader geographical and sociocultural contexts.
Therefore, this study aims to fill a gap in current HIV/AIDS research by applying machine learning models to predict treatment interruptions among patients in Machakos County, Kenya. This was achieved by identifying machine learning algorithms suitable for predicting HIV/AIDS treatment interruption, selecting important predictor variables for IIT using feature selection, building the models for predicting IIT, and comparing the models' performance in predicting IIT among ART patients in Machakos County. This research seeks to inform evidence-based interventions aimed at managing interruptions in treatment and improving health outcomes for HIV/AIDS patients in the region.
2. Equations
1) ROC-AUC (Area Under the Receiver Operating Characteristic Curve)
The AUROC is a performance metric that shows how well the model distinguishes between the positive class (IIT) and the negative class (non-IIT) at various threshold settings. The ROC (Receiver Operating Characteristic) curve plots the True Positive Rate (Recall) against the False Positive Rate (1 - Specificity).
$$\mathrm{TPR} = \frac{TP}{TP + FN}, \qquad \mathrm{FPR} = \frac{FP}{FP + TN}$$
2) Precision (Positive Predictive Value): It is the proportion of true positive predictions out of all predictions where the model predicted the positive class (IIT).
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
3) Recall (Sensitivity, True Positive Rate): It is the proportion of actual positive cases (IIT) that the model correctly identifies.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
4) F1-Score (Harmonic Mean of Precision and Recall): It is the harmonic mean of Precision and Recall, providing a single metric that balances the two.
$$F_1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
5) Accuracy: It is the proportion of all correctly predicted cases (both true positives and true negatives) out of the total number of predictions.
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Where:
TP = True Positives (correctly predicted IIT)
TN = True Negatives (correctly predicted non-IIT)
FP = False Positives (non-IIT incorrectly predicted as IIT)
FN = False Negatives (IIT incorrectly predicted as non-IIT)
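The metrics defined in this section can be computed directly from confusion-matrix counts. The sketch below is a minimal plain-Python illustration, not the study's code: the function names and the example counts are hypothetical, and the AUROC is computed with the rank-based (pairwise comparison) formulation, which is equivalent to the area under the ROC curve.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute precision, recall, F1, and accuracy from confusion-matrix counts.

    IIT is treated as the positive class.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


def auroc(labels, scores):
    """AUROC as the probability that a random IIT case receives a higher
    score than a random non-IIT case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical counts, for illustration only (not the study's confusion matrix).
m = classification_metrics(tp=97, fp=13, fn=3, tn=87)
print({name: round(value, 3) for name, value in m.items()})

# Four toy predictions: labels are 1 = IIT, 0 = non-IIT.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUROC is quadratic in the number of cases, which is fine for a sketch; library implementations typically sort the scores instead.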
3. Materials and Methods
3.1. Data Source and Study Objective
The data was obtained from an electronic medical record system (KenyaEMR) used in public care and treatment facilities within Machakos County, Kenya, and was then exported to Excel. The dataset consisted of 14,339 patients aged ≥18 years on ART between January 2020 and December 2024. Variables included demographic (age, marital status, sex, residence); clinical/treatment (CD4, viral load, ART duration, TB status, regimen, categorization, adherence); socioeconomic (education, employment, mobile phone access); and health system (multi-month dispensing (MMD), dispensing site, differentiated care (DC) model, counseling, quality of care (QOC), healthcare worker (HCW) workload) factors. The outcome was ART status (Active vs. IIT). The primary objective of the study was to apply advanced machine learning models, including Support Vector Machine, XGBoost, and Random Forest, to predict interruptions in HIV/AIDS treatment among patients in Machakos County, with the aim of reducing the rate of treatment interruption.
3.2. Data Preprocessing
The exported Excel file contained anonymized, patient-level data. Anonymization involved generalization, aggregation, and deletion of personally identifiable information. Missing values were handled by mode imputation, and categorical predictors were converted by label encoding. The Synthetic Minority Oversampling Technique (SMOTE) was used to correct class imbalance in the dataset, and LASSO was used to select key predictors.
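To make the preprocessing steps concrete, the sketch below shows mode imputation, label encoding, and the interpolation idea behind SMOTE in plain Python. It is an illustration under stated assumptions rather than the study's pipeline: the function names, the toy values, and the choice of `k=2` neighbours are all hypothetical, and in practice a library implementation would normally be used. LASSO selection is not shown here.

```python
import random
from collections import Counter


def impute_mode(column):
    """Replace None entries with the most frequent observed value."""
    mode = Counter(v for v in column if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in column]


def label_encode(column):
    """Map each category to an integer code (codes follow sorted order)."""
    codes = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [codes[v] for v in column], codes


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic


# Toy data, invented for illustration only.
residence = impute_mode(["Urban", None, "Rural", "Urban"])
encoded, codes = label_encode(residence)
new_rows = smote([[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]], n_new=4)
print(residence, encoded, codes, len(new_rows))
```

One design point worth noting: plain SMOTE interpolates numeric values, so applying it to label-encoded categorical predictors can produce non-integer codes. Variants designed for categorical features exist for that reason; which variant the study used is not stated.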
3.3. Data Analysis
A chi-square test was used to analyze categorical variables and determine the factors associated with IIT.
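As a worked example of the chi-square test, the sketch below recomputes the statistic for residence from the IIT counts reported in Table 2 (307 of 8,049 rural and 603 of 6,290 urban patients). The function is a hypothetical helper written for this illustration, and the use of Yates' continuity correction is an assumption: the text does not say whether a correction was applied, although with it the result comes out close to the 196.98 reported in Table 2.

```python
def chi_square_2x2(a, b, c, d, yates=True):
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]].

    With yates=True, the continuity correction commonly used for
    2x2 tables is applied (an assumption in this sketch).
    """
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


# Residence counts from Table 2 (IIT / total per category).
rural_iit, rural_total = 307, 8049
urban_iit, urban_total = 603, 6290
chi2 = chi_square_2x2(
    rural_iit, rural_total - rural_iit,
    urban_iit, urban_total - urban_iit,
)
print(round(chi2, 2))
```

Factors with more than two categories (for example age or employment) need the general r x c form of the statistic, for which no continuity correction is applied.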
3.4. Model Development
The processed data was divided 80:20 (train:test) using stratified sampling. Random Forest, XGBoost, and SVM were then trained on the training set. Hyperparameter tuning used grid search with cross-validation. Training time was recorded for each model, and each model's ability to predict IIT was assessed on the new, unseen test data.
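The stratified 80:20 split described above can be sketched in plain Python as follows. This is a minimal illustration, not the study's code: the function name, the seed, and the toy labels are hypothetical, and grid search with cross-validation is not shown.

```python
import random


def stratified_split(labels, test_fraction=0.2, seed=42):
    """Return (train_idx, test_idx) so that each class keeps roughly the
    same share in the training and test parts."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels with a 6% positive class: 1 = IIT, 0 = active.
labels = [1] * 60 + [0] * 940
train_idx, test_idx = stratified_split(labels)
test_iit = sum(labels[i] for i in test_idx)
print(len(train_idx), len(test_idx), test_iit)
```

Stratifying matters here because IIT is the minority class (about 6% of patients); a purely random split could leave the test set with too few IIT cases to estimate recall reliably.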
3.5. Model Evaluation
Evaluation metrics included accuracy, precision, recall, F1-score, and AUROC. Each model was assessed using these metrics to determine how well it predicts IIT. The metrics are defined in Section 2.
4. Results
Descriptive Statistics: The study analyzed 14,339 patients, of whom 94% were active on ART and 6% had experienced an interruption in treatment (IIT). The majority were aged ≥40 years (61.8%), female (69.0%), and married (59.8%), and most resided in rural areas (56.1%). Clinically, most participants had CD4 ≥200 cells/mm³ (85.9%), a suppressed viral load of <50 copies/mL (88.4%), and had been on ART for ≥12 months (93.7%). TB co-infection was rare (2.0%), while nearly all patients were on DTG-based regimens (98.0%), with stable categorization (95.3%) and good adherence (95.3%). Socioeconomically, secondary education was most common (71.8%), with farming (50.7%) and trading (32.3%) as the predominant occupations. Mobile phone ownership was reported by 62.9% of patients. In terms of service delivery, the largest group received ART through <3-month MMD (43.2%), and most accessed fast-track differentiated care (75.2%); counseling and quality-of-care services were universally available. However, 71.2% of patients attended clinics with high patient loads (>50 clients per clinic day).
Table 1. Descriptive statistics of factors associated with IIT.
Demographic Factors |
Age category | Frequency | Percentage |
Total | 14339 | 100 |
40+yrs | 8858 | 61.78 |
25-39yrs | 3831 | 26.72 |
15-24yrs | 910 | 6.35 |
0-14yrs | 740 | 5.16 |
Marital Status |
Married | 8571 | 59.77 |
Single | 3253 | 22.69 |
Widowed | 1503 | 10.48 |
Divorced | 1012 | 7.06 |
Sex |
F | 9897 | 69.02 |
M | 4442 | 30.98 |
Residence |
Rural | 8049 | 56.13 |
Urban | 6290 | 43.87 |
Clinical and treatment-related factors |
CD4_Category | Frequency | Percentage |
Total | 14339 | 100 |
200+ | 12316 | 85.89 |
below 200 | 2023 | 14.11 |
VL category |
0-49cps | 12680 | 88.43 |
1000+cps | 679 | 4.74 |
50-199cps | 614 | 4.28 |
200-999cps | 366 | 2.55 |
ART Duration |
12+months | 13431 | 93.67 |
Below 12 months | 908 | 6.33 |
TB Status |
No TB | 14047 | 97.96 |
Has TB | 292 | 2.04 |
Current Regimen type |
DTG Regimen | 14046 | 97.96 |
Non DTG Regimen | 293 | 2.04 |
Patient Categorization |
Stable | 13668 | 95.32 |
Unstable | 671 | 4.68 |
ART_Adherence |
Good | 13668 | 95.32 |
Poor | 671 | 4.68 |
Socioeconomic and behavioral factors |
Education Level | Frequency | Percentage |
Total | 14339 | 100 |
Secondary | 10298 | 71.82 |
Primary | 3123 | 21.78 |
Higher | 780 | 5.44 |
None | 138 | 0.96 |
Employment |
Farmer | 7273 | 50.72 |
Trader | 4624 | 32.25 |
None | 964 | 6.72 |
Formal Employment | 891 | 6.21 |
Student | 492 | 3.43 |
Driver | 95 | 0.66 |
Active Mobile Phone |
Yes | 9019 | 62.9 |
No | 5320 | 37.1 |
Health service delivery factors |
MMD category | Frequency | Percentage |
Total | 14339 | 100 |
below 3months | 6198 | 43.22 |
3-5months | 4815 | 33.58 |
6+months | 3326 | 23.2 |
Dispensing Site Type |
Close | 14009 | 97.7 |
Open | 330 | 2.3 |
DC Model |
Fast Track | 10782 | 75.19 |
Standard Care | 2312 | 16.12 |
Facility ART distribution group | 915 | 6.38 |
Community ART distribution Peer led | 315 | 2.2 |
Community ART distribution - HCW led | 15 | 0.1 |
Counselling Services Available |
Yes | 14339 | 100 |
QOC_Available |
Yes | 14339 | 100 |
HCW per Clinic Day |
>50 clients per clinic | 10210 | 71.2 |
<50 clients per clinic | 4129 | 28.8 |
Inferential Analysis: Chi-square analysis revealed significant associations between several demographic, clinical, socioeconomic, and health service delivery factors and Interruption in Treatment (IIT). Demographically, adolescents and young adults (15-24 years), single or divorced individuals, and urban residents were at higher risk. Clinically, patients with high viral load (≥1000 cps), shorter ART duration (<12 months), TB co-infection, non-DTG regimens, unstable categorization, and poor adherence experienced substantially higher IIT, while CD4 count and sex showed no significant effect. Socioeconomically, secondary-level education, trading or farming occupations, and lack of mobile phone ownership were linked to increased IIT, whereas higher education, formal employment, and active phone ownership were protective. From a service delivery perspective, shorter refill durations (<3 months), close dispensing sites, and certain DC models (fast-track) were associated with higher IIT, while longer refills (≥6 months) were protective. Counseling availability, quality of care, and clinic workload were not significantly associated. Overall, younger age, urban residence, poor adherence, unstable status, non-DTG regimens, and lack of phone access emerged as the strongest risk factors for IIT.
Table 2. Inferential Analysis of Factors Associated with IIT.
Factors | Categories | Chi-square statistic | Significant (Yes/No) | P-value | Category most likely to be IIT, with IIT rate by category |
Age category (in Years) | 0-14yrs 15-24yrs 25-39yrs 40+yrs | 242.19 | Yes | 0.0000 | 15-24: 11.6% 0-14yrs: 4.5% (33/740), 15-24yrs: 11.6% (106/910), 25-39yrs: 10.6% (408/3831), 40+yrs: 4.1% (363/8858) |
Marital Status | Divorced Married Single Widowed | 18.02 | Yes | 0.0004 | Single: 7.5% Divorced: 7.4% (75/1012), Married: 6.1% (524/8571), Single: 7.5% (243/3253), Widowed: 4.5% (68/1503) |
Residence | Rural Urban | 196.98 | Yes | 0.0000 | Urban: 9.6% Rural: 3.8% (307/8049), Urban: 9.6% (603/6290) |
Viral Load Category (in copies) | <50 50-199 200-999 1000+ | 355.05 | Yes | 0.0000 | 1000+cps: 23.4% 0-49cps: 5.4% (686/12680), 1000+cps: 23.4% (159/679), 200-999cps: 8.5% (31/366), 50-199cps: 5.5% (34/614) |
ART Duration (In months) | 12+months below 12months | 238.82 | Yes | 0.0000 | below 12months: 18.5% 12+months: 5.5% (742/13431), Below 12 months: 18.5% (168/908) |
TB Status | Has TB No TB | 5.84 | Yes | 0.0156 | Has TB: 9.9% Has TB: 9.9% (29/292), No TB: 6.3% (881/14047) |
Current Regimen | DTG_Based Non_DTG | 1265.10 | Yes | 0.0000 | Non_DTG: 56.7% DTG Regimen: 5.3% (744/14046), Non DTG Regimen: 56.7% (166/293) |
Patient Categorization | Stable Unstable | 529.79 | Yes | 0.0000 | Unstable: 27.6% Stable: 5.3% (725/13668), Unstable: 27.6% (185/671) |
ART Adherence | Good Poor | 529.79 | Yes | 0.0000 | Poor: 27.6% Good: 5.3% (725/13668), Poor: 27.6% (185/671) |
Education Level | Higher Primary Secondary None | 314.65 | Yes | 0.0001 | Secondary: 8.7% Higher: 0.5% (4/780), Primary: 0.5% (15/3123), Secondary: 8.7% (891/10298) |
Employment | Driver Employee Farmer Student Trader | 113.78 | Yes | 0.0001 | Trader: 8.4% Driver: 1.1% (1/95), Farmer: 6.9% (503/7273), Formal Employment: 0.2% (2/891), Student: 0.8% (4/492), Trader: 8.4% (389/4624) |
Active Mobile Phone ownership | No Yes | 591.12 | Yes | 0.0000 | No: 12.8% No: 12.8% (681/5320), Yes: 2.5% (229/9019) |
MMD Category | 3-5months 6+months below 3months | 171.37 | Yes | 0.0000 | below 3months: 9.3% 3-5months: 4.9% (238/4815), 6+months: 2.9% (97/3326), below 3months: 9.3% (575/6198) |
Dispensing Site Type | Close Open | 19.73 | Yes | 0.0000 | Close: 6.5% Close: 6.5% (909/14009), Open: 0.3% (1/330) |
Differentiated Care (DC) Model | Community ART distribution Facility ART distribution Fast Track Standard Care | | Yes | 0.0001 | Fast Track: 7.0% Community ART distribution - HCW led: 0.0% (0/15), Community ART distribution Peer led: 0.3% (1/305), Facility ART Distribution Group: 0.0% (0/6), Facility ART distribution group: 1.2% (11/909), Fast Track: 7.0% (753/10782), Peer Led Community ART Group (PCAG): 0.0% (0/10), Standard Care: 6.3% (145/2312) |
Sex | F M | 1.90 | No | 0.1683 | M: 6.8% F: 6.2% (609/9897), M: 6.8% (301/4442) |
CD4 Category | | 0.0076 | No | 0.9305 | 200+: 6.4% (783/12316), below 200: 6.3% (127/2023) |
Counselling Services Availability | Yes No | | No | 1.0000 | |
QOC Availability | Yes No | | No | 1.0000 | |
HCWs per clinic day | <50 clients per clinic >50 clients per clinic | 0.52 | No | 0.4705 | >50 clients per clinic: 6.4% <50 clients per clinic: 6.1% (252/4129), >50 clients per clinic: 6.4% (658/10210) |
Training and fitting of XGBoost, Random Forest, and SVM for Predicting HIV/AIDS Treatment Interruptions: Training and fitting on the dataset took 2.076718 seconds for XGBoost, 4.650424 seconds for Random Forest, and 9.507085 seconds for Support Vector Machine. The LASSO technique was used for feature selection.
Figure 1. Time taken to train and fit each model, in seconds.
Figure 2. LASSO feature selection, with features ordered by importance score.
Comparing the performance of the models in predicting HIV/AIDS treatment interruption before and after hyperparameter tuning using the evaluation metrics: Before hyperparameter tuning, XGBoost outperformed Random Forest (RF) and Support Vector Machine (SVM), achieving the highest recall (0.34), F1 (0.45), and ROC-AUC (0.88), while SVM recorded the highest precision (0.76) but the lowest recall (0.22). After tuning, recall and F1 improved markedly across all models. XGBoost achieved the highest recall (0.99), making it the most sensitive in detecting IIT, but at the expense of lower precision (0.67) and overall accuracy (0.75). Random Forest provided the most balanced performance, with high recall (0.97), strong precision (0.87), F1 (0.92), ROC-AUC (0.96), and accuracy (0.91), emerging as the best overall model. SVM also improved substantially (recall 0.94, precision 0.84, F1 0.89, ROC-AUC 0.93, accuracy 0.88), performing second to Random Forest but better than tuned XGBoost in overall balance.
Table 3. Comparison of the three models: XGBoost, Random Forest, and Support Vector Machine.
Evaluation Metric | XGBoost performance before hyperparameter tuning | XGBoost performance after hyperparameter tuning | Random Forest performance before hyperparameter tuning | Random Forest performance after hyperparameter tuning | Support Vector Machine performance before hyperparameter tuning | Support Vector Machine performance after tuning |
Recall (IIT) | 0.34 | 0.99 | 0.33 | 0.97 | 0.22 | 0.94 |
Precision (IIT) | 0.66 | 0.67 | 0.55 | 0.87 | 0.76 | 0.84 |
F1 (IIT) | 0.45 | 0.80 | 0.42 | 0.92 | 0.34 | 0.89 |
ROC-AUC | 0.88 | 0.88 | 0.83 | 0.96 | 0.77 | 0.93 |
Accuracy | 0.94 | 0.75 | 0.94 | 0.91 | 0.94 | 0.88 |
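As a quick consistency check on Table 3, the F1 values of the tuned models can be recomputed from the reported precision and recall using the harmonic-mean formula from Section 2. The short sketch below does this; the helper name is hypothetical, and because the inputs are already rounded to two decimals, agreement is only expected to two decimals.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Tuned-model precision and recall for the IIT class, copied from Table 3.
tuned = {
    "XGBoost": (0.67, 0.99),
    "Random Forest": (0.87, 0.97),
    "SVM": (0.84, 0.94),
}
for name, (precision, recall) in tuned.items():
    print(name, round(f1_score(precision, recall), 2))
```

The recomputed values round to 0.80, 0.92, and 0.89, matching the F1 row of Table 3 for the tuned models.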
5. Discussion
5.1. Factors Associated with IIT
This study identified multiple demographic, clinical, socioeconomic, and service delivery factors associated with Interruption in Treatment (IIT) among people living with HIV in Machakos County. Key predictors of IIT included younger age (15-24 years), high viral load, ART duration <12 months, non-DTG regimen use, poor adherence, unstable treatment status, and lack of mobile phone ownership. Moderate predictors were marital status, residence, education, employment, TB co-infection, and dispensing site type, while sex, CD4 count, and clinic-level service quality showed no significant associations. These findings highlight IIT as a multifactorial outcome shaped by individual, socioeconomic, and health system factors. Targeted interventions for adolescents, single and urban patients, and socioeconomically disadvantaged groups are essential. Scaling up DTG, strengthening early retention, expanding multi-month dispensing, and leveraging mHealth solutions remain critical for sustaining treatment continuity and advancing HIV epidemic control in Kenya.
Demographic factors significantly influenced IIT outcomes. Younger adults (15-24 years) were more likely to interrupt treatment than older adults, consistent with findings by Mtisi et al. [29]. Older adults (≥40 years), on the other hand, demonstrated better treatment continuity, possibly due to greater treatment experience and family support systems. Marital status also emerged as an important determinant: single or separated individuals had higher IIT risk, underscoring the protective effect of social and spousal support. This, however, contradicted the findings on marital status reported by Ikpe et al. [30].
Geographical location was another important factor. Urban residents exhibited higher IIT rates than rural residents, likely due to migration, mobility, and economic instability, aligning with findings from Tomescu et al. [31]. Interestingly, sex was not significantly associated with IIT, in contrast to Tomescu et al. [31].
Clinical and treatment-related factors also shaped treatment continuity. Participants with high viral loads (≥1000 copies/mL) were significantly more likely to interrupt treatment, suggesting that virologic failure remains a key predictor of disengagement [31]. Those recently initiated on ART (<12 months) were more vulnerable, consistent with Mbatia et al. [32]. TB co-infection further increased vulnerability, likely due to pill burden and side effects, in contrast to Mtisi et al. [29]. Those on non-DTG regimens had higher IIT rates than those on DTG-based therapies, in line with Mtisi et al. [29].
Socioeconomic and behavioral factors further influenced retention. Participants with lower education levels were more likely to experience IIT than those with postsecondary education, contradicting the finding by Akpan et al. [33]. Individuals with higher education levels likely benefited from greater treatment literacy and health-seeking behavior. Those with informal occupations, particularly traders, showed higher IIT risk. Mobile phone ownership was strongly protective against IIT, consistent with evidence that mobile health (mHealth) interventions such as SMS reminders and appointment alerts improve ART adherence [34, 35].
Service delivery approaches were equally influential. Patients receiving shorter ART refill intervals (<3 months) were more prone to interruption, reflecting the logistical challenges of frequent clinic visits, in agreement with Tomescu et al. [31]. Differentiated service delivery models that included multi-month dispensing (MMD) and community ART refill mechanisms improved retention outcomes, while fast-track models yielded mixed effects, possibly because they address convenience but not psychosocial barriers [36]. Collectively, these findings underscore that IIT arises from an interplay of demographic, clinical, and structural factors, emphasizing the importance of comprehensive, context-specific interventions.
5.2. Training and Fitting of XGBoost, Random Forest, and SVM for Predicting HIV/AIDS Treatment Interruptions
Machine learning models further enhanced understanding of these predictors. Preprocessing steps, including SMOTE oversampling and LASSO feature selection, were applied to improve model robustness. XGBoost trained the fastest and achieved the highest recall (0.99), but with reduced precision (0.67) and accuracy (0.75). Support Vector Machine (SVM) improved with tuning but remained slower to train and had lower recall (0.94) than XGBoost. Random Forest achieved the best overall performance, with strong recall (0.97), precision (0.87), and ROC-AUC (0.96), providing the most balanced and operationally relevant model.
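To make the oversampling step concrete, the following is a minimal sketch of the idea behind SMOTE: each synthetic minority record is placed at a random point on the line segment between a real minority record and one of its nearest minority neighbours. It is a simplified illustration in plain Python, not the implementation used in the study; the function name, the toy feature values (age in years, months on ART), and the parameter defaults are all assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create synthetic minority points by interpolating toward near neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        base_idx = rng.randrange(len(minority))
        base = minority[base_idx]
        # Indices of the k nearest minority neighbours, excluding the base point.
        neighbour_ids = sorted(
            (j for j in range(len(minority)) if j != base_idx),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbour_ids)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical toy data: (age in years, months on ART).
minority = [(19.0, 4.0), (22.0, 7.0), (24.0, 3.0), (21.0, 10.0)]  # IIT cases
majority = [(30.0 + i, 24.0 + 2 * i) for i in range(12)]          # non-IIT cases

# Generate enough synthetic IIT records to balance the two classes.
synthetic = smote(minority, n_synthetic=len(majority) - len(minority), k=3)
print(len(minority) + len(synthetic), "IIT records vs", len(majority), "non-IIT records")
```

Oversampling of this kind is normally applied to the training split only, so that evaluation data keep the real class mix; the source does not state at which stage SMOTE was applied.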
This study applied XGBoost, Random Forest, and Support Vector Machine (SVM) to predict interruption in treatment (IIT) among people living with HIV. Data preprocessing involved label encoding, LASSO feature selection, and SMOTE oversampling to address class imbalance. XGBoost had the fastest training time (2.08 seconds), and hyperparameter tuning raised its recall for IIT cases to 0.99, the highest of the three models. Random Forest, though slower to train than XGBoost (4.65 seconds), combined balanced accuracy, precision, and recall with interpretability through variable importance ranking. SVM performed well after tuning but was the most computationally intensive (9.51 seconds), which may limit its practical use on large datasets.
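The workflow described above (feature selection, model fitting, and timing) can be sketched with scikit-learn as follows. This is an illustration under stated assumptions, not the study's code. The data are synthetic with roughly 6% positives, the LASSO penalty `alpha` is arbitrary, and `class_weight="balanced"` stands in for SMOTE. XGBoost is omitted to keep the dependencies to scikit-learn alone.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the EMR extract (assumption): about 6% positive cases.
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=6, n_clusters_per_class=1,
    class_sep=1.5, weights=[0.94, 0.06], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42,
)

# LASSO-style selection: keep features whose L1-penalised coefficient is non-zero.
# Fitting Lasso on a 0/1 target is a simplification; alpha is an arbitrary choice.
selector = SelectFromModel(Lasso(alpha=0.01)).fit(X_train, y_train)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)
print("features kept:", X_train_sel.shape[1], "of", X.shape[1])

# class_weight="balanced" is used here in place of SMOTE (swapped-in technique).
models = {
    "Random Forest": RandomForestClassifier(
        n_estimators=200, class_weight="balanced", random_state=42
    ),
    "SVM": SVC(kernel="rbf", class_weight="balanced"),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train_sel, y_train)
    fit_seconds = time.perf_counter() - start  # wall-clock training time
    y_pred = model.predict(X_test_sel)
    results[name] = {
        "fit_seconds": fit_seconds,
        "recall": recall_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
    }
    print(name, {key: round(value, 3) for key, value in results[name].items()})
```

Training times measured this way depend on hardware, data size, and hyperparameters, so they are best compared within a single run, as with the timings reported above.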
5.3. Comparison of the Models
Overall, Random Forest offered the best trade-off between performance and interpretability, with strong recall (0.97), precision (0.87), and ROC-AUC (0.96), making it the most balanced and operationally relevant model, while XGBoost excelled in training efficiency and recall. These findings highlight the utility of machine learning in strengthening HIV program monitoring and informing targeted interventions.
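The trade-off between the tuned models can also be read through a confusion-matrix identity that links accuracy to precision, recall, and the share of positive cases (prevalence) in the evaluation data. The sketch below applies that identity to the tuned metrics from Table 3. The two prevalence values are assumptions, since the source does not report the class mix of the evaluation set, and the function name is introduced here for illustration.

```python
def implied_accuracy(precision: float, recall: float, prevalence: float) -> float:
    """Accuracy implied by precision, recall, and the positive-class share."""
    true_pos = prevalence * recall                      # TP as a share of all records
    false_pos = true_pos * (1 - precision) / precision  # from precision = TP / (TP + FP)
    true_neg = (1 - prevalence) - false_pos             # negatives that were not flagged
    return true_pos + true_neg


# (precision, recall, reported accuracy) after tuning, copied from Table 3.
TUNED = {
    "XGBoost": (0.67, 0.99, 0.75),
    "Random Forest": (0.87, 0.97, 0.91),
    "SVM": (0.84, 0.94, 0.88),
}

# Assumed prevalence values: 0.06 mirrors the cohort IIT rate, 0.50 a balanced set.
ASSUMED_PREVALENCES = (0.06, 0.50)

for name, (precision, recall, reported) in TUNED.items():
    for prevalence in ASSUMED_PREVALENCES:
        accuracy = implied_accuracy(precision, recall, prevalence)
        print(
            f"{name:14s} prevalence={prevalence:.2f} "
            f"implied accuracy={accuracy:.3f} (reported {reported:.2f})"
        )
```

Under the assumed prevalence of 0.50, the identity gives roughly 0.75, 0.91, and 0.88, close to the accuracies in Table 3, whereas at 0.06 it gives values above 0.96. This suggests, without confirming, that the tuned metrics were computed on class-balanced data. Precision in particular depends on prevalence, so it could differ in routine use.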
5.4. Limitations of the Study
The dataset was limited to Machakos County, so the results may not be generalizable to regions with different patient dynamics. Only three machine learning models were tested; other algorithms, such as neural networks or other ensemble methods, could potentially perform better if explored.
5.5. Ethical Considerations
The study received ethical clearance from the Institutional Scientific Ethics Review Committee (ISERC) of Africa International University (AIU), followed by the National Commission for Science, Technology, and Innovation (NACOSTI) and, finally, the Machakos County Ministry of Health. All data were de-identified to ensure participant privacy and compliance with data protection policies.
6. Conclusions
This study demonstrates the utility of machine learning models in predicting interruption in treatment (IIT) among people living with HIV. By applying LASSO for feature selection, the analysis identified the most relevant predictors, thereby reducing model complexity and enhancing interpretability. Among the tested models, Random Forest provided the most balanced performance across accuracy, precision, F1 score, and ROC-AUC, making it highly suitable for routine programmatic use. XGBoost, while slightly less balanced, offered superior recall, making it particularly valuable in contexts where early identification of at-risk patients is critical to preventing treatment interruptions. These findings underscore the potential of integrating predictive analytics into HIV programs to strengthen patient monitoring, guide targeted interventions, and ultimately improve long-term treatment outcomes.
Abbreviations
RF | Random Forest |
XGB | Extreme Gradient Boosting |
SVM | Support Vector Machine |
ML | Machine Learning |
IIT | Interruption in Treatment |
UNAIDS | Joint United Nations Programme on HIV/AIDS |
HIV | Human Immunodeficiency Virus |
AIDS | Acquired Immunodeficiency Syndrome |
ART | Antiretroviral Therapy |
NACC | National AIDS Control Council |
KNN | K-Nearest Neighbors |
AUROC | Area Under the Receiver Operating Characteristic Curve |
aOR | Adjusted Odds Ratio |
LTFU | Loss to Follow-up |
ISERC | Institutional Scientific Ethics Review Committee |
AIU | Africa International University |
SMOTE | Synthetic Minority Oversampling Technique |
LASSO | Least Absolute Shrinkage and Selection Operator |
MMD | Multi-Month Dispensing |
DTG | Dolutegravir |
DC | Differentiated Care |
Acknowledgments
I recognize the assistance of my supervisors, Dr. Katila and Dr. Sam Njuki, and thank them for their support, guidance, feedback, and immense contribution to this research. I also acknowledge my colleagues, family, and friends for their support throughout this research, as well as the Cooperative University of Kenya and the Machakos County Health Department for providing the essential resources and a supportive environment for my research. Above all, I thank the Almighty God for being with me all through.
Author Contributions
Clifford Odundo: Conceptualization, Formal Analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing - original draft, Writing - review & editing
Charles Katila: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing - review & editing
Sam Njuki: Conceptualization, Investigation, Methodology, Supervision, Validation, Visualization, Writing - review & editing
Lena Onyango: Conceptualization, Data curation, Formal Analysis, Investigation, Methodology, Software, Validation, Visualization
Felix Makori: Conceptualization, Investigation, Methodology, Resources, Writing - review & editing
Funding
This work is not supported by any external funding.
Data Availability Statement
The data is available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
| [1] |
Alum, E., Okechukwu, U., & Emmanuel, I. F. (n.d.). Curtailing HIV/AIDS Spread: Impact of Religious Leaders. Retrieved February 18, 2025, from
https://ir.kiu.ac.ug/items/8b168267-67a6-4b7a-893b-99abe3e1d3f4
|
| [2] |
Bhavinkumar, K., Umer, S., Sham, T., & Areeb, S. (2024). Diabetes Prediction Using Machine Learning. International Journal of Novel Research and Development (IJNRD).
https://www.ijnrd.org/papers/IJNRD2404398.pdf
|
| [3] |
Fauci, A. S., & Lane, H. C. (2020). Four decades of HIV/AIDS — Much accomplished, much to do. New England Journal of Medicine, 383(1), 1-4.
https://doi.org/10.1056/NEJMp1916753
|
| [4] |
Jain, T., Jain, A., Hada, P., Kumar, H., Verma, V., & Patni, A. (2021). Machine learning techniques for prediction of mental health. Proceedings of the IEEE International Conference on Intelligent and Computing Research Applications (ICIRCA), 1606-1613.
https://doi.org/10.1109/ICIRCA51532.2021.9545061
|
| [5] |
Liyew, C. M., & Melese, H. A. (2021). Machine learning techniques to predict daily rainfall amount. Journal of Big Data, 8(1), 153.
https://doi.org/10.1186/s40537-021-00545-4
|
| [6] |
Mahesh, B. (2019). Machine learning algorithms: A review. International Journal of Science and Research (IJSR), 9(1), 381-386.
https://doi.org/10.21275/ART20203995
|
| [7] |
Schonlau, M., & Zou, R. Y. (2020). The random forest algorithm for statistical learning. The Stata Journal, 20(1), 3-29.
https://doi.org/10.1177/1536867X20909688
|
| [8] |
Ministry of Health, Kenya. (2022a). HIV Prevention Delivery Landscape in Kenya. Nairobi: Government of Kenya. Ministry of Health, Kenya. (2022b). Kenya World AIDS Day Report. Nairobi: Government of Kenya.
|
| [9] |
Naar, S., Outlaw, A., MacDonell, K., Jones, M., White, J., Secord, E., & Templin, T. (2023). Information-Motivation-Behavioral Skills model in youth newly starting antiretroviral treatment. AIDS and Behavior, 27(8), 2785-2790.
https://doi.org/10.1007/s10461-023-04002-6
|
| [10] |
Pisner, D., & Schnyer, D. (2020). Support vector machine. In Machine Learning: Methods and Applications to Brain Disorders (pp. 101-121). Elsevier.
https://doi.org/10.1016/B978-0-12-815739-8.00006-7
|
| [11] |
Safynaz, A.-F., Abeer, M., & Sayed, M. (2021). Applying different machine learning techniques for prediction of COVID-19 severity. IEEE Access, 9, 135697-135707.
https://doi.org/10.1109/ACCESS.2021.3116067
|
| [12] |
Taisheng, L. (2021). Chinese guidelines for the diagnosis and treatment of HIV/AIDS (2021 edition). Infectious Diseases & Immunity, 1(1), 1-10.
https://doi.org/10.1097/ID9.0000000000000044
|
| [13] |
Thomadakis, C., Yiannoutsos, C. T., Pantazis, N., Diero, L., Mwangi, A., Musick, B. S., Wools-Kaloustian, K., & Touloumi, G. (2023). The effect of HIV treatment interruption on subsequent immunological response. American Journal of Epidemiology, 192(7), 1181-1191.
https://doi.org/10.1093/aje/kwad076
|
| [14] |
Uwishema, O., Taylor, C., Lawal, L., Hamiidah, N., Robert, I., Nasir, A., Chalhoub, E., Sun, J., Akin, B. T., Adanur, I., Mwazighe, R. M., & Onyeaka, H. (2022). The syndemic burden of HIV/AIDS in Africa amidst the COVID-19 pandemic. Immunity, Inflammation and Disease, 10(1), 26-32.
https://doi.org/10.1002/iid3.544
|
| [15] |
Ross, D. P. T. (2020). Reasons cited for the interruption of antiretroviral treatment in the Bloemfontein/Mangaung area (Doctoral dissertation, University of the Free State).
https://scholar.ufs.ac.za
|
| [16] |
Stockman, J., Friedman, J., Sundberg, J., Harris, E., & Bailey, L. (2022). Predictive analytics using machine learning to identify ART clients at risk of treatment interruption in Mozambique and Nigeria. JAIDS Journal of Acquired Immune Deficiency Syndromes, 90(2), 154-160.
https://journals.lww.com/jaids/fulltext/2022/06010
|
| [17] |
Ogbechie, M. D., Fischer Walker, C., Lee, M. T., Abba Gana, A., Oduola, A., Idemudia, A., & Persaud, N. E. (2023). Predicting treatment interruption among people living with HIV in Nigeria: A machine learning approach. JMIR AI, 2(1), e44432.
https://ai.jmir.org/2023/1/e44432
|
| [18] |
Esra, R., Carstens, J., Le Roux, S., Mabuto, T., Eisenstein, M., Keiser, O., & Sharpey-Schafer, K. (2023). Validation and improvement of a machine learning model to predict interruptions in antiretroviral treatment in South Africa. JAIDS Journal of Acquired Immune Deficiency Syndromes, 92(1), 42-49.
https://doi.org/10.1097/QAI.0000000000003332
|
| [19] |
Jackins, V., Vimal, S., Kaliappan, M., & Lee, M. Y. (2021). AI-based smart prediction of clinical disease using random forest classifier and Naïve Bayes. The Journal of Supercomputing, 77(5), 5198-5219.
https://doi.org/10.1007/s11227-020-03481-x
|
| [20] |
Tiomoko, M., Schnoor, E., Seddik, M. E. A., Colin, I., & Virmaux, A. (2022). Deciphering Lasso-based classification through large dimensional analysis of the iterative soft-thresholding algorithm. Proceedings of the 39th International Conference on Machine Learning, 21449-21477.
https://proceedings.mlr.press/v162/tiomoko22a.html
|
| [21] |
Araveeporn, A. (2021). The higher-order of adaptive Lasso and elastic net methods for classification on high dimensional data. Mathematics, 9(1091).
https://doi.org/10.3390/math9101091
|
| [22] |
Menéndez-Arias, L. (2009). Mutation rates and intrinsic fidelity of retroviral reverse transcriptases. Viruses, 1(3), 1137-1165.
https://doi.org/10.3390/v1031137
|
| [23] |
Mansky, L. M., & Temin, H. M. (1995). Lower in vivo mutation rate of HIV-1 than predicted from the fidelity of purified reverse transcriptase. Journal of Virology, 69(8), 5087-5094.
|
| [24] |
Boyer, P. L., Sarafianos, S. G., Arnold, E., & Hughes, S. H. (2014). The M184V mutation reduces the polymerase activity of human immunodeficiency virus type 1 reverse transcriptase. Journal of Virology, 88(8), 4744-4753.
https://pmc.ncbi.nlm.nih.gov/articles/PMC4097814/
|
| [25] |
Sarafianos, S. G., Marchand, B., Das, K., Himmel, D. M., Parniak, M. A., Hughes, S. H., & Arnold, E. (2009). Structure and function of HIV-1 reverse transcriptase: Molecular mechanisms of polymerization and inhibition. Journal of Molecular Biology, 385(3), 693-713.
https://doi.org/10.1016/j.jmb.2008.10.071
|
| [26] |
Coffin, J. M. (1995). HIV population dynamics in vivo: Implications for genetic variation, pathogenesis, and therapy. Science, 267(5197), 483-489.
|
| [27] |
Dlamini, N., et al. (2023). Machine learning models for predicting virological failure among people living with HIV. BMC Medical Informatics and Decision Making, 23(217).
https://bmcmedinformdecismak.biomedcentral.com/articles/10.1186/s12911-023-02167-7
|
| [28] |
Frontiers in Microbiology. (2025). Applications of artificial intelligence in HIV research: A review of current advances. Frontiers in Microbiology, 16, 1541942.
https://www.frontiersin.org/articles/10.3389/fmicb.2025.1541942/full
|
| [29] |
Mtisi, E. L., Mushy, S. E., Mkawe, S. G., Ndjovu, A., Mboggo, E., Mlay, B. S., & Muya, A. (2023). Risk factors for interruption in treatment among HIV-infected adolescents attending care clinics in Tanzania. AIDS Research and Therapy, 20(1), 19.
https://doi.org/10.1186/s12981-023-00498-2
|
| [30] |
Ikpe, S., Gambo, A., Nowak, R., Sorkin, J., Charurat, M., O’Connor, T., & Stafford, K. (2024). Predictors of interruptions in antiretroviral therapy among people living with HIV in Nigeria: A retrospective cohort study. medRxiv.
https://doi.org/10.1101/2024.03.01.23293852
|
| [31] |
Tomescu, S., Crompton, T., Adebayo, J., Kinge, C. W., Akpan, F., Rennick, M., & Pisa, P. T. (2021). Factors associated with interruption in treatment among people living with HIV in USAID-supported states in Nigeria: A retrospective study (2000-2020). BMC Public Health, 21(1), 2194.
https://doi.org/10.1186/s12889-021-12254-1
|
| [32] |
Mbatia, R. J., Mtisi, E. L., Ismail, A., Henjewele, C. V., Moshi, S. J., Christopher, A. K., & Matiko, E. J. (2023). Interruptions in treatment among adults on antiretroviral therapy before and after test-and-treat policy in Tanzania. PLoS ONE, 18(11), e0292740.
https://doi.org/10.1371/journal.pone.0292740
|
| [33] |
Akpan, U., Kakanfo, K., Ekele, O. D., Ukpong, K., Toyo, O., Nwaokoro, P., & Bateganya, M. (2023). Predictors of treatment interruption among patients on antiretroviral therapy in Akwa Ibom, Nigeria: Outcomes after 12 months. AIDS Care, 35(1), 114-122.
https://doi.org/10.1080/09540121.2022.2070554
|
| [34] |
Kim, H., Goldsmith, J. V., Sengupta, S., Mahmood, A., Powell, M. P., Bhatt, J., & Bhuyan, S. S. (2019). Mobile health applications and e-health literacy: Opportunities and concerns for cancer patients and caregivers. Journal of Cancer Education, 34(1), 3-8.
https://doi.org/10.1007/s13187-017-1272-1
|
| [35] |
Chang, H. Y., Hou, Y. P., Yeh, F. H., & Lee, S. S. (2020). The impact of an mHealth app on knowledge, skills, and anxiety about dressing changes: A randomized controlled trial. Journal of Advanced Nursing, 76(4), 1046-1056.
https://doi.org/10.1111/jan.14297
|
| [36] |
Nsoh, M., Tshimwanga, K. E., Ngum, B. A., Mgasa, A., Otieno, M. O., Moali, B., & Halle-Ekane, G. E. (2021). Predictors of antiretroviral therapy interruptions and factors influencing return to care in Cameroon. African Health Sciences, 21(1), 29-38.
https://doi.org/10.4314/ahs.v21i1.6
|
Cite This Article
APA Style
Odundo, C., Katila, C., Njuki, S., Onyango, L., Makori, F. (2025). Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya. International Journal of Data Science and Analysis, 11(6), 158-170. https://doi.org/10.11648/j.ijdsa.20251106.11
@article{10.11648/j.ijdsa.20251106.11,
author = {Clifford Odundo and Charles Katila and Sam Njuki and Lena Onyango and Felix Makori},
title = {Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya},
journal = {International Journal of Data Science and Analysis},
volume = {11},
number = {6},
pages = {158-170},
doi = {10.11648/j.ijdsa.20251106.11},
url = {https://doi.org/10.11648/j.ijdsa.20251106.11},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijdsa.20251106.11},
year = {2025}
}
TY - JOUR
T1 - Leveraging Machine Learning Models to Predict HIV/AIDS Treatment Interruption in Patients in Machakos County, Kenya
AU - Clifford Odundo
AU - Charles Katila
AU - Sam Njuki
AU - Lena Onyango
AU - Felix Makori
Y1 - 2025/11/07
PY - 2025
N1 - https://doi.org/10.11648/j.ijdsa.20251106.11
DO - 10.11648/j.ijdsa.20251106.11
T2 - International Journal of Data Science and Analysis
JF - International Journal of Data Science and Analysis
JO - International Journal of Data Science and Analysis
SP - 158
EP - 170
PB - Science Publishing Group
SN - 2575-1891
UR - https://doi.org/10.11648/j.ijdsa.20251106.11
VL - 11
IS - 6
ER -