An NLP-Based Hybrid Approach for Emotion and Mental Health Detection on X.com Using TF-IDF and Optimized BERTopic
Abstract
This study analyzes and compares two natural language processing-based topic modeling approaches for detecting emotion and mental health signals in text from the X.com platform. The baseline model combines a Term Frequency–Inverse Document Frequency (TF-IDF) representation with the BERTopic algorithm. The dimensionality reduction and clustering components were then tuned to improve topic coherence. The optimized model raised the coherence score from 0.327 to 0.398, a relative gain of about 22%, indicating stronger and more consistent semantic relationships among the words within each topic. Based on this analysis, the context-based approach proved superior in generating representative, relevant topics related to emotional expression and mental health issues on social media.
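As a rough illustration of the TF-IDF representation named in the abstract, the sketch below weights each term by its frequency in a post times the log of its inverse document frequency. It assumes the plain textbook formulation, tf * log(N / df), with whitespace tokenization. It is not the study's actual configuration, and the example posts and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    # Assumption: simple lowercase whitespace tokenization, no stop-word removal.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy posts, not data from the study.
posts = [
    "feeling anxious and tired today",
    "feeling happy and grateful today",
    "anxious thoughts keep me awake",
]
weights = tf_idf(posts)
# Highest-weighted term per post (ties resolve to the first term seen).
top_terms = [max(w, key=w.get) for w in weights]
```

Terms that appear in only one post receive the largest weights, which is why TF-IDF tends to surface words that distinguish a post from the rest of the collection.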
