Web Server Log Analysis Using the K-Means Clustering Algorithm and Feature Importance
DOI:
https://doi.org/10.56706/ik.v16i3.60
Keywords:
log analysis, clustering, Elbow Method, feature importance, k-means
Abstract
Log analysis is often required in forensic work after an attack on a network. This study analyzes logs to find anomalies on a web server using an unsupervised machine learning approach: the k-means clustering algorithm combined with the Elbow Method. Before the clusters are formed, the log data is transformed through a series of feature extraction steps. To aid interpretation, feature importance analysis is used to determine which feature contributes most to cluster formation. The clustering results visualize a cluster that is anomalous relative to the others, and the feature that plays the most important role in forming the clusters is character_bigram.
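As a rough illustration of the pipeline summarized in the abstract, the sketch below extracts character bigram frequencies from log entries, runs a plain k-means, and records the sum of squared errors (SSE) for each k so an elbow can be read off. It is a minimal standard-library sketch, not the study's implementation: the sample entries, the function names, and the choice of relative-frequency vectors are all assumptions made for illustration, and the feature importance step is not covered.

```python
import random
from collections import Counter


def char_bigrams(text):
    """Count overlapping two-character substrings in one log entry."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def vectorize(entries):
    """Map entries to relative bigram frequencies over a shared vocabulary."""
    counts = [char_bigrams(e) for e in entries]
    vocab = sorted(set().union(*counts))
    vectors = []
    for c in counts:
        total = sum(c.values()) or 1  # avoid dividing by zero on very short entries
        vectors.append([c[b] / total for b in vocab])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(vectors, centroids):
    """Index of the nearest centroid for every vector."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(v, centroids[j]))
            for v in vectors]


def kmeans(vectors, k, iters=50, seed=0):
    """Plain k-means; returns the cluster labels and the final SSE."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        labels = assign(vectors, centroids)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            # Keep the old centroid if a cluster ends up empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(vectors, centroids)
    sse = sum(sq_dist(v, centroids[lab]) for v, lab in zip(vectors, labels))
    return labels, sse


# Hypothetical request lines, invented for illustration (not data from the study).
entries = [
    "GET /index.html",
    "GET /about.html",
    "GET /contact.html",
    "GET /products.html",
    "GET /index.php?id=1' OR '1'='1",
    "GET /search.php?q=<script>alert(1)</script>",
]

vocab, vectors = vectorize(entries)

# Elbow Method: compute the SSE for each candidate k and look for the bend.
sse_by_k = {k: kmeans(vectors, k)[1] for k in range(1, len(entries) + 1)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 4))
```

The k at which the SSE stops dropping sharply is the usual elbow choice. With real logs, the entries would come from the parsed access log rather than a hard-coded list.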
License
Copyright (c) 2022 Info Kripto
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.