Web Server Log Analysis Using the K-Means Clustering Algorithm and Feature Importance

Authors

  • Asyrafi Adnil Ma'ali Politeknik Siber dan Sandi Negara
  • Girinoto Politeknik Siber dan Sandi Negara
  • Muhammad Novrizal Ghiffari Politeknik Siber dan Sandi Negara
  • Raden Budiarto Hadiprakoso Politeknik Siber dan Sandi Negara

DOI:

https://doi.org/10.56706/ik.v16i3.60

Keywords:

log analysis, clustering, Elbow Method, feature importance, k-means

Abstract

Log analysis is often required in forensic work after an attack on a network. This study analyzes logs to find anomalies on a web server using an unsupervised machine learning approach: the k-means clustering algorithm combined with the Elbow Method. Before the clusters are formed, the log data is transformed through a series of feature extraction steps. To aid interpretation, feature importance analysis is used to identify which features contribute most to the formation of the clusters. The clustering results show a cluster that is anomalous relative to the others, and the feature that matters most in forming the clusters is character_bigram.
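The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample request strings, the function names, and the parameters `iters` and `seed` are hypothetical, and the code uses only the Python standard library. It extracts character bigram counts from request strings, runs a plain Lloyd-style k-means, and records the sum of squared errors (SSE) for several values of k, which is the quantity the Elbow Method inspects when choosing the number of clusters.

```python
import random
from collections import Counter


def char_bigrams(text):
    """Count overlapping character bigrams in a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def vectorize(lines):
    """Map each line to a bigram-count vector over a shared vocabulary."""
    counts = [char_bigrams(line) for line in lines]
    vocab = sorted(set().union(*counts))
    return [[c[b] for b in vocab] for c in counts], vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids, sse)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    labels = assign(points, centroids)
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


# Hypothetical request strings standing in for parsed web server log lines.
requests = [
    "GET /index.html",
    "GET /about.html",
    "GET /contact.html",
    "GET /index.php?id=1' OR '1'='1",
    "GET /search?q=<script>alert(1)</script>",
]

points, vocab = vectorize(requests)

# SSE for each candidate k; the "elbow" is where the curve flattens.
sse_by_k = {k: kmeans(points, k)[2] for k in range(1, len(points) + 1)}
for k, sse in sse_by_k.items():
    print(k, round(sse, 2))
```

On real data, the value of k would be read off the point where the SSE curve stops dropping sharply. A feature importance step, as mentioned in the abstract, would then rank the extracted features by how much they contribute to separating the resulting clusters; that step is not shown here.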


Submitted

24-10-2022

Accepted

18-11-2022

Published

05-12-2022

Section

Articles