A System Architecture for the Detection of Insider Attacks in Big Data Systems

MSRDG International Journal of Computer Scientific Technology & Electronics Engineering

 

© 2025 by MSRDG IJCSTEE Journal

Volume 1 Issue 5

Year of Publication: 2025



Authors: A. Jothi, S. Archiya, S. Velu Murugan, K. Nagendra


Article ID: MSRDG-IJCSTEE-V1I5P104
Abstract:

Insider threats constitute one of the most challenging and costly security risks confronting organizations that operate on big data infrastructures. Unlike external adversaries, insiders possess legitimate access credentials and contextual knowledge of organizational systems, rendering conventional perimeter defenses inadequate. This paper presents a five-layer system architecture designed to detect insider attacks in heterogeneous big data environments. The proposed framework integrates User and Entity Behavior Analytics (UEBA), temporal pattern analysis, graph-based correlation, and a multi-model ensemble comprising Random Forest, Long Short-Term Memory (LSTM) networks, and Isolation Forest, orchestrated through a hybrid voting mechanism. The architecture is evaluated across three benchmark datasets—CERT Insider Threat v6.2, UNB-ISCX 2022, and a synthetically generated HDFS log corpus—totaling over 3.2 million event records. Experimental results demonstrate that the proposed system achieves an accuracy of 96.7%, an F1-score of 96.0%, and an AUC of 0.99, outperforming individual baseline methods by margins of 5.4–12.5 percentage points. Scalability experiments confirm near-linear throughput scaling up to 16 cluster nodes with sub-100 ms detection latency. The architecture addresses class imbalance through Synthetic Minority Oversampling Technique (SMOTE) augmentation and supports real-time streaming via Apache Kafka integration. This work contributes a comprehensive, deployable reference architecture that bridges the gap between academic detection research and operational security requirements in enterprise big data systems.

Keywords: Insider threat detection · Big data security · Machine learning · User and Entity Behavior Analytics · Anomaly detection · Ensemble learning · LSTM · Apache Kafka
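
The abstract names a hybrid voting mechanism over Random Forest, LSTM, and Isolation Forest outputs but does not specify how the votes are combined. The following is a minimal sketch of one plausible reading, not the authors' implementation: a weighted soft vote decides clear cases, and a hard majority vote breaks ties when the soft score falls close to the decision threshold. The names `ModelScores`, `soft_score`, and `hybrid_vote`, as well as the weights, threshold, and margin, are illustrative assumptions; each model is assumed to emit an anomaly score already normalized to [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScores:
    """Per-event anomaly scores in [0, 1], one per ensemble member (assumed normalized)."""
    random_forest: float
    lstm: float
    isolation_forest: float

    def as_tuple(self):
        return (self.random_forest, self.lstm, self.isolation_forest)


def soft_score(scores, weights=(0.4, 0.4, 0.2)):
    """Weighted average of the three model scores (weights are hypothetical)."""
    values = scores.as_tuple()
    if any(not 0.0 <= v <= 1.0 for v in values):
        raise ValueError("scores must lie in [0, 1]")
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def hybrid_vote(scores, weights=(0.4, 0.4, 0.2), threshold=0.5, margin=0.1):
    """Return True if the event is flagged as a suspected insider attack.

    Soft vote: used when the weighted score is at least `margin` away
    from `threshold`. Hard vote: otherwise, flag the event only if at
    least two of the three models individually exceed the threshold.
    """
    soft = soft_score(scores, weights)
    if abs(soft - threshold) >= margin:
        return soft >= threshold
    hard_votes = sum(v >= threshold for v in scores.as_tuple())
    return hard_votes >= 2


if __name__ == "__main__":
    clear_attack = ModelScores(random_forest=0.9, lstm=0.8, isolation_forest=0.7)
    borderline = ModelScores(random_forest=0.6, lstm=0.3, isolation_forest=0.7)
    print(hybrid_vote(clear_attack))  # decided by the soft vote
    print(hybrid_vote(borderline))    # decided by the hard majority vote
```

In this sketch, the margin controls how often the hard vote is consulted: a margin of 0 reduces the scheme to pure soft voting, while a large margin makes it behave like pure majority voting. Any real deployment would need the weights and threshold tuned on validation data.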