Machine Learning With Big Data: Challenges and Approaches

MSRDG International Journal of Computer Scientific Technology & Electronics Engineering

 

© 2025 by MSRDG IJCSTEE Journal

Volume 1 Issue 4

Year of Publication: 2025



Authors: B. Lokesh, P. Nirmal


Article ID: MSRDG-IJCSTEE-V1I4P105

Abstract:

The intersection of machine learning (ML) and big data analytics has transformed the way organisations extract knowledge from massive, high-velocity information streams. Yet this convergence introduces substantial technical difficulties that conventional ML pipelines were never designed to handle. This paper provides a systematic examination of the principal challenges — namely volume-driven computational overhead, data heterogeneity, annotation scarcity, and privacy constraints — and proposes an integrated framework designated ML-BDI (Machine Learning with Big Data Integration) that combines distributed computing, federated learning, semi-supervised training, and automated hyperparameter optimisation. The framework is evaluated across five heterogeneous real-world datasets totalling more than 290 million instances. Comprehensive experiments show that the proposed hybrid approach attains an accuracy of 94.7%, a precision of 93.8%, a recall of 93.1%, and an F1-score of 93.4%, surpassing six competitive baseline algorithms. Scalability analysis further demonstrates near-linear throughput growth as the number of compute nodes increases from 1 to 32, indicating practical deployment viability on commodity clusters.
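
As a quick consistency check on the reported metrics, the F1-score can be recomputed as the harmonic mean of precision and recall. The sketch below assumes the abstract's F1 is derived from the same aggregate precision (93.8%) and recall (93.1%); if the paper instead averages per-class or per-dataset F1 values, the result could differ slightly. The function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, expressed as fractions.
precision = 0.938
recall = 0.931

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.934"
```

Under that assumption the recomputed value rounds to 93.4%, which agrees with the figure stated in the abstract.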

Keywords: Machine learning · Big data analytics · Distributed computing · Federated learning · Apache Spark · Scalable algorithms · Semi-supervised learning · AutoML