MACHINE LEARNING SOLUTION FOR IOT BIG DATA
Abstract
Nowadays it is critical to be able to fetch huge amounts of heterogeneous data quickly and reliably and to apply Machine Learning (ML) models to it for better decision making. Successful processing of data streams is crucial for real-time operations such as extracting, filtering, transforming, aggregating with other data sources, persisting data to data warehouses, and publishing to different messaging topics or pipelines. As Machine Learning grows in popularity, serious concerns are emerging, with good reason, about the performance of Machine Learning models in production. It is essential to choose wisely the technologies used for creating robust data pipelines, deploying accurate Machine Learning models, and monitoring their performance in production environments. In this paper, an approach is proposed for building a distributed platform, based on a messaging system, that is capable of extracting, processing, and analyzing information from streaming data in real time. Kafka streaming concepts for ingesting data are discussed, along with ways to operationalize the data pipelines. The use of Spark Structured Streaming for enriching Kafka events with a Machine Learning algorithm is shown. As streaming data continues to arrive, the Spark engine reacts to the data changes and processes the data incrementally and continuously. Important conceptual factors that have a large impact on the accuracy and performance of deployed Machine Learning models in a production environment are discussed. The overall improved result can later be used to draw sound conclusions and make better predictions.
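The idea of incrementally enriching arriving events with a model's output can be illustrated with a minimal sketch in plain Python. It does not use the Kafka or Spark APIs and is not the paper's implementation: the micro-batches are ordinary lists standing in for messages read from a topic, `toy_model` is a hypothetical threshold rule standing in for a trained Machine Learning model, and the field names `sensor` and `value` are assumptions made only for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class RunningStats:
    """Per-sensor state that is updated incrementally as events arrive."""
    count: int = 0
    mean: float = 0.0

    def update(self, value: float) -> None:
        # Incremental mean: past events do not need to be kept in memory.
        self.count += 1
        self.mean += (value - self.mean) / self.count


def toy_model(value: float, running_mean: float) -> str:
    """Hypothetical stand-in for a trained model (an assumption, not the paper's)."""
    return "anomaly" if abs(value - running_mean) > 5.0 else "normal"


def enrich_stream(
    batches: Iterable[List[dict]],
    model: Callable[[float, float], str] = toy_model,
) -> Iterator[List[dict]]:
    """Process micro-batches one at a time, carrying state across batches."""
    state: Dict[str, RunningStats] = {}
    for batch in batches:
        enriched = []
        for event in batch:
            stats = state.setdefault(event["sensor"], RunningStats())
            # Score against the state seen so far; the first event has no history.
            label = model(event["value"], stats.mean) if stats.count else "normal"
            stats.update(event["value"])
            enriched.append({**event, "running_mean": stats.mean, "label": label})
        yield enriched


# Two micro-batches of made-up readings from one hypothetical sensor.
batches = [
    [{"sensor": "sensor-1", "value": 34.0}, {"sensor": "sensor-1", "value": 36.0}],
    [{"sensor": "sensor-1", "value": 50.0}],
]
for enriched_batch in enrich_stream(batches):
    print(enriched_batch)
```

Only the running state is carried from one batch to the next, which is the property that lets a streaming engine react to new data without reprocessing the whole history. In a real deployment the list of batches would be replaced by a streaming source and the threshold rule by a trained model.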
Citing literature
Number of times cited according to Crossref: 6