MACHINE LEARNING SOLUTION FOR IOT BIG DATA

Kristina Dineva, Tatiana Atanasova

First published: 2020-09-20
https://doi.org/10.5593/sgem2020/2.1/s07.027

Abstract

The ability to fetch large volumes of heterogeneous data quickly and reliably, and to apply Machine Learning (ML) models to them, is now critical for better decision making. Successful processing of data streams is crucial for real-time operations such as extracting, filtering, transforming, aggregating with other data sources, persisting data to data warehouses, and publishing to different messaging topics or pipelines. As Machine Learning grows in popularity, serious concerns are emerging about the performance of Machine Learning models in production. It is therefore essential to choose carefully the technologies used for creating robust data pipelines, deploying accurate Machine Learning models, and monitoring their performance in production environments. This paper proposes an approach for building a distributed platform, based on a messaging system, that is capable of extracting, processing, and analyzing information from streaming data in real time. Kafka streaming concepts for ingesting data are discussed, along with ways to operationalize the data pipelines. The use of Spark Structured Streaming to enrich Kafka events with a Machine Learning algorithm is shown. As streaming data continues to arrive, the Spark engine reacts to the changes and processes the data incrementally and continuously. The factors that strongly affect the accuracy and performance of deployed Machine Learning models in a production environment are discussed at a conceptual level. The overall improved result can be used later to draw sound conclusions and make better predictions.
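The incremental, continuous processing mentioned in the abstract can be illustrated with a minimal sketch. It is plain Python, not Kafka or Spark code, and it is not the authors' implementation. The event fields, the RunningMean state, the batch size, and threshold_model (a stand-in for a trained model) are all assumptions made for illustration. The sketch only shows the general idea: events arrive in small batches, per-device state is updated rather than recomputed, and each event is enriched with a prediction.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Event:
    """One sensor reading; the fields are illustrative, not taken from the paper."""
    device_id: str
    value: float


def micro_batches(events: Iterable, size: int) -> Iterator[List]:
    """Group a (possibly unbounded) iterable into small batches."""
    batch: List = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, partially filled batch
        yield batch


class RunningMean:
    """Per-device state that is updated incrementally, never rebuilt from scratch."""

    def __init__(self) -> None:
        self.count: Dict[str, int] = defaultdict(int)
        self.total: Dict[str, float] = defaultdict(float)

    def update(self, event: Event) -> float:
        """Fold one event into the state and return the device's new mean."""
        self.count[event.device_id] += 1
        self.total[event.device_id] += event.value
        return self.total[event.device_id] / self.count[event.device_id]


def threshold_model(value: float, running_mean: float) -> str:
    """Hypothetical stand-in for a trained model: flags large deviations."""
    return "anomaly" if abs(value - running_mean) > 5.0 else "normal"


def enrich(batch: List[Event], state: RunningMean,
           model: Callable[[float, float], str]) -> List[dict]:
    """Attach the running mean and a model prediction to each event."""
    enriched = []
    for event in batch:
        mean = state.update(event)
        enriched.append({
            "device_id": event.device_id,
            "value": event.value,
            "running_mean": mean,
            "prediction": model(event.value, mean),
        })
    return enriched


def run_pipeline(events: Iterable[Event], batch_size: int = 2) -> List[dict]:
    """Process the stream batch by batch, carrying state across batches."""
    state = RunningMean()
    output: List[dict] = []
    for batch in micro_batches(events, batch_size):
        output.extend(enrich(batch, state, threshold_model))
    return output
```

In a real deployment the event source would be a Kafka topic and the state handling would be left to the streaming engine; the sketch keeps both in memory so that it runs without any external services.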

Publication details

Title: MACHINE LEARNING SOLUTION FOR IOT BIG DATA
Authors: Kristina Dineva, Tatiana Atanasova
Proceedings: SGEM International Multidisciplinary Scientific GeoConference EXPO Proceedings; 20th International Multidisciplinary Scientific GeoConference Proceedings SGEM 2020, Informatics, Geoinformatics and Remote Sensing
Publisher: STEF92 Technology
Year: 2020
Pages: 207-214
SWS Citekey: Dineva20207207214
ISSN: 1314-2704
ISBN: 978-619-7603-06-4
Language: en
Publication type: Conference Paper