DATA COLLECTION METHODOLOGY FOR ARTIFICIAL INTELLIGENCE MODELS IN SOCIALLY RELEVANT AREAS

Kristina Dineva, Tatiana Atanasova, Kalin Kopanov

First published: 2025-08-15
DOI: https://doi.org/10.5593/sgem2025/2.1/s07.02

Abstract

Data collection and storage are crucial to the development of artificial intelligence (AI) in socially relevant areas. Although the volume of available information keeps growing, it is often scattered across various sources, formats, and platforms, which makes effective analysis and application challenging. The lack of a unified framework for data integration further complicates efforts to leverage AI for informed decision-making, policy formulation, and public service optimisation. This article proposes a structured methodology for collecting data from publicly available information (PAI) that aims to provide reliable, accessible, and compatible data for AI models. The methodology covers the identification of key social sectors for data collection. It also explores automated and semi-automated data collection techniques, such as API integrations, web scraping, crowdsourced data acquisition, and survey methodologies. Additionally, the article highlights the importance of data validation, normalisation, and anonymisation for ensuring accuracy, consistency, and compliance with regulatory requirements such as the GDPR and the AI Act.
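
The abstract names validation, normalisation, and anonymisation as data preparation steps but does not describe how they are carried out. The Python sketch below is purely illustrative and is not the authors' methodology. It assumes records arrive as dictionaries with a hypothetical schema (`source`, `region`, `date`, `value`, `contact_email`). It rejects structurally invalid records, normalises text, dates, and decimal commas, and replaces the e-mail address with a keyed HMAC-SHA256 hash. The function names `validate`, `normalise`, `pseudonymise`, and `prepare` are likewise hypothetical.

```python
import hashlib
import hmac
import re
from datetime import datetime

# Hypothetical schema for one record gathered from publicly available information (PAI).
REQUIRED_FIELDS = ("source", "region", "date", "value", "contact_email")

# Deliberately loose e-mail check: something@something.tld, no whitespace.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Date layouts accepted from the (hypothetical) sources; output is always ISO 8601.
DATE_FORMATS = ("%Y-%m-%d", "%d.%m.%Y")


def _is_blank(value) -> bool:
    return value is None or str(value).strip() == ""


def validate(record: dict) -> list[str]:
    """Structural checks; an empty list means the record may proceed."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if _is_blank(record.get(name))]
    email = str(record.get("contact_email") or "").strip()
    if email and not EMAIL_RE.fullmatch(email):
        errors.append("malformed contact_email")
    return errors


def normalise(record: dict) -> dict:
    """Map heterogeneous source formats onto one consistent representation.

    Raises ValueError if the date or the numeric value cannot be parsed.
    """
    out = dict(record)
    out["source"] = str(record["source"]).strip().lower()
    out["region"] = str(record["region"]).strip().title()
    raw_date = str(record["date"]).strip()
    for fmt in DATE_FORMATS:
        try:
            out["date"] = datetime.strptime(raw_date, fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognised date format: {raw_date!r}")
    # Some sources use a decimal comma ("12,5"); float() needs a dot.
    out["value"] = float(str(record["value"]).strip().replace(",", "."))
    out["contact_email"] = str(record["contact_email"]).strip().lower()
    return out


def pseudonymise(record: dict, key: bytes) -> dict:
    """Replace the direct identifier with a keyed hash (HMAC-SHA256).

    The same e-mail and key always give the same contact_id, so records can
    still be linked, but the address itself is no longer stored.
    """
    out = dict(record)
    email = out.pop("contact_email")
    digest = hmac.new(key, email.encode("utf-8"), hashlib.sha256).hexdigest()
    out["contact_id"] = digest[:16]  # truncated for readability
    return out


def prepare(records: list[dict], key: bytes) -> tuple[list[dict], list[tuple[dict, list[str]]]]:
    """Validate, then normalise, then pseudonymise; rejected records keep their reasons."""
    accepted, rejected = [], []
    for record in records:
        problems = validate(record)
        if problems:
            rejected.append((record, problems))
            continue
        try:
            accepted.append(pseudonymise(normalise(record), key))
        except ValueError as exc:
            rejected.append((record, [str(exc)]))
    return accepted, rejected


if __name__ == "__main__":
    raw = [
        {"source": " Open-Data Portal ", "region": "sofia", "date": "15.08.2025",
         "value": "12,5", "contact_email": "Ivan@Example.org"},
        {"source": "survey", "region": "plovdiv", "date": "2025-08-16",
         "value": "7", "contact_email": "not-an-email"},
    ]
    accepted, rejected = prepare(raw, key=b"replace-with-a-secret-key")
    print(accepted)
    print(rejected)
```

Keyed hashing pseudonymises rather than anonymises. Under the GDPR, pseudonymised data remains personal data, and fields such as `region` and `date` can still act as quasi-identifiers. Full anonymisation would need further measures, for example generalising or suppressing those fields.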

Dimensions ID: pub.1195348435

Publication details

Title: DATA COLLECTION METHODOLOGY FOR ARTIFICIAL INTELLIGENCE MODELS IN SOCIALLY RELEVANT AREAS
Authors: Kristina Dineva, Tatiana Atanasova, Kalin Kopanov
Proceedings: 25th International Multidisciplinary Scientific GeoConference Proceedings SGEM 2025, Geoinformatics, Remote Sensing, and Artificial Intelligence (AI), Vol. 25, Issue 2.1
Publisher: STEF92 Technology
Year: 2025
Pages: 13-18
SWS Citekey: Dineva202571318
ISSN: 1314-2704
ISBN: 9786197603897
Language: English
Publication type: Conference Paper
