


Title: SCALABLE DISTRIBUTED STORAGE FOR IMAGERY DATA COMPUTING IN GEOINFORMATICS

A. Kokoulin;A. Yuzhakov;D. Kiryanov
1314-2704
English
17
21
The authors describe a novel approach to boosting the performance of scientific data analysis through a data storage schema for distributed scientific research systems. This approach optimizes the workload on storage nodes and improves computational performance. Managing the enormous output of scientific research systems is expected to be the most technically difficult part of recent projects that produce petabytes of imagery data. For example, NASA's EOSDIS produces several petabytes of uncompressed data per year, which means up to 10 terabytes of imagery data every day for its distributed storage network of DAACs. Current reduction pipelines require up to 5000 operations per pixel across several processing steps to save and resample data. All of these factors show that, while processing Big Scientific Data is an exciting challenge and new algorithms are likely to appear over the coming years, the algorithmic roadblock to efficient data processing and mining remains an open problem. In this paper we describe a distributed storage structure with indexing techniques that can be applied effectively in scientific multidimensional data processing centers in geoinformatics. The basic principle of this project is the distributed (N,K)-block storage schema (LH*RS, a Scalable Distributed Data Structure, or SDDS). We develop a descendant of LH*RS designed specifically for multidimensional data arrays, using a multiscale representation of these arrays and efficient pre-processing algorithms. LH*RS is positioned as a general-purpose method whose efficiency does not depend on the data file type, but we can implement some enhancements to its distribution algorithm to accelerate its performance for multidimensional or imagery data. The dataset is decomposed into data blocks of several levels using the wavelet transform. The dataset at the requested scale and resolution is reconstructed on the client's side from the corresponding set of downloaded blocks.
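LH*RS extends the LH* scalable distributed data structure, which spreads a linear-hashing file across servers. As a minimal sketch of the addressing rule underneath, the code below implements plain linear hashing in a single process: a key's bucket is h_i(c) = c mod (N * 2^i), corrected to h_{i+1}(c) when that bucket has already been split in the current round. The names `h`, `lh_address`, and `LHFile` are hypothetical, and the distribution over servers, client-image adjustment, and Reed-Solomon parity buckets of LH*RS are deliberately omitted.

```python
from typing import List


def h(level: int, key: int, n0: int) -> int:
    """Linear-hashing function h_i(c) = c mod (N * 2**i), with N = n0 initial buckets."""
    return key % (n0 * 2 ** level)


def lh_address(key: int, level: int, split: int, n0: int = 1) -> int:
    """Bucket address of `key` for file level `level` and split pointer `split`."""
    a = h(level, key, n0)
    if a < split:
        # Bucket `a` was already split in this round, so the next-level hash applies.
        a = h(level + 1, key, n0)
    return a


class LHFile:
    """Single-process linear-hashing file (hypothetical: no servers, no parity)."""

    def __init__(self, n0: int = 1, capacity: int = 4) -> None:
        self.n0, self.capacity = n0, capacity
        self.level, self.split = 0, 0
        self.buckets: List[List[int]] = [[] for _ in range(n0)]

    def address(self, key: int) -> int:
        return lh_address(key, self.level, self.split, self.n0)

    def insert(self, key: int) -> None:
        b = self.address(key)
        self.buckets[b].append(key)
        if len(self.buckets[b]) > self.capacity:
            # Any overflow triggers a split of the bucket under the split pointer,
            # which is not necessarily the bucket that overflowed.
            self._split()

    def _split(self) -> None:
        round_size = self.n0 * 2 ** self.level
        old = self.buckets[self.split]
        keep = [k for k in old if h(self.level + 1, k, self.n0) == self.split]
        move = [k for k in old if h(self.level + 1, k, self.n0) != self.split]
        self.buckets[self.split] = keep
        self.buckets.append(move)  # new bucket index: self.split + round_size
        self.split += 1
        if self.split == round_size:
            # Every bucket of this round has been split: start the next round.
            self.level, self.split = self.level + 1, 0
```

Because buckets are split in a fixed order rather than on demand, any client that knows only the pair (level, split) can compute an address without a directory, which is the property LH* relies on when the file is spread over many storage nodes.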
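The abstract does not name the wavelet used for the multi-level decomposition. The sketch below assumes the simplest choice, an averaging 2-D Haar transform on a square grid whose side is a power of two, to show how a dataset splits into a coarse approximation plus per-level detail blocks, and how a client can rebuild a coarser scale from only part of those blocks. All function names are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]
Details = Tuple[Grid, Grid, Grid]  # horizontal, vertical, diagonal detail bands


def zeros(n: int) -> Grid:
    return [[0.0] * n for _ in range(n)]


def haar_step(img: Grid) -> Tuple[Grid, Details]:
    """One level of an (unnormalized, averaging) 2-D Haar transform on a 2n x 2n grid."""
    n = len(img) // 2
    ll, hd, vd, dd = zeros(n), zeros(n), zeros(n), zeros(n)
    for i in range(n):
        for j in range(n):
            p, q = img[2 * i][2 * j], img[2 * i][2 * j + 1]
            r, s = img[2 * i + 1][2 * j], img[2 * i + 1][2 * j + 1]
            ll[i][j] = (p + q + r + s) / 4  # coarse approximation
            hd[i][j] = (p - q + r - s) / 4  # left-right difference
            vd[i][j] = (p + q - r - s) / 4  # top-bottom difference
            dd[i][j] = (p - q - r + s) / 4  # diagonal difference
    return ll, (hd, vd, dd)


def inverse_step(ll: Grid, det: Details) -> Grid:
    """Exact inverse of haar_step: rebuilds the 2n x 2n grid."""
    hd, vd, dd = det
    n = len(ll)
    out = zeros(2 * n)
    for i in range(n):
        for j in range(n):
            a, b, c, d = ll[i][j], hd[i][j], vd[i][j], dd[i][j]
            out[2 * i][2 * j] = a + b + c + d
            out[2 * i][2 * j + 1] = a - b + c - d
            out[2 * i + 1][2 * j] = a + b - c - d
            out[2 * i + 1][2 * j + 1] = a - b - c + d
    return out


def decompose(img: Grid, levels: int) -> Tuple[Grid, List[Details]]:
    """Multi-level decomposition; details[0] is the finest level, details[-1] the coarsest."""
    approx, details = img, []
    for _ in range(levels):
        approx, det = haar_step(approx)
        details.append(det)
    return approx, details


def reconstruct(approx: Grid, details: List[Details], target: int) -> Grid:
    """Rebuild the grid at scale `target` (0 = full resolution) from the coarsest
    approximation plus only the detail blocks of levels >= target."""
    img = approx
    for det in reversed(details[target:]):
        img = inverse_step(img, det)
    return img
```

In a storage layout of this kind, each detail triple would be stored as its own group of blocks, so a request for a low-resolution view downloads only the coarsest approximation and the few coarse detail blocks, while finer details are fetched only when full resolution is needed.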
To accelerate query processing, we can additionally use blocks of pre-computed statistical results and their hierarchical representation. The main principle of data preprocessing is to store the original data together with the results of a transformation algorithm in adjacent buckets of the same storage. These results are computed only once, during the data storing stage, simultaneously with data distribution and on the same computing unit. The main advantage of this approach is that these results can be used together with the original data, or even separately, to serve different data queries with both value and dimension subsetting conditions. This approach can reduce the resource costs of the corresponding scientific problems.
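The abstract does not specify which statistics are pre-computed. As one hypothetical illustration, the sketch below builds a min/max/sum/count pyramid over a square, power-of-two 2-D grid once at storing time, then answers a value-subsetting query (count the cells above a threshold) by accepting or rejecting whole blocks from their stored range and descending only into blocks whose range straddles the threshold. The names `Stats`, `build_pyramid`, and `count_above` are assumptions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stats:
    lo: float     # minimum value in the block
    hi: float     # maximum value in the block
    total: float  # sum of values (mean = total / count)
    count: int    # number of cells in the block


def merge(parts: List[Stats]) -> Stats:
    return Stats(min(p.lo for p in parts), max(p.hi for p in parts),
                 sum(p.total for p in parts), sum(p.count for p in parts))


def build_pyramid(img: List[List[float]]) -> List[List[List[Stats]]]:
    """Level 0 holds per-cell stats; level k holds stats of 2**k x 2**k blocks.
    Computed once, at storing time (assumes a square grid with power-of-two side)."""
    level = [[Stats(v, v, v, 1) for v in row] for row in img]
    pyramid = [level]
    while len(level) > 1:
        n = len(level) // 2
        level = [[merge([level[2 * i + di][2 * j + dj]
                         for di in (0, 1) for dj in (0, 1)])
                  for j in range(n)] for i in range(n)]
        pyramid.append(level)
    return pyramid


def count_above(pyramid: List[List[List[Stats]]], threshold: float) -> Tuple[int, int]:
    """Count cells with value > threshold; returns (count, blocks_visited)."""
    visited = 0

    def visit(k: int, i: int, j: int) -> int:
        nonlocal visited
        visited += 1
        s = pyramid[k][i][j]
        if s.hi <= threshold:
            return 0          # no cell in this block qualifies
        if s.lo > threshold:
            return s.count    # every cell in this block qualifies
        # Range straddles the threshold (only possible for k > 0): descend.
        return sum(visit(k - 1, 2 * i + di, 2 * j + dj)
                   for di in (0, 1) for dj in (0, 1))

    return visit(len(pyramid) - 1, 0, 0), visited
```

The second return value reports how many pyramid nodes were touched; for spatially coherent data this can be far fewer than the number of cells, which is the kind of saving the pre-computed blocks are meant to provide.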
conference
17th International Multidisciplinary Scientific GeoConference SGEM 2017
Proceedings Paper
STEF92 Technology
International Multidisciplinary Scientific GeoConference-SGEM
Bulgarian Acad Sci; Acad Sci Czech Republ; Latvian Acad Sci; Polish Acad Sci; Russian Acad Sci; Serbian Acad Sci & Arts; Slovak Acad Sci; Natl Acad Sci Ukraine; Natl Acad Sci Armenia; Sci Council Japan; World Acad Sci; European Acad Sci, Arts & Letters; Ac
1053-1060
29 June - 5 July, 2017
website
cdrom
3061
distributed storage schema; wavelet transform; Hilbert space-filling curve; indexing; multidimensional array