|
SCALABLE DISTRIBUTED STORAGE FOR IMAGERY DATA COMPUTING IN GEOINFORMATICS
|
|
|
A. Kokoulin;A. Yuzhakov;D. Kiryanov
|
|
|
||
|
|
|
|
1314-2704
|
|
|
||
|
English
|
|
|
17
|
|
|
21
|
|
|
|
|
|
||
|
The authors describe a novel approach to boosting the performance of scientific data analysis through a data storage schema for distributed scientific research systems. This approach optimizes the workload on storage nodes and enhances computation performance. Managing the enormous output of scientific research systems is expected to be the most technically difficult part of all recent projects producing petabytes of imagery data. For example, NASA's EOSDIS produces several petabytes of uncompressed data per year, which means up to 10 terabytes of imagery data every day for the distributed storage network of DAACs. Current reduction pipelines require up to 5000 operations per pixel across several processing steps in order to save and resample the data. All of these factors show that while the processing of Big Scientific Data is an exciting challenge and new algorithms are likely to appear over the coming years, efficient data processing and mining remains an algorithmic roadblock. In this paper we describe a distributed storage structure with indexing techniques that can be effectively applied to scientific multidimensional data processing centers in geoinformatics. The basic principle of this project is a distributed (N,K)-block storage schema (LH*RS or SDDS). We develop a descendant of LH*RS specifically for multidimensional data arrays, using a multiscale representation of these arrays and efficient pre-processing algorithms. LH*RS is positioned as a general-purpose method whose efficiency does not depend on the data file type, but we can enhance its distribution algorithm to accelerate its performance in the case of multidimensional or imagery data. The dataset is decomposed into data blocks of several levels using the wavelet transform. The dataset at the requested scale and resolution is reconstructed on the client's side from the corresponding set of downloaded blocks. To further accelerate the processing of data queries, we can use pre-computed statistic result blocks and their hierarchical representation. The main principle of the data preprocessing is to merge the original data with the results of the transformation algorithm in adjacent buckets of the same storage. These results are computed only once, during the data storing stage, simultaneously with data distribution and on the same computing unit. The main advantage of this approach is that these results can be used together with the original data, or even separately, to serve different data queries with both value and dimension subsetting conditions. This approach can reduce the resource costs of the corresponding scientific problems.
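The following is a minimal, self-contained Python sketch (not the authors' implementation) of the block decomposition idea summarized above: a tile is split by a plain 2D Haar transform, used here as a stand-in for the paper's wavelet transform, into a coarse approximation plus per-level detail blocks, and the client rebuilds the requested resolution from only the blocks it downloads. The bucket_for() helper is a hypothetical hash-based placeholder for the LH*RS/SDDS block-to-node mapping; all names are illustrative, not taken from the paper.

import numpy as np

def haar2d(a):
    # One 2D Haar step: return (approximation, (horizontal, vertical, diagonal) details).
    a = a.astype(float)
    lo_r = (a[:, 0::2] + a[:, 1::2]) / 2.0   # pairwise averages along rows
    hi_r = (a[:, 0::2] - a[:, 1::2]) / 2.0   # pairwise differences along rows
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0   # approximation block
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0   # horizontal detail
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0   # vertical detail
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

def ihaar2d(ll, details):
    # Invert one Haar step and rebuild the next-finer-resolution tile.
    lh, hl, hh = details
    lo_r = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi_r = np.empty_like(lo_r)
    lo_r[0::2, :], lo_r[1::2, :] = ll + lh, ll - lh
    hi_r[0::2, :], hi_r[1::2, :] = hl + hh, hl - hh
    out = np.empty((lo_r.shape[0], lo_r.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = lo_r + hi_r, lo_r - hi_r
    return out

def decompose(tile, levels):
    # Multi-level decomposition: list of (level, detail blocks) plus the coarsest approximation.
    # The first step produces the finest-level (level 1) details.
    blocks = []
    approx = tile
    for lvl in range(levels):
        approx, details = haar2d(approx)
        blocks.append((lvl + 1, details))
    return approx, blocks

def bucket_for(block_id, n_nodes=8):
    # Toy block-to-node mapping; the paper uses an LH*RS-style distribution schema instead.
    return hash(block_id) % n_nodes

# A client that only needs a coarse overview downloads just the approximation block;
# finer scales are recovered by fetching and applying detail blocks level by level.
tile = np.random.rand(256, 256)
approx, blocks = decompose(tile, levels=3)
restored = approx
for lvl, details in reversed(blocks):
    restored = ihaar2d(restored, details)
print(np.allclose(restored, tile))                       # lossless round trip -> True
print({name: bucket_for(name) for name in ("approx", "detail_L3_h", "detail_L3_v", "detail_L3_d")})

In this sketch the coarse approximation doubles as a ready-made low-resolution answer, which is the property the abstract exploits to serve scale- and resolution-subsetting queries without touching every block.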
|
|
|
conference
|
|
|
||
|
||
|
17th International Multidisciplinary Scientific GeoConference SGEM 2017
|
|
|
17th International Multidisciplinary Scientific GeoConference SGEM 2017, 29 June - 5 July, 2017
|
|
|
Proceedings Paper
|
|
|
STEF92 Technology
|
|
|
International Multidisciplinary Scientific GeoConference-SGEM
|
|
|
Bulgarian Acad Sci; Acad Sci Czech Republ; Latvian Acad Sci; Polish Acad Sci; Russian Acad Sci; Serbian Acad Sci & Arts; Slovak Acad Sci; Natl Acad Sci Ukraine; Natl Acad Sci Armenia; Sci Council Japan; World Acad Sci; European Acad Sci, Arts & Letters; Ac
|
|
|
1053-1060
|
|
|
29 June - 5 July, 2017
|
|
|
website
|
|
|
cdrom
|
|
|
3061
|
|
|
distributed storage schema; wavelet transform; Hilbert space-filling curve; indexing; multidimensional array
|
|