|
SCALABLE DISTRIBUTED STORAGE FOR IMAGERY DATA COMPUTING IN GEOINFORMATICS
|
|
|
A. Kokoulin;A. Yuzhakov;D. Kiryanov
|
|
|
||
|
|
|
|
1314-2704
|
|
|
||
|
English
|
|
|
17
|
|
|
21
|
|
|
|
|
|
||
|
The authors describe a novel approach to boosting the performance of scientific data analysis through a data storage schema for distributed scientific research systems. This approach optimizes the workload on storage nodes and enhances computation performance. Managing the enormous output of scientific research systems is expected to be the most technically difficult part of all recent projects producing petabytes of imagery data. For example, NASA's EOSDIS produces several petabytes of uncompressed data per year, which means up to 10 terabytes of imagery data every day for the distributed storage network of DAACs. Current reduction pipelines require up to 5000 operations per pixel across several processing steps in order to save and resample the data. All of these factors show that while the processing of Big Scientific Data is an exciting challenge and new algorithms are likely to appear over the coming years, efficient data processing and mining remains an algorithmic roadblock. In this paper we describe a distributed storage structure with indexing techniques that can be effectively applied to scientific multidimensional data processing centers in geoinformatics. The basic principle of this project is a distributed (N,K)-block storage schema (LH*RS or SDDS). We develop a descendant of LH*RS specifically for multidimensional data arrays, using a multiscale representation of these arrays and efficient pre-processing algorithms. LH*RS is positioned as a general-purpose method whose efficiency does not depend on the data file type, but we can enhance its distribution algorithm to accelerate its performance in the case of multidimensional or imagery data. The dataset is decomposed into data blocks of several levels using the wavelet transform. The dataset at the requested scale and resolution is reconstructed on the client's side from the corresponding set of downloaded blocks. To further accelerate the processing of data queries, we can use pre-computed statistic result blocks and their hierarchical representation. The main principle of the data preprocessing is to merge the original data with the results of the transformation algorithm in adjacent buckets of the same storage. These results are computed only once, during the data storing stage, simultaneously with data distribution and on the same computing unit. The main advantage of this approach is that these results can be used together with the original data, or even separately, to serve different data queries with both value and dimension subsetting conditions. This approach can reduce the resource costs of the corresponding scientific problems.
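The following is a minimal, self-contained Python sketch (not the authors' implementation) of the block decomposition idea summarized above: a tile is split by a plain 2D Haar transform, used here as a stand-in for the paper's wavelet transform, into a coarse approximation plus per-level detail blocks, and the client rebuilds the requested resolution from only the blocks it downloads. The bucket_for() helper is a hypothetical hash-based placeholder for the LH*RS/SDDS block-to-node mapping; all names are illustrative, not taken from the paper.

import numpy as np

def haar2d(a):
    # One 2D Haar step: return (approximation, (horizontal, vertical, diagonal) details).
    a = a.astype(float)
    lo_r = (a[:, 0::2] + a[:, 1::2]) / 2.0   # pairwise averages along rows
    hi_r = (a[:, 0::2] - a[:, 1::2]) / 2.0   # pairwise differences along rows
    ll = (lo_r[0::2, :] + lo_r[1::2, :]) / 2.0   # approximation block
    lh = (lo_r[0::2, :] - lo_r[1::2, :]) / 2.0   # horizontal detail
    hl = (hi_r[0::2, :] + hi_r[1::2, :]) / 2.0   # vertical detail
    hh = (hi_r[0::2, :] - hi_r[1::2, :]) / 2.0   # diagonal detail
    return ll, (lh, hl, hh)

def ihaar2d(ll, details):
    # Invert one Haar step and rebuild the next-finer-resolution tile.
    lh, hl, hh = details
    lo_r = np.empty((ll.shape[0] * 2, ll.shape[1]))
    hi_r = np.empty_like(lo_r)
    lo_r[0::2, :], lo_r[1::2, :] = ll + lh, ll - lh
    hi_r[0::2, :], hi_r[1::2, :] = hl + hh, hl - hh
    out = np.empty((lo_r.shape[0], lo_r.shape[1] * 2))
    out[:, 0::2], out[:, 1::2] = lo_r + hi_r, lo_r - hi_r
    return out

def decompose(tile, levels):
    # Multi-level decomposition: list of (level, detail blocks) plus the coarsest approximation.
    # The first step produces the finest-level (level 1) details.
    blocks = []
    approx = tile
    for lvl in range(levels):
        approx, details = haar2d(approx)
        blocks.append((lvl + 1, details))
    return approx, blocks

def bucket_for(block_id, n_nodes=8):
    # Toy block-to-node mapping; the paper uses an LH*RS-style distribution schema instead.
    return hash(block_id) % n_nodes

# A client that only needs a coarse overview downloads just the approximation block;
# finer scales are recovered by fetching and applying detail blocks level by level.
tile = np.random.rand(256, 256)
approx, blocks = decompose(tile, levels=3)
restored = approx
for lvl, details in reversed(blocks):
    restored = ihaar2d(restored, details)
print(np.allclose(restored, tile))                       # lossless round trip -> True
print({name: bucket_for(name) for name in ("approx", "detail_L3_h", "detail_L3_v", "detail_L3_d")})

In this sketch the coarse approximation doubles as a ready-made low-resolution answer, which is the property the abstract exploits to serve scale- and resolution-subsetting queries without touching every block.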
|
|
|
conference
|
|
|
||
|
||
|
17th International Multidisciplinary Scientific GeoConference SGEM 2017
|
|
|
17th International Multidisciplinary Scientific GeoConference SGEM 2017, 29 June - 5 July, 2017
|
|
|
Proceedings Paper
|
|
|
STEF92 Technology
|
|
|
International Multidisciplinary Scientific GeoConference-SGEM
|
|
|
Bulgarian Acad Sci; Acad Sci Czech Republ; Latvian Acad Sci; Polish Acad Sci; Russian Acad Sci; Serbian Acad Sci & Arts; Slovak Acad Sci; Natl Acad Sci Ukraine; Natl Acad Sci Armenia; Sci Council Japan; World Acad Sci; European Acad Sci, Arts & Letters; Ac
|
|
|
1053-1060
|
|
|
29 June - 5 July, 2017
|
|
|
website
|
|
|
cdrom
|
|
|
3061
|
|
|
distributed storage schema; wavelet transform; Hilbert space-filling curve; indexing; multidimensional array
|
|