Title: SPREADSHEET DATA EXTRACTION USING SEMANTIC NETWORK

N. Tkeshelashvili;S. Klimenkov
1314-2704
English
19
2.1
Spreadsheets often contain valuable data that are difficult to process because their structure is unclear. For this reason, spreadsheets are also called semi-structured data. Table structure recognition and data extraction are an important area of research. Tables may contain statistical reports, schedules, grade books, research results, or product catalogues. Depending on their purpose, tables have different structures. In this paper, the authors propose an approach for extracting information from "list"-type spreadsheets. These tables store information about objects of the same type, and each column represents an object property. Price lists are a good example of this table type.
The main idea of the proposed approach is to extract objects from the spreadsheet using a semantic network. The kernel of the semantic network graph is based on Wiktionary data and contains senses and the semantic relations between them. Every sense has its own word forms, their morphological characteristics, and instances of the sense, where an instance is an object of the given sense. For example, "2m" may be an instance of the sense "length". The semantic network is used to describe the object structure, while instances provide useful templates for data. The developed program examines every row in the spreadsheet, matches properties to senses, and creates objects with the structure given in the semantic network. The approach was tested on a corpus of price lists typical of the IT distribution area.
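The row-by-row matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sense` class, the `SENSES` inventory, and the functions `match_column` and `extract_objects` are hypothetical names, the senses are written by hand rather than taken from Wiktionary, and regular expressions stand in for the instance templates.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    """A hypothetical sense node: word forms plus an instance template."""
    name: str
    wordforms: tuple       # lower-case word forms that may appear in a header
    instance_pattern: str  # regex standing in for known instances of the sense

    def matches_header(self, text):
        return text.strip().lower() in self.wordforms

    def matches_instance(self, value):
        return re.fullmatch(self.instance_pattern, value.strip(),
                            re.IGNORECASE) is not None


# Assumed toy inventory; a real network would be built from Wiktionary data.
SENSES = [
    Sense("length", ("length",), r"\d+(\.\d+)?\s*(mm|cm|m)"),
    Sense("price", ("price", "cost"), r"\d+(\.\d+)?\s*(usd|eur|\$)?"),
    Sense("name", ("name", "item", "product"), r".+"),
]


def match_column(header, values):
    """Map one column to a sense name, or None if nothing fits."""
    # First try the header text against the word forms of each sense.
    for sense in SENSES:
        if sense.matches_header(header):
            return sense.name
    # Otherwise fall back to instance templates: every cell must fit.
    for sense in SENSES:
        if values and all(sense.matches_instance(v) for v in values):
            return sense.name
    return None


def extract_objects(rows):
    """Turn a 'list'-type table (header row + data rows) into objects."""
    header, data = rows[0], rows[1:]
    columns = [[row[i] for row in data] for i in range(len(header))]
    mapping = [match_column(h, col) for h, col in zip(header, columns)]
    # One object per row; columns without a matching sense are skipped.
    return [
        {sense: cell for sense, cell in zip(mapping, row) if sense is not None}
        for row in data
    ]


price_list = [
    ["Item", "Cable size", "Price"],
    ["HDMI cable", "2m", "9.99"],
    ["Patch cord", "0.5 m", "3.50"],
]
objects = extract_objects(price_list)
```

In this toy table the header "Cable size" matches no word form, so the column is assigned to the sense "length" only because every cell fits the instance template, which mirrors the role the abstract gives to instances such as "2m".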
conference
19th International Multidisciplinary Scientific GeoConference SGEM 2019
19th International Multidisciplinary Scientific GeoConference SGEM 2019, 30 June - 6 July, 2019
Proceedings Paper
STEF92 Technology
International Multidisciplinary Scientific GeoConference-SGEM
Bulgarian Acad Sci; Acad Sci Czech Republ; Latvian Acad Sci; Polish Acad Sci; Russian Acad Sci; Serbian Acad Sci & Arts; Slovak Acad Sci; Natl Acad Sci Ukraine; Natl Acad Sci Armenia; Sci Council Japan; World Acad Sci; European Acad Sci, Arts & Letters; Ac
637-644
30 June - 6 July, 2019
website
cdrom
5405
spreadsheet data extraction; table interpretation; semantic network; knowledge graph