|
SPREADSHEET DATA EXTRACTION USING SEMANTIC NETWORK
|
|
|
N. Tkeshelashvili; S. Klimenkov
|
|
|
||
|
|
|
|
1314-2704
|
|
|
||
|
English
|
|
|
19
|
|
|
2.1
|
|
|
|
|
|
||
|
Spreadsheets often contain valuable data that are difficult to process because their structure is unclear. For this reason, spreadsheets are also called semi-structured data. Table structure recognition and data extraction are an important area of research. Tables may contain statistical reports, schedules, grade books, research results, or product catalogues. Tables have different structures depending on their purpose. In this paper, the authors propose an approach for extracting information from "list"-type spreadsheets. These tables store information about objects of the same type, and each column represents an object property. Price lists are a good example of this type of table.
The main idea of the proposed approach is to extract objects from the spreadsheet using a semantic network. The kernel of the semantic network graph is based on Wiktionary data and contains senses and the semantic relations between them. Every sense has its own word forms, their morphological characteristics, and instances of the sense, where an instance is an object of the given sense. For example, "2m" may be an instance of the sense "length". The semantic network is used to describe the object structure, while instances provide useful templates for the data. The developed program examines every row in the spreadsheet, matches properties to senses, and creates objects with the structure given in the semantic network. The approach was tested on a corpus of price lists typical of the IT distribution area. |
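The row-to-object matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Sense` class, the sample senses, the regular-expression "instance templates", and the rule of trying header word forms before value templates are all assumptions made for illustration, and nothing here is loaded from Wiktionary.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Sense:
    """A node of a toy semantic network (hypothetical structure)."""
    name: str
    word_forms: set = field(default_factory=set)            # header words naming the sense
    instance_patterns: list = field(default_factory=list)   # regex templates for values

    def matches_header(self, header):
        return header.strip().lower() in self.word_forms

    def matches_value(self, value):
        return any(re.fullmatch(p, value.strip()) for p in self.instance_patterns)


# Illustrative senses only; not taken from Wiktionary or from the paper.
SENSES = [
    Sense("name", {"name", "product", "item"}),
    Sense("length", {"length", "cable length"}, [r"\d+(\.\d+)?\s*m"]),
    Sense("price", {"price", "cost"}, [r"\d+(\.\d+)?\s*(USD|EUR)"]),
]


def sense_for_column(header, values):
    """Pick a sense by header word form first, then by instance templates."""
    for sense in SENSES:
        if sense.matches_header(header):
            return sense.name
    for sense in SENSES:
        if values and all(sense.matches_value(v) for v in values):
            return sense.name
    return None  # column could not be matched to any sense


def extract_objects(rows):
    """Turn a list-type table (first row = headers) into property dicts."""
    headers, body = rows[0], rows[1:]
    columns = list(zip(*body))
    mapping = [sense_for_column(h, col) for h, col in zip(headers, columns)]
    return [
        {sense: cell for sense, cell in zip(mapping, row) if sense is not None}
        for row in body
    ]


table = [
    ["Product", "Size", "Cost"],
    ["Patch cord", "2m", "3.50 USD"],
    ["HDMI cable", "1.5 m", "7 USD"],
]
objects = extract_objects(table)
```

In this sketch the "Size" header matches no word form, so the column is assigned to the sense "length" only because every value in it fits the length template, which mirrors the role the abstract gives to instances such as "2m".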
|
|
conference
|
|
|
||
|
||
|
19th International Multidisciplinary Scientific GeoConference SGEM 2019
|
|
|
19th International Multidisciplinary Scientific GeoConference SGEM 2019, 30 June - 6 July, 2019
|
|
|
Proceedings Paper
|
|
|
STEF92 Technology
|
|
|
International Multidisciplinary Scientific GeoConference-SGEM
|
|
|
Bulgarian Acad Sci; Acad Sci Czech Republ; Latvian Acad Sci; Polish Acad Sci; Russian Acad Sci; Serbian Acad Sci & Arts; Slovak Acad Sci; Natl Acad Sci Ukraine; Natl Acad Sci Armenia; Sci Council Japan; World Acad Sci; European Acad Sci, Arts & Letters; Ac
|
|
|
637-644
|
|
|
30 June - 6 July, 2019
|
|
|
website
|
|
|
cdrom
|
|
|
5405
|
|
|
spreadsheet data extraction; table interpretation; semantic network; knowledge graph
|
|