
Scholarly record

INFORMATION QUALITY IN THE DATABASE

M. R. Hojbota

First published: 2007. DOI pending.

Abstract

In this paper, some specific problems concerning database and information quality are presented and analyzed. Some approaches presented in the literature and some “classical” mistakes encountered in practice are emphasized.

Publication details

Title
INFORMATION QUALITY IN THE DATABASE
Authors
M. R. Hojbota
Proceedings
7th International Scientific Conference - SGEM2007
Publisher
SGEM Scientific GeoConference
Year
2007
Pages
Not available yet
ISSN
1314-2704
ISBN
954-918181-2
Language
en
Publication type
Conference Paper
Keywords
information quality, data quality, database quality, data integrity, database design

INTRODUCTION

In recent years, decision makers have become increasingly dependent on the information provided by enterprise information systems, and new forms of data collection have developed, such as data warehouses, mobile databases and the Internet. Poor data quality is sapping all organizations of money and opportunities. The problem of data quality and information quality must be tackled within the framework of the general quality theory developed by the American and Japanese quality gurus beginning in the middle of the twentieth century. Despite the differences between the positions of the quality gurus on many aspects of analyzing and implementing quality in real enterprises, there are some common positions, namely:
  • Quality is defined as “fitness to customer expectations”;
  • Implementing quality is a management problem, not a problem of the workers;
  • The benefit of quality must be measured as the cost of non-quality, not as a bonus for the enterprise.

Implementing the different methods proposed for improving the quality of activity in the enterprise is difficult, either because some fundamental concepts are misunderstood at the management level or because the executive levels lack motivation. Many people ignore the existence of several views of quality:
  • That of quality professionals;
  • That of management;
  • That of employees;
  • That of customers.

The struggle for quality emphasizes a basic idea that is unanimously accepted: even if management must impose the quality standards in the enterprise, achieving the imposed objectives involves all employees.

The database schema quality

IT is responsible for the quality of the systems that move the data and store it.
Much of the problem lies outside IT, in poorly articulated requirements, poor acceptance testing of systems, poor data creation processes, and much more. The fact that data quality is universally poor indicates that it is not the fault of individual, poorly managed organizations but rather the natural result of the evolution of information system technology. There are two major contributing factors. The first is the rapid system implementations and changes that have made it very difficult to control quality. The second is that the methods, standards, techniques and tools for controlling quality have evolved at a much slower pace than the systems they serve.

The problem is to determine when a database has acceptable quality. In principle, database quality implies:
  • The database design quality;
  • The data integrity;
  • The data precision;
  • The data relevance;
  • The data protection and security;
  • The documentation quality.

Database design quality must be analyzed at all three levels: conceptual, logical and physical. Poor quality at any one of these levels implies the failure of the project, but a correct solution at each level does not guarantee its success. One explanation is that in many projects each application uses its own database, even when the databases are practically identical. Another problem is capturing all the data of interest from the customer's point of view. Such problems appear because the database designer is concerned only with the solution of one application. This is the classical error, and not only among beginners: the database is dedicated to a single application.

The modification of the database structure is imposed either by design errors or by the need to keep the structure of the database objects consistent with the changes that appear in the business environment.
For example, if the relationship between two relations is currently of type 1:M, but changes in the business rules may transform it into one of type M:M, the relationship is treated as M:M from the beginning.

The normalization of the database structure is one of the sources of poor performance, especially in large databases. The new generation of DBMSs for small systems offers a relatively poor set of join operators; in particular, they do not offer the outer join or the union. To simulate the outer join it is necessary to use a kind of data redundancy, namely a row for each possible value of the foreign key, even if that row contains only null values apart from the foreign key.

The importance of the conceptual model of the database is accepted today by almost all database designers, but the existence of many conceptual models, each with countless dialects, and the lack of standardization decrease the impact of conceptual design. ER models are very poor at capturing business rules, and only in recent years have some variants borrowed concepts from other models for representing more sophisticated data integrity constraints. ORM models are better at capturing complex integrity constraints; the readability of the model and its transformation into the relational model are their weak points. Object-oriented models are generally too complex and have not replaced the older models in database design practice.

Some users view the same data in different forms, each useful for one purpose. An independent application for one user can become only a component of the application for other users, or several applications can be integrated into a single, more useful application. In many situations the requirements refer to the user's old experience and do not really meet the user's expectations.
The difference between current data and historical data, or between strategic and tactical applications, is generally difficult for users to understand. There are situations when an apparently good design, acceptable from the user's point of view, can generate complicated problems in the future. For example, if the attribute Address is treated as an atomic character string and it later becomes necessary to develop a data warehouse or a global database, identifying the components of each address will be a laborious and delicate problem.

The data quality

An important problem is the confusion between some concepts:
  • Some authors consider the terms data quality and information quality to be synonyms. In fact the two terms refer to two distinct concepts, namely data and information. Since databases are used as basic elements in the decision process, information can be defined as “the uncertainty removed concerning the realization of an event among a set of possible events”. Data can be considered “information removed from context” or “the representation of information on a material support”. Transforming data into information implies placing the data in a well-defined context and interpreting it according to the user's needs. Information is the result of data processing by human or automatic means. Even if data quality is guaranteed, the resulting information can be of poor quality or even wrong.
  • An unacceptable confusion, which persists especially in some product documentation, is that between the terms data validation and data accuracy (data precision). Data validation means checking that the data values belong to the set of values defined in the data integrity constraints. The representation of a date can create many misunderstandings, because the user would not know whether the date is invalid or just erroneously represented, or what the real date is.
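The gap between data validation and data accuracy can be made concrete with a minimal Python sketch. It is an illustration only and not part of the paper: the function names `parse_date`, `interpretations` and `is_ambiguous`, and the choice of two candidate conventions, are assumptions made for this example. The sketch validates a slash-separated date string under the day-first and the month-first convention and reports every reading that passes.

```python
from datetime import date, datetime

# Two candidate conventions (assumed for this sketch).
DAY_FIRST = "%d/%m/%Y"    # day/month/year
MONTH_FIRST = "%m/%d/%Y"  # month/day/year


def parse_date(text, fmt):
    """Return a date if text is valid under fmt, otherwise None."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def interpretations(text):
    """Collect every convention under which text passes validation."""
    results = {}
    for name, fmt in (("day_first", DAY_FIRST), ("month_first", MONTH_FIRST)):
        parsed = parse_date(text, fmt)
        if parsed is not None:
            results[name] = parsed
    return results


def is_ambiguous(text):
    """True when text is valid under both conventions but denotes different dates."""
    return len(set(interpretations(text).values())) > 1
```

A string such as 11/02/2007 passes validation under both conventions and yields two different dates, so a format check alone cannot establish which date was meant. A string such as 31/02/2007 fails under both conventions and is simply invalid.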
The date 11/02/2007 means 11 February 2007 for a European but 2 November 2007 for an American. A value cannot be considered accurate if its user cannot tell what it is. Guaranteeing data accuracy is a very sensitive problem, which requires a permanent activity of analyzing, correcting and fixing corrupted or erroneous data. Organizations spend a lot of time and resources on this activity.

There are also more subtle aspects of data accuracy. For example, a patient may provide incorrect data concerning the incidence of chronic diseases in his family because he does not know that his parents are not his biological parents, so the information about his relatives is not relevant for the medical assessment. Another example is declarations about the number of births or abortions, because many women regard these matters as closely guarded secrets of their lives.

Data accuracy is time dependent. An assertion about data accuracy is not meaningful if it does not state the moment at which the data was assessed. This means that the currency of data is a very important component of data quality. A problem not sufficiently discussed is the meaning of the concept of accurate data: who must guarantee data accuracy, and what is the legal responsibility of a provider of poor-quality data. In fact, the problem is what the recorded data must be compared with in the process of assessing data accuracy.

An important component of data quality, as the customer perceives it, is the time spent waiting for the answer to a query. Apparently the value of this parameter is very strongly related to hardware quality, but there are many situations when it depends especially on the information system architecture. A typical example is when the customer is very content with the database quality in the first month of operation, but over time the data access time becomes unacceptable.
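One way to see how access time can depend on architecture rather than on hardware is the following minimal Python sketch. It rests on an assumption that the paper does not make: the slow query is taken to be a lookup by key, answered either by a sequential scan or through a sorted index. The functions `scan_steps` and `index_steps` are hypothetical names that count the rows or index entries inspected instead of measuring time.

```python
def scan_steps(keys, target):
    """Rows inspected by a sequential scan until target is found."""
    for steps, key in enumerate(keys, start=1):
        if key == target:
            return steps
    return len(keys)  # target absent: every row was inspected


def index_steps(sorted_keys, target):
    """Probes made by a binary search over a sorted index of the keys."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps
```

Under this assumption, looking up the last of 1,000 keys costs 1,000 inspections by scanning but at most 10 probes through the index, and for 1,000,000 keys the figures are 1,000,000 and at most 20. The scan cost grows in proportion to the data volume, which faster hardware only scales down by a constant factor.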
Because the decrease in performance is obviously related to the increase in data quantity, the solution selected is frequently the acquisition of a more powerful server or of a more sophisticated DBMS. In some cases, a more detailed analysis shows that the new acquisitions resolve the problems only in the short term, because the data volume continually increases.

There are, however, situations when the customer does not know what the result will be. This happens when the user obtains information from the Internet. Many free applications can be found on the Web, such as meta search engines, integrated stock information systems, or bibliographic services. Each of these programs must integrate information obtained from many independent and autonomous sources. The response time of an Internet source is less important than its ability to provide the information queried for. To gain the full advantage of multiple sources, a user must query all available sources and integrate the results. Another problem is the impossibility of repeating the same query over time with the same result, and the difference between the results provided by different search engines.

CONCLUSIONS

The temptation to focus only on the data quality program is wrong, because the goal of a quality program is to improve the quality of the product. The quality method must focus on the information “customers”, to understand the quality they require in order to perform their processes efficiently and effectively. In a time of decreased budgets, the cost of data quality is very important. According to several leading data quality managers, this cost may be expressed as:

Cost of Poor Data Quality = Cost to Prevent Errors + Cost to Correct Errors + Cost to “Make Good” for the Customer

The lack of quality reduces return on investment, diminishes staff productivity, and harms the credibility of the organization.

REFERENCES

  1. Batini Carlo, Ceri Stefano, Navathe Shamkant (1992), Conceptual Database Design: An Entity-Relationship Approach, The Benjamin/Cummings Publishing Company, Inc.

  2. Naumann Felix, “From Databases to Information Systems: Information Quality Makes the Difference”

  3. Olson Jack (2003), “Data Quality: The Accuracy Dimension”, Morgan Kaufmann Publishers

  4. English Larry, “Plain English on Data Quality: Seven Deadly Misconceptions”, DM Review

  5. Fisher Craig, Kingma Bruce, “Criticality of Data Quality as Exemplified in Two Disasters”, Information & Management 39, 109-116
