Data Quality – The Way to Better Business Intelligence
BI (business intelligence) projects pay relatively little attention to the quality of the data coming from production systems, even though business decisions will be made on the basis of this data. Source (production) systems are the foundation of all information: they feed BI applications, which aggregate the data and present it in particular ways. If the input data does not meet a certain level of quality, it is unrealistic to expect projects and applications built on such a shaky foundation to be useful, even if they are technically flawless in their own right. Whether the project is a data warehouse, a planning system, or a single unified view of service users, the quality of the existing data will directly affect its outcome. Poor data quality at the source will almost certainly lead to poor business decisions.
Once it becomes clear that the available data is of low quality, application users often abandon the project rather than work on improving the data and recognize data quality as a key factor in the functioning and success of BI and similar projects. Independently of any integration project, poor data quality also affects the production systems themselves. The consequences usually show up as lower productivity and more errors in routine tasks that rely on operational data (the simplest example: sending a bill to the wrong address). Production systems then prove unable to provide information for monitoring business activity in general, and/or producing that information requires and consumes a great deal of IT resources in terms of human labor.
The fact is that few companies are aware of how poor their data is, and even fewer recognize that something needs to be done and that data quality must be treated as a business problem in its own right. Most businesses could name at least one project that turned out to be ineffective in the sense that nobody uses it, because the data it operates on is not completely accurate, i.e. it cannot be relied upon. Improving this situation starts with recognizing that the problem lies in the quality of the data, not in the Business Intelligence project.
Incomplete and low-quality data
The most common problems fall into two groups: incomplete data and poor-quality data. In the first case the data simply does not exist; in the second, the data exists but is inaccurate, which is worse. Incorrect data is dangerous precisely because everything appears to be fine while it is actually serving as the basis for a wrong decision. BI applications mostly deal with aggregated results and their presentation, and at a high degree of aggregation incorrect data goes unnoticed more often than it does in the operational system.
For example, if customer X is mistakenly billed 10,220 USD instead of 1,022 USD, the error will probably be noticed in the production system, because such an invoice "sticks out" among the others and ultimately leads to a complaint to the administrator. In a BI application that aggregates millions, a difference of about nine thousand dollars is not clearly visible.
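A minimal sketch in Python of why the same error is visible at record level but nearly invisible in an aggregate. The data is hypothetical (2,000 invoices of roughly 1,000 USD each, one entered as 10,220 instead of 1,022), and the functions flag_outliers and relative_change, as well as the "more than 5 times the median" rule, are illustrative assumptions, not a prescribed method.

```python
from statistics import median


def flag_outliers(amounts, factor=5.0):
    """Return indices of amounts greater than `factor` times the median.

    A crude record-level check (illustrative assumption): a single
    mistyped invoice stands out clearly against its peers.
    """
    typical = median(amounts)
    return [i for i, a in enumerate(amounts) if a > factor * typical]


def relative_change(correct_total, wrong_total):
    """Relative difference that the error causes in an aggregated total."""
    return (wrong_total - correct_total) / correct_total


# Hypothetical data: 2,000 invoices between 1,000 and 1,049 USD.
correct = [1000.0 + (i % 50) for i in range(2000)]
correct[123] = 1022.0          # the amount that should have been billed
wrong = list(correct)
wrong[123] = 10220.0           # the amount that was actually entered

print(flag_outliers(wrong))                                   # record level
print(f"{relative_change(sum(correct), sum(wrong)):.2%}")     # aggregate level
```

At record level the faulty invoice is the only one flagged, while in the roughly two-million-dollar total it shifts the sum by less than half a percent, which is easily lost in a BI report.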
Incomplete data creates the illusion that we have the information. This illusion is dangerous in applications designed on the assumption that certain information is available, when at the time of analysis and presentation (which always comes at a later stage of a BI project) it turns out not to be. For example, a company's management believes it has the e-mail addresses of its users, but in fact has them for only about 3% of users, because the application does not require this field to be filled in (which, for this attribute, is the only correct approach). With such data nothing new can be learned about the structure of the user base, and it cannot be used in further sales and marketing analysis.
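A simple way to expose this illusion before the analysis stage is to measure how often each attribute is actually filled in. The sketch below is a minimal Python illustration under stated assumptions: records are plain dictionaries, a value counts as missing if it is absent, None or blank, and the customer data (100 users, 3 with an e-mail address) and the helper names fill_rate and completeness_report are hypothetical.

```python
def fill_rate(records, field):
    """Share of records in which `field` is present and not blank."""
    if not records:
        return 0.0
    filled = sum(
        1
        for r in records
        if r.get(field) is not None and str(r.get(field)).strip() != ""
    )
    return filled / len(records)


def completeness_report(records, fields):
    """Map each field name to its fill rate (0.0 to 1.0)."""
    return {f: fill_rate(records, f) for f in fields}


# Hypothetical user records: every record has a name,
# but only 3 out of 100 have an e-mail address.
customers = [{"name": f"Customer {i}", "email": ""} for i in range(100)]
for i in range(3):
    customers[i]["email"] = f"customer{i}@example.com"

report = completeness_report(customers, ["name", "email"])
for field, rate in report.items():
    print(f"{field}: {rate:.0%}")
```

Running such a profile early in the project shows that the e-mail attribute is filled for only 3% of users, so the planned analysis can be adjusted before the reports are built rather than after.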