Shazia Sadiq, Xiaofang Zhou, Ke Deng, Xiaochun Yang
Data integration systems have been a subject of intense research and development for over three decades. Their basic goal is to provide a uniform interface to a multitude of data sources. The difficulties of overcoming the schematic, syntactic and semantic differences of data from multiple autonomous and heterogeneous sources are well recognized, and have given rise to a data integration market valued at US$1.34 billion and growing. With the phenomenal increase in the scale and disparity of data, the problems associated with data integration have grown dramatically.

A fundamental determinant of user satisfaction with integration systems is data quality. Industry reports indicate that expensive data integration initiatives stemming from migrations, mergers, legacy upgrades, etc., succeed in delivering a common technology platform, yet are rejected by user communities due to the presence (or exposure) of poor-quality data. Poor data quality is known to compromise the credibility and efficiency of commercial as well as public endeavours. Several developments from both industry and academia have contributed significantly towards addressing the problem.

These typically include analysts and practitioners who have contributed to the design of strategies and methodologies for data governance; solution architects, including software vendors, who have contributed system architectures that promote data integration; and data experts who have applied computational techniques to data quality problems such as duplicate detection, identification of outliers, consistency checking and many more. The attainment of true data quality lies at the convergence of these three aspects, namely the organizational, the architectural and the computational.
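To make the computational aspect concrete, the following is a minimal sketch of one such technique, duplicate detection via approximate string matching. The record values, the pairwise comparison strategy, and the 0.85 similarity threshold are illustrative assumptions, not methods prescribed by the workshop; production systems typically use blocking and more sophisticated similarity measures.

```python
# Illustrative sketch of duplicate detection (not a prescribed method).
# Threshold and sample records are assumptions chosen for the example.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1], based on longest matching subsequences,
    after simple normalization (case folding, whitespace trimming)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def find_duplicates(records, threshold=0.85):
    """Naive O(n^2) pairwise comparison; returns index pairs of
    records whose similarity meets the threshold."""
    dupes = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if similarity(records[i], records[j]) >= threshold:
                dupes.append((i, j))
    return dupes

customers = ["John A. Smith", "john a smith", "Jane Doe"]
print(find_duplicates(customers))  # → [(0, 1)]
```

Even this toy example shows why duplicate detection is computationally interesting: exact equality misses the first pair entirely, while the quadratic comparison cost motivates the blocking and indexing techniques studied in the literature.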

The workshop will provide a forum to bring together diverse researchers and make a consolidated contribution of new and extended methods to address the challenges of data quality in data integration systems. Topics covered by the workshop include, but are not limited to, the following:

- Data integration, linkage and fusion
- Entity resolution, duplicate detection, and consistency checking
- Data profiling and measurement
- Use of data mining for data quality assessment
- Methods for data transformation, reconciliation, consolidation
- Algorithms for data cleansing
- Data quality and cleansing in information extraction
- Dealing with uncertain or noisy data (e.g., sensor data)
- Data lineage and provenance
- Models, frameworks, methodologies and metrics for data quality
- Application specific data quality, case studies, experience reports
- User/social perspectives on data quality and cleansing
- Data quality and cleansing for complex data (e.g., documents, semi-structured data, XML, multimedia data, graphs, biosequences, etc.)