Biometric Colloquium

Carsten Oliver Schmidt
(Institute for Community Medicine, SHIP-KEF, University Medicine of Greifswald, Greifswald, Germany)

What do I need for automated data quality assessments?
Concept and application example based on the Study of Health in Pomerania (SHIP)

Abstract:
Data quality assessment is a prerequisite for reliable statistical analysis. It comprises a broad range of procedures, from simple rule-based checks, such as detecting range violations or invalid data types, to statistical methods for identifying outliers, unexpected distributions, associations, or cluster effects that may indicate measurement error or measurement heterogeneity.
Automating these assessments in complex datasets requires balancing two competing needs: flexibility, to adapt methods to specific data properties, and standardization, to ensure reproducible and scalable workflows. This talk presents a metadata-driven approach to this challenge using data quality assessment tools in R and Stata, specifically dataquieR and dqrep.
The talk will highlight the central role of metadata. Metadata encode expectations about variables, permissible values, measurement properties, and study-specific requirements, thereby guiding automated checks and supporting transparent reporting. The approach will be illustrated using data from the population-based Study of Health in Pomerania (SHIP).
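To make the idea of metadata-guided checks concrete, the following is a minimal sketch in plain Python. It does not use or reproduce the dataquieR or dqrep interfaces. The variable names, limits, records, and the helper `check_records` are all hypothetical and chosen only for illustration. The point is that the checking code stays generic while all expectations live in the metadata.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class VariableMetadata:
    """Expectations about one study variable (hypothetical fields)."""
    name: str
    data_type: type
    lower_limit: Optional[float] = None
    upper_limit: Optional[float] = None


def check_records(
    records: List[Dict[str, Any]],
    metadata: List[VariableMetadata],
) -> List[Tuple[int, str, str]]:
    """Return (row index, variable name, issue) for every violated expectation."""
    issues = []
    for row_index, record in enumerate(records):
        for meta in metadata:
            value = record.get(meta.name)
            if value is None:
                # Missing values are out of scope for this sketch.
                continue
            if not isinstance(value, meta.data_type):
                issues.append((row_index, meta.name, "invalid data type"))
                continue
            if meta.lower_limit is not None and value < meta.lower_limit:
                issues.append((row_index, meta.name, "below lower limit"))
            elif meta.upper_limit is not None and value > meta.upper_limit:
                issues.append((row_index, meta.name, "above upper limit"))
    return issues


# Hypothetical metadata: the limits are illustrative, not taken from SHIP.
METADATA = [
    VariableMetadata("age", int, lower_limit=18, upper_limit=100),
    VariableMetadata("systolic_bp", float, lower_limit=60.0, upper_limit=260.0),
]

# Hypothetical records with deliberately planted problems.
RECORDS = [
    {"age": 45, "systolic_bp": 128.0},
    {"age": 130, "systolic_bp": 121.5},   # age outside the permitted range
    {"age": "52", "systolic_bp": 300.0},  # age stored as text, pressure too high
]

ISSUES = check_records(RECORDS, METADATA)
for row_index, variable, issue in ISSUES:
    print(f"row {row_index}: {variable}: {issue}")
```

Adding a new variable or tightening a limit in this sketch means editing only `METADATA`, which is the kind of separation between study knowledge and checking logic that the abstract describes.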
The talk’s key message is that systematic organization and use of study knowledge through metadata can make the scientific data analysis workflow more efficient, reproducible, and transparent.