Different meanings of data quality

Data quality is a broad concept that covers different types of checks, applied either to individual data items or to whole datasets.

Data quality as a conformity check

Data quality can mean a conformity check against:

  • A standard / A norm

  • A file format

  • An attribute as defined in the standard/norm

Questions for the above can be:

  • Does my file meet the expectations of a given standard/norm?

  • Does my file meet the expectations for data exchange of a given standard/norm?

  • Does my file contain the mandatory attributes of a given standard/norm?

Such conformity checks can be performed on:

  • Data itself (e.g., are all “order” attributes positive integers?)

  • A single file (e.g., is my file well-formed XML?)

  • The dataset itself (e.g., does my dataset include all mandatory files?)
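
The three levels of conformity check listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the file names in the usage note are hypothetical, and a real check would take its rules from the standard/norm in question.

```python
import xml.etree.ElementTree as ET


def orders_are_positive_integers(orders):
    """Data-level check: every "order" value is a positive integer."""
    return all(
        isinstance(o, int) and not isinstance(o, bool) and o > 0
        for o in orders
    )


def is_well_formed_xml(text):
    """File-level check: the text parses as XML (no schema validation)."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def missing_mandatory_files(present_files, mandatory_files):
    """Dataset-level check: list the mandatory files that are absent."""
    return sorted(set(mandatory_files) - set(present_files))
```

For example, `missing_mandatory_files(["stops.xml"], ["stops.xml", "lines.xml"])` reports `lines.xml` as missing. Note that well-formedness is a weaker property than validity against a schema, which would need a dedicated validator.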

Data quality as quality control

Data quality can mean a quality control against:

  • A standard / A norm / A profile

  • Local regulations or agreements

  • Industry know-how

Questions for the above can be:

  • Does the name of my file meet the expectations of a given profile?

  • Does the content of my file meet the local requirements?

  • Does the content of my file make sense when publishing a public transport offer?

Such quality controls can be performed on:

  • Data itself (e.g., do all my stop IDs follow the format required by the local profile?)

  • A dataset (e.g., does my dataset include all information made mandatory by the local profile?)

  • Across historical data (e.g., does a persistent attribute keep the same value in each dataset?)

  • Across historical datasets (e.g., does my dataset always include the same required files?)
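
The quality controls listed above could be sketched as follows. This is a minimal illustration, not a reference implementation: the stop ID pattern stands in for whatever a local profile actually requires, and the function names and data shapes are hypothetical.

```python
import re

# Assumption: a made-up local-profile rule where stop IDs look like
# "FR:StopPlace:12". Replace with the rule of the actual profile.
STOP_ID_PATTERN = re.compile(r"[A-Z]{2}:StopPlace:\d+")


def nonconforming_stop_ids(stop_ids, pattern=STOP_ID_PATTERN):
    """Data-level control: return the stop IDs that break the profile rule."""
    return [s for s in stop_ids if not pattern.fullmatch(s)]


def inconsistent_attributes(snapshots):
    """Historical-data control.

    `snapshots` is a list of dicts (oldest first), each mapping an object ID
    to the value of an attribute that is expected to stay the same.
    Returns the IDs whose value differs from its first recorded value.
    """
    first_seen = {}
    changed = set()
    for snapshot in snapshots:
        for key, value in snapshot.items():
            if key in first_seen and first_seen[key] != value:
                changed.add(key)
            first_seen.setdefault(key, value)
    return sorted(changed)


def files_missing_per_dataset(datasets, required_files):
    """Historical-dataset control: map each dataset name to the required
    files it lacks, omitting datasets that are complete."""
    required = set(required_files)
    report = {}
    for name, files in datasets.items():
        missing = required - set(files)
        if missing:
            report[name] = sorted(missing)
    return report
```

With two snapshots `{"a": "X", "b": "Y"}` and `{"a": "X", "b": "Z"}`, `inconsistent_attributes` flags `b`, since its value changed between deliveries. Whether such a change is an error or a legitimate update remains a judgment that depends on the local agreement.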