QUALITY CONTROL
Effective sharing, preservation and re-use of research data require quality control in order to avoid data errors. Examples of how quality control techniques may be applied to different types of data are given below. The provision of good data documentation and metadata is also paramount.
Quantitative surveys and other micro data
To ensure high quality data arising from surveys, there are some standard validation, verification and cleaning strategies that should always be undertaken. Data recorded via a questionnaire should be an accurate reflection of the actual responses given. Data should be thoroughly 'cleaned' by:
- rectifying errors in transcription
- double-checking the coding of responses, removing all out-of-range codes
- adding variable and value labels where appropriate
- documenting methods used to create and calculate derived variables
Comprehensive guidelines on how to provide good variable-level documentation (metadata) are given on the data documentation page.
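The cleaning steps above can be illustrated with a minimal sketch. It assumes survey records held as simple dictionaries and a hypothetical codebook (the variable names, codes and labels below are invented for illustration, not taken from any particular survey). The sketch flags out-of-range codes for review rather than deleting them, and attaches value labels.

```python
# Hypothetical codebook: variable name -> {valid code: value label}.
# Codes and labels are assumptions made for this illustration only.
CODEBOOK = {
    "sex": {1: "Male", 2: "Female"},
    "employed": {1: "Yes", 2: "No", 9: "Not answered"},
}


def find_out_of_range(records, codebook):
    """Return (row index, variable, value) for every value not in the codebook."""
    problems = []
    for index, record in enumerate(records):
        for variable, labels in codebook.items():
            value = record.get(variable)
            if value not in labels:
                problems.append((index, variable, value))
    return problems


def apply_value_labels(record, codebook):
    """Return a copy of the record with coded values replaced by their labels.

    Values without a label in the codebook are left unchanged.
    """
    return {
        variable: codebook.get(variable, {}).get(value, value)
        for variable, value in record.items()
    }


records = [
    {"sex": 1, "employed": 2},
    {"sex": 3, "employed": 1},  # 3 is not a valid code for "sex"
    {"sex": 2, "employed": 9},
]

problems = find_out_of_range(records, CODEBOOK)
labelled = apply_value_labels(records[0], CODEBOOK)
```

Flagged values would normally be checked against the original questionnaire before being corrected or set to a documented missing-value code.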
The checking of survey data typically involves both automated and manual procedures. Automated procedures may include setting up validation rules in data entry software such as SPSS, Excel or Access (e.g. only allowing two values, '1' or '2', for the variable 'Sex'). Computer-Aided Interview (CAI) software takes such checks a stage further by verifying response consistency, routing questions so that only appropriate questions are asked, and confirming responses against previous answers where appropriate (e.g. in longitudinal surveys). Thus, where CAI is used, data quality is to a large extent assured by the computer-aided instrument itself. After data collection, appropriate format conversion must be carried out (e.g. BLAISE CAI software can output SPSS files) so that the data may subsequently be analysed with ease. CAI software may sometimes truncate variable and value labels, or add non-standard characters to them. If this occurs, the labels should be edited so that their meaning is clear to the secondary data user.
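As a rough illustration of the kinds of automated check described above (a range rule, a routing rule and a comparison against a previous wave), the following sketch validates one respondent's answers. It is not how any particular CAI package works; the variable names (`employed`, `hours_worked`, `birth_year`) and the routing rule are assumptions made for the example.

```python
def check_response(current, previous=None):
    """Return a list of problems found in one respondent's answers.

    `current` and `previous` are dictionaries of answers keyed by
    hypothetical variable names; `previous` is the same respondent's
    record from an earlier wave, if one exists.
    """
    problems = []

    # Range rule, mirroring the '1' or '2' rule for 'Sex' in the text.
    if current.get("sex") not in (1, 2):
        problems.append("sex: out-of-range code")

    # Routing rule (assumed): hours_worked is asked only when employed == 1.
    employed = current.get("employed")
    hours = current.get("hours_worked")
    if employed == 1 and hours is None:
        problems.append("hours_worked: missing for an employed respondent")
    if employed != 1 and hours is not None:
        problems.append("hours_worked: answered although it should have been skipped")

    # Longitudinal rule: a fixed characteristic should match the previous wave.
    if previous is not None and current.get("birth_year") != previous.get("birth_year"):
        problems.append("birth_year: differs from previous wave")

    return problems


clean = check_response(
    {"sex": 1, "employed": 1, "hours_worked": 35, "birth_year": 1980},
    previous={"birth_year": 1980},
)
flagged = check_response(
    {"sex": 2, "employed": 2, "hours_worked": 20, "birth_year": 1981},
    previous={"birth_year": 1980},
)
```

Running such rules at the point of data entry, rather than afterwards, is what allows an interviewer to resolve a discrepancy with the respondent while the interview is still in progress.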
Aggregated (macro) data
Where data are aggregated, for example economic time series data, clear notes should be included on data gathering criteria. For example, do data collection methods differ between countries, or over time? Is enough information given so that users will be able to interpret the data? Where indices are created, these should be treated in the same way as derived variables within micro data; labels should be provided and derivation methodology fully documented.
Technical data
For data gathered through the use of scientific measuring instruments, validation refers to checking for equipment and transcription errors, while verification is the checking of the truth of the record by an expert or by taking multiple samples so that data may be compared. Calibration will usually form a core part of data validation and verification. Calibration checks are increasingly built into the instrumentation, but may need to be performed before measurement by comparing the measuring instrument against a standard, in order to check the precision, bias, deviation and/or scale of measurement and, where necessary, to correct for errors. This is typically performed by using a model to analyse a set of validation samples and statistically comparing the estimates to reference values measured for these samples.
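A much simplified sketch of such a comparison is given below. It assumes paired instrument readings and reference values for the same validation samples, summarises bias as the mean difference and spread as the standard deviation of the differences, and fits a straight-line correction by ordinary least squares. The function names and the choice of a linear model are assumptions for illustration; real calibration procedures depend on the instrument and discipline.

```python
from statistics import mean, stdev


def calibration_summary(measured, reference):
    """Compare instrument readings with reference values for the same samples."""
    if len(measured) != len(reference) or len(measured) < 2:
        raise ValueError("need at least two paired measurements")

    differences = [m - r for m, r in zip(measured, reference)]
    bias = mean(differences)     # systematic offset of the instrument
    spread = stdev(differences)  # variability of the offset

    # Least-squares line: reference is approximated by slope * measured + intercept.
    mean_measured = mean(measured)
    mean_reference = mean(reference)
    sxx = sum((m - mean_measured) ** 2 for m in measured)
    if sxx == 0:
        raise ValueError("measured values must not all be identical")
    sxy = sum(
        (m - mean_measured) * (r - mean_reference)
        for m, r in zip(measured, reference)
    )
    slope = sxy / sxx
    intercept = mean_reference - slope * mean_measured

    return {"bias": bias, "spread": spread, "slope": slope, "intercept": intercept}


def correct(reading, summary):
    """Apply the fitted linear correction to a new reading."""
    return summary["slope"] * reading + summary["intercept"]


# Invented example: the instrument reads 0.5 units high on every sample.
measured = [10.5, 20.5, 30.5, 40.5]
reference = [10.0, 20.0, 30.0, 40.0]
summary = calibration_summary(measured, reference)
corrected = correct(20.5, summary)
```

The summary values and the correction applied should be recorded alongside the data, so that later users can see how raw readings were adjusted.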
Qualitative data: recorded interviews
The quality of data gathered by means of recorded interviews depends on:
- the quality of the interview method
- the quality of the audio-visual equipment
- the audio transcription of interviews















