UKDA PROCESSING STANDARDS
UKDA is at the forefront of developing international standards for processing both quantitative and qualitative data. As part of its commitment to maximise the size of its collection while focusing resources on adding value to the best-used datasets, UKDA applies four different levels of data processing. A processing standard (graded A*, A, B or C) is assigned to each incoming study, depending on anticipated future usage. Processing activities are then carried out in accordance with the assigned level, as described in the tables below. Data processing activities for the majority of data fall into two parts:
- validation and content checks
- format translation checks by level of processing
Validation and content checks
The main validation and content checks for data and documentation are listed below. Further details may be found in the UK Data Archive Data Processing Standards document.
Data processing standards required for each level of processing:
Quantitative data
| | Level A* | Level A | Level B |
| Dataset dimension checks | | | |
| Metadata checks | | | |
| Data validity checks | | | |
| Confidentiality checks | | | |
| Metadata enhancements | | | |
For level C studies, a minimum of dataset dimension checks and confidentiality checks is carried out, with metadata enhancements as for B studies.
Qualitative data
| Level A* | Level A | Level B | Level C |
| | | | |
In practice, most deposited qualitative data collections are processed to A standard, with a select few being nominated for enhancement to A*. B and C are seldom used but apply when handling older paper-based studies.
Format translation checks by level of processing (quantitative data)
These checks are carried out on conversion from the ingest format (the format in which the data arrive) into the preservation format (tagged or delimited text of a defined character set). They are also carried out on conversion from the preservation format to the dissemination formats (typically Stata and tab-delimited text, but sometimes also MS Excel, MS Access, SIR and SAS).
Note: UKDA has written programmes to automate most data format conversions for all levels of processing. These ensure that no data or 'internal metadata' (variable and value labels, missing value definitions, variable format information, etc.) are lost beyond any that would occur because of differential data handling limits in specific software formats. For details see data formats.
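As an illustration only, the kind of comparison a conversion check involves can be sketched in Python. This is not UKDA's conversion software: the function names (`decimal_places`, `compare_tables`, `to_tab_delimited`, `from_tab_delimited`) are hypothetical, tables are assumed to be simple lists of rows of strings, and only row and column counts, decimal places and string truncation are compared. Internal metadata such as variable and value labels is not modelled here.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def decimal_places(text):
    """Return the number of decimal places in a numeric string, or None if not numeric."""
    try:
        exponent = Decimal(text).as_tuple().exponent
    except InvalidOperation:
        return None
    # Special values such as NaN or Infinity have a non-integer exponent.
    if not isinstance(exponent, int):
        return None
    return max(0, -exponent)


def compare_tables(source, converted):
    """List differences between two tables given as lists of rows of strings."""
    problems = []
    if len(source) != len(converted):
        problems.append(f"row count {len(source)} != {len(converted)}")
    for i, (src_row, out_row) in enumerate(zip(source, converted)):
        if len(src_row) != len(out_row):
            problems.append(f"row {i}: column count {len(src_row)} != {len(out_row)}")
            continue
        for j, (src, out) in enumerate(zip(src_row, out_row)):
            src_dp, out_dp = decimal_places(src), decimal_places(out)
            if src_dp is not None and out_dp is not None:
                # Both cells are numeric: compare the number of decimal places.
                if src_dp != out_dp:
                    problems.append(f"row {i}, column {j}: decimal places {src_dp} != {out_dp}")
            elif src != out and src.startswith(out):
                # The converted text is a strict prefix of the source text.
                problems.append(f"row {i}, column {j}: string truncated")
    return problems


def to_tab_delimited(rows):
    """Write rows as tab-delimited text (a stand-in for a preservation format)."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def from_tab_delimited(text):
    """Read tab-delimited text back into a list of rows of strings."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))
```

A round trip such as `compare_tables(rows, from_tab_delimited(to_tab_delimited(rows)))` should return an empty list. A conversion that shortened a name or dropped a trailing zero would be reported with its row and column position.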
The checks below are performed manually for the few types of data conversion that do not have a quality-checked automated conversion programme.
Data processing format conversion checks:
| | Level A* | Level A | Level B | Level C |
| Numbers of rows and cases the same | R + C | R + C | R + C | R + C |
| Number of decimal places the same for numeric formats | R + C | R + C | R + C | |
| String variables not truncated | R + C | R + C | R + C | |
| Date/time variables correctly formatted | R + C | R + C | R + N | |
| Internal metadata (variable names, variable labels, value labels and definition of missing values) not lost or altered | R + C where possible | R + C where possible | R + N | |
R = relevant checks must be made
C = problems encountered must be corrected
N = problems encountered need not be corrected but must be noted in the 'Read file' supplied to users with each order
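
The table and key above can also be read as a lookup from check and processing level to the required action. The following Python sketch encodes it that way. The short check names, the `Action` enum and both functions are hypothetical labels introduced for illustration, and a blank cell in the table is assumed to mean that no entry is given for that level.

```python
from enum import Enum


class Action(Enum):
    CORRECT = "R + C"
    CORRECT_WHERE_POSSIBLE = "R + C where possible"
    NOTE = "R + N"
    BLANK = ""  # blank cell in the table; read here as "no entry" (an assumption)


LEVELS = ("A*", "A", "B", "C")

RC = Action.CORRECT
RCP = Action.CORRECT_WHERE_POSSIBLE
RN = Action.NOTE
BLANK = Action.BLANK

# One row per check, one entry per level in the order of LEVELS.
# The keys are hypothetical short names for the rows of the table.
CHECK_MATRIX = {
    "dimensions": (RC, RC, RC, RC),
    "decimal_places": (RC, RC, RC, BLANK),
    "string_truncation": (RC, RC, RC, BLANK),
    "date_time_format": (RC, RC, RN, BLANK),
    "internal_metadata": (RCP, RCP, RN, BLANK),
}


def required_action(check, level):
    """Return the table entry for one check at one processing level."""
    return CHECK_MATRIX[check][LEVELS.index(level)]


def checks_for_level(level):
    """Return the checks that have a non-blank entry for the given level."""
    return [
        check
        for check in CHECK_MATRIX
        if required_action(check, level) is not Action.BLANK
    ]
```

For example, `required_action("date_time_format", "B")` gives `Action.NOTE`, meaning a date/time problem found in a level B study is recorded in the 'Read file' rather than corrected.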















