MANAGE AND SHARE DATA


UKDA PROCESSING STANDARDS

UKDA is at the forefront of developing international standards for processing both quantitative and qualitative data. To maximise the size of its collection while focusing resources on adding value to the best-used datasets, UKDA applies four levels of data processing. Each incoming study is assigned a processing standard (graded A*, A, B or C) according to its anticipated future usage. Processing activities are then carried out to that level, as described in the tables below. Data processing activities for the majority of data fall into two parts:
  • validation and content checks
  • format translation checks by level of processing

Validation and content checks

The main validation and content checks for data and documentation are listed below. Further details may be found in the UK Data Archive Data Processing Standards document.


Data processing standards required for each level of processing:

Quantitative data

Dataset dimension checks
Level A*:
  • the number of cases and variables are checked against the documentation
Level A:
  • as for A*
Level B:
  • as for A*

Metadata checks
Level A*:
  • the dataset must be comprehensible in itself - i.e. all variables should have variable labels and all categorical variables should have value labels
Level A:
  • the dataset must be comprehensible in association with the documentation given to users
Level B:
  • visual checks on quality are undertaken
  • action is taken for systematic problems

Data validity checks
Level A*:
  • all categorical variables must be checked for out-of-range values/wild codes
  • where possible, interval variables must be checked for improbable or impossible values
Level A:
  • as for A*
Level B:
  • a sample of 30 + 10 per cent of the remaining categorical variables must be checked for out-of-range values/wild codes
  • a sample of 30 + 10 per cent of the remaining suitable interval variables must be checked for improbable or impossible values

Confidentiality checks
Level A*:
  • always undertaken
Level A:
  • always undertaken
Level B:
  • always undertaken

Metadata enhancements
Level A*:
  • the following are added: literal question text; routing information and interviewers' instructions; frequencies and summary statistics; variable groups
  • extensively bookmarked PDF user guides are produced
  • additional related resources are provided on a dedicated web page
  • additional notes to users are given in the 'Read file'
Level A:
  • extensively bookmarked PDF user guides are produced
  • additional related resources are provided on a dedicated web page
  • additional notes to users are given in the 'Read file'
Level B:
  • a bookmarked PDF user guide is produced
  • additional notes to users are given in the 'Read file'
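
The dataset dimension and metadata checks listed above could be sketched in code roughly as follows. This is only an illustration, not UKDA's own tooling: the `Variable` class, the function names and the sample data are all hypothetical, and the dataset is assumed to be held in memory as a list of rows plus a list of variable descriptions.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical in-memory description of one dataset variable."""
    name: str
    label: str = ""
    categorical: bool = False
    value_labels: dict = field(default_factory=dict)


def dimension_problems(rows, variables, documented_cases, documented_variables):
    """Compare the actual numbers of cases and variables with the documentation."""
    problems = []
    if len(rows) != documented_cases:
        problems.append(f"cases: found {len(rows)}, documented {documented_cases}")
    if len(variables) != documented_variables:
        problems.append(
            f"variables: found {len(variables)}, documented {documented_variables}"
        )
    return problems


def metadata_problems(variables):
    """List variables lacking a variable label or, if categorical, value labels."""
    problems = []
    for var in variables:
        if not var.label.strip():
            problems.append(f"{var.name}: missing variable label")
        if var.categorical and not var.value_labels:
            problems.append(f"{var.name}: missing value labels")
    return problems


# Invented example data: 'region' has neither a variable label nor value labels.
variables = [
    Variable("sex", "Sex of respondent", True, {1: "Male", 2: "Female"}),
    Variable("region", "", True),
    Variable("age", "Age in years"),
]
rows = [{"sex": 1, "region": 3, "age": 34}, {"sex": 2, "region": 1, "age": 51}]

print(dimension_problems(rows, variables, documented_cases=2, documented_variables=3))
print(metadata_problems(variables))
```

An empty list from either function means the corresponding check found nothing to report. Judging whether a label is actually comprehensible still needs a human reader.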

For level C studies, a minimum of dataset dimension checks and confidentiality checks is carried out, with metadata enhancements as for B studies.
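
As a rough illustration of the level B data validity checks, the sketch below works out how many variables the "30 + 10 per cent of the remaining" rule selects and then looks for wild codes and improbable values. It rests on assumptions that the text does not state: "the remaining" is read as the variables beyond the first 30, the 10 per cent share is rounded up, and the sample is drawn at random. All function names are hypothetical.

```python
import random


def sample_size(n_variables, base=30, percent=10):
    """Variables to check: the first `base` plus `percent` per cent of the rest.

    Assumption: the percentage share is rounded up to a whole variable.
    """
    if n_variables <= base:
        return n_variables
    remaining = n_variables - base
    return base + (remaining * percent + 99) // 100  # integer ceiling


def choose_sample(variable_names, seed=0):
    """Draw the variables to check; a fixed seed keeps the choice reproducible."""
    k = sample_size(len(variable_names))
    return random.Random(seed).sample(variable_names, k)


def wild_codes(values, valid_codes):
    """Return observed codes that are not defined for a categorical variable."""
    observed = {v for v in values if v is not None}  # ignore missing data
    return sorted(observed - set(valid_codes))


def out_of_range(values, low, high):
    """Return values outside the plausible interval [low, high]."""
    return [v for v in values if v is not None and not (low <= v <= high)]


print(sample_size(100))                               # 37
print(wild_codes([1, 2, 2, 9, None, 7], {1, 2}))      # [7, 9]
print(out_of_range([34, 51, 212, None, -3], 0, 120))  # [212, -3]
```

For levels A* and A the same two checks would simply be applied to every variable rather than to a sample.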

Qualitative data
Level A*:
  • data are fully digitised and anonymised
  • data are marked up in XML
  • data are additionally made accessible through UKDA and via ESDS Qualidata Online
  • metadata and documentation are fully digitised and anonymised
  • metadata and documentation are accessible through UKDA
  • enhanced user guide is prepared for ESDS Qualidata Online
Level A:
  • data are fully digitised and anonymised
  • data are made accessible via UKDA
  • metadata and documentation are fully digitised and anonymised
  • metadata and documentation are accessible through UKDA
Level B:
  • data are digitised at least to the level of scanned images and anonymised
  • data are made accessible via UKDA
  • metadata and documentation are digitised at least to the level of scanned images and anonymised
  • only major problems with data are resolved
  • metadata and documentation are accessible through UKDA
Level C:
  • no checks are made
  • data remain in the format in which they were received
  • non-digital collections are not anonymised or digitised and are transferred to another repository
  • only a basic catalogue record is created

In practice, most deposited qualitative data collections are processed to A standard, with a select few being nominated for enhancement to A*. B and C are seldom used but apply when handling older paper-based studies.

Format translation checks by level of processing (quantitative data)

These checks are carried out on conversion from the ingest format (the format the data arrive in) into the preservation format (tagged or delimited text of a defined character set). They are also carried out on conversion from the preservation format to the dissemination formats (typically Stata and tab-delimited text, but sometimes also MS Excel, MS Access, SIR and SAS).

Note: UKDA has written programs to automate most data format conversions for all levels of processing. These ensure that no data or 'internal metadata' (variable and value labels, missing value definitions, variable format information, etc.) are lost beyond any that would occur because of differing data handling limits in specific software formats. For details see data formats.

The checks below are performed manually for the few types of data conversion that do not have a quality-checked automated conversion program.

Data processing format conversion checks:


Check | Level A* | Level A | Level B | Level C
Numbers of rows and cases the same | R + C | R + C | R + C | R + C
Number of decimal places the same for numeric formats | R + C | R + C | R + C |
String variables not truncated | R + C | R + C | R + C |
Date/time variables correctly formatted | R + C | R + C | R + N |
Internal metadata (variable names, variable labels, value labels and definition of missing values) not lost or altered | R + C where possible | R + C where possible | R + N |

R = relevant checks must be made
C = problems encountered must be corrected
N = problems encountered need not be corrected but must be noted in the 'Read file' supplied to users with each order
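
Some of the format conversion checks above lend themselves to a simple before-and-after comparison. The sketch below is a hypothetical illustration, not UKDA's conversion software. It assumes both the source and the converted data have been exported as rows of text values, and it covers only row and column counts, decimal places and string truncation. Date/time formatting and internal metadata are not covered.

```python
def is_number(text):
    """True if the text parses as a float (note: also accepts 'nan' and 'inf')."""
    try:
        float(text)
    except ValueError:
        return False
    return True


def decimal_places(text):
    """Count the digits after the decimal point in a plain numeric string."""
    return len(text.strip().partition(".")[2])


def conversion_problems(source, converted):
    """Compare two tables given as lists of rows, each row a list of strings."""
    problems = []
    if len(source) != len(converted):
        problems.append(f"row count differs: {len(source)} vs {len(converted)}")
    for i, (src_row, out_row) in enumerate(zip(source, converted)):
        if len(src_row) != len(out_row):
            problems.append(
                f"row {i}: column count differs: {len(src_row)} vs {len(out_row)}"
            )
            continue
        for j, (src, out) in enumerate(zip(src_row, out_row)):
            if is_number(src):
                if decimal_places(src) != decimal_places(out):
                    problems.append(
                        f"row {i}, column {j}: decimal places changed ({src!r} -> {out!r})"
                    )
            elif src != out and src.startswith(out):
                problems.append(
                    f"row {i}, column {j}: string truncated ({src!r} -> {out!r})"
                )
    return problems


# Invented example: one lost decimal place and one truncated string.
source = [["1.50", "Birmingham"], ["2.25", "Leeds"]]
converted = [["1.5", "Birmingh"], ["2.25", "Leeds"]]
for problem in conversion_problems(source, converted):
    print(problem)
```

The sketch only reports problems. Whether each one must be corrected (C) or merely noted in the 'Read file' (N) depends on the processing level, as set out in the table.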