ANONYMISATION / OVERVIEW
Before data obtained from research with people can be shared with other researchers or archived, you may need to anonymise them so that individuals, organisations or businesses cannot be identified. Here we provide guidance on anonymising quantitative and qualitative data appropriately in order to retain as much meaningful information as possible.
Re-users of data have the same legal and ethical obligation as primary users not to disclose confidential information. Anonymisation may be needed for ethical reasons to protect people's identities in research, for legal reasons to avoid disclosing personal data, or for commercial reasons.
Personal data should never be disclosed from research information, unless a respondent has given specific consent to do so, ideally in writing.
In some forms of research, for example where oral histories are recorded or in anthropological research, it is customary to publish and share the names of people studied, for which they have given their consent.
Procedures to anonymise data should always be considered alongside obtaining informed consent for data sharing or imposing access restrictions.
A person's identity can be disclosed from:
- direct identifiers such as names, addresses, postcode information, telephone numbers or pictures
- indirect identifiers which, when linked with other publicly available information sources, could identify someone, e.g. information on workplace, occupation or exceptional values of characteristics like salary or age
Direct identifiers are often collected as part of the research administration process but are usually not essential research information and can therefore easily be removed from the data.
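As a minimal sketch of removing direct identifiers, assuming each respondent's record is held as a simple dictionary: the field names below (name, address, postcode, telephone, photo) and the example record are hypothetical and would need to match whatever identifiers were actually collected.

```python
# Hypothetical field names for direct identifiers; adjust to your own dataset.
DIRECT_IDENTIFIERS = {"name", "address", "postcode", "telephone", "photo"}


def strip_direct_identifiers(record, identifiers=DIRECT_IDENTIFIERS):
    """Return a copy of the record without the direct-identifier fields."""
    return {key: value for key, value in record.items() if key not in identifiers}


# Illustrative record (invented values, not real data).
respondent = {
    "name": "Jane Example",
    "telephone": "01234 567890",
    "age": 34,
    "occupation": "teacher",
}

cleaned = strip_direct_identifiers(respondent)
print(cleaned)
```

The original record is left untouched, so the identifiers can still be stored separately and securely for research administration if needed.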
Anonymising research data can be time-consuming and therefore costly. Early planning can help reduce the costs.
Anonymisation techniques for quantitative data may involve removing or aggregating variables or reducing the precision or detailed textual meaning of a variable. Special attention may be needed for relational data, where connections between variables in related datasets can disclose identities, and for geo-referenced data, where identifying spatial references also have a geographical value.
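The kinds of precision reduction described above might be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the function names, the ten-year band width, the salary ceiling and the example values are all hypothetical, capping extreme values (often called top-coding) is one possible way of handling exceptional values, and the postcode helper assumes the postcode is written with a space between its two parts.

```python
def band_age(age, width=10):
    """Aggregate an exact age into a band, e.g. 34 -> '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def top_code(value, ceiling):
    """Cap exceptional values at a ceiling so outliers are less identifying."""
    return min(value, ceiling)


def postcode_district(postcode):
    """Keep only the first part of a postcode written with a space."""
    return postcode.strip().split()[0].upper()


# Illustrative values only (invented, not real data).
record = {"age": 34, "salary": 250000, "postcode": "ab1 2cd"}

coarsened = {
    "age_band": band_age(record["age"]),
    "salary": top_code(record["salary"], ceiling=100000),
    "postcode_district": postcode_district(record["postcode"]),
}
print(coarsened)
```

How coarse the bands or ceilings should be depends on the dataset; the aim stated above is to reduce disclosure risk while retaining as much meaningful information as possible.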
When anonymising qualitative material, such as transcribed interviews, identifiers should not be crudely removed or aggregated, as this can distort the data or even make them unusable. Instead, pseudonyms, replacement terms or vaguer descriptors should be used. The objective should be to achieve a reasonable level of anonymisation, avoiding unrealistic or overly harsh editing, whilst maintaining maximum content.
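A small sketch of replacing identifiers in a transcript with pseudonyms or vaguer descriptors, rather than deleting them, is given here. The names, the replacement table and the helper function are invented for illustration; in practice the table would be built by reading the transcript, and an automated pass like this would still need checking by hand.

```python
import re


def pseudonymise(text, replacements):
    """Replace each identifying term with its pseudonym or vaguer descriptor.

    Longer terms are tried first so that a full name is replaced as a whole
    before a shorter term contained within it is considered.
    """
    terms = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + r")\b")
    return pattern.sub(lambda match: replacements[match.group(0)], text)


# Hypothetical replacement table and transcript excerpt.
replacements = {
    "Alice Smith": "Maria",
    "Alice": "Maria",
    "Bob": "her colleague",
    "Northfield Primary School": "a local primary school",
}

transcript = (
    "Alice Smith said she met Bob at Northfield Primary School. "
    "Alice left early."
)

print(pseudonymise(transcript, replacements))
```

Using the same pseudonym consistently for the same person keeps the account readable and preserves the relationships between people and places in the text.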