FILE FORMATS & SOFTWARE

The format and software in which research data are created usually depend on how researchers choose to collect and analyse data, often determined by discipline-specific standards and customs.

All digital information is designed to be interpreted by computer programs to make it understandable and is - by nature - software dependent. All digital data are thus endangered by the obsolescence of the hardware and software environment on which access to data depends.

Many software packages are backward compatible and can import data created in earlier versions, and competing popular programs are often interoperable. Even so, the safest way to guarantee long-term access to usable data is to convert data to standard formats that most software is capable of interpreting, and that are suitable for data interchange and transformation.

This typically means using open or standard formats - such as OpenDocument Format (ODF), ASCII, tab-delimited format, comma-separated values, XML - as opposed to proprietary ones. Some proprietary formats, such as MS Rich Text Format, MS Excel, SPSS, are widely used and likely to be accessible for a reasonable, but not unlimited, time.

Thus, whilst researchers use the most suitable data formats and software for their planned analyses, once data analysis is completed and data are prepared for storing, researchers should consider converting their research data to standard, interchangeable and longer-lasting formats, to avoid being unable to use the data in the future. Standard formats should likewise be considered for back-ups of data.

For long-term digital preservation, the Archive holds data in such standard formats. At the same time, data are offered to users in current, common and user-friendly formats, produced by conversion, and may be migrated forward when needed.


Converting data

Data may need to be converted from the original format to a preferred data preservation format in preparation for long-term storage, or to deposit them with the UK Data Archive. Conversion is best done by the researcher familiar with the data, to ensure data integrity during conversion.

When data are converted from one format to another - through export or by using data translation software - certain changes may occur to the data:

  • for data held in statistical packages, spreadsheets or databases, some data or internal metadata such as missing value definitions, decimal numbers, formulae or variable labels may be lost during conversion to another format, or data may be truncated
  • for textual data, editing such as highlighting, bold text or headers/footers may be lost.

After conversion, data should be checked for errors or changes.
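A check of this kind can be partly automated. The following is a minimal Python sketch, not a procedure prescribed by the Archive: it writes a small table to tab-delimited text, reads it back, and lists every value that no longer matches the original. The function names and the sample variables (id, income, status) are hypothetical and chosen only for illustration.

```python
import csv
import io


def to_tab_delimited(rows, fieldnames):
    """Write rows (a list of dicts) as tab-delimited text and return it."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_tab_delimited(text):
    """Read tab-delimited text back into a list of dicts (all values are strings)."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def find_conversion_changes(original_rows, converted_rows, fieldnames):
    """Return (location, before, after) for every value that changed."""
    changes = []
    if len(original_rows) != len(converted_rows):
        changes.append(("row count", len(original_rows), len(converted_rows)))
    for index, (before, after) in enumerate(zip(original_rows, converted_rows)):
        for name in fieldnames:
            # Compare text representations, since the converted file holds only text.
            if str(before[name]) != after[name]:
                changes.append((f"row {index}, {name}", before[name], after[name]))
    return changes


# Hypothetical sample record: 'status' is missing (None) in the original data.
fields = ["id", "income", "status"]
original = [{"id": 1, "income": 1234.5678, "status": None}]

converted = read_tab_delimited(to_tab_delimited(original, fields))
for location, before, after in find_conversion_changes(original, converted, fields):
    print(f"{location}: {before!r} became {after!r}")
```

In this sample the decimal value survives the round trip, but the missing value is written as an empty string, so the check reports it. This is the kind of silent change to missing value definitions described above, and it would need to be documented or recoded before deposit.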


Qualitative data analysis products

Qualitative data analysis software packages such as NVivo, ATLAS.ti and MAXQDA have export facilities that enable a whole 'project', consisting of the raw data, coding tree, coded data, and associated memos and notes, to be saved.

For archiving such data, the raw data, the final coding tree and any useful memos should be exported prior to deposit. Coded data are preserved by the UK Data Archive in their incoming format, but are not normally distributed, as they cannot be exported in a common non-proprietary format.

ESDS Qualidata is working to encourage the development of data documentation standards using XML. The Data Exchange Tools and Conversion Utilities (DExT) project proposed an XML schema, QuDEx, to represent annotated and complex multimedia data.

At present, coded data are requested infrequently by data users, mainly because the coding process is subjective, often geared towards specific themes, and therefore may not be applicable to the secondary analyst's topic of investigation. However, this is changing and access to coding schemes can be valuable for teaching and other forms of re-use. For larger studies, there is a stronger case for retaining coded data in order to aid searching within large bodies of text. However, this will always be an adjunct to the main body of raw data.


