MANAGE AND SHARE DATA


DATA FORMATS

UKDA works with different data formats for different purposes: formats best suited to long-term preservation, and formats used for dissemination, which reflect those most often requested by users.

Data files should be stored and offered to UKDA in relatively 'well-known' software formats where possible. Ideally, researchers should also ensure that back-ups of master copies of data are always in formats suitable for long-term digital preservation. This typically means using open formats (such as XML), or formats that are as open as possible, rather than proprietary ones (e.g. Stata or NVivo). Please consult the pages on software and data conversion.

UKDA is able to accept data collections via the following:

  • CD-ROM/DVD
  • memory stick
  • secure FTP

Depositors are advised to consider carefully the risks of using portable media, and to encrypt data files where appropriate. See the UKDA data security pages for further information on protection and encryption, and on how to make effective back-ups to prevent data loss.

UKDA does not normally accept large amounts of documentation when sent only in hard copy (paper) format, and reserves the right to ask the depositor to make documentation machine readable (either as text or image files).

The table below gives full technical details of preferred and acceptable formats, but suitable examples for upload include:

  • quantitative data: SPSS, Stata, Excel, or tab-delimited ASCII text format (with suitable labels, or a data dictionary)
  • qualitative data (e.g. interview transcripts): Rich Text Format (RTF) (may be generated from Microsoft Word) or ASCII text format
  • documentation: RTF, Excel, Adobe Portable Document Format (PDF or PDF/A) or ASCII text format

If you are unsure of the suitability of your file formats, please contact UKDA for advice.
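
As an illustration of the quantitative example above (tab-delimited text accompanied by a data dictionary), the following minimal Python sketch writes a small data matrix and a separate dictionary file. The variable names, labels, codes, file names and the four-column dictionary layout are all hypothetical; UKDA does not prescribe this layout, and depositors should adapt it to their own data.

```python
import csv

# Hypothetical survey extract: names and values are invented for illustration.
VARIABLES = ["respid", "sex", "age"]
ROWS = [
    [1, 1, 34],
    [2, 2, -9],
    [3, 2, 51],
]

# Hypothetical data dictionary: one entry per variable, recording the
# variable label, code labels and the value used for missing data.
DICTIONARY = [
    {"variable": "respid", "label": "Respondent identifier", "codes": "", "missing": ""},
    {"variable": "sex", "label": "Sex of respondent", "codes": "1=Male; 2=Female", "missing": ""},
    {"variable": "age", "label": "Age in years", "codes": "", "missing": "-9"},
]


def write_tab_delimited(stream, header, rows):
    """Write a header line and the data matrix as tab-delimited text."""
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)


def write_dictionary(stream, entries):
    """Write the data dictionary as a second tab-delimited file."""
    fields = ["variable", "label", "codes", "missing"]
    writer = csv.DictWriter(stream, fieldnames=fields, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(entries)


def undocumented_variables(header, entries):
    """Return the variables in the data file that have no dictionary entry."""
    documented = {entry["variable"] for entry in entries}
    return [name for name in header if name not in documented]


if __name__ == "__main__":
    # File names are placeholders chosen for this sketch.
    with open("survey.tab", "w", newline="", encoding="ascii") as data_file:
        write_tab_delimited(data_file, VARIABLES, ROWS)
    with open("survey_dictionary.tab", "w", newline="", encoding="ascii") as dict_file:
        write_dictionary(dict_file, DICTIONARY)
    print("Undocumented variables:", undocumented_variables(VARIABLES, DICTIONARY))
```

Checking that every column has a dictionary entry before deposit is one simple way to make sure the labels travel with the data, since plain delimited text cannot carry them itself.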

Type of data | Preferred format for deposit | Other acceptable formats for deposit
Quantitative tabular data with extensive metadata

e.g. a survey dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data

  • SPSS portable (.por) format, or delimited text and command ('setup') file (SPSS, Stata, SAS, etc.) containing metadata information
  • other structured text/mark-up file containing metadata information e.g. DDI XML file
  • proprietary formats of statistical packages (e.g. SPSS (.sav), Stata (.dta) etc.)
Quantitative tabular data with minimal metadata

i.e. a matrix of data with or without column headings/variable names, but no other metadata or labelling

  • comma-delimited (.csv) or tab-delimited (.tab) files, including delimited text of given character set with SQL data definition statements where appropriate - these are most widely used, and most widely recognised by import 'wizards'
  • delimited text of given character set - only characters not present in the data should be used as delimiters
  • widely-used formats e.g. MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), dBase (.dbf) or OpenDocument Spreadsheet (.ods)
GIS and CAD data

e.g. vector and raster

  • ESRI Shapefile (.shp, .shx and .dbf)
  • geo-referenced TIFF (.tif and .tfw)
  • CAD data (.dwg)
  • GIS attribute data as per 'tabular data with minimal metadata'
  • MapInfo Interchange Format (.mif) for vector data
  • Keyhole Markup Language (.kml) as used for Google Earth and Google Maps
  • Adobe Illustrator, CAD data (.dxf or .svg)
  • binary formats of GIS and CAD packages may be acceptable
Qualitative data

textual

  • eXtensible Markup Language (XML) marked-up text according to an appropriate Document Type Definition (DTD) or schema

  • Rich Text Format (.rtf)
  • plain text data, ASCII (.txt)
  • Hypertext Markup Language (HTML)
  • widely-used proprietary formats e.g. Microsoft Word (.doc/.docx)
  • proprietary/software-specific formats such as NUD*IST, NVivo and ATLAS.ti
Digital image data
  • TIFF (version 6) uncompressed
  • JPEG (.jpeg, .jpg)
  • TIFF (other versions)
  • Adobe Portable Document Format (PDF/A or PDF)
  • raw image format (.RAW)
  • software-specific formats (e.g. Photoshop .psd files) may be acceptable, but contributors should contact UKDA for advice before file upload
Digital audio data
  • Free Lossless Audio Codec (FLAC) (.flac)
  • WAV file (.wav)
  • MPEG-1 Audio Layer 3 (.mp3)
  • Audio Interchange File Format (AIFF) (.aif)
Digital video data
  • JPEG 2000
Contributors should contact UKDA for advice before file upload.
Documentation
  • RTF (.rtf)
  • PDF/A or PDF
  • HTML (.htm)
  • Open Document Text (.odt)
In addition to those formats named in the 'preferred' column:
  • plain text (.txt)
  • widely-used proprietary formats e.g. Microsoft Word (.doc/.docx) or Excel (.xls/.xlsx), are acceptable but offer less long-term security
  • XML marked-up text according to an appropriate DTD or schema, e.g. XHTML 1.0
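
The table's guidance on delimited text, that only characters not present in the data should be used as delimiters, can be checked before export. The following minimal Python sketch shows one way to do this; the candidate list, its order of preference and the sample rows are assumptions made for illustration, not UKDA requirements.

```python
# Candidate delimiters in an assumed order of preference (illustrative only).
CANDIDATES = [",", ";", "\t", "|"]

# Hypothetical rows: the free-text columns contain commas and semicolons.
ROWS = [
    ["id", "occupation", "comment"],
    ["1", "teacher, secondary", "likes the job; long hours"],
    ["2", "nurse", "none"],
]


def pick_delimiter(rows, candidates=CANDIDATES):
    """Return the first candidate that occurs in no cell value, or None."""
    cells = [str(value) for row in rows for value in row]
    for delimiter in candidates:
        if not any(delimiter in cell for cell in cells):
            return delimiter
    return None


def to_delimited_text(rows, delimiter):
    """Join each row with the delimiter, one record per line, without quoting."""
    for row in rows:
        for value in row:
            text = str(value)
            # Refuse values that would make the output ambiguous.
            if delimiter in text or "\n" in text or "\r" in text:
                raise ValueError(f"value {text!r} conflicts with the file layout")
    return "".join(delimiter.join(str(v) for v in row) + "\n" for row in rows)


if __name__ == "__main__":
    chosen = pick_delimiter(ROWS)
    if chosen is None:
        print("No safe delimiter among the candidates; consider a quoted format.")
    else:
        print(repr(chosen))
        print(to_delimited_text(ROWS, chosen), end="")
```

Because the chosen character never occurs in the data, each line can be split back into its fields without any quoting or escaping rules, which keeps the file readable by the widest range of import tools.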

For any queries regarding data formats contact acquisitions@esds.ac.uk
