
Research Data Management

How to organize, preserve, share, and cite data

What are Metadata?

Metadata are data that describe data. They are everywhere; we just don’t always notice them. As an example, think about a book you might buy online. The book itself contains text, a kind of data. Information you might find about the book in an online store includes:

  • the name of the author
  • the name of the publisher
  • the date of publication

The metadata above provide more information about the item you’re looking at. When metadata for many items are brought together and standardized, they become powerful tools for locating and discovering things, as in a library catalog or an internet search engine.
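To make the catalog idea concrete, here is a minimal Python sketch. It assumes every book is described with the same fields from the list above (author, publisher, publication date), plus a title so the results are readable. Because the records share field names, one function can search all of them. The records and the `find_by_author` helper are hypothetical, for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class BookMetadata:
    """Descriptive metadata for one book (hypothetical example fields)."""
    title: str
    author: str
    publisher: str
    published: date


# A tiny "catalog": every record follows the same standardized fields.
catalog = [
    BookMetadata("Example Title A", "Jane Doe", "Example Press", date(2019, 5, 1)),
    BookMetadata("Example Title B", "John Roe", "Sample Books", date(2021, 3, 15)),
    BookMetadata("Example Title C", "Jane Doe", "Sample Books", date(2023, 9, 30)),
]


def find_by_author(records, author):
    """Return the titles of all records whose author matches exactly."""
    return [r.title for r in records if r.author == author]


print(find_by_author(catalog, "Jane Doe"))  # ['Example Title A', 'Example Title C']
```

Without consistent fields (say, one record storing the author under "writer" and another under "by"), this one-line search would miss items.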

Why Describe Your Data?

Metadata are important because they explain a data set to others. Data sets exist within a certain context, and this context must be communicated well so that others can reuse the data set.

For example, the City of Boston publishes open data on 311 service requests. If a researcher wanted to use these data but didn’t know that the data came from Boston, what a 311 request is, or the year the data were created, it would be very difficult for them to understand or reuse the data set. Even with this information, without a data dictionary it would be hard to tell what some variables mean, what blank values signify, or which values are possible.

Metadata provide necessary information for others (sometimes your future self) to understand the data set and properly reuse it. It often takes time to create metadata, but the effort is worthwhile.

General Guidelines

Make Your Data Citable

To help others find your data and reuse it appropriately, provide enough details for your data to be cited. You will need the following items:

  • Creator(s) of the data
  • Title of the data set
  • Year the data set was published or submitted to a repository
  • Version or edition of the data set
  • URL or DOI of the data
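One way to make sure none of these items is forgotten is to keep them together in a single record. The Python sketch below stores the five elements listed above and assembles them into a citation string. The layout it produces (Creators (Year). Title (Version) [Data set]. Identifier) is an assumption loosely modeled on common data citation styles; follow the style your field or repository actually requires. The class, helper names, and sample values are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class DatasetCitation:
    """The five citation elements listed above (hypothetical class)."""
    creators: list[str]  # e.g. ["Doe, J.", "Roe, R."]
    title: str
    year: int            # year published or submitted to a repository
    version: str
    identifier: str      # DOI or URL

    def to_text(self) -> str:
        # Assumed layout: Creators (Year). Title (Version X) [Data set]. Identifier
        creators = "; ".join(self.creators)
        return (f"{creators} ({self.year}). {self.title} "
                f"(Version {self.version}) [Data set]. {self.identifier}")


def missing_fields(citation: DatasetCitation) -> list[str]:
    """Name any citation elements that were left empty."""
    return [f.name for f in fields(citation) if not getattr(citation, f.name)]


cite = DatasetCitation(["Doe, J."], "Example Survey Data", 2024, "1.0",
                       "https://example.org/dataset")
print(cite.to_text())
print(missing_fields(cite))  # []
```

Running `missing_fields` before depositing data is a quick check that the creators, version, and identifier have not been left blank.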

For more information on how to cite data, click the Citing Data tab on the left side of this guide.

Provide Documentation

Sharing documentation about your data set is the best way to help others reuse it. Writing the documentation will also help you articulate subtle details hidden within your data.

A common method for documenting your data is writing a data dictionary. A data dictionary explains each variable’s name, possible values, and format. Data dictionaries don’t have to be complicated to be useful; a spreadsheet or text file will do the trick.
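As a minimal sketch of the "text file" option, the Python below writes a data dictionary to a CSV file, which is plain text and opens in any spreadsheet program or text editor. The column headings follow the ones used in this guide; the two variables (SITE_ID and TEMP_C) and their descriptions are hypothetical.

```python
import csv

# Headings for the data dictionary (one column per piece of information).
FIELDNAMES = ["Variable Name", "Label", "Type", "Value Codes", "Missing Code"]

# Hypothetical variables, for illustration only: one row per column in the data.
entries = [
    {"Variable Name": "SITE_ID", "Label": "Sampling site identifier",
     "Type": "Text", "Value Codes": "A, B, C", "Missing Code": "(BLANK)"},
    {"Variable Name": "TEMP_C", "Label": "Water temperature (degrees Celsius)",
     "Type": "Number (1 decimal place)", "Value Codes": "NA", "Missing Code": "-999"},
]

# CSV is an open, plain-text format; the csv module quotes values containing commas.
with open("data_dictionary.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(entries)
```

Keeping the dictionary in the same folder as the data file, under an obvious name, makes it hard for a reuser to miss.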

An example data dictionary entry from Boston’s 311 service request data looks like this:

Variable Name | Label | Type | Value Codes | Missing Code
OPEN_DT | Case open date | Date (mm/dd/yyyy hh:mm:ss AM/PM) | NA | (BLANK)

The table above conveys a lot of useful information at a glance. According to the table, the variable OPEN_DT is the date a case was opened, its values should be date-times in mm/dd/yyyy hh:mm:ss AM/PM format, and a blank cell indicates a missing value. Without this entry, we’d have to contact the data set’s creator to ask “What does OPEN_DT stand for?”, a time-consuming exchange for everyone involved.
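A reuser can turn the OPEN_DT entry directly into code. The Python sketch below converts raw OPEN_DT values into dates using the format stated in the table and treats a blank cell as missing, exactly as the Missing Code column says. The helper name `parse_open_dt` and the sample value are hypothetical.

```python
from datetime import datetime

# Format taken from the data dictionary entry: mm/dd/yyyy hh:mm:ss AM/PM
OPEN_DT_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def parse_open_dt(raw: str):
    """Convert one raw OPEN_DT value to a datetime, or None if missing.

    Per the data dictionary, a blank cell is the missing-value code.
    """
    if raw.strip() == "":
        return None
    return datetime.strptime(raw.strip(), OPEN_DT_FORMAT)


print(parse_open_dt("07/04/2019 02:15:30 PM"))  # 2019-07-04 14:15:30
print(parse_open_dt(""))                        # None
```

Without the dictionary, a reuser would have to guess whether 07/04 means July 4 or April 7 and whether a blank means "missing" or "not applicable".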

Other information to include with your data set might be:

  • Geographic location
  • Instruments used (and their settings)
  • Data collection methodology
  • Protocols for cleaning data
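These details can also be recorded in a small machine-readable file saved alongside the data. The Python sketch below writes them to JSON, an open plain-text format. The field names and values are hypothetical placeholders, not a required standard; many repositories ask for their own metadata schema instead.

```python
import json

# Hypothetical field names and values: adapt them to your project or to the
# metadata standard your repository expects.
metadata = {
    "geographic_location": {"place": "Example field site",
                            "latitude": 42.35, "longitude": -71.10},
    "instruments": [
        {"name": "Example temperature logger",
         "settings": {"sampling_interval_minutes": 15}},
    ],
    "collection_methodology": "Readings logged automatically at each site.",
    "cleaning_protocols": [
        "Removed readings recorded during instrument calibration.",
        "Recoded sensor errors to the missing code -999.",
    ],
}

# Write the record next to the data file so the two travel together.
with open("metadata.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
```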

Finally, always provide a short narrative about your data set that briefly explains its who, what, when, where, and why. If the data set was the foundation of any published works, be sure to mention them and provide links, if possible.
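One simple way to package this narrative is a plain-text README file. The Python sketch below assembles one from answers to the who, what, when, where, and why questions, followed by any related publications. The project details, the citation, and the `build_readme` helper are all hypothetical.

```python
# Hypothetical project details, for illustration only.
summary = {
    "Who": "Data collected by J. Doe and R. Roe (Example Lab).",
    "What": "Hourly water temperature readings from three river sites.",
    "When": "Collected June to August 2023.",
    "Where": "Three sites along an example river.",
    "Why": "To study how summer heat affects river temperature.",
}
related_publications = [
    "Doe, J. (2024). Example article title. https://example.org/article",
]


def build_readme(summary, publications):
    """Assemble a plain-text README from the who/what/when/where/why summary."""
    lines = ["README", ""]
    for question, answer in summary.items():
        lines.append(f"{question}: {answer}")
    if publications:
        lines += ["", "Related publications:"]
        lines += [f"  - {p}" for p in publications]
    return "\n".join(lines) + "\n"


with open("README.txt", "w", encoding="utf-8") as f:
    f.write(build_readme(summary, related_publications))
```

The "Related publications" section is only added when there is something to list, so the same helper works before and after a paper is published.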

Without proper documentation, your data is unlikely to be reusable.

Things to Avoid

Try to avoid the following when providing metadata about your data set:

  • Being inconsistent with creators’ names
  • Cryptic variable names
  • Misleading or dated documentation
  • Saving documentation in proprietary formats that others may not be able to access
  • Providing inaccurate contact information
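Two of these problems can be caught automatically before you share your data. The Python sketch below, using hypothetical helper names, flags columns in a data file that the data dictionary does not describe (a common sign of cryptic or undocumented variable names) and creator names spelled more than one way. It is a simple heuristic: it only groups spellings that differ in capitalization, spacing, or periods, so a pair like "Doe, Jane" and "Jane Doe" still needs a human check.

```python
def undocumented_columns(data_columns, dictionary_names):
    """Return columns in the data file that the data dictionary does not explain."""
    return [col for col in data_columns if col not in dictionary_names]


def creator_name_variants(names):
    """Group creator names that differ only in capitalization, spacing, or periods."""
    groups = {}
    for name in names:
        # Normalize: periods become spaces, lowercase, collapse repeated spaces.
        key = " ".join(name.replace(".", " ").lower().split())
        groups.setdefault(key, set()).add(name)
    # Keep only groups where more than one spelling was used.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical inputs: column headers from a data file and names from the dictionary.
print(undocumented_columns(["OPEN_DT", "X1", "TEMP_C"], {"OPEN_DT", "TEMP_C"}))  # ['X1']
print(creator_name_variants(["J. Doe", "J Doe", "R. Roe"]))
```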

Librarian

JD Kotula
Contact:
38 Cummington Mall
Boston, MA 02215
617.358.6900