Research: Text and Data Mining: Home

Related Guides

Getting Started with ProQuest TDM
by BU Libraries, last updated Mar 27, 2024
Boston Data and Statistics
by Lucy Flamm, last updated Apr 22, 2025

What is Text and Data Mining?

TDM (Text and Data Mining) is the automated process of selecting and analyzing large amounts of text or data (Springer). These resources can be collected with APIs or web crawlers, but not all websites permit users to mine their content.

Always check with the relevant subject librarian before beginning any TDM project. See the BU Libraries list of all current subject librarians.

What is an API?

An API (Application Programming Interface) is a set of rules or protocols that lets software applications communicate with each other to exchange data, features, and functionality. For TDM, this means an API can be used to extract large amounts of back-end (raw) data from a database. That data will then need to be converted into another format for analysis.
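To illustrate the "convert into another format" step, here is a minimal Python sketch that turns a JSON response into CSV, a format most analysis tools can open. The payload shape (a `results` list with `title`, `year`, and `id` fields) is a hypothetical example, not the format of any particular publisher's API. Check the documentation of the API you are licensed to use for its real field names.

```python
import csv
import io
import json

# Hypothetical JSON payload, shaped like what a search API *might* return.
# Real publisher APIs use their own structure and field names.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Article A", "year": 2021, "id": "A1"},
    {"title": "Article B", "year": 2022, "id": "B2"}
  ]
}
"""


def json_to_csv(payload: str, fields: list[str]) -> str:
    """Flatten the records under the (assumed) 'results' key into CSV text."""
    records = json.loads(payload)["results"]
    buffer = io.StringIO()
    # extrasaction="ignore" drops any fields we did not ask for;
    # missing fields are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    print(json_to_csv(SAMPLE_RESPONSE, ["title", "year", "id"]))
```

Passing an explicit `fields` list keeps only the columns you plan to analyze, which also keeps the output stable if the API adds new fields later.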

Collecting and reformatting content by harvesting it directly from web pages, rather than through an API, is called web scraping. Due to Boston University's licensing agreements, text scrapers and web crawlers are usually prohibited. Instead, data must be collected through the publisher's API, and only if that API is available through BU's subscription. If you are unsure whether a database has an API, contact the relevant subject librarian for more information.
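Publisher APIs usually return results a page at a time, so collecting a full dataset means requesting pages in sequence at a modest pace. The Python sketch below shows that general pattern. `fetch_page` is a hypothetical function you would write for your licensed API. The `records` and `has_more` keys are assumed names, and real APIs may use cursors, offsets, or total-count fields instead. The one-second delay is likewise an assumption; follow the rate limits stated in the API's terms.

```python
import time
from typing import Any, Callable

# A page is assumed to be a dict like {"records": [...], "has_more": bool}.
# This is a hypothetical shape; adapt it to your API's documented response.
Page = dict[str, Any]


def harvest(fetch_page: Callable[[int], Page], max_pages: int = 100,
            delay_seconds: float = 1.0) -> list:
    """Collect records page by page through an API, pausing between calls.

    `fetch_page` (hypothetical, supplied by you) takes a 1-based page number
    and returns that page's parsed response.
    """
    records: list = []
    for page_number in range(1, max_pages + 1):
        page = fetch_page(page_number)
        records.extend(page["records"])
        if not page["has_more"]:
            break  # the API reports no further pages
        time.sleep(delay_seconds)  # pause to stay under the rate limit
    return records


if __name__ == "__main__":
    # Stand-in for a real API call, so the sketch runs without a network.
    def fake_fetch(page_number: int) -> Page:
        data = {1: ["a", "b"], 2: ["c"]}
        return {"records": data[page_number], "has_more": page_number < 2}

    print(harvest(fake_fetch, delay_seconds=0))  # ['a', 'b', 'c']
```

The `max_pages` cap is a safety limit, so a misbehaving response that always reports more pages cannot trigger an unbounded run of requests.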

Reference Assistant

Rachel Beaton

she/her