TDM (Text and Data Mining) is the automated process of selecting and analyzing large amounts of text or data resources (Springer). These resources can be acquired through APIs or web crawlers, but not all websites permit users to mine their content.
Always check with the relevant subject librarian before beginning any TDM project. See a list of all current BU subject librarians here.
An API (Application Programming Interface) is a set of rules or protocols that lets software applications communicate with each other to exchange data, features and functionality. In other words, an API can be used to extract large amounts of back-end (raw) data from a database. That data usually then needs to be converted into another format before it can be analyzed.
Collecting content directly from web pages and reformatting it for analysis is called web scraping. Due to Boston University's licensing agreements, the use of text scrapers or web crawlers is usually prohibited. Instead, data must be collected through the publisher's API, and only if that API is available through BU's subscription. If you are unsure whether a database offers an API, please contact the relevant subject librarian for more information.
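As a rough illustration of the API-based workflow described above (send a request, receive raw data, convert it into a format suitable for analysis), here is a minimal Python sketch using only the standard library. The endpoint URL, the parameter names (q, page, apiKey) and the shape of the JSON response (a "records" list) are all hypothetical placeholders, not any real publisher's API. Always follow the publisher's own API documentation and the terms of BU's subscription.

```python
import csv
import io
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# HYPOTHETICAL endpoint: replace with the URL given in the publisher's
# own API documentation.
BASE_URL = "https://api.example.org/v1/search"


def build_query_url(query: str, api_key: str, page: int = 1) -> str:
    """Assemble a GET request URL for the hypothetical search endpoint.

    The parameter names q, page and apiKey are assumptions for illustration.
    """
    params = {"q": query, "page": page, "apiKey": api_key}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_json(url: str) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def records_to_csv(payload: dict, fields: list[str]) -> str:
    """Convert raw JSON records into CSV text for analysis.

    Assumes (hypothetically) the API returns {"records": [{...}, ...]}.
    Fields missing from a record are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields)
    writer.writeheader()
    for record in payload.get("records", []):
        writer.writerow({name: record.get(name, "") for name in fields})
    return buffer.getvalue()


if __name__ == "__main__":
    # Offline demonstration with a made-up payload; no request is sent.
    print(build_query_url("text mining", "YOUR_API_KEY"))
    sample = {
        "records": [
            {"title": "Article A", "doi": "10.1000/a", "year": 2020},
            {"title": "Article B", "doi": "10.1000/b"},
        ]
    }
    print(records_to_csv(sample, ["title", "doi", "year"]))
```

Keeping the request step (fetch_json) separate from the conversion step (records_to_csv) means the reformatting logic can be checked on saved responses without repeatedly calling the publisher's API, which many providers rate-limit.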