Resources – Timothy Betts | Assistant Professor

UPDATE [2023]: Currently, many of the methods for data scraping used in these resources no longer function because of changes made to the Reddit API. I am working on updating this page to provide more current resources for collecting Reddit data in the future.

This page links to video tutorials for scraping Reddit with RedditExtractoR and provides a downloadable script along with the example scripts from the tutorial.

RedditExtractoR Script Files

If you are interested in learning a bit more about best practices for using Reddit in communication studies research, please visit the work that my colleague, Elizabeth Hintz, and I published in the Annals of the International Communication Association!

One of the tools available to researchers for pulling data from Reddit is the RedditExtractoR package, created by the brilliant Ivan Rivera, which runs in the R Project for Statistical Computing! We use both of these tools when working with Reddit data, and we want to make it as easy as possible for communication researchers to use this valuable source of data!

There are two files (available to download from Google Drive) that will be helpful as you scrape Reddit data. The first file, the RedditExtractoR example, is an example script that I have used in a previous research project! The second file, labeled RedditExtractoR script, is a template script that you can use to pull your own data using the Reddit API. Both of these files are available to download as .R files (scripts that will open in R), but you can also open them as text files in any text editor (e.g., Notepad for Windows or TextEdit for macOS).

Here’s an example of what the code looks like!

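# Install the packages (only needed the first time you run the script)
install.packages("RedditExtractoR")
install.packages("writexl")
# Load the packages for this session
library(RedditExtractoR)
library(writexl)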

## Search Parameters

# Keywords for your search
search_term <- "[insert search]"
# Which subreddit would you like to search?
search_subreddit <- "[insert subreddit]"
# search_time is the period of interest for the search: hour, day, week, month, year, or all
search_time <- "[insert]"


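## Running the Search
# Find the URLs of threads that match the search, newest first
threads <- find_thread_urls(keywords = search_term, subreddit = search_subreddit, sort_by = "new", period = search_time)
# Download the content of each thread found above
data <- get_thread_content(threads$url)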

## Output Results to Excel File
# paste0() joins the pieces with no separator; paste() would insert a space between each piece
write_xlsx(data, path = paste0(search_subreddit, "_", search_term, "_", Sys.Date(), ".xlsx"))
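
Before running a search, it can help to check the search settings and decide on the output file name ahead of time. The sketch below is one possible way to do that, not part of the tutorial scripts or of RedditExtractoR: the helper names check_period and build_filename are hypothetical, and the subreddit, search term, and date in the usage lines are made-up values for illustration only.

```r
# Hypothetical helpers (not part of RedditExtractoR) for checking the search
# period and building an output file name before any request is sent.

# The period values listed in the script comments above
valid_periods <- c("hour", "day", "week", "month", "year", "all")

check_period <- function(period) {
  # match.arg() stops with an error if period does not match an allowed value
  match.arg(period, valid_periods)
}

build_filename <- function(subreddit, term, date = Sys.Date()) {
  # Replace any run of characters other than letters, digits, or hyphens
  # with a single underscore so the file name is safe on most systems
  clean <- function(x) gsub("[^A-Za-z0-9-]+", "_", x)
  paste0(clean(subreddit), "_", clean(term), "_", format(date, "%Y-%m-%d"), ".xlsx")
}

# Illustrative values only
search_time <- check_period("week")
out_file <- build_filename("AskReddit", "public speaking", as.Date("2023-01-15"))
print(out_file)
```

The value of out_file could then be supplied as the path argument of write_xlsx in place of the inline file name.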

Our work in Annals references the files that are available below and gives you a bit more detail about them.