How to Code Data Using the OpenAI API – A Simple R Guide

This tutorial walks you through how to use the OpenAI GPT API in R to code and summarize free-text responses, such as user reviews. You will load a dataset of reviews, send each one to GPT to summarize it, and return the results programmatically.


🧰 What You Need

  • R installed on your system
  • An OpenAI API key – [see this guide]
  • A CSV file with at least one column of data to code – [sample of nine reviews here]
  • (Optional) The entire code in a RMarkdown file – [download here]

1. Setting Up Your R Environment

Install Required Packages

install.packages(c("dplyr", "stringr", "httr", "jsonlite", "readr"))

Load the Libraries

library(dplyr)
library(stringr)
library(httr)
library(jsonlite)
library(readr)

2. Load Your Dataset

In this example, we assume you have a file called Reviews_Small.csv containing a column review.

data <- read.csv("Reviews_Small.csv")
str(data)

Make sure your dataset has at least one column with text data to summarize.


3. Set Up the GPT API Call

Store Your API Key

Important: Replace the placeholder string below with your actual API key from OpenAI.

api_key <- "sk-REPLACE_WITH_YOUR_OWN_KEY"

Define a Function to Send Prompts

send_to_chatgpt <- function(text, temp, api_key, max_tokens) {
  url <- "https://api.openai.com/v1/chat/completions"
  headers <- add_headers(
    `Content-Type` = "application/json",
    `Authorization` = paste("Bearer", api_key)
  )

  data <- list(
    model = "gpt-4o",
    messages = list(list(role = "user", content = text)),
    temperature = temp,
    max_tokens = max_tokens
  )

  response <- POST(url, headers, body = toJSON(data, auto_unbox = TRUE))
  parsed <- content(response, as = "parsed")
  parsed$choices[[1]]$message$content
}

This function sends a prompt to GPT-4o and returns the generated output. You can control the “creativity” of the response using temperature (lower values = more deterministic).


4. Create the Summarization Prompt

instructions_template <- "
You are a helpful assistant. Summarize the following review in one sentence:

Review:
"

5. Loop Through the Reviews

The function below iterates over your dataset, sending each review to GPT and recording the raw output.

process_reviews <- function(num_judges, temperatures, data, instructions_template, api_key, max_tokens) {
  num_reviews <- nrow(data)
  for (temp in temperatures) {
    for (judge in 1:num_judges) {
      temp_col <- sprintf("GPT_Temp%.1f_Judge%d", temp, judge)
      data[[temp_col]] <- sapply(data$review, function(review) {
        prompt <- paste(instructions_template, review, sep = "\n")
        send_to_chatgpt(prompt, temp, api_key, max_tokens)
      })
    }
  }
  return(data)
}

Run the Function

data <- process_reviews(
  num_judges = 2,
  temperatures = c(0.2, 1.5),
  data = data,
  instructions_template = instructions_template,
  api_key = api_key,
  max_tokens = 60
)

This sends every review to GPT twice, once at temperature 0.2 (more factual) and once at 1.5 (more creative).


6. Save Your Results

write.csv(data, "Reviews_coded.csv", row.names = FALSE)

This exports a new CSV file with the original reviews and GPT-coded summaries.


✅ Conclusion

Using GPT to code text data offers flexibility and scale, making it easier to process large sets of qualitative input. You can expand this framework to assign sentiment, categorize comments, or extract themes—just by changing the prompt.

Be sure to monitor your token usage if you’re doing large-scale work with the OpenAI API.

2 responses

  1. How to Code Data Using the OpenAI API: An SPSS Guide and Syntax File – questionableresearch.ai Avatar

    […] a previous post, I showed how to code data using the OpenAI API inside R. Today, I will walk through how to do the same inside IBM SPSS Statistics, using a small dataset of […]

    Like

Leave a reply to How to Code Data Using the OpenAI API: An SPSS Guide and Syntax File – questionableresearch.ai Cancel reply