Beyond Sentence Completion: A Researcher’s Guide to the OpenAI Assistant API (in R)

Think that using the “GPT API” is just about sending prompts and receiving answers? Annoyed that it can’t handle files on its own? That’s because you’re relying only on the sentence completion endpoint (plain calls to an LLM), when OpenAI’s API supports many other types of requests. In reality, OpenAI can handle your documents, search through them, maintain memory across conversations, and set up custom assistants that work more like collaborators.

However, doing so requires learning that each of these tasks has its own designated endpoint: a specific address to which you send your request. For instance, the part of the API that handles simple chat completions lives at https://api.openai.com/v1/chat/completions. Other endpoints (i.e., addresses like this one) handle file uploads, retrieval, and assistant management.

This tutorial will walk through how to use these capabilities to build a research assistant that can read a PDF, answer questions about it, and keep track of the conversation as you go. All in R.

⚠️ New to APIs or to using R?
If this is your first time working with APIs or the OpenAI platform, we recommend beginning with our prior posts.

1. The Sentence Completion Endpoint

To begin, let’s look at a typical request to the endpoint that powers sentence completion: /v1/chat/completions. This is similar to what we do in our introductory API request post, except that here we send a request asking for the authors of our paper.

library(httr)
library(jsonlite)

api_key <- "<your_key_here>"
url <- "https://api.openai.com/v1/chat/completions"
headers <- add_headers(
  "Content-Type" = "application/json",
  "Authorization" = paste("Bearer", api_key)
)

body <- list(
  model = "gpt-4o",
  messages = list(
    list(role = "system", content = "You are a helpful assistant."),
    list(role = "user", content = "Who are the authors of this article, published in the Journal of Marketing. Title: New Tools, New Rules: A Practical Guide to Effective and Responsible GenAI Use for Surveys and Experiments Research.")
  ),
  temperature = 0.7,
  max_tokens = 100
)

response <- POST(url, headers, body = toJSON(body, auto_unbox = TRUE))
content(response, "parsed")

We start by loading the required R libraries: httr for handling HTTP requests and jsonlite for working with JSON data. We then define the API endpoint (https://api.openai.com/v1/chat/completions) and set the necessary headers, including the Authorization header with our API key and the Content-Type set to application/json.

Next, we construct the body of the request. This includes the model we want to use (gpt-4o), a list of messages that define the conversation so far, and additional parameters like temperature to control the randomness of the response and max_tokens to limit its length. We then send this data to the API using POST(), and finally, we use content(response, "parsed") to parse the JSON response into a readable R object.

The assistant’s reply can be found in content(response)$choices[[1]]$message$content.

Example Output:

$choices[[1]]$message$content
"As of my last update, I don't have access to specific articles, including their authors, from journals such as the Journal of Marketing. To find the authors of the article titled \"New Tools, New Rules: A Practical Guide to Effective and Responsible GenAI Use for Surveys and Experiments Research,\" I recommend checking the journal's website or accessing the article directly through a library database or academic institution that provides access to the Journal of Marketing. If you have access, you should be able to find the authors"

The model returns a plausible response (truncated because we set max_tokens to 100), but because it lacks memory and access to documents, it cannot retrieve the actual authors.

Moreover, every request sent to the sentence completion endpoint is processed independently. If you want to continue the conversation through the API, you must include not only the next message but also the entire conversation history.
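To illustrate, continuing the conversation above means resending everything: we append the assistant’s reply and a follow-up to the messages list, then POST the whole history again. This is a minimal sketch building on the body, response, url, and headers objects from the previous block (the follow-up question is our own example):

```r
# Extract the assistant's first reply from the previous response.
reply <- content(response)$choices[[1]]$message$content

# Append the assistant's reply and our next question to the running history.
body$messages <- c(
  body$messages,
  list(list(role = "assistant", content = reply)),
  list(list(role = "user", content = "Can you summarize that in one sentence?"))
)

# The new request must carry all four messages: system, user, assistant, user.
followup <- POST(url, headers, body = toJSON(body, auto_unbox = TRUE))
```

Notice that the request payload grows with every turn; this is the bookkeeping the Assistant API will do for us in the next section.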


2. Going Beyond Sentence Completion: Assistants and Retrieval

To enable memory and file retrieval, OpenAI provides a collection of endpoints. These enable more sophisticated workflows through what is now called the Assistant API (V2). Here’s what you’ll need:

  • /v1/files – Upload files for indexing and reference
  • /v1/vector_stores – Store and semantically index files for retrieval
  • /v1/assistants – Define assistants and associate them with retrieval tools
  • /v1/threads – Create sessions or conversation threads
  • /v1/threads/{thread_id}/messages – Add user messages to the thread
  • /v1/threads/{thread_id}/runs – Launch assistant runs over a thread

These endpoints collectively allow you to build assistants that:

  • Retain memory over the course of a thread
  • Retrieve relevant excerpts from PDF documents
  • Respond to multi-turn questions referencing those files

3. Building an Assistant That Understands Your Paper

The goal of this section is to build an assistant that can do more than generate a one-time sentence completion. We want it to read and recall content from a PDF, respond to questions about that content, and keep track of our conversation over time. To do this, we’ll construct a system using several components of the OpenAI API—each responsible for a different part of the interaction.

Conceptually, here’s what we’re building:

  • A file is uploaded to OpenAI’s servers so the assistant can access its contents.
  • That file is added to a vector store, which allows the assistant to search and retrieve relevant excerpts using semantic similarity rather than just keyword matching.
  • An assistant is created, configured with access to the vector store, and designed to respond using the retrieval tool.
  • A thread is initiated to represent the ongoing conversation. Every time we ask a question, it is logged as a message in this thread.
  • Each time we want a response from the assistant, we trigger a run that links the assistant to the thread and produces a reply.

In short, we’re stitching together file storage, search, memory, and messaging to simulate the experience of conversing with a well-informed research assistant who knows your paper and remembers your questions. The steps below walk through this system in R, showing both the code and the assistant’s responses.

Step 1. Upload the File

file_path <- normalizePath("/Users/simonblanchard/Dropbox/JM - AI in Survey Research/Tutorials/R/JMpaper.pdf", mustWork = TRUE)
upload_response <- POST(
  url = "https://api.openai.com/v1/files",
  add_headers("Authorization" = paste("Bearer", api_key)),
  body = list(file = upload_file(file_path), purpose = "assistants")
)
file_id <- content(upload_response)$id
cat("File uploaded. ID:", file_id)

Output:

File uploaded. ID: file-QSyk7XkVvwKqsMWrn1gczU

In this step, we upload a local PDF file to OpenAI so that it can be indexed and searched by the assistant. To do so, we send a POST request to the /v1/files endpoint with two parts:

  • The file itself, specified using upload_file(file_path).
  • A purpose parameter set to “assistants”, indicating that the file will be used with the Assistant API (V2).

Once the upload is complete, the response returns a file_id—a unique identifier that we’ll use to reference this file in later steps. This ID confirms that the file has been successfully stored on OpenAI’s servers and is ready to be associated with a vector store.
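If you want to double-check the upload before moving on, the file’s metadata can be retrieved at any time with a GET request to the same endpoint. A small sketch, reusing file_id and api_key from above:

```r
# Retrieve the stored file's metadata (filename, bytes, purpose, created_at)
# to confirm the upload succeeded.
file_info <- GET(
  url = paste0("https://api.openai.com/v1/files/", file_id),
  add_headers("Authorization" = paste("Bearer", api_key))
)
str(content(file_info))
```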

Step 2: Create a Vector Store

vector_store_body <- list(name = "MyVeryNicePapers", file_ids = list(file_id))
vector_store_response <- POST(
  url = "https://api.openai.com/v1/vector_stores",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", api_key),
    "OpenAI-Beta" = "assistants=v2"
  ),
  body = toJSON(vector_store_body, auto_unbox = TRUE)
)
vector_store_id <- content(vector_store_response)$id
cat("Vector store created. ID:", vector_store_id)

What This Code Does

We begin by defining a request body (vector_store_body) that names the vector store (“MyVeryNicePapers”) and assigns it the uploaded file using its file_id. This body is then sent via a POST request to the /v1/vector_stores endpoint, which is part of the Assistants API (V2). The request also includes the necessary headers, including the “OpenAI-Beta” header signaling we’re using beta functionality for assistants.

The server returns a vector_store_id—a unique identifier for the newly created vector store—which we’ll use to connect this document index to our assistant in the next step.
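One caveat: the vector store indexes files asynchronously, so the document may not be searchable the instant the store is created. A hedged sketch that polls the store’s file_counts field until processing finishes, reusing vector_store_id and api_key from above:

```r
# Poll the vector store until the uploaded file has finished indexing.
# The file_counts field reports how many files are in_progress / completed.
repeat {
  status_response <- GET(
    url = paste0("https://api.openai.com/v1/vector_stores/", vector_store_id),
    add_headers(
      "Authorization" = paste("Bearer", api_key),
      "OpenAI-Beta" = "assistants=v2"
    )
  )
  counts <- content(status_response)$file_counts
  if (!is.null(counts) && counts$in_progress == 0) break
  Sys.sleep(1)  # wait a second before checking again
}
cat("Indexing complete:", counts$completed, "file(s) ready\n")
```

For a single short PDF this usually completes within seconds, but polling avoids querying an index that is still being built.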

Output:

Vector store created. ID: vs_6848b4e4715c819191783d7c5cf6bec2

Why a Vector Store?

When you upload a document to OpenAI for use in the Assistant API, it doesn’t automatically become searchable in its raw form. Instead, it must be processed into chunks that can be embedded into a vector representation. This enables the assistant to perform semantic search—finding relevant passages based on meaning rather than keywords.

The vector store handles this preprocessing step. It:

  • Splits the document into manageable sections (chunks).
  • Converts each chunk into a numerical representation using an embedding model.
  • Stores and indexes these embeddings for fast semantic lookup.

Only after this step can your assistant use the document to answer questions accurately and retrieve supporting information. Without it, uploading the file alone would not allow the assistant to reference the document content.

Step 3: Create the Assistant

assistant_body <- list(
  name = "Document Assistant",
  instructions = "Assist users by referencing the knowledge base.",
  model = "gpt-4o",
  tools = list(list(type = "file_search")),
  tool_resources = list(file_search = list(vector_store_ids = list(vector_store_id)))
)
assistant_response <- POST(
  url = "https://api.openai.com/v1/assistants",
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", api_key),
    "OpenAI-Beta" = "assistants=v2"
  ),
  body = toJSON(assistant_body, auto_unbox = TRUE)
)
assistant_id <- content(assistant_response)$id
cat("Assistant created. ID:", assistant_id)

We define the assistant’s configuration using assistant_body. This includes:

  • name: A human-readable label (“Document Assistant”) for managing multiple assistants.
  • instructions: Guidance for how the assistant should behave—in this case, it’s instructed to reference a knowledge base (our PDF).
  • model: The specific LLM to use—in this case, gpt-4o.
  • tools: A list of capabilities we’re enabling for the assistant. Here, we specify “file_search”, which allows the assistant to look up information in documents linked through a vector store.
  • tool_resources: This is where we link the assistant to the vector store created earlier. The assistant uses the vector_store_id to access the indexed content.

The POST request sends this configuration to the /v1/assistants endpoint. The response includes a unique assistant_id, which we store for use in later steps.

Output:

Assistant created. ID: asst_JvuMAwVGy5qu9jbv9tr7vhuO

Why This Matters

Uploading the file and creating a vector store prepares the document for retrieval, but it’s not usable until we assign it to an assistant. This step connects the assistant to the vector index and activates its ability to:

  • Search across the embedded document content
  • Respond to specific user queries grounded in that content
  • Include citations and source snippets from the document in its replies

In short, this is the step where your assistant becomes “document-aware.”

Step 4: Start a Thread and Ask a Question

4.1 Create a new thread

thread_response <- POST(
  url = "https://api.openai.com/v1/threads",
  add_headers("Authorization" = paste("Bearer", api_key), "OpenAI-Beta" = "assistants=v2")
)
thread_id <- content(thread_response)$id

This creates a new thread, which acts like a container for the conversation history. Each time you talk to your assistant, you do so within a thread so it can maintain context across multiple exchanges. Its identifier is stored in thread_id, which you can always refer to later.

4.2 Send a message to the thread

message_body <- list(
  role = "user",
  content = "Who are the authors of this article, published in the Journal of Marketing. Title: New Tools, New Rules..."
)

message_response <- POST(
  url = paste0("https://api.openai.com/v1/threads/", thread_id, "/messages"),
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", api_key),
    "OpenAI-Beta" = "assistants=v2"
  ),
  body = toJSON(message_body, auto_unbox = TRUE)
)

This sends the user’s message to the newly created thread. The role is “user” to indicate who authored the message.

4.3 Run the Assistant on the Thread

run_body <- list(assistant_id = assistant_id)
run_response <- POST(
  url = paste0("https://api.openai.com/v1/threads/", thread_id, "/runs"),
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", api_key),
    "OpenAI-Beta" = "assistants=v2"
  ),
  body = toJSON(run_body, auto_unbox = TRUE)
)

This triggers the assistant to process the entire thread (which now includes the user’s message) and respond. It uses the assistant configuration you previously set up, including access to the vector store.
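One detail the snippet above leaves implicit: runs are asynchronous, so the reply is not contained in run_response itself. To obtain it, we poll the run’s status until it reaches a terminal state and then fetch the newest message in the thread. A sketch, assuming run_response, thread_id, and api_key from the preceding code:

```r
# Runs are asynchronous: poll until the run reaches a terminal status,
# then fetch the most recent message in the thread (the assistant's reply).
run_id <- content(run_response)$id
repeat {
  run_status <- GET(
    url = paste0("https://api.openai.com/v1/threads/", thread_id, "/runs/", run_id),
    add_headers("Authorization" = paste("Bearer", api_key),
                "OpenAI-Beta" = "assistants=v2")
  )
  status <- content(run_status)$status
  if (status %in% c("completed", "failed", "cancelled", "expired")) break
  Sys.sleep(1)
}

# Messages are returned newest-first, so the assistant's reply comes first.
messages_response <- GET(
  url = paste0("https://api.openai.com/v1/threads/", thread_id, "/messages"),
  add_headers("Authorization" = paste("Bearer", api_key),
              "OpenAI-Beta" = "assistants=v2")
)
messages <- content(messages_response)$data
cat("Assistant's reply:\n", messages[[1]]$content[[1]]$text$value, "\n")
```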

Output:

Assistant's reply:
The authors of the article titled "New Tools, New Rules..." are Simon J. Blanchard, Nofar Duani, Aaron M. Garvey, Oded Netzer, and Travis Tae Oh.

Why This Matters

This is the heart of the Assistant API interaction. By using a thread, you create a persistent conversational context that the assistant can remember and build on. Each run is like telling the assistant, “Now go read the conversation so far, and give me your next reply.” This is how it mimics a human research assistant—one that doesn’t forget the previous question.

Step 5: Ask a Follow-Up Question

Now, let’s see what happens when we send a follow-up question in an ongoing assistant conversation.

followup_message_body <- list(
  role = "user",
  content = "What are the three types of measure validation they recommend for coding responses using GPT?"
)

message_response <- POST(
  url = paste0("https://api.openai.com/v1/threads/", thread_id, "/messages"),
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", api_key),
    "OpenAI-Beta" = "assistants=v2"
  ),
  body = toJSON(followup_message_body, auto_unbox = TRUE)
)

run_response <- POST(
  url = paste0("https://api.openai.com/v1/threads/", thread_id, "/runs"),
  add_headers(
    "Content-Type" = "application/json",
    "Authorization" = paste("Bearer", api_key),
    "OpenAI-Beta" = "assistants=v2"
  ),
  body = toJSON(run_body, auto_unbox = TRUE)
)

This appends a new message from the user to the same thread previously created. The assistant will now have access to both the original and this follow-up message when it generates a response.

Then, we trigger the assistant to re-read the full conversation in the thread (including the original message and the new follow-up) and produce an updated response using the same assistant configuration, still referencing our vector store.

Output:

Assistant's reply:
 The article recommends three types of measure validation for coding responses using GPT, tailored to the type of construct:

1. **Internal Constructs**: When capturing a participant's internal perspective, a two-stage strategy using holdout self-reports is recommended. Researchers should collect a training sample with both the data for GenAI-coded measurement and the relevant self-reports. The coding procedure is finalized through iterative development and then pre-registered for confirmation with a second independent sample【8:0†JMpaper.pdf】.

2. **Interpretative Constructs**: When the measure reflects how an independent observer interprets the participant’s response, the appropriate strategy is to gather holdout judge ratings. The GenAI coding procedure is developed and then frozen before a separate group of human judges independently evaluates the inputs. A comparison is then conducted between GenAI and human ratings【8:2†JMpaper.pdf】.

3. **Behavioral Outcomes**: For measures that predict an outcome that already exists or will soon become available, a holdout outcome validation strategy is recommended. A training and validation subset is created before integrating outcome data to develop the GenAI coding procedure, which is then applied to the validation subset【8:2†JMpaper.pdf】.

A citation like 【8:2†JMpaper.pdf】 indicates that the assistant is referencing a specific chunk of our uploaded document (JMpaper.pdf) that it used to generate its response. These citations are added automatically when the assistant is configured with a vector store and the file_search tool. The numbers (8:2) point to an internal index of the retrieved text segment within the file. This system ensures transparency by letting us trace each part of the assistant’s answer back to its original source.

Why This Matters

By maintaining the same thread, the assistant can respond to multi-turn conversations. This structure allows it to follow the flow of our research questions, just like a real assistant would. It builds continuity, so the model doesn’t treat each request as an isolated prompt. Instead, it understands context and builds on earlier responses using the document we uploaded.
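Because every turn repeats the same message-then-run pattern, it is convenient to wrap it in a small helper. The ask_assistant() function below is our own sketch, not part of the API; it assumes thread_id, assistant_id, and api_key are defined as in the earlier steps, and it polls the run until completion since runs are asynchronous:

```r
# Helper: post a question to the thread, run the assistant, wait for the run
# to finish, and return the newest message (the assistant's reply).
ask_assistant <- function(question) {
  headers <- add_headers("Content-Type" = "application/json",
                         "Authorization" = paste("Bearer", api_key),
                         "OpenAI-Beta" = "assistants=v2")
  base <- paste0("https://api.openai.com/v1/threads/", thread_id)

  # 1. Append the user's question to the thread.
  POST(paste0(base, "/messages"), headers,
       body = toJSON(list(role = "user", content = question),
                     auto_unbox = TRUE))

  # 2. Launch a run and poll it until it reaches a terminal status.
  run <- POST(paste0(base, "/runs"), headers,
              body = toJSON(list(assistant_id = assistant_id),
                            auto_unbox = TRUE))
  run_id <- content(run)$id
  repeat {
    status <- content(GET(paste0(base, "/runs/", run_id), headers))$status
    if (status %in% c("completed", "failed", "cancelled", "expired")) break
    Sys.sleep(1)
  }

  # 3. Messages come back newest-first; return the assistant's reply text.
  msgs <- content(GET(paste0(base, "/messages"), headers))$data
  msgs[[1]]$content[[1]]$text$value
}

# Example usage:
# ask_assistant("What sample sizes do the authors recommend?")
```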

Step 6: Display the Full Conversation

Now, let’s check what the thread looks like by pulling the full conversation.

messages_response <- GET(
  url = paste0("https://api.openai.com/v1/threads/", thread_id, "/messages"),
  add_headers("Authorization" = paste("Bearer", api_key),
              "OpenAI-Beta" = "assistants=v2")
)
messages <- content(messages_response)$data

messages_data <- rev(messages)
for (msg in messages_data) {
  role <- msg$role
  parts <- msg$content
  for (part in parts) {
    if (!is.null(part$text$value)) {
      cat("\n---", toupper(role), "---\n")
      cat(part$text$value, "\n")
    }
  }
}

This block of R code is used to display the full assistant–user conversation in chronological order. Here’s what each part does:

  1. messages_data <- rev(messages): This reverses the list of messages so that the oldest message comes first. By default, the API may return messages in reverse chronological order.
  2. for (msg in messages_data) { … }: This loop iterates over each message in the list.
  3. Inside the loop:
    • role <- msg$role captures whether the message came from the “user” or the “assistant”.
    • parts <- msg$content extracts the message content, which is often broken into parts.
    • The inner loop goes through each part and checks if it includes actual text (part$text$value).
    • If so, it prints the role (in uppercase, like — USER —) and then the text of the message.

This code allows us to cleanly print the entire exchange, step-by-step, showing exactly what the user asked and how the assistant replied.

Output:

--- USER ---
Who are the authors of this article...

--- ASSISTANT ---
The authors are Simon J. Blanchard, Nofar Duani, Aaron M. Garvey, Oded Netzer, and Travis Tae Oh.

--- USER ---
What are the three types of measure validation...

--- ASSISTANT ---
1. Internal Constructs: holdout self-reports...
2. Interpretative Constructs: holdout judge ratings...
3. Behavioral Outcomes: outcome validation...
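A final housekeeping note: assistants, vector stores, and uploaded files persist in your account until you delete them. A sketch of the corresponding DELETE requests, reusing the IDs collected in the earlier steps:

```r
# Clean up: delete the assistant, the vector store, and the uploaded file
# so they don't linger in your account once the session is over.
for (target in c(
  paste0("https://api.openai.com/v1/assistants/", assistant_id),
  paste0("https://api.openai.com/v1/vector_stores/", vector_store_id),
  paste0("https://api.openai.com/v1/files/", file_id)
)) {
  DELETE(target,
         add_headers("Authorization" = paste("Bearer", api_key),
                     "OpenAI-Beta" = "assistants=v2"))
}
```

Deleting the assistant does not delete the thread, so you can keep the thread_id if you want to revisit the conversation later.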


4. When to bother?

In many research scenarios, the sentence completion API is not just sufficient—it is ideal. Each call to the API is independent, self-contained, and easily reproducible. You define the prompt explicitly, the model returns a response, and that interaction is complete. This makes it especially useful for tasks like coding open-ended responses, generating examples, or summarizing short texts.

However, this simplicity also comes with limitations. The sentence completion endpoint cannot access uploaded documents, track earlier parts of a conversation, or maintain state between requests. Each interaction must include all necessary context, which can be cumbersome or even infeasible for longer, more complex workflows.

By contrast, the Assistant API infrastructure allows for persistent memory, document retrieval, and multi-turn interactions through threads. This makes it better suited for tasks like exploring a research article, asking iterative questions, building document-grounded Q&A tools, or simulating a research assistant with knowledge of your materials.

Here’s a side-by-side comparison:

Feature | Sentence Completion (/v1/chat/completions) | Assistant API (V2)
Stateless | ✅ Each call is independent | ❌ Uses threads to track memory
Easy to reproduce | ✅ Prompt and settings fully define response | ⚠ Requires saving thread and assistant IDs
File access | ❌ Not supported | ✅ Via vector stores and retrieval tools
Memory across turns | ❌ No built-in memory (must be done in R) | ✅ Threads retain full conversation history
Good for single prompts | ✅ Ideal for short tasks | ⚠ Requires more setup
Supports citations to files | ❌ Not available | ✅ References specific document chunks
Best for… | Simple tasks, batch coding, isolated prompts | Interactive Q&A, literature reviews, assistant-like workflows

In short, if your task is straightforward and prompt-driven, the classic completion endpoint may be all you need. But if you’re building workflows that require document access or memory, or if you’re treating GPT more as a research assistant than a tool, then moving beyond sentence completion is worth the additional complexity.
