Integrating OpenAI into a Nuxt App: Patterns That Work

Adding AI to a product sounds exciting until you hit the edge cases. Here are the implementation patterns that work reliably in production Nuxt applications.

Calling the OpenAI API from a Nuxt application is straightforward until it isn't. Streaming responses, handling errors gracefully, managing costs, dealing with token limits — these are where most integrations run into trouble.

Here are the patterns that hold up in production.


The architecture: always call from the server

Never call the OpenAI API directly from the browser. Your API key would be exposed in the client bundle — readable by anyone who opens DevTools.

Nuxt's server routes (server/api/) are the right place:

// server/api/generate.post.ts
import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
})

export default defineEventHandler(async (event) => {
  const { prompt } = await readBody(event)

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }]
  })

  return { content: response.choices[0].message.content }
})

This keeps your API key server-side and gives you a single place to add auth checks, rate limiting, and logging.
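As an aside, the idiomatic Nuxt way to hold that key is `runtimeConfig` rather than reading `process.env` at module scope, since it can be overridden per environment. A sketch, assuming the environment variable is named `NUXT_OPENAI_API_KEY` (Nuxt maps that name onto `openaiApiKey` automatically):

```typescript
// nuxt.config.ts: keys outside the `public` block never reach the client bundle
export default defineNuxtConfig({
  runtimeConfig: {
    // Overridden at runtime by the NUXT_OPENAI_API_KEY environment variable
    openaiApiKey: ''
  }
})
```

Inside a handler, `const { openaiApiKey } = useRuntimeConfig(event)` then feeds the OpenAI constructor.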


Streaming responses

For anything longer than a sentence, streaming is essential. Without it, users stare at a blank screen until the full response arrives. With it, text appears progressively — the perceived speed is dramatically better.

// server/api/stream.post.ts
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export default defineEventHandler(async (event) => {
  const { prompt } = await readBody(event)

  setHeader(event, 'Content-Type', 'text/event-stream')
  setHeader(event, 'Cache-Control', 'no-cache')
  setHeader(event, 'Connection', 'keep-alive')

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  })

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content
    if (content) {
      event.node.res.write(`data: ${JSON.stringify({ content })}\n\n`)
    }
  }

  event.node.res.end()
})

On the client, EventSource only supports GET requests, so for a POST endpoint read the response body as a stream with fetch:

<script setup lang="ts">
const output = ref('')

async function generate(prompt: string) {
  const response = await fetch('/api/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  })

  const reader = response.body!.getReader()
  const decoder = new TextDecoder()

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    const lines = decoder.decode(value, { stream: true }).split('\n')
    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6))
        output.value += data.content
      }
    }
  }
}
</script>
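One subtlety the read loop glosses over: a network chunk can end mid-line, splitting a `data:` frame across two reads, at which point `JSON.parse` throws on the fragment. A small buffering parser avoids that (a sketch; the `parseSSEChunk` name is mine):

```typescript
// Accumulates raw text across reads and yields only complete "data: ..." payloads.
// The caller keeps `buffer` between calls and passes each decoded chunk through.
function parseSSEChunk(buffer: string, chunk: string): { buffer: string; payloads: string[] } {
  const text = buffer + chunk
  const lines = text.split('\n')
  // The last element may be an incomplete line: carry it over to the next call
  const rest = lines.pop() ?? ''
  const payloads = lines
    .filter((line) => line.startsWith('data: '))
    .map((line) => line.slice(6))
  return { buffer: rest, payloads }
}
```

Only the returned payloads are safe to `JSON.parse`; everything partial stays in the buffer until the next read completes it.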

Handling token limits

Models have context windows. If your application allows free-form input or accumulates conversation history, you'll eventually hit the limit — and the error is not user-friendly.

Count tokens before you send:

import { encode } from 'gpt-tokenizer'

function countTokens(text: string): number {
  return encode(text).length
}

// Rough pre-flight check: joining with spaces ignores per-message
// formatting overhead, but it's close enough to reject oversized input
if (countTokens(messages.map(m => m.content).join(' ')) > 100000) {
  throw createError({ statusCode: 400, message: 'Conversation too long' })
}

Truncate conversation history for long chats:

function trimMessages(messages: Message[], maxTokens: number): Message[] {
  // Always keep the system message; trim from the oldest of the rest
  const systemMessage = messages[0]
  const recent = messages.slice(1).slice(-10) // last 10 non-system messages

  const tokensOf = (msgs: Message[]) =>
    countTokens(msgs.map(m => m.content).join(' '))

  while (tokensOf(recent) > maxTokens && recent.length > 1) {
    recent.shift() // drop the oldest remaining message
  }

  return [systemMessage, ...recent]
}

Error handling that doesn't expose internals

OpenAI errors come in several flavours: rate limits, insufficient quota, invalid API key, server errors. Each needs a different user-facing message.

try {
  const response = await openai.chat.completions.create(...)
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    if (error.status === 429) {
      throw createError({ statusCode: 429, message: 'Too many requests. Please wait and try again.' })
    }
    if (error.status === 401) {
      // Log this internally — it's a config problem
      console.error('OpenAI auth error', error)
      throw createError({ statusCode: 500, message: 'Service configuration error.' })
    }
  }
  throw createError({ statusCode: 500, message: 'Something went wrong. Please try again.' })
}

Never surface the raw OpenAI error message to users — it can contain technical details that confuse them or reveal internals.
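The same mapping belongs in one place on the client too, so components never render raw error text. A minimal sketch (the `friendlyError` name and the 400 wording are mine):

```typescript
// Maps an HTTP status from our API route to user-facing copy.
// Anything unexpected falls through to a generic message.
function friendlyError(statusCode: number): string {
  switch (statusCode) {
    case 429:
      return 'Too many requests. Please wait and try again.'
    case 400:
      // Assumes 400 from this route means the input was rejected as too long
      return 'That request was too long. Try shortening it.'
    default:
      return 'Something went wrong. Please try again.'
  }
}
```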


Cost control

Without limits, an enthusiastic user (or a bad actor) can run up significant OpenAI bills.

Limit per user:

  • Track API calls in your database per user per day
  • Reject requests over the limit with a clear message
  • Consider tiering: free users get 10 calls/day, paid users get 100
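The first two bullets can be sketched as a single check. This in-memory version is illustrative only (the `checkDailyLimit` name is mine); production code would back the counter with your database or Redis so it survives restarts and works across server instances:

```typescript
// Per-user daily call counter, keyed by user ID.
// A sketch: swap the Map for a database or Redis counter in production.
const usage = new Map<string, { day: string; calls: number }>()

function checkDailyLimit(userId: string, limit: number, now = new Date()): boolean {
  const day = now.toISOString().slice(0, 10) // e.g. "2024-05-01"
  const entry = usage.get(userId)
  if (!entry || entry.day !== day) {
    // First call of the day resets the counter
    usage.set(userId, { day, calls: 1 })
    return true
  }
  if (entry.calls >= limit) return false // over the limit: reject the request
  entry.calls += 1
  return true
}
```

In the server route, a `false` return becomes a `createError({ statusCode: 429, ... })` with a clear message.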

Limit output length:

const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
  max_tokens: 1000 // Cap output to control cost
})

Use the right model:

  • gpt-4o-mini is roughly 15–20x cheaper per token than gpt-4o and is good enough for many use cases
  • Start with gpt-4o-mini, upgrade only when quality is measurably insufficient

Caching for repeated queries

If users often ask similar questions (documentation search, FAQs, categorisation tasks), caching responses can cut API costs significantly.

A simple approach: hash the prompt, store the response in Redis or your database with a TTL, check before calling the API.
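A sketch of the key derivation (the `cacheKey` helper is my naming); including the model in the hash means switching models naturally invalidates old entries:

```typescript
import { createHash } from 'node:crypto'

// Deterministic key: identical model + prompt always hash to the same value,
// so a cache GET on this key finds any previously stored response.
function cacheKey(model: string, prompt: string): string {
  return createHash('sha256').update(`${model}\n${prompt}`).digest('hex')
}
```

The lookup is then a cache read on `cacheKey(model, prompt)` before calling the API, and a write with a TTL after a cache miss.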

For semantic similarity (where the same question phrased differently should return the same answer), embedding-based similarity search (with pgvector or a dedicated vector database) is the right tool — but that's a more complex setup worth considering at scale.


One piece of advice for new integrations

Start simpler than you think you need. The demo that wows in a product review often has no streaming, no error handling, and no rate limiting. That's fine for validation.

Before shipping to real users: add streaming, add error handling, add per-user limits. In that order of priority. Everything else — caching, semantic search, multi-model routing — comes later, when you've confirmed the feature is worth investing in.

Need help integrating AI into your product? Let's talk →