Integrating OpenAI into a Nuxt App: Patterns That Work
Calling the OpenAI API from a Nuxt application is straightforward until it isn't. Streaming responses, handling errors gracefully, managing costs, dealing with token limits — these are where most integrations run into trouble.
Here are the patterns that hold up in production.
The architecture: always call from the server
Never call the OpenAI API directly from the browser. Your API key would be exposed in the client bundle — readable by anyone who opens DevTools.
Nuxt's server routes (server/api/) are the right place:
// server/api/generate.post.ts
import OpenAI from 'openai'

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY
})

export default defineEventHandler(async (event) => {
  const { prompt } = await readBody(event)

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }]
  })

  return { content: response.choices[0].message.content }
})
This keeps your API key server-side and gives you a single place to add auth checks, rate limiting, and logging.
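As an illustration of the auth-check part, a small bearer-token guard can sit at the top of each handler. The requireUser helper, its header format, and its error shape are assumptions for the sketch, not Nuxt built-ins; real token verification (JWT, session lookup) would replace the placeholder:

```typescript
// Illustrative guard: extract and validate a bearer token before calling OpenAI.
// In a handler you'd call it as: requireUser(getHeader(event, 'authorization'))
function requireUser(authHeader: string | undefined): string {
  if (!authHeader?.startsWith('Bearer ')) {
    // Reject early so unauthenticated traffic never reaches the OpenAI API
    throw Object.assign(new Error('Unauthorized'), { statusCode: 401 })
  }
  // Placeholder: replace with real token verification (JWT, session lookup)
  return authHeader.slice('Bearer '.length)
}
```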
Streaming responses
For anything longer than a sentence, streaming is essential. Without it, users stare at a blank screen until the full response arrives. With it, text appears progressively — the perceived speed is dramatically better.
// server/api/stream.post.ts
import OpenAI from 'openai'

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY })

export default defineEventHandler(async (event) => {
  const { prompt } = await readBody(event)

  setHeader(event, 'Content-Type', 'text/event-stream')
  setHeader(event, 'Cache-Control', 'no-cache')
  setHeader(event, 'Connection', 'keep-alive')

  const stream = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages: [{ role: 'user', content: prompt }],
    stream: true
  })

  for await (const chunk of stream) {
    const content = chunk.choices[0]?.delta?.content
    if (content) {
      event.node.res.write(`data: ${JSON.stringify({ content })}\n\n`)
    }
  }

  event.node.res.end()
})
On the client, note that the EventSource API only supports GET requests, so for a POST endpoint like this one, consume the stream with fetch and a ReadableStream:
<script setup lang="ts">
const output = ref('')

async function generate(prompt: string) {
  const response = await fetch('/api/stream', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt })
  })

  const reader = response.body!.getReader()
  const decoder = new TextDecoder()
  let buffer = ''

  while (true) {
    const { done, value } = await reader.read()
    if (done) break

    // { stream: true } keeps multi-byte characters intact across chunks
    buffer += decoder.decode(value, { stream: true })

    // A network chunk can end mid-line; process only complete lines
    // and carry the remainder over to the next read
    const lines = buffer.split('\n')
    buffer = lines.pop() ?? ''

    for (const line of lines) {
      if (line.startsWith('data: ')) {
        const data = JSON.parse(line.slice(6))
        output.value += data.content
      }
    }
  }
}
</script>
Handling token limits
Models have context windows. If your application allows free-form input or accumulates conversation history, you'll eventually hit the limit — and the error is not user-friendly.
Count tokens before you send:
import { encode } from 'gpt-tokenizer'

function countTokens(text: string): number {
  return encode(text).length
}

// Check before sending
if (countTokens(messages.map(m => m.content).join(' ')) > 100000) {
  throw createError({ statusCode: 400, message: 'Conversation too long' })
}
Truncate conversation history for long chats:
function trimMessages(messages: Message[], maxTokens: number): Message[] {
  // Always keep the system message and at least the last message
  const systemMessage = messages[0]
  const recent = messages.slice(1).slice(-10) // keep the last 10 non-system messages
  const total = (msgs: Message[]) => countTokens(msgs.map(m => m.content).join(' '))

  while (total(recent) > maxTokens && recent.length > 1) {
    recent.shift() // remove the oldest message first
  }

  return [systemMessage, ...recent]
}
Error handling that doesn't expose internals
OpenAI errors come in several flavours: rate limits, insufficient quota, invalid API key, server errors. Each needs a different user-facing message.
try {
  const response = await openai.chat.completions.create(...)
} catch (error) {
  if (error instanceof OpenAI.APIError) {
    if (error.status === 429) {
      throw createError({ statusCode: 429, message: 'Too many requests. Please wait and try again.' })
    }
    if (error.status === 401) {
      // Log this internally — it's a config problem
      console.error('OpenAI auth error', error)
      throw createError({ statusCode: 500, message: 'Service configuration error.' })
    }
  }
  throw createError({ statusCode: 500, message: 'Something went wrong. Please try again.' })
}
Never surface the raw OpenAI error message to users — it can contain technical details that confuse them or reveal internals.
Cost control
Without limits, an enthusiastic user (or a bad actor) can run up significant OpenAI bills.
Limit per user:
- Track API calls in your database per user per day
- Reject requests over the limit with a clear message
- Consider tiering: free users get 10 calls/day, paid users get 100
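A minimal sketch of that per-user limit, using an in-memory store and the tier numbers above. The consumeQuota name is illustrative, and a real deployment would persist counts in your database or Redis so they survive restarts and work across server instances:

```typescript
type Tier = 'free' | 'paid'

// Daily call limits per tier (matching the tiering suggestion above)
const LIMITS: Record<Tier, number> = { free: 10, paid: 100 }

// In-memory usage tracker — swap for a database or Redis in production
const usage = new Map<string, { day: string; count: number }>()

// Returns true and records the call if the user is under today's limit
function consumeQuota(userId: string, tier: Tier): boolean {
  const today = new Date().toISOString().slice(0, 10)
  const entry = usage.get(userId)
  const count = entry?.day === today ? entry.count : 0 // reset on a new day
  if (count >= LIMITS[tier]) return false
  usage.set(userId, { day: today, count: count + 1 })
  return true
}
```

In a server route, a false return would translate into a 429 with a clear message about the daily limit.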
Limit output length:
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages,
  max_tokens: 1000 // Cap output to control cost
})
Use the right model:
- gpt-4o-mini costs ~50x less than gpt-4o for many use cases
- Start with gpt-4o-mini, upgrade only when quality is measurably insufficient
Caching for repeated queries
If users often ask similar questions (documentation search, FAQs, categorisation tasks), caching responses cuts costs significantly.
A simple approach: hash the prompt, store the response in Redis or your database with a TTL, check before calling the API.
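A sketch of that approach, with an in-memory Map standing in for Redis. The cachedCompletion helper and the one-hour TTL default are illustrative; in production the Map operations would become Redis GET/SET with an EX expiry:

```typescript
import { createHash } from 'node:crypto'

// In-memory cache keyed by a hash of model + prompt, with a TTL per entry
const cache = new Map<string, { value: string; expires: number }>()

function promptKey(model: string, prompt: string): string {
  return createHash('sha256').update(`${model}:${prompt}`).digest('hex')
}

async function cachedCompletion(
  model: string,
  prompt: string,
  generate: () => Promise<string>, // wraps the actual OpenAI call
  ttlMs = 60 * 60 * 1000 // 1 hour
): Promise<string> {
  const key = promptKey(model, prompt)
  const hit = cache.get(key)
  if (hit && hit.expires > Date.now()) return hit.value // cache hit: no API call

  const value = await generate()
  cache.set(key, { value, expires: Date.now() + ttlMs })
  return value
}
```

Identical prompts within the TTL then cost nothing: the second call returns the stored response without touching the API.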
For semantic similarity (where the same question phrased differently should return the same answer), embedding-based similarity search (with pgvector or a dedicated vector database) is the right tool — but that's a more complex setup worth considering at scale.
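The comparison at the heart of that setup is cosine similarity between embedding vectors (the kind returned by OpenAI's embeddings endpoint, or computed by pgvector server-side). A minimal version, with the caveat that any match threshold you pick is data-dependent:

```typescript
// Cosine similarity between two embedding vectors: 1 = same direction,
// 0 = orthogonal. A cached answer is reused when similarity to a stored
// prompt's embedding exceeds a tuned threshold.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```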
One piece of advice for new integrations
Start simpler than you think you need. The demo that wows in a product review often has no streaming, no error handling, and no rate limiting. That's fine for validation.
Before shipping to real users: add streaming, add error handling, add per-user limits. In that order of priority. Everything else — caching, semantic search, multi-model routing — comes later, when you've confirmed the feature is worth investing in.