🎯 AI Philosophy

AI in FlowState should be optional, contextual, and friction-reducing. It's not about replacing the producer's creativity; it's about removing barriers and accelerating workflows.

💡 Key Principle: Use the cheapest model that works. Reserve premium Gemini for tasks that truly need it.

🔀 Smart Inference Router

The router decides which model to use based on task complexity, latency requirements, and cost.

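// Request shape inferred from the fields the router reads; adjust to the real schema.
interface AIRequest {
  latency: 'realtime' | 'background';
  complexity: 'simple' | 'complex';
  type?: string;         // e.g. 'stem_separation', 'music_gen'
  audio_input?: unknown; // truthy when the request carries audio
}

type Provider =
  | 'cloudflare-workers-ai'
  | 'htdemucs'
  | 'stable-audio-open'
  | 'gemini-3-flash'
  | 'cloudflare-llama-4-scout';

function routeInference(request: AIRequest): Provider {
  // TIER 1: Cloudflare Edge (realtime, simple)
  if (request.latency === 'realtime' && request.complexity === 'simple') {
    return 'cloudflare-workers-ai'; // FREE tier
  }

  // TIER 2: Free Models (background, creative)
  if (request.latency === 'background') {
    if (request.type === 'stem_separation') return 'htdemucs';
    if (request.type === 'music_gen') return 'stable-audio-open';
  }

  // TIER 3: Gemini (complex, audio understanding)
  if (request.audio_input || request.complexity === 'complex') {
    return 'gemini-3-flash';
  }

  return 'cloudflare-llama-4-scout'; // FREE, good default
}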

📊 AI Capabilities Matrix

| Feature | Model | Cost | Latency | Tier |
|---|---|---|---|---|
| Voice Commands | Whisper (CF Workers AI) | $0.0005/min | 500ms | Edge |
| Intent Classification | Llama 4 Scout | FREE | 100ms | Edge |
| Complex Queries | Gemini 3 Flash | $0.075-0.30/1M | 200-2000ms | Premium |
| TTS Response | Chatterbox/MeloTTS | FREE | 75-150ms | Free |
| Stem Separation | HTDemucs | FREE (self-host) | 10-60s | Background |
| Beat Generation | Stable Audio Open | $0.05/gen | 30-60s | Background |
| Sample Search | BGE Embeddings | $0.02/1M | 50ms | Edge |
| Audio Analysis | Gemini 3 (audio) | $0.15/min | 1-5s | Premium |

⚡ Gemini 3 Flash Integration

Gemini 3 Flash (December 2025) is the core premium AI for complex tasks.

Key Capabilities

When to Use Gemini

Pricing

| Thinking Level | Input | Output | Use Case |
|---|---|---|---|
| None (fast) | $0.075/1M | $0.30/1M | Quick answers |
| LOW | $0.15/1M | $0.60/1M | Standard queries |
| MEDIUM | $0.50/1M | $2.00/1M | Complex analysis |
| HIGH | $3.50/1M | $14.00/1M | Deep reasoning |
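
To make the pricing tiers concrete, here is a minimal sketch of a per-request cost estimate. The prices are copied from the table above and assumed to be per 1M tokens; `estimateGeminiCost` and the `ThinkingLevel` names are hypothetical helpers, not part of any Gemini SDK.

```typescript
// Prices in USD per 1M tokens, copied from the pricing table.
type ThinkingLevel = "none" | "low" | "medium" | "high";

const GEMINI_PRICING: Record<ThinkingLevel, { input: number; output: number }> = {
  none: { input: 0.075, output: 0.3 },
  low: { input: 0.15, output: 0.6 },
  medium: { input: 0.5, output: 2.0 },
  high: { input: 3.5, output: 14.0 },
};

// Hypothetical helper: estimated USD cost of one request.
function estimateGeminiCost(
  level: ThinkingLevel,
  inputTokens: number,
  outputTokens: number,
): number {
  const price = GEMINI_PRICING[level];
  return (inputTokens * price.input + outputTokens * price.output) / 1_000_000;
}

// Example: a 2,000-token prompt with a 500-token reply.
const fastCost = estimateGeminiCost("none", 2_000, 500); // about $0.0003
const deepCost = estimateGeminiCost("high", 2_000, 500); // about $0.014
console.log(fastCost.toFixed(6), deepCost.toFixed(6));
```

Under these assumptions the same request costs roughly 47 times more at HIGH than with thinking disabled, which is why the router reserves higher levels for queries that need them.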

🎙️ Gemini 2.5 Flash Native Audio

Use this model for voice output (avatar responses, assistant speech).

Features

Cost Comparison

| Option | Cost/min | Quality | Latency |
|---|---|---|---|
| Gemini Native Audio | $0.10 | Excellent | 200-500ms |
| MeloTTS (Workers AI) | $0.0002 | Good | 100-200ms |
| Chatterbox (self-host) | FREE | Excellent | 75-150ms |
| ElevenLabs | $0.30 | Excellent | 75-250ms |

💡 Recommendation: Use Chatterbox/MeloTTS for 95% of TTS needs. Reserve Gemini Native Audio for special avatar personalities.

☁️ Cloudflare Workers AI Models

Available Models

| Model | Use Case | Free Tier |
|---|---|---|
| @cf/openai/whisper | Speech-to-text | 10K neurons/day |
| @cf/meta/llama-4-scout | Intent classification, simple queries | 10K neurons/day |
| @cf/baai/bge-base-en-v1.5 | Text embeddings for search | 10K neurons/day |
| @cf/stabilityai/stable-diffusion-xl | Album art generation | 10K neurons/day |

Example: Whisper Integration

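// workers/api/transcribe.ts
// `Env` is assumed to expose the Workers AI binding as `AI`.
export async function transcribe(audio: ArrayBuffer, env: Env) {
  // Workers AI expects the audio as a plain array of byte values.
  const result = await env.AI.run('@cf/openai/whisper', {
    audio: [...new Uint8Array(audio)]
  });

  // Response fields vary by Whisper model; check the model's output schema
  // before relying on `detected_language` and `segments`.
  return {
    text: result.text,
    language: result.detected_language,
    segments: result.segments
  };
}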

🎤 Voice Command Pipeline

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Browser    │    │   Workers    │    │  Workers AI  │    │     DAW      │
│  Microphone  │───▶│     API      │───▶│   Whisper    │───▶│    Action    │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                            │                                       ▲
                            │         ┌──────────────┐              │
                            └────────▶│   Llama 4    │──────────────┘
                                      │    Scout     │
                                      │   (intent)   │
                                      └──────────────┘

Supported Commands (MVP)

| Category | Examples |
|---|---|
| Transport | "play", "stop", "pause", "loop section" |
| Mixer | "mute track 1", "solo drums", "turn up the bass" |
| Tempo | "set BPM 90", "faster", "slower" |
| Samples | "find a snare", "add kick to track 2" |
| Project | "save project", "export", "new track" |
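
As an illustration of how these categories could be recognised, the sketch below uses plain keyword rules as a deterministic pre-filter. This is a swapped-in technique for illustration, not the Llama 4 Scout intent step described above; the rules are derived only from the example phrases in the table, and `classifyCommand` is a hypothetical name.

```typescript
type CommandCategory = "transport" | "mixer" | "tempo" | "samples" | "project";

// Hypothetical keyword rules based on the example phrases in the table.
// Order matters: the first matching rule wins.
const COMMAND_RULES: Array<{ category: CommandCategory; pattern: RegExp }> = [
  { category: "tempo", pattern: /\b(bpm|tempo|faster|slower)\b/ },
  { category: "mixer", pattern: /\b(mute|solo|turn (up|down))\b/ },
  { category: "samples", pattern: /\b(find|add)\b/ },
  { category: "project", pattern: /\b(save|export|new track)\b/ },
  { category: "transport", pattern: /\b(play|stop|pause|loop)\b/ },
];

// Returns the matched category, or null when no rule applies and the
// transcript should go to the model-based intent classifier instead.
function classifyCommand(transcript: string): CommandCategory | null {
  const text = transcript.toLowerCase().trim();
  for (const rule of COMMAND_RULES) {
    if (rule.pattern.test(text)) return rule.category;
  }
  return null;
}

console.log(classifyCommand("set BPM 90"));   // "tempo"
console.log(classifyCommand("mute track 1")); // "mixer"
console.log(classifyCommand("make it warmer")); // null
```

A pre-filter like this would only cover exact phrasings; anything it returns `null` for still needs the intent model.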

💬 AI Assistant Design

System Prompt Template

You are FlowState AI, a hip-hop production assistant.

Current project context:
- Tempo: ${project.tempo} BPM
- Key: ${project.key || 'not set'}
- Tracks: ${project.tracks.length}
- Current selection: ${selection}

You can help with:
- Production tips and techniques
- Sample recommendations
- Mixing advice
- Workflow optimization

Keep responses concise and actionable.
If the user asks you to do something in the DAW,
respond with a JSON action block.
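
The template above can be filled in with an ordinary template-literal function. This is a minimal sketch: the `ProjectContext` and `TrackInfo` shapes are assumptions inferred from the placeholders, since the real project model is not shown here.

```typescript
// Assumed minimal shapes, inferred from the placeholders in the template.
interface TrackInfo {
  name: string;
}

interface ProjectContext {
  tempo: number;
  key?: string;
  tracks: TrackInfo[];
}

// Builds the system prompt from the current project state.
function buildSystemPrompt(project: ProjectContext, selection: string): string {
  return [
    "You are FlowState AI, a hip-hop production assistant.",
    "",
    "Current project context:",
    `- Tempo: ${project.tempo} BPM`,
    `- Key: ${project.key || "not set"}`,
    `- Tracks: ${project.tracks.length}`,
    `- Current selection: ${selection}`,
    "",
    "You can help with:",
    "- Production tips and techniques",
    "- Sample recommendations",
    "- Mixing advice",
    "- Workflow optimization",
    "",
    "Keep responses concise and actionable.",
    "If the user asks you to do something in the DAW,",
    "respond with a JSON action block.",
  ].join("\n");
}

const examplePrompt = buildSystemPrompt(
  { tempo: 90, tracks: [{ name: "Drums" }, { name: "Bass" }] },
  "Track 1, bars 1-4",
);
console.log(examplePrompt);
```

Keeping the static instructions identical across requests and varying only the context lines should also make the prompt easier to cache.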

Conversation Flow

  1. User types or speaks query
  2. Include current project context in system prompt
  3. Route to appropriate model (Llama for simple, Gemini for complex)
  4. Stream response to chat panel
  5. If action needed, execute DAW command
  6. Cache response in AI Gateway
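
Step 5 depends on pulling a JSON action out of the model's reply. The sketch below shows one way that might look. The action schema (`action` plus optional `params`) is an assumption, as the document does not define it, and `extractAction` is a hypothetical helper that treats the text between the first `{` and the last `}` as the action block.

```typescript
// Assumed action shape; the real schema is not specified in this document.
interface DawAction {
  action: string;
  params: Record<string, unknown>;
}

// Returns the parsed action, or null when the reply contains no valid one.
function extractAction(response: string): DawAction | null {
  const start = response.indexOf("{");
  const end = response.lastIndexOf("}");
  if (start === -1 || end <= start) return null;

  try {
    const parsed: unknown = JSON.parse(response.slice(start, end + 1));
    if (typeof parsed !== "object" || parsed === null) return null;

    const candidate = parsed as { action?: unknown; params?: unknown };
    if (typeof candidate.action !== "string") return null;

    const params =
      typeof candidate.params === "object" && candidate.params !== null
        ? (candidate.params as Record<string, unknown>)
        : {};
    return { action: candidate.action, params };
  } catch {
    // Malformed JSON: treat the reply as plain chat text.
    return null;
  }
}

const reply = 'Setting the tempo now. {"action": "set_tempo", "params": {"bpm": 90}}';
console.log(extractAction(reply));
console.log(extractAction("Try layering a clap under the snare."));
```

Returning `null` instead of throwing lets the chat panel show the reply as ordinary text when no action is present.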

💰 Cost Optimization Strategies

1. AI Gateway Caching

Cache common queries to reduce API calls by 40-70%.

// AI Gateway automatically caches based on:
// - Query similarity
// - TTL settings
// - Cache rules

// Configure in Cloudflare dashboard:
// AI Gateway > flowstate > Caching > Enable
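
Caching can also be tuned per request rather than only in the dashboard. The sketch below builds the request headers for that; the header names `cf-aig-cache-ttl` and `cf-aig-skip-cache` reflect my understanding of the AI Gateway documentation and should be verified before use, and `gatewayCacheHeaders` is a hypothetical helper.

```typescript
interface GatewayCacheOptions {
  ttlSeconds?: number; // how long the gateway may serve a cached response
  skipCache?: boolean; // bypass the cache for this request
}

// Builds per-request cache headers to send alongside an AI Gateway call.
// Header names are assumed from the AI Gateway docs; confirm before relying on them.
function gatewayCacheHeaders(options: GatewayCacheOptions): Record<string, string> {
  const headers: Record<string, string> = {};
  if (options.skipCache) {
    headers["cf-aig-skip-cache"] = "true";
  } else if (options.ttlSeconds !== undefined) {
    headers["cf-aig-cache-ttl"] = String(options.ttlSeconds);
  }
  return headers;
}

// Cache generic production tips for an hour; never cache project-specific replies.
console.log(gatewayCacheHeaders({ ttlSeconds: 3600 }));
console.log(gatewayCacheHeaders({ skipCache: true }));
```

Replies that embed live project context are poor cache candidates, so skipping the cache for those requests avoids serving stale advice.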

2. Smart Model Routing

Route roughly 80% of queries to Llama 4 Scout (FREE) and the remaining 20% to Gemini.

3. Context Caching

Gemini offers a 75% discount on repeated (cached) system prompts, so cache the project context.
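
A quick worked example of what that discount is worth, assuming it means cached input tokens are billed at 25% of the normal input price. `inputCostWithCaching` is a hypothetical helper, and the $0.15/1M figure is the LOW-tier input price from the pricing table.

```typescript
// Input cost in USD when part of the prompt is served from the context cache.
// Assumes cached tokens are billed at (1 - cacheDiscount) of the normal price.
function inputCostWithCaching(
  totalInputTokens: number,
  cachedTokens: number,
  pricePerMillion: number,
  cacheDiscount = 0.75,
): number {
  if (cachedTokens < 0 || cachedTokens > totalInputTokens) {
    throw new RangeError("cachedTokens must be between 0 and totalInputTokens");
  }
  const uncachedTokens = totalInputTokens - cachedTokens;
  const cachedPrice = pricePerMillion * (1 - cacheDiscount);
  return (uncachedTokens * pricePerMillion + cachedTokens * cachedPrice) / 1_000_000;
}

// 1M input tokens, 800K of which are the repeated system prompt and project context.
const withoutCache = inputCostWithCaching(1_000_000, 0, 0.15);     // about $0.15
const withCache = inputCostWithCaching(1_000_000, 800_000, 0.15);  // about $0.06
console.log(withoutCache.toFixed(2), withCache.toFixed(2));
```

In this scenario the input bill drops by about 60%, so the saving scales with how much of each prompt is repeated context.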

4. Batch Embeddings

Embed samples in batches during off-peak hours.
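
A minimal sketch of the batching step, assuming the embedding call is injected as a function so the same loop works with any provider. `chunk`, `embedInBatches`, and the default batch size of 50 are illustrative choices, not values taken from Workers AI.

```typescript
// Splits a list into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Hypothetical signature for the embedding call (one vector per input text).
type EmbedFn = (texts: string[]) => Promise<number[][]>;

// Embeds all texts one batch at a time, preserving input order.
async function embedInBatches(
  texts: string[],
  embed: EmbedFn,
  batchSize = 50,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of chunk(texts, batchSize)) {
    vectors.push(...(await embed(batch)));
  }
  return vectors;
}

console.log(chunk(["kick", "snare", "hat", "clap", "808"], 2));
// [["kick", "snare"], ["hat", "clap"], ["808"]]
```

Processing batches sequentially keeps the request rate predictable, which suits a scheduled off-peak job better than firing every batch at once.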

Monthly Cost Estimate (10K users)

| Service | Baseline | Optimized |
|---|---|---|
| Workers AI | $150 | $100 |
| Gemini API | $400 | $200 |
| TTS | $100 | $0 (Chatterbox) |
| Total AI | $650 | $300 |