🎯 AI Philosophy

AI in FlowState should be optional, contextual, and friction-reducing. It's not about replacing the producer's creativity; it's about removing barriers and accelerating workflows.

💡 Key Principle: Use the cheapest model that works. Reserve premium Gemini for tasks that truly need it.

🔀 Smart Inference Router

The router decides which model to use based on task complexity, latency requirements, and cost.

function routeInference(request: AIRequest): Provider {
  // TIER 1: Cloudflare Edge (realtime, simple)
  if (request.latency === 'realtime' && request.complexity === 'simple') {
    return 'cloudflare-workers-ai'; // FREE tier
  }

  // TIER 2: Free Models (background, creative)
  if (request.latency === 'background') {
    if (request.type === 'stem_separation') return 'htdemucs';
    if (request.type === 'music_gen') return 'stable-audio-open';
  }

  // TIER 3: Gemini (complex, audio understanding)
  if (request.audio_input || request.complexity === 'complex') {
    return 'gemini-3-flash';
  }

  return 'cloudflare-llama-4-scout'; // FREE, good default
}
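
The tiers in the if-chain above are order-sensitive: the first matching rule wins. One way to make that explicit is an ordered rule table. This is only a sketch: the `Provider` and `AIRequest` types are assumptions inferred from the fields the router reads, and `RULES` and `routeByRules` are hypothetical names.

```typescript
// Assumed types, inferred from the fields the router reads.
type Provider =
  | 'cloudflare-workers-ai'
  | 'htdemucs'
  | 'stable-audio-open'
  | 'gemini-3-flash'
  | 'cloudflare-llama-4-scout';

interface AIRequest {
  type?: string;
  latency: 'realtime' | 'background';
  complexity: 'simple' | 'complex';
  audio_input?: boolean;
}

interface Rule {
  name: string;
  matches: (r: AIRequest) => boolean;
  provider: Provider;
}

// Ordered rules: the first match wins, mirroring the tier order of the if-chain.
const RULES: Rule[] = [
  {
    name: 'tier1-edge',
    matches: r => r.latency === 'realtime' && r.complexity === 'simple',
    provider: 'cloudflare-workers-ai',
  },
  {
    name: 'tier2-stems',
    matches: r => r.latency === 'background' && r.type === 'stem_separation',
    provider: 'htdemucs',
  },
  {
    name: 'tier2-music',
    matches: r => r.latency === 'background' && r.type === 'music_gen',
    provider: 'stable-audio-open',
  },
  {
    name: 'tier3-gemini',
    matches: r => Boolean(r.audio_input) || r.complexity === 'complex',
    provider: 'gemini-3-flash',
  },
];

function routeByRules(request: AIRequest): Provider {
  const hit = RULES.find(rule => rule.matches(request));
  // No rule matched: fall back to the free default.
  return hit ? hit.provider : 'cloudflare-llama-4-scout';
}
```

Because the edge rule comes first, a realtime, simple request stays on Cloudflare even when it carries audio input. A background request with an unlisted type falls through to the Gemini check and then to the default.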

📊 AI Capabilities Matrix (v0.1.99 - Implemented)

| Feature | Model | Cost | Latency | Status |
| --- | --- | --- | --- | --- |
| 🆓 Pattern AI | Magenta.js (browser) | FREE | 1-3s | ✅ Live |
| 🎵 Music Generation | ACE-Step (with lyrics) | ~$0.05/gen | 15-25s | ✅ Live |
| 💥 SFX Generation | Stable Audio Open | ~$0.02/gen | 5-15s | ✅ Live |
| 🎤 Voice Clone | OpenVoice v2 | ~$0.05/gen | 5-10s | ✅ Live |
| 🎙️ Text-to-Speech | Chatterbox TTS | ~$0.03/gen | 2-5s | ✅ Live |
| 🔀 Stem Separation | Meta Demucs | ~$0.02/gen | 10-30s | ✅ Live |
| 🔬 Audio Analysis | Essentia.js (browser) | FREE | 100-500ms | ✅ Live |
| 🗣️ Voice Commands | Whisper (CF Workers AI) | $0.0005/min | 500ms | ✅ Live |
| 🔍 Sample Search | CLAP Embeddings | FREE (D1/Vectorize) | 50ms | ✅ Live |
| 💬 AI Chat | Gemini 3 Flash | $0.075-0.30/1M tokens | 200-2000ms | ✅ Live |
| 🎹 AI Mastering | Custom pipeline | ~$0.10/gen | 30-60s | ✅ Live |

🚀 AI Studio Architecture (v0.1.99)

The AI Studio is built on a tiered architecture: FREE features run entirely in the browser, while PRO features call cloud APIs.

FREE Tier (Magenta.js + Essentia.js)

PRO Tier (Cloud APIs)

⚡ Gemini 3 Flash Integration

Gemini 3 Flash (December 2025) is the core premium AI for complex tasks.

Key Capabilities

When to Use Gemini

Pricing

| Thinking Level | Input (per 1M tokens) | Output (per 1M tokens) | Use Case |
| --- | --- | --- | --- |
| None (fast) | $0.075 | $0.30 | Quick answers |
| LOW | $0.15 | $0.60 | Standard queries |
| MEDIUM | $0.50 | $2.00 | Complex analysis |
| HIGH | $3.50 | $14.00 | Deep reasoning |
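
To see what the thinking levels mean per request, the rates above can be turned into a small estimator. This is a sketch: `estimateCostUSD` and the lowercase level names are hypothetical, the rates are read as USD per 1M tokens, and any tokens consumed by thinking itself are not modeled.

```typescript
type ThinkingLevel = 'none' | 'low' | 'medium' | 'high';

// USD per 1M tokens, copied from the pricing table.
const PRICING: Record<ThinkingLevel, { input: number; output: number }> = {
  none: { input: 0.075, output: 0.30 },
  low: { input: 0.15, output: 0.60 },
  medium: { input: 0.50, output: 2.00 },
  high: { input: 3.50, output: 14.00 },
};

function estimateCostUSD(
  level: ThinkingLevel,
  inputTokens: number,
  outputTokens: number,
): number {
  const rate = PRICING[level];
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}

// A 2,000-token prompt with a 500-token reply:
const fast = estimateCostUSD('none', 2000, 500); // about $0.0003
const deep = estimateCostUSD('high', 2000, 500); // about $0.014
console.log({ fast, deep });
```

The same request costs roughly 47 times more at HIGH than with thinking disabled, which is why the router defaults to the cheapest level that works.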

🎙️ Gemini 2.5 Flash Native Audio

Used for voice output (avatar responses, assistant speech).

Features

Cost Comparison

| Option | Cost/min | Quality | Latency |
| --- | --- | --- | --- |
| Gemini Native Audio | $0.10 | Excellent | 200-500ms |
| MeloTTS (Workers AI) | $0.0002 | Good | 100-200ms |
| Chatterbox (self-host) | FREE | Excellent | 75-150ms |
| ElevenLabs | $0.30 | Excellent | 75-250ms |

💡 Recommendation: Use Chatterbox/MeloTTS for 95% of TTS needs. Reserve Gemini Native Audio for special avatar personalities.

☁️ Cloudflare Workers AI Models

Available Models

| Model | Use Case | Free Tier |
| --- | --- | --- |
| @cf/openai/whisper | Speech-to-text | 10K neurons/day |
| @cf/meta/llama-4-scout | Intent classification, simple queries | 10K neurons/day |
| @cf/baai/bge-base-en-v1.5 | Text embeddings for search | 10K neurons/day |
| @cf/stabilityai/stable-diffusion-xl | Album art generation | 10K neurons/day |

Example: Whisper Integration

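// workers/api/transcribe.ts

// Workers AI binding; the `Ai` type comes from @cloudflare/workers-types.
interface Env {
  AI: Ai;
}

export async function transcribe(audio: ArrayBuffer, env: Env) {
  // Whisper on Workers AI takes the audio as a plain array of byte values.
  const result = await env.AI.run('@cf/openai/whisper', {
    audio: [...new Uint8Array(audio)]
  });

  // Which fields come back depends on the Whisper variant; check the model's
  // output schema before relying on detected_language or segments.
  return {
    text: result.text,
    language: result.detected_language,
    segments: result.segments
  };
}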
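
A function like `transcribe` above still needs an HTTP entry point that receives the microphone audio. The sketch below wraps any such function in a request handler with basic input checks. `makeTranscribeHandler` and `MAX_AUDIO_BYTES` are hypothetical, and the handler takes the transcribe function as a parameter so the example stays independent of the Workers AI binding.

```typescript
// Assumed: a transcribe function like the one above, with env already bound.
type TranscribeFn = (audio: ArrayBuffer) => Promise<{ text: string }>;

// Hypothetical cap; short voice commands should fit well under it.
const MAX_AUDIO_BYTES = 1_000_000;

function makeTranscribeHandler(transcribe: TranscribeFn) {
  return async function handle(request: Request): Promise<Response> {
    if (request.method !== 'POST') {
      return new Response('Use POST with raw audio bytes', { status: 405 });
    }

    const audio = await request.arrayBuffer();
    if (audio.byteLength === 0) {
      return new Response('Empty audio body', { status: 400 });
    }
    if (audio.byteLength > MAX_AUDIO_BYTES) {
      return new Response('Audio too large', { status: 413 });
    }

    const { text } = await transcribe(audio);
    return new Response(JSON.stringify({ text }), {
      headers: { 'content-type': 'application/json' },
    });
  };
}
```

In a Worker, the `fetch` handler would build this with something like `makeTranscribeHandler(audio => transcribe(audio, env))` and pass the incoming request through.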

🎤 Voice Command Pipeline

┌──────────────┐    ┌──────────────┐    ┌──────────────┐    ┌──────────────┐
│   Browser    │    │   Workers    │    │  Workers AI  │    │     DAW      │
│  Microphone  │───▶│     API      │───▶│   Whisper    │───▶│    Action    │
└──────────────┘    └──────────────┘    └──────────────┘    └──────────────┘
                              │                                       ▲
                              │         ┌──────────────┐              │
                              └────────▶│   Llama 4    │──────────────┘
                                        │    Scout     │
                                        │   (intent)   │
                                        └──────────────┘

Supported Commands (MVP)

| Category | Examples |
| --- | --- |
| Transport | "play", "stop", "pause", "loop section" |
| Mixer | "mute track 1", "solo drums", "turn up the bass" |
| Tempo | "set BPM 90", "faster", "slower" |
| Samples | "find a snare", "add kick to track 2" |
| Project | "save project", "export", "new track" |
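
Some of these commands are fixed phrases that do not need a model at all. In the spirit of using the cheapest option that works, a local matcher could handle them before anything is sent to Llama 4 Scout. This is a sketch under assumptions: the `DawAction` shapes and `matchSimpleCommand` are hypothetical, since the DAW's command schema is not specified here.

```typescript
// Hypothetical action shapes; the real DAW command schema is not specified here.
type DawAction =
  | { kind: 'transport'; command: 'play' | 'stop' | 'pause' }
  | { kind: 'tempo'; bpm: number }
  | { kind: 'mute'; track: number };

function matchSimpleCommand(transcript: string): DawAction | null {
  // Speech-to-text output often arrives capitalized and punctuated.
  const text = transcript.trim().toLowerCase().replace(/[.!?]+$/, '');

  if (text === 'play' || text === 'stop' || text === 'pause') {
    return { kind: 'transport', command: text };
  }

  const bpm = text.match(/^set bpm (?:to )?(\d{2,3})$/);
  if (bpm) return { kind: 'tempo', bpm: Number(bpm[1]) };

  const mute = text.match(/^mute track (\d+)$/);
  if (mute) return { kind: 'mute', track: Number(mute[1]) };

  // Anything looser ("turn up the bass", "find a snare") needs the intent model.
  return null;
}
```

A `null` result means the transcript should continue to intent classification, so the matcher only ever saves a model call and never blocks one.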

💬 AI Assistant Design

System Prompt Template

You are FlowState AI, a hip-hop production assistant.

Current project context:
- Tempo: ${project.tempo} BPM
- Key: ${project.key || 'not set'}
- Tracks: ${project.tracks.length}
- Current selection: ${selection}

You can help with:
- Production tips and techniques
- Sample recommendations
- Mixing advice
- Workflow optimization

Keep responses concise and actionable.
If the user asks you to do something in the DAW,
respond with a JSON action block.
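
The placeholders in the template suggest it is filled in per request from project state. A minimal sketch of that step follows. The `ProjectContext` shape is an assumption based on the fields the template reads, and `buildSystemPrompt` is a hypothetical name.

```typescript
// Assumed minimal project shape, based on the fields the template reads.
interface ProjectContext {
  tempo: number;
  key?: string;
  tracks: unknown[];
}

function buildSystemPrompt(project: ProjectContext, selection: string): string {
  return [
    'You are FlowState AI, a hip-hop production assistant.',
    '',
    'Current project context:',
    `- Tempo: ${project.tempo} BPM`,
    `- Key: ${project.key || 'not set'}`,
    `- Tracks: ${project.tracks.length}`,
    `- Current selection: ${selection}`,
    '',
    'You can help with:',
    '- Production tips and techniques',
    '- Sample recommendations',
    '- Mixing advice',
    '- Workflow optimization',
    '',
    'Keep responses concise and actionable.',
    'If the user asks you to do something in the DAW,',
    'respond with a JSON action block.',
  ].join('\n');
}

const prompt = buildSystemPrompt({ tempo: 90, tracks: [{}, {}] }, 'bars 1-4 on track 2');
console.log(prompt);
```

Keeping the static lines first and byte-for-byte identical across requests may also help any prefix-based prompt caching, since only the project context lines change.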

Conversation Flow

  1. User types or speaks query
  2. Include current project context in system prompt
  3. Route to appropriate model (Llama for simple, Gemini for complex)
  4. Stream response to chat panel
  5. If action needed, execute DAW command
  6. Cache response in AI Gateway
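
Step 5 depends on pulling a machine-readable action out of a reply that may also contain prose. The exact format of the JSON action block is not defined here, so the sketch below assumes the reply holds at most one JSON object with a string `action` field. `extractAction` and `AssistantAction` are hypothetical names.

```typescript
// Assumed action envelope: a JSON object with a string `action` field.
interface AssistantAction {
  action: string;
  [param: string]: unknown;
}

function extractAction(reply: string): AssistantAction | null {
  // Take the span from the first '{' to the last '}' as the candidate object.
  const start = reply.indexOf('{');
  const end = reply.lastIndexOf('}');
  if (start === -1 || end <= start) return null;

  try {
    const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
    if (
      typeof parsed === 'object' &&
      parsed !== null &&
      typeof (parsed as { action?: unknown }).action === 'string'
    ) {
      return parsed as AssistantAction;
    }
  } catch {
    // Not valid JSON: treat the reply as plain chat text.
  }
  return null;
}
```

Replies without a valid object return `null` and are simply shown in the chat panel. Prose that contains stray braces around the object would defeat this simple span search, so a stricter delimiter is worth considering if that happens in practice.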

💰 Cost Optimization Strategies

1. AI Gateway Caching

Cache common queries to reduce API calls by 40-70%.

// AI Gateway automatically caches based on:
// - Query similarity
// - TTL settings
// - Cache rules

// Configure in Cloudflare dashboard:
// AI Gateway > flowstate > Caching > Enable
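
Besides the dashboard setting, the Workers AI binding accepts a `gateway` option on `run`, which routes a call through a named gateway and controls caching per request. The sketch below assumes the gateway id is `flowstate`, matching the dashboard path above. `AiBinding` is a minimal stand-in for the real binding type, and `classifyIntent`, the prompt wording, and the one-hour TTL are illustrative.

```typescript
// Minimal stand-in for the Workers AI binding so the sketch is self-contained.
interface GatewayOptions {
  gateway?: { id: string; skipCache?: boolean; cacheTtl?: number };
}

interface AiBinding {
  run(model: string, input: unknown, options?: GatewayOptions): Promise<unknown>;
}

async function classifyIntent(ai: AiBinding, utterance: string): Promise<unknown> {
  // Normalize so that repeated commands produce the same request body.
  const normalized = utterance.trim().toLowerCase();

  return ai.run(
    '@cf/meta/llama-4-scout',
    { prompt: `Classify the DAW intent of: ${normalized}` },
    // cacheTtl is in seconds; one hour is an illustrative value.
    { gateway: { id: 'flowstate', skipCache: false, cacheTtl: 3600 } },
  );
}
```

Setting `skipCache: true` on a call is a way to bypass the cache for requests that depend on fresh project state.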

2. Smart Model Routing

Use Llama 4 Scout (FREE) for 80% of queries, Gemini for 20%.

3. Context Caching

Gemini offers a 75% discount on repeated system prompts, so cache the project context.
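
As a rough worked example of that discount, the function below prices the input side of a request when part of the prompt is served from cache. It assumes the 75% figure applies to cached input tokens only and ignores any cache storage fees. `inputCostUSD` is a hypothetical helper.

```typescript
const CACHE_DISCOUNT = 0.75; // discount on cached input tokens, per the text above

function inputCostUSD(
  totalTokens: number,
  cachedTokens: number,
  ratePerMillion: number,
): number {
  // Clamp so the cached share can never exceed the prompt itself.
  const cached = Math.min(Math.max(cachedTokens, 0), totalTokens);
  const fresh = totalTokens - cached;
  const billable = fresh + cached * (1 - CACHE_DISCOUNT);
  return (billable * ratePerMillion) / 1_000_000;
}

// 2,000-token prompt at the LOW rate ($0.15 per 1M), 1,500 tokens cached:
const uncached = inputCostUSD(2000, 0, 0.15);  // about $0.00030
const cached = inputCostUSD(2000, 1500, 0.15); // about $0.00013
console.log({ uncached, cached });
```

With three quarters of the prompt cached, the input cost drops by a little over half, so the saving scales with how much of each prompt is stable project context.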

4. Batch Embeddings

Embed samples in batches during off-peak hours.
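
A sketch of the batching step, assuming sample descriptions are embedded as text with `@cf/baai/bge-base-en-v1.5`, which accepts an array of strings per call. `EmbeddingBinding` is a minimal stand-in for the Workers AI binding, and `chunk`, `embedSamples`, and the batch size of 50 are hypothetical.

```typescript
// Minimal stand-in for the Workers AI binding, limited to what this sketch uses.
interface EmbeddingBinding {
  run(model: string, input: { text: string[] }): Promise<{ data: number[][] }>;
}

function chunk<T>(items: T[], size: number): T[][] {
  if (size < 1) throw new RangeError('size must be at least 1');
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function embedSamples(
  ai: EmbeddingBinding,
  descriptions: string[],
  batchSize = 50,
): Promise<number[][]> {
  const vectors: number[][] = [];
  // One model call per batch instead of one per sample.
  for (const batch of chunk(descriptions, batchSize)) {
    const { data } = await ai.run('@cf/baai/bge-base-en-v1.5', { text: batch });
    vectors.push(...data);
  }
  return vectors;
}
```

Running this from a scheduled job is one way to keep the work in off-peak hours. The returned vectors are in the same order as the input descriptions.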

Monthly Cost Estimate (10K users)

| Service | Baseline | Optimized |
| --- | --- | --- |
| Workers AI | $150 | $100 |
| Gemini API | $400 | $200 |
| TTS | $100 | $0 (Chatterbox) |
| Total AI | $650 | $300 |
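
The totals can be checked, and the saving expressed per user, with a few lines of arithmetic. The figures are copied from the estimate above; `summarize` and `CostRow` are hypothetical names.

```typescript
interface CostRow {
  service: string;
  baseline: number;  // USD per month
  optimized: number; // USD per month
}

const MONTHLY_ESTIMATE: CostRow[] = [
  { service: 'Workers AI', baseline: 150, optimized: 100 },
  { service: 'Gemini API', baseline: 400, optimized: 200 },
  { service: 'TTS', baseline: 100, optimized: 0 },
];

function summarize(rows: CostRow[], users: number) {
  const baseline = rows.reduce((sum, r) => sum + r.baseline, 0);
  const optimized = rows.reduce((sum, r) => sum + r.optimized, 0);
  return {
    baseline,
    optimized,
    savingsPct: Math.round(((baseline - optimized) / baseline) * 100),
    optimizedPerUser: optimized / users,
  };
}

console.log(summarize(MONTHLY_ESTIMATE, 10_000));
// { baseline: 650, optimized: 300, savingsPct: 54, optimizedPerUser: 0.03 }
```

At 10K users the optimized plan works out to about three cents per user per month, with the Gemini API as the largest remaining line item.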