🎯 Concept Overview

Avatar Battle Mode lets users rap battle anonymously using virtual avatars. Your face is hidden behind a customizable 3D character that lip-syncs in real time, and your voice can optionally be transformed.

🔥 Why This Matters: Many aspiring rappers are shy about showing their face. Avatar battles lower the barrier to entry, encourage participation, and create viral-worthy content.

⚡ Technical Pipeline

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   User's    │    │   Whisper   │    │    Voice    │    │   Avatar    │
│   Voice     │───▶│   (ASR)     │───▶│  Transform  │───▶│   Render    │
│   Input     │    │   35ms      │    │    20ms     │    │    16ms     │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                             │                  │
                                             ▼                  ▼
                                      ┌─────────────┐    ┌─────────────┐
                                      │   WebRTC    │◀───│  Lip Sync   │
                                      │   Stream    │    │ (52 blend)  │
                                      │   to Peer   │    │             │
                                      └─────────────┘    └─────────────┘

Latency Budget

Stage                  Target      Technology
Audio Capture          5ms         Web Audio API
Voice Transform        20ms        RVC/WASM
Lip Sync Inference     15ms        MediaPipe/Rhubarb
Avatar Render          16ms        Three.js @ 60fps
WebRTC Transmission    50-100ms    Cloudflare Calls
Total E2E              ~150ms      Acceptable for battles
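
The budget can be checked by summing the per-stage targets. The sketch below is a minimal illustration, assuming the stages run strictly one after another (in practice lip sync and voice transform may overlap); the `StageBudget` type and `totalLatency` helper are hypothetical names, not part of any existing module.

```typescript
// Hypothetical model of the latency budget table; names are illustrative only.
interface StageBudget {
  stage: string;
  minMs: number;
  maxMs: number;
}

const LATENCY_BUDGET: StageBudget[] = [
  { stage: 'Audio Capture', minMs: 5, maxMs: 5 },
  { stage: 'Voice Transform', minMs: 20, maxMs: 20 },
  { stage: 'Lip Sync Inference', minMs: 15, maxMs: 15 },
  { stage: 'Avatar Render', minMs: 16, maxMs: 16 },
  { stage: 'WebRTC Transmission', minMs: 50, maxMs: 100 },
];

// Sums best-case and worst-case latency, assuming sequential stages.
function totalLatency(budget: StageBudget[]): { minMs: number; maxMs: number } {
  return budget.reduce(
    (acc, s) => ({ minMs: acc.minMs + s.minMs, maxMs: acc.maxMs + s.maxMs }),
    { minMs: 0, maxMs: 0 },
  );
}

const total = totalLatency(LATENCY_BUDGET);
console.log(`E2E latency: ${total.minMs}-${total.maxMs}ms`); // E2E latency: 106-156ms
```

The local stages add up to 56ms, so the ~150ms total corresponds to the upper end of the WebRTC transmission range.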

🎭 Avatar Technology Options

Technology        Type             Quality           Performance    License
TalkingHead 3D    3D WebGL         Excellent         60fps          MIT
ReadyPlayerMe     3D Avatar        Great             60fps          Free tier
MediaPipe         Face Tracking    High accuracy     <10ms          Apache 2.0
MuseTalk          2D Lip Sync      Photorealistic    30fps          Research
Live2D            2D Animation     Anime style       60fps          Commercial

💡 MVP Recommendation: TalkingHead 3D + MediaPipe for real-time face tracking. Fully client-side, no server cost.

🔊 Voice Transformation

Users can optionally transform their voice for additional anonymity and creative expression.

Effect           Technology       Latency     Quality
Pitch Shift      Web Audio API    <5ms        Good
Formant Shift    WASM DSP         10ms        Better
RVC Clone        Server-side      50-100ms    Excellent
Robot/Effects    Tone.js          <5ms        Stylized

Voice Transform Code

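// voice-transform.ts
import * as Tone from 'tone';

class VoiceTransformer {
  private pitchShifter: Tone.PitchShift;
  private distortion: Tone.Distortion;
  private reverb: Tone.Reverb;

  constructor() {
    this.pitchShifter = new Tone.PitchShift();
    this.distortion = new Tone.Distortion(0);
    this.reverb = new Tone.Reverb(0.5);
    this.reverb.wet.value = 0;  // dry until a preset enables reverb
    // Effects only process audio once chained: pitch -> distortion -> reverb
    this.pitchShifter.chain(this.distortion, this.reverb);
  }

  // Route a source (e.g. a Tone.UserMedia mic input) through the effect chain
  connect(source: Tone.ToneAudioNode, destination: Tone.ToneAudioNode = Tone.getDestination()) {
    source.connect(this.pitchShifter);
    this.reverb.connect(destination);
  }

  applyPreset(preset: 'deep' | 'high' | 'robot' | 'alien') {
    // Reset first so settings from the previous preset do not leak through
    this.pitchShifter.pitch = 0;
    this.distortion.distortion = 0;
    this.reverb.wet.value = 0;

    switch (preset) {
      case 'deep':
        this.pitchShifter.pitch = -5;  // semitones
        break;
      case 'high':
        this.pitchShifter.pitch = 7;
        break;
      case 'robot':
        this.distortion.distortion = 0.3;
        break;
      case 'alien':
        this.pitchShifter.pitch = 12;  // one octave up
        this.reverb.decay = 3;         // seconds
        this.reverb.wet.value = 0.5;
        break;
    }
  }
}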
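
The preset pitch values are expressed in semitones, which is the unit Tone.PitchShift uses for its pitch property. As a rough guide to how strong each preset sounds, the sketch below converts semitones to a frequency ratio using the equal-temperament relation ratio = 2^(semitones/12). The `semitonesToRatio` helper and the `PRESET_PITCH` table are illustrative names introduced here.

```typescript
// Converts a pitch shift in semitones to a frequency multiplier
// (12 semitones = one octave = factor of 2).
function semitonesToRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

// Pitch values copied from the presets above (hypothetical lookup table).
const PRESET_PITCH: Array<[string, number]> = [
  ['deep', -5],
  ['high', 7],
  ['robot', 0],
  ['alien', 12],
];

for (const [name, semitones] of PRESET_PITCH) {
  console.log(`${name}: x${semitonesToRatio(semitones).toFixed(3)}`);
}
// deep: x0.749
// high: x1.498
// robot: x1.000
// alien: x2.000
```

So the 'deep' preset lowers every frequency by roughly a quarter, while 'alien' doubles it.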

👄 Lip Sync Pipeline

Real-time lip sync uses an ARKit-compatible 52-blendshape system for realistic mouth movements.

Phoneme to Viseme Mapping

Phoneme    Viseme       Blendshapes
AA, AH     Open         jawOpen: 0.7
B, M, P    Closed       mouthClose: 1.0
EE, IY     Wide         mouthSmileLeft/mouthSmileRight: 0.6
OO, UW     Pucker       mouthPucker: 0.8
F, V       Lip-tooth    mouthFunnel: 0.5
TH         Tongue       tongueOut: 0.3
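
One way to use this mapping in code is a pair of lookup tables: phoneme to viseme, then viseme to blendshape weights. The following is a minimal sketch, not the project's actual implementation. The names `VISEME_WEIGHTS`, `PHONEME_TO_VISEME`, and `blendshapesForPhoneme` are hypothetical, and the smile weight is applied to both mouthSmileLeft and mouthSmileRight on the assumption that the rig follows ARKit's per-side naming.

```typescript
type BlendshapeWeights = Record<string, number>;

// Viseme -> blendshape weights, taken from the mapping table.
const VISEME_WEIGHTS: Record<string, BlendshapeWeights> = {
  open: { jawOpen: 0.7 },
  closed: { mouthClose: 1.0 },
  wide: { mouthSmileLeft: 0.6, mouthSmileRight: 0.6 }, // assumption: smile is per-side
  pucker: { mouthPucker: 0.8 },
  lipTooth: { mouthFunnel: 0.5 },
  tongue: { tongueOut: 0.3 },
};

// Phoneme -> viseme key.
const PHONEME_TO_VISEME: Record<string, string> = {
  AA: 'open', AH: 'open',
  B: 'closed', M: 'closed', P: 'closed',
  EE: 'wide', IY: 'wide',
  OO: 'pucker', UW: 'pucker',
  F: 'lipTooth', V: 'lipTooth',
  TH: 'tongue',
};

// Returns a copy of the weights for a phoneme, or an empty object
// (neutral face) when the phoneme is not in the table.
function blendshapesForPhoneme(phoneme: string): BlendshapeWeights {
  const viseme = PHONEME_TO_VISEME[phoneme.toUpperCase()];
  return viseme ? { ...VISEME_WEIGHTS[viseme] } : {};
}

console.log(blendshapesForPhoneme('M'));  // { mouthClose: 1 }
console.log(blendshapesForPhoneme('uw')); // { mouthPucker: 0.8 }
```

Returning a copy keeps callers from mutating the shared table when they blend or smooth the weights.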

Lip Sync Implementation

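// lip-sync.ts
import { FaceLandmarker, FilesetResolver } from '@mediapipe/tasks-vision';

class LipSyncEngine {
  // Null until init() resolves
  private faceLandmarker: FaceLandmarker | null = null;
  private blendshapes: Map<string, number> = new Map();

  async init() {
    // createFromOptions takes the WASM fileset as its first argument.
    // Adjust this path to wherever the tasks-vision WASM files are hosted.
    const vision = await FilesetResolver.forVisionTasks(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm'
    );
    this.faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
      baseOptions: {
        modelAssetPath: 'face_landmarker.task',
        delegate: 'GPU'
      },
      outputFaceBlendshapes: true,
      runningMode: 'VIDEO'
    });
  }

  // Camera-based lip sync. timestamp is in milliseconds and must increase every frame.
  processFrame(video: HTMLVideoElement, timestamp: number) {
    if (!this.faceLandmarker) {
      throw new Error('LipSyncEngine.init() must complete before processFrame()');
    }
    const results = this.faceLandmarker.detectForVideo(video, timestamp);

    // Keep the previous values when no face is detected in this frame
    if (results.faceBlendshapes?.[0]) {
      for (const shape of results.faceBlendshapes[0].categories) {
        this.blendshapes.set(shape.categoryName, shape.score);
      }
    }

    return this.blendshapes;
  }

  // Audio-only lip sync (no camera)
  // audioLevel: 0..1 loudness; frequency: dominant frequency in Hz
  processAudio(audioLevel: number, frequency: number): Map<string, number> {
    const jawOpen = Math.min(audioLevel * 2, 1);
    const mouthSmile = frequency > 2000 ? 0.3 : 0;

    return new Map([
      ['jawOpen', jawOpen],
      ['mouthSmileLeft', mouthSmile],
      ['mouthSmileRight', mouthSmile]
    ]);
  }
}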
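
The audio-only path needs an audioLevel and a frequency as input, and the listing above does not show where they come from. One plausible source, sketched here under that assumption, is a Web Audio AnalyserNode: RMS of the time-domain samples for loudness, and the strongest FFT bin for frequency. `rmsLevel` and `dominantFrequency` are hypothetical helper names.

```typescript
// Root-mean-square loudness of time-domain samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Frequency (Hz) of the strongest FFT bin. Bin i of an FFT of size
// fftSize covers the frequency i * sampleRate / fftSize.
function dominantFrequency(
  magnitudes: Uint8Array,
  sampleRate: number,
  fftSize: number,
): number {
  let peak = 0;
  for (let i = 1; i < magnitudes.length; i++) {
    if (magnitudes[i] > magnitudes[peak]) peak = i;
  }
  return (peak * sampleRate) / fftSize;
}
```

In the browser, the inputs would typically be filled with AnalyserNode.getFloatTimeDomainData() and AnalyserNode.getByteFrequencyData() once per animation frame, and the results passed to processAudio(). Picking the single loudest bin is a crude proxy for vowel brightness; a spectral centroid may behave more smoothly.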

🎮 Battle Flow

  1. Matchmaking: Join queue, get matched by skill rating
  2. Avatar Select: Choose/customize your avatar
  3. Beat Selection: Both players vote on instrumental
  4. Coin Flip: Random selection for who goes first
  5. Round 1: Player A raps (60 seconds)
  6. Round 2: Player B responds (60 seconds)
  7. Round 3: Player A rebuttal (30 seconds)
  8. Round 4: Player B rebuttal (30 seconds)
  9. Voting: Audience votes for winner
  10. Recording: Battle saved for replay/sharing
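
Steps 4 through 8 can be expressed as data: a coin flip picks who opens, and the four rounds alternate with durations of 60, 60, 30, and 30 seconds. This is a minimal sketch; `coinFlip`, `buildRoundPlan`, and the `RoundPlan` shape are hypothetical, and it assumes 'A' and 'B' are fixed player slots rather than labels assigned after the flip.

```typescript
type PlayerSlot = 'A' | 'B';

interface RoundPlan {
  round: number;        // 1-based round number
  player: PlayerSlot;   // who raps in this round
  durationSec: number;
}

// Two 60-second verses followed by two 30-second rebuttals.
const ROUND_DURATIONS_SEC = [60, 60, 30, 30];

// The random source is injectable so the flip can be made deterministic in tests.
function coinFlip(random: () => number = Math.random): PlayerSlot {
  return random() < 0.5 ? 'A' : 'B';
}

// Alternates players starting with whoever won the coin flip.
function buildRoundPlan(first: PlayerSlot): RoundPlan[] {
  const second: PlayerSlot = first === 'A' ? 'B' : 'A';
  return ROUND_DURATIONS_SEC.map((durationSec, i) => ({
    round: i + 1,
    player: i % 2 === 0 ? first : second,
    durationSec,
  }));
}

const plan = buildRoundPlan(coinFlip());
const totalSec = plan.reduce((sum, r) => sum + r.durationSec, 0);
console.log(`Total rap time: ${totalSec}s`); // Total rap time: 180s
```

Each player gets 90 seconds in total regardless of who wins the flip.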

Battle State Machine

// battle-state.ts
type BattleState =
  | 'idle'
  | 'matchmaking'
  | 'avatar_select'
  | 'beat_select'
  | 'countdown'
  | 'round_active'
  | 'round_transition'
  | 'voting'
  | 'results'
  | 'complete';

interface Battle {
  id: string;
  state: BattleState;
  players: [Player, Player];
  currentRound: number;
  rounds: Round[];
  beat: Beat;
  votes: Vote[];
  recording: Recording | null;
}

interface Round {
  player: 'A' | 'B';
  duration: number;  // seconds
  audio: Blob | null;
  transcript: string | null;
}
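
The types above list the states but not which moves between them are legal. The sketch below shows one way to enforce that with a transition table. The specific transitions are an assumption inferred from the battle flow (for example, that round_transition loops back to countdown until the last round and then moves to voting); they are not specified by the original design.

```typescript
type BattleState =
  | 'idle' | 'matchmaking' | 'avatar_select' | 'beat_select' | 'countdown'
  | 'round_active' | 'round_transition' | 'voting' | 'results' | 'complete';

// Assumed legal transitions; adjust to match the real battle rules.
const TRANSITIONS: Record<BattleState, BattleState[]> = {
  idle: ['matchmaking'],
  matchmaking: ['avatar_select', 'idle'],   // 'idle' covers a cancelled queue
  avatar_select: ['beat_select'],
  beat_select: ['countdown'],
  countdown: ['round_active'],
  round_active: ['round_transition'],
  round_transition: ['countdown', 'voting'], // next round, or voting after the last one
  voting: ['results'],
  results: ['complete'],
  complete: [],
};

function canTransition(from: BattleState, to: BattleState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Returns the new state, or throws if the move is not allowed.
function transition(from: BattleState, to: BattleState): BattleState {
  if (!canTransition(from, to)) {
    throw new Error(`Invalid battle transition: ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table as plain data makes it easy to share between client and server so both reject the same invalid moves.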

📡 WebRTC Architecture

Cloudflare Calls handles the WebRTC infrastructure for real-time streaming.

┌─────────────┐         ┌─────────────────────┐         ┌─────────────┐
│  Player A   │◀───────▶│  Cloudflare Calls   │◀───────▶│  Player B   │
│  (WebRTC)   │         │     (TURN/SFU)      │         │  (WebRTC)   │
└─────────────┘         └─────────────────────┘         └─────────────┘
                                   │
                                   ▼
                        ┌─────────────────────┐
                        │   Spectators (N)    │
                        │  (WebRTC viewers)   │
                        └─────────────────────┘

Cloudflare Calls Integration

// webrtc.ts
class BattleRTC {
  private localStream: MediaStream | null = null;
  private peerConnection: RTCPeerConnection | null = null;
  private sessionId: string | null = null;

  // avatarCanvas: the canvas the avatar is rendered to (used for recording)
  constructor(private avatarCanvas: HTMLCanvasElement) {}

  async joinBattle(battleId: string, userId: string) {
    // Get Cloudflare Calls session token
    const response = await fetch('/api/battle/join', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ battleId, userId })
    });
    if (!response.ok) {
      throw new Error(`Failed to join battle: ${response.status}`);
    }
    const { iceServers, sessionId } = await response.json();
    this.sessionId = sessionId;

    // Setup peer connection with Cloudflare TURN
    this.peerConnection = new RTCPeerConnection({ iceServers });

    // Get user media (audio only for voice)
    this.localStream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        autoGainControl: true
      },
      video: false  // Avatar renders locally
    });

    // Add tracks to connection
    this.localStream.getTracks().forEach(track => {
      this.peerConnection!.addTrack(track, this.localStream!);
    });
  }

  // Builds a recorder for mic audio plus the avatar canvas.
  // The caller attaches ondataavailable and calls start().
  async startRecording(): Promise<MediaRecorder> {
    if (!this.localStream) {
      throw new Error('joinBattle() must complete before recording');
    }
    const combinedStream = new MediaStream([
      ...this.localStream.getAudioTracks(),
      // Canvas capture for avatar at 30fps
      ...this.avatarCanvas.captureStream(30).getVideoTracks()
    ]);

    return new MediaRecorder(combinedStream, {
      mimeType: 'video/webm;codecs=vp9,opus'
    });
  }
}

🏆 Ranking System

Rank        ELO Range      Badge
Bronze      0 - 999        🥉
Silver      1000 - 1499    🥈
Gold        1500 - 1999    🥇
Platinum    2000 - 2499    💎
Diamond     2500+          👑
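
The rank table maps directly onto a threshold lookup. The sketch below is illustrative only; `RankTier`, `RANK_TIERS`, and `rankForElo` are hypothetical names.

```typescript
interface RankTier {
  name: string;
  minElo: number;  // inclusive lower bound of the tier
  badge: string;
}

// Sorted from highest to lowest so the first match wins.
const RANK_TIERS: RankTier[] = [
  { name: 'Diamond', minElo: 2500, badge: '👑' },
  { name: 'Platinum', minElo: 2000, badge: '💎' },
  { name: 'Gold', minElo: 1500, badge: '🥇' },
  { name: 'Silver', minElo: 1000, badge: '🥈' },
  { name: 'Bronze', minElo: 0, badge: '🥉' },
];

function rankForElo(elo: number): RankTier {
  for (const tier of RANK_TIERS) {
    if (elo >= tier.minElo) return tier;
  }
  // Ratings below 0 should not occur, but fall back to the lowest tier.
  return RANK_TIERS[RANK_TIERS.length - 1];
}

console.log(rankForElo(1499).name); // Silver
console.log(rankForElo(1500).name); // Gold
```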

ELO Calculation

// elo.ts
function calculateEloChange(
  winnerElo: number,
  loserElo: number,
  kFactor: number = 32
): { winner: number; loser: number } {
  const expectedWin = 1 / (1 + Math.pow(10, (loserElo - winnerElo) / 400));
  const change = Math.round(kFactor * (1 - expectedWin));

  return {
    winner: winnerElo + change,
    loser: Math.max(0, loserElo - change)
  };
}
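
To see how the formula behaves, it helps to look at the expected score on its own. The sketch below re-derives the same quantity with two hypothetical helpers, `expectedScore` and `pointsForWin`, and prints the points awarded for an even match, an upset, and a favourite winning, all with the default K-factor of 32.

```typescript
// Probability that `playerElo` beats `opponentElo` under the Elo model.
function expectedScore(playerElo: number, opponentElo: number): number {
  return 1 / (1 + Math.pow(10, (opponentElo - playerElo) / 400));
}

// Rating points the winner gains (and the loser gives up, before the floor at 0).
function pointsForWin(winnerElo: number, loserElo: number, kFactor: number = 32): number {
  return Math.round(kFactor * (1 - expectedScore(winnerElo, loserElo)));
}

console.log(pointsForWin(1500, 1500)); // 16 (even match)
console.log(pointsForWin(1200, 1600)); // 29 (underdog wins)
console.log(pointsForWin(1600, 1200)); // 3  (favourite wins)
```

A 400-point gap corresponds to roughly a 91% expected win rate for the stronger player, which is why the favourite gains so little for winning.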

💰 Cost Estimates

Component                    Cost (10K battles/mo)
Cloudflare Calls (WebRTC)    $50 (1TB @ $0.05/GB)
Recording Storage (R2)       $15 (1TB @ $0.015/GB)
Whisper Transcription        $10 (Workers AI)
Avatar Assets (R2)           $5
Total                        ~$80/mo
🚀 MVP Scope

Phase 1 (Post-MVP)

  • 1v1 battles only
  • 3 preset avatars
  • Basic voice effects (pitch only)
  • Simple ELO ranking
  • Audio-only lip sync

Phase 2

  • Avatar customization
  • Camera-based lip sync
  • RVC voice cloning
  • Spectator mode
  • Battle clips for sharing

Phase 3

  • Tournament mode
  • AI judges (Gemini analysis)
  • Monetization (avatar skins)
  • Leaderboards
  • Battle highlights

⚠️ Development Note: Avatar Battles is a post-MVP feature. Focus on the core DAW first, then add battle mode in v1.1.