🎤 Avatar Battles
Anonymous Rap Battles with AI-Powered Virtual Avatars
🎯 Concept Overview
Avatar Battle Mode lets users rap battle anonymously using virtual avatars. Your face is hidden behind a customizable 3D character that lip-syncs in real time, and your voice can optionally be transformed as well.
Why This Matters: Many aspiring rappers are shy about showing their face. Avatar battles lower the barrier to entry, encourage participation, and create viral-worthy content.
⚡ Technical Pipeline
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   User's    │    │   Whisper   │    │    Voice    │    │   Avatar    │
│    Voice    │───▶│    (ASR)    │───▶│  Transform  │───▶│   Render    │
│    Input    │    │    35ms     │    │    20ms     │    │    16ms     │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                                             │                  │
                                             ▼                  ▼
                                      ┌─────────────┐    ┌─────────────┐
                                      │   WebRTC    │◀───│  Lip Sync   │
                                      │   Stream    │    │ (52 blend)  │
                                      │   to Peer   │    │             │
                                      └─────────────┘    └─────────────┘
Latency Budget
| Stage | Target | Technology |
|---|---|---|
| Audio Capture | 5ms | Web Audio API |
| Voice Transform | 20ms | RVC/WASM |
| Lip Sync Inference | 15ms | MediaPipe/Rhubarb |
| Avatar Render | 16ms | Three.js @ 60fps |
| WebRTC Transmission | 50-100ms | Cloudflare Calls |
| Total E2E | ~106-156ms | Acceptable for battles |
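To make the budget arithmetic explicit, here is a minimal sketch that sums the stage targets from the table. The `LatencyStage` type and `totalLatency` helper are illustrative names, not part of any library. WebRTC transmission is modeled as a range because the table gives 50-100ms, and the stages are treated as strictly sequential, which is a simplifying assumption.

```typescript
// Hypothetical helper types: names and shape are illustrative only.
interface LatencyStage {
  stage: string;
  minMs: number;
  maxMs: number;
}

// Stage targets copied from the latency budget table.
const LATENCY_BUDGET: LatencyStage[] = [
  { stage: "Audio Capture", minMs: 5, maxMs: 5 },
  { stage: "Voice Transform", minMs: 20, maxMs: 20 },
  { stage: "Lip Sync Inference", minMs: 15, maxMs: 15 },
  { stage: "Avatar Render", minMs: 16, maxMs: 16 },
  { stage: "WebRTC Transmission", minMs: 50, maxMs: 100 },
];

// Sum best-case and worst-case latency across all stages.
function totalLatency(stages: LatencyStage[]): { minMs: number; maxMs: number } {
  return stages.reduce(
    (acc, s) => ({ minMs: acc.minMs + s.minMs, maxMs: acc.maxMs + s.maxMs }),
    { minMs: 0, maxMs: 0 },
  );
}

const budgetTotal = totalLatency(LATENCY_BUDGET);
// budgetTotal is { minMs: 106, maxMs: 156 }
```

Summing the rows this way gives an end-to-end range of 106 to 156ms, with network transmission accounting for most of the spread.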
🎭 Avatar Technology Options
| Technology | Type | Quality | Performance | License |
|---|---|---|---|---|
| TalkingHead 3D | 3D WebGL | Excellent | 60fps | MIT |
| ReadyPlayerMe | 3D Avatar | Great | 60fps | Free tier |
| MediaPipe Face | Tracking | High accuracy | <10ms | Apache 2.0 |
| MuseTalk | 2D Lip Sync | Photorealistic | 30fps | Research |
| Live2D | 2D Animation | Anime style | 60fps | Commercial |
MVP Recommendation: TalkingHead 3D + MediaPipe for real-time face tracking. Fully client-side, no server cost.
🔊 Voice Transformation
Users can optionally transform their voice for additional anonymity and creative expression.
| Effect | Technology | Latency | Quality |
|---|---|---|---|
| Pitch Shift | Web Audio API | <5ms | Good |
| Formant Shift | WASM DSP | 10ms | Better |
| RVC Clone | Server-side | 50-100ms | Excellent |
| Robot/Effects | Tone.js | <5ms | Stylized |
Voice Transform Code
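// voice-transform.ts
import * as Tone from 'tone';

type VoicePreset = 'deep' | 'high' | 'robot' | 'alien';

class VoiceTransformer {
  private pitchShifter: Tone.PitchShift;
  private distortion: Tone.Distortion;
  private reverb: Tone.Reverb;

  constructor() {
    this.pitchShifter = new Tone.PitchShift();
    this.distortion = new Tone.Distortion(0);
    this.reverb = new Tone.Reverb(0.5);
    // Wire the effects in series: pitch shift -> distortion -> reverb
    this.pitchShifter.chain(this.distortion, this.reverb);
  }

  applyPreset(preset: VoicePreset) {
    // Reset every effect first so settings from a previous preset do not leak
    this.pitchShifter.pitch = 0;
    this.distortion.distortion = 0;
    this.reverb.decay = 0.5;

    switch (preset) {
      case 'deep':
        this.pitchShifter.pitch = -5; // semitones down
        break;
      case 'high':
        this.pitchShifter.pitch = 7; // semitones up
        break;
      case 'robot':
        this.distortion.distortion = 0.3;
        break;
      case 'alien':
        this.pitchShifter.pitch = 12; // one octave up
        this.reverb.decay = 3; // seconds
        break;
    }
  }
}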
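The `pitch` values in the presets are intervals in semitones. As a rough illustration of what they do to a voice, the sketch below converts semitones to a frequency ratio using the equal-temperament relation ratio = 2^(n/12). The helper names and the 200 Hz example fundamental are assumptions chosen for illustration; they are not part of Tone.js.

```typescript
// Convert a pitch shift in semitones to a frequency ratio (equal temperament).
function semitonesToRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

// Preset intervals taken from the applyPreset switch.
const PRESET_SEMITONES = { deep: -5, high: 7, robot: 0, alien: 12 } as const;

// Hypothetical helper: where a given fundamental lands under each preset.
function shiftedFrequency(
  baseHz: number,
  preset: keyof typeof PRESET_SEMITONES,
): number {
  return baseHz * semitonesToRatio(PRESET_SEMITONES[preset]);
}

const alienHz = shiftedFrequency(200, "alien"); // one octave up: 400 Hz
const robotHz = shiftedFrequency(200, "robot"); // no pitch change: 200 Hz
```

Under the same assumption, the 'deep' preset moves a 200 Hz fundamental to roughly 150 Hz and 'high' moves it to roughly 300 Hz.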
👄 Lip Sync Pipeline
Real-time lip sync uses an ARKit-compatible 52-blendshape system for realistic mouth movements.
Phoneme to Viseme Mapping
| Phoneme | Viseme | Blendshapes |
|---|---|---|
| AA, AH | Open | jawOpen: 0.7 |
| B, M, P | Closed | mouthClose: 1.0 |
| EE, IY | Wide | mouthSmileLeft/Right: 0.6 |
| OO, UW | Pucker | mouthPucker: 0.8 |
| F, V | Lip-tooth | mouthFunnel: 0.5 |
| TH | Tongue | tongueOut: 0.3 |
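The mapping table can be expressed as a pair of lookup tables. This is a minimal sketch: the viseme keys, the `blendshapesForPhoneme` helper, and the choice to return an empty (neutral) pose for unlisted phonemes are all assumptions. The smile weight is applied to `mouthSmileLeft` and `mouthSmileRight` because ARKit exposes the smile as a left/right pair.

```typescript
type BlendshapeWeights = Record<string, number>;

// Viseme weights from the table; the key names are illustrative.
const VISEMES: Record<string, BlendshapeWeights> = {
  open: { jawOpen: 0.7 },
  closed: { mouthClose: 1.0 },
  wide: { mouthSmileLeft: 0.6, mouthSmileRight: 0.6 },
  pucker: { mouthPucker: 0.8 },
  lipTooth: { mouthFunnel: 0.5 },
  tongue: { tongueOut: 0.3 },
};

// Phoneme labels from the table mapped to viseme keys.
const PHONEME_TO_VISEME: Record<string, string> = {
  AA: "open", AH: "open",
  B: "closed", M: "closed", P: "closed",
  EE: "wide", IY: "wide",
  OO: "pucker", UW: "pucker",
  F: "lipTooth", V: "lipTooth",
  TH: "tongue",
};

// Look up blendshape weights for a phoneme.
// Unlisted phonemes return an empty object (assumed neutral pose).
function blendshapesForPhoneme(phoneme: string): BlendshapeWeights {
  const viseme = PHONEME_TO_VISEME[phoneme.toUpperCase()];
  return viseme ? { ...VISEMES[viseme] } : {};
}
```

Returning a copy keeps callers from mutating the shared table when they blend or scale the weights.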
Lip Sync Implementation
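// lip-sync.ts
import { FaceLandmarker, FilesetResolver } from '@mediapipe/tasks-vision';

class LipSyncEngine {
  private faceLandmarker: FaceLandmarker | null = null;
  private blendshapes: Map<string, number> = new Map();

  async init() {
    // createFromOptions needs the WASM fileset as its first argument.
    // The CDN path is an assumption; point it at wherever the WASM files are hosted.
    const vision = await FilesetResolver.forVisionTasks(
      'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm'
    );
    this.faceLandmarker = await FaceLandmarker.createFromOptions(vision, {
      baseOptions: {
        modelAssetPath: 'face_landmarker.task',
        delegate: 'GPU'
      },
      outputFaceBlendshapes: true,
      runningMode: 'VIDEO'
    });
  }

  // Camera-based lip sync: copy the detected blendshape scores for this frame
  processFrame(video: HTMLVideoElement, timestamp: number) {
    if (!this.faceLandmarker) {
      throw new Error('LipSyncEngine.init() must be called before processFrame()');
    }
    const results = this.faceLandmarker.detectForVideo(video, timestamp);
    if (results.faceBlendshapes?.[0]) {
      for (const shape of results.faceBlendshapes[0].categories) {
        this.blendshapes.set(shape.categoryName, shape.score);
      }
    }
    return this.blendshapes;
  }

  // Audio-only lip sync (no camera): loudness drives the jaw,
  // high-frequency content adds a slight smile
  processAudio(audioLevel: number, frequency: number): Map<string, number> {
    const jawOpen = Math.min(audioLevel * 2, 1);
    const mouthSmile = frequency > 2000 ? 0.3 : 0;
    return new Map([
      ['jawOpen', jawOpen],
      ['mouthSmileLeft', mouthSmile],
      ['mouthSmileRight', mouthSmile]
    ]);
  }
}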
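`processAudio` expects an `audioLevel` but does not show where it comes from. One plausible source is the root-mean-square level of each audio block, for example samples filled in by `AnalyserNode.getFloatTimeDomainData`. The sketch below shows that calculation plus a simple exponential smoothing step to reduce jaw jitter. The function names and the smoothing factor are assumptions, not part of the original engine.

```typescript
// Root-mean-square level of one block of samples in the range [-1, 1].
function rmsLevel(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Same mapping as processAudio: double the level and clamp to 1.
function jawOpenFromLevel(level: number): number {
  return Math.min(level * 2, 1);
}

// Exponential smoothing: move a fraction `alpha` of the way to the target.
// alpha = 1 follows the target immediately; smaller values react more slowly.
function smoothToward(previous: number, target: number, alpha: number): number {
  return previous + alpha * (target - previous);
}

// Example: a block of constant 0.25 amplitude has RMS 0.25, so jawOpen is 0.5.
const exampleBlock = new Float32Array(128).fill(0.25);
const exampleJaw = jawOpenFromLevel(rmsLevel(exampleBlock));
```

Smoothing trades responsiveness for stability, so the right `alpha` depends on block size and frame rate and would need tuning by ear and eye.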
🎮 Battle Flow
- Matchmaking: Join queue, get matched by skill rating
- Avatar Select: Choose/customize your avatar
- Beat Selection: Both players vote on instrumental
- Coin Flip: Random selection for who goes first
- Round 1: Player A raps (60 seconds)
- Round 2: Player B responds (60 seconds)
- Round 3: Player A rebuttal (30 seconds)
- Round 4: Player B rebuttal (30 seconds)
- Voting: Audience votes for winner
- Recording: Battle saved for replay/sharing
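The coin flip and round order above can be sketched as a small schedule builder. The names `buildSchedule` and `coinFlip` are hypothetical, and the random source is injectable only so the behavior can be tested deterministically.

```typescript
type PlayerSlot = "A" | "B";

interface ScheduledRound {
  round: number;
  player: PlayerSlot;
  durationSec: number;
}

// Round lengths from the flow: two 60-second verses, then two 30-second rebuttals.
const ROUND_DURATIONS_SEC = [60, 60, 30, 30];

// Build the schedule; `firstPlayer` is whoever the coin flip picked to open.
function buildSchedule(firstPlayer: PlayerSlot): ScheduledRound[] {
  const secondPlayer: PlayerSlot = firstPlayer === "A" ? "B" : "A";
  return ROUND_DURATIONS_SEC.map((durationSec, i) => ({
    round: i + 1,
    player: i % 2 === 0 ? firstPlayer : secondPlayer,
    durationSec,
  }));
}

// Fair coin flip; pass a custom `random` function for deterministic tests.
function coinFlip(random: () => number = Math.random): PlayerSlot {
  return random() < 0.5 ? "A" : "B";
}
```

With these durations each battle contains 180 seconds of rapping, split evenly between the two players.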
Battle State Machine
// battle-state.ts
type BattleState =
  | 'idle'
  | 'matchmaking'
  | 'avatar_select'
  | 'beat_select'
  | 'countdown'
  | 'round_active'
  | 'round_transition'
  | 'voting'
  | 'results'
  | 'complete';

interface Battle {
  id: string;
  state: BattleState;
  players: [Player, Player];
  currentRound: number;
  rounds: Round[];
  beat: Beat;
  votes: Vote[];
  recording: Recording | null;
}

interface Round {
  player: 'A' | 'B';
  duration: number; // seconds
  audio: Blob | null;
  transcript: string | null;
}
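The state type lists the states but not which moves between them are legal. A transition table is one way to pin that down. The table below is an assumption inferred from the battle flow (including the option to cancel matchmaking), not a confirmed specification, and `canTransition` and `afterRound` are hypothetical helpers.

```typescript
// Repeated from battle-state.ts so the sketch compiles on its own.
type BattleState =
  | "idle" | "matchmaking" | "avatar_select" | "beat_select" | "countdown"
  | "round_active" | "round_transition" | "voting" | "results" | "complete";

// Assumed legal transitions, inferred from the battle flow.
const TRANSITIONS: Record<BattleState, BattleState[]> = {
  idle: ["matchmaking"],
  matchmaking: ["avatar_select", "idle"], // "idle" models cancelling the queue
  avatar_select: ["beat_select"],
  beat_select: ["countdown"],
  countdown: ["round_active"],
  round_active: ["round_transition", "voting"],
  round_transition: ["round_active"],
  voting: ["results"],
  results: ["complete"],
  complete: [],
};

// True if moving from `from` to `to` is allowed by the table.
function canTransition(from: BattleState, to: BattleState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// After a round ends: go to the next round, or to voting after the last one.
function afterRound(currentRound: number, totalRounds: number): BattleState {
  return currentRound < totalRounds ? "round_transition" : "voting";
}
```

Keeping the rules in data rather than scattered conditionals makes it easier to reject out-of-order messages from clients, which matters once two peers and a server all hold a copy of the state.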
📡 WebRTC Architecture
Cloudflare Calls handles the WebRTC infrastructure for real-time streaming.
┌─────────────┐         ┌─────────────────────┐         ┌─────────────┐
│  Player A   │◀───────▶│  Cloudflare Calls   │◀───────▶│  Player B   │
│  (WebRTC)   │         │     (TURN/SFU)      │         │  (WebRTC)   │
└─────────────┘         └─────────────────────┘         └─────────────┘
                                   │
                                   ▼
                        ┌─────────────────────┐
                        │   Spectators (N)    │
                        │  (WebRTC viewers)   │
                        └─────────────────────┘
Cloudflare Calls Integration
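// webrtc.ts
class BattleRTC {
  private localStream: MediaStream | null = null;
  private peerConnection: RTCPeerConnection | null = null;
  private sessionId: string | null = null;

  // avatarCanvas is the canvas the avatar renders to; it is needed for recording
  constructor(private avatarCanvas: HTMLCanvasElement) {}

  async joinBattle(battleId: string, userId: string) {
    // Get Cloudflare Calls session token
    const response = await fetch('/api/battle/join', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ battleId, userId })
    });
    if (!response.ok) {
      throw new Error(`Failed to join battle: ${response.status}`);
    }
    const { iceServers, sessionId } = await response.json();
    this.sessionId = sessionId;

    // Setup peer connection with Cloudflare TURN
    this.peerConnection = new RTCPeerConnection({ iceServers });

    // Get user media (audio only for voice)
    this.localStream = await navigator.mediaDevices.getUserMedia({
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        autoGainControl: true
      },
      video: false // Avatar renders locally
    });

    // Add tracks to connection.
    // Note: this publishes the raw microphone track; to send the transformed
    // voice, add the transformer's output track instead.
    this.localStream.getTracks().forEach(track => {
      this.peerConnection!.addTrack(track, this.localStream!);
    });
  }

  async startRecording(): Promise<MediaRecorder> {
    if (!this.localStream) {
      throw new Error('joinBattle() must complete before recording');
    }
    const combinedStream = new MediaStream([
      ...this.localStream.getAudioTracks(),
      // Canvas capture for avatar (30fps)
      ...this.avatarCanvas.captureStream(30).getVideoTracks()
    ]);
    const recorder = new MediaRecorder(combinedStream, {
      mimeType: 'video/webm;codecs=vp9,opus'
    });
    recorder.start();
    return recorder;
  }
}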
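The recorder hard-codes `video/webm;codecs=vp9,opus`, which not every browser can record. A hedged alternative is to probe a short list of candidates and fall back when none match. The candidate list and the `pickMimeType` helper below are assumptions. The support check is passed in as a predicate so the logic can run outside a browser; in a browser it would typically wrap `MediaRecorder.isTypeSupported`.

```typescript
// Candidate container/codec strings, most preferred first (illustrative list).
const RECORDING_MIME_CANDIDATES: readonly string[] = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
  "video/mp4",
];

// Return the first candidate the predicate accepts, or undefined so the
// caller can omit mimeType and let the browser choose its default.
function pickMimeType(
  candidates: readonly string[],
  isSupported: (mimeType: string) => boolean,
): string | undefined {
  for (const candidate of candidates) {
    if (isSupported(candidate)) return candidate;
  }
  return undefined;
}

// Browser usage (not executed here):
//   const mimeType = pickMimeType(RECORDING_MIME_CANDIDATES,
//     (t) => MediaRecorder.isTypeSupported(t));
```

Saved battles may then come in more than one format, so the replay and sharing features would need to handle whichever container was actually recorded.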
🏆 Ranking System
| Rank | ELO Range | Badge |
|---|---|---|
| Bronze | 0 - 999 | 🥉 |
| Silver | 1000 - 1499 | 🥈 |
| Gold | 1500 - 1999 | 🥇 |
| Platinum | 2000 - 2499 | 💎 |
| Diamond | 2500+ | 👑 |
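The rank table translates directly into a threshold lookup. This is a minimal sketch: the `RankTier` type and `rankForElo` function are hypothetical names, and treating any rating below 0 as Bronze is an assumption.

```typescript
interface RankTier {
  rank: string;
  minElo: number;
  badge: string;
}

const BRONZE: RankTier = { rank: "Bronze", minElo: 0, badge: "🥉" };

// Tiers from the ranking table, ordered from highest threshold to lowest.
const RANK_TIERS: RankTier[] = [
  { rank: "Diamond", minElo: 2500, badge: "👑" },
  { rank: "Platinum", minElo: 2000, badge: "💎" },
  { rank: "Gold", minElo: 1500, badge: "🥇" },
  { rank: "Silver", minElo: 1000, badge: "🥈" },
  BRONZE,
];

// Return the highest tier whose threshold the rating meets.
function rankForElo(elo: number): RankTier {
  for (const tier of RANK_TIERS) {
    if (elo >= tier.minElo) return tier;
  }
  return BRONZE; // assumed fallback for ratings below 0
}
```

Because the tiers are checked from the top down, a rating exactly on a boundary such as 1000 lands in the higher tier, matching the ranges in the table.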
ELO Calculation
// elo.ts
function calculateEloChange(
  winnerElo: number,
  loserElo: number,
  kFactor: number = 32
): { winner: number; loser: number } {
  const expectedWin = 1 / (1 + Math.pow(10, (loserElo - winnerElo) / 400));
  const change = Math.round(kFactor * (1 - expectedWin));
  return {
    winner: winnerElo + change,
    loser: Math.max(0, loserElo - change)
  };
}
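A few worked cases show how the formula rewards upsets. The sketch below restates the same calculation in two small helpers (`expectedScore` and `eloDelta` are illustrative names) so it can run on its own, using the default K-factor of 32.

```typescript
// Expected score of a player rated `ratingA` against `ratingB` (standard Elo).
function expectedScore(ratingA: number, ratingB: number): number {
  return 1 / (1 + Math.pow(10, (ratingB - ratingA) / 400));
}

// Points moved from loser to winner, using the same formula as calculateEloChange.
function eloDelta(winnerElo: number, loserElo: number, kFactor: number = 32): number {
  return Math.round(kFactor * (1 - expectedScore(winnerElo, loserElo)));
}

const evenMatch = eloDelta(1500, 1500);    // 16: expected score was 0.5
const upsetWin = eloDelta(1200, 1600);     // 29: underdog expected score was 1/11
const favoriteWin = eloDelta(1600, 1200);  // 3: favorite expected score was 10/11
```

A 400-point underdog gains 29 points for a win while the favorite would gain only 3, so climbing by beating much weaker opponents is slow.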
💰 Cost Estimates
| Component | Cost (10K battles/mo) |
|---|---|
| Cloudflare Calls (WebRTC) | $50 (1TB @ $0.05/GB) |
| Recording Storage (R2) | $15 (1TB @ $0.015/GB) |
| Whisper Transcription | $10 (Workers AI) |
| Avatar Assets (R2) | $5 |
| Total | ~$80/mo |
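The estimate can be reproduced with a small calculator, which also makes it easy to test other traffic volumes. The unit prices are the ones quoted in the table and may not match current pricing. The `MonthlyUsage` shape and function names are hypothetical. At $0.05/GB, the $50 streaming line corresponds to 1,000 GB of traffic, which is the figure used in the example.

```typescript
interface MonthlyUsage {
  egressGb: number;         // WebRTC traffic billed per GB
  storageGb: number;        // recordings held in object storage
  transcriptionUsd: number; // flat estimate from the table
  assetsUsd: number;        // flat estimate from the table
}

// Unit prices as quoted in the cost table (assumed, may be out of date).
const EGRESS_USD_PER_GB = 0.05;
const STORAGE_USD_PER_GB = 0.015;

// Round to whole cents to avoid floating-point noise in the total.
function roundCents(usd: number): number {
  return Math.round(usd * 100) / 100;
}

function monthlyCostUsd(usage: MonthlyUsage): number {
  return roundCents(
    usage.egressGb * EGRESS_USD_PER_GB +
      usage.storageGb * STORAGE_USD_PER_GB +
      usage.transcriptionUsd +
      usage.assetsUsd,
  );
}

// 1,000 GB streamed and 1,000 GB stored: 50 + 15 + 10 + 5 = 80
const monthlyEstimate = monthlyCostUsd({
  egressGb: 1000,
  storageGb: 1000,
  transcriptionUsd: 10,
  assetsUsd: 5,
});
const costPerBattle = monthlyEstimate / 10000; // 0.008 USD at 10K battles/mo
```

At 10K battles per month this works out to under one cent per battle, with streaming traffic as the largest single line item.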
🚀 MVP Scope
Phase 1 (Post-MVP)
- 1v1 battles only
- 3 preset avatars
- Basic voice effects (pitch only)
- Simple ELO ranking
- Audio-only lip sync
Phase 2
- Avatar customization
- Camera-based lip sync
- RVC voice cloning
- Spectator mode
- Battle clips for sharing
Phase 3
- Tournament mode
- AI judges (Gemini analysis)
- Monetization (avatar skins)
- Leaderboards
- Battle highlights
Development Note: Avatar Battles is a post-MVP feature. Focus on core DAW first, then add battle mode in v1.1.