Full voice pipeline (Whisper STT -> DeepSeek LLM -> OpenAI TTS), animated SVG avatar (Live2D-ready), girly-pop UI, lofi music, timer/notes/pets/wardrobe widgets, 10 background scenes with particle effects, Honcho cross-session memory.
AI Body Double — Project Scope Questions
1. Avatar Style
What visual direction for the character companion?
- Option A: Cute 2D illustrated character — Simple animated sprite (blink, sway, idle bounce, wave). Think a stylized flat-vector character. Fast to build, runs anywhere.
- Option B: Live2D / Vtuber-style — Full rigged character with lip-sync, gestures, head tracking. Looks incredible but requires custom art assets and is significantly more work.
- Option C: Pixel art character — Retro chibi sprite with simple animations. Cozy, low-fi aesthetic.
- Option D: No character art yet — Start with a clean UI dashboard, add the character later.
2. Voice (TTS)
What level of voice quality?
- Option A: Cloud API (ElevenLabs) — Best quality female voices, natural intonation, can do "girly-pop" vibe. ~$5/month.
- Option B: Local Piper TTS — Free, self-hosted, no recurring cost. Lower quality, more robotic, but plenty of female voice models available.
- Option C: OpenAI TTS — Good quality, pay-per-use (~$0.015/minute). Middle ground.
- Option D: No voice yet — Start text-only, add TTS later.
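To weigh the flat monthly fee in Option A against the per-minute rate in Option C, a rough break-even calculation can help. The sketch below uses only the two approximate prices quoted above. The function names and the sample usage figure are illustrative assumptions, and real pricing may differ.

```python
# Approximate prices quoted in the options above; actual pricing may differ.
FLAT_MONTHLY_USD = 5.00   # Option A (ElevenLabs), roughly per month
PER_MINUTE_USD = 0.015    # Option C (OpenAI TTS), roughly per minute of speech


def break_even_minutes(flat_monthly: float, per_minute: float) -> float:
    """Minutes of speech per month at which both options cost the same."""
    if per_minute <= 0:
        raise ValueError("per_minute must be positive")
    return flat_monthly / per_minute


def monthly_cost_pay_per_use(minutes: float, per_minute: float = PER_MINUTE_USD) -> float:
    """Estimated monthly cost of the pay-per-use option."""
    return minutes * per_minute


if __name__ == "__main__":
    minutes = break_even_minutes(FLAT_MONTHLY_USD, PER_MINUTE_USD)
    print(f"Break-even: about {minutes:.0f} minutes of speech per month")
    # Hypothetical usage: 10 minutes of spoken replies per day for 30 days.
    print(f"300 min/month pay-per-use: ${monthly_cost_pay_per_use(300):.2f}")
```

Under those figures the break-even sits at about 333 minutes of speech per month, so lighter use would favor the pay-per-use option and heavier use the flat fee.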
3. Platform
Where does this need to run?
- Browser-based (web app) — Accessible from any laptop/phone/tablet on the home network. Easiest to build and iterate.
- Desktop app (Electron/Tauri) — Native feel, offline capable. More work.
- Mobile app — Phone-native experience. Most work.
- Both — Browser-based on the laptop, and also accessible from the phone.
4. Start Scope
How should we slice the first working version?
- Option A: MVP — Character + Lo-fi + Timer + Notes. Build the visual companion, lo-fi music player, pomodoro/focus timer, and a notes widget. Get the "presence" and toolset right first. Add voice interactivity in phase 2.
- Option B: Full build — Everything including voice. Go straight for the full pipeline: STT (microphone input) -> LLM (processes what she says) -> TTS (talks back) + all tools. Longer first delivery but one complete system.
- Option C: Somewhere in between. Build the dashboard + character + tools, plus TTS only (so the assistant talks to her but doesn't listen yet). Add microphone input in phase 2.
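The three scope options differ mainly in which stages of the STT -> LLM -> TTS pipeline are wired up. One way to keep that choice open is to treat each stage as a swappable callable, as in the minimal sketch below. All names here (`run_turn`, `TurnResult`, the stub stages) are hypothetical and do not come from any specific library.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union

# Each stage is a plain callable so real services can be swapped in later.
# These type aliases are hypothetical placeholders, not a real library API.
Transcribe = Callable[[bytes], str]   # STT: audio in, text out
Respond = Callable[[str], str]        # LLM: user text in, reply text out
Synthesize = Callable[[str], bytes]   # TTS: reply text in, audio out


@dataclass
class TurnResult:
    user_text: str
    reply_text: str
    reply_audio: Optional[bytes]  # None when no TTS stage is wired up


def run_turn(
    user_input: Union[bytes, str],
    respond: Respond,
    transcribe: Optional[Transcribe] = None,
    synthesize: Optional[Synthesize] = None,
) -> TurnResult:
    """Run one conversational turn through whichever stages are available."""
    if isinstance(user_input, bytes):
        if transcribe is None:
            raise ValueError("audio input needs an STT stage")
        user_text = transcribe(user_input)
    else:
        user_text = user_input  # typed input skips STT (as in Option C)
    reply_text = respond(user_text)
    reply_audio = synthesize(reply_text) if synthesize else None
    return TurnResult(user_text, reply_text, reply_audio)


if __name__ == "__main__":
    # Stub stages for illustration only.
    fake_stt = lambda audio: audio.decode("utf-8")
    fake_llm = lambda text: f"You said: {text}. You've got this!"
    fake_tts = lambda text: text.encode("utf-8")

    full = run_turn(b"start my timer", fake_llm, fake_stt, fake_tts)      # Option B
    tts_only = run_turn("start my timer", fake_llm, synthesize=fake_tts)  # Option C
    print(full.reply_text, tts_only.reply_audio is not None)
```

With this shape, moving from Option C to Option B would only mean passing a real `transcribe` stage; the rest of the turn logic stays the same.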
5. LLM Backend (Brain)
What powers the assistant's responses and personality?
- Local (Ollama) — Self-hosted, free, private. You've got an AMD Radeon RX 6650 XT that can accelerate inference. Runs models like Llama 3, Mistral, Qwen.
- Cloud API — Better personality/instruction following. OpenAI, Anthropic, or similar. Small cost per month.
- Not needed at first — Start with scripted responses and canned encouragement, add AI conversational ability later.
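The "not needed at first" option can be sketched as a small scripted responder that maps app events to canned encouragement. The event names and lines below are hypothetical examples, not a fixed design.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical event names and lines; the real set would come from the app.
CANNED_LINES: Dict[str, List[str]] = {
    "timer_started": ["Let's do this!", "Focus time. I'm right here with you."],
    "timer_finished": ["Nice work! Time for a break.", "You did it!"],
    "idle": ["Still here whenever you're ready."],
}

Responder = Callable[[str], str]


def make_scripted_responder(rng: Optional[random.Random] = None) -> Responder:
    """Return a responder that picks a canned line for a known event."""
    rng = rng or random.Random()

    def respond(event: str) -> str:
        # Unknown events fall back to the neutral "idle" lines.
        lines = CANNED_LINES.get(event, CANNED_LINES["idle"])
        return rng.choice(lines)

    return respond


if __name__ == "__main__":
    respond = make_scripted_responder()
    print(respond("timer_started"))
    print(respond("timer_finished"))
```

Because the responder is just a function from text to text, a local or cloud LLM backend could later be swapped in behind the same signature without touching the rest of the app.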
6. Music & Audio
How should the lo-fi / white noise work?
- Streaming (YouTube/SoundCloud URLs) — Curate playlists of lo-fi study beats. Free, endless variety.
- Local audio files — Download lo-fi tracks and ambient sounds. Works offline.
- Generated — Use AI music generation for custom tracks. Experimental.
- Integrated web player — Embed something like Spotify or YouTube Music.
7. Virtual Pet
What kind of pet?
- Cat — Fits the cozy/girly aesthetic. Classic.
- Dog — Energetic companion.
- Fantasy creature — A cute blob/slime/fairy/dragon.
- Customizable — Let her pick, or unlock different ones.
8. Background Scenes
What environments should be available?
- Cozy bedroom / study
- Coffee shop / café
- Garden / nature
- Rainy window
- Starry night / space
- Underwater / aquarium
- Minimalist / clean
- Seasonal (winter cabin, spring garden, autumn library)
Project Name (optional)
What should we call this? Some ideas:
- Buddy (simple)
- CozyFocus / CoPilot (functional)
- Luna / Mochi / Coco (character name)
- Something else?
Once you answer these, I'll write up the full architecture plan and start building.