Nalu Retreat — AI-Produced Testimonial Video

AI Production · Video · Music · Voiceover · Sound Design

About This Project

No Premiere. No After Effects. No Pro Tools. Starting from nothing but client photos and online testimonials, our AI production system produced a broadcast-quality promotional video — original music, professional voiceover, simulated camera movements, and frame-accurate editing. One producer directing 17 specialized agents (from a system of 40+). A traditional workflow would require a team of 5–8.

This page is a transparent look at how it was built — every layer, every tool, every decision point between human and machine.

Timeline

1 week (production) · 5 weeks total (including building the system)

Deliverables

  • 4K master (3840×2160)
  • 1080p full version (1:38)
  • 1080p short cut (1:00)

Featured In

Vogue · Newsweek

17 AI Agents Used
5 Production Layers
24 Video Clips Generated
25 Beat Markers
1 Human Producer
0 Traditional Editing Tools

The 5-Layer Production Architecture

Our system has 40+ agents, but not every project needs all of them. For this video, 17 agents were activated across 5 layers — each one handling a specific job in the pipeline. Here's exactly how they worked together.

Layer 01 — 3 Agents

Strategic — Research & Brief

Before any content is generated, the system needs raw material and context. Three agents work in parallel to build the creative foundation:

  • Scraper Agent (Apify) — Collected real guest testimonials from multiple review platforms. No generic copy — real words from real visitors.
  • Research Agent (Perplexity) — Analyzed Nalu's market positioning, competitor landscape, and what makes the retreat unique. This informs tone and messaging.
  • Brief Generator — Synthesized the top 5 testimonials + research into a structured creative brief: target audience, emotional tone, visual direction, and script outline.

Layer 02 — 1 Agent

Coordination — The Orchestrator

One central agent manages the entire production pipeline:

  • Orchestrator — Routes tasks to specialist agents, manages dependencies (voiceover must finish before beat markers can be placed), handles parallel execution (video and music generate simultaneously), and tracks cost across the entire project.
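
The dependency handling described above can be sketched roughly as follows. This is a minimal illustration using Python's asyncio, not the production system's code: the task names, the `run_pipeline` helper, and the `fake_agent` stand-in are all hypothetical, and cost tracking is left out. It shows only the two rules stated above: independent tasks (video, music) start together, and beat markers wait for the voiceover.

```python
import asyncio

# Hypothetical task graph: each task lists the tasks it must wait for.
# Only the voiceover -> beat_markers dependency comes from the project
# description; the other entries are illustrative.
DEPENDENCIES = {
    "video": [],
    "music": [],
    "voiceover": [],
    "beat_markers": ["voiceover"],
    "assembly": ["video", "music", "beat_markers"],
}


async def run_pipeline(dependencies, work):
    """Run every task as soon as all of its dependencies have finished.

    `work` is an async callable taking a task name; it stands in for a
    call to a specialist agent. Returns the order in which tasks completed.
    """
    done = {name: asyncio.Event() for name in dependencies}
    finished = []

    async def run(name):
        # Block until every upstream task has signalled completion.
        for dep in dependencies[name]:
            await done[dep].wait()
        await work(name)
        finished.append(name)
        done[name].set()

    # Launching all tasks at once lets independent ones overlap.
    await asyncio.gather(*(run(name) for name in dependencies))
    return finished


async def fake_agent(name):
    # Placeholder for a real generation call.
    await asyncio.sleep(0.01)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(DEPENDENCIES, fake_agent)))
```

Declaring the dependencies as data keeps the scheduling logic separate from the agents themselves, so adding a task means adding one entry to the graph. The sketch does not detect cycles; a circular dependency would simply hang.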

Layer 03 — 8 Agents

Specialists — The Production Floor

This is where content gets made. Eight specialist agents, each handling one discipline — the same way a traditional production would have a DP, a composer, a sound designer, and an editor working in parallel:

  • Video Orchestrator — Analyzes the brief's visual direction, selects the right generation provider, and manages the video pipeline end-to-end.
  • Video Prompt Engineer — Translates each scene from the storyboard into optimized prompts for Kling 2.5. Camera angle, lighting, mood — all encoded in the prompt.
  • Music Orchestrator — Determines genre, tempo, instrumentation based on the brief. For Nalu: calm, continuous, 60–80 BPM, ambient/acoustic.
  • Music Prompt Engineer — Formats the musical direction into Suno V5-optimized tags and structure. Generated 2 variations of "Peaceful Awakening."
  • Music Generator (Suno V5) — Executes the composition. Best variation selected by the producer.
  • Script Humanizer — Takes the raw testimonial script and adds emotion tagging — pauses, emphasis, warmth markers — so the voiceover sounds human, not robotic.
  • Voiceover Agent (ElevenLabs) — Generates professional voiceover from the humanized script. Natural pacing, warm delivery.
  • Image Generator (Nano Banana) — Created the end screen and any signage elements needed for the final composition.

Result: 24 video clips from client photos (92% static shots, 8% simulated camera movement on aerials), 2 music variations, professional voiceover, and end screen graphics — all generated in parallel.

Layer 04 — 3 Agents

Verification — Quality Control

Nothing goes to the final cut without passing QC. Three agents review every generated asset:

  • Clip Analyzer (Gemini Flash) — Analyzed all 24 video clips individually. Each one gets a JSON report: visual description, location type, key elements, quality score.
  • Content Mapper (Gemini Pro) — Matched clip content to voiceover sections based on meaning, not sequence. "When she talks about the sauna, show the sauna" — but figured out automatically.
  • Misalignment Detector — Compared the assembled edit against the brief's intent. Flagged 3 misalignments across iterative cuts, each corrected before final delivery.
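
To make the clip report and the mapping step concrete, here is a rough sketch. The field names in `ClipReport` are guesses based on the report contents listed above, and the sample data is invented. The matching uses plain keyword overlap, which is a deliberately crude stand-in for the meaning-based matching the Content Mapper performs with a language model.

```python
from dataclasses import dataclass, field


@dataclass
class ClipReport:
    # Field names are assumed from the report contents listed above.
    clip_id: str
    visual_description: str
    location_type: str
    key_elements: list = field(default_factory=list)
    quality_score: float = 0.0


def map_clips_to_sections(sections, reports, min_quality=0.5):
    """Pick, for each voiceover section, the clip whose key elements
    overlap most with the words in that section.

    Keyword overlap is a stand-in for real semantic matching.
    Each clip is used at most once; sections with no match map to None.
    """
    unused = [r for r in reports if r.quality_score >= min_quality]
    mapping = {}
    for section_id, text in sections.items():
        words = set(text.lower().replace(",", " ").replace(".", " ").split())
        best, best_score = None, 0
        for report in unused:
            score = len(words & {e.lower() for e in report.key_elements})
            if score > best_score:
                best, best_score = report, score
        mapping[section_id] = best.clip_id if best else None
        if best:
            unused.remove(best)
    return mapping


if __name__ == "__main__":
    # Invented sample data: the clip order differs from the script order.
    reports = [
        ClipReport("clip_01", "Cedar sauna interior", "interior", ["sauna", "cedar"], 0.9),
        ClipReport("clip_02", "Aerial view of the coast", "aerial exterior", ["ocean", "coastline"], 0.8),
    ]
    sections = {
        "vo_1": "The ocean view took my breath away.",
        "vo_2": "The sauna was heaven.",
    }
    print(map_clips_to_sections(sections, reports))
    # {'vo_1': 'clip_02', 'vo_2': 'clip_01'}
```

The sample shows the point of the design: the sauna clip comes first in the clip list but lands on the second voiceover section, because assignment follows content rather than sequence.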

Layer 05 — 2 Agents

Assembly — Final Composition

All assets converge into the final video — no traditional editing software at any stage:

  • Beat-Marker Processor — Converts the producer's JSON marker file (25 markers across 89 seconds) into an edit decision list with frame-accurate cut points.
  • Composition Agent (FFmpeg) — Assembles video clips, voiceover, music, sound design, and graphics into the final timeline. 5+ iterative rough cuts refined through the verification loop until the producer signs off.

Final delivery: 4K master (3840×2160) + 1080p versions.
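
One way to assemble cuts with FFmpeg alone is its concat demuxer, which reads a text file naming each source clip with optional in and out points. Which FFmpeg features the Composition Agent actually uses is not stated here, so treat the following as an assumption-laden sketch: the helper names and file paths are hypothetical, and the code only builds the list file text and the command arguments without running anything.

```python
def build_concat_list(cuts):
    """Render cuts as text for FFmpeg's concat demuxer.

    Each cut is (path, start_seconds, end_seconds) within the source clip.
    Paths containing single quotes would need escaping, which this
    sketch does not handle.
    """
    lines = ["ffconcat version 1.0"]
    for path, start, end in cuts:
        if end <= start:
            raise ValueError(f"cut for {path} has non-positive length")
        lines.append(f"file '{path}'")
        lines.append(f"inpoint {start:.3f}")
        lines.append(f"outpoint {end:.3f}")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_path, audio_path, output_path):
    """Argument list that joins the listed cuts and lays one audio track under them."""
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", list_path,  # video cuts
        "-i", audio_path,                                # pre-mixed audio
        "-map", "0:v", "-map", "1:a",                    # video from cuts, audio from mix
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",
        output_path,
    ]


if __name__ == "__main__":
    cuts = [("clip_01.mp4", 0.0, 3.0), ("clip_02.mp4", 1.5, 4.0)]
    print(build_concat_list(cuts))
    print(" ".join(build_ffmpeg_command("cuts.txt", "mix.wav", "out.mp4")))
```

Re-encoding (rather than stream copy) is chosen here because in and out points on compressed video generally snap to keyframes when streams are copied; whether that matters depends on how the source clips were encoded.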

The Beat-Marker System

This is where human direction meets machine execution. The producer places markers over the audio track — defining exactly where each visual cut should land. The system generates a JSON file with frame-accurate timestamps. Agents handle the execution. Humans handle the creative decisions.

25 markers across 89 seconds of audio. Each marker is a creative decision — when to cut, when to hold, when to breathe. The agents execute with frame-level precision, but the rhythm comes from a human ear.
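
The conversion from markers to cut points can be sketched as below. The marker file's schema and the project frame rate are not given here, so both are assumptions: the sketch expects JSON shaped like `{"markers": [{"time": 2.5}, ...]}` with times in seconds, and uses 24 fps in the example. The function name `markers_to_edl` is hypothetical.

```python
import json


def markers_to_edl(marker_json, fps, total_seconds):
    """Turn marker timestamps (seconds) into frame-aligned segments.

    Returns a list of dicts with start_frame (inclusive) and end_frame
    (exclusive). Each marker starts a new segment; the last segment
    runs to the end of the audio.
    """
    times = sorted(m["time"] for m in json.loads(marker_json)["markers"])
    # Snap every timestamp to the nearest whole frame.
    frames = [round(t * fps) for t in times]
    frames.append(round(total_seconds * fps))
    edl = []
    for index, (start, end) in enumerate(zip(frames, frames[1:])):
        if end > start:  # skip markers that snap to the same frame
            edl.append({"segment": index, "start_frame": start, "end_frame": end})
    return edl


if __name__ == "__main__":
    # Three invented markers; the real file holds 25.
    sample = json.dumps({"markers": [{"time": 0.0}, {"time": 2.5}, {"time": 6.04}]})
    for row in markers_to_edl(sample, fps=24, total_seconds=89):
        print(row)
    # {'segment': 0, 'start_frame': 0, 'end_frame': 60}
    # {'segment': 1, 'start_frame': 60, 'end_frame': 145}
    # {'segment': 2, 'start_frame': 145, 'end_frame': 2136}
```

Working in whole frames rather than seconds means each segment ends exactly where the next begins, so rounding never leaves a gap or an overlap between cuts.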

Editing Principles Derived from This Project

Tools & APIs

Every tool was selected for a specific job. No all-in-one platforms. No compromises.

  • Kling 2.5: Video generation, 24 clips from photos
  • Suno V5: Original music composition
  • ElevenLabs: Professional voiceover
  • Nano Banana: End screen & signage generation
  • Gemini 2.5: QC, clip analysis, content mapping
  • Apify: Testimonial scraping
  • Perplexity: Competitive research
  • FFmpeg: Assembly, rendering, delivery

What the AI did: Research, generate, analyze, compose, render, verify, assemble.

What the human did: Creative brief, beat markers, quality judgment, client relationship, and the one thing no agent can do — know when it's done.

Scene Distribution

63% Interior Scenes
21% Aerial Exterior
8% Ground Exterior
3s Average Scene Length

Results

  • Featured in Vogue: Nalu highlighted in Vogue's Halifax travel guide as a must-visit destination
  • Newsweek Top Spas 2026: Named one of Newsweek's World's Most Extraordinary Spas
  • Zero Traditional Tools: No Premiere, After Effects, or Pro Tools used at any stage of production
  • 1 Producer, 17 Agents: Traditional workflow would require a team of 5–8 people