
Avagen: How Per-User Model Fine-Tuning Makes AI Avatars Actually Look Like You

Avagen takes a different approach to AI avatars — instead of zero-shot embedding, it fine-tunes the entire model for every user. The result is photo and video avatars with a level of realism most tools can't match.

Avagen — AI avatar generation platform by Skytells
Per-user fine-tuning produces avatars with genuine likeness, not generic approximations


If you've tried any AI avatar tool in the last couple of years, you've probably noticed the pattern: upload a few photos, wait a few seconds, and get back images that look… almost right. The face shape is close, the hair color matches, but something is off. The skin looks smoothed into plastic. The eyes are slightly vacant. The overall feel is "stock photo of someone who resembles you" rather than "a photo of you."

That's what happens when avatar tools use zero-shot embedding — a technique where your face is encoded into a latent vector and injected into a pre-trained model at inference time. It's fast, but it treats your identity as a set of rough coordinates in a feature space. The model never actually learns you. It approximates.

Avagen works differently. It has been live since October 2024, and the approach that sets it apart is straightforward: instead of embedding your face into a frozen model, Avagen fine-tunes the entire generative model on your specific input images. The model adapts its weights — its understanding of light, texture, bone structure, expression patterns — to your individual features. The result is output where the likeness is genuine, not interpolated.

What That Means in Practice

When most avatar tools generate a "you in a business suit" image, they're combining a generic suit composition with an approximation of your face. The face is a plug-in. With Avagen's per-user fine-tuning, the model understands how your jawline catches shadow at a particular angle, how your forehead reflects overhead light, how your mouth sits when you're giving a neutral expression versus a natural smile. These aren't details a zero-shot approach captures — they emerge from letting the model spend time learning your specific features.

The same principle extends to video. Avagen generates video avatars through the TrueFusion Video family, and because the fine-tuned model carries your identity at the weight level, your avatar in motion retains the same fidelity as the still images. The movement feels like you — your mannerisms, your expressions — not like a generic character wearing your face as a mask.

This takes longer than zero-shot. There's no getting around that. But once your personal model is trained, generation is fast, and every output benefits from that initial investment in accuracy.

Plans and What They Include

Avagen is structured around how people actually use professional visual content, not around arbitrary feature bundling.

Starter — $29.99/month

Upload your selfies, and Avagen fine-tunes a model on your features, then generates polished, profile-ready photos for LinkedIn, company directories, social media, or anywhere else you need a professional-looking image.

You get 100 photo generations per month using TrueFusion Large, one-click export in multiple formats, and consistent quality across every output — because the model was trained specifically on your likeness.

For job seekers, freelancers, or anyone who needs updated professional photos without the scheduling overhead of a traditional shoot.

Advance — $39.99/month

This is where things get interesting for content creators. Avagen Advance adds video powered by TrueFusion Video: your AI persona comes alive with natural animation, movement, and expression. You can create two distinct AI personas and produce social media-ready video content from text descriptions.

Twenty-plus professional video styles are available — think "tech product demo," "casual vlog," "corporate announcement," "tutorial walkthrough." Each style adjusts lighting, composition, background, and pacing to match the context. For dubbing or multi-language avatar videos, LipFusion handles the lip-sync to keep mouth movements perfectly matched to the audio.

Fifty HD video clips per month with priority processing. For creators who post regularly, this replaces the part of the workflow where you set up a ring light, record eight takes, and spend an hour editing.

Pro — $59.99/month

The Pro tier is built for businesses and professionals who need the full range. Live personas for real-time interaction. Cinema-quality 4K video via TrueFusion Video Pro. Automatic translation into 30+ languages using LipFusion for lip-synced dubbing. Immersive 360° panoramic content.

Unlimited generations, access to the entire TrueFusion model suite including the Video family, early access to new features, and a dedicated account manager. Commercial usage rights are included on everything you generate.

This is the tier for marketing teams producing campaign assets, for e-learning companies creating instructor avatars, and for anyone who needs consistent, high-quality visual content at scale without a production crew.

Why Per-User Fine-Tuning Produces Better Results

The core difference comes down to how the model represents your identity.

With zero-shot embedding (the approach most avatar tools use), the model receives a compressed representation of your face at inference time and does its best to reconstruct it within a new scene. It's working from a summary. Details get lost — the specific way light interacts with your skin tone, the exact shape of your hairline, the asymmetry in your smile that makes your face look like your face.

With per-user fine-tuning, Avagen trains the model's weights on your actual images. The model doesn't receive a summary of your features — it internalizes them. When it generates a new image, your identity is part of the model's learned distribution, not an external input being shoehorned in.

This is why the output looks convincing in professional contexts. A LinkedIn headshot or a conference speaker card needs to look like a real photo of a real person — not an AI-generated face that's in the right ballpark. The fine-tuning approach gets the details right where embedding-based methods tend to average them out.

Where People Are Using It

The use cases we're seeing cluster into clear patterns:

Remote teams. Distributed companies where team members are spread across time zones and continents. Getting everyone to the same photographer isn't practical. Avagen gives every team member a consistent, professional look for the company website — same quality, same style, zero scheduling logistics.

Content marketing. Marketing teams producing blog posts, social media content, and ad creatives. Instead of sourcing stock photos or scheduling model shoots, they generate exactly the visuals they need. The images feature their actual brand representatives, not interchangeable stock photo people.

E-learning and training. Course creators who need an instructor presence on screen but don't want to re-record every time they update a lesson. Avagen's video capabilities turn script changes into new videos without touching a camera.

Personal branding. Consultants, speakers, and freelancers who need a library of professional images across different contexts — formal headshot, casual profile, speaking stage shot — without paying for multiple photo sessions.

Privacy and Your Data

This comes up in every conversation about AI-generated photos, and it should. Avagen processes your uploads securely. Your images are not shared with third parties. They're not used to train AI models without your explicit consent. And you can request deletion of your data at any time.

Your face is yours. The photos we generate from it are yours. The commercial rights are yours. There's no ambiguity.

How It Works

  1. Upload — provide a set of selfies or existing photos from your camera roll (the more variety in angle and lighting, the better the fine-tune)
  2. Fine-tune — Avagen trains a personalized model on your images, adapting the weights to your specific features
  3. Generate — choose a style, context, and format — headshot, video, portrait, panoramic — and the model generates output grounded in your trained likeness
  4. Export — download in the format and resolution you need, ready to use

The fine-tuning step takes a few minutes on first setup. After that, generating new images or videos from your personal model is fast.
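The four steps above can be sketched as a minimal client-side pipeline. Every function name and return value here is an illustrative stand-in, not the real Avagen interface:

```python
def upload(selfies):
    # Step 1: more variety in angle and lighting gives the fine-tune more signal.
    assert len(selfies) >= 3, "a handful of varied photos works best"
    return {"photos": selfies}

def fine_tune(dataset):
    # Step 2: one-time training pass — slow once, then reused for every output.
    return {"model_id": "user-model-001", "trained_on": len(dataset["photos"])}

def generate(model, style, fmt):
    # Step 3: generation from the personal model is fast after training.
    return {"model_id": model["model_id"], "style": style, "format": fmt}

def export(output):
    # Step 4: download in the requested format and resolution.
    return f"{output['style']}.{output['format']}"

model = fine_tune(upload(["a.jpg", "b.jpg", "c.jpg"]))
print(export(generate(model, style="headshot", fmt="png")))  # headshot.png
```

The key structural point is that `fine_tune` runs once per user, while `generate` and `export` run on every request against the already-trained model.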

Works Everywhere You Already Are

Avagen runs on iOS, Discord, and any modern web browser — Chrome, Firefox, Safari, Edge. On the web there's nothing to install, and no system requirements beyond a stable internet connection. The processing happens on Skytells infrastructure, so your device specs don't matter.

For real-time streaming features in the Pro tier, a reasonably modern device with decent bandwidth helps, but even there, the heavy computation runs on our end.

Try It

Avagen has been live since October 2024 at skytells.ai/apps/avagen. The Starter plan is $29.99/month with no commitment.

If you want professional-looking avatars that actually capture your likeness — not a rough approximation — the per-user fine-tuning approach is worth trying. The output quality speaks for itself.

For developers interested in building avatar features into their own products, the same TrueFusion models powering Avagen — including the Video and Video Pro models — are accessible through the Skytells API and the Console.
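For orientation, a call into a model API like this typically looks something like the following over plain HTTP. The endpoint path, field names, and auth scheme below are assumptions for illustration only — consult the official Skytells API documentation for the real interface:

```python
import json
import urllib.request

def build_video_request(api_key, model, prompt, persona_id):
    """Build a hypothetical TrueFusion video-generation request.
    The URL and JSON fields are illustrative, not the documented API."""
    return urllib.request.Request(
        "https://api.skytells.ai/v1/generate",       # hypothetical endpoint
        data=json.dumps({
            "model": model,                          # e.g. "truefusion-video-pro"
            "prompt": prompt,                        # text description of the video
            "persona": persona_id,                   # the user's fine-tuned identity
        }).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request (e.g. with `urllib.request.urlopen`) and polling for the finished video are left out, since those details depend on the actual API contract.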


Eric

Content strategist at Skytells, focusing on AI technology and industry developments.
