FREE AI TikTok Videos on Your Laptop (No Cloud, No Subscription)

Create AI Videos in Minutes: The ComfyUI + Flux + LTX-Video Tutorial

Here’s what you can make after following this guide:

Example Input → Output:

Text prompt (→ generated image):
“Vsco, Authentic share, amateur selfie in a car, swedish 19 year old woman, black crop top, curtain bangs hairstyle, no makeup, tiktok, talking, grainy, bad lighting, realistic”

Motion prompt (→ final video):
“Vertical phone selfie. A young woman sits casually in the driver’s seat, softly smiling at the camera. She gently tilts her head, briefly looks down with a shy expression, then lifts her eyes back up, her smile widening naturally into a playful, slightly bashful grin. The handheld camera moves lightly, giving a spontaneous and genuine TikTok feel—real-life footage.”

Can your computer run this?

  • Windows + NVIDIA GPU (8GB+ VRAM): ✓
  • Mac M1/M2/M3 (16GB+ RAM): ✓
  • Windows + AMD GPU: ✗
  • Mac Intel: ✗

Time needed: ~45 minutes setup, then 5-10 minutes per video
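
Not sure which row of the compatibility list applies to your machine? Assuming you have a Python environment with PyTorch installed (ComfyUI Desktop bundles its own, so this is purely an optional pre-check), a few lines will report what hardware PyTorch can see:

```python
import torch  # optional pre-check; assumes PyTorch is installed in this environment

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    vram_gb = props.total_memory / 1024**3
    verdict = "meets" if vram_gb >= 8 else "is below"
    print(f"NVIDIA GPU: {props.name}, {vram_gb:.1f} GB VRAM ({verdict} the 8 GB recommendation)")
elif torch.backends.mps.is_available():
    print("Apple Silicon GPU (MPS) detected - make sure the machine has 16 GB+ unified memory")
else:
    print("No supported GPU detected - this workflow will be very slow or fail outright")
```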

  1. Download ComfyUI Desktop

  2. Download these models

  3. Put models in correct folders

    • models/unet/RedCraft_RealReveal5_ULTRA_15Steps_fp8_pruned.safetensors
    • models/checkpoints/ltxv-2b-0.9.6-distilled-04-25.safetensors
    • models/text_encoders/t5xxl_fp8_e4m3fn.safetensors
    • models/clip/clip_l.safetensors
    • models/vae/vae.safetensors (rename from diffusion_pytorch_model.safetensors)
  4. Run the workflow

    • □ Download Workflow file
    • □ Open ComfyUI and load the workflow
    • □ Write your text prompt for the image
    • □ Write your motion prompt for the video
    • □ Click “Queue” and wait for your video

This Workflow Can:

  • Create short videos (5-6 seconds) at 24 FPS
  • Add subtle, realistic motion to still images
  • Create camera movements like pans, tilts, and zooms
  • Add environmental effects like wind in hair or leaves moving
  • Generate videos faster than real-time on high-end hardware

This Workflow Can’t:

  • Create complex actions or movements
  • Generate multiple scenes or scene transitions
  • Make people run, dance, or perform complex activities
  • Create Hollywood-quality special effects
  • Produce videos with perfect frame-to-frame consistency

Download the appropriate version for your system:

Run the installer and follow the prompts. When asked about GPU selection:

  • On Windows: Choose “NVIDIA GPU”
  • On Mac: Choose “MPS” (Metal Performance Shaders)

You need five files for the complete workflow:

  • RedCraft RealReveal5 ULTRA: image generation, ~11 GB, save to models/unet/
  • LTX Video model: video generation, ~6 GB, save to models/checkpoints/
  • T5 XXL text encoder: text understanding, 4.89 GB, save to models/text_encoders/
  • CLIP text encoder: text understanding, 246 MB, save to models/clip/
  • VAE: image encoding, 168 MB, save to models/vae/vae.safetensors

To find your models folder:

  1. Open ComfyUI
  2. Click the three dots in the top-right corner
  3. Select “Open Models Folder”

Create the necessary subfolders if they don’t exist, and place each file in its correct location. For the VAE, rename diffusion_pytorch_model.safetensors to vae.safetensors.
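
Before launching the workflow, you can optionally confirm that every file landed in the right place. This is a minimal sketch, assuming your models folder matches the paths listed above (adjust MODELS_DIR to wherever "Open Models Folder" points):

```python
from pathlib import Path

# Adjust to wherever "Open Models Folder" points on your machine.
MODELS_DIR = Path("models")

EXPECTED_FILES = [
    "unet/RedCraft_RealReveal5_ULTRA_15Steps_fp8_pruned.safetensors",
    "checkpoints/ltxv-2b-0.9.6-distilled-04-25.safetensors",
    "text_encoders/t5xxl_fp8_e4m3fn.safetensors",
    "clip/clip_l.safetensors",
    "vae/vae.safetensors",  # renamed from diffusion_pytorch_model.safetensors
]

for rel in EXPECTED_FILES:
    path = MODELS_DIR / rel
    print(f"[{'OK' if path.is_file() else 'MISSING'}] {path}")
```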

The easiest way to start is with a complete workflow that combines image and video generation:

  1. Download the combined workflow file

  2. In ComfyUI, click “Workflow” → “Open” and select the downloaded workflow file

  3. If you see missing nodes errors:

    • Click “Manager” → “Install Missing Nodes”
    • Wait for installation to complete
    • Restart ComfyUI
  4. Configure your prompts:

    • In the “Flux Prompt” node, enter your image description
    • In the “LTX Motion Prompt” node, describe the movement you want
  5. Click “Queue” to run the workflow (or queue it from a script, as sketched after this list)

  6. Find your video in the output folder, located next to the models folder in your ComfyUI installation
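
Clicking “Queue” in the interface is all you need, but if you want to batch-generate videos you can queue runs against ComfyUI’s local HTTP API instead. This is a rough sketch with assumptions: the server address below is the standard ComfyUI default (the Desktop app may listen on a different port, so check its settings), and the workflow has been exported in API format (the JSON filename here is a placeholder):

```python
import json
import urllib.request

SERVER = "http://127.0.0.1:8188"  # assumption: default ComfyUI server address

# Assumption: the workflow was exported in API format from the ComfyUI menu.
with open("flux_ltx_workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Queue the job; ComfyUI renders it and writes the result to its output
# folder, exactly as if you had clicked "Queue" in the interface.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    f"{SERVER}/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.load(response))  # includes the prompt_id of the queued job
```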

For Image Generation (Flux):

Describe your subject clearly and specifically. Include details about:
- Who/what is in the image
- Style (realistic, cartoon, painting, etc.)
- Lighting and environment
- Clothing and appearance details
- Quality indicators (high quality, detailed, etc.)

Example: "Vsco, Authentic share, amateur selfie in a car, swedish 19 year old woman, black crop top, curtain bangs hairstyle, no makeup, tiktok, talking, grainy, bad lighting, realistic"

For Video Generation (LTX):

Describe the motion you want, including:
- Starting position/pose
- Any subject movements (subtle head turns, smiles, etc.)
- Camera movements (pans, zooms, etc.)
- Environmental effects (wind in hair, etc.)
- Overall feel (handheld, cinematic, etc.)

Example: "Vertical phone selfie. A young woman sits casually in the driver's seat, softly smiling at the camera. She gently tilts her head, briefly looks down with a shy expression, then lifts her eyes back up, her smile widening naturally into a playful, slightly bashful grin. The handheld camera moves lightly, giving a spontaneous and genuine TikTok feel—real-life footage."

Problem: Error message about “Float8_e4m3fn dtype not supported on MPS”

Solution:

  1. Use the FP16 version of the T5 text encoder instead of the FP8 version

  2. Download the FP16 version from the Model Manager

  3. Refresh ComfyUI

  4. Select the FP16 version in the “DualCLIPLoader” node

Visual guide:

Step 1: Right-click on the T5 XXL node and select “Open in Editor”

Step 2: Change the model name to t5xxl_fp16.safetensors
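
If you are unsure whether your Apple Silicon Mac actually hits this limitation, a quick check (assuming PyTorch 2.1 or newer, the first release with the float8_e4m3fn dtype) is to try creating an FP8 tensor on the MPS device:

```python
import torch  # assumes PyTorch 2.1+, which introduced torch.float8_e4m3fn

try:
    torch.zeros(1, dtype=torch.float8_e4m3fn, device="mps")
    print("float8_e4m3fn works on MPS - the FP8 T5 encoder should load")
except (TypeError, RuntimeError) as error:
    print(f"FP8 is not supported here ({error}); switch to t5xxl_fp16.safetensors")
```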

Problem: Red error text mentioning missing models or “Model not found”

Solution:

  1. Check that your files are in the exact paths listed in Section 2
  2. Ensure filenames match exactly (case-sensitive)
  3. Restart ComfyUI after adding models
  4. If using a workflow, make sure model selections match your filenames

Problem: “CUDA out of memory” or other memory errors

Solution:

  1. Reduce image resolution (try 512x768 instead of higher)
  2. Reduce video frames (65 frames = ~2.7 seconds at 24 FPS)
  3. Close other applications
  4. On Windows, use the --lowvram flag when starting ComfyUI
  5. On Mac, be patient - the first run compiles optimizations
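
When memory errors persist, it helps to know how much VRAM is actually free before a run. On a Windows/NVIDIA setup with a CUDA build of PyTorch, a quick check like this reports it:

```python
import torch  # assumes a CUDA-enabled PyTorch build on the NVIDIA machine

if torch.cuda.is_available():
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    print(f"VRAM free: {free_bytes / 1024**3:.1f} GB of {total_bytes / 1024**3:.1f} GB")
    # The FP8 Flux checkpoint alone is ~11 GB on disk, so 8 GB cards lean on
    # ComfyUI's model offloading (and the --lowvram flag) to get through a run.
else:
    print("No CUDA device detected - the 'CUDA out of memory' error cannot come from here")
```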

Problem: Generated video shows only black frames

Solution:

  1. Check that the T5 text encoder is installed correctly
  2. Make sure your motion prompt isn’t empty
  3. Try a simpler motion description
  4. Generate a new image and try again

Problem: The generated video shows flickering or motion inconsistencies

Solution:

  1. Use simpler camera movements (“gentle pan” instead of complex movements)
  2. Add “consistent lighting, consistent appearance” to your motion prompt
  3. Reduce the CFG Scale value in the LTX node (try 5-7 instead of higher)
  4. Generate a longer video and trim the first/last few frames

Once you’re comfortable with the basic workflow, try these improvements:

Start with simple camera movements that work well:

  • “Camera slowly pans from left to right”
  • “Gentle zoom in on the subject’s face”
  • “Slight handheld camera motion for realism”

Avoid complex movements like “camera circles around subject” which often cause artifacts.

The most reliable subject motions are:

  • Subtle facial expressions (smiles, blinks)
  • Slight head turns
  • Hair movement
  • Environmental effects (leaves rustling, water rippling)

Avoid asking for walking, hand gestures, or complex body movements.

For more flexibility, try these workflow variations:

  1. Image-only workflow - Just generate the image
  2. Video-from-existing-image workflow - Use your own images as the starting frame (see the sketch below)
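
For the image-to-video variation, the only change is where the starting frame comes from. If you prefer to swap in your own image from a script rather than through the UI, a rough sketch looks like this, assuming the workflow was exported in API format and uses a standard LoadImage node (the filenames here are placeholders):

```python
import json

# Assumption: an API-format export of the image-to-video workflow that
# contains a standard LoadImage node; the filenames are placeholders.
with open("ltx_image_to_video_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

for node in workflow.values():
    if node.get("class_type") == "LoadImage":
        # Point the workflow at an image already copied into ComfyUI's input folder.
        node["inputs"]["image"] = "my_photo.png"

with open("ltx_image_to_video_api_edited.json", "w", encoding="utf-8") as f:
    json.dump(workflow, f, indent=2)
```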

If you’re interested in the technical details, here’s a simplified explanation:

  1. Text → Image (Flux)

    • Your text prompt is processed by text encoders (CLIP and T5)
    • The Flux model transforms random noise into an image matching your description
    • Each “step” refines the image from noise to a clear picture
  2. Image → Video (LTX-Video)

    • Your motion prompt describes how things should move
    • LTX uses the initial image and creates new frames showing motion
    • The frames are combined into a smooth video

Key components:

  • Text Encoders: Convert your text into a format the AI can understand
  • Diffusion Models: Generate images by removing “noise” step by step
  • VAE: Compresses images into a format the AI can work with
  • Samplers: Control how the noise is removed at each step, trading speed against quality
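
If you want a feel for how those pieces fit together without reading any real model code, here is a deliberately toy sketch of the text → image half. Every module and shape is made up; it only mirrors the structure of the denoising loop described above:

```python
import torch

# Toy stand-ins for the real components; all shapes and layers are made up.
text_encoder = torch.nn.Linear(77, 128)      # "CLIP/T5": token vector -> embedding
denoiser = torch.nn.Linear(128 + 64, 64)     # "Flux": predicts the noise to remove
vae_decode = torch.nn.Linear(64, 3 * 8 * 8)  # "VAE": latent -> tiny RGB image

tokens = torch.randn(1, 77)    # pretend-tokenized prompt
cond = text_encoder(tokens)    # the text embedding that guides every step

latent = torch.randn(1, 64)    # start from pure noise
for step in range(15):         # each step removes a little more noise
    predicted_noise = denoiser(torch.cat([cond, latent], dim=-1))
    latent = latent - 0.1 * predicted_noise  # the sampler decides this step size

image = vae_decode(latent).reshape(1, 3, 8, 8)  # decode the clean latent to pixels
print(image.shape)
```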

For reference, here’s the complete folder structure you should have:

models/
├── checkpoints/
│   └── ltxv-2b-0.9.6-distilled-04-25.safetensors
├── unet/
│   └── RedCraft_RealReveal5_ULTRA_15Steps_fp8_pruned.safetensors
├── text_encoders/
│   └── t5xxl_fp8_e4m3fn.safetensors
├── clip/
│   └── clip_l.safetensors
└── vae/
    └── vae.safetensors

Each of these files plays a specific role in the image→video generation process.

For clarity, here are the main workflow interfaces you’ll interact with:

Combined Text→Image→Video Workflow (screenshot of the full Flux + LTX workflow)

Image Generation Workflow (screenshot of the Flux workflow)

Video Generation Workflow (screenshot of the LTX-Video workflow)

Use these as visual references when setting up your workflow.