LattifAI SDK

Advanced forced alignment and subtitle generation powered by the Lattice-1 model.

Core Capabilities

Feature	Description	Status
Forced Alignment	Precise word-level and segment-level synchronization with audio	Production
Multi-Model Transcription	Gemini (100+ languages), Parakeet (24 languages), SenseVoice (5 languages)	Production
Speaker Diarization	Automatic multi-speaker identification with label preservation	Production
Audio Preprocessing	Multi-channel selection, device optimization (CPU/CUDA/MPS)	Production
Streaming Mode	Process audio up to 20 hours long with a minimal memory footprint	Production
Smart Text Processing	Intelligent sentence splitting and non-speech element separation	Production
Universal Format Support	20+ caption/subtitle formats (SRT, VTT, ASS, TTML, NLE formats)	Production
Configuration System	YAML-based configs for reproducible workflows	Production

Key Highlights

Accuracy: High-precision forced alignment with the Lattice-1 model
Multilingual: Support for 100+ languages via multiple transcription models
Performance: Hardware-accelerated processing with streaming support
Flexible: CLI, Python SDK, and Web UI
Production-Ready: Battle-tested on diverse audio/video content

Quick Example

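from lattifai import LattifAI

# Create a client with default settings.
client = LattifAI()

# Align an existing subtitle file to the media and write the re-timed captions.
caption = client.alignment(
    input_media="audio.wav",  # media file to align against
    input_caption="subtitle.srt",  # captions whose timing should be aligned
    output_caption_path="aligned.srt",  # destination for the aligned captions
)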
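
After a run like the one above, it can be useful to check the timings in the output file. The following is a minimal sketch that uses only the Python standard library and does not call the LattifAI SDK. The helper name `parse_srt` and the sample cues are hypothetical, chosen for illustration, and the sketch assumes the output is a standard SRT file.

```python
import re
from datetime import timedelta

# SRT timing line: "HH:MM:SS,mmm --> HH:MM:SS,mmm"
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def parse_srt(text):
    """Return a list of (start, end, text) tuples from SRT content.

    Hypothetical helper for illustration; not part of the LattifAI SDK.
    """
    cues = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = map(int, match.groups())
                start = timedelta(hours=h1, minutes=m1, seconds=s1, milliseconds=ms1)
                end = timedelta(hours=h2, minutes=m2, seconds=s2, milliseconds=ms2)
                # Everything after the timing line is the cue text.
                cues.append((start, end, " ".join(lines[i + 1:])))
                break
    return cues


# Made-up sample content standing in for an aligned SRT file.
sample = """1
00:00:01,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:04,250
General Kenobi.
"""

cues = parse_srt(sample)
for start, end, text in cues:
    print(f"{start.total_seconds():7.3f} -> {end.total_seconds():7.3f}  {text}")
```

To inspect a real result, the sample string could be replaced with the contents of the output file, for example `parse_srt(open("aligned.srt", encoding="utf-8").read())`.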
