Voxtral-Small-24B-2507
Voxtral-Small-24B-2507 is a speech-specialized foundation model developed by Mistral AI. It has approximately 24B parameters and supports a 32K-token context window.
Parameters
24.0B
Context Window
32K
License
Apache 2.0
Release Date
2025-07-15
API Pricing
API pricing for this model is not yet available
Strengths
- Large parameter count
- Specialization in audio processing
- Open-source Apache 2.0 license
Weaknesses
- Very large model file size
- Requires significant computational resources
- Medium-length context window (32K tokens)
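The file-size and compute weaknesses follow mostly from the parameter count. The following is a rough sketch of weight memory for a ~24B-parameter model at common precisions. It counts weights only and assumes the listed bytes per parameter. Activations, the KV cache and runtime buffers are not included, so real requirements are higher.

```python
# Rough weight-memory estimate for a ~24B-parameter model.
# Assumption: weights only; activations, KV cache and runtime
# buffers are NOT included, so actual usage will be higher.

PARAMS = 24e9  # approximate parameter count from the model card

# Bytes per parameter for common storage precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(params: float, precision: str) -> float:
    """Return the approximate size of the weights in GiB (2**30 bytes)."""
    return params * BYTES_PER_PARAM[precision] / 2**30


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        print(f"{precision}: {weight_memory_gib(PARAMS, precision):.1f} GiB")
```

In bf16 this comes to about 44.7 GiB of weights alone. That is why a single consumer GPU is usually not enough without quantization.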
Use Cases
- Advanced speech recognition
- Audio data analysis
- Building voice-based AI systems
Deep Analysis
Architecture
Multimodal Audio Chat (24B)
Based on the Mistral Small 3.1 24B backbone
Context Window
32K tokens
Up to 40 minutes of audio for understanding tasks
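The 40-minute figure fits inside a 32K-token window only if audio is compressed into a low-rate token stream. This back-of-the-envelope sketch rests on two assumptions. First, the audio path emits about 12.5 tokens per second of audio; this rate is not stated on this card. Second, "32K" means 32,768 tokens. Under these assumptions, 40 minutes of audio uses about 30,000 tokens and leaves roughly 2,800 for the prompt and the answer.

```python
# Back-of-the-envelope audio token budget for a 32K-token context.
# Assumptions (not stated on the model card):
#   - "32K" means 2**15 = 32,768 tokens
#   - audio is encoded at ~12.5 tokens per second of audio

CONTEXT_TOKENS = 32_768
AUDIO_TOKENS_PER_SECOND = 12.5


def audio_tokens(minutes: float, rate: float = AUDIO_TOKENS_PER_SECOND) -> int:
    """Approximate number of context tokens consumed by `minutes` of audio."""
    return round(minutes * 60 * rate)


def max_audio_minutes(text_budget: int,
                      context: int = CONTEXT_TOKENS,
                      rate: float = AUDIO_TOKENS_PER_SECOND) -> float:
    """Longest audio (in minutes) that fits after reserving `text_budget` tokens."""
    return (context - text_budget) / rate / 60


if __name__ == "__main__":
    used = audio_tokens(40)
    print(f"40 min of audio  ~ {used} tokens")
    print(f"left for text    ~ {CONTEXT_TOKENS - used} tokens")
    print(f"max audio with a 2,048-token text budget ~ {max_audio_minutes(2048):.1f} min")
```

If the true tokens-per-second rate differs, only `AUDIO_TOKENS_PER_SECOND` needs to change.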
Release Date
July 15, 2025
License
Apache 2.0
Modalities
Audio + Text
Speech understanding and transcription
Languages
8+ languages
Multilingual with auto-detection
Strengths
- Production-scale speech understanding
- Open-source Apache 2.0 license
- Understands up to 40 minutes of audio
- Function calling from voice
- Native multilingual support
- Retains the text understanding of Mistral Small 3.1
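Function calling from voice means the model turns a spoken request into a structured tool call, and the application then executes it. The following is a minimal sketch of the application side only. It assumes an OpenAI-style JSON-schema tool definition and a tool call delivered as a name plus a JSON argument string. The names `set_timer`, `TOOLS`, `REGISTRY` and `dispatch` are hypothetical, and no model or API is called.

```python
import json
from typing import Any, Callable


# Hypothetical local tool a voice assistant might expose.
def set_timer(minutes: int, label: str = "timer") -> str:
    return f"{label} set for {minutes} min"


# Tool definition in an OpenAI-style JSON-schema shape (assumed format);
# this is what would be sent to the model alongside the audio.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "set_timer",
            "description": "Start a countdown timer.",
            "parameters": {
                "type": "object",
                "properties": {
                    "minutes": {"type": "integer"},
                    "label": {"type": "string"},
                },
                "required": ["minutes"],
            },
        },
    }
]

# Maps tool names to the Python callables that implement them.
REGISTRY: dict[str, Callable[..., Any]] = {"set_timer": set_timer}


def dispatch(tool_call: dict) -> Any:
    """Run a tool call of the form {"name": ..., "arguments": "<JSON string>"}."""
    name = tool_call["name"]
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    args = json.loads(tool_call["arguments"])
    return REGISTRY[name](**args)


if __name__ == "__main__":
    # Hypothetical tool call, as a model might emit after hearing
    # "set a pasta timer for ten minutes".
    call = {"name": "set_timer",
            "arguments": json.dumps({"minutes": 10, "label": "pasta"})}
    print(dispatch(call))
```

Keeping the registry separate from the schema lets the application refuse any tool name the model produces that it did not explicitly register.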
Weaknesses
- Larger model requires more compute
- 32K context window
- No vision modality
Competitor Comparison
| Model | Price (relative to Voxtral Small 24B) |
|---|---|
| Voxtral Mini 3B | Lower |
| GPT-4o Audio | Higher |
| Google Gemini Audio | Comparable |
Voxtral Small 24B is Mistral's production-scale open-source speech understanding model. Released July 2025 under Apache 2.0, it handles up to 40 minutes of audio for understanding tasks with built-in Q&A, summarization, and function calling from voice.
Analysis generated: 2026-05-24