Voxtral-Mini-3B-2507
Voxtral-Mini-3B-2507 is a speech-focused foundation model developed by Mistral AI. It has roughly 3B parameters and supports a maximum context length of 32K tokens.
Parameters
3B
Context Window
32K
License
Apache 2.0
Release Date
2025-07-15
API Pricing
API pricing for this model is not yet available
Strengths
- Specialized design for audio processing
- Wide 32K context length
- Open-source Apache 2.0 license
Weaknesses
- High computational needs vs. smaller models
- Performance gap with text-only models
- Significant memory consumption from model size
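To put the memory concern in rough numbers, the sketch below estimates how much memory the weights alone would occupy at a few common precisions. It assumes a nominal 3B parameter count taken from the model name; the exact count may differ, and activations, the KV cache, and the audio encoder's working memory are not included. The function name `weight_memory_gib` is illustrative only.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * bytes_per_param / 2**30


# Assumption: nominal 3B parameters, taken from the model name.
PARAMS = 3e9

if __name__ == "__main__":
    for label, nbytes in [("fp32", 4), ("bf16/fp16", 2), ("int8", 1), ("int4", 0.5)]:
        print(f"{label:>9}: {weight_memory_gib(PARAMS, nbytes):.1f} GiB")
```

Real deployments need headroom on top of these figures, so treat them as a lower bound rather than a hardware requirement.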
Use Cases
- Advanced audio data analysis
- Contextual understanding of long audio clips
- Open-source-based voice development
Deep Analysis
Architecture
Multimodal Audio Chat (3B)
Based on Ministral 3B backbone
Context Window
32K tokens
Up to 30 min transcription
Release Date
July 15, 2025
License
Apache 2.0
Modalities
Audio + Text
Speech understanding and transcription
Languages
8+ languages
EN, FR, DE, ES, IT, PT, NL, HI
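As a small illustration of how an application might gate requests on the languages listed above, the sketch below checks a language tag against that list. The set contents come from this page; the helper name `is_listed_language` and the tag normalization are assumptions made for the example. Because the page says "8+", a tag missing from the set is not necessarily unsupported by the model.

```python
# Language codes listed on this page (lowercased).
LISTED_LANGUAGES = {"en", "fr", "de", "es", "it", "pt", "nl", "hi"}


def is_listed_language(tag: str) -> bool:
    """Return True if the primary subtag of `tag` is in the listed set.

    Accepts tags such as "EN", "pt-BR", or "pt_BR" (hypothetical inputs).
    """
    primary = tag.strip().lower().replace("_", "-").split("-")[0]
    return primary in LISTED_LANGUAGES


if __name__ == "__main__":
    for tag in ["EN", "pt-BR", "ja"]:
        print(tag, is_listed_language(tag))
```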
Strengths
- Open-source speech understanding model
- Apache 2.0 license
- Multilingual with automatic language detection
- Function calling from voice input
- Lightweight 3B for edge deployment
- Cost-effective transcription
Weaknesses
- 32K context limits long audio processing
- Smaller model may miss nuances
- No image/video modality
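One common way to work around a per-request audio limit is to split a long recording into overlapping windows and transcribe each window separately. The sketch below only plans the window boundaries; it assumes the 30-minute transcription limit quoted on this page and a hypothetical 5-second overlap, and it does not call the model or stitch transcripts. The function name `plan_chunks` and its parameters are illustrative.

```python
def plan_chunks(duration_s: float, max_chunk_s: float = 30 * 60, overlap_s: float = 5.0):
    """Return (start, end) windows in seconds that cover [0, duration_s].

    Each window is at most `max_chunk_s` long; consecutive windows overlap
    by `overlap_s` so words cut at a boundary appear in both windows.
    """
    if duration_s <= 0:
        return []
    if not 0 <= overlap_s < max_chunk_s:
        raise ValueError("overlap_s must be in [0, max_chunk_s)")

    chunks = []
    start = 0.0
    while True:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        if end >= duration_s:
            break
        # Step back by the overlap so the next window re-covers the boundary.
        start = end - overlap_s
    return chunks


if __name__ == "__main__":
    # A hypothetical 75-minute recording.
    for start, end in plan_chunks(75 * 60):
        print(f"{start / 60:6.2f} min -> {end / 60:6.2f} min")
```

Splitting on silence instead of fixed offsets would usually give cleaner boundaries, at the cost of needing a voice-activity detector.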
Competitor Comparison
| Model | Arena | SWE | GPQA | Price (relative to Voxtral Mini 3B) |
|---|---|---|---|---|
| Voxtral Small 24B | - | - | - | Higher |
| OpenAI Whisper | - | - | - | Comparable |
| GPT-4o Audio | - | - | - | Higher |
Voxtral Mini 3B is Mistral AI's lightweight open-source speech understanding model. Released in July 2025 under Apache 2.0, it offers transcription, Q&A, summarization, and function calling from voice input. According to Mistral, it does so at less than half the price of comparable APIs.
Analysis generated: 2026-05-24