MistralAI · Open Source

Voxtral-Small-24B-2507

Name: Voxtral-Small-24B-2507
Author: MistralAI

Voxtral-Small-24B-2507 is a speech-specialized foundation model developed by MistralAI. It has approximately 24B parameters and supports a 32K-token context window.

Parameters

24B

Context Window

32K

License

Apache 2.0

Release Date

2025-07-15

API Pricing

API pricing for this model is not yet available

Strengths

・Large parameter count
・Specialization in audio processing
・Open-source Apache 2.0 license

Weaknesses

・Very large model file size
・Requires significant computational resources
・Mid-sized context window (32K tokens)

Use Cases

・Advanced speech recognition
・Audio data analysis
・Building voice-based AI systems

Deep Analysis

Architecture

Multimodal Audio Chat (24B)

Based on Mistral Small 24B backbone

Context Window

32K tokens

Up to 40 minutes of audio for understanding tasks
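
The two figures above (a 32K-token window and up to 40 minutes of audio) imply a ceiling on how many tokens each second of audio can consume. The sketch below works that ceiling out and checks whether a clip plus a text prompt would fit. It reads "32K" as 32,768 tokens, and the 12.5 tokens-per-second rate in the usage lines is an illustrative assumption, not a figure stated on this page.

```python
# Back-of-the-envelope context budgeting for audio input.
# Assumption: "32K" means 32 * 1024 = 32,768 tokens.
CONTEXT_TOKENS = 32 * 1024
MAX_AUDIO_SECONDS = 40 * 60  # 40 minutes, the stated limit for understanding tasks


def implied_max_tokens_per_second() -> float:
    """Upper bound on audio tokens per second if 40 minutes must fit in the window."""
    return CONTEXT_TOKENS / MAX_AUDIO_SECONDS


def fits_in_context(audio_seconds: float, tokens_per_second: float, text_tokens: int = 0) -> bool:
    """Check whether the audio plus any text prompt stays within the context window.

    tokens_per_second is a caller-supplied assumption about the audio tokenization rate.
    """
    needed = audio_seconds * tokens_per_second + text_tokens
    return needed <= CONTEXT_TOKENS


if __name__ == "__main__":
    print(f"Implied ceiling: {implied_max_tokens_per_second():.2f} tokens per second of audio")
    # 12.5 tokens/s is a hypothetical rate used only for illustration.
    print(fits_in_context(40 * 60, tokens_per_second=12.5, text_tokens=500))
```

Under these assumptions, a full 40-minute clip leaves only a few thousand tokens for the prompt and the answer, so long recordings leave little room for lengthy instructions.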

Release Date

July 15, 2025

License

Apache 2.0

Modalities

Audio + Text

Speech understanding and transcription

Languages

8+ languages

Multilingual with auto-detection

Strengths

・Production-scale speech understanding
・Apache 2.0 open-source
・40 min audio understanding capability
・Function calling from voice
・Native multilingual support
・Retains text understanding of Mistral Small 3.1

Weaknesses

・Larger model requires more compute
・Context window limited to 32K tokens
・No vision modality
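
To put the compute weakness in concrete terms, here is a rough weight-only memory estimate. It takes the nominal parameter count from the "24B" in the model name and multiplies it by the bytes per parameter at a few common precisions. It is an approximation only: it ignores activations, the KV cache, and framework overhead, and it does not describe which precisions are actually published for this model.

```python
# Rough weight-only memory estimate for a model of a given parameter count.
# Ignores activations, KV cache, and runtime overhead.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in gigabytes (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 24e9  # nominal count taken from the "24B" in the model name
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype}: ~{weight_memory_gb(params, dtype):.0f} GB")
```

At 16-bit precision the weights alone come to roughly 48 GB, which is why a model of this size generally needs a large-memory accelerator or several smaller ones.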

Competitor Comparison

Model	Arena	SWE	GPQA	Relative price
Voxtral Mini 3B	-	-	-	Lower
GPT-4o Audio	-	-	-	Higher
Google Gemini Audio	-	-	-	Comparable

Overview

Voxtral Small 24B is Mistral's production-scale open-source speech understanding model. Released July 2025 under Apache 2.0, it handles up to 40 minutes of audio for understanding tasks with built-in Q&A, summarization, and function calling from voice.
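
Function calling from voice means the model turns a spoken request into a structured tool call that the application then executes. The sketch below shows only the application side of that loop. The tool names (`set_timer`, `get_weather`) and the tool-call shape (a `name` plus JSON-encoded `arguments`) are hypothetical and chosen for illustration; they are not the documented response format of any particular API.

```python
import json
from typing import Any, Callable, Dict


# Hypothetical local tools the voice assistant is allowed to call.
def set_timer(minutes: int) -> str:
    return f"Timer set for {minutes} minutes"


def get_weather(city: str) -> str:
    return f"Weather lookup requested for {city}"


TOOLS: Dict[str, Callable[..., str]] = {
    "set_timer": set_timer,
    "get_weather": get_weather,
}


def dispatch_tool_call(tool_call: Dict[str, Any]) -> str:
    """Run the local function named in a tool call.

    Assumed shape: {"name": str, "arguments": JSON-encoded string of keyword args}.
    """
    name = tool_call["name"]
    if name not in TOOLS:
        raise KeyError(f"Unknown tool: {name}")
    args = json.loads(tool_call["arguments"])
    return TOOLS[name](**args)


if __name__ == "__main__":
    # Stand-in for what a model might emit after hearing "set a timer for ten minutes".
    example_call = {"name": "set_timer", "arguments": '{"minutes": 10}'}
    print(dispatch_tool_call(example_call))
```

Keeping an explicit allow-list of tools, as in `TOOLS` here, is one simple way to make sure a misheard command cannot trigger an arbitrary function.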

Benchmarks & Performance

Voxtral Small 24B is described as more accurate than Voxtral Mini on complex speech tasks, with state-of-the-art multilingual transcription and strong performance on audio reasoning and summarization.

Detailed Comparison

The model is positioned as offering production-grade speech intelligence at less than half the cost of comparable closed APIs. It combines transcription with semantic understanding in a single model.

Community Feedback

The model is available on Hugging Face and through the Mistral API. It is part of Mistral's push into multimodal AI.

Use Cases

Production voice assistants, enterprise transcription, audio content analysis, voice-driven automation, and multilingual speech applications.
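
For enterprise transcription and audio content analysis, recordings can run longer than the 40-minute understanding limit. One possible approach, sketched here under stated assumptions, is to split a recording into overlapping windows and process each one separately. The 30-second overlap is a hypothetical default picked for illustration, and `plan_chunks` is an illustrative helper, not part of any Voxtral tooling.

```python
from typing import List, Tuple


def plan_chunks(
    total_seconds: float,
    max_chunk_seconds: float = 40 * 60,  # 40-minute limit for understanding tasks
    overlap_seconds: float = 30.0,       # hypothetical overlap to avoid cutting sentences
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds that cover the whole recording."""
    if max_chunk_seconds <= overlap_seconds:
        raise ValueError("max_chunk_seconds must be larger than overlap_seconds")
    total_seconds = float(total_seconds)
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    step = max_chunk_seconds - overlap_seconds
    while start < total_seconds:
        end = min(start + max_chunk_seconds, total_seconds)
        chunks.append((start, end))
        if end >= total_seconds:
            break  # the last window already reaches the end of the recording
        start += step
    return chunks


if __name__ == "__main__":
    # A one-hour recording is split into two overlapping windows.
    for start, end in plan_chunks(60 * 60):
        print(f"{start:.0f}s to {end:.0f}s")
```

The results from each window still have to be merged afterwards, for example by deduplicating the overlapping region; that step depends on the task and is left out here.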

Latest News

Released July 15, 2025. The release represents Mistral's expansion into the multimodal audio space.

Sources

Analysis generated: 2026-05-24