이 모델의 강점은 무엇인가요?

대규모 파라미터 수 오디오 처리 특화 오픈소스 Apache 2.0 라이선스

이 모델의 약점은 무엇인가요?

매우 큰 모델 파일 크기 상당한 연산 자원 요구 중간 규모의 컨텍스트 길이

어떤 용도에 가장 적합한가요?

고급 음성 인식 오디오 데이터 분석 음성 기반 AI 시스템 구축

모델 목록으로

MistralAI오픈소스

Voxtral-Small-24B-2507

Name: Voxtral-Small-24B-2507
Author: MistralAI

Voxtral-Small-24B-2507은 미스트랄AI가 개발한 음성 특화 기초 모델입니다. 약 240B의 파라미터 규모를 가지며, 32K의 컨텍스트 윈도우를 지원합니다.

파라미터

240.0B

컨텍스트

32K

라이선스

Apache 2.0

출시일

2025-07-15

API 가격

이 모델의 API 가격 정보는 현재 공개되지 않았습니다

강점

・대규모 파라미터 수
・오디오 처리 특화
・오픈소스 Apache 2.0 라이선스

약점

・매우 큰 모델 파일 크기
・상당한 연산 자원 요구
・중간 규모의 컨텍스트 길이

활용 사례

・고급 음성 인식
・오디오 데이터 분석
・음성 기반 AI 시스템 구축

심층 분석

Architecture

Multimodal Audio Chat (24B)

Based on Mistral Small 24B backbone

Context Window

32K tokens

Up to 40 min for understanding

Release Date

July 15, 2025

License

Apache 2.0

Modalities

Audio + Text

Speech understanding and transcription

Languages

8+ languages

Multilingual with auto-detection

강점

・Production-scale speech understanding
・Apache 2.0 open-source
・40 min audio understanding capability
・Function calling from voice
・Native multilingual support
・Retains text understanding of Mistral Small 3.1

약점

・Larger model requires more compute
・32K context window
・No vision modality

경쟁사 비교

Model	Arena	SWE	GPQA	Price
Voxtral Mini 3B	-	-	-	Lower
GPT-4o Audio	-	-	-	Higher
Google Gemini Audio	-	-	-	Comparable

개요

Voxtral Small 24B is Mistral's production-scale open-source speech understanding model. Released July 2025 under Apache 2.0, it handles up to 40 minutes of audio for understanding tasks with built-in Q&A, summarization, and function calling from voice.

벤치마크 및 성능

Higher accuracy than Voxtral Mini on complex speech tasks. State-of-the-art multilingual transcription. Strong at audio reasoning and summarization.

상세 비교

Offers production-grade speech intelligence at less than half the cost of comparable closed APIs. Combines transcription with semantic understanding.

커뮤니티 평가

Available on HuggingFace and Mistral API. Part of Mistral's push into multimodal AI.

활용 사례

Production voice assistants, enterprise transcription, audio content analysis, voice-driven automation, and multilingual speech applications.