
ELYZA · Conditionally open

ELYZA-Thinking-1.0-Qwen-32B

Name: ELYZA-Thinking-1.0-Qwen-32B
Price: ¥80 per 1M input tokens
Author: ELYZA

Japan's first reasoning-specialized model, developed by ELYZA. It adopts a "Chain-of-Thought" approach similar to OpenAI's o1/o3 series and is specialized for complex reasoning tasks.

Parameters

32B

Context

32K

License

ELYZA License

Release date

2026-01-15

Japanese language capability

🇯🇵 Native JP

A model developed by a Japanese company or specialized for Japanese, offering the highest level of Japanese comprehension and generation.

API pricing

Input price (per 1M tokens)

¥80

Output price (per 1M tokens)

¥320

Billing mode: standard
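As a rough illustration of how the two rates combine, the sketch below estimates the cost of a single request. It assumes that the model's reasoning trace is generated, and therefore billed, as output tokens; for a chain-of-thought model this can make output the dominant cost, since it is priced at four times the input rate. The token counts are hypothetical.

```python
# Yen per 1M tokens, taken from the listing above.
INPUT_YEN_PER_M = 80
OUTPUT_YEN_PER_M = 320


def request_cost_yen(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request under standard per-token billing.

    Assumption: output_tokens includes the chain-of-thought as well as the
    final answer, because both are generated by the model.
    """
    return (input_tokens * INPUT_YEN_PER_M + output_tokens * OUTPUT_YEN_PER_M) / 1_000_000


# Hypothetical request: 2,000 prompt tokens, 8,000 generated tokens (mostly reasoning).
cost = request_cost_yen(2_000, 8_000)
print(f"¥{cost:.2f}")  # ¥2.72
```

Here the 8,000 generated tokens account for ¥2.56 of the ¥2.72 total, so capping reasoning length is the main lever on cost.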

Strengths

・Japan's first reasoning-specialized model
・Strong in math and logical reasoning
・Visualizes thinking process in Japanese
・Stable performance based on Qwen-32B

Weaknesses

・Slow inference, since every answer is preceded by a lengthy reasoning process
・Context length limited to 32K
・Not suited to general-purpose text generation
・Commercial use requires license confirmation

Use cases

・Mathematical reasoning tasks
・Problem-solving requiring logical thinking
・Complex analysis in Japanese
・Use in the education field

In-depth analysis

Parameters

32B

lightweight open-weight model

Context Window

128K tokens

131072 tokens

MATH-500 (English)

80.8%

vs o1-mini: 80.0%

MATH-500 (Japanese)

78.6%

vs o1-mini: 77.2%

JMMLU_small

73.1%

Japanese knowledge benchmark

License

Apache 2.0

commercial use allowed

Strengths

・Strong mathematical reasoning in both Japanese and English, surpassing o1-mini on key benchmarks.
・Lightweight (32B parameters) yet competitive with much larger reasoning models.
・Fully open-source with permissive Apache 2.0 license for commercial use.

Weaknesses

・Coding performance (JHumanEval) regressed compared to its base model and lags behind competitors.
・Reasoning-focused training slightly reduced performance on some general Japanese language tasks.
・Requires substantial VRAM (~66 GB) for unquantized inference, limiting accessibility.
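The ~66 GB figure is consistent with weight storage alone at 2 bytes per parameter (BF16/FP16): 32B parameters take about 64 GB, and the base Qwen2.5-32B has slightly more than 32B parameters (about 32.5B). The sketch below reproduces this back-of-the-envelope estimate for a few precisions using the nominal 32B count. The quantized rows are illustrative only, and real deployments also need memory for the KV cache and activations, which this ignores.

```python
# Bytes needed to store one parameter at common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes).

    Excludes the KV cache and activations, which grow with context length.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal 32B parameters; the exact count is slightly higher.
for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(32e9, dtype):.0f} GB")
```

This prints 128, 64, 32 and 16 GB respectively, which is why 4-bit or 8-bit quantization is the usual route to running a 32B model on a single consumer or workstation GPU.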

Competitor comparison

| Model | Arena | SWE | GPQA | Price |
|---|---|---|---|---|
| OpenAI o1-mini | N/A | N/A | N/A | API-only (premium) |
| DeepSeek-R1-Distill-Qwen-32B | N/A | N/A | N/A | Open-source |
| QwQ-32B | N/A | N/A | N/A | Open-source |

Overview

ELYZA-Thinking-1.0-Qwen-32B is Japan's first specialized reasoning model, developed by ELYZA. Like OpenAI's o1 series, it uses a Chain-of-Thought (CoT) approach to tackle complex logical and mathematical problems. The model is built on Alibaba's Qwen2.5-32B-Instruct and was fine-tuned on approximately 150,000 high-quality synthetic examples, generated with a Monte Carlo Tree Search (MCTS)-based algorithm that explores candidate reasoning paths to find the best ones. This training lets the 32-billion-parameter model perform comparably to OpenAI's o1-mini on key reasoning benchmarks while remaining open-weight under the permissive Apache 2.0 license.

A key innovation is the dual-model approach: alongside the primary reasoning model, ELYZA released "Shortcut Models" (32B and 7B variants) trained on the same problem sets but without the lengthy reasoning process. The Shortcut model is reported to perform comparably to GPT-4o on general tasks, showing how reasoning capabilities developed during training can be distilled into faster, direct-response models. This reflects a growing trend in AI development: shifting heavy computational cost from inference to the development phase to produce models that are both capable and efficient.

The model excels at mathematical and logical reasoning but involves a trade-off: its coding performance regressed slightly relative to its base model, suggesting the specialized training data contained too few coding tasks. Nevertheless, it is a significant milestone for Japanese-language AI: a capable, commercially usable open-weight reasoning model that advances the state of the art for specialized reasoning tasks.
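To make the MCTS idea concrete, here is a minimal, generic UCT-style Monte Carlo Tree Search on a hypothetical toy problem: reach 10 from 1 using the steps `+1` and `*2`. Each tree node is a partial "reasoning path", random rollouts score it against a verifiable answer, and every rollout that reaches the correct answer is kept, mirroring the idea of harvesting correct reasoning paths as synthetic training data. This is not ELYZA's algorithm: the toy problem, the reward, the exploration constant `c = 1.4`, and all names are assumptions for illustration. A real pipeline would presumably expand nodes with LLM-generated reasoning steps instead of two fixed operations.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy problem standing in for a reasoning task:
# reach TARGET from START in at most MAX_STEPS operations.
START, TARGET, MAX_STEPS = 1, 10, 5
ACTIONS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2}


@dataclass(eq=False, repr=False)
class Node:
    value: int                      # current intermediate result
    path: tuple = ()                # reasoning steps taken so far
    parent: Optional["Node"] = None
    children: dict = field(default_factory=dict)
    visits: int = 0
    total_reward: float = 0.0

    def is_terminal(self) -> bool:
        # Stop on reaching or overshooting the target, or when out of steps.
        return self.value >= TARGET or len(self.path) >= MAX_STEPS

    def ucb1(self, c: float = 1.4) -> float:
        # UCB1: average reward plus an exploration bonus for rarely visited nodes.
        if self.visits == 0:
            return math.inf
        exploit = self.total_reward / self.visits
        explore = c * math.sqrt(math.log(self.parent.visits) / self.visits)
        return exploit + explore


def reward(value: int) -> float:
    # Verifiable reward: 1 only for the correct final answer.
    return 1.0 if value == TARGET else 0.0


def rollout(value: int, path: tuple, rng: random.Random) -> tuple:
    # Simulation: apply random steps until a terminal state.
    while value < TARGET and len(path) < MAX_STEPS:
        name = rng.choice(list(ACTIONS))
        value, path = ACTIONS[name](value), path + (name,)
    return value, path


def mcts(iterations: int = 2000, seed: int = 0):
    rng = random.Random(seed)
    root = Node(START)
    solutions = set()  # correct reasoning paths found during search
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while not node.is_terminal() and len(node.children) == len(ACTIONS):
            node = max(node.children.values(), key=Node.ucb1)
        # 2. Expansion: add one untried step as a new child.
        if not node.is_terminal():
            name = next(a for a in ACTIONS if a not in node.children)
            child = Node(ACTIONS[name](node.value), node.path + (name,), node)
            node.children[name] = child
            node = child
        # 3. Simulation: score the node with a random rollout.
        final_value, final_path = rollout(node.value, node.path, rng)
        r = reward(final_value)
        if r > 0:
            solutions.add(final_path)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += r
            node = node.parent
    return root, solutions


root, solutions = mcts()
for steps in sorted(solutions):
    print(" ".join(steps))
```

The collected `solutions` play the role of verified reasoning traces; in a data-synthesis setting, the visit statistics could also be used to prefer the paths the search found most reliable.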

Benchmarks and performance

The model demonstrates strong performance on specialized reasoning tasks, particularly mathematics. According to ELYZA's technical blog, it achieves 80.8% on the English MATH-500 benchmark, slightly outperforming OpenAI's o1-mini (80.0%). It is also stronger at Japanese mathematical reasoning, scoring 78.6% on the translated MATH-500 benchmark versus o1-mini's 77.2%. On Japanese-specific knowledge, it scores 73.1% on JMMLU_small, matching o1-mini. Its dialogue capability, measured by Japanese MT-Bench, is 7.67/10, again on par with o1-mini, and its instruction-following score on ELYZA Tasks 100 is 4.17/5.

However, its performance is not uniform across domains. On the Japanese JHumanEval coding benchmark it scores 62.2%, comparable to o1-mini but below the 63.4% achieved by DeepSeek-R1-Distill-Qwen-32B. More notably, this score is a regression from the base Qwen2.5-32B-Instruct model, which is attributed to insufficient coding tasks in the training data.

| Benchmark (Language) | ELYZA-Thinking-1.0 | OpenAI o1-mini | DeepSeek-R1-Distill | QwQ-32B |
|---|---|---|---|---|
| MATH-500 (English) | **80.8%** | 80.0% | 79.0% | 78.0% |
| MATH-500 (Japanese) | **78.6%** | 77.2% | 77.8% | 76.8% |
| JHumanEval (Japanese) | 62.2% | 62.2% | **63.4%** | 61.0% |
| JMMLU_small (Japanese) | **73.1%** | 73.1% | 70.7% | 72.3% |
| Japanese MT-Bench | 7.67 | 7.67 | 7.67 | 7.67 |
| ELYZA Tasks 100 | 4.17 | 4.17 | 4.17 | 4.17 |
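To put the MATH-500 margins in perspective: the benchmark has 500 problems, so each percentage point is five problems. The sketch below converts the reported scores into problem counts and computes a simple binomial standard error, assuming each score is single-run accuracy on 500 problems (the translated Japanese set is assumed to keep all 500). Under those assumptions it reports 404 vs 400 solved (+4) in English and 393 vs 386 (+7) in Japanese, with a standard error of about 1.8 percentage points per score, which supports describing the lead over o1-mini as slight.

```python
import math

MATH500_SIZE = 500  # number of problems in MATH-500


def solved(accuracy_pct: float, n: int = MATH500_SIZE) -> int:
    """Convert a reported accuracy into a number of solved problems."""
    return round(accuracy_pct * n / 100)


def std_error_pp(accuracy_pct: float, n: int = MATH500_SIZE) -> float:
    """Binomial standard error of an accuracy, in percentage points."""
    p = accuracy_pct / 100
    return 100 * math.sqrt(p * (1 - p) / n)


# Reported scores from the table: (ELYZA-Thinking-1.0, o1-mini).
scores = {"MATH-500 (English)": (80.8, 80.0), "MATH-500 (Japanese)": (78.6, 77.2)}
for name, (elyza, o1_mini) in scores.items():
    gap = solved(elyza) - solved(o1_mini)
    print(f"{name}: {solved(elyza)} vs {solved(o1_mini)} solved (+{gap}), "
          f"standard error ≈ {std_error_pp(elyza):.1f} pp")
```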

Detailed comparison

Head-to-head with its main competitors:

1. **vs. OpenAI o1-mini**: ELYZA-Thinking-1.0-Qwen-32B matches or slightly surpasses o1-mini on mathematical and Japanese-language reasoning benchmarks. Its critical advantage is that it is a fully open-weight model under Apache 2.0, allowing local deployment, modification, and commercial use without API fees, whereas o1-mini is proprietary and API-only.
2. **vs. DeepSeek-R1-Distill-Qwen-32B**: Both are open-source reasoning models based on Qwen2.5-32B. ELYZA scores higher in both Japanese and English mathematics (80.8% vs. 79.0% on MATH-500 English). DeepSeek's model, however, keeps a slight edge in coding (63.4% vs. 62.2% on JHumanEval) and was used as a component in ELYZA's final model merge to boost English performance.
3. **vs. QwQ-32B**: ELYZA scores consistently higher across all mathematical and knowledge benchmarks. For instance, on MATH-500 English, ELYZA scores 80.8% versus QwQ's 78.0%, indicating stronger overall reasoning capability.

**Key Differentiator**: ELYZA's development process, which uses MCTS for synthetic data generation, and its release of the complementary "Shortcut Model" (optimized for fast, direct responses) set it apart. The Shortcut model alone is claimed to be competitive with GPT-4o on general tasks, giving a two-model ecosystem from a single development pipeline.
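The Shortcut models are described as trained on the same problems without the lengthy reasoning. One way to picture that data relationship is to take a reasoning-model sample, separate the reasoning from the final answer, and keep only the answer as the training target. The sketch assumes the reasoning is delimited by `<think>...</think>` tags, as in DeepSeek-R1-style models; this model's actual output format is an assumption to check against its model card. `split_reasoning` and `to_shortcut_example` are hypothetical helpers, not ELYZA's pipeline. The same split is also handy at inference time for displaying the Japanese reasoning separately from the answer.

```python
import re

# Assumption: the reasoning trace is wrapped in <think>...</think> before the final answer.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(completion: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no tagged block is found."""
    match = THINK_RE.search(completion)
    if match is None:
        return "", completion.strip()
    reasoning = match.group(1).strip()
    answer = completion[match.end():].strip()
    return reasoning, answer


def to_shortcut_example(prompt: str, completion: str) -> dict:
    """Turn a reasoning-model sample into a direct-response training pair."""
    _, answer = split_reasoning(completion)
    return {"prompt": prompt, "response": answer}


sample = "<think>2を3回掛けると8になる。</think>答えは8です。"
print(to_shortcut_example("2の3乗は？", sample))
# {'prompt': '2の3乗は？', 'response': '答えは8です。'}
```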

Community reception

The model's release was covered by major Japanese tech publications such as ZDNET Japan and listed in AI model directories such as LLM Explorer. Community reception highlights appreciation for Japan's contribution to open-source reasoning models and for the use of MCTS to generate high-quality synthetic data. Developers cite the permissive license as a major advantage for commercial applications in the Japanese market. However, practical adoption may be limited by the high VRAM requirement (65.8 GB for unquantized weights) and the model's specialized nature. It is seen as a valuable tool for research and specific enterprise applications (e.g., complex Japanese mathematical or logical problem-solving) rather than as a general-purpose chatbot. The release of the more efficient "Shortcut Model" alongside it, reflecting the need for different performance/cost trade-offs, has been noted positively.

Use cases

1. **Complex Mathematical Problem-Solving in Japanese**: Ideal for educational technology platforms, academic research, or fintech applications requiring rigorous, step-by-step logical reasoning in Japanese. For example, solving competition-level math problems or generating detailed proofs.
2. **Specialized Logical Analysis in Enterprise Workflows**: Can be integrated into internal tools for legal contract analysis, financial report auditing, or technical documentation review where deep, multi-step reasoning in Japanese is required to derive conclusions from complex information.
3. **Research & Development in Reasoning AI**: The open-weight model and detailed technical blog make it an excellent resource for researchers studying Chain-of-Thought training, MCTS-based data synthesis, and the distillation of reasoning capabilities into "Shortcut" models.
4. **Choose this model over alternatives when:** Your primary task involves complex reasoning or mathematics in Japanese, and you require a commercially licensed, self-hostable model. Opt for the accompanying Shortcut model if you need fast, direct responses for general Japanese language tasks without the overhead of a lengthy reasoning process.