
Alibaba Unveils Qwen3.7-Max: Autonomous AI Agent Achieves 10x Speedup in 35-Hour Kernel Optimization

On May 20, 2026, Alibaba officially launched Qwen3.7-Max at the Alibaba Cloud Summit.


Overview

Qwen3.7-Max is Alibaba's latest flagship model, deeply optimized for agent scenarios. It delivers top-tier performance in programming, reasoning, office automation, and long-duration task execution, with overall capabilities on par with leading international models such as GPT, Claude, and Gemini.

[Figure: Qwen3.7-Max Arena AI rankings]

[Figure: Qwen3.7-Max subcategory rankings]


Key Benchmark Results

Programming Agent

Benchmark                    Qwen3.7-Max   DeepSeek-v4-Pro Max   Claude Opus 4.7 Max
TerminalBench 2.0-Terminus   69.7          67.9                  65.4
SWE-Multilingual             78.3          -                     -
SWE-Pro                      60.6          -                     -

General Agent

Benchmark             Qwen3.7-Max   Claude Opus 4.6   GLM 5.1
MCP-Atlas             76.4          75.8              -
MCP-Mark              60.8          -                 57.5
SpreadSheetBench-v1   87.0          -                 -

Reasoning Ability

Benchmark      Qwen3.7-Max   Claude Opus 4.6
GPQA Diamond   92.4          91.3
HLE            41.4          40.0

35-Hour Autonomous Kernel Optimization Experiment

The most notable result reported for Qwen3.7-Max is a fully autonomous hardware optimization run that lasted 35 hours.

Alibaba tasked Qwen3.7-Max with optimizing inference kernels for the T-Head ZhenWu M890, a chip absent from its training data. Without human intervention, the model worked continuously for 35 hours and ultimately made the Triton operators 10x faster than the official reference implementation.

Experiment Details

  • Chip: T-Head ZhenWu M890 (no training data exposure)
  • Work Duration: 35 hours continuous
  • Tool Invocations: 1,158
  • Kernel Evaluations: 432
  • Final Result: 10x performance improvement over the official reference implementation
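
As a rough sanity check, the averages implied by these counts can be worked out directly. This is a minimal sketch that assumes activity was spread evenly over the 35 hours, which the source does not state; the function and key names are illustrative.

```python
# Totals reported in the experiment details.
HOURS = 35
TOOL_CALLS = 1158
KERNEL_EVALS = 432


def run_averages(hours, tool_calls, kernel_evals):
    """Return average rates implied by the totals (assumes an even spread)."""
    return {
        "tool_calls_per_hour": tool_calls / hours,
        "kernel_evals_per_hour": kernel_evals / hours,
        "minutes_per_kernel_eval": hours * 60 / kernel_evals,
        "tool_calls_per_kernel_eval": tool_calls / kernel_evals,
    }


if __name__ == "__main__":
    for name, value in run_averages(HOURS, TOOL_CALLS, KERNEL_EVALS).items():
        print(f"{name}: {value:.1f}")
```

Under that assumption, the run averaged roughly 33 tool calls and 12 kernel evaluations per hour, or about one kernel evaluation every five minutes.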

Comparison with Other Models

Model             Geometric Mean Speedup
Qwen3.7-Max       10.0x
GLM 5.1           7.3x
Kimi K2.6         5.0x
DeepSeek V4 Pro   3.3x (interrupted midway)
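
The comparison is stated as a geometric mean speedup, which averages per-kernel speedups multiplicatively so that one very large gain does not dominate the result. The source does not list per-kernel numbers, so the values in this sketch are hypothetical and serve only to illustrate the calculation; `geometric_mean_speedup` is an illustrative helper, not part of any Qwen or Triton API.

```python
import math


def geometric_mean_speedup(speedups):
    """Geometric mean of per-kernel speedups (baseline time / optimized time)."""
    if not speedups:
        raise ValueError("need at least one speedup")
    if any(s <= 0 for s in speedups):
        raise ValueError("speedups must be positive")
    # Average in log space, then exponentiate: exp(mean(log(s_i))).
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))


# Hypothetical per-kernel speedups, NOT taken from the source.
example = [2.0, 8.0, 4.0]
print(f"geometric mean:  {geometric_mean_speedup(example):.2f}x")
print(f"arithmetic mean: {sum(example) / len(example):.2f}x")
```

The geometric mean is the usual choice for ratios like speedups because doubling one kernel's speed and halving another's cancel out exactly, which an arithmetic mean would not reflect.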

Artificial Analysis Ranking

According to the latest ranking from third-party evaluator Artificial Analysis:

  • Overall Score: 56.6 points
  • Global Ranking: 5th place
  • Among Chinese Models: 1st place
  • Gain over Previous Generation: +4.8 points

The top positions are held by models such as GPT-5.4 (xhigh), Gemini 3.1 Pro Preview, and Claude Opus 4.7 (max).


Release Pace

The Qwen series maintains a rapid iteration pace:

[Figure: Qwen3.7-Max release timeline]

Date             Model                 Theme
March 20, 2026   Qwen3.5-Max-Preview   Toward Native Multimodal Agents
April 20, 2026   Qwen3.6-Max-Preview   Toward Real-World Agents
May 20, 2026     Qwen3.7-Max           New Benchmark for the Agent Era

A flagship model has been released each month, consistently raising the performance ceiling for Chinese models.


Conclusion

Qwen3.7-Max is a new-generation flagship model specialized in agent capabilities, with top-tier performance across programming, reasoning, and office automation. In particular, the 35-hour autonomous kernel optimization experiment is a significant milestone in demonstrating that AI models can operate autonomously over long durations.
