Blog
Alibaba Open-Sources Qwen3.6-27B: Surpassing Previous-Generation Flagships in Code Agent Capabilities
Alibaba has open-sourced its new "Qwen3.6-27B" model. It is the only dense-architecture model in the series and outperforms previous-generation flagship models, particularly in code agent capabilities.
Alibaba Open-Sources "Qwen3.6-35B-A3B": Agent Performance Greatly Improved with 3B Active Parameters
Alibaba has released "Qwen3.6-35B-A3B," the first open-weight model of the Qwen3.6 series. Its Mixture-of-Experts (MoE) architecture keeps costs low while significantly improving agentic coding capabilities, bringing its performance in line with previous-generation flagship models.
Alibaba Open-Sources "Qwen3-Coder-Next": 80B MoE with Only 3B Active Parameters, Specialized for Agentic Coding
Alibaba has released "Qwen3-Coder-Next," an efficient coding model with 80B total parameters that activates only 3B during inference. It is designed specifically for "Agentic Coding," focusing on autonomous self-correction cycles rather than simple code generation.
Alibaba Open-Sources "Qwen3-TTS," a Large Language Model for Speech Synthesis: Five High-Performance Lightweight Models Released
Alibaba has released its first open-source speech synthesis model family, "Qwen3-TTS." The models range from only 0.6B to 1.7B parameters, yet they achieve performance comparable to commercial models such as GPT-4o-Audio and can run on mobile devices.
StepFun Open-Sources Ultra-Fast MoE Model "Step-3.5-Flash": Reaching up to 350 tokens/s with 11B Active Parameters
StepFun has released "Step-3.5-Flash," a sparse MoE model with 11B active parameters. It reaches inference speeds of up to 350 tokens/s while maintaining performance comparable to competing models such as Kimi K2.5.
Ultra-Fast AI Guardrail "GLiGuard" Arrives: 300M-Parameter Model Achieves Performance Comparable to Giant Models
GLiGuard, a compact guardrail model with 300 million parameters, has been introduced. Thanks to its encoder-based design, it runs up to 16 times faster than conventional large decoder models while matching or exceeding their safety classification accuracy.
Jointly Developed by Sakana AI and NVIDIA: What Is "TwELL," Which Uses Unstructured Sparsity to Speed Up LLM Inference and Training by 20%?
Sakana AI and NVIDIA announced "TwELL," a new format that efficiently processes unstructured sparsity in the FFN layers of LLMs. It achieves a speed increase of over 20% for both inference and training on H100 GPUs.
Comprehensive Summary of GPT-5.2 Benchmarks: Thorough Verification of Coding and Reasoning Performance
This post provides a detailed analysis of all benchmark results for GPT-5.2, released by OpenAI in April 2026. We explain its performance on major benchmarks including HLE, SWE-bench Verified, and FrontierMath.
The Great Price War: A May 2026 Deep Dive into AI Model API Costs and Strategic Selection
As of May 2026, the LLM API market is characterized by intense price wars and performance leaps. This deep dive analyzes the pricing structures of major models like GPT-5.2 Pro, Claude Opus 4.7, and Gemini 3.0 Flash, exploring the factors driving costs down and providing a strategic framework for developers to select the right model based on their specific use case and budget.
Claude Opus 4.7 Unveiled: Inside the Mythos Architecture and Managed Agents
Anthropic's Claude Opus 4.7 introduces the Mythos architecture for efficient long-context processing and Managed Agents for autonomous task completion. It posts top scores on coding and reasoning benchmarks and aims to handle complex workflows cost-effectively.