A curated, living guide to the best open-source models across capability domains. Updated from live research on public leaderboards, benchmarks, and Hugging Face releases. Focused on models that are free, production-ready, and genuinely useful for vertical domain projects.

Last verified: 2026-04-19
Methodology: Cross-referenced BenchLM.ai, the Onyx open-weight leaderboard, the HF Open ASR Leaderboard, MTEB, SWE-bench Verified / Pro, Seed-TTS Eval, Artificial Analysis, and bentoml.com guides. See Sources at bottom.


1. Frontier Reasoning

For agentic workflows, multi-step planning, and general-purpose high-quality reasoning.

| Model | Provider | Params | License | Best For | Highlights (2026-04) |
| --- | --- | --- | --- | --- | --- |
| GLM-5.1 | Zhipu AI | MoE (undisclosed active) | Open weight | Agentic engineering, long-horizon SWE | #1 on BenchLM.ai open-weight leaderboard (84 overall); purpose-built for agent loops |
| GLM-5 (Reasoning) | Zhipu AI | MoE | Open weight | Multi-step reasoning | Leaderboard top at 85 overall; reasoning-tuned variant |
| Qwen3.5-397B-A17B | Alibaba | 397B MoE · 17B active | Apache 2.0 | Multimodal reasoning, ultra-long ctx | Flagship MoE, 81 on leaderboard; 119 languages |
| DeepSeek-V3.2 | DeepSeek | 685B MoE | MIT | General reasoning + tool use | Very cheap inference (~$0.28/M tokens on hosted APIs); 73.1% SWE-bench |
| DeepSeek-V3.2-Speciale | DeepSeek | 685B MoE | MIT | Hard math/competition reasoning | Surpasses GPT-5 and reaches Gemini-3.0-Pro-level on AIME / HMMT 2025 |
| Llama 4 Maverick | Meta | 400B MoE · 17B active | Llama 4 License | General purpose + 1M ctx | Strong all-rounder with cheap per-token cost (17B active) |
| MiniMax-M2.7 | MiniMax | Large MoE | Open weight | Agentic workflows | Actively refines its own agent system; agent-native design |
| Gemma 4 26B | Google | 26B dense | Gemma License | Consumer-hardware reasoning | Only Western model in top 5; 85 t/s on M3 Max / RTX 4090 |

2. Coding

Bug fixing, feature implementation, SWE-bench. Ranked by SWE-bench Verified.

| Model | Provider | Params | License | SWE-bench Verified | Highlights |
| --- | --- | --- | --- | --- | --- |
| Kimi K2.5 | Moonshot AI | MoE | MIT* | 76.8% | #1 open-source on SWE-bench Verified (Jan 2026 release). *Commercial restriction at 100M+ MAU |
| GLM-5.1 | Zhipu AI | MoE | Open weight | Leads SWE-bench Pro (Verified score not listed) | Best on SWE-bench Pro + Terminal Bench (full agentic SWE) |
| GLM-4.7 | Zhipu AI |  | Open weight | 73.8% | Runs on consumer hardware (single RTX 4090, quantized); 94.2 HumanEval |
| DeepSeek-V3.2 | DeepSeek | 685B MoE | MIT | 73.1% | More consistent than GLM-4.7 across direct comparisons |
| Qwen3-Coder-480B-A35B | Alibaba | 480B MoE · 35B active | Apache 2.0 | 70.6% | Qwen's flagship coder; strong on repo-scale edits |
| Qwen3-Coder-30B-A3B | Alibaba | 30B MoE · 3B active | Apache 2.0 | ~65% | Best self-hostable coder; runs on 24GB VRAM |
| Qwen2.5-Coder-32B | Alibaba | 32B dense | Apache 2.0 | ~56% | Mature option, largest LoRA ecosystem |

3. Math & Reasoning

Competition math (AIME, HMMT), chain-of-thought, self-correction.

| Model | Provider | Params | License | Best For | Highlights |
| --- | --- | --- | --- | --- | --- |
| Step-3.5-Flash | StepFun | 196B | Open weight | AIME/HMMT | AIME 2025: 97.3; tied for highest on full leaderboard at modest size |
| GLM-4.7 (Reasoning) | Zhipu AI |  | Open weight | Competition math | AIME 2025: 97.3; strong open-weight competitor |
| DeepSeek-V3.2-Speciale | DeepSeek | 685B MoE | MIT | Proof-style math | GPT-5-level on AIME/HMMT; HMMT-specialized variant |
| DeepSeek-R1 | DeepSeek | 671B MoE | MIT | o1-level reasoning | Classic R1 still competitive; long thinking chains |
| Qwen3 235B (Thinking) | Alibaba | 235B MoE · 22B active | Apache 2.0 | All-round reasoning | AIME 2025: 92.3; /think mode |
| QwQ-32B | Alibaba | 32B dense | Apache 2.0 | Self-hostable reasoning | Rivals DeepSeek-R1 / o1-mini at 32B; ~64GB for BF16 weights |
| DeepSeek-Math-V2 | DeepSeek | 671B MoE | MIT | Formal math, proofs | Proof generation, Lean integration |
| GLM-Z1-9B | Zhipu AI | 9B | Open weight | Lightweight math | Runs on a single RTX 3090; punches well above its weight |

4. Multimodal / Vision

Vision-language models (VLMs): image understanding, document / chart parsing, visual agents.

| Model | Provider | Params | License | Best For | Highlights |
| --- | --- | --- | --- | --- | --- |
| Qwen3-VL-235B-A22B | Alibaba | 235B MoE · 22B active | Apache 2.0 | Frontier VLM | Rivals Gemini-2.5-Pro / GPT-5 on multimodal benchmarks |
| GLM-4.6V | Zhipu AI |  | Open weight | Visual agent, tool use | End-to-end visual tool use, 128K context |
| GLM-4.5V | Zhipu AI |  | Open weight | Everyday VLM | Smaller but strong agentic abilities |
| InternVL3-78B | Shanghai AI Lab | 78B | Apache 2.0 | Document, chart, 3D vision | MMMU: 72.2; SOTA among open-source |
| Qwen2.5-VL-32B | Alibaba | 32B | Apache 2.0 | OCR, structured extraction | Fits single A100 80GB |
| Qwen2.5-VL-7B | Alibaba | 7B | Apache 2.0 | Self-host VLM | Runs on 16GB VRAM |

5. Long Context

When you need to ingest full books, codebases, or document archives in one shot.

| Model | Provider | Context | License | Notes |
| --- | --- | --- | --- | --- |
| Llama 4 Scout | Meta | 10M tokens | Llama 4 License | 109B MoE · 17B active. Industry-leading context; handles entire codebases |
| MiniMax-Text-01 | MiniMax | 4M tokens | Open weight | Sits between the 1M-token models and Scout's 10M |
| Qwen2.5-1M | Alibaba | 1M tokens | Apache 2.0 | 7B/14B variants; proven stability at 1M |
| Llama 4 Maverick | Meta | 1M tokens | Llama 4 License | 400B MoE · 17B active |
| Qwen3.5-397B-A17B | Alibaba | ≥1M tokens | Apache 2.0 | Flagship, multimodal + long ctx |
| Qwen3.5-0.8B | Alibaba | 262K tokens | Apache 2.0 | Long ctx at edge-device scale |

6. Agent / Tool Use

Optimized for function calling, multi-tool orchestration, browser use.

| Model | Provider | Params | License | Highlights |
| --- | --- | --- | --- | --- |
| GLM-4.5-Air | Zhipu AI |  | Open weight | Purpose-built for agent workflows; optimized tool use + web browsing |
| Qwen3-30B-A3B-Thinking-2507 | Alibaba | 30B MoE · 3B active | Apache 2.0 | Complex reasoning agents with /think mode |
| Qwen3-Coder-30B-A3B | Alibaba | 30B MoE · 3B active | Apache 2.0 | Agentic coding workflows (IDE-level) |
| MiniMax-M2.7 | MiniMax | MoE | Open weight | Self-refines its own agent system |
| Hermes-4 70B | Nous Research | 70B | Llama License | Best open tool-calling on Llama base |
| xLAM-2 (series) | Salesforce | 8B–70B | CC-BY-NC | Function-calling specialist; APIGen-trained |

7. Edge & Mobile (≤7B)

7B parameters or fewer; runs on phones, laptops, and Jetson-class hardware.

| Model | Provider | Params | License | Best For | Highlights |
| --- | --- | --- | --- | --- | --- |
| Phi-4-mini | Microsoft | 3.8B | MIT | Tool calling, structured output | Reasoning on par with 7–9B models |
| Gemma 3 4B | Google | 4B | Gemma License | Apple M-series / Snapdragon | 20-30 t/s at 4-bit; audio + video input |
| Gemma 3 2B | Google | 2B | Gemma License | Ultra-lightweight | For truly tiny devices |
| Qwen3.5-0.8B | Alibaba | 0.8B | Apache 2.0 | Multilingual micro-agent | 262K context + thinking at 0.8B |
| SmolLM3-3B | Hugging Face | 3B | Apache 2.0 | General edge | Beats Llama-3.2-3B and Qwen2.5-3B |
| Llama 3.2-3B | Meta | 3B | Llama License | Mobile | Proven, enormous ecosystem |
| Llama 3.2-1B | Meta | 1B | Llama License | Tiny fallback | 4GB RAM at 4-bit |

8. Embedding & Retrieval

Semantic search, RAG retrieval. Ranked for production RAG.

| Model | Provider | Params | License | Best For | Highlights |
| --- | --- | --- | --- | --- | --- |
| BGE-M3 | BAAI | 568M | MIT | Multilingual RAG default | Dense + sparse + ColBERT in one model; 100+ languages |
| Qwen3-Embedding-8B | Alibaba | 8B | Apache 2.0 | Top MTEB | Leads MTEB-multilingual open-source category |
| Nomic Embed Text v2 | Nomic AI | MoE | Apache 2.0 | Multilingual retrieval | First MoE embedding model; 100 languages |
| gte-multilingual-base | Alibaba | 305M | Apache 2.0 | Balanced quality/size | Strong multilingual; efficient |
| Jina Embeddings v3 | Jina AI | 570M | CC-BY-NC 4.0 | Long documents | 8192 ctx + late chunking |
| mxbai-embed-large-v1 | Mixedbread | 335M | Apache 2.0 | English MRL | Matryoshka Representation Learning |
| nomic-embed-text-v1.5 | Nomic AI | 137M | Apache 2.0 | Tiny embedder | 8K context; efficient CPU inference |
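
Matryoshka Representation Learning (MRL), listed for mxbai-embed-large-v1 above, trains embeddings so that a leading prefix of the vector remains useful on its own. The sketch below shows the usual way such a vector is shortened: keep the first components and rescale to unit length. The function names and the toy 8-dimensional vectors are illustrative assumptions, not output from any listed model, and truncation like this is only meaningful for models trained with MRL.

```python
import math


def truncate_embedding(vector: list[float], dims: int) -> list[float]:
    """Keep the first `dims` components and rescale to unit length."""
    if not 0 < dims <= len(vector):
        raise ValueError("dims must be between 1 and len(vector)")
    head = vector[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0:
        raise ValueError("cannot normalise an all-zero prefix")
    return [x / norm for x in head]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(x * y for x, y in zip(a, b))


# Toy vectors standing in for real model output (assumption for illustration).
doc = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
query = [0.5, 0.6, 0.3, 0.4, 0.1, 0.3, 0.2, 0.0]

full = cosine(truncate_embedding(doc, 8), truncate_embedding(query, 8))
short = cosine(truncate_embedding(doc, 4), truncate_embedding(query, 4))
print(f"8 dims: {full:.3f}, 4 dims: {short:.3f}")
```

Shorter vectors reduce index size and search cost; how much retrieval quality is lost depends on the model and should be measured on your own data.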

9. Speech-to-Text (ASR)

Transcription, captioning, voice-interface STT.

| Model | Provider | Params | License | Best For | Highlights |
| --- | --- | --- | --- | --- | --- |
| NVIDIA Canary Qwen 2.5B | NVIDIA | 2.5B | CC-BY 4.0 | English + Q&A | #1 on HF Open ASR Leaderboard at 5.63% WER; dual transcribe/analyze |
| NVIDIA Parakeet RNNT 1.1B | NVIDIA | 1.1B | CC-BY 4.0 | English streaming | LibriSpeech 1.8% WER; lowest of any open model |
| Whisper Large V3 Turbo | OpenAI | 809M | MIT | 99 languages | Best multilingual; real-time capable |
| IBM Granite Speech 3.3 | IBM | 8B | Apache 2.0 | Enterprise | ~5.85% WER; commercial-friendly license |
| Qwen3-ASR | Alibaba | 1.7B / 0.6B | Apache 2.0 | Multilingual + dialects | 52 languages & dialects; Jan 2026 |
| Distil-Whisper | Hugging Face (Whisper-based) |  | MIT | Streaming | Low-latency distilled Whisper |
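
The WER figures above are word error rates: the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that calculation for checking a model on your own audio. The function name is illustrative, and it only lowercases the text; public leaderboards typically apply additional text normalization before scoring, so numbers from this sketch will not match theirs exactly.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Word-level Levenshtein distance, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One dropped word out of six reference words.
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```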

10. Text-to-Speech (TTS) & Voice Cloning

| Model | Provider | Params | License | Best For | Highlights |
| --- | --- | --- | --- | --- | --- |
| Fish Audio S2 Pro | Fish Audio |  | Apache 2.0 | Multilingual + cloning | Beats Google/OpenAI on Seed-TTS Eval; 80+ languages zero-shot |
| Fish Speech V1.5 | Fish Audio |  | Apache 2.0 | Proven multilingual | Stable previous generation, huge ecosystem |
| CosyVoice2-0.5B | Alibaba | 0.5B | Apache 2.0 | Streaming | 150ms latency for real-time voice UX |
| IndexTTS-2 | Bilibili Index |  | Apache 2.0 | Video dubbing | SOTA zero-shot; precise duration & emotion control |
| Kokoro-82M | hexgrad | 82M | Apache 2.0 | Efficiency | MOS 4.2 at 82M; best quality/size ratio |

11. Image Generation

| Model | Provider | License | Best For | Highlights |
| --- | --- | --- | --- | --- |
| FLUX.2 [dev] | Black Forest Labs | FLUX-1 Dev Non-Commercial | Photorealism + text rendering | Production-grade (Nov 2025); best text-in-image of open models |
| FLUX.1 [dev] | Black Forest Labs | FLUX-1 Dev Non-Commercial | Photorealism | Still the go-to for photoreal single images |
| Stable Diffusion 3.5 Large | Stability AI | Stability Non-Commercial | Quality improvement over SDXL | Quality leap over SDXL |
| SDXL | Stability AI | CreativeML Open RAIL++-M | LoRA ecosystem, custom fine-tunes | Battle-tested since 2023; huge community |
| HiDream | HiDream.ai | Open weight | High-res output | Strong on anime / illustration |

12. Video Generation

| Model | Provider | Params | License | Best For | Highlights |
| --- | --- | --- | --- | --- | --- |
| Wan 2.2 T2V-A14B | Alibaba | 14B active MoE | Apache 2.0 | T2V 480P/720P 5s | First open-source MoE video model; cinematic quality |
| HunyuanVideo | Tencent | 13B | Tencent License | Professional T2V | Beats Runway Gen-3 on benchmarks; mature ecosystem |
| Mochi 1 | Genmo |  | Apache 2.0 | Best T2V quality/license | Apache 2.0 gives full commercial freedom |
| CogVideoX-5B | Zhipu AI | 5B | Apache 2.0 | Image-to-Video | Best I2V; runs on 1× RTX 3090 |
| LTX-Video | Lightricks |  | OpenRAIL | Real-time | First DiT-based real-time video model |

Quick Selection

By Scenario

| Scenario | Recommendation | Why |
| --- | --- | --- |
| Customer Support | Qwen3.5 + BGE-M3 | Multilingual, with mature embedding retrieval |
| Coding (Cloud) | GLM-5.1 or Kimi K2.5 | Top open-source models on SWE-bench |
| Coding (Self-host) | GLM-4.7 or Qwen3-Coder-30B-A3B | Runs on consumer GPUs |
| Document & Chart | InternVL3-78B or Qwen3-VL-235B | Top MMMU scores; strong on tables and charts |
| Mobile App | Phi-4-mini or Gemma 3 4B | Runs in 4GB RAM; M-series friendly |
| RAG | DeepSeek-V3.2 + BGE-M3 | Strong reasoning and retrieval at very low cost |
| Math Tutor | QwQ-32B or Step-3.5-Flash | 32B for self-hosting / 196B for top accuracy |
| Voice UX | Whisper-V3-Turbo + CosyVoice2 | 99-language recognition + 150ms synthesis |
| Enterprise ASR | Canary Qwen 2.5B (English) or IBM Granite Speech 3.3 | Low WER + commercial-use license |
| Agent | GLM-4.5-Air or MiniMax-M2.7 | Tool calling + long-running tasks |
| Long Context | Llama 4 Scout (10M) or Qwen2.5-1M | Ingest an entire codebase in one pass |
| T2I | FLUX.2 [dev] (quality) or SDXL (ecosystem) | Choose SDXL if you need to fine-tune |
| T2V | Wan 2.2 (quality) or Mochi 1 (permissive license) | Choose Mochi for commercial use |

Hardware Quick Reference

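| Scale | Minimum VRAM (4-bit quantized) | Suitable hardware |
| --- | --- | --- |
| 0.8B – 3B | 2 – 4 GB | Phones · Raspberry Pi · Mac Mini |
| 7B – 8B | 6 – 8 GB | RTX 3060 · M1/M2/M3 Mac · GTX 1080 |
| 14B – 32B | 16 – 24 GB | RTX 4090 · A5000 · M3/M4 Max (64GB) |
| 70B dense | 40 – 48 GB | A6000 · 2×4090 · M3 Ultra (128GB) |
| MoE 30B-A3B | 16 – 24 GB (unquantized BF16 weights total ~60GB, but only 3B parameters are active) | RTX 4090 (weights need fast storage) |
| MoE 400B-A17B | 80 – 160 GB VRAM or 512 GB unified memory | Multi-GPU A100/H100 · M3 Ultra (512GB) |
| MoE 600B+ | 8× H100 80GB or equivalent | Enterprise deployment |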
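
The figures above can be sanity-checked with simple arithmetic: weight memory is roughly total parameters times bytes per weight. The sketch below is an estimate of the weights only, under the assumption that KV cache, activations, and runtime overhead are budgeted separately; the function name `weight_memory_gb` is illustrative. For MoE models the total parameter count is what must be stored, even though only the active parameters are computed per token.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: int = 4) -> float:
    """Approximate size of the model weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # Billions of parameters times bytes per parameter gives gigabytes.
    return total_params_billion * bytes_per_weight


examples = [
    ("8B dense", 8),
    ("32B dense", 32),
    ("70B dense", 70),
    ("MoE 30B-A3B", 30),
    ("MoE 400B-A17B", 400),
]
for name, params_b in examples:
    print(f"{name}: ~{weight_memory_gb(params_b):.0f} GB at 4-bit, "
          f"~{weight_memory_gb(params_b, 16):.0f} GB at BF16")
```

By this estimate a 30B model needs about 15 GB at 4-bit and 60 GB at BF16, while a 400B model needs about 200 GB at 4-bit, so running it in less GPU memory than that implies keeping part of the weights in system memory or on fast storage.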

Changelog

  • 2026-04-19: Full rewrite. Added GLM-5/5.1, Kimi K2.5, Step-3.5-Flash, Qwen3-VL-235B, Wan 2.2, FLUX.2, Canary Qwen 2.5B, Qwen3-ASR, Fish Audio S2 Pro, and Hermes-4. Added two new categories: Agent/Tool Use and Video Generation. Removed stale entries and corrected names (Qwen-QwQ-32B is now listed as QwQ-32B, etc.).
  • 2026-03: Initial version with 7 categories.

How to Add a Model

See CONTRIBUTING.md for the full template.

Hard criteria:

  • Must be open-source or open-weight with a clear license (commercial-use status noted if restricted)
  • Must have verifiable benchmark or well-documented real-world capability
  • Must be practically usable (weights on HF, runs on consumer or affordable cloud hardware)
  • Must not be vaporware: released and downloadable at time of listing
  • Include a Last verified date in the row so readers know how fresh the entry is
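
The criteria above lend themselves to a mechanical pre-check before a row is submitted. The sketch below is one possible shape for that check, not the project's actual tooling: `ModelEntry`, its field names, `check_entry`, and the example row are all hypothetical, and the authoritative template remains the one in CONTRIBUTING.md. Whether a benchmark is genuinely verifiable still needs a human reviewer.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ModelEntry:
    """Hypothetical row shape; the real template lives in CONTRIBUTING.md."""
    name: str
    license: str
    weights_url: str                      # where the weights can be downloaded
    evidence: str                         # benchmark result or documented capability
    last_verified: Optional[date] = None
    commercial_note: str = ""             # filled in when commercial use is restricted


def check_entry(entry: ModelEntry) -> list[str]:
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    if not entry.license.strip():
        problems.append("missing license")
    if not entry.evidence.strip():
        problems.append("missing benchmark or capability evidence")
    if not entry.weights_url.strip():
        problems.append("missing download location for weights")
    if entry.last_verified is None:
        problems.append("missing last-verified date")
    return problems


# Made-up entry used only to exercise the check.
row = ModelEntry(
    name="ExampleModel-7B",
    license="Apache 2.0",
    weights_url="",
    evidence="example benchmark: 70.0",
)
print(check_entry(row))
# ['missing download location for weights', 'missing last-verified date']
```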

Sources

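Cross-referenced at the 2026-04-19 verification pass: BenchLM.ai, Onyx open-weight leaderboard, HF Open ASR Leaderboard, MTEB, SWE-bench Verified / Pro, Seed-TTS Eval, Artificial Analysis, and bentoml.com guides.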

Last tool-verified on 2026-04-19 (Hong Kong / Beijing time). Model benchmarks and releases change weekly; re-verify before any critical production selection.
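
Because entries age quickly, a simple date comparison can flag rows that are due for another look. This is a minimal sketch: the function names are illustrative, and the 7-day default is an assumption chosen to match the weekly pace mentioned above, not a documented policy.

```python
from datetime import date


def days_since_verified(last_verified: date, today: date) -> int:
    """Whole days elapsed since the entry was last verified."""
    return (today - last_verified).days


def needs_reverification(last_verified: date, today: date,
                         max_age_days: int = 7) -> bool:
    """True when the entry is older than the allowed age."""
    return days_since_verified(last_verified, today) > max_age_days


verified = date(2026, 4, 19)
print(needs_reverification(verified, date(2026, 4, 22)))  # 3 days old
print(needs_reverification(verified, date(2026, 5, 19)))  # 30 days old
```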