Open-Source Model Catalog / 开源模型目录

A curated, living guide to the best open-source models across capability domains. Updated from live research on public leaderboards, benchmarks, and Hugging Face releases. Focused on models that are free, production-ready, and genuinely useful for vertical domain projects.

精选各能力域最佳开源模型指南——动态维护，基于公开排行榜、benchmark 和 Hugging Face 发布实时研究。聚焦于免费、可生产部署、对垂直领域项目真正有用的模型。

Last verified / 最后核验: 2026-04-19
Methodology / 方法: Cross-referenced BenchLM.ai, the Onyx open-weight leaderboard, HF Open ASR Leaderboard, MTEB, SWE-bench Verified / Pro, Seed-TTS Eval, Artificial Analysis, and bentoml.com guides. See Sources at bottom.

1. Frontier Reasoning / 前沿推理

For agentic workflows, multi-step planning, and general-purpose high-quality reasoning.

Model	Provider	Params	License	Best For	Highlights (2026-04)
GLM-5.1	Zhipu AI	MoE (undisclosed active)	Open weight	Agentic engineering, long-horizon SWE	#1 on BenchLM.ai open-weight leaderboard (84 overall) — purpose-built for agent loops
GLM-5 (Reasoning)	Zhipu AI	MoE	Open weight	Multi-step reasoning	Leaderboard top at 85 overall; reasoning-tuned variant
Qwen3.5-397B-A17B	Alibaba	397B MoE · 17B active	Apache 2.0	Multimodal reasoning, ultra-long ctx	Flagship MoE, 81 on leaderboard; 119 languages
DeepSeek-V3.2	DeepSeek	685B MoE	MIT	General reasoning + tool use	Very cheap inference (~$0.28/M tokens on hosted APIs); 73.1% SWE-bench Verified
DeepSeek-V3.2-Speciale	DeepSeek	685B MoE	MIT	Hard math/competition reasoning	Surpasses GPT-5 and reaches Gemini-3.0-Pro-level on AIME / HMMT 2025
Llama 4 Maverick	Meta	400B MoE · 17B active	Llama 4 License	General purpose + 1M ctx	Strong all-rounder with cheap per-token cost (17B active)
MiniMax-M2.7	MiniMax	Large MoE	Open weight	Agentic workflows	Actively refines its own agent system — agent-native design
Gemma 4 26B	Google	26B dense	Gemma License	Consumer-hardware reasoning	Only Western model in top 5; 85 t/s on M3 Max / RTX 4090

2. Coding / 代码

Bug fixing, feature implementation, SWE-bench. Ranked by SWE-bench Verified.

Model	Provider	Params	License	SWE-bench	Highlights
Kimi K2.5	Moonshot AI	MoE	MIT*	76.8%	#1 open-source on SWE-bench Verified (Jan 2026 release). *Commercial restriction at 100M+ MAU
GLM-5.1	Zhipu AI	MoE	Open weight	leads SWE-bench Pro	Best on SWE-bench Pro + Terminal Bench (full agentic SWE)
GLM-4.7	Zhipu AI	—	Open weight	73.8%	Runs on consumer hardware (single RTX 4090 quantized); 94.2 HumanEval
DeepSeek-V3.2	DeepSeek	685B MoE	MIT	73.1%	More consistent than GLM-4.7 across direct comparisons
Qwen3-Coder-480B-A35B	Alibaba	480B MoE · 35B active	Apache 2.0	70.6%	Qwen's flagship coder; strong on repo-scale edits
Qwen3-Coder-30B-A3B	Alibaba	30B MoE · 3B active	Apache 2.0	~65%	Best self-hostable coder — runs on 24GB VRAM
Qwen2.5-Coder-32B	Alibaba	32B dense	Apache 2.0	~56%	Mature option, largest LoRA ecosystem

3. Math & Reasoning / 数学与推理

Competition math (AIME, HMMT), chain-of-thought, self-correction.

Model	Provider	Params	License	Best For	Highlights
Step-3.5-Flash	StepFun	196B	Open weight	AIME/HMMT	AIME 2025: 97.3 — tied for highest on full leaderboard at modest size
GLM-4.7 (Reasoning)	Zhipu AI	—	Open weight	Competition math	AIME 2025: 97.3; strong open-weight competitor
DeepSeek-V3.2-Speciale	DeepSeek	685B MoE	MIT	Proof-style math	GPT-5-level on AIME/HMMT; HMMT-specialized variant
DeepSeek-R1	DeepSeek	671B MoE	MIT	o1-level reasoning	Classic R1 still competitive; long thinking chains
Qwen3 235B (Thinking)	Alibaba	235B MoE · 22B active	Apache 2.0	All-round reasoning	AIME 2025: 92.3; `/think` mode
QwQ-32B	Alibaba	32B dense	Apache 2.0	Self-hostable reasoning	Rivals DeepSeek-R1 / o1-mini at 32B; ~64GB of weights in BF16
DeepSeek-Math-V2	DeepSeek	671B MoE	MIT	Formal math, proofs	Proof generation, Lean integration
GLM-Z1-9B	Zhipu AI	9B	Open weight	Lightweight math	Runs on single RTX 3090; punches well above weight

4. Multimodal / Vision / 多模态与视觉

Vision-language models (VLMs): image understanding, document / chart parsing, visual agents.

Model	Provider	Params	License	Best For	Highlights
Qwen3-VL-235B-A22B	Alibaba	235B MoE · 22B active	Apache 2.0	Frontier VLM	Rivals Gemini-2.5-Pro / GPT-5 on multimodal benchmarks
GLM-4.6V	Zhipu AI	—	Open weight	Visual agent, tool use	End-to-end visual tool use, 128K context
GLM-4.5V	Zhipu AI	—	Open weight	Everyday VLM	Smaller but strong agentic abilities
InternVL3-78B	Shanghai AI Lab	78B	Apache 2.0	Document, chart, 3D vision	MMMU: 72.2 — SOTA among open-source
Qwen2.5-VL-32B	Alibaba	32B	Apache 2.0	OCR, structured extraction	Fits single A100 80GB
Qwen2.5-VL-7B	Alibaba	7B	Apache 2.0	Self-host VLM	Runs on 16GB VRAM

5. Long Context / 长上下文

When you need to ingest full books, codebases, or document archives in one shot.

Model	Provider	Context	License	Notes
Llama 4 Scout	Meta	10M tokens	Llama 4 License	109B MoE · 17B active. Industry-leading context; handles entire codebases
MiniMax-Text-01	MiniMax	4M tokens	Open weight	Between standard and Scout extreme
Qwen2.5-1M	Alibaba	1M tokens	Apache 2.0	7B/14B variants; proven stability at 1M
Llama 4 Maverick	Meta	1M tokens	Llama 4 License	400B MoE · 17B active
Qwen3.5-397B-A17B	Alibaba	≥1M	Apache 2.0	Flagship, multimodal + long ctx
Qwen3.5-0.8B	Alibaba	262K	Apache 2.0	Long ctx at edge device scale

6. Agent / Tool Use / Agent 与工具调用

Optimized for function calling, multi-tool orchestration, browser use.

Model	Provider	Params	License	Highlights
GLM-4.5-Air	Zhipu AI	—	Open weight	Purpose-built for agent workflows — optimized tool use + web browsing
Qwen3-30B-A3B-Thinking-2507	Alibaba	30B MoE · 3B active	Apache 2.0	Complex reasoning agents; thinking-only variant (always reasons before answering)
Qwen3-Coder-30B-A3B	Alibaba	30B MoE · 3B active	Apache 2.0	Agentic coding workflows (IDE-level)
MiniMax-M2.7	MiniMax	MoE	Open weight	Self-refines its own agent system
Hermes-4 70B	Nous Research	70B	Llama License	Best open tool-calling on Llama base
xLAM-2 (series)	Salesforce	8B–70B	CC-BY-NC	Function-calling specialist; APIGen-trained

7. Edge & Mobile (≤7B) / 边缘与移动端

7B params or fewer; runs on phones / laptops / Jetson-class hardware.

Model	Provider	Params	License	Best For	Highlights
Phi-4-mini	Microsoft	3.8B	MIT	Tool calling, structured output	Reasoning on par with 7–9B models
Gemma 3 4B	Google	4B	Gemma License	Apple M-series / Snapdragon	20-30 t/s at 4-bit; image + text input
Gemma 3 1B	Google	1B	Gemma License	Ultra-lightweight	For truly tiny devices
Qwen3.5-0.8B	Alibaba	0.8B	Apache 2.0	Multilingual micro-agent	262K context + thinking at 0.8B
SmolLM3-3B	Hugging Face	3B	Apache 2.0	General edge	Beats Llama-3.2-3B and Qwen2.5-3B
Llama 3.2-3B	Meta	3B	Llama License	Mobile	Proven, enormous ecosystem
Llama 3.2-1B	Meta	1B	Llama License	Tiny fallback	4GB RAM at 4-bit

8. Embedding & Retrieval / 嵌入与检索

Semantic search, RAG retrieval. Ranked for production RAG.

Model	Provider	Params	License	Best For	Highlights
BGE-M3	BAAI	568M	MIT	Multilingual RAG default	Dense + sparse + ColBERT in one model; 100+ languages
Qwen3-Embedding-8B	Alibaba	8B	Apache 2.0	Top MTEB	Leads MTEB-multilingual open-source category
Nomic Embed Text v2	Nomic AI	MoE	Apache 2.0	Multilingual retrieval	First MoE embedding model; 100 languages
gte-multilingual-base	Alibaba	305M	Apache 2.0	Balanced quality/size	Strong multilingual; efficient
Jina Embeddings v3	Jina AI	570M	CC-BY-NC 4.0	Long documents	8192 ctx + late chunking
mxbai-embed-large-v1	Mixedbread	335M	Apache 2.0	English MRL	Matryoshka Representation Learning
nomic-embed-text-v1.5	Nomic AI	137M	Apache 2.0	Tiny embedder	8K context; efficient CPU inference
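
Every embedder in the table above is used the same way at query time: documents and the query are mapped to vectors, and retrieval ranks documents by vector similarity. The sketch below shows that ranking step in plain Python with cosine similarity. The three-dimensional vectors, the document IDs, and the helper names `cosine` and `top_k` are illustrative assumptions, not output from any listed model; real embeddings have hundreds or thousands of dimensions.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          doc_vecs: Dict[str, Sequence[float]],
          k: int = 2) -> List[str]:
    """Return the IDs of the k documents most similar to the query."""
    scored = sorted(
        ((cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()),
        reverse=True,
    )
    return [doc_id for _, doc_id in scored[:k]]


# Toy vectors standing in for real model embeddings (hypothetical data).
docs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "api-docs": [0.0, 0.1, 0.9],
}
query = [0.8, 0.2, 0.0]

if __name__ == "__main__":
    print(top_k(query, docs, k=2))
```

In a production RAG system the linear scan would normally be replaced by a vector index, but the scoring idea is the same.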

9. Speech-to-Text (ASR) / 语音识别

Transcription, captioning, voice-interface STT.

Model	Provider	Params	License	Best For	Highlights
NVIDIA Canary Qwen 2.5B	NVIDIA	2.5B	CC-BY 4.0	English + Q&A	#1 on HF Open ASR Leaderboard — 5.63% WER; dual transcribe/analyze
NVIDIA Parakeet RNNT 1.1B	NVIDIA	1.1B	CC-BY 4.0	English streaming	LibriSpeech 1.8% WER — lowest of any open model
Whisper Large V3 Turbo	OpenAI	809M	MIT	99 languages	Best multilingual; real-time capable
IBM Granite Speech 3.3	IBM	8B	Apache 2.0	Enterprise	~5.85% WER; commercial-friendly license
Qwen3-ASR	Alibaba	1.7B / 0.6B	Apache 2.0	Multilingual + dialects	52 languages & dialects; Jan 2026
Distil-Whisper	Hugging Face	—	MIT	Streaming	Low-latency distilled Whisper
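
The WER figures in the table above are word error rates: the number of word-level substitutions, deletions, and insertions needed to turn the model's transcript into the reference, divided by the number of reference words. The following is a minimal sketch of that calculation, assuming simple whitespace tokenization. Leaderboards typically normalize casing and punctuation before scoring, which this sketch does not do, so its numbers are not directly comparable to published ones. The function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 5.63% therefore means roughly one word-level error for every 18 reference words.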

10. Text-to-Speech (TTS) & Voice Cloning / 语音合成与克隆

Model	Provider	Params	License	Best For	Highlights
Fish Audio S2 Pro	Fish Audio	—	Apache 2.0	Multilingual + cloning	Beats Google/OpenAI on Seed-TTS Eval; 80+ languages zero-shot
Fish Speech V1.5	Fish Audio	—	Apache 2.0	Proven multilingual	Stable previous generation, huge ecosystem
CosyVoice2-0.5B	Alibaba	0.5B	Apache 2.0	Streaming	150ms latency for real-time voice UX
IndexTTS-2	Bilibili Index	—	Apache 2.0	Video dubbing	SOTA zero-shot; precise duration & emotion control
Kokoro-82M	hexgrad	82M	Apache 2.0	Efficiency	MOS 4.2 at 82M — best quality/size ratio

11. Image Generation / 图像生成

Model	Provider	License	Best For	Highlights
FLUX.2 [dev]	Black Forest Labs	FLUX-1 Dev Non-Commercial	Photorealism + text rendering	Production-grade (Nov 2025); best text-in-image of open models
FLUX.1 [dev]	Black Forest Labs	FLUX.1 [dev] Non-Commercial License	Photorealism	Still the go-to for photoreal single images
Stable Diffusion 3.5 Large	Stability AI	Stability AI Community License	General-purpose generation	Quality leap over SDXL
SDXL	Stability AI	CreativeML Open RAIL++-M	LoRA ecosystem, custom fine-tunes	Battle-tested since 2023; huge community
HiDream	HiDream.ai	Open weight	High-res output	Strong on anime / illustration

12. Video Generation / 视频生成

Model	Provider	Params	License	Best For	Highlights
Wan 2.2 T2V-A14B	Alibaba	14B active MoE	Apache 2.0	T2V 480P/720P 5s	First open-source MoE video model; cinematic quality
HunyuanVideo	Tencent	13B	Tencent License	Professional T2V	Beats Runway Gen-3 on benchmarks; mature ecosystem
Mochi 1	Genmo	—	Apache 2.0	Best T2V quality/license	Apache 2.0 gives full commercial freedom
CogVideoX-5B	Zhipu AI	5B	CogVideoX License	Image-to-Video	Best I2V; runs on 1× 3090
LTX-Video	Lightricks	—	OpenRAIL	Real-time	First DiT-based real-time video model

选型速查 / Quick Selection

按场景推荐 / By Scenario

场景 Scenario	推荐方案 Recommendation	为什么 Why
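智能客服 Customer Support	Qwen3.5 + BGE-M3	Multilingual LLM plus mature embedding retrieval
代码助手 (云端) Coding (Cloud)	GLM-5.1 or Kimi K2.5	Top open models on SWE-bench
代码助手 (自建) Coding (Self-host)	GLM-4.7 or Qwen3-Coder-30B-A3B	Run on consumer GPUs
文档/图表 Document & Chart	InternVL3-78B or Qwen3-VL-235B	Top MMMU scores; strong on tables and charts
移动端 App Mobile	Phi-4-mini or Gemma 3 4B	Runs in 4 GB RAM; Apple M-series friendly
RAG 系统 RAG	DeepSeek-V3.2 + BGE-M3	Strong reasoning and retrieval at very low cost
数学教育 Math Tutor	QwQ-32B or Step-3.5-Flash	32B for self-hosting / 196B for top accuracy
语音交互 Voice UX	Whisper Large V3 Turbo + CosyVoice2-0.5B	99-language recognition + 150 ms synthesis latency
企业语音转录 Enterprise ASR	Canary Qwen 2.5B (English) or IBM Granite Speech 3.3	Low WER + commercially usable licenses
Agent 系统 Agent	GLM-4.5-Air or MiniMax-M2.7	Tool calling + long-running tasks
超长上下文 Long Context	Llama 4 Scout (10M) or Qwen2.5-1M	Ingest an entire codebase in one pass
文生图 T2I	FLUX.2 [dev] (quality) or SDXL (ecosystem)	Pick SDXL when you need custom fine-tuning
文生视频 T2V	Wan 2.2 (quality) or Mochi 1 (permissive license)	Pick Mochi 1 for commercial use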

硬件要求速查 / Hardware Quick Reference

模型规模 Scale	最低显存 Min VRAM (4-bit quantized)	适合硬件 Hardware
0.8B – 3B	2 – 4 GB	Phones · Raspberry Pi · Mac Mini
7B – 8B	6 – 8 GB	RTX 3060 · M1/M2/M3 Mac · GTX 1080
14B – 32B	16 – 24 GB	RTX 4090 · A5000 · M3/M4 Max (64GB)
70B dense	40 – 48 GB	A6000 · 2×4090 · M3 Ultra (128GB)
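MoE 30B-A3B	16 – 24 GB (about 60 GB of total weights in BF16; only 3B active per token)	RTX 4090 (weights need fast storage)
MoE 400B-A17B	80 – 160 GB VRAM or 512 GB unified memory	Multi-GPU A100/H100 · M3 Ultra (512GB)
MoE 600B+	8× H100 80GB or equivalent	Enterprise deployment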
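
The figures in this table follow from simple arithmetic: weight memory is roughly the parameter count times the bytes per weight. The sketch below makes that explicit. The 20% overhead for KV cache and activations is an assumed round number rather than a measured value, and the function names are illustrative. For MoE models the total parameter count determines memory, because every expert must be loaded; the active count mainly affects speed.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: int = 4) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    bytes_per_weight = bits_per_weight / 8
    # (params in billions * 1e9) * bytes per weight / 1e9 bytes per GB
    return total_params_billion * bytes_per_weight


def estimated_vram_gb(total_params_billion: float,
                      bits_per_weight: int = 4,
                      overhead_fraction: float = 0.2) -> float:
    """Weights plus an assumed fractional allowance for KV cache and activations."""
    weights = weight_memory_gb(total_params_billion, bits_per_weight)
    return weights * (1 + overhead_fraction)


if __name__ == "__main__":
    print(weight_memory_gb(70, 4))             # 35.0 GB of weights: 70B dense at 4-bit
    print(weight_memory_gb(30, 16))            # 60.0 GB: 30B-A3B MoE in BF16, all experts count
    print(round(estimated_vram_gb(70, 4), 1))  # 42.0 GB with the assumed 20% overhead
```

Long contexts can push the real requirement well above this estimate, since the KV cache grows with sequence length.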

重要变更记录 / Changelog

2026-04-19 — Full rewrite: added GLM-5/5.1, Kimi K2.5, Step-3.5-Flash, Qwen3-VL-235B, Wan 2.2, FLUX.2, Canary Qwen 2.5B, Qwen3-ASR, Fish Audio S2 Pro, Hermes-4. Added two new categories: Agent/Tool Use and Video Generation. Removed stale entries and corrected names (Qwen-QwQ-32B is now QwQ-32B, etc.).
2026-03 — Initial version with 7 categories.

How to Add a Model / 如何添加模型

See CONTRIBUTING.md for the full template.

Hard criteria:

Must be open-source or open-weight with a clear license (commercial-use status noted if restricted)
Must have verifiable benchmark or well-documented real-world capability
Must be practically usable (weights on HF, runs on consumer or affordable cloud hardware)
Must not be vaporware: released and downloadable at time of listing
Include a Last verified date in the row; readers need to know how fresh the information is
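
The criteria above can be checked mechanically before a row is submitted. The sketch below is one hypothetical way to do that. The `ModelEntry` fields and the `failed_criteria` helper are assumptions made for illustration and are not taken from CONTRIBUTING.md, whose template may differ.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ModelEntry:
    """Hypothetical representation of one catalog row."""
    name: str
    license: str                   # e.g. "Apache 2.0"; empty if unknown
    evidence: str                  # benchmark result or documented real-world use
    weights_url: str               # where the weights can be downloaded
    released: bool                 # False for announced-but-unavailable models
    last_verified: Optional[date]  # date the row was last checked


def failed_criteria(entry: ModelEntry) -> List[str]:
    """Return the hard criteria this entry does not yet satisfy."""
    failures = []
    if not entry.license.strip():
        failures.append("clear license")
    if not entry.evidence.strip():
        failures.append("verifiable benchmark or documented capability")
    if not entry.weights_url.strip():
        failures.append("practically usable (weights available)")
    if not entry.released:
        failures.append("released and downloadable")
    if entry.last_verified is None:
        failures.append("last verified date")
    return failures


if __name__ == "__main__":
    draft = ModelEntry(
        name="example-model",        # placeholder, not a real model
        license="",
        evidence="placeholder benchmark note",
        weights_url="https://example.com/weights",
        released=True,
        last_verified=None,
    )
    print(failed_criteria(draft))
```

An empty list means the row passes the mechanical checks; whether the license terms and benchmark claims are accurate still needs a human reviewer.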

Sources / 参考来源

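Cross-referenced at the 2026-04-19 verification pass:

BenchLM.ai
Onyx open-weight leaderboard
HF Open ASR Leaderboard
MTEB
SWE-bench Verified / Pro
Seed-TTS Eval
Artificial Analysis
bentoml.com guides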

Last tool-verified on 2026-04-19 (Hong Kong / Beijing timezone). Model benchmarks and releases change weekly, so re-verify before any critical production selection.