Guide · 2026-06-11 · OpenAgent.bot Editors

Latest Open-Source Models to Watch in June 2026

A short editor's list of Llama 4, Qwen3.5, Mistral Large 3, and Phi-4: the latest open-source models for agent workflows.

The open-source model landscape in June 2026 is dominated by Mixture-of-Experts (MoE) architectures at the large end, led by Meta's Llama 4 Maverick rivaling GPT-5, with compact dense models such as Microsoft's MIT-licensed Phi-4 covering the small end.

This shortlist covers the models that matter for agent builders: Llama 4 for frontier MoE, Qwen3.5 for Alibaba's agentic flagship, Mistral Large 3 for European open-source leadership, and Phi-4 for a small, permissively licensed reasoning workhorse.

Quick shortlist

Model | Params | License | Why it matters
Llama 4 | 400B Maverick / 109B Scout (17B active MoE) | Llama 4 Community | Meta's flagship MoE with 10M token context on Scout
Qwen3.5 | 397B-A17B MoE | Apache-2.0 | Alibaba's flagship with 8.6x decoding improvement over Qwen3
Mistral Large 3 | 675B (41B active MoE) | Apache-2.0 | Europe's most powerful open model, agentic-tuned
Phi-4 | 14B dense | MIT | Microsoft's compact reasoning leader in its size class

Llama 4

Llama 4 is Meta's latest open MoE model family. It comes in two variants, each with 17B active parameters per token: Scout (109B total) with a 10 million token context window, and Maverick (400B total), which rivals GPT-5 on coding benchmarks. Scout suits long-document and codebase analysis; Maverick targets frontier reasoning and agentic coding.

The main thing to check is whether Meta's Llama 4 Community License terms match your deployment requirements.

Qwen3.5

Qwen3.5 is Alibaba's 397B-A17B MoE model with 8.6x decoding throughput improvement over Qwen3, 256K context, and strong multimodal capabilities. It leads Alibaba's open model line and is Apache-2.0 licensed. Its agentic tool-calling performance makes it a strong candidate for self-hosted agent workflows.

The main thing to check is serving cost and latency: all 397B parameters must be held in memory even though only 17B are active per token, so compare it against smaller MoE alternatives.

Mistral Large 3

Mistral Large 3 is Europe's most powerful open model at 675B parameters (41B active) with Apache-2.0 licensing. It is agentic-tuned for tool calling, JSON output, and multi-step reasoning, with strong multilingual support across 80+ languages. It represents the most permissively licensed frontier-scale model from a European provider.

The main thing to check is whether Apache-2.0 licensing or European data sovereignty requirements are decisive factors for your deployment.

Phi-4

Phi-4 is Microsoft's 14B dense reasoning model, MIT-licensed, that tops MMLU in its size class. With 16K context, it is designed for efficient local inference. At 16-bit precision its weights take roughly 28 GB, so it runs unquantized on a single GPU with 40 GB or more of memory; 24 GB consumer cards typically need 8-bit or 4-bit quantization. It is a strong option for agent builders who need a small, capable model that fits on one GPU.

The main thing to check is whether 14B parameters provide sufficient reasoning quality for your agent workloads compared to larger MoE alternatives.
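A rough back-of-the-envelope sketch of the memory side of that trade-off, assuming weights dominate memory and ignoring KV cache, activations, and runtime overhead. The parameter counts come from the shortlist above; the bytes-per-parameter figures (2 for 16-bit, 0.5 for 4-bit) are common approximations, and the helper name `weight_memory_gb` is hypothetical.

```python
# Rough weight-memory estimate per model.
# Assumption: weights only; KV cache and activations are ignored.
# Parameter counts (in billions) are taken from the shortlist table.
MODELS = {
    "Llama 4 Scout": {"total_b": 109, "active_b": 17},
    "Llama 4 Maverick": {"total_b": 400, "active_b": 17},
    "Qwen3.5": {"total_b": 397, "active_b": 17},
    "Mistral Large 3": {"total_b": 675, "active_b": 41},
    "Phi-4": {"total_b": 14, "active_b": 14},  # dense: every weight is active
}


def weight_memory_gb(params_billion: float, bytes_per_param: float) -> float:
    """Hypothetical helper: billions of params x bytes per param = GB (1 GB = 1e9 bytes)."""
    return params_billion * bytes_per_param


for name, p in MODELS.items():
    fp16 = weight_memory_gb(p["total_b"], 2.0)   # 16-bit weights
    int4 = weight_memory_gb(p["total_b"], 0.5)   # 4-bit quantized weights
    active_share = p["active_b"] / p["total_b"]  # fraction of weights used per token
    print(
        f"{name}: ~{fp16:.0f} GB at 16-bit, ~{int4:.0f} GB at 4-bit, "
        f"{active_share:.0%} of weights active per token"
    )
```

For the MoE models, memory tracks the total parameter count because every expert must be loaded, while per-token compute tracks the active count. That is why a 17B-active model can still need multi-GPU serving.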

How to evaluate them

Test each model on your actual agent workflow rather than relying on published benchmark scores. For coding agents, run SWE-bench-style tasks drawn from your own repositories. For RAG pipelines, measure retrieval-augmented answer quality. For tool calling, compare structured output reliability across models.
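As one way to make the tool-calling comparison concrete, here is a minimal sketch that scores how often a model's raw output parses as a JSON object with the expected keys. Everything here is an assumption for illustration: `generate` stands in for whatever client you use to call a model, `fake_model` is a stub, and the `name` / `arguments` keys are a hypothetical tool-call schema, not any vendor's format.

```python
import json
from typing import Callable, Iterable


def is_valid_tool_call(raw: str, required_keys: set[str]) -> bool:
    """Return True if raw parses as a JSON object containing all required keys."""
    try:
        obj = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and required_keys.issubset(obj.keys())


def reliability(
    generate: Callable[[str], str],
    prompts: Iterable[str],
    required_keys: set[str],
) -> float:
    """Fraction of prompts for which the model returned a valid tool call."""
    results = [is_valid_tool_call(generate(p), required_keys) for p in prompts]
    return sum(results) / len(results) if results else 0.0


def fake_model(prompt: str) -> str:
    """Stub standing in for a real model call (hypothetical behavior)."""
    if "weather" in prompt:
        return '{"name": "get_weather", "arguments": {"city": "Paris"}}'
    return "Sure, I'll call the tool now."  # prose instead of JSON: counts as a failure


prompts = ["What is the weather in Paris?", "Book a flight to Rome."]
score = reliability(fake_model, prompts, {"name", "arguments"})
print(f"valid tool calls: {score:.0%}")
```

Swapping `fake_model` for a function that calls each candidate model with the same prompts gives a like-for-like reliability number. A fuller harness would also validate argument types against the tool's schema.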

Official sources