The most comprehensive collection of frontier and open-source AI models — from GPT-5 and Claude 4 to DeepSeek R1, Qwen3, Gemma 3, and beyond. Researched, documented, and accessible to everyone.
42 models shown · Click any card to expand full details
LumiChats' intelligent auto-routing layer that automatically selects the best available model for each specific query. Powered by OpenRouter's routing infrastructure, it analyses your request type — coding, reasoning, creative writing, or general conversation — and dispatches it to the model…
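In practice, auto-routing boils down to classifying the request type and dispatching to a matching model. The sketch below is a toy illustration of that idea; the keyword rules and model slugs are invented for the example and do not reflect LumiChats' or OpenRouter's actual (non-public) routing logic.

```python
# Toy sketch of query-type auto-routing. The rules and model slugs
# below are hypothetical examples, not the production routing table.
ROUTES = {
    "coding": "qwen3-coder-480b",
    "reasoning": "deepseek-r1-0528",
    "creative": "kimi-k2",
    "general": "llama-3.3-70b",
}

CODING_HINTS = ("def ", "function", "bug", "stack trace", "compile")
REASONING_HINTS = ("prove", "solve", "step by step", "how many")
CREATIVE_HINTS = ("story", "poem", "roleplay", "write a scene")

def classify(query: str) -> str:
    """Bucket a query into one of the four request types."""
    q = query.lower()
    if any(h in q for h in CODING_HINTS):
        return "coding"
    if any(h in q for h in REASONING_HINTS):
        return "reasoning"
    if any(h in q for h in CREATIVE_HINTS):
        return "creative"
    return "general"

def route(query: str) -> str:
    """Return the model slug the query would be dispatched to."""
    return ROUTES[classify(query)]
```

A real router would use a learned classifier rather than keyword matching, but the dispatch structure is the same.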
OpenAI's GPT-5 generation model offering advanced reasoning, strong structured output, and reliable instruction following. Part of the GPT-5 family designed for complex, multi-step tasks requiring deep contextual understanding. Excels at nuanced analysis, systematic problem-solving, and synthesising information from long documents.…
Anthropic's Claude Sonnet 4.5 is a high-capability model in the Claude 4 family, balancing intelligence and speed. It features extended thinking (chain-of-thought reasoning) and excels at long-document analysis with its 200K-token context window. Anthropic trained Claude with a strong emphasis…
Claude Sonnet 4.6 is Anthropic's latest smart, efficient model designed for everyday professional use. It inherits the Claude 4 family's 200K-token context window and extended thinking capability, making it ideal for handling complex documents and multi-step reasoning chains. Sonnet 4.6…
Claude Haiku 4.5 is Anthropic's fastest and most compact Claude 4-family model. Despite being lightweight, it still features the 200K-token context window inherited from the Claude 4 architecture and is optimised for low-latency applications. It's the right choice when you…
OpenAI's GPT-5.3 Codex is a coding-specialised variant of the GPT-5.3 family, built for agentic software engineering tasks. Following in the tradition of the original Codex that powered GitHub Copilot, GPT-5.3-Codex is optimised for multi-file code generation, repository-level understanding, automated debugging,…
Google's Gemini 2.5 Pro is one of the world's most capable multimodal reasoning models, featuring a 1-million-token context window that can process entire books, long video transcripts, or massive codebases in a single pass. It achieves frontier performance on AIME,…
Google's Gemini 3 Flash Preview is an early access version of the Gemini 3 Flash model — a smaller, faster sibling to Gemini 3 Pro designed for high-throughput applications. It retains Gemini's signature 1M-token context window and multimodal capabilities, while…
xAI's Grok 4.1 Fast is a speed-optimised variant of Grok 4, the flagship large language model from Elon Musk's AI lab. Grok is designed with minimal content restrictions and a direct, unfiltered personality, making it popular for candid conversations and tasks…
NVIDIA's Nemotron Nano 12B v2 VL is a compact but highly capable open-source vision-language model built on a hybrid Transformer-Mamba architecture. Trained on over 39M high-quality multimodal samples, it leads benchmarks in OCR (OCRBench v2), document intelligence, chart reasoning, and…
Mistral Small 3.1 24B is Mistral AI's most capable small multimodal model, handling both text and image inputs with a 128K-token context window. It's designed to deliver top-tier performance at the 24B scale — beating larger models on several benchmarks…
Google's Gemma 3 4B is the entry-level vision-language model in the Gemma 3 family, supporting both text and image inputs with a 128K-token context window. Built on the same research foundation as Gemini 2.0, it's designed to run efficiently on…
Google's Gemma 3 12B strikes a strong balance between capability and deployment practicality. Part of the Gemma 3 multimodal family, it handles text and image inputs with a 128K context window and was trained on 12 trillion tokens. It offers…
Google's Gemma 3 27B is the flagship of the Gemma 3 family and one of the best open-source models globally at its size. It ranked in the top 10 of the LMSYS Chatbot Arena with an Elo score of 1338–1339,…
Google's Gemini 2.0 Flash Experimental is a free experimental release showcasing capabilities from the Gemini 2.0 generation — a model designed to be natively multimodal and agentic. It processes text, images, audio, and video, with a 1M-token context window for…
Alibaba's Qwen2.5-VL 7B Instruct is a strong open-source vision-language model at the 7B scale. It supports image and text inputs with a native dynamic resolution mechanism, allowing it to process images at their original resolution rather than downscaling. Qwen2.5-VL 7B…
Xiaomi's MiMo-V2-Flash is a fast, lightweight multimodal model developed by the Xiaomi AI team. MiMo (Mixture of Modalities) is designed for efficient on-device and cloud inference, combining text and image understanding in a compact architecture. It's optimised for speed-sensitive scenarios…
NVIDIA's Nemotron 3 Nano 30B A3B is a Mixture-of-Experts model with 30 billion total parameters but only 3.3 billion active parameters per forward pass — enabling very fast inference at low cost. Built on NVIDIA's Nemotron architecture, it's part of…
Devstral 2 is Mistral AI's state-of-the-art open-source agentic coding model, achieving 72.2% on SWE-bench Verified — one of the highest scores for any open-weight model on this benchmark for real-world GitHub issue resolution. With 123B parameters, a 256K-token context window,…
Qwen3-Coder-480B-A35B-Instruct is Alibaba Cloud's most powerful open agentic coding model. It's a Mixture-of-Experts model with 480 billion total parameters and only 35 billion active per inference pass (8 of 160 experts), making large-scale deployment economically viable. The model natively supports…
KAT-Coder-Pro V1 is a proprietary coding model from Kwaipilot, the AI coding team at Kuaishou Technology, the company behind the Kwai short-video platform. Designed for production software development, KAT-Coder-Pro targets real-world developer workflows with…
DeepSeek V3.1 Nex N1 is Nex AGI's enhanced fine-tune of DeepSeek V3.1, optimised for agentic reasoning tasks. The N1 variant applies additional alignment and instruction-following improvements on top of DeepSeek's frontier-class 671B MoE base, with 37B active parameters per forward…
DeepSeek R1T Chimera from TNG Technology Consulting GmbH (Munich) is an Assembly-of-Experts model merging DeepSeek V3-0324 and R1 at the MoE expert tensor level, with no fine-tuning or distillation required. The result is a model that achieves approximately…
AllenAI's OLMo 3.1 32B Think is the world's most transparent large-scale reasoning model — every piece of training data, code, intermediate checkpoint, and reasoning trace is publicly available under Apache 2.0. The 3.1 variant extends the original OLMo 3 32B…
Alibaba's Tongyi DeepResearch 30B A3B is a research-oriented reasoning model from the Tongyi (通义) family, trained specifically for in-depth analytical and research tasks. As a 30B MoE model with only 3 billion active parameters, it provides strong reasoning output at…
DeepSeek-TNG R1T2 Chimera is TNG Technology Consulting's second-generation Assembly-of-Experts model, merging three DeepSeek parents: R1-0528, R1, and V3-0324 at the weight tensor level — no fine-tuning required. The tri-parent design achieves a new sweet spot: approximately 20% faster than standard…
The original DeepSeek R1T Chimera (April 2025) was TNG's first successful Assembly-of-Experts model merge at 671B scale — the first publicly demonstrated merge of models at this size. By combining DeepSeek V3-0324's shared experts with R1's routed expert tensors, it…
Arcee AI's Trinity Mini is a compact general-purpose model from Arcee's Trinity model family, which specialises in efficient AI for enterprise applications. Arcee AI is known for its model merging and specialisation techniques — the Trinity series uses a mixture-of-models…
NVIDIA's Nemotron Nano 9B V2 is a compact, highly optimised language model using NVIDIA's hybrid Transformer-Mamba architecture. This design delivers higher throughput and lower latency compared to standard attention-only transformers while maintaining competitive reasoning quality. Nemotron Nano 9B V2 achieves…
Z.AI's GLM-4.5-Air is the lightweight variant of the flagship GLM-4.5 family from Zhipu AI — an agent-native model that unifies reasoning, coding, and tool use in a single architecture. With 106 billion total parameters and only 12 billion active (MoE),…
Google's Gemma 3n E2B is the smallest model in the Gemma 3n (nano) family, designed specifically for mobile, IoT, and on-device AI deployment. Using Google's revolutionary MatFormer (Matryoshka Transformer) architecture, Gemma 3n E2B has a total parameter count of ~5B…
Google's Gemma 3n E4B is the larger model in the Gemma 3n family, targeting high-end mobile devices, laptops, and edge servers. With an effective ~4B memory footprint despite containing more total parameters (MatFormer architecture), it handles text, images, and audio…
Alibaba's Qwen3 4B is the compact member of the Qwen3 family, offering both thinking (chain-of-thought) and non-thinking modes in a tiny 4B parameter footprint. Despite its small size, Qwen3 4B is one of the most capable models in its class…
Qwen3-235B-A22B is Alibaba's flagship open-source model — a massive 235B MoE model with 22B active parameters per forward pass. It ranks among the best open-weight models globally, achieving top performance on AIME 2025, LiveCodeBench, and multilingual benchmarks. In non-thinking mode…
Meta's Llama 3.3 70B Instruct is one of the most widely used open-source LLMs globally, delivering performance comparable to Llama 3.1 405B at a fraction of the compute cost. Trained on 15 trillion tokens and 39.3M GPU hours on NVIDIA…
Meta's Llama 3.2 3B Instruct is a tiny but capable model from Meta's open-source AI programme, pretrained on 9 trillion tokens and refined via SFT, rejection sampling, and DPO. Using knowledge distillation from larger Llama 3.1 models, it punches above…
Hermes 3 405B Instruct is Nous Research's flagship fine-tune of Meta's Llama-3.1 405B foundation model — a full-parameter finetune specifically designed to maximise user-alignment, creative flexibility, and agentic capability. Hermes 3 builds on the Hermes 2 series with dramatically improved…
DeepSeek R1-0528 is a significant upgrade to the original DeepSeek R1, achieved through increased training compute and post-training algorithmic improvements — not architectural changes. It doubled the average thinking token depth (12K→23K), boosting AIME 2025 accuracy from 70% to 87.5%…
Riverflow V2 Fast Preview is a speed-optimised general-purpose language model from Sourceful, designed for applications where response time is critical. As a preview model, it offers a glimpse into Sourceful's approach to efficient AI — balancing quality and throughput for…
GPT-OSS 20B is OpenAI's first open-weight model release since GPT-2: a 20-billion-parameter model made freely available through Hugging Face and compatible APIs. This is a significant moment for OpenAI, marking their return to open-weight AI after years of closed-source development.…
Moonshot AI's Kimi K2 (July 11 2025 release) is a landmark 1-trillion-parameter MoE model with only 32 billion active parameters per inference pass, built using the novel MuonClip optimiser for stable large-scale training. Kimi K2 achieves state-of-the-art performance on coding…
Venice Uncensored (Dolphin Mistral 24B Venice Edition) is Cognitive Computations' 'uncensored' fine-tune of the Mistral 24B model, built for creative writing, roleplay, and use cases where standard safety filtering would prevent valuable outputs. The 'Venice Edition' designation reflects training optimised…
Models like DeepSeek R1, OLMo 3.1 Think, and TNG R1T2 Chimera use extended chain-of-thought (CoT) reasoning — they "think before answering," showing intermediate steps. Best for maths, logic, and complex problem-solving. Expect slower, longer responses but significantly higher accuracy on hard tasks.
Vision-language models (VLMs) like Gemma 3, NVIDIA Nemotron VL, and Qwen2.5-VL can understand images, charts, documents, and (in some cases) video alongside text. Use these when you need to analyse screenshots, diagrams, receipts, or any visual content.
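When calling a VLM through an OpenAI-compatible API (the request format OpenRouter and most gateways accept), an image is attached as part of a content array alongside the text prompt. A minimal sketch of building such a message; the helper name is our own:

```python
def vision_message(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style multimodal chat message: a content array
    pairing a text part with an image_url part. This is the common
    wire format for vision input on OpenAI-compatible endpoints."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The resulting dict goes straight into the `messages` list of a chat-completions request to any of the multimodal models above.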
Specialised models like Devstral 2, Qwen3 Coder 480B, and KAT-Coder-Pro are fine-tuned on large programming datasets and evaluated on SWE-bench (real GitHub issue resolution). They excel at agentic coding tasks, multi-file edits, bug detection, and software engineering agent workflows.
Reasoning models (marked with a 'THINK' badge on LumiChats) use chain-of-thought (CoT) techniques — they generate internal reasoning traces before producing a final answer. This makes them much more accurate on maths, logic, and multi-step problems, but slower and more token-intensive than standard models.
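With R1-style open models, the reasoning trace typically arrives inline, wrapped in <think>…</think> tags, so a client can separate it from the final answer. A minimal sketch assuming that tag convention (hosted APIs may instead expose the trace as a separate response field):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a model response into (reasoning_trace, final_answer).
    Assumes the chain of thought is wrapped in <think>...</think>
    tags, as DeepSeek R1-style models emit; returns an empty trace
    if no such tags are present."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", raw, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", raw.strip()
```

Keeping the trace out of the displayed answer (and out of the next turn's context) is also how you control the extra token cost these models incur.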
For agentic coding (autonomous bug fixing, multi-file refactoring): Devstral 2 (72.2% SWE-bench) or Qwen3 Coder 480B. For code completion in a chat interface: Llama 3.3 70B, DeepSeek R1 0528, or GLM 4.5 Air. For absolute top quality: Claude Sonnet 4.6 (Premium) or GPT-5.3-Codex (Premium).
Models tagged 'Multimodal' accept image inputs. Top picks: Gemma 3 27B (best open-source VLM), NVIDIA Nemotron Nano 12B v2 VL (best for OCR/documents), Gemini 2.0 Flash Experimental (free, 1M context), and Qwen2.5-VL 7B (best for Chinese-language docs).
Free models (marked GREEN) are accessible to all users with no subscription. Premium models (marked with a Crown icon) — GPT-5.2, Claude Sonnet 4.5/4.6, Claude Haiku 4.5, GPT-5.3-Codex, Gemini 2.5, Gemini 3 Flash Preview, Grok 4.1 Fast — require a LumiChats Premium subscription.
A MoE model has many 'expert' sub-networks but only activates a small subset per token during inference. For example, Qwen3 235B A22B has 235B total parameters but only 22B active — making it cheaper to run than a full 235B dense model while retaining its full representational capacity. Models like DeepSeek R1, Kimi K2, and GLM 4.5 all use MoE.
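The routing step itself is simple in principle: a router scores every expert for each token, keeps only the top k, and normalises their gate weights so the selected experts' outputs can be combined. A toy sketch of that top-k gating (illustrative only; real routers are learned networks operating on hidden states):

```python
import math

def top_k_route(expert_scores: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts for one token and
    softmax-normalise their gate weights. Only these k experts run a
    forward pass, which is why a 235B-total MoE can compute like a
    much smaller dense model."""
    top = sorted(range(len(expert_scores)), key=lambda i: -expert_scores[i])[:k]
    exps = [math.exp(expert_scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

With, say, 128 experts and k = 2, each token touches under 2% of the expert sub-networks, yet every expert's parameters remain available to some token somewhere in the batch.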
For creative fiction and storytelling: Kimi K2 0711 (praised for creative quality), Hermes 3 405B (strong roleplaying, uncensored), Venice Uncensored (adult creative content), and Claude Sonnet 4.6 (nuanced, high-quality writing). Gemma 3 27B ranked #2 on the EQ-Bench creative writing leaderboard among open models.
Switch between any of the models above mid-session. No extra subscriptions for free models. Upgrade to unlock all 42.
Start Free Today