The most comprehensive collection of frontier and open-source AI models — from GPT-5 and Claude 4 to DeepSeek R1, Qwen3, Gemma 3, and beyond. Researched, documented, and accessible to everyone.
42 models shown · Click any card to expand full details
LumiChats' intelligent auto-routing layer that automatically selects the best available model for each specific query. Powered by OpenRouter's routing infrastructure, it analyses your request type — coding, reasoning, creative writing, or general conversation — and dispatches it to the model…
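In practice, auto-routing boils down to classifying the request type and dispatching to a matching model. The sketch below is a toy illustration of that idea; the keyword rules and model slugs are invented for the example and do not reflect LumiChats' or OpenRouter's actual (non-public) routing logic.

```python
# Toy sketch of query-type auto-routing. The rules and model slugs
# below are hypothetical examples, not the production routing table.
ROUTES = {
    "coding": "qwen3-coder-480b",
    "reasoning": "deepseek-r1-0528",
    "creative": "kimi-k2",
    "general": "llama-3.3-70b",
}

CODING_HINTS = ("def ", "function", "bug", "stack trace", "compile")
REASONING_HINTS = ("prove", "solve", "step by step", "how many")
CREATIVE_HINTS = ("story", "poem", "roleplay", "write a scene")

def classify(query: str) -> str:
    """Bucket a query into one of the four request types."""
    q = query.lower()
    if any(h in q for h in CODING_HINTS):
        return "coding"
    if any(h in q for h in REASONING_HINTS):
        return "reasoning"
    if any(h in q for h in CREATIVE_HINTS):
        return "creative"
    return "general"

def route(query: str) -> str:
    """Return the model slug the query would be dispatched to."""
    return ROUTES[classify(query)]
```

A real router would use a learned classifier rather than keyword matching, but the dispatch structure is the same.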
OpenAI's GPT-5 generation model offering advanced reasoning, strong structured output, and reliable instruction following. Part of the GPT-5 family designed for complex, multi-step tasks requiring deep contextual understanding. Excels at nuanced analysis, systematic problem-solving, and synthesising information from long documents.…
Anthropic's Claude Sonnet 4.5 is a high-capability model in the Claude 4 family, balancing intelligence and speed. It features extended thinking (chain-of-thought reasoning) and excels at long-document analysis with its 200K-token context window. Anthropic trained Claude with a strong emphasis…
Claude Sonnet 4.6 is Anthropic's latest smart, efficient model designed for everyday professional use. It inherits the Claude 4 family's 200K-token context window and extended thinking capability, making it ideal for handling complex documents and multi-step reasoning chains. Sonnet 4.6…
Claude Haiku 4.5 is Anthropic's fastest and most compact Claude 4-family model. Despite being lightweight, it still features the 200K-token context window inherited from the Claude 4 architecture and is optimised for low-latency applications. It's the right choice when you…
OpenAI's GPT-5.3 Codex is a coding-specialised variant of the GPT-5.3 family, built for agentic software engineering tasks. Following in the tradition of the original Codex that powered GitHub Copilot, GPT-5.3-Codex is optimised for multi-file code generation, repository-level understanding, automated debugging,…
Google's Gemini 2.5 Pro is one of the world's most capable multimodal reasoning models, featuring a 1-million-token context window that can process entire books, long video transcripts, or massive codebases in a single pass. It achieves frontier performance on AIME,…
Google's Gemini 3 Flash Preview is an early access version of the Gemini 3 Flash model — a smaller, faster sibling to Gemini 3 Pro designed for high-throughput applications. It retains Gemini's signature 1M-token context window and multimodal capabilities, while…
xAI's Grok 4.1 Fast is a speed-optimised variant of Grok 4, the flagship large language model from Elon Musk's AI lab. Grok is designed with minimal content restrictions and a direct, unfiltered personality, making it popular for candid conversations and tasks…
NVIDIA's Nemotron Nano 12B v2 VL is a compact but highly capable open-source vision-language model built on a hybrid Transformer-Mamba architecture. Trained on over 39M high-quality multimodal samples, it leads benchmarks in OCR (OCRBench v2), document intelligence, chart reasoning, and…
Mistral Small 3.1 24B is Mistral AI's most capable small multimodal model, handling both text and image inputs with a 128K-token context window. It's designed to deliver top-tier performance at the 24B scale — beating larger models on several benchmarks…
Google's Gemma 3 4B is the entry-level vision-language model in the Gemma 3 family, supporting both text and image inputs with a 128K-token context window. Built on the same research foundation as Gemini 2.0, it's designed to run efficiently on…
Google's Gemma 3 12B strikes a strong balance between capability and deployment practicality. Part of the Gemma 3 multimodal family, it handles text and image inputs with a 128K context window and was trained on 12 trillion tokens. It offers…
Google's Gemma 3 27B is the flagship of the Gemma 3 family and one of the best open-source models globally at its size. It ranked in the top 10 of the LMSYS Chatbot Arena with an Elo score of 1338–1339,…
Google's Gemini 2.0 Flash Experimental is a free experimental release showcasing capabilities from the Gemini 2.0 generation — a model designed to be natively multimodal and agentic. It processes text, images, audio, and video, with a 1M-token context window for…
Alibaba's Qwen2.5-VL 7B Instruct is a strong open-source vision-language model at the 7B scale. It supports image and text inputs with a native dynamic resolution mechanism, allowing it to process images at their original resolution rather than downscaling. Qwen2.5-VL 7B…
Xiaomi's MiMo-V2-Flash is a fast, lightweight multimodal model developed by the Xiaomi AI team. MiMo (Mixture of Modalities) is designed for efficient on-device and cloud inference, combining text and image understanding in a compact architecture. It's optimised for speed-sensitive scenarios…
NVIDIA's Nemotron 3 Nano 30B A3B is a Mixture-of-Experts model with 30 billion total parameters but only 3.3 billion active parameters per forward pass — enabling very fast inference at low cost. Built on NVIDIA's Nemotron architecture, it's part of…
Devstral 2 is Mistral AI's state-of-the-art open-source agentic coding model, achieving 72.2% on SWE-bench Verified — one of the highest scores for any open-weight model on this benchmark for real-world GitHub issue resolution. With 123B parameters, a 256K-token context window,…
Qwen3-Coder-480B-A35B-Instruct is Alibaba Cloud's most powerful open agentic coding model. It's a Mixture-of-Experts model with 480 billion total parameters and only 35 billion active per inference pass (8 of 160 experts), making large-scale deployment economically viable. The model natively supports…
KAT-Coder-Pro V1 is a proprietary coding model from Kwaipilot, the AI coding team at Kuaishou Technology, the company behind the Kwai short-video platform. Designed for production software development, KAT-Coder-Pro targets real-world developer workflows with…
DeepSeek V3.1 Nex N1 is Nex AGI's enhanced fine-tune of DeepSeek V3.1, optimised for agentic reasoning tasks. The N1 variant applies additional alignment and instruction-following improvements on top of DeepSeek's frontier-class 671B MoE base, with 37B active parameters per forward…
DeepSeek R1T Chimera from TNG Technology Consulting GmbH (Munich) is an Assembly-of-Experts model merging DeepSeek V3-0324 and R1 at the MoE expert tensor level, with no fine-tuning or distillation required. The result is a model that achieves approximately…
AllenAI's OLMo 3.1 32B Think is the world's most transparent large-scale reasoning model — every piece of training data, code, intermediate checkpoint, and reasoning trace is publicly available under Apache 2.0. The 3.1 variant extends the original OLMo 3 32B…
Alibaba's Tongyi DeepResearch 30B A3B is a research-oriented reasoning model from the Tongyi (通义) family, trained specifically for in-depth analytical and research tasks. As a 30B MoE model with only 3 billion active parameters, it provides strong reasoning output at…
DeepSeek-TNG R1T2 Chimera is TNG Technology Consulting's second-generation Assembly-of-Experts model, merging three DeepSeek parents: R1-0528, R1, and V3-0324 at the weight tensor level — no fine-tuning required. The tri-parent design achieves a new sweet spot: approximately 20% faster than standard…
The original DeepSeek R1T Chimera (April 2025) was TNG's first successful Assembly-of-Experts model merge at 671B scale — the first publicly demonstrated merge of models at this size. By combining DeepSeek V3-0324's shared experts with R1's routed expert tensors, it…
Arcee AI's Trinity Mini is a compact general-purpose model from Arcee's Trinity model family, which specialises in efficient AI for enterprise applications. Arcee AI is known for its model merging and specialisation techniques — the Trinity series uses a mixture-of-models…
NVIDIA's Nemotron Nano 9B V2 is a compact, highly optimised language model using NVIDIA's hybrid Transformer-Mamba architecture. This design delivers higher throughput and lower latency compared to standard attention-only transformers while maintaining competitive reasoning quality. Nemotron Nano 9B V2 achieves…
Z.AI's GLM-4.5-Air is the lightweight variant of the flagship GLM-4.5 family from Zhipu AI — an agent-native model that unifies reasoning, coding, and tool use in a single architecture. With 106 billion total parameters and only 12 billion active (MoE),…
Google's Gemma 3n E2B is the smallest model in the Gemma 3n (nano) family, designed specifically for mobile, IoT, and on-device AI deployment. Using Google's revolutionary MatFormer (Matryoshka Transformer) architecture, Gemma 3n E2B has a total parameter count of ~5B…
Google's Gemma 3n E4B is the larger model in the Gemma 3n family, targeting high-end mobile devices, laptops, and edge servers. With an effective ~4B memory footprint despite containing more total parameters (MatFormer architecture), it handles text, images, and audio…
Alibaba's Qwen3 4B is the compact member of the Qwen3 family, offering both thinking (chain-of-thought) and non-thinking modes in a tiny 4B parameter footprint. Despite its small size, Qwen3 4B is one of the most capable models in its class…
Qwen3-235B-A22B is Alibaba's flagship open-source model — a massive 235B MoE model with 22B active parameters per forward pass. It ranks among the best open-weight models globally, achieving top performance on AIME 2025, LiveCodeBench, and multilingual benchmarks. In non-thinking mode…
Meta's Llama 3.3 70B Instruct is one of the most widely used open-source LLMs globally, delivering performance comparable to Llama 3.1 405B at a fraction of the compute cost. Trained on 15 trillion tokens and 39.3M GPU hours on NVIDIA…
Meta's Llama 3.2 3B Instruct is a tiny but capable model from Meta's open-source AI programme, pretrained on 9 trillion tokens and refined via SFT, rejection sampling, and DPO. Using knowledge distillation from larger Llama 3.1 models, it punches above…
Hermes 3 405B Instruct is Nous Research's flagship fine-tune of Meta's Llama-3.1 405B foundation model — a full-parameter finetune specifically designed to maximise user-alignment, creative flexibility, and agentic capability. Hermes 3 builds on the Hermes 2 series with dramatically improved…
DeepSeek R1-0528 is a significant upgrade to the original DeepSeek R1, achieved through increased training compute and post-training algorithmic improvements — not architectural changes. It doubled the average thinking token depth (12K→23K), boosting AIME 2025 accuracy from 70% to 87.5%…
Riverflow V2 Fast Preview is a speed-optimised general-purpose language model from Sourceful, designed for applications where response time is critical. As a preview model, it offers a glimpse into Sourceful's approach to efficient AI — balancing quality and throughput for…
GPT-OSS 20B is OpenAI's first open-weight model release since GPT-2: a 20-billion-parameter model made freely available through Hugging Face and compatible APIs. This is a significant moment for OpenAI, marking their return to open-weight AI after years of closed-source development.…
Moonshot AI's Kimi K2 (July 11 2025 release) is a landmark 1-trillion-parameter MoE model with only 32 billion active parameters per inference pass, built using the novel MuonClip optimiser for stable large-scale training. Kimi K2 achieves state-of-the-art performance on coding…
Venice Uncensored (Dolphin Mistral 24B Venice Edition) is Cognitive Computations' 'uncensored' fine-tune of the Mistral 24B model, built for creative writing, roleplay, and use cases where standard safety filtering would prevent valuable outputs. The 'Venice Edition' designation reflects training optimised…
Models like DeepSeek R1, OLMo 3.1 Think, and TNG R1T2 Chimera use extended chain-of-thought (CoT) reasoning — they "think before answering," showing intermediate steps. Best for maths, logic, and complex problem-solving. Expect slower, longer responses but significantly higher accuracy on hard tasks.
Vision-language models (VLMs) like Gemma 3, NVIDIA Nemotron VL, and Qwen2.5-VL can understand images, charts, documents, and (in some cases) video alongside text. Use these when you need to analyse screenshots, diagrams, receipts, or any visual content.
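When calling a VLM through an OpenAI-compatible API (the request format OpenRouter and most gateways accept), an image is attached as part of a content array alongside the text prompt. A minimal sketch of building such a message; the helper name is our own:

```python
def vision_message(prompt: str, image_url: str) -> dict:
    """Build an OpenAI-style multimodal chat message: a content array
    pairing a text part with an image_url part. This is the common
    wire format for vision input on OpenAI-compatible endpoints."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }
```

The resulting dict goes straight into the `messages` list of a chat-completions request to any of the multimodal models above.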
Specialised models like Devstral 2, Qwen3 Coder 480B, and KAT-Coder-Pro are fine-tuned on large programming datasets and evaluated on SWE-bench (real GitHub issue resolution). They excel at agentic coding tasks, multi-file edits, bug detection, and software engineering agent workflows.
Reasoning models (marked with a 'THINK' badge on LumiChats) use chain-of-thought (CoT) techniques — they generate internal reasoning traces before producing a final answer. This makes them much more accurate on maths, logic, and multi-step problems, but slower and more token-intensive than standard models.
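With R1-style open models, the reasoning trace typically arrives inline, wrapped in <think>…</think> tags, so a client can separate it from the final answer. A minimal sketch assuming that tag convention (hosted APIs may instead expose the trace as a separate response field):

```python
import re

def split_reasoning(raw: str) -> tuple[str, str]:
    """Split a model response into (reasoning_trace, final_answer).
    Assumes the chain of thought is wrapped in <think>...</think>
    tags, as DeepSeek R1-style models emit; returns an empty trace
    if no such tags are present."""
    m = re.match(r"\s*<think>(.*?)</think>\s*(.*)", raw, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", raw.strip()
```

Keeping the trace out of the displayed answer (and out of the next turn's context) is also how you control the extra token cost these models incur.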
For agentic coding (autonomous bug fixing, multi-file refactoring): Devstral 2 (72.2% SWE-bench) or Qwen3 Coder 480B. For code completion in a chat interface: Llama 3.3 70B, DeepSeek R1 0528, or GLM 4.5 Air. For absolute top quality: Claude Sonnet 4.6 (Premium) or GPT-5.3-Codex (Premium).
Models tagged 'Multimodal' accept image inputs. Top picks: Gemma 3 27B (best open-source VLM), NVIDIA Nemotron Nano 12B v2 VL (best for OCR/documents), Gemini 2.0 Flash Experimental (free, 1M context), and Qwen2.5-VL 7B (best for Chinese-language docs).
Free models (marked GREEN) are accessible to all users with no subscription. Premium models (marked with a Crown icon) — GPT-5.2, Claude Sonnet 4.5/4.6, Claude Haiku 4.5, GPT-5.3-Codex, Gemini 2.5, Gemini 3 Flash Preview, Grok 4.1 Fast — require a LumiChats Premium subscription.
A MoE model has many 'expert' sub-networks but only activates a small subset per token during inference. For example, Qwen3 235B A22B has 235B total parameters but only 22B active — making it cheaper to run than a full 235B dense model while retaining its full representational capacity. Models like DeepSeek R1, Kimi K2, and GLM 4.5 all use MoE.
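The routing step itself is simple in principle: a router scores every expert for each token, keeps only the top k, and normalises their gate weights so the selected experts' outputs can be combined. A toy sketch of that top-k gating (illustrative only; real routers are learned networks operating on hidden states):

```python
import math

def top_k_route(expert_scores: list[float], k: int = 2) -> list[tuple[int, float]]:
    """Pick the k highest-scoring experts for one token and
    softmax-normalise their gate weights. Only these k experts run a
    forward pass, which is why a 235B-total MoE can compute like a
    much smaller dense model."""
    top = sorted(range(len(expert_scores)), key=lambda i: -expert_scores[i])[:k]
    exps = [math.exp(expert_scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]
```

With, say, 128 experts and k = 2, each token touches under 2% of the expert sub-networks, yet every expert's parameters remain available to some token somewhere in the batch.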
For creative fiction and storytelling: Kimi K2 0711 (praised for creative quality), Hermes 3 405B (strong roleplaying, uncensored), Venice Uncensored (adult creative content), and Claude Sonnet 4.6 (nuanced, high-quality writing). Gemma 3 27B ranked #2 on the EQ-Bench creative writing leaderboard among open models.
Switch between any of the models above mid-session. No extra subscriptions for free models. Upgrade to unlock all 42.
Start Free Today