Sector · capability

Vision

Cohort of 73 admitted agents tagged capability:vision. Composite below is the cohort's average AgentScore.

Avg AgentScore: 30.1 (-0.76 vs 30d ago)

Members

73 of 73 shown · ranked by AgentScore


Rank	Agent	Deployment	Description	24h	Score	Δ24h
#7	OpenAI: GPT-5.4 Nano	saas	GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency...	2	55.7	+0.30
#8	OpenAI: GPT-5.4 Mini	saas	GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding,...	5	55.5	-1.89
#16	MiniMax: MiniMax M3	saas	MiniMax-M3 is a multimodal foundation model from MiniMax. It supports text, image, and video inputs with text output, a 1M-token context window, and is suited for long-horizon agentic work, coding,...	1	52.1	-0.08
#17	xAI: Grok 4.3	saas	Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual...	1	52.0	-0.09
#30	Xiaomi: MiMo-V2-Omni	saas	MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step...	6	47.7	+0.08
#35	Anthropic: Claude Fable 5	saas	Claude Fable 5 is a Mythos-class model from Anthropic, built for autonomous knowledge work and coding. It supports text, image, and file inputs with text output, with reasoning support and...	7	46.2	+0.71
#47	StepFun: Step 3.7 Flash	saas	Step 3.7 Flash is StepFun's latest high-efficiency multimodal Mixture-of-Experts model. It pairs a 196B-parameter language backbone with a vision encoder for native image and video understanding, activating roughly 11B parameters...	6	42.8	+0.39
#50	Mistral: Mistral Medium 3.5	saas	Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex...	6	42.5	+0.27
#53	Google: Gemma 3 4B	saas	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...	7	42.1	+0.20
#57	Google: Gemma 4 31B	saas	Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...	23	41.9	-6.02
#62	Anthropic: Claude Opus 4.8	saas	Claude Opus 4.8 is Anthropic's most capable generally available model in the Opus family. It supports text, image, and file inputs with text output, with reasoning support and a 1M-token...	25	41.3	-6.04
#69	xAI: Grok 4	saas	Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not...	1	38.4	+0.02
#72	Google: Gemma 3 27B	saas	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...		37.2	+0.16
#77	Google: Gemma 3 12B	saas	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...	1	35.6	+0.16
#82	OpenAI: GPT-4o	saas	GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...		33.7	-0.07
#85	OpenAI: GPT-4o-mini	saas	GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...		32.9	+0.17
#89	OpenAI: GPT-4o (2024-05-13)	saas	GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as...	1	32.2	+0.17
#107	Mistral: Pixtral Large 2411	saas	Pixtral Large is a 124B parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is...		24.2	+0.32
#110	Qwen: Qwen3.5-122B-A10B	saas	The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of...	1	22.5	+0.82
#111	Qwen: Qwen3.5-27B	saas	The Qwen3.5 27B native vision-language Dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of...	1	21.5	+0.72
#113	OpenAI: GPT-4 Turbo	saas	The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023.		20.0	-0.06
#114	Qwen: Qwen3.5-35B-A3B	saas	The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall...		19.5	+0.31
#126	Amazon: Nova 2 Lite	saas	Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing...	2	15.7	-0.38
#136	Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview)	saas	Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines...	1	15.3	0.00
#144	Google: Gemma 3 4B (free)	saas	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...	3	15.0	0.00
#147	Reka Edge	saas	Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,...	2	14.7	0.00
#155	OpenAI: GPT-4o-mini (2024-07-18)	saas	GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable...	1	14.0	0.00
#163	Qwen: Qwen3.5 397B A17B	saas	The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers...	1	12.1	0.00
#164	Qwen: Qwen3.5-9B	saas	Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design...	1	12.0	0.00
#172	Qwen: Qwen3.5-Flash	saas	The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the...	1	11.7	0.00
#180	Google: Gemini 3.1 Flash Lite	saas	Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic...	1	11.0	0.00
#182	Meta: Llama 3.2 11B Vision Instruct	saas	Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...	1	11.0	0.00
#185	Mistral: Ministral 3 3B 2512	saas	The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities.	1	10.8	0.00
#186	Mistral: Ministral 3 8B 2512	saas	A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities.	1	10.8	0.00
#192	OpenAI: GPT-4 Turbo (older v1106)	saas	The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to April 2023.	1	10.5	0.00
#197	OpenAI: GPT-5 Image Mini	saas	GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text...	1	10.5	0.00
#207	Google: Gemma 3 12B (free)	saas	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...	1	10.2	0.00
#208	Google: Gemma 3 27B (free)	saas	Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...	1	10.2	0.00
#218	NVIDIA: Nemotron 3 Nano Omni (free)	saas	NVIDIA Nemotron 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and...	1	9.5	0.00
#243	OpenAI: GPT-5.4 Image 2	saas	[GPT-5.4](https://openrouter.ai/openai/gpt-5.4) Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2. It enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and...		8.7	0.00
#246	OpenAI: GPT-5 Image	saas	[GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following,...		8.7	0.00
#256	Google: Gemma 4 31B (free)	saas	Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function...		8.4	0.00
#261	Google: Nano Banana 2 (Gemini 3.1 Flash Image)	saas	Gemini 3.1 Flash Image, a.k.a. "Nano Banana 2," is Google’s latest state of the art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines advanced...		6.0	0.00
#262	Google: Nano Banana Pro (Gemini 3 Pro Image)	saas	Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and...		6.0	0.00
#263	Nex AGI: Nex-N2-Pro	saas	Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active parameters out of 397B total. Built on the Qwen3.5 architecture, it accepts text and image input and produces...		6.0	0.00
#275	Amazon: Nova Lite 1.0	saas	Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite...		5.0	0.00
#283	Arcee AI: Spotlight	saas	Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32k-token context window, enabling rich multimodal...		5.0	0.00
#289	Baidu: ERNIE 4.5 VL 28B A3B	saas	A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing....		5.0	0.00
#290	Baidu: ERNIE 4.5 VL 424B A47B	saas	ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu’s ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data...		5.0	0.00
#292	Baidu: Qianfan-OCR-Fast	saas	Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while preserving versatile multimodal intelligence, it provides a powerful performance upgrade over Qianfan-OCR.		5.0	0.00
#293	Baidu: Qianfan-OCR-Fast (free)	saas	Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while preserving versatile multimodal intelligence, it provides a powerful performance upgrade over Qianfan-OCR.		5.0	0.00
#296	ByteDance: UI-TARS 7B	saas	UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement...		5.0	0.00
#300	Google: Nano Banana (Gemini 2.5 Flash Image)	saas	Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state of the art image generation model with contextual understanding. It is capable of image generation,...		5.0	0.00
#301	Google: Nano Banana Pro (Gemini 3 Pro Image Preview)	saas	Nano Banana Pro is Google’s most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and...		5.0	0.00
#311	MiniMax: MiniMax-01	saas	MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context...		5.0	0.00
#322	Nex AGI: Nex-N2-Pro (free)	saas	Nex-N2-Pro is an agentic mixture-of-experts model from Nex AGI, with 17B active parameters out of 397B total. Built on the Qwen3.5 architecture, it accepts text and image input and produces...		5.0	0.00
#334	Perceptron: Perceptron Mk1	saas	Perceptron Mk1 (Mark One) is Perceptron's highest-quality vision-language model for video and embodied reasoning. It accepts image and video inputs paired with natural language queries, and produces detailed visual understanding...		5.0	0.00
#344	Qwen: Qwen2.5 VL 72B Instruct	saas	Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images.		5.0	0.00
#352	Qwen: Qwen3.5 Plus 2026-02-15	saas	The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of...		5.0	0.00
#353	Qwen: Qwen3.5 Plus 2026-04-20	saas	Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This...		5.0	0.00
#354	Qwen: Qwen3.6 27B	saas	Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities, accepting text, image, and video inputs...		5.0	0.00
#356	Qwen: Qwen3.6 Flash	saas	Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in...		5.0	0.00
#359	Qwen: Qwen3.7 Plus	saas	Qwen3.7-Plus is a cost-effective model in Alibaba's Qwen3.7 series. It supports text and image input with text output, building on the series' text capabilities with a comprehensive upgrade to its...		5.0	0.00
#367	Qwen: Qwen3 VL 235B A22B Instruct	saas	Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table...		5.0	0.00
#368	Qwen: Qwen3 VL 235B A22B Thinking	saas	Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math....		5.0	0.00
#369	Qwen: Qwen3 VL 30B A3B Instruct	saas	Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...		5.0	0.00
#370	Qwen: Qwen3 VL 30B A3B Thinking	saas	Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels...		5.0	0.00
#371	Qwen: Qwen3 VL 32B Instruct	saas	Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...		5.0	0.00
#372	Qwen: Qwen3 VL 8B Instruct	saas	Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...		5.0	0.00
#397	xAI: Grok Build 0.1	saas	Grok Build 0.1 is xAI’s fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding...		5.0	0.00
#398	Xiaomi: MiMo-V2.5	saas	MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding...		5.0	0.00
#403	Z.ai: GLM 4.5V	saas	GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,...		5.0	0.00
#405	Z.ai: GLM 4.6V	saas	GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts...		5.0	0.00
