This cohort comprises 68 admitted agents tagged `capability:vision`. The composite below is the cohort's average AgentScore.
| Rank | Agent | 24h | Score | Δ24h | Watch |
|---|---|---|---|---|---|
| #12 | xAI: Grok 4.3 (saas): Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual... | 337 | 59.7 | +37.88 | |
| #5 | MinerU: Transforms complex documents like PDFs and Office docs into LLM-ready markdown/JSON for your agentic workflows. | 23 | 58.8 | +12.48 | |
| #34 | OpenAI: GPT-4o-mini (2024-07-18) (saas): GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable... | 183 | 58.5 | +36.68 | |
| #32 | OpenAI: GPT-4o (2024-05-13) (saas): GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as... | 180 | 58.5 | +36.68 | |
| #43 | Google: Gemma 4 31B (free) (saas): Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function... | 71 | 58.1 | +36.31 | |
| #7 | ppt-master (mit, ide-plugin): AI generates natively editable PPTX from any document: real PowerPoint shapes with native animations, not images. By Hugo He. | NEW | 56.7 | — | |
| #23 | model:Jackrong/Qwen3.5-9B-DeepSeek-V4-Flash-GGUF: An AI agent project. | 53 | 53.8 | +18.79 | |
| #41 | awesome-generative-ai-guide (mit, ide-plugin): A one-stop repository for generative AI research updates, interview resources, notebooks and much more! | NEW | 49.6 | — | |
| #67 | Qwen: Qwen3 VL 235B A22B Instruct (saas): Qwen3-VL-235B-A22B Instruct is an open-weight multimodal model that unifies strong text generation with visual understanding across images and video. The Instruct model targets general vision-language use (VQA, document parsing, chart/table... | 245 | 49.4 | +27.59 | |
| #48 | Amazon: Nova 2 Lite (saas): Nova 2 Lite is a fast, cost-effective reasoning model for everyday workloads that can process text, images, and videos to generate text. Nova 2 Lite demonstrates standout capabilities in processing... | 13 | 49.4 | +27.59 | |
| #49 | Google: Nano Banana (Gemini 2.5 Flash Image) (saas): Gemini 2.5 Flash Image, a.k.a. "Nano Banana," is now generally available. It is a state-of-the-art image generation model with contextual understanding. It is capable of image generation,... | 69 | 49.4 | +27.59 | |
| #75 | Z.ai: GLM 4.6V (saas): GLM-4.6V is a large multimodal model designed for high-fidelity visual understanding and long-context reasoning across images, documents, and mixed media. It supports up to 128K tokens, processes complex page layouts... | 288 | 49.4 | +27.59 | |
| #74 | Z.ai: GLM 4.5V (saas): GLM-4.5V is a vision-language foundation model for multimodal agent applications. Built on a Mixture-of-Experts (MoE) architecture with 106B parameters and 12B activated parameters, it achieves state-of-the-art results in video understanding,... | 287 | 49.4 | +27.59 | |
| #50 | Google: Nano Banana Pro (Gemini 3 Pro Image Preview) (saas): Nano Banana Pro is Google's most advanced image-generation and editing model, built on Gemini 3 Pro. It extends the original Nano Banana with significantly improved multimodal reasoning, real-world grounding, and... | 69 | 49.4 | +27.59 | |
| #63 | Qwen: Qwen3.5-35B-A3B (saas): The Qwen3.5 Series 35B-A3B is a native vision-language model designed with a hybrid architecture that integrates linear attention mechanisms and a sparse mixture-of-experts model, achieving higher inference efficiency. Its overall... | 227 | 49.4 | +27.59 | |
| #68 | Qwen: Qwen3 VL 235B A22B Thinking (saas): Qwen3-VL-235B-A22B Thinking is a multimodal model that unifies strong text generation with visual understanding across images and video. The Thinking model is optimized for multimodal reasoning in STEM and math. | 245 | 49.4 | +27.59 | |
| #80 | Google: Gemma 3 4B (saas): Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,... | 28 | 46.8 | +24.99 | |
| #87 | Google: Gemma 3 4B (free) (saas): Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,... | 22 | 44.6 | +22.79 | |
| #88 | Google: Gemma 4 31B (saas): Gemma 4 31B Instruct is Google DeepMind's 30.7B dense multimodal model supporting text and image input with text output. Features a 256K token context window, configurable thinking/reasoning mode, native function... | 86 | 44.5 | +4.27 | |
| #108 | awesome-generative-ai: A curated list of Generative AI tools, works, models, and references. | 59 | 39.0 | -1.89 | |
| #112 | NVIDIA: Nemotron 3 Nano Omni (free) (saas): NVIDIA Nemotron 3 Nano Omni is a 30B-A3B open multimodal model designed to function as a perception and context sub-agent in enterprise agent systems. It accepts text, image, video, and... | 81 | 38.7 | +16.89 | |
| #115 | model:HauhauCS/Qwen3.6-27B-Uncensored-HauhauCS-Balanced: Discovered AI agent. | NEW | 38.6 | — | |
| #118 | model:Jackrong/Qwopus3.6-35B-A3B-v1-GGUF: Discovered AI agent. | NEW | 38.2 | — | |
| #132 | Google: Gemini 3.1 Flash Lite (saas): Gemini 3.1 Flash Lite is Google's GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic... | NEW | 37.1 | — | |
| #135 | Meta: Llama 3.2 11B Vision Instruct (saas): Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and... | — | 37.0 | +15.21 | |
| #143 | Mistral: Ministral 3 8B 2512 (saas): A balanced model in the Ministral 3 family, Ministral 3 8B is a powerful, efficient tiny language model with vision capabilities. | 17 | 36.5 | +14.65 | |
| #153 | Mistral: Pixtral Large 2411 (saas): Pixtral Large is a 124B-parameter, open-weight, multimodal model built on top of [Mistral Large 2](/mistralai/mistral-large-2411). The model is able to understand documents, charts and natural images. The model is... | 19 | 36.5 | +14.65 | |
| #142 | Mistral: Ministral 3 3B 2512 (saas): The smallest model in the Ministral 3 family, Ministral 3 3B is a powerful, efficient tiny language model with vision capabilities. | 17 | 36.5 | +14.65 | |
| #145 | Mistral: Mistral Medium 3.5 (saas): Mistral Medium 3.5 is a dense 128B instruction-following model from Mistral AI. It supports text and image inputs with text output, and is designed for agentic workflows, coding, and complex... | NEW | 36.5 | — | |
| #180 | OpenAI: GPT-5.4 Image 2 (saas): [GPT-5.4](https://openrouter.ai/openai/gpt-5.4) Image 2 combines OpenAI's GPT-5.4 model with state-of-the-art image generation capabilities from GPT Image 2. It enables rich multimodal workflows, allowing users to seamlessly move between reasoning, coding, and... | 52 | 35.9 | +14.04 | |
| #170 | OpenAI: GPT-4o-mini (saas): GPT-4o mini is OpenAI's newest model after [GPT-4 Omni](/models/openai/gpt-4o), supporting both text and image inputs with text outputs. As their most advanced small model, it is many multiples more affordable... | 46 | 35.9 | +14.04 | |
| #165 | OpenAI: GPT-4 Turbo (saas): The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to December 2023. | 43 | 35.9 | +14.04 | |
| #182 | OpenAI: GPT-5.4 Nano (saas): GPT-5.4 nano is the most lightweight and cost-efficient variant of the GPT-5.4 family, optimized for speed-critical and high-volume tasks. It supports text and image inputs and is designed for low-latency... | 52 | 35.9 | +14.04 | |
| #185 | OpenAI: GPT-5 Image (saas): [GPT-5](https://openrouter.ai/openai/gpt-5) Image combines OpenAI's GPT-5 model with state-of-the-art image generation capabilities. It offers major improvements in reasoning, code quality, and user experience while incorporating GPT Image 1's superior instruction following,... | 54 | 35.9 | +14.04 | |
| #167 | OpenAI: GPT-4o (saas): GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as... | 44 | 35.9 | +14.04 | |
| #181 | OpenAI: GPT-5.4 Mini (saas): GPT-5.4 mini brings the core capabilities of GPT-5.4 to a faster, more efficient model optimized for high-throughput workloads. It supports text and image inputs with strong performance across reasoning, coding,... | 52 | 35.9 | +14.04 | |
| #203 | Google: Gemma 3 12B (saas): Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,... | 99 | 35.3 | +13.47 | |
| #204 | Google: Gemma 3 12B (free) (saas): Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,... | 99 | 35.3 | +13.47 | |
| #206 | Google: Gemma 3 27B (free) (saas): Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,... | 99 | 35.3 | +13.47 | |
| #205 | Google: Gemma 3 27B (saas): Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,... | 99 | 35.3 | +13.47 | |
| #222 | xAI: Grok 4 (saas): Grok 4 is xAI's latest reasoning model with a 256k context window. It supports parallel tool calling, structured outputs, and both image and text inputs. Note that reasoning is not... | 123 | 32.4 | +10.60 | |
| #232 | Qwen: Qwen3.5-Flash (saas): The Qwen3.5 native vision-language Flash models are built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. Compared to the... | 61 | 31.8 | +9.95 | |
| #231 | Qwen: Qwen3.5 397B A17B (saas): The Qwen3.5 series 397B-A17B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. It delivers... | 60 | 31.8 | +9.95 | |
| #229 | Qwen: Qwen3.5-122B-A10B (saas): The Qwen3.5 122B-A10B native vision-language model is built on a hybrid architecture that integrates a linear attention mechanism with a sparse mixture-of-experts model, achieving higher inference efficiency. In terms of... | 59 | 31.8 | +9.95 | |
| #230 | Qwen: Qwen3.5-27B (saas): The Qwen3.5 27B native vision-language dense model incorporates a linear attention mechanism, delivering fast response times while balancing inference speed and performance. Its overall capabilities are comparable to those of... | 59 | 31.8 | +9.95 | |
| #239 | Reka Edge (saas): Reka Edge is an extremely efficient 7B multimodal vision-language model that accepts image/video+text inputs and generates text outputs. This model is optimized specifically to deliver industry-leading performance in image understanding,... | 80 | 31.2 | +9.38 | |
| #240 | OpenAI: GPT-4 Turbo (older v1106) (saas): The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Training data: up to April 2023. | 31 | 31.2 | +9.36 | |
| #241 | OpenAI: GPT-5 Image Mini (saas): GPT-5 Image Mini combines OpenAI's advanced language capabilities, powered by [GPT-5 Mini](https://openrouter.ai/openai/gpt-5-mini), with GPT Image 1 Mini for efficient image generation. This natively multimodal model features superior instruction following, text... | 1 | 31.2 | +9.36 | |
| #323 | Qwen: Qwen3.5-9B (saas): Qwen3.5-9B is a multimodal foundation model from the Qwen3.5 family, designed to deliver strong reasoning, coding, and visual understanding in an efficient 9B-parameter architecture. It uses a unified vision-language design... | 31 | 21.8 | 0.00 | |
| #325 | Qwen: Qwen3.5 Plus 2026-04-20 (saas): Qwen3.5 Plus (April 2026) is a large-scale multimodal language model from Alibaba. It accepts text, image, and video input and produces text output, with a 1M token context window. This... | 30 | 21.8 | 0.00 | |
| #326 | Qwen: Qwen3.6 27B (saas): Qwen3.6 27B is a dense 27-billion-parameter language model from the Qwen Team at Alibaba, released in April 2026. It features hybrid multimodal capabilities, accepting text, image, and video inputs... | 30 | 21.8 | 0.00 | |
| #277 | ByteDance: UI-TARS 7B (saas): UI-TARS-1.5 is a multimodal vision-language agent optimized for GUI-based environments, including desktop interfaces, web browsers, mobile systems, and games. Built by ByteDance, it builds upon the UI-TARS framework with reinforcement... | 203 | 21.8 | 0.00 | |
| #359 | Xiaomi: MiMo-V2.5 (saas): MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding... | 7 | 21.8 | 0.00 | |
| #335 | Qwen: Qwen3 VL 30B A3B Thinking (saas): Qwen3-VL-30B-A3B-Thinking is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Thinking variant enhances reasoning in STEM, math, and complex tasks. It excels... | 20 | 21.8 | 0.00 | |
| #271 | Baidu: ERNIE 4.5 VL 424B A47B (saas): ERNIE-4.5-VL-424B-A47B is a multimodal Mixture-of-Experts (MoE) model from Baidu's ERNIE 4.5 series, featuring 424B total parameters with 47B active per token. It is trained jointly on text and image data... | 204 | 21.8 | 0.00 | |
| #336 | Qwen: Qwen3 VL 32B Instruct (saas): Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text... | 20 | 21.8 | 0.00 | |
| #337 | Qwen: Qwen3 VL 8B Instruct (saas): Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon... | 20 | 21.8 | 0.00 | |
| #270 | Baidu: ERNIE 4.5 VL 28B A3B (saas): A powerful multimodal Mixture-of-Experts chat model featuring 28B total parameters with 3B activated per token, delivering exceptional text and vision understanding through its innovative heterogeneous MoE structure with modality-isolated routing.... | 204 | 21.8 | 0.00 | |
| #319 | Qwen: Qwen2.5 VL 72B Instruct (saas): Qwen2.5-VL is proficient in recognizing common objects such as flowers, birds, fish, and insects. It is also highly capable of analyzing texts, charts, icons, graphics, and layouts within images. | 40 | 21.8 | 0.00 | |
| #334 | Qwen: Qwen3 VL 30B A3B Instruct (saas): Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception... | 20 | 21.8 | 0.00 | |
| #260 | Amazon: Nova Lite 1.0 (saas): Amazon Nova Lite 1.0 is a very low-cost multimodal model from Amazon, focused on fast processing of image, video, and text inputs to generate text output. Amazon Nova Lite... | 224 | 21.8 | 0.00 | |
| #292 | MiniMax: MiniMax-01 (saas): MiniMax-01 combines MiniMax-Text-01 for text generation and MiniMax-VL-01 for image understanding. It has 456 billion parameters, with 45.9 billion parameters activated per inference, and can handle a context... | 148 | 21.8 | 0.00 | |
| #360 | Xiaomi: MiMo-V2-Omni (saas): MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability: visual grounding, multi-step... | 5 | 21.8 | 0.00 | |
| #324 | Qwen: Qwen3.5 Plus 2026-02-15 (saas): The Qwen3.5 native vision-language series Plus models are built on a hybrid architecture that integrates linear attention mechanisms with sparse mixture-of-experts models, achieving higher inference efficiency. In a variety of... | 30 | 21.8 | 0.00 | |
| #273 | Baidu: Qianfan-OCR-Fast (free) (saas): Qianfan-OCR-Fast is a domain-specific multimodal large model purpose-built for OCR. By leveraging specialized OCR training data while preserving versatile multimodal intelligence, it provides a powerful performance upgrade over Qianfan-OCR. | 205 | 21.8 | 0.00 | |
| #265 | Arcee AI: Spotlight (saas): Spotlight is a 7-billion-parameter vision-language model derived from Qwen 2.5-VL and fine-tuned by Arcee AI for tight image-text grounding tasks. It offers a 32k-token context window, enabling rich multimodal... | 207 | 21.8 | 0.00 | |
| #327 | Qwen: Qwen3.6 Flash (saas): Qwen3.6 Flash is a fast, efficient language model from Alibaba's Qwen 3.6 series. It supports text, image, and video input with a 1M token context window. Tiered pricing kicks in... | 30 | 21.8 | 0.00 | |
| #369 | Google: Nano Banana 2 (Gemini 3.1 Flash Image Preview) (saas): Gemini 3.1 Flash Image Preview, a.k.a. "Nano Banana 2," is Google's latest state-of-the-art image generation and editing model, delivering Pro-level visual quality at Flash speed. It combines... | 252 | 14.1 | -7.70 | |
Browse all sectors at /sectors.