Model Catalog

DeepSeek

6 items

These cutting-edge models from Chinese AI lab DeepSeek punch well above their weight. They deliver impressive reasoning, coding, and math skills while remaining open and surprisingly affordable to run. Ideal for research assistants, chatbots, and intelligent search applications.

Embeddings

20 items

Convert real-world data into compact numerical representations (vectors) that capture semantic meaning and syntactic relationships. Well suited to semantic search engines, RAG pipelines, recommendation systems, and clustering applications.
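As a rough illustration of the semantic search use case, the sketch below ranks documents by cosine similarity between a query vector and document vectors. The vectors, document ids, and function names are invented for this example; in practice the vectors would come from one of the embedding models in this collection and would have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, doc_vecs):
    """Return document ids ordered from most to least similar to the query."""
    scores = {
        doc_id: cosine_similarity(query_vec, vec)
        for doc_id, vec in doc_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical 3-dimensional embeddings, hand-written for illustration only.
doc_vecs = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "gpu-pricing": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]  # stands in for an embedded user query

ranking = rank_by_similarity(query_vec, doc_vecs)
print(ranking)
```

Here the query vector points in nearly the same direction as the "refund-policy" vector, so that document ranks first. A production system would typically replace the linear scan with a vector index.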

Gemma

9 items

Google's lightweight open models prove great things come in small packages. Built from the same technology that powers Gemini, they deliver strong reasoning, coding, and conversational skills. Capable enough for serious work, yet compact enough to run almost anywhere.

gpt-oss

3 items

OpenAI's gpt-oss is a family of open-weight GPT models, released under the Apache 2.0 license, that deliver high-quality text generation while staying transparent, customizable, and free for commercial and research use.

Inferentia 2

7 items

Our models, optimized to run on AWS Inferentia 2, Amazon's purpose-built ML accelerator. They are designed for high-throughput, cost-efficient inference at scale and are ideal for production deployments that need the performance of dedicated silicon without the cost of traditional GPU infrastructure.

Llama

8 items

Meta Llama is a versatile suite of open-weight language models that combine efficiency with cutting-edge performance. From lightweight chatbots to research-grade generators, each model delivers fast, context-aware responses while staying accessible and customizable.

Llama.cpp

21 items

Discover our curated llama.cpp model collection, optimized for fast, lightweight inference on a wide range of hardware. Powered by llama.cpp's native C/C++ engine, each model runs without Python dependencies, delivering low-latency responses even on modest CPUs or GPUs.

Mistral

6 items

French AI lab Mistral AI delivers world-class open-weight models with top-tier reasoning and instruction-following. They're fast and efficient thanks to sliding window attention. Built in Europe, they're a natural fit for EU teams prioritizing data sovereignty and regulatory compliance.

NVIDIA

2 items

NVIDIA's open and NVFP4-optimized models for reasoning, agentic, and long-context workloads. Fast and memory-efficient with native Blackwell acceleration. A strong fit for serving frontier models at scale, cost-effectively.

OCR

5 items

Optical Character Recognition models that convert images, scanned documents, and PDFs into machine-readable text. Ideal for document digitization, data extraction, and handwriting recognition.

PaliGemma

3 items

Google's family of open vision-language models built on the Gemma architecture. PaliGemma combines image and text understanding, making it great for tasks like image captioning, visual question answering, and document understanding. Optimized for low‑resource environments.

Qwen

30 items

Alibaba's family of open-weight large language models, covering text, code, math, and multimodal tasks. Qwen models come in a wide range of sizes and specializations, consistently ranking among the top open models for their class.

Thinking Machines Lab

2 items

Inkling is a general-purpose multimodal model that accepts text, image, and audio inputs and generates text outputs. It is intended for use in English and other languages, and across multiple programming languages.

Model Catalog

DeepSeek-V4-Flash-0731

Inkling-Small-NVFP4

Inkling-NVFP4

NVIDIA-Nemotron-3-Ultra-550B-A55B-NVFP4

GLM-5.2-NVFP4

gemma-4-12B-it