Model Catalog
106 items (no filters applied)
Prices are in USD per hour for the listed hardware.
sentence-transformers/all-MiniLM-L6-v2
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · CPU 1x Intel Sapphire Rapids · $0.033 · deployed 275 times
sentence-transformers/all-mpnet-base-v2
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · CPU 2x Intel Sapphire Rapids · $0.067 · deployed 59 times
Linaqruf/animagine-xl-2.0
Text-to-Image · GPU 1x Nvidia L4 · $0.8 · deployed 43 times
stablediffusionapi/anything-v5
Text-to-Image · GPU 1x Nvidia L4 · $0.8 · deployed 32 times
swiss-ai/Apertus-8B-Instruct-2509
Text Generation · Accelerated SGLang · GPU 1x Nvidia L4 · $0.8 · deployed 18 times
BAAI/bge-base-en-v1.5
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · CPU 4x Intel Sapphire Rapids · $0.134 · deployed 63 times
BAAI/bge-large-en-v1.5
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia L4 · $0.8 · deployed 44 times
BAAI/bge-multilingual-gemma2
Sentence Embeddings · GPU 1x Nvidia L40S · $1.8 · deployed 7 times
BAAI/bge-reranker-base
Sentence Ranking · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia T4 · $0.5 · deployed 26 times
deepseek-ai/DeepSeek-R1-Distill-Llama-70B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L40S · $8.3 · deployed 92 times
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L4 · $3.8 · deployed 115 times
unsloth/DeepSeek-R1-GGUF
Text Generation · Accelerated llama.cpp (IQ1_S) · GPU 4x Nvidia L40S · $8.3 · deployed 44 times
unsloth/DeepSeek-R1-GGUF
Text Generation · Accelerated llama.cpp (IQ2_XXS) · GPU 8x Nvidia A100 · $20 · deployed 41 times
google/embeddinggemma-300m
Sentence Similarity · Accelerated Text Embeddings Inference (TEI) · CPU 2x Intel Sapphire Rapids · $0.067 · deployed 10 times
onnx-community/embeddinggemma-300m-ONNX
Sentence Similarity · Accelerated Text Embeddings Inference (TEI) · CPU 2x Intel Sapphire Rapids · $0.067 · deployed 3 times
black-forest-labs/FLUX.1-schnell
Text-to-Image · GPU 1x Nvidia L40S · $1.8 · deployed 139 times
google/gemma-3-12b-it
Image-Text-to-Text · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 156 times
google/gemma-3-27b-it
Image-Text-to-Text · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia A100 · $2.5 · deployed 182 times
openai/gpt-oss-120b
Text Generation · Accelerated vLLM · GPU 1x Nvidia H200 · $5 · deployed 172 times
openai/gpt-oss-20b
Text Generation · Accelerated vLLM · GPU 1x Nvidia H200 · $5 · deployed 249 times
ibm-granite/granite-3.1-8b-base
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 23 times
ibm-granite/granite-3.3-8b-instruct
Text Generation · Accelerated vLLM · GPU 1x Nvidia L40S · $1.8 · deployed 14 times
thenlper/gte-large
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · CPU 4x Intel Sapphire Rapids · $0.134 · deployed 10 times
beethogedeon/gte-Qwen2-7B-instruct-Q4_K_M-GGUF
Sentence Embeddings · Accelerated llama.cpp (Q4_K_M) · CPU 8x Intel Sapphire Rapids · $0.268 · deployed 17 times
ggml-org/InternVL3-14B-Instruct-GGUF
Image-Text-to-Text · Accelerated llama.cpp (Q8_0) · GPU 1x Nvidia L4 · $0.8 · deployed 12 times
ggml-org/jina-reranker-v1-turbo-en-GGUF
Sentence Ranking · Accelerated llama.cpp (F16) · CPU 8x Intel Sapphire Rapids · $0.268 · deployed 7 times
meta-llama/Llama-2-13b-chat-hf
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 13 times
meta-llama/Llama-2-13b-hf
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 6 times
meta-llama/Llama-2-70b-chat-hf
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L40S · $8.3 · deployed 6 times
meta-llama/Llama-2-70b-hf
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L40S · $8.3
meta-llama/Llama-2-7b-chat-hf
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 44 times
meta-llama/Llama-2-7b-hf
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 19 times
meta-llama/Llama-3.1-70B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L40S · $8.3 · deployed 21 times
meta-llama/Llama-3.1-70B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L40S · $8.3 · deployed 56 times
meta-llama/Llama-3.1-8B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 68 times
meta-llama/Llama-3.1-8B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 305 times
meta-llama/Llama-3.2-11B-Vision-Instruct
Image-Text-to-Text · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 38 times
meta-llama/Llama-3.2-1B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 21 times
meta-llama/Llama-3.2-1B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 38 times
meta-llama/Llama-3.2-3B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 21 times
meta-llama/Llama-3.2-3B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 89 times
lmstudio-community/Llama-3.3-70B-Instruct-GGUF
Text Generation · Accelerated llama.cpp (Q8_0) · GPU 4x Nvidia L4 · $3.8 · deployed 46 times
meta-llama/Meta-Llama-3-70B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia A100 · $10 · deployed 9 times
meta-llama/Meta-Llama-3-70B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia A100 · $10 · deployed 3 times
meta-llama/Meta-Llama-3-8B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 28 times
meta-llama/Meta-Llama-3-8B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 64 times
mistralai/Mistral-7B-Instruct-v0.3
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 273 times
mistralai/Mistral-7B-v0.3
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 27 times
mistralai/Mistral-Nemo-Instruct-2407
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L4 · $3.8 · deployed 26 times
mistralai/Mistral-Small-24B-Instruct-2501
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L4 · $3.8 · deployed 24 times
mistralai/Mixtral-8x22B-Instruct-v0.1
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 8x Nvidia A100 · $20 · deployed 17 times
mistralai/Mixtral-8x7B-Instruct-v0.1
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 2x Nvidia A100 · $5 · deployed 114 times
mistralai/Mixtral-8x7B-v0.1
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 2x Nvidia A100 · $5 · deployed 13 times
cross-encoder/ms-marco-MiniLM-L12-v2
Sentence Ranking · CPU 1x Intel Sapphire Rapids · $0.033 · deployed 19 times
intfloat/multilingual-e5-large
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia T4 · $0.5 · deployed 34 times
intfloat/multilingual-e5-large-instruct
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia T4 · $0.5 · deployed 39 times
mixedbread-ai/mxbai-embed-large-v1
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia L4 · $0.8 · deployed 22 times
Intel/neural-chat-7b-v3-1
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 6 times
Intel/neural-chat-7b-v3-3
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 6 times
nomic-ai/nomic-embed-text-v1.5
Sentence Similarity · Accelerated Text Embeddings Inference (TEI) · CPU 8x Intel Sapphire Rapids · $0.268 · deployed 24 times
openchat/openchat-3.5-0106
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 24 times
prompthero/openjourney
Text-to-Image · GPU 1x Nvidia L4 · $0.8 · deployed 15 times
google/paligemma2-10b-mix-224
Image-Text-to-Text · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 2 times
google/paligemma2-10b-mix-448
Image-Text-to-Text · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8
google/paligemma2-3b-mix-448
Image-Text-to-Text · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 10 times
sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2
Sentence Embeddings · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia L4 · $0.8 · deployed 13 times
microsoft/Phi-3-mini-128k-instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 60 times
microsoft/Phi-3-mini-4k-instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 39 times
microsoft/phi-4
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 61 times
Qwen/Qwen2.5-14B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 2x Nvidia A100 · $5 · deployed 118 times
Qwen/Qwen2.5-14B-Instruct-1M
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 2x Nvidia A100 · $5 · deployed 17 times
Qwen/Qwen2.5-72B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia A100 · $10 · deployed 28 times
Qwen/Qwen2.5-7B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 180 times
Qwen/Qwen2.5-Coder-14B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 20 times
Qwen/Qwen2.5-Coder-32B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L4 · $3.8 · deployed 83 times
bartowski/Qwen2.5-Coder-32B-Instruct-GGUF
Text Generation · Accelerated llama.cpp (Q4_K_M) · GPU 1x Nvidia L4 · $0.8 · deployed 43 times
Qwen/Qwen2.5-Coder-7B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 45 times
Qwen/Qwen2.5-Math-72B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia A100 · $10 · deployed 4 times
Qwen/Qwen2.5-Math-7B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 2 times
ggml-org/Qwen2.5-VL-3B-Instruct-GGUF
Image-Text-to-Text · Accelerated llama.cpp (Q8_0) · GPU 1x Nvidia T4 · $0.5 · deployed 26 times
Qwen/Qwen2.5-VL-7B-Instruct
Image-Text-to-Text · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 158 times
ggml-org/Qwen2.5-VL-7B-Instruct-GGUF
Image-Text-to-Text · Accelerated llama.cpp (Q8_0) · GPU 1x Nvidia T4 · $0.5 · deployed 26 times
Qwen/Qwen3-1.7B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 59 times
Qwen/Qwen3-32B
Text Generation · Accelerated vLLM · GPU 4x Nvidia L4 · $3.8 · deployed 41 times
Qwen/Qwen3-Embedding-0.6B
Feature Extraction · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia L4 · $0.8 · deployed 95 times
Qwen/Qwen3-Embedding-4B
Feature Extraction · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia L4 · $0.8 · deployed 76 times
Qwen/Qwen3-Embedding-8B
Feature Extraction · Accelerated Text Embeddings Inference (TEI) · GPU 1x Nvidia L4 · $0.8 · deployed 113 times
Qwen/Qwen3-Next-80B-A3B-Instruct
Text Generation · Accelerated vLLM · GPU 4x Nvidia A100 · $10 · deployed 7 times
Qwen/Qwen3-Next-80B-A3B-Thinking
Text Generation · Accelerated vLLM · GPU 4x Nvidia A100 · $10 · deployed 2 times
Qwen/QwQ-32B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L4 · $3.8 · deployed 47 times
bartowski/QwQ-32B-Preview-GGUF
Text Generation · Accelerated llama.cpp (Q8_0) · GPU 1x Nvidia L40S · $1.8 · deployed 2 times
RekaAI/reka-flash-3
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 1 time
simplescaling/s1.1-32B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L4 · $3.8 · deployed 1 time
HuggingFaceTB/SmolLM2-1.7B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 13 times
HuggingFaceTB/SmolLM2-1.7B-Instruct
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 24 times
ggml-org/SmolVLM2-2.2B-Instruct-GGUF
Image-Text-to-Text · Accelerated llama.cpp (F16) · GPU 1x Nvidia T4 · $0.5 · deployed 16 times
stabilityai/stable-diffusion-2-1
Text-to-Image · GPU 1x Nvidia L4 · $0.8 · deployed 62 times
stable-diffusion-v1-5/stable-diffusion-v1-5
Text-to-Image · GPU 1x Nvidia T4 · $0.5 · deployed 54 times
stabilityai/stable-diffusion-xl-base-1.0
Text-to-Image · GPU 1x Nvidia L4 · $0.8 · deployed 79 times
berkeley-nest/Starling-LM-7B-alpha
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 5 times
Tesslate/Tessa-T1-32B
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia A100 · $2.5 · deployed 4 times
qihoo360/TinyR1-32B-Preview
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 4x Nvidia L4 · $3.8 · deployed 5 times
lmsys/vicuna-7b-v1.5
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L40S · $1.8 · deployed 5 times
openai/whisper-large-v3
Automatic Speech Recognition · GPU 1x Nvidia L4 · $0.8 · deployed 148 times
openai/whisper-large-v3-turbo
Automatic Speech Recognition · GPU 1x Nvidia L4 · $0.8 · deployed 156 times
HuggingFaceH4/zephyr-7b-beta
Text Generation · Accelerated Text Generation Inference (TGI) · GPU 1x Nvidia L4 · $0.8 · deployed 37 times
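The catalog entries can also be queried programmatically. The sketch below copies a few rows into a hypothetical `CatalogEntry` record; the field names are illustrative and are not an official Hugging Face schema. It filters those rows by task and hourly budget, then estimates monthly cost as hourly price × hours of uptime. It assumes the listed prices are USD per hour and that the endpoint runs around the clock for an average 730-hour month.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical record for one catalog card; field names are illustrative only.
@dataclass(frozen=True)
class CatalogEntry:
    model_id: str
    task: str
    hardware: str
    usd_per_hour: float  # assumption: listed price is USD per hour of uptime
    deployments: Optional[int] = None  # some cards show no deployment count

# A few rows copied from the catalog.
CATALOG: List[CatalogEntry] = [
    CatalogEntry("sentence-transformers/all-MiniLM-L6-v2", "Sentence Embeddings",
                 "CPU 1x Intel Sapphire Rapids", 0.033, 275),
    CatalogEntry("BAAI/bge-base-en-v1.5", "Sentence Embeddings",
                 "CPU 4x Intel Sapphire Rapids", 0.134, 63),
    CatalogEntry("BAAI/bge-large-en-v1.5", "Sentence Embeddings",
                 "GPU 1x Nvidia L4", 0.8, 44),
    CatalogEntry("meta-llama/Llama-3.1-8B-Instruct", "Text Generation",
                 "GPU 1x Nvidia L4", 0.8, 305),
    CatalogEntry("meta-llama/Llama-2-70b-hf", "Text Generation",
                 "GPU 4x Nvidia L40S", 8.3),
]

# Assumption: an always-on endpoint over an average month (8760 h / 12).
HOURS_PER_MONTH = 730

def filter_entries(entries: List[CatalogEntry], task: Optional[str] = None,
                   max_usd_per_hour: Optional[float] = None) -> List[CatalogEntry]:
    """Keep entries matching `task` (if given) whose hourly price fits the budget."""
    return [
        e for e in entries
        if (task is None or e.task == task)
        and (max_usd_per_hour is None or e.usd_per_hour <= max_usd_per_hour)
    ]

def monthly_cost(entry: CatalogEntry, hours: float = HOURS_PER_MONTH) -> float:
    """Estimated USD cost of running one replica for `hours` hours, to the cent."""
    return round(entry.usd_per_hour * hours, 2)

if __name__ == "__main__":
    # Embedding models under $0.50/hour, cheapest first.
    cheap = filter_entries(CATALOG, task="Sentence Embeddings", max_usd_per_hour=0.5)
    for e in sorted(cheap, key=monthly_cost):
        print(f"{e.model_id}: ${monthly_cost(e):.2f}/month on {e.hardware}")
```

With these sample rows, the budget filter keeps only the two CPU-based embedding models. If an endpoint does not run continuously, pass its actual uptime as `hours` for a closer estimate.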