Inference Catalog
7 items (inference server: Llama.cpp)
Model | Task | Quantization | Accelerator | Instance | Price
unsloth/DeepSeek-R1-GGUF | Text Generation | IQ1_S | GPU | 4x Nvidia L40S | $8.3 / hour
unsloth/DeepSeek-R1-GGUF | Text Generation | IQ2_XXS | GPU | 8x Nvidia L40S | $23.5 / hour
beethogedeon/gte-Qwen2-7B-instruct-Q4_K_M-GGUF | Sentence Embeddings | Q4_K_M | CPU | 8x Intel Sapphire Rapids | $0.268 / hour
ggml-org/jina-reranker-v1-turbo-en-GGUF | Sentence Ranking | F16 | CPU | 8x Intel Sapphire Rapids | $0.268 / hour
lmstudio-community/Llama-3.3-70B-Instruct-GGUF | Text Generation | Q8_0 | GPU | 4x Nvidia L4 | $3.8 / hour
bartowski/Qwen2.5-Coder-32B-Instruct-GGUF | Text Generation | Q4_K_M | GPU | 1x Nvidia L4 | $0.8 / hour
bartowski/QwQ-32B-Preview-GGUF | Text Generation | Q8_0 | GPU | 1x Nvidia L40S | $1.8 / hour
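
The listing above can also be handled as plain records, which makes it easy to narrow by task, accelerator, or hourly price in a script. The following is a minimal sketch under stated assumptions: the `CatalogEntry` class and the `filter_entries` and `cheapest` helpers are hypothetical names introduced here for illustration and are not part of any catalog API. The data values are copied from the listing.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class CatalogEntry:
    """One row of the listing (hypothetical structure, not an official schema)."""
    model: str
    task: str
    quantization: str
    accelerator: str
    instance: str
    usd_per_hour: float


# Values transcribed from the listing above.
CATALOG: List[CatalogEntry] = [
    CatalogEntry("unsloth/DeepSeek-R1-GGUF", "Text Generation", "IQ1_S", "GPU", "4x Nvidia L40S", 8.3),
    CatalogEntry("unsloth/DeepSeek-R1-GGUF", "Text Generation", "IQ2_XXS", "GPU", "8x Nvidia L40S", 23.5),
    CatalogEntry("beethogedeon/gte-Qwen2-7B-instruct-Q4_K_M-GGUF", "Sentence Embeddings", "Q4_K_M", "CPU", "8x Intel Sapphire Rapids", 0.268),
    CatalogEntry("ggml-org/jina-reranker-v1-turbo-en-GGUF", "Sentence Ranking", "F16", "CPU", "8x Intel Sapphire Rapids", 0.268),
    CatalogEntry("lmstudio-community/Llama-3.3-70B-Instruct-GGUF", "Text Generation", "Q8_0", "GPU", "4x Nvidia L4", 3.8),
    CatalogEntry("bartowski/Qwen2.5-Coder-32B-Instruct-GGUF", "Text Generation", "Q4_K_M", "GPU", "1x Nvidia L4", 0.8),
    CatalogEntry("bartowski/QwQ-32B-Preview-GGUF", "Text Generation", "Q8_0", "GPU", "1x Nvidia L40S", 1.8),
]


def filter_entries(
    entries: Iterable[CatalogEntry],
    task: Optional[str] = None,
    accelerator: Optional[str] = None,
    max_usd_per_hour: Optional[float] = None,
) -> List[CatalogEntry]:
    """Keep entries that match every criterion that was supplied."""
    result = []
    for entry in entries:
        if task is not None and entry.task != task:
            continue
        if accelerator is not None and entry.accelerator != accelerator:
            continue
        if max_usd_per_hour is not None and entry.usd_per_hour > max_usd_per_hour:
            continue
        result.append(entry)
    return result


def cheapest(entries: Iterable[CatalogEntry]) -> CatalogEntry:
    """Return the entry with the lowest hourly price."""
    return min(entries, key=lambda entry: entry.usd_per_hour)


if __name__ == "__main__":
    # Text Generation models on GPU costing at most $2 per hour.
    for entry in filter_entries(CATALOG, task="Text Generation", accelerator="GPU", max_usd_per_hour=2.0):
        print(f"{entry.model} ({entry.quantization}) on {entry.instance}: ${entry.usd_per_hour} / hour")
```

Keeping the price as a float per hour keeps comparisons simple. A real script would likely also want to parse the instance count out of the `instance` string, which this sketch leaves as free text.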