  1. GETTING STARTED
  2. One Click Deploy

Limitation

Supported Models

  • Llama (1, 2, 3, 3.1)

  • Mistral

  • Qwen, Qwen2

  • Gemma, Gemma2
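The supported families above can be checked before attempting a deploy. A minimal sketch, assuming a Hugging Face-style model id; the helper and list here are illustrative and not part of the Float16 API:

```python
# Model families currently accepted by One Click Deploy (from the list above).
SUPPORTED_FAMILIES = {"llama", "mistral", "qwen", "qwen2", "gemma", "gemma2"}

def is_supported(model_id: str) -> bool:
    """Return True if a model id appears to belong to a supported family.

    Name matching is a heuristic; the service itself validates at deploy time.
    """
    name = model_id.split("/")[-1].lower()
    return any(name.startswith(family) for family in SUPPORTED_FAMILIES)

print(is_supported("meta-llama/Llama-3.1-8B-Instruct"))  # True
print(is_supported("state-spaces/mamba-2.8b"))           # False (not yet supported)
```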

Upcoming Models

  • RecurrentGemma

  • Mamba

Regions

Currently, we offer services across 5 AWS regions:

  • North Virginia (us-east-1)

  • Oregon (us-west-2)

  • Tokyo (ap-northeast-1)

  • Sydney (ap-southeast-2)

  • Jakarta (ap-southeast-3)
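The region names above map to standard AWS region codes. A small lookup sketch (the helper is illustrative, not part of the Float16 API):

```python
# AWS regions where the service is currently offered (from the list above).
REGIONS = {
    "North Virginia": "us-east-1",
    "Oregon": "us-west-2",
    "Tokyo": "ap-northeast-1",
    "Sydney": "ap-southeast-2",
    "Jakarta": "ap-southeast-3",
}

def region_code(name: str) -> str:
    """Look up the AWS region code for a human-readable region name."""
    try:
        return REGIONS[name]
    except KeyError:
        raise ValueError(f"Service is not offered in {name!r}; choose one of {sorted(REGIONS)}")

print(region_code("Tokyo"))  # ap-northeast-1
```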

GPU Types

We currently support 3 types of GPU instances:

  • NVIDIA L4

  • NVIDIA L40S

  • NVIDIA A10

Upcoming GPUs

  • NVIDIA H100

  • NVIDIA H200

Multi-GPU Support

At the moment, we do not support multi-GPU deployment.
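Because multi-GPU deployment is not supported, a deploy request should specify exactly one GPU of an available type. A hedged validation sketch; the function name and argument names are illustrative, not part of the Float16 API:

```python
# GPU instance types currently available (from the list above).
AVAILABLE_GPUS = {"NVIDIA L4", "NVIDIA L40S", "NVIDIA A10"}

def validate_gpu_request(gpu_type: str, gpu_count: int = 1) -> None:
    """Reject requests the service cannot fulfil: unknown GPU types or multi-GPU counts."""
    if gpu_type not in AVAILABLE_GPUS:
        raise ValueError(f"Unsupported GPU type: {gpu_type!r}")
    if gpu_count != 1:
        raise ValueError("Multi-GPU deployment is not supported; gpu_count must be 1")

validate_gpu_request("NVIDIA L4")  # passes silently
```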

Please note that our regional coverage and GPU options may expand in the future, as we continuously work to enhance our service offerings to meet evolving customer needs.


Last updated 7 months ago
