  1. GETTING STARTED
  2. One Click Deploy

Limitation

Supported Models

  • Llama (1, 2, 3, 3.1)

  • Mistral

  • Qwen, Qwen2

  • Gemma, Gemma2
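The supported families above can be checked before attempting a deploy. A minimal sketch, assuming a Hugging Face-style model id; the helper and list here are illustrative and not part of the Float16 API:

```python
# Model families currently accepted by One Click Deploy (from the list above).
SUPPORTED_FAMILIES = {"llama", "mistral", "qwen", "qwen2", "gemma", "gemma2"}

def is_supported(model_id: str) -> bool:
    """Return True if a model id appears to belong to a supported family.

    Name matching is a heuristic; the service itself validates at deploy time.
    """
    name = model_id.split("/")[-1].lower()
    return any(name.startswith(family) for family in SUPPORTED_FAMILIES)

print(is_supported("meta-llama/Llama-3.1-8B-Instruct"))  # True
print(is_supported("state-spaces/mamba-2.8b"))           # False (not yet supported)
```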

Upcoming Models

  • RecurrentGemma

  • Mamba

Regions

Currently, we offer services across 5 AWS regions:

  • North Virginia (us-east-1)

  • Oregon (us-west-2)

  • Tokyo (ap-northeast-1)

  • Sydney (ap-southeast-2)

  • Jakarta (ap-southeast-3)
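The region names above map to standard AWS region codes. A small lookup sketch (the helper is illustrative, not part of the Float16 API):

```python
# AWS regions where the service is currently offered (from the list above).
REGIONS = {
    "North Virginia": "us-east-1",
    "Oregon": "us-west-2",
    "Tokyo": "ap-northeast-1",
    "Sydney": "ap-southeast-2",
    "Jakarta": "ap-southeast-3",
}

def region_code(name: str) -> str:
    """Look up the AWS region code for a human-readable region name."""
    try:
        return REGIONS[name]
    except KeyError:
        raise ValueError(f"Service is not offered in {name!r}; choose one of {sorted(REGIONS)}")

print(region_code("Tokyo"))  # ap-northeast-1
```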

GPU Types

We currently support 3 types of GPU instances:

  • NVIDIA L4

  • NVIDIA L40S

  • NVIDIA A10

Upcoming GPUs

  • NVIDIA H100

  • NVIDIA H200

Multi-GPU Support

At the moment, we do not support multi-GPU deployment.
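Because multi-GPU deployment is not supported, a deploy request should specify exactly one GPU of an available type. A hedged validation sketch; the function name and argument names are illustrative, not part of the Float16 API:

```python
# GPU instance types currently available (from the list above).
AVAILABLE_GPUS = {"NVIDIA L4", "NVIDIA L40S", "NVIDIA A10"}

def validate_gpu_request(gpu_type: str, gpu_count: int = 1) -> None:
    """Reject requests the service cannot fulfil: unknown GPU types or multi-GPU counts."""
    if gpu_type not in AVAILABLE_GPUS:
        raise ValueError(f"Unsupported GPU type: {gpu_type!r}")
    if gpu_count != 1:
        raise ValueError("Multi-GPU deployment is not supported; gpu_count must be 1")

validate_gpu_request("NVIDIA L4")  # passes silently
```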

Please note that our regional coverage and GPU options may expand in the future, as we continuously work to enhance our service offerings to meet evolving customer needs.


Last updated 7 months ago
