Quick Start

One Click Deploy quick start


Last updated 8 months ago

Check all instances

When you first access the One-Click Deploy service, you'll be presented with a table displaying all your instances, both active and inactive.

Every account is allocated a quota of 4 GPU cards, with no restrictions on the type of GPU. To check your current quota, simply click on the "Quota" button or navigate to the service quota settings.

Add new instance

To start a new instance:

  1. Paste the Hugging Face model repository and token (if required), then click "Next".

  2. Review the model name and enter an instance name.

  3. Configure the instance: select a region and GPU type.

  4. Review the pricing and instance summary.

  5. Click "Start Deploy" and wait for the "Start instance successfully" notification.

  6. You will be redirected to the instance's deployment section.

Quickly test API

After successful deployment, test your model using either of the following:

Instance Chat Playground

You can test your deployed model using the Instance Chat Playground, a convenient testing tool accessible directly from your instance overview that lets you interact with the model immediately.

  • Default settings: temperature 0.5, max tokens 512.

  • Customize the parameters, system prompt, and message text via the GUI.

cURL

Or use the following command:

curl -X POST http://api.float16.cloud/dedicate/JxlkeA5y2c/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <float16-api-key>" \
  -d '{
    "model": "<your model>",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant."
      },
      {
        "role": "user",
        "content": "สวัสดี"
      }
    ]
  }'
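The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the cURL call above, with the Chat Playground defaults (temperature 0.5, max tokens 512) made explicit; the instance ID, API key, and model name are placeholders you must replace with your own values.

```python
import json
import urllib.request

# Placeholder values copied from the cURL example above; replace them
# with your own instance endpoint, API key, and model name.
ENDPOINT = "http://api.float16.cloud/dedicate/JxlkeA5y2c/v1/chat/completions"
API_KEY = "<float16-api-key>"

# Same payload as the cURL call, with the playground defaults made explicit.
payload = {
    "model": "<your model>",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "สวัสดี"},
    ],
    "temperature": 0.5,  # playground default
    "max_tokens": 512,   # playground default
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
    method="POST",
)

# Uncomment once the placeholders are filled in:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```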

Find copyable API formats (including OpenAI and LangChain) in the API tab.
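Since the endpoint is OpenAI compatible, responses follow the standard OpenAI chat-completions shape, so the assistant's reply can be extracted in a few lines. The sample response below is illustrative, not real output from a deployment.

```python
import json

# Illustrative response in the OpenAI chat-completions format; a real
# deployment returns a JSON body with the same overall shape.
raw = """
{
  "id": "chatcmpl-example",
  "object": "chat.completion",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "Hello! How can I help you?"},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 20, "completion_tokens": 9, "total_tokens": 29}
}
"""

response = json.loads(raw)

# The reply text lives at choices[0].message.content.
reply = response["choices"][0]["message"]["content"]
print(reply)  # -> Hello! How can I help you?
```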

Learn more about service quota here. 🚀

We currently use basic optimization techniques; learn more in the technical section. For model support limitations, check here.
[Screenshots: All instance page, Input model repository, Create Instance, Chat Playground]