Serverless GPU

What is Serverless GPU ?

Serverless GPU is a service that provides GPU resources for short periods, such as 1 second, 5 seconds, or 30 seconds.

Serverless GPU helps developers and data scientists rapidly create proofs of concept (POC) and move to production in the same environment.

How to use ?

Serverless GPU covers comprehensive workloads such as AI training, image and video analytics, LLM chatbots, and vector search.

Based on these use cases, our Serverless GPU supports development tasks such as code fixing, syntax correction, and POC.

Once your function is developed, you can directly deploy it to production with the same dependencies as in development mode.

Development Mode

  • Development Mode is used to run code for non-server purposes. Use float16 run <py>
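As a concrete illustration, here is a minimal script that could be run in Development Mode (the filename hello.py is an assumption for the example; any Python file works the same way with float16 run):

```python
# hello.py -- a minimal, hypothetical script for Development Mode.
# Plain Python: no SDK imports or decorators are required.

message = "Hello from Serverless GPU"
print(message)  # output appears in the task logs
```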

Spot Mode

  • Spot Mode is a subset of Development Mode.

  • This mode is designed for cost-effectiveness: its price is more than 10x lower than Development Mode.

  • Use float16 run <py> --spot

Production Mode

  • Production Mode is used to deploy code as a server. Use float16 deploy <py>

  • After deployment, Production Mode provides you with two endpoints.

    • Function Endpoint : The container automatically stops after task completion. Costs are calculated between start and stop time.

    • Server Endpoint : The container remains active for up to 30 seconds and can serve multiple requests while active. It automatically stops after the time limit expires; each request extends the timeout by another 30 seconds. This endpoint is intended for low-latency, cost-effective operations: costs are calculated once per 30-second window, with no additional cost for requests served while the server is active.
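The Server Endpoint's keep-alive behavior can be sketched in plain Python. This is a conceptual model of the sliding 30-second timeout described above, not Float16's actual implementation:

```python
import time

IDLE_TIMEOUT = 30  # seconds a container stays active (from the docs)

class ServerEndpointModel:
    """Toy model of the Server Endpoint's sliding 30-second timeout."""

    def __init__(self):
        # Starting the container opens the first 30-second window.
        self.expires_at = time.monotonic() + IDLE_TIMEOUT

    def handle_request(self):
        # Each request extends the timeout by another 30 seconds.
        self.expires_at = time.monotonic() + IDLE_TIMEOUT

    def is_active(self):
        # The container stops once the window expires with no new requests.
        return time.monotonic() < self.expires_at
```

Because the window slides forward on every request, a steady stream of traffic keeps one container warm at a single 30-second charge per window, which is what makes this endpoint cost-effective for low-latency use.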

How is it charged ?

Serverless GPU charges are based on two usage metrics: compute time and storage.

Compute time

Compute time is the duration the serverless GPU is utilized, such as running tasks in development mode, requests in function mode, and starting server mode.

The easiest way to inspect compute time is by checking the duration of each task via CLI or a web dashboard.

Storage

Storage is measured in near real-time, with pricing based on GB/month.
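Putting the two metrics together, a monthly bill can be estimated as below. The unit prices here are purely hypothetical placeholders for illustration; check the dashboard for actual rates:

```python
# Hypothetical unit prices for illustration only -- check the Float16
# dashboard for the actual rates.
COMPUTE_PRICE_PER_SECOND = 0.0005   # USD per GPU-second (assumed)
STORAGE_PRICE_PER_GB_MONTH = 0.02   # USD per GB-month (assumed)

def estimate_monthly_cost(compute_seconds, storage_gb):
    """Bill = compute time used + storage held, the two usage metrics."""
    compute_cost = compute_seconds * COMPUTE_PRICE_PER_SECOND
    storage_cost = storage_gb * STORAGE_PRICE_PER_GB_MONTH
    return compute_cost + storage_cost
```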

Why Serverless GPU with Float16 ?

We ensure the performance of Serverless GPU by providing a high-performance end-to-end service.

Our service is designed to minimize your effort: no code changes are required to migrate to Serverless GPU.

No Cold start

  • Instant responses under 100 ms from always-on instances.

No Vendor lock-in

  • No code changes required, no decorators needed. Write native Python and enjoy coding.

Ultra-fast File storage

  • High-performance storage supporting read speeds up to 10GB/s.

Real-time Debugging

  • Real-time code updates, so you don't have to worry about caching issues.

Pricing

[6 Dec 2024] Currently, we offer FREE instance usage with some limitations:

  • Available GPU: L4. L40S and H100 will come after general availability.

  • Processing time is limited to 30 seconds.

Feel free to try our service! We'd be grateful if you could share your feedback via Discord or any Float16 channel.

Use Case

🚀

Hello World

Launch your first serverless GPU function and kickstart your journey.

List all library

Explore the complete set of libraries available in your container.

Install new library

Enhance your toolkit by adding new libraries tailored to your project needs.

Pre-load model

Accelerate model access using remote storage for improved performance.

Copy output from remote

Efficiently transfer computation results from remote to your local storage.

Server mode

Create your own serverless server for scalable cloud-based applications.
