Serverless GPU [Beta]
Serverless GPU is a service that provides GPU resources for short periods, such as 1 second, 5 seconds, or 30 seconds.
Serverless GPU helps developers and data scientists rapidly create proofs of concept (POC) and move to production in the same environment.
Use cases for Serverless GPU span workloads such as AI training, image and video analytics, LLM chatbots, and vector search.
To support these use cases, Serverless GPU also covers day-to-day development tasks such as fixing code, correcting syntax, and building POCs.
Once your function is developed, you can deploy it directly to production with the same dependencies you used in development mode.
Development Mode
Development Mode runs your code as a one-off task rather than a server. Use: float16 run <py>
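For example, a minimal script you could run in Development Mode might look like this (the filename and the greeting workload are illustrative; any self-contained Python script works):

```python
# hello.py -- run with: float16 run hello.py
# (filename and workload are illustrative; any self-contained script works)

def greet(name: str) -> str:
    """Build a greeting string; stands in for real GPU work."""
    return f"Hello from Serverless GPU, {name}!"

if __name__ == "__main__":
    print(greet("world"))
```

The task's stdout and duration then show up in the CLI or the web dashboard.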
Production Mode
Production Mode deploys your code as a server. Use: float16 deploy <py>
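As a sketch, a script deployed in Production Mode exposes request handling rather than running once and exiting. The stdlib-only echo server below only illustrates the shape of such a script; Float16's actual entrypoint and framework conventions may differ, so consult the deployment docs:

```python
# app.py -- deploy with: float16 deploy app.py
# Stdlib-only sketch; Float16's actual server conventions may differ.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Echo the JSON request body back as the response.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"echo": payload}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def run(port: int = 8000) -> None:
    """Serve forever on the given port (invoked by your entrypoint)."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

In a real deployment, run() (or the platform's own entrypoint) keeps the container serving requests until it is stopped.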
After deployment, Production Mode provides you with two endpoints.
Function Endpoint : The container stops automatically after each task completes. Cost is calculated from container start to stop.
Server Endpoint : The container stays active for up to 30 seconds and can serve multiple requests while active; it stops automatically when the time limit expires. This endpoint suits low-latency, cost-effective operations: you are charged once for the 30-second window, with no additional cost per request while the server is active.
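From the client's point of view, both endpoints are called the same way; only the container lifecycle behind them differs. A minimal client sketch (the URLs below are placeholders, not real Float16 endpoints):

```python
import json
import urllib.request

# Placeholder URLs; replace with the endpoints returned by `float16 deploy`.
FUNCTION_URL = "https://api.example.com/function/<task-id>"
SERVER_URL = "https://api.example.com/server/<task-id>"

def call_endpoint(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON payload to an endpoint and return the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

A function endpoint may take longer on the first request (container start), while a warm server endpoint responds immediately within its active window.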
Serverless GPU charges are based on two usage metrics: compute time and storage.
Compute time is the duration the serverless GPU is in use, such as running tasks in development mode, serving requests in function mode, and keeping server mode active.
The easiest way to inspect compute time is to check each task's duration via the CLI or the web dashboard.
Storage is measured in near real time, with pricing based on GB/month.
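To make the two metrics concrete, here is a back-of-the-envelope estimator. The rates are made-up placeholders for illustration, not Float16 pricing:

```python
# Hypothetical rates for illustration only; see the pricing page for real numbers.
GPU_RATE_PER_SECOND = 0.001       # USD per second of compute (assumed)
STORAGE_RATE_PER_GB_MONTH = 0.10  # USD per GB-month of storage (assumed)

def estimate_cost(compute_seconds: float, storage_gb: float, months: float = 1.0) -> float:
    """Estimate a bill from the two metered quantities: compute time and storage."""
    compute = compute_seconds * GPU_RATE_PER_SECOND
    storage = storage_gb * STORAGE_RATE_PER_GB_MONTH * months
    return round(compute + storage, 4)
```

Under these assumed rates, 30 seconds of compute with no storage would cost $0.03, and 10 GB stored for a month would cost $1.00.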
We back Serverless GPU with a high-performance end-to-end service.
The service is designed to minimize migration effort: no code changes are required to move your workloads to Serverless GPU.
Instant response under 100 ms, with always-on instances.
No code changes required, no decorators needed. Write native Python and enjoy coding.
High-performance storage supporting read speeds up to 10GB/s.
Real-time code updates, so you don't have to worry about caching issues.
[6 Dec 2024] Currently, we offer FREE instance usage with some limitations
Available GPU: L4. L40S and H100 will be available after general availability.
Processing time is limited to 30 seconds.
Feel free to try our service! We'd be grateful if you could share your feedback via Discord or any Float16 channel.
Hello World
Launch your first serverless GPU function and kickstart your journey.
List all libraries
Explore the complete set of libraries available in your container.
Install a new library
Enhance your toolkit by adding new libraries tailored to your project needs.
Pre-load model
Accelerate model access using remote storage for improved performance.
Copy output from remote
Efficiently transfer computation results from remote to your local storage.
Server mode
Create your own serverless server for scalable cloud-based applications.