Serverless GPU
What is Serverless GPU?
Serverless GPU is a service that provides GPU resources for short periods, such as 1 second, 5 seconds, or 30 seconds.
Serverless GPU helps developers and data scientists rapidly create proofs of concept (POC) and move to production in the same environment.
How to use it?
Serverless GPU supports a broad range of workloads, such as AI training, image and video analytics, LLM chatbots, and vector search.
For these use cases, Serverless GPU also supports development tasks such as fixing code, correcting syntax, and building POCs.
Once your function is developed, you can directly deploy it to production with the same dependencies as in development mode.
Development Mode
Development Mode runs code for non-server purposes. To use it, run:
float16 run <py>
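As an illustration, the script passed to float16 run can be an ordinary Python file. The sketch below is hypothetical: the file name hello_gpu.py and the assumption that NVIDIA's nvidia-smi tool is on the PATH inside the container are not taken from the Float16 documentation.

```python
# hello_gpu.py (hypothetical file name)
import shutil
import subprocess


def describe_gpu() -> str:
    """Return a short description of the visible GPUs, or a fallback message."""
    # Assumption: the container exposes the nvidia-smi tool on PATH.
    if shutil.which("nvidia-smi") is None:
        return "nvidia-smi not found; no GPU visible from this environment"
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return "nvidia-smi failed: " + result.stderr.strip()
    return result.stdout.strip()


if __name__ == "__main__":
    print("Hello from Serverless GPU")
    print(describe_gpu())
```

Saved as hello_gpu.py, this would be run by substituting the file name for <py> in the command above.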
Spot Mode
Spot Mode is a subset of Development Mode.
This mode is designed for cost efficiency and is priced well below on-demand Development Mode (at the rates listed under Pricing, Spot costs one-fifth of the on-demand rate).
A Spot task can be interrupted at any time if system resources are needed for higher-priority tasks.
To use it, run:
float16 run <py> --spot
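Because a Spot task can be interrupted at any time, a long job benefits from saving its progress periodically so that a rerun can resume where it stopped. The sketch below shows one generic way to do this in plain Python. The checkpoint path, the step counter, and the assumption that files written by the task persist between runs are illustrative and not part of any Float16 API.

```python
import json
import os
import tempfile
from typing import Optional


def load_step(path: str) -> int:
    """Return the next step to run, or 0 when no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as f:
        return int(json.load(f)["next_step"])


def save_step(path: str, next_step: int) -> None:
    """Write the checkpoint via a temp file so an interruption cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"next_step": next_step}, f)
    os.replace(tmp_path, path)  # atomic rename on the same filesystem


def run(path: str, total_steps: int, stop_after: Optional[int] = None) -> int:
    """Run the remaining steps; stop_after only simulates an interruption."""
    step = load_step(path)
    done_this_run = 0
    while step < total_steps:
        if stop_after is not None and done_this_run >= stop_after:
            break
        # ... one unit of real work (for example, a training step) goes here ...
        step += 1
        done_this_run += 1
        save_step(path, step)
    return step
```

Calling run() again with the same checkpoint path after an interruption continues from the last saved step rather than from zero.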

Production Mode
Production Mode deploys code as a server. To use it, run:
float16 deploy <py>
After deployment, Production Mode provides:
Function Endpoint: The container stops automatically after the task completes. The cost is calculated from the time between container start and stop.

How is it charged?
Serverless GPU charges are based on two usage metrics.
Compute time
Compute time is the duration for which the serverless GPU is in use, for example while running tasks in Development Mode, serving requests in function mode, or running a server started in server mode.
The easiest way to inspect compute time is to check the duration of each task via the CLI or the web dashboard.
Storage
Storage is measured in near real time, with pricing based on GB per month.
Why Serverless GPU with Float16?
We ensure the performance of Serverless GPU by providing a high-performance end-to-end service.
Our service is designed to minimize your effort: migrating to Serverless GPU requires no code changes.
No Cold start
Instant response in under 100 ms from always-on instances.
No Vendor lock-in
No code changes required, no decorators needed. Write native Python and enjoy coding.
Ultra-fast File storage
High-performance storage supporting read speeds up to 10 GB/s.
Real-time Debugging
Real-time code updates, so you don't have to worry about caching issues.
Pricing
[May 2025] We now offer Serverless GPU access with pay-as-you-go pricing.
Instance Type: NVIDIA H100
Pricing:
On-demand: $0.006 per second (~$21.6/hour)
Spot: $0.0012 per second (~$4.32/hour)
Storage Fee: $5.184 per GB/month
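To make the arithmetic concrete, the sketch below estimates charges from the rates listed above. It assumes billing is strictly linear in seconds and in GB-months, with no minimum charge or rounding; those assumptions and the function names are illustrative, not a statement of Float16's billing rules.

```python
# Rates copied from the price list above (USD).
ON_DEMAND_PER_SECOND = 0.006
SPOT_PER_SECOND = 0.0012
STORAGE_PER_GB_MONTH = 5.184


def compute_cost(seconds: float, spot: bool = False) -> float:
    """Estimated compute charge for a task of the given duration."""
    rate = SPOT_PER_SECOND if spot else ON_DEMAND_PER_SECOND
    return seconds * rate


def storage_cost(gigabytes: float, months: float = 1.0) -> float:
    """Estimated storage charge, assuming linear GB-month pricing."""
    return gigabytes * months * STORAGE_PER_GB_MONTH


if __name__ == "__main__":
    # A 30-second task on-demand and on Spot, plus 10 GB stored for one month.
    print(f"on-demand, 30 s: ${compute_cost(30):.4f}")
    print(f"spot, 30 s:      ${compute_cost(30, spot=True):.4f}")
    print(f"10 GB, 1 month:  ${storage_cost(10):.2f}")
```

Under these assumptions, a full hour (3600 seconds) reproduces the hourly figures in the list: about $21.6 on-demand and $4.32 on Spot.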
Feel free to try our service! We’d love your feedback via Discord or any Float16 community channel.
Use Cases
Hello World
Launch your first serverless GPU function and kickstart your journey.
List all libraries
Explore the complete set of libraries available in your container.
Install a new library
Enhance your toolkit by adding new libraries tailored to your project needs.
Pre-load model
Accelerate model access using remote storage for improved performance.
Copy output from remote
Efficiently transfer computation results from remote to your local storage.
Server mode
Create your own serverless server for scalable cloud-based applications.