Serverless GPU [Beta]
Serverless GPU is a service that provides GPU resources for short periods, such as 1 second, 5 seconds, or 30 seconds.
Serverless GPU helps developers and data scientists rapidly create proofs of concept (POC) and move to production in the same environment.
Use cases for Serverless GPU span workloads such as AI training, image and video analytics, LLM chatbots, and vector search.
To support these use cases, Serverless GPU also covers day-to-day development tasks such as fixing code, correcting syntax, and building POCs.
Once your function is developed, you can deploy it directly to production with the same dependencies you used in development mode.
Development Mode
Development Mode runs your code as a one-off task rather than a server. Use: float16 run <py>
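For example, a minimal script you could run in Development Mode might look like this (the filename and the greeting workload are illustrative; any self-contained Python script works):

```python
# hello.py -- run with: float16 run hello.py
# (filename and workload are illustrative; any self-contained script works)

def greet(name: str) -> str:
    """Build a greeting string; stands in for real GPU work."""
    return f"Hello from Serverless GPU, {name}!"

if __name__ == "__main__":
    print(greet("world"))
```

The task's stdout and duration then show up in the CLI or the web dashboard.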
Production Mode
Production Mode deploys your code as a server. Use: float16 deploy <py>
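As a sketch, a script deployed in Production Mode exposes request handling rather than running once and exiting. The stdlib-only echo server below only illustrates the shape of such a script; Float16's actual entrypoint and framework conventions may differ, so consult the deployment docs:

```python
# app.py -- deploy with: float16 deploy app.py
# Stdlib-only sketch; Float16's actual server conventions may differ.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Echo the JSON request body back as the response.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps({"echo": payload}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

def run(port: int = 8000) -> None:
    """Serve forever on the given port (invoked by your entrypoint)."""
    HTTPServer(("0.0.0.0", port), Handler).serve_forever()
```

In a real deployment, run() (or the platform's own entrypoint) keeps the container serving requests until it is stopped.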
After deployment, Production Mode provides you with two endpoints.
Function Endpoint : The container stops automatically after each task completes. Cost is calculated from container start to stop.
Server Endpoint : The container stays active for up to 30 seconds and can serve multiple requests while active; it stops automatically when the time limit expires. This endpoint suits low-latency, cost-effective operations: you are charged once for the 30-second window, with no additional cost per request while the server is active.
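From the client's point of view, both endpoints are called the same way; only the container lifecycle behind them differs. A minimal client sketch (the URLs below are placeholders, not real Float16 endpoints):

```python
import json
import urllib.request

# Placeholder URLs; replace with the endpoints returned by `float16 deploy`.
FUNCTION_URL = "https://api.example.com/function/<task-id>"
SERVER_URL = "https://api.example.com/server/<task-id>"

def call_endpoint(url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST a JSON payload to an endpoint and return the JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

A function endpoint may take longer on the first request (container start), while a warm server endpoint responds immediately within its active window.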
Serverless GPU charges are based on two usage metrics: compute time and storage.
Compute time is the duration the serverless GPU is in use, such as running tasks in development mode, serving requests in function mode, and keeping server mode active.
The easiest way to inspect compute time is to check each task's duration via the CLI or the web dashboard.
Storage is measured in near real time, with pricing based on GB/month.
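To make the two metrics concrete, here is a back-of-the-envelope estimator. The rates are made-up placeholders for illustration, not Float16 pricing:

```python
# Hypothetical rates for illustration only; see the pricing page for real numbers.
GPU_RATE_PER_SECOND = 0.001       # USD per second of compute (assumed)
STORAGE_RATE_PER_GB_MONTH = 0.10  # USD per GB-month of storage (assumed)

def estimate_cost(compute_seconds: float, storage_gb: float, months: float = 1.0) -> float:
    """Estimate a bill from the two metered quantities: compute time and storage."""
    compute = compute_seconds * GPU_RATE_PER_SECOND
    storage = storage_gb * STORAGE_RATE_PER_GB_MONTH * months
    return round(compute + storage, 4)
```

Under these assumed rates, 30 seconds of compute with no storage would cost $0.03, and 10 GB stored for a month would cost $1.00.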
We back Serverless GPU with a high-performance end-to-end service.
The service is designed to minimize migration effort: no code changes are required to move your workloads to Serverless GPU.
Instant response under 100 ms, with always-on instances.
No code changes required, no decorators needed. Write native Python and enjoy coding.
High-performance storage supporting read speeds up to 10GB/s.
Real-time code updates, so you don't have to worry about caching issues.
[6 Dec 2024] Currently, we offer FREE instance usage with some limitations
Available GPU: L4. L40S and H100 will be available after general availability.
Processing time is limited to 30 seconds.
Feel free to try our service! We'd be grateful if you could share your feedback via Discord or any Float16 channel.
Hello World
Launch your first serverless GPU function and kickstart your journey.
List all libraries
Explore the complete set of libraries available in your container.
Install a new library
Enhance your toolkit by adding new libraries tailored to your project needs.
Pre-load model
Accelerate model access using remote storage for improved performance.
Copy output from remote
Efficiently transfer computation results from remote to your local storage.
Server mode
Create your own serverless server for scalable cloud-based applications.