Mode
Serverless GPU Services
The Serverless GPU service offers two operational modes: Development (Develop) and Production (Deploy). Each mode targets a different use case so that you can use GPU resources efficiently.
Development Mode
Development mode is optimized for one-off GPU tasks that return their results immediately. It is suitable for batch processing, short-term experiments, and analytical tasks.
Command:
Limitations:
Max execution time: 60 seconds per task
Max concurrency: up to 8 tasks (not guaranteed; depends on system load)
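Because the 8-task cap is not guaranteed, it can help to throttle submissions on the client side. The sketch below uses asyncio.Semaphore to keep at most 8 tasks in flight. run_task is a hypothetical stand-in for whatever call actually submits a GPU task, and asyncio.sleep only simulates the remote work.

```python
import asyncio

# Documented cap: up to 8 concurrent tasks (not guaranteed by the platform).
MAX_CONCURRENCY = 8


async def run_task(task_id: int, sem: asyncio.Semaphore, stats: dict) -> int:
    """Hypothetical stand-in for submitting one GPU task."""
    async with sem:  # blocks while MAX_CONCURRENCY tasks are already running
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(0.01)  # placeholder for the real remote call
        stats["active"] -= 1
        return task_id


async def run_all(n_tasks: int, limit: int = MAX_CONCURRENCY):
    """Run n_tasks tasks with at most `limit` in flight; return results and peak."""
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(*(run_task(i, sem, stats) for i in range(n_tasks)))
    return results, stats["peak"]


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(20))
    print(len(results), peak)
```

asyncio.gather returns results in submission order, so callers can match outputs to inputs even though tasks finish out of order.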
Pricing:
On-demand: $0.006 per second (~$21.6/hour)
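As a quick check on the rate above, the sketch below computes the hourly equivalent and the cost of one task that runs to the 60-second cap. It assumes cost is simply rate × seconds; the billing granularity is not stated on this page. Decimal is used to avoid floating-point rounding in currency arithmetic.

```python
from decimal import Decimal

# Rate and cap taken from this page.
ON_DEMAND_RATE = Decimal("0.006")  # USD per second
MAX_DEV_SECONDS = 60               # Development-mode limit per task


def on_demand_cost(seconds: int) -> Decimal:
    """Cost of one Development-mode task, assuming cost = rate x seconds."""
    if not 0 <= seconds <= MAX_DEV_SECONDS:
        raise ValueError("Development-mode tasks are limited to 60 seconds")
    return ON_DEMAND_RATE * seconds


print(ON_DEMAND_RATE * 3600)  # 21.600 (USD per hour)
print(on_demand_cost(60))     # 0.360 (USD for a task that hits the cap)
```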
Spot Mode (within Development mode)
Spot mode is designed for tasks that run longer than 60 seconds, at a lower cost.
Note: Spot tasks can be interrupted by on-demand tasks. An interrupted task is resumed automatically when resources free up, but its progress is not saved for you.
Command:
Behavior:
Tasks may pause and resume depending on resource availability
No automatic task staging: progress is not saved across interruptions, so users must implement their own resume logic
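Since progress is not saved across interruptions, one common pattern is to checkpoint after each unit of work and skip completed units on restart. The following is a minimal sketch of that idea, not a platform API. The checkpoint file name, the doubling "work", and the stop_after parameter (which only simulates a preemption) are all hypothetical. In practice the checkpoint must live on storage that survives the interruption.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical path; use persistent storage


def load_checkpoint(path: str) -> int:
    """Return the index of the next item to process (0 if no checkpoint exists)."""
    try:
        with open(path) as f:
            return json.load(f)["next_index"]
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_index: int) -> None:
    """Write to a temp file, then rename, so an interruption never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def process(items, path: str = CHECKPOINT_PATH, stop_after=None):
    """Process items starting after the last checkpoint.

    stop_after simulates a preemption after that many items (demo only).
    Real workloads should also persist their outputs, not just the index.
    """
    done = []
    for i in range(load_checkpoint(path), len(items)):
        if stop_after is not None and len(done) == stop_after:
            break  # simulated interruption
        done.append(items[i] * 2)  # placeholder for real GPU work
        save_checkpoint(path, i + 1)
    return done
```

Calling process once with stop_after=2 and then again without it processes each item exactly once across the two runs.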
Limitations:
No fixed execution time limit
Max concurrency: up to 8 tasks (not guaranteed)
Pricing:
Spot: $0.0012 per second (~$4.32/hour)
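Comparing the two per-second rates shows that spot capacity costs one fifth of the on-demand price. The sketch below assumes cost is simply rate × seconds for both. Note that a Development-mode on-demand task is capped at 60 seconds, so for longer jobs the on-demand figure is only a reference point.

```python
from decimal import Decimal

ON_DEMAND_RATE = Decimal("0.006")  # USD per second
SPOT_RATE = Decimal("0.0012")      # USD per second


def compare(seconds: int) -> dict:
    """Cost of a job at each rate, assuming cost = rate x seconds."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    on_demand = ON_DEMAND_RATE * seconds
    spot = SPOT_RATE * seconds
    return {"on_demand": on_demand, "spot": spot, "spot_share": spot / on_demand}


costs = compare(600)  # a 10-minute job
print(costs["spot"], costs["on_demand"], costs["spot_share"])  # 0.7200 3.600 0.2
```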
Production Mode
Production mode lets you deploy GPU applications and call them through API endpoints. It is best suited for model serving and real-time inference.
Command:
After deployment, you will receive:
API endpoints for your app
API key for authentication
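This page does not specify how the API key is sent. The sketch below builds (without sending) a JSON POST request using only the standard library. The endpoint URL, the payload, and the Authorization: Bearer header scheme are all placeholders or assumptions; replace them with what your deployment actually provides.

```python
import json
import urllib.request

# Hypothetical values: copy the real endpoint URL and API key from your deployment.
ENDPOINT_URL = "https://example.com/your-app/endpoint"
API_KEY = "YOUR_API_KEY"


def build_request(url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying the API key.

    The "Authorization: Bearer" scheme is an assumption; check which header
    your deployment actually expects.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


req = build_request(ENDPOINT_URL, API_KEY, {"prompt": "hello"})
# To actually send it: urllib.request.urlopen(req, timeout=60)
```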
Endpoint Types:
Function Endpoint
Executes your task and shuts down the container after completion
Max execution time: 60 seconds per request
Server Endpoint
Keeps the container running for real-time interaction
Initial active time: 60 seconds
Each incoming request extends the timeout by 30 seconds
Automatically shuts down when no request is received within the timeout window
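The keep-alive rules can be modeled as a deadline that starts at 60 seconds and moves back by 30 seconds per request. This sketch takes the literal reading that each request adds 30 seconds to the remaining time; if the platform instead resets the window to 30 seconds after the latest request, the update would be deadline = max(deadline, t + extension). The function name and parameters are illustrative.

```python
def shutdown_time(request_times, initial=60.0, extension=30.0):
    """Estimate when a Server Endpoint container shuts down (seconds from start).

    Assumed model: the container starts at t=0 with `initial` seconds of active
    time; each request that arrives before the deadline pushes the deadline back
    by `extension` seconds. A request at or after the deadline finds the
    container already stopped.
    """
    deadline = initial
    for t in sorted(request_times):
        if t >= deadline:
            break  # container already shut down; later requests need a new start
        deadline += extension
    return deadline


print(shutdown_time([]))        # 60.0
print(shutdown_time([10, 20]))  # 120.0
print(shutdown_time([70]))      # 60.0
```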
Limitations:
Applications must be built with FastAPI
One API key per deployment (regeneration supported)
Max concurrency: up to 8 tasks (not guaranteed)
Pricing:
On-demand: $0.006 per second (~$21.6/hour)