Docs - Float16

Mode

Serverless GPU Services

The Serverless GPU service offers two operational modes: Development (run) and Production (deploy). Each mode is designed for a different use case to help you use GPU resources efficiently.

Development Mode

Development mode is optimized for one-time GPU tasks with immediate result delivery. It suits batch processing, short-term experiments, and analytical tasks.

Command:

float16 run <your_app> --name <name>

Limitations:

  • Max execution time: 60 seconds per task

  • Max concurrency: up to 8 tasks (not guaranteed; depends on system load)

Pricing:

  • On-demand: $0.006 per second (~$21.6/hour)
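A development-mode task is just a script that produces its result before the 60-second limit. A minimal sketch (the file name `hello.py` and the placeholder workload are illustrative, not from this page):

```python
# hello.py — a minimal one-shot task for development mode (illustrative).
# The task must finish within the 60-second execution limit; output is
# printed to stdout and returned when the task completes.
import time

def main():
    start = time.time()
    # Placeholder for real GPU work (e.g. a short inference batch).
    total = sum(i * i for i in range(1_000_000))
    elapsed = time.time() - start
    print(f"result={total} elapsed={elapsed:.2f}s")

if __name__ == "__main__":
    main()
```

You would then submit it with `float16 run hello.py --name <name>`.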

Spot Mode (Within development mode)

Spot mode is designed for tasks that need to run longer than 60 seconds, at a lower cost.

Note: Spot tasks can be interrupted by on-demand tasks and resumed automatically when resources free up.

Command:

float16 run --spot --name <task_name> --budget <budget_value>

Behavior:

  • Tasks may pause and resume based on resource availability

  • No automatic task staging – users must handle resume logic

Limitations:

  • No fixed execution time limit

  • Max concurrency: up to 8 tasks (not guaranteed)

Pricing:

  • Spot: $0.0012 per second (~$4.32/hour)
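Because spot tasks can be interrupted and there is no automatic task staging, the task itself must persist progress it can resume from. A minimal checkpointing sketch (the checkpoint file name and step loop are illustrative assumptions, not part of the Float16 CLI):

```python
# Checkpoint-and-resume sketch for a spot task (illustrative).
# On start, load the last saved step; after each unit of work, persist
# progress so an interrupted task can continue where it left off.
import json
import os

CHECKPOINT = "checkpoint.json"  # assumed path that survives a restart
TOTAL_STEPS = 10

def load_step() -> int:
    # Resume from the saved step, or start from zero on a fresh run.
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["step"]
    return 0

def save_step(step: int) -> None:
    with open(CHECKPOINT, "w") as f:
        json.dump({"step": step}, f)

def run():
    step = load_step()
    while step < TOTAL_STEPS:
        # ... one resumable unit of work would go here ...
        step += 1
        save_step(step)  # persist after every step
    print(f"done at step {step}")

if __name__ == "__main__":
    run()
```

If the task is preempted mid-run, the next invocation picks up from the last saved step instead of repeating finished work.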

Production Mode

Production mode allows continuous deployment of GPU applications via API endpoints. Best suited for serving models and real-time inference.

Command:

float16 deploy <your_app> --project-id <project_id>

After deployment, you will receive:

  • API endpoints for your app

  • API key for authentication

Endpoint Types:

  • Function Endpoint

    • Executes your task and shuts down the container after completion

    • Max execution time: 60 seconds per request

  • Server Endpoint

    • Keeps the container running for real-time interaction

    • Initial active time: 60 seconds

  • Each incoming request extends the timeout by 30 seconds

    • Automatically shuts down when no request is received within the timeout window

Limitations:

  • Must be built with FastAPI

  • One API key per deployment (regeneration supported)

  • Max concurrency: up to 8 tasks (not guaranteed)

Pricing:

  • 💵 On-demand: $0.006 per second (~$21.6/hour)
