
Serverless GPU Services

The Serverless GPU service offers two operating modes: Development mode (float16 run) and Production mode (float16 deploy). Each mode is designed for a different kind of workload so that you can use GPU resources efficiently.

Development Mode

Development mode is optimized for one-off GPU tasks that return their results immediately. It suits batch processing, short experiments, and analytical tasks.

Command:

float16 run <your_app> --name <name>

Limitations:

  • Max execution time: 60 seconds per task

  • Max concurrency: up to 8 tasks (not guaranteed; depends on system load)

Pricing:

  • On-demand: $0.006 per second (~$21.6/hour)
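Because billing is per second, the cost of a Development-mode task is its runtime multiplied by the on-demand rate. The sketch below is a hypothetical helper, not part of the float16 CLI, that applies the figures on this page: it rejects runtimes above the 60-second limit and returns the estimated charge. It assumes the charge is simply billed seconds times the rate, with no minimum or rounding, which this page does not spell out.

```python
from decimal import Decimal

# Figures taken from this page; update them if the published prices change.
ON_DEMAND_RATE_PER_SECOND = Decimal("0.006")  # USD per second
DEV_MODE_MAX_SECONDS = 60                     # per-task limit in Development mode


def estimate_dev_task_cost(runtime_seconds: int) -> Decimal:
    """Estimate the on-demand charge for one Development-mode task (hypothetical helper)."""
    if runtime_seconds < 0:
        raise ValueError("runtime_seconds must be non-negative")
    if runtime_seconds > DEV_MODE_MAX_SECONDS:
        # Longer tasks do not fit Development mode; Spot mode has no fixed time limit.
        raise ValueError(
            f"{runtime_seconds}s exceeds the {DEV_MODE_MAX_SECONDS}s Development-mode limit"
        )
    # Decimal keeps currency arithmetic exact (no binary floating-point drift).
    return ON_DEMAND_RATE_PER_SECOND * runtime_seconds


if __name__ == "__main__":
    print(estimate_dev_task_cost(45))        # 0.270
    print(ON_DEMAND_RATE_PER_SECOND * 3600)  # 21.600, the hourly figure above
```

Using Decimal rather than float means the per-hour figure comes out as exactly $21.6, matching the pricing line.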

Spot Mode (within Development mode)

Spot mode is designed for tasks that need to run longer than 60 seconds, at a lower cost.

Note: Spot tasks can be interrupted by on-demand tasks and are resumed automatically once resources become available, although your code is responsible for restoring its own progress.

Command:

float16 run --spot --name <task_name> --budget <budget_value>

Behavior:

  • Tasks may pause and resume based on resource availability

  • No automatic task staging: your code must save its own progress and handle resuming from it
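Since Spot tasks can be interrupted and progress is not staged for you, a long job should record its progress and continue from it after a restart. A minimal sketch of that pattern, assuming the task can write to storage that survives an interruption (the checkpoint.json path and the process_item step are hypothetical placeholders, not float16 features):

```python
import json
import os
import tempfile

# Hypothetical location; assumes this storage persists across Spot interruptions.
CHECKPOINT_PATH = "checkpoint.json"


def load_checkpoint(path: str) -> dict:
    """Return saved progress, or a fresh state if no checkpoint exists yet."""
    if not os.path.exists(path):
        return {"next_index": 0, "results": []}
    with open(path) as f:
        return json.load(f)


def save_checkpoint(path: str, state: dict) -> None:
    """Write to a temp file, then swap it in, so a crash mid-write cannot corrupt progress."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # replaces the old checkpoint in a single rename


def process_item(item):
    # Placeholder for the real GPU work on one item.
    return item * 2


def run(items, path: str = CHECKPOINT_PATH):
    state = load_checkpoint(path)
    # Skip items that were finished before the last interruption.
    for i in range(state["next_index"], len(items)):
        state["results"].append(process_item(items[i]))
        state["next_index"] = i + 1
        save_checkpoint(path, state)  # record progress after every completed item
    return state["results"]
```

Checkpointing after every item keeps lost work to at most one item; for very cheap items you might checkpoint every N items instead to reduce write overhead.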

Limitations:

  • No fixed execution time limit

  • Max concurrency: up to 8 tasks (not guaranteed)

Pricing:

  • Spot: $0.0012 per second (~$4.32/hour)
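The on-demand rate is five times the Spot rate ($0.006 vs $0.0012 per second), so interruption-tolerant jobs cost far less on Spot. The hypothetical helper below compares both rates for the same amount of billed GPU time; it assumes you pay only for seconds the task actually runs, which this page does not state explicitly for paused Spot tasks.

```python
from decimal import Decimal

# Per-second prices from this page (USD).
ON_DEMAND_RATE = Decimal("0.006")
SPOT_RATE = Decimal("0.0012")


def compare_costs(runtime_seconds: int) -> dict:
    """Return on-demand vs Spot cost for the same amount of billed GPU time."""
    on_demand = ON_DEMAND_RATE * runtime_seconds
    spot = SPOT_RATE * runtime_seconds
    return {"on_demand": on_demand, "spot": spot, "savings": on_demand - spot}


if __name__ == "__main__":
    # One hour of GPU time, matching the hourly figures on this page.
    print(compare_costs(3600))
    print(ON_DEMAND_RATE / SPOT_RATE)  # exact ratio of the two rates
```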

Production Mode

Production mode deploys your GPU application behind persistent API endpoints. It is best suited for model serving and real-time inference.

Command:

float16 deploy <your_app> --project-id <project_id>

After deployment, you will receive:

  • API endpoints for your app

  • API key for authentication

Endpoint Types:

  • Function Endpoint

    • Executes your task and shuts down the container after completion

    • Max execution time: 120 seconds per request
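Clients calling a Function Endpoint should allow for the 120-second per-request limit. The sketch below uses only Python's standard library to call a deployed endpoint. The endpoint URL, the JSON payload shape, and sending the API key as an "Authorization: Bearer" header are all assumptions for illustration; use the URL and authentication details returned by float16 deploy.

```python
import json
import urllib.request

# Client-side timeout set to the Function Endpoint's 120-second per-request limit.
REQUEST_TIMEOUT_SECONDS = 120


def build_request(endpoint_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request; the Bearer header format is an assumption."""
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # hypothetical auth scheme
        },
        method="POST",
    )


def call_endpoint(endpoint_url: str, api_key: str, payload: dict) -> dict:
    """Send the request and decode the JSON response body."""
    req = build_request(endpoint_url, api_key, payload)
    with urllib.request.urlopen(req, timeout=REQUEST_TIMEOUT_SECONDS) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example with placeholder values (not real endpoints or keys):
# result = call_endpoint("https://<your-endpoint>/predict", "<api_key>", {"prompt": "hello"})
```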

Limitations:

  • Apps must be built with FastAPI

  • One API key per deployment (regeneration supported)

  • Max concurrency: up to 8 tasks (not guaranteed)
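Because concurrency is capped at 8 tasks, a client that fans out many requests should limit how many it has in flight. A minimal asyncio sketch, where call_task is a hypothetical stand-in for one request to your deployment. Capping the client at 8 only avoids exceeding the limit; it does not guarantee the platform will actually run 8 at once.

```python
import asyncio

MAX_CONCURRENCY = 8  # the platform's stated maximum from this page


async def call_task(i: int) -> int:
    """Hypothetical stand-in for one request to your deployed endpoint."""
    await asyncio.sleep(0.01)
    return i * i


async def run_all(n: int, limit: int = MAX_CONCURRENCY):
    semaphore = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0  # highest number of simultaneous calls observed

    async def guarded(i: int) -> int:
        nonlocal in_flight, peak
        async with semaphore:  # at most `limit` calls run inside this block at once
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await call_task(i)
            finally:
                in_flight -= 1

    # gather returns results in submission order.
    results = await asyncio.gather(*(guarded(i) for i in range(n)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_all(20))
    print(len(results), peak)
```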

Pricing:

  • On-demand: $0.006 per second (~$21.6/hour)
