Mode

Serverless GPU Services

The Serverless GPU service offers two operational modes: Develop and Deploy. Each mode targets a different use case so that you can use GPU resources efficiently in your applications.

Develop Mode

Develop mode is optimized for one-time GPU tasks that return results immediately. It is suitable for batch processing and analysis tasks. Run a task with the following command:

float16 run <your_app> --name <name>

Limitations

  • Maximum execution time: 30 seconds per task

  • Concurrent tasks: 1 task per user
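Because each task is limited to 30 seconds, a batch job may need to stop early and leave the rest of its work for a later run. The following is a minimal sketch of that idea in plain Python. The helper name `process_batches` and the 5-second safety margin are assumptions made for illustration; they are not part of the float16 CLI or SDK.

```python
import time

# Limit stated in the documentation: 30 seconds per task.
MAX_EXECUTION_SECONDS = 30.0
# Hypothetical safety margin so the task stops before the hard limit.
SAFETY_MARGIN_SECONDS = 5.0


def process_batches(
    batches,
    handle_batch,
    budget=MAX_EXECUTION_SECONDS - SAFETY_MARGIN_SECONDS,
    clock=time.monotonic,
):
    """Process batches in order until all are done or the time budget is spent.

    Returns (results, remaining), where remaining holds the unprocessed batches.
    """
    start = clock()
    results = []
    for index, batch in enumerate(batches):
        # Check the elapsed time before starting each batch.
        if clock() - start >= budget:
            return results, list(batches[index:])
        results.append(handle_batch(batch))
    return results, []


if __name__ == "__main__":
    done, remaining = process_batches([[1, 2], [3, 4], [5]], sum)
    print(done, remaining)
```

The budget is only checked between batches, so each individual batch should be small enough to finish well inside the margin.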

Deploy Mode

Deploy mode enables continuous GPU application deployment with API endpoint access. Deploy an application with the following command:

float16 deploy <your_app> --project-id <project_id>

After successful deployment, you will receive:

  • API endpoints for your application

  • API key for authentication

  • Two endpoint types available:

    • Function Endpoint: The container stops automatically after the task completes.

    • Server Endpoint: The container remains active for the time limit (30 seconds), handles multiple requests while it is active, and stops automatically when the time limit expires.
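As a rough illustration of how a client might call a deployed endpoint with its API key, the sketch below builds an authenticated request using only the Python standard library. The URL, the JSON payload shape, and the use of an `Authorization: Bearer` header are all assumptions; use the endpoint and authentication details returned by your deployment.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the values returned after deployment.
ENDPOINT_URL = "https://example.invalid/your-endpoint"
API_KEY = "your-api-key"


def build_request(url, api_key, payload):
    """Build a JSON POST request that carries the API key.

    Assumption: the key is sent as a Bearer token in the Authorization header.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def call_endpoint(url, api_key, payload, timeout=30.0):
    """Send the request and decode the JSON response body."""
    request = build_request(url, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

The client timeout mirrors the 30-second execution limit, so a call does not wait longer than the container is allowed to run.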

Limitations

  • Applications must be built with the FastAPI framework.

  • A single API key is provided per deployment (it can be regenerated).

  • Maximum execution time: 30 seconds per task

  • Concurrent tasks: 1 task per user
