Mode
Serverless GPU Services
The Serverless GPU service offers two operational modes: Development (Develop) and Production (Deploy). Each mode is designed for different use cases, so you can use GPU resources efficiently for your applications.
Development Mode
Development mode is optimized for one-time GPU tasks that deliver results immediately. It is suitable for batch processing and analysis tasks. You can execute tasks using this command:
Limitations
Maximum execution time: 30 seconds per task
Concurrent tasks: 1 task per user
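One way to stay inside the 30-second limit is to track a time budget inside the task and stop cleanly before it runs out. The following is a minimal sketch, not part of the service itself: the function name `run_within_budget`, the `SAFETY_MARGIN_S` headroom, and the idea of returning leftover items are all assumptions made for illustration.

```python
import time

TIME_LIMIT_S = 30.0     # execution limit stated in the Limitations list
SAFETY_MARGIN_S = 5.0   # hypothetical headroom for start-up and result delivery


def run_within_budget(items, work, budget_s=TIME_LIMIT_S - SAFETY_MARGIN_S,
                      clock=time.monotonic):
    """Process a sequence of items until the time budget is used up.

    Returns (results, remaining) so that leftover items can be
    submitted in a later task.
    """
    deadline = clock() + budget_s
    results = []
    for index, item in enumerate(items):
        # Check the clock before starting each item, never in the middle of one.
        if clock() >= deadline:
            return results, list(items[index:])
        results.append(work(item))
    return results, []
```

The check happens only between items, so a single item that takes longer than the remaining budget can still overrun. Keeping each unit of work small makes the guard more reliable.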
Spot Mode
Development Mode also offers an option for users who need to run tasks for longer than 30 seconds. This option is called Spot Mode.
Usage:
In this mode, a running task can be interrupted by on-demand tasks when resources are insufficient. Once the on-demand task completes, the spot task resumes automatically.
The system does not handle task staging (saving and restoring progress). You must manage staging yourself so that the task resumes from the last uncompleted position.
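Because staging is left to the user, a spot task typically needs to record its own progress and read it back on restart. The sketch below shows one possible approach using a small JSON checkpoint file. The file name `checkpoint.json`, the `next_index` field, and all function names are hypothetical, and the sketch assumes the checkpoint is written to storage that survives an interruption.

```python
import json
import os
import tempfile

CHECKPOINT_PATH = "checkpoint.json"  # hypothetical path; must survive an interruption


def load_position(path=CHECKPOINT_PATH):
    """Return the index of the first unprocessed item (0 if no checkpoint exists)."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return int(json.load(f)["next_index"])
    except FileNotFoundError:
        return 0


def save_position(next_index, path=CHECKPOINT_PATH):
    """Write the checkpoint atomically so an interruption cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp_path, path)  # atomic rename over the old checkpoint


def run_resumable(items, work, path=CHECKPOINT_PATH):
    """Process items in order, recording progress after each completed item."""
    start = load_position(path)
    for index in range(start, len(items)):
        work(items[index])
        save_position(index + 1, path)
```

With this layout, an item that was in progress when the interruption happened is run again from its start after resuming, so `work` should be safe to repeat for a single item.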
Production Mode
Production mode enables continuous GPU application deployment with API endpoint access.
After successful deployment, you will receive:
API endpoints for your application
API key for authentication
Two endpoint types are available:
Function Endpoint: The container stops automatically after the task completes.
Server Endpoint: The container remains active for the time limit (30 seconds), serves multiple requests while it is active, and stops automatically when the time limit expires.
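A client calls a deployed application by sending a request to its endpoint together with the API key. The following sketch uses only the Python standard library. The URL, the JSON payload shape, and in particular the `Authorization: Bearer` header are assumptions for illustration; the actual header name and scheme depend on how your deployment expects the key.

```python
import json
import urllib.request

# Hypothetical values: substitute the endpoint URL and API key from your deployment.
ENDPOINT_URL = "https://example.invalid/predict"
API_KEY = "YOUR_API_KEY"


def build_request(payload, url=ENDPOINT_URL, api_key=API_KEY):
    """Build a JSON POST request that carries the API key.

    The header name and Bearer scheme are assumptions, not the documented format.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def call_endpoint(payload, timeout_s=30.0):
    """Send the request and decode a JSON response body."""
    request = build_request(payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))
```

The client timeout here mirrors the 30-second limit as a starting point. With a Server Endpoint, several such calls can reuse the same container while it is active.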
Limitations
Applications must be built using the FastAPI framework.
A single API key is provided per deployment (it can be regenerated).
Maximum execution time: 30 seconds per task
Concurrent tasks: 1 task per user