Mode
Serverless GPU Services
The Serverless GPU service offers two operational modes: Develop and Deploy. Each mode is designed for different use cases to help you use GPU resources efficiently in your applications.
Development mode is optimized for one-time GPU tasks that return results immediately. It is suitable for batch processing and analysis tasks. You can execute tasks using this command:
Maximum execution time: 60 seconds per task
Concurrent tasks: 1 task per user
For tasks that need to run longer than 60 seconds, Development mode offers an additional option called Spot Mode.
Usage:
In this mode, a spot task can be interrupted by on-demand tasks if resources are insufficient. Once the on-demand task is completed, the spot task automatically resumes.
The system does not handle task staging (saving and restoring task progress). You must manage task staging yourself so that the task resumes from the last uncompleted position.
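Because resuming after an interruption is left to the user, one common approach is to save progress after each unit of work and read it back on startup. The following is a minimal sketch of that idea, not the service's own mechanism: the checkpoint file name, the JSON layout, and the helper names (`load_checkpoint`, `save_checkpoint`, `run_task`) are all hypothetical, and it assumes the task can be split into independent items and that the checkpoint is written to storage that survives the interruption.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location; use storage that survives an interruption.
CHECKPOINT_PATH = "checkpoint.json"


def load_checkpoint(path):
    """Return the index of the next unprocessed item, or 0 on a fresh start."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["next_index"]


def save_checkpoint(path, next_index):
    """Write the checkpoint atomically so an interruption cannot corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"next_index": next_index}, f)
    # os.replace swaps the file in a single step on the same filesystem.
    os.replace(tmp_path, path)


def run_task(items, process, path=CHECKPOINT_PATH):
    """Process items in order, resuming from the last saved position.

    Returns the index the run started from.
    """
    start = load_checkpoint(path)
    for i in range(start, len(items)):
        process(items[i])
        save_checkpoint(path, i + 1)  # record progress only after success
    return start
```

With this layout, an item that was interrupted partway through is processed again on resume, so `process` should be safe to repeat for the same item.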
Production mode enables continuous GPU application deployment with API endpoint access.
After successful deployment, you will receive:
API endpoints for your application
API key for authentication
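As an illustration of how the endpoint and API key might be used together, here is a hedged client sketch that uses only the Python standard library. The endpoint URL, the JSON payload shape, and in particular the `Authorization: Bearer` header are assumptions made for this example; use the URL and authentication scheme shown for your own deployment.

```python
import json
import urllib.request

# Hypothetical values: substitute the endpoint URL and API key from your deployment.
ENDPOINT_URL = "https://example.invalid/your-endpoint"
API_KEY = "your-api-key"


def build_request(url, api_key, payload):
    """Build a JSON POST request that carries the API key.

    The "Authorization: Bearer" scheme is an assumption; use whichever
    header the service specifies for your deployment.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def call_endpoint(url, api_key, payload, timeout=30):
    """Send the request and decode the JSON response."""
    request = build_request(url, api_key, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping `build_request` separate from `call_endpoint` makes it possible to inspect the headers and body without sending anything over the network.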
Two endpoint types are available:
Function Endpoint: The container stops automatically after the task completes.
Server Endpoint: The container remains active for up to 30 seconds and supports multiple requests while running. Each request extends the timeout by another 30 seconds. The container stops automatically when the time limit expires.
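The Server Endpoint timeout can be reasoned about with a small simulation. This sketch rests on one interpretation that the text does not confirm: that each request resets the deadline to 30 seconds after its arrival, rather than adding 30 seconds to whatever time remains. The function name `stop_time` is hypothetical, and the sketch does not model what happens to a request that arrives after the container has stopped.

```python
IDLE_TIMEOUT_SECONDS = 30  # from the Server Endpoint description


def stop_time(request_times, idle_timeout=IDLE_TIMEOUT_SECONDS):
    """Return when the container stops, given request arrival times in seconds.

    The first request starts the container. Each later request that arrives
    while the container is still running moves the deadline to
    `arrival + idle_timeout` (an assumed interpretation). Requests arriving
    after the deadline are ignored by this model.
    """
    if not request_times:
        raise ValueError("at least one request is needed to start the container")
    times = sorted(request_times)
    deadline = times[0] + idle_timeout
    for t in times[1:]:
        if t > deadline:
            break  # container already stopped before this request arrived
        deadline = t + idle_timeout
    return deadline


print(stop_time([0]))          # 30
print(stop_time([0, 10, 35]))  # 65
print(stop_time([0, 45]))      # 30
```

Under this reading, steady traffic spaced less than 30 seconds apart keeps the container running, while a gap longer than 30 seconds lets it stop.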
Applications must be built using the FastAPI framework.
A single API key is provided per deployment (it can be regenerated).
Maximum execution time: 30 seconds per task
Concurrent tasks: 1 task per user