# Mode

The Serverless GPU service offers two operational modes: Development (`float16 run`) and Production (`float16 deploy`). Each mode targets a different use case so that you can use GPU resources efficiently.

### Development Mode

Development mode is optimized for one-time GPU tasks that return results immediately.\
It is suitable for batch processing, short-term experiments, and analytical tasks.

**Command:**

```sh
float16 run <your_app> --name <name>
```

**Limitations:**

* Max execution time: 60 seconds per task
* Max concurrency: up to 8 tasks (not guaranteed; depends on system load)

**Pricing:**

* On-demand: $0.006 per second (\~$21.6/hour)

### Spot Mode (within Development Mode)

Spot mode is designed for tasks that run longer than 60 seconds, at a lower cost.

**Note:** Spot tasks can be interrupted by on-demand tasks and resumed automatically when resources free up.

**Command:**

```sh
float16 run --spot --name <task_name> --budget <budget_value>
```

* Tasks may pause and resume depending on resource availability
* No automatic task staging: your application must implement its own resume logic
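
Because resume logic is left to the user, a spot task typically needs to record its own progress. The following is a minimal sketch of one way to do that, not a Float16 API: it assumes the task is restarted from the beginning after an interruption and that `checkpoint_path` points to storage that survives the pause. The names `load_checkpoint`, `save_checkpoint`, and `run_resumable` are hypothetical helpers introduced for illustration.

```python
import json
import os
import tempfile


def load_checkpoint(path: str) -> int:
    """Return the index of the next unprocessed item, or 0 if no checkpoint exists."""
    if not os.path.exists(path):
        return 0
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)["next_index"]


def save_checkpoint(path: str, next_index: int) -> None:
    """Write the checkpoint atomically so an interruption cannot leave a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"next_index": next_index}, f)
    # os.replace swaps the file in a single step on the same filesystem.
    os.replace(tmp_path, path)


def run_resumable(items, process, checkpoint_path: str) -> int:
    """Process items in order, skipping those completed before an interruption.

    ASSUMPTION: checkpoint_path lives on storage that persists across pauses.
    Returns the number of items processed in this run.
    """
    start = load_checkpoint(checkpoint_path)
    done = 0
    for index in range(start, len(items)):
        process(items[index])
        # Record progress only after the item has finished successfully.
        save_checkpoint(checkpoint_path, index + 1)
        done += 1
    return done
```

Checkpointing after every item keeps repeated work to at most one item per interruption. For very small items, saving every N items would reduce I/O at the cost of redoing up to N items.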

**Limitations:**

* No fixed execution time limit
* Max concurrency: up to 8 tasks (not guaranteed)

**Pricing:**

* Spot: $0.0012 per second (\~$4.32/hour)
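
To compare the two Development-mode rates for a given workload, the per-second prices quoted on this page can be multiplied out directly. This is a rough sketch that assumes billing is strictly proportional to runtime in seconds, with no minimum charge or rounding rule; `estimate_cost` is a hypothetical helper, not part of the Float16 CLI.

```python
# Per-second rates quoted on this page (USD).
ON_DEMAND_RATE = 0.006
SPOT_RATE = 0.0012


def estimate_cost(seconds: float, rate_per_second: float) -> float:
    """Return the estimated cost in USD for a task that runs for `seconds` seconds.

    ASSUMPTION: cost scales linearly with runtime, with no minimum charge.
    """
    if seconds < 0:
        raise ValueError("seconds must be non-negative")
    # Round to 4 decimals to hide floating-point noise in the product.
    return round(seconds * rate_per_second, 4)


if __name__ == "__main__":
    one_hour = 3600
    print(f"On-demand, 1 hour: ${estimate_cost(one_hour, ON_DEMAND_RATE):.2f}")
    print(f"Spot, 1 hour:      ${estimate_cost(one_hour, SPOT_RATE):.2f}")
    print(f"On-demand, 60 s:   ${estimate_cost(60, ON_DEMAND_RATE):.2f}")
```

At these rates, spot costs one fifth of on-demand for the same runtime, before accounting for any time lost to interruptions.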

### Production Mode

Production mode allows continuous deployment of GPU applications via API endpoints.\
It is best suited for serving models and real-time inference.

**Command:**

```sh
float16 deploy <your_app> --project-id <project_id>
```

After deployment, you will receive:

* API endpoints for your app
* API key for authentication

**Endpoint Types:**

* **Function Endpoint**
  * Executes your task and shuts down the container after completion
  * Max execution time: 120 seconds per request
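
A client calling a deployed function endpoint needs the endpoint URL, the API key, and a timeout that accommodates the 120-second request limit. The sketch below uses only the Python standard library and shows one plausible shape for such a call. It rests on assumptions this page does not confirm: that the key is sent as a Bearer token in the `Authorization` header, and that the endpoint accepts a JSON `POST` body. The URL, key, and payload are placeholders, and `build_request` is a hypothetical helper.

```python
import json
import urllib.request

# The page states a 120-second limit per request for function endpoints;
# the extra 10 seconds for network overhead is an arbitrary margin.
CLIENT_TIMEOUT_SECONDS = 120 + 10


def build_request(endpoint_url: str, api_key: str, payload: dict) -> urllib.request.Request:
    """Build a JSON POST request that carries the API key.

    ASSUMPTION: the key is sent as a Bearer token; check the deployment
    output for the scheme your endpoint actually expects.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute the endpoint and key returned by `float16 deploy`.
    request = build_request(
        "https://example.invalid/predict", "YOUR_API_KEY", {"prompt": "hello"}
    )
    print(request.get_method(), request.full_url)
    # To send it: urllib.request.urlopen(request, timeout=CLIENT_TIMEOUT_SECONDS)
```

Setting the client timeout slightly above the server-side limit lets the server's own timeout response arrive instead of the client aborting first.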

**Limitations:**

* Must be built with **FastAPI**
* One API key per deployment (regeneration supported)
* Max concurrency: up to 8 tasks (not guaranteed)

**Pricing:**

* On-demand: $0.006 per second (\~$21.6/hour)


