
Server mode

Get Endpoint via Float16



This tutorial guides you through deploying a simple FastAPI "Hello World" application using Float16's server mode.

Before you begin, make sure you have:

  • The Float16 CLI installed

  • Logged into your Float16 account

  • VSCode or another preferred text editor (recommended)

Step 1 : Prepare Your Script

(server.py)

import os
import asyncio

import uvicorn
from fastapi import FastAPI

from utils import say_hello, say_world

app = FastAPI()

@app.get("/hello")
async def read_root():
    # Combine the two helpers into a single greeting
    return {"message": f"{say_hello()} {say_world()}"}

async def main():
    # Float16 supplies the port via the PORT environment variable
    config = uvicorn.Config(
        app, host="0.0.0.0", port=int(os.environ["PORT"])
    )
    server = uvicorn.Server(config)
    await server.serve()

if __name__ == "__main__":
    # Entry point for running the server locally; server mode on
    # Float16 requires the async def main coroutine above
    asyncio.run(main())

(utils.py)

def say_hello():
    return "hello"

def say_world():
    return "world"

  • Save both files (server.py and utils.py) in the same folder

  • Navigate to that folder in your terminal

  • Ensure the port is read from the environment: port=int(os.environ['PORT'])

  • Ensure the server is served from an async def main coroutine
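Before deploying, you can sanity-check the app locally. A minimal sketch, assuming fastapi and uvicorn are installed in your local Python environment (port 8000 is an arbitrary choice):

## Start the server locally, supplying the PORT variable the script expects
PORT=8000 python server.py

## In another terminal, call the endpoint
curl http://localhost:8000/hello
## Expected: {"message":"hello world"}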

Step 2 : Create project

Create a new project and select the instance type (here, an H100):

float16 project create --instance h100

Resulting Files

  • float16.conf: Contains your project ID

  • requirements.txt: Initially empty
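If your script needs packages that are not already available on the instance, add them to requirements.txt before deploying. A hypothetical example for this tutorial, assuming fastapi and uvicorn are not preinstalled:

(requirements.txt)

fastapi
uvicorn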

Step 3 : Deploy Script

float16 deploy server.py

After successful deployment, you'll receive:

  • Function Endpoint

  • Server Endpoint

  • API Key

Example:

Function Endpoint: http://api.float16.cloud/task/run/function/x7x2DFl8zU   
Server Endpoint: http://api.float16.cloud/task/run/server/x7x2DFl8zU       
API Key: float16-r-QoZU7uNlgDIFJ5IMrBtOCjuzVBlC

Step 4 : Endpoint Request

Use the provided endpoints with the API key as the bearer token to make requests.

Endpoint Request Example:

  • Path: /hello

  • Expected Response: {"message": "hello world"}

## curl
curl -X GET "{FUNCTION-URL}/hello" -H "Authorization: Bearer {FLOAT16-ENDPOINT-TOKEN}"

curl -X GET "http://api.float16.cloud/task/run/function/x7x2DFl8zU/hello" -H "Authorization: Bearer float16-r-QoZU7uNlgDIFJ5IMrBtOCjuzVBlC"
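The same request in Python, as a minimal sketch using the requests library; the endpoint URL and API key below are the sample values from the deployment output above, so substitute your own:

(request.py)

import requests

# Sample values from the deployment output above; replace with your own
FUNCTION_URL = "http://api.float16.cloud/task/run/function/x7x2DFl8zU"
API_KEY = "float16-r-QoZU7uNlgDIFJ5IMrBtOCjuzVBlC"

# The API key is sent as a bearer token
response = requests.get(
    f"{FUNCTION_URL}/hello",
    headers={"Authorization": f"Bearer {API_KEY}"},
)
print(response.json())  # expected: {'message': 'hello world'}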

Congratulations! You've successfully deployed your first server-mode application on Float16's serverless GPU platform.

Explore More

Learn how to use Float16 CLI for various use cases in our tutorials.

Happy coding with Float16 Serverless GPU!

If you cannot create a new project, learn more.

To understand the differences between function and server modes, refer to the dedicated section.

🚀 Full example code: https://github.com/float16-cloud/examples/tree/main/official/deploy/fastapi-helloworld

  • Hello World: Launch your first serverless GPU function and kickstart your journey.

  • Install new library: Enhance your toolkit by adding new libraries tailored to your project needs.

  • Copy output from remote: Efficiently transfer computation results from remote to your local storage.

  • Deploy FastAPI Helloworld: Quick start to deploy FastAPI without changing the code.

  • Upload and Download via CLI and Website: Directly upload and download file(s) to the server.

  • More examples: Open source from the community and the Float16 team.
