OpenAI Compatible

Available Endpoint

/{dedicate}/v1/chat/completions

/{dedicate}/v1/completions

Read the full details about request parameters

Authentication

The endpoint provides an API key to access it.

Rate limit

The endpoint does not have a rate limit, but it does have a limited batch size (similar to concurrency). The batch size is determined by GPU VRAM.

Chat template

The endpoint has an option to automatically apply a chat template or use the prompt from the request. The chat template is determined by the chat_template attribute in the tokenizer_config.json file."

PreviousFeatures NextLong context and Auto scheduler

Last updated 10 months ago