OpenAI Compatible
Available Endpoints
/{dedicate}/v1/chat/completions
/{dedicate}/v1/completions
Read the full details about request parameters.
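As a rough illustration, a chat completions request could be sent with any HTTP client. The host name, the {dedicate} path segment, the API key, and the model name below are placeholders, not values from this documentation:

```python
# Minimal sketch of a chat completions request; all identifiers are placeholders.
import requests

BASE_URL = "https://api.example.com"  # hypothetical host of your deployment
DEDICATE = "my-endpoint-id"           # hypothetical value for the {dedicate} segment
API_KEY = "YOUR_API_KEY"              # key issued for this endpoint

response = requests.post(
    f"{BASE_URL}/{DEDICATE}/v1/chat/completions",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "my-model",  # model served by the endpoint
        "messages": [{"role": "user", "content": "Hello!"}],
    },
)
print(response.json())
```

The same pattern applies to /{dedicate}/v1/completions, with a prompt field instead of messages.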
Authentication
An API key is issued for the endpoint and must be included with every request.
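Because the endpoint is OpenAI compatible, one way to authenticate is through the official OpenAI Python SDK by pointing it at the endpoint's base URL. This is a sketch under assumptions: the base URL, the my-endpoint-id segment, and the model name are placeholders.

```python
# Sketch using the OpenAI Python SDK (v1.x); URL, key, and model are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example.com/my-endpoint-id/v1",  # hypothetical deployment URL
    api_key="YOUR_API_KEY",                                 # key issued for this endpoint
)

completion = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(completion.choices[0].message.content)
```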
Rate limit
The endpoint does not enforce a rate limit, but it does limit the batch size (effectively a concurrency cap). The batch size is determined by the available GPU VRAM.
Chat template
The endpoint can either automatically apply a chat template or use the prompt from the request as-is. The chat template is determined by the chat_template attribute in the tokenizer_config.json file.