OpenAI Compatible
Last updated
/{dedicate}/v1/chat/completions
/{dedicate}/v1/completions
Each endpoint is issued an API key, which is required for access.
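A minimal sketch of calling the chat completions route with the endpoint's API key. The base URL, endpoint ID, model name, and key below are placeholders, not values from this documentation; the request shape follows the standard OpenAI-compatible format.

```python
import json
import urllib.request

# Hypothetical values -- substitute your endpoint's base URL, the
# {dedicate} path segment, and the API key issued for the endpoint.
BASE_URL = "https://example.com/my-endpoint"
API_KEY = "YOUR_API_KEY"

def build_chat_request(messages, model="my-model"):
    """Build an OpenAI-compatible chat completions request."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            # The endpoint's API key is sent as a Bearer token.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_chat_request([{"role": "user", "content": "Hello!"}])
# resp = urllib.request.urlopen(req)  # uncomment to actually send
```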
The endpoint has no rate limit, but it does have a limited batch size (comparable to a concurrency limit). The batch size is determined by the available GPU VRAM.
The endpoint can either automatically apply a chat template or use the prompt from the request as-is. The chat template is determined by the `chat_template` attribute in the model's `tokenizer_config.json` file.
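The difference between the two modes can be sketched as follows. The template function here is a simplified stand-in: real models define `chat_template` as a Jinja2 template in `tokenizer_config.json`, and the exact markers vary per model.

```python
messages = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "Hi!"},
]

def apply_chat_template(msgs):
    # Toy stand-in for the model's real `chat_template` from
    # tokenizer_config.json (which is a Jinja2 template).
    turns = "".join(f"<|{m['role']}|>\n{m['content']}\n" for m in msgs)
    return turns + "<|assistant|>\n"

# Mode 1: let the endpoint apply the template -- POST `messages`
#         to /{dedicate}/v1/chat/completions.
# Mode 2: apply the template client-side and POST the rendered
#         `prompt` as-is to /{dedicate}/v1/completions.
prompt = apply_chat_template(messages)
print(prompt)
```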