The endpoint has no rate limit, but it does limit the batch size, which works much like a concurrency limit. The maximum batch size is determined by the available GPU VRAM.
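Since the batch size behaves like a concurrency limit, a client may want to cap how many requests it keeps in flight at once. The following is a minimal sketch of that idea, not the endpoint's own mechanism: MAX_BATCH_SIZE is a made-up value (the real limit depends on GPU VRAM), and send_request is a hypothetical stand-in for the actual HTTP call.

```python
import asyncio

# Hypothetical value; the real limit depends on the GPU VRAM behind the endpoint.
MAX_BATCH_SIZE = 4


async def send_request(prompt: str) -> str:
    """Hypothetical stand-in for a real HTTP call to the endpoint."""
    await asyncio.sleep(0.01)  # simulate network and inference latency
    return f"response to: {prompt}"


async def run_all(prompts, max_in_flight=MAX_BATCH_SIZE):
    """Send every prompt, keeping at most max_in_flight requests in flight.

    Returns the responses (in the same order as the prompts) and the peak
    number of requests that were in flight at the same time.
    """
    semaphore = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def worker(prompt):
        nonlocal in_flight, peak
        async with semaphore:  # wait here while the cap is reached
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await send_request(prompt)
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(worker(p) for p in prompts))
    return results, peak


if __name__ == "__main__":
    responses, peak = asyncio.run(run_all([f"prompt {i}" for i in range(10)]))
    print(len(responses), "responses, peak in flight:", peak)
```

A semaphore is used here because it keeps the client simple: requests beyond the cap wait on the client side until a slot frees up.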
Chat template
The endpoint has an option to automatically apply a chat template or use the prompt from the request. The chat template is determined by the chat_template attribute in the tokenizer_config.json file.
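To check which template would be applied, the chat_template attribute can be read straight from tokenizer_config.json. The sketch below uses only the Python standard library; the function name load_chat_template and the model_dir argument are illustrative assumptions, and it shows only how to inspect the attribute, not how the endpoint itself loads or renders it.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def load_chat_template(model_dir: str) -> Optional[str]:
    """Return the chat_template string from tokenizer_config.json, or None.

    model_dir is assumed to be a local directory containing the file.
    Non-string values are treated as "no template" in this sketch.
    """
    config_path = Path(model_dir) / "tokenizer_config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    template = config.get("chat_template")
    return template if isinstance(template, str) else None


if __name__ == "__main__":
    # Demo with a throwaway config file; the template text is made up.
    with tempfile.TemporaryDirectory() as model_dir:
        config = {"chat_template": "{% for m in messages %}{{ m.content }}{% endfor %}"}
        Path(model_dir, "tokenizer_config.json").write_text(
            json.dumps(config), encoding="utf-8"
        )
        print(load_chat_template(model_dir))
```

If the function returns None, the file defines no template string, so sending a fully formatted prompt in the request is the safer choice.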