Long context and Auto scheduler
Batch size, max input length (context length), and number of tokens

A higher batch size lets the LLM process more requests in parallel at the same time.
A longer max input length allows more tokens in the prompt.
The number of tokens determines the average maximum number of tokens available to each request.
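The sketch below illustrates how these three settings relate to each other. The parameter names and values are assumptions chosen for illustration, not the exact fields used by One Click Deploy.

```python
# Hypothetical settings; the names and values are illustrative only.
deployment_settings = {
    "batch_size": 8,           # requests processed in parallel
    "max_input_length": 4096,  # maximum tokens allowed in a single prompt
    "num_tokens": 32768,       # total token budget shared across the batch
}

# On average, each request in a full batch can use about
# num_tokens / batch_size tokens (illustrative relationship).
avg_tokens_per_request = (
    deployment_settings["num_tokens"] / deployment_settings["batch_size"]
)
print(f"Average maximum tokens per request: {avg_tokens_per_request:.0f}")
```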
How does long context work?
1. The maximum context length is read from max_position_embeddings in config.json (see the sketch after this list).
2. Ensure the instance has sufficient VRAM.
3. One Click Deploy automatically sets the maximum context length.
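A minimal sketch of step 1, assuming a Hugging Face style config.json in the current directory (the file path is an assumption for illustration):

```python
import json

# Read the model's maximum context length from max_position_embeddings,
# the field One Click Deploy reads from config.json.
with open("config.json") as f:
    config = json.load(f)

max_context_length = config["max_position_embeddings"]
print(f"Maximum context length: {max_context_length} tokens")
```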
How does the auto scheduler work?
1. Exceeding the batch size
2. Exceeding the number of tokens (see the sketch after this list)
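One way to picture these two limits is a scheduler that holds incoming requests until both limits allow them to join the running batch. The following is a minimal sketch under that assumption; the names, limits, and queueing behavior are illustrative, not the actual One Click Deploy implementation.

```python
from collections import deque

# Illustrative limits (assumptions, not One Click Deploy defaults).
MAX_BATCH_SIZE = 8
MAX_NUM_TOKENS = 32768

pending = deque()  # (request_id, token_count) waiting to be scheduled
running = []       # requests currently in the batch

def schedule():
    """Admit pending requests until either limit would be exceeded."""
    used_tokens = sum(tokens for _, tokens in running)
    while pending:
        request_id, tokens = pending[0]
        if len(running) >= MAX_BATCH_SIZE:         # exceeding batch size
            break
        if used_tokens + tokens > MAX_NUM_TOKENS:  # exceeding number of tokens
            break
        pending.popleft()
        running.append((request_id, tokens))
        used_tokens += tokens

# Usage: req-3 stays queued because it would exceed the token budget.
pending.extend([("req-1", 2048), ("req-2", 4096), ("req-3", 30000)])
schedule()
print(running)  # [('req-1', 2048), ('req-2', 4096)]
```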