# Features

All features are enabled by default; no extra configuration is needed.

### **OpenAI Compatible**

Supports OpenAI-compatible clients for chat and completion use cases, and integrates with Continue.dev for Copilot-style coding assistance.
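As a sketch of what OpenAI compatibility means in practice, the snippet below builds a standard OpenAI-style chat completion payload. The endpoint URL and model name are placeholders (assumptions); any OpenAI client can talk to the server by pointing its base URL at your local deployment.

```python
import json

# Hypothetical local endpoint; adjust host/port to your deployment (assumption).
BASE_URL = "http://localhost:8000/v1"

# A standard OpenAI chat-completions payload. An OpenAI client configured
# with base_url=BASE_URL would send exactly this to /chat/completions.
payload = {
    "model": "local-model",  # placeholder model name (assumption)
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    "stream": False,
}

body = json.dumps(payload)
```

Because the wire format is the same, existing OpenAI SDKs and tools work unchanged; only the base URL differs.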

### **Batch size, max input length, and token count**

Supports contexts of up to 1M tokens, and uses dynamic batching to improve hardware utilization.
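Dynamic batching groups concurrent requests so the model processes them together instead of one at a time. The sketch below illustrates the core idea (not this project's actual scheduler): collect requests from a queue until the batch is full or a short deadline passes, so partially full batches still make progress. The batch size and wait time here are illustrative.

```python
import queue
import time


def collect_batch(q: "queue.Queue", max_batch_size: int = 8,
                  max_wait_s: float = 0.01) -> list:
    """Collect up to max_batch_size requests from q, waiting at most
    max_wait_s after the first request arrives. Trades a tiny amount of
    latency for much better throughput on the accelerator."""
    batch = [q.get()]  # block until the first request arrives
    deadline = time.monotonic() + max_wait_s
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # deadline hit with a partial batch; ship it anyway
    return batch
```

A real serving scheduler also tracks per-request context length so a batch fits in memory, but the queue-plus-deadline loop is the essence of dynamic batching.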

### **Quantization**

Reduces memory footprint, allowing LLMs to be deployed at roughly half their original model size.

Improves inference speed by about 2x.
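The halved model size follows directly from the arithmetic of weight precision: quantizing 16-bit weights to 8 bits halves the bytes stored per parameter. A quick back-of-the-envelope calculation (weights only, ignoring activations and runtime overhead; the 7B parameter count is just an example):

```python
def model_size_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory in GB: params * bits / 8 bits-per-byte."""
    return n_params * bits_per_weight / 8 / 1e9


n = 7e9  # example: a 7B-parameter model
fp16_gb = model_size_gb(n, 16)  # 16-bit weights
int8_gb = model_size_gb(n, 8)   # 8-bit quantized: exactly half
```

Lower-bit schemes (e.g. 4-bit) shrink the weights further still, at some cost in accuracy.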

### **Context caching**

Reduces redundant computation when requests share the same context.

Improves inference speed by 1.5 to 2x on evaluation datasets, and by over 10x in ideal scenarios.
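The idea behind context caching can be sketched as memoization keyed on the shared prefix: compute the expensive per-context state once, then reuse it for every later request with the same context. This toy version (my illustration, not the engine's internals) counts hits and misses to make the savings visible.

```python
import hashlib


class PrefixCache:
    """Toy context cache: memoize expensive per-prefix state so repeated
    requests with the same context skip the recomputation entirely."""

    def __init__(self):
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prefix: str, compute):
        key = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1       # same context seen before: free
        else:
            self.misses += 1     # new context: pay the cost once
            self._store[key] = compute(prefix)
        return self._store[key]
```

The "ideal scenario" for a cache like this is many requests sharing one long system prompt or document: every request after the first hits the cache.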
