# Context caching

### What is context caching?

Context caching is the ability to reuse cached KV (key-value) computations for the same context across requests.

This feature can speed up inference by more than 10 times when requests share the same context.

When benchmarked on evaluation datasets such as M3Exam, the speedup is about 2 times.

### How does context caching work?

Context caching is triggered automatically when requests within the same batch share the same context.

For example, suppose an endpoint has a batch size of 8, each request is 1,024 tokens long, and the requests share the same context for the first 900 tokens.

Instead of computing 1,024 \* 8 = 8,192 tokens, the system computes ((1,024 - 900) \* 8) + 1,024 = 2,016 tokens.

This significantly reduces the compute required and improves the endpoint's latency.
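The token arithmetic above can be sketched as a small function. This is an illustrative model only, not the system's actual implementation; the function name and signature are made up, and the cached-path formula simply mirrors the one stated above.

```python
# Hypothetical illustration of the token arithmetic described above.
def tokens_computed(batch_size: int, tokens_per_request: int,
                    shared_prefix: int, caching: bool) -> int:
    """Tokens the system must process for one batch of identical-length requests."""
    if not caching:
        # Every request computes all of its tokens independently.
        return tokens_per_request * batch_size
    # With caching: each request recomputes only its unique suffix,
    # plus one full request that also populates the shared-prefix cache.
    return (tokens_per_request - shared_prefix) * batch_size + tokens_per_request

print(tokens_computed(8, 1024, 900, caching=False))  # 8192
print(tokens_computed(8, 1024, 900, caching=True))   # 2016
```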

### Use cases

* RAG
* Few-shot prompting
* Code Co-pilot
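All three use cases benefit for the same reason: the prompts naturally share a long, identical prefix. A minimal sketch, assuming few-shot prompting (the prompt text and helper name below are invented for illustration):

```python
# Hypothetical sketch: few-shot prompts built so every request shares an
# identical prefix, which is the pattern context caching can reuse.
FEW_SHOT_PREFIX = (
    "Classify the sentiment of each review as positive or negative.\n"
    "Review: The battery lasts all day. Sentiment: positive\n"
    "Review: The screen cracked in a week. Sentiment: negative\n"
)

def build_prompt(review: str) -> str:
    # Keep the shared examples byte-identical across requests so their
    # KV computations can be cached and reused within a batch.
    return FEW_SHOT_PREFIX + f"Review: {review} Sentiment:"

prompts = [build_prompt(r) for r in ("Great value.", "Arrived broken.")]
# Every prompt starts with the same prefix; only the short suffix differs.
assert all(p.startswith(FEW_SHOT_PREFIX) for p in prompts)
```

The same structure applies to RAG (shared retrieved documents) and code copilots (shared file context), where the common context is typically much longer than the per-request suffix.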
