What Happens When Multiple People Use My LLM Simultaneously?
Explores how LLM inference engines handle concurrent requests, comparing the resource allocation approaches of llama.cpp and vLLM to provide insights for family sharing scenarios.
My family members are about to head out and want to use my OpenWebUI while they're away. The server security is all set up with SSL and a custom domain. But I have a question: what happens if multiple people use the same LLM at the same time, or nearly at the same time? Does the system start a separate LLM instance for each user, or are all requests squeezed into a single instance? And is the context length shared among users?

**The key lies in your inference engine and configuration.**
**If you're using llama.cpp:**
- With the default single-slot configuration, the second request will queue up and wait for the first one to complete before being processed.
- You can configure multiple slots to support concurrency, but at the cost of dividing the context length equally. For example, a 32K context becomes 16K per request with 2 slots, or 8K with 4 slots (see the launch sketch after this list).
- This pre-allocation mechanism can lead to resource waste: a request might only use a small portion of the context, but the system has reserved the full slot capacity for it.
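To make the slot arithmetic concrete, here is a minimal launch sketch. It assumes the `llama-server` binary is on your PATH and uses a placeholder model path; the `--ctx-size` and `--parallel` flags reflect recent llama.cpp builds, so double-check them against your version.

```python
# Minimal sketch: launching llama-server with two slots.
import subprocess

TOTAL_CTX = 32768   # total context window allocated in VRAM
N_SLOTS = 2         # parallel slots; each slot gets TOTAL_CTX // N_SLOTS tokens

subprocess.run([
    "llama-server",
    "-m", "/models/your-model.gguf",   # placeholder model path
    "--ctx-size", str(TOTAL_CTX),      # total context for the whole server
    "--parallel", str(N_SLOTS),        # number of slots; context is split evenly
    "--host", "0.0.0.0",
    "--port", "8080",
])

# With 2 slots, each request is capped at 32768 // 2 = 16384 tokens,
# even when only one person happens to be chatting at that moment.
```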
**If you switch to vLLM:**
- Its KV cache is pooled, with all requests sharing the full context window (e.g., 32K).
- Each request dynamically occupies cache space as needed, without requiring pre-reservation of fixed capacity.
- This means multiple requests can truly be processed in parallel, with each request able to use nearly the full context length when needed, as long as the total usage doesn't exceed the pool size (see the sketch after this list).
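For comparison, here is a sketch of serving the same setup with vLLM's OpenAI-compatible server. It assumes vLLM is installed and uses a placeholder model name; the flags shown match current vLLM releases, but confirm them for your version.

```python
# Minimal sketch: serving a model with vLLM's OpenAI-compatible server.
import subprocess

subprocess.run([
    "vllm", "serve", "Qwen/Qwen2.5-7B-Instruct",   # placeholder model name
    "--max-model-len", "32768",           # per-request ceiling, not a per-slot split
    "--gpu-memory-utilization", "0.90",   # fraction of VRAM for weights + KV pool
    "--host", "0.0.0.0",
    "--port", "8000",
])

# All concurrent requests draw KV-cache blocks from one shared pool, so a short
# chat and a long document summary can run side by side without VRAM being
# tied up in pre-reserved slots that nobody is using.
```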
So, if it's just for occasional family use, llama.cpp with 2 concurrent slots might be sufficient. But if you want to make more efficient use of VRAM, especially when handling conversations of varying lengths, vLLM's pooled design offers more flexibility.
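If you want to see which behavior your setup actually exhibits, a quick test is to fire two requests at once and compare how long they take. The sketch below assumes an OpenAI-compatible endpoint behind OpenWebUI; the URL, API key, and model name are placeholders for your own deployment.

```python
# Quick sanity check: send two requests simultaneously and time them.
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "https://llm.example.com/v1/chat/completions"   # placeholder endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}     # placeholder key

def ask(prompt: str) -> float:
    """Send one chat request and return the elapsed wall-clock time."""
    start = time.time()
    requests.post(URL, headers=HEADERS, json={
        "model": "your-model",   # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 128,
    }, timeout=300)
    return time.time() - start

with ThreadPoolExecutor(max_workers=2) as pool:
    times = list(pool.map(ask, ["Tell me a joke.", "Summarize today's news."]))

# With a single llama.cpp slot, the second request roughly waits for the first
# to finish; with 2 slots or vLLM, the two timings should largely overlap.
print(times)
```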

Finally, don't forget to check your hardware resources, especially VRAM. Every additional concurrent request increases VRAM usage, and you don't want your family to crash the server while they're out using it.
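A simple way to keep an eye on this is to check VRAM headroom while requests are running. This sketch assumes an NVIDIA GPU and the nvidia-ml-py (pynvml) package; on other hardware, use the vendor's equivalent tooling.

```python
# Minimal sketch: print current VRAM usage on the first GPU.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)    # first GPU
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)

print(f"VRAM used: {mem.used / 2**30:.1f} GiB / {mem.total / 2**30:.1f} GiB")
pynvml.nvmlShutdown()
```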
Published: 2025-10-21 13:42