LiteLLM is a unified interface to call 100+ LLMs using the OpenAI format, providing a proxy server for multiple LLM providers.
LMCache is an LLM serving engine extension that stores and reuses KV caches across requests to reduce time-to-first-token (TTFT) and increase throughput. It integrates with vLLM to provide GPU-accelerated inference with shared KV cache management.
NVIDIA NeMo Framework is an end-to-end, cloud-native framework to build, customize, and deploy generative AI models anywhere.
The NVIDIA Container Toolkit allows users to build and run GPU-accelerated containers.
Tools for GPU and feature discovery, for use with the NVIDIA GPU driver container, which provisions the NVIDIA driver through containers.
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
34 images