NVIDIA¶
NVIDIA AI Endpoints provides API access to a range of models through a unified OpenAI-compatible interface.
Provider file: ~/.codefreedom/proxy/config/providers/nvidia.yaml
Environment Variables¶
| Variable | Description | Required |
|---|---|---|
NVIDIA_API_KEY |
API key from NVIDIA Build | Yes |
NVIDIA_BASE_URL |
API base URL | No (default: https://integrate.api.nvidia.com/v1) |
Models¶
| Model | Context | Max Output | Vision | Reasoning |
|---|---|---|---|---|
| DeepSeek-V4-Flash | 1,000,000 | 384,000 | No | Yes |
| DeepSeek-V4-Pro | 1,000,000 | 384,000 | No | Yes |
| GLM-5.1 | 204,800 | 8,192 | No | Yes |
| Kimi-K2.6 | 256,000 | 16,384 | Yes | Yes |
| Step-3.7-Flash | 262,144 | 16,384 | Yes | Yes |
Configuration¶
All NVIDIA models share a common pattern. Example for DeepSeek-V4-Flash:
model_list:
- model_name: NVIDIA/DeepSeek-V4-Flash
litellm_params:
model: openai/deepseek-ai/deepseek-v4-flash
api_base: os.environ/NVIDIA_BASE_URL
api_key: os.environ/NVIDIA_API_KEY
timeout: 300
drop_params: true
extra_body:
stream_options:
include_usage: true
model_info:
id: "nvidia-deepseek-v4-flash"
db_model: false
supports_reasoning: true
mode: chat
context_window: 1000000
max_tokens: 1000000
max_output_tokens: 384000
supports_system_messages: true
supports_native_streaming: true
supports_vision: false
supported_openai_params:
- tools
- tool_choice
- parallel_tool_calls
- response_format
- max_tokens
- max_completion_tokens
- stream
- stream_options
- temperature
- top_p
- stop
- thinking
- reasoning_effort
See the recipe YAML for all model entries including model-specific extra_body overrides (GLM-5.1 uses temperature/top_p/top_k; Kimi-K2.6 uses chat_template_kwargs.thinking: true).
Enabling¶
- Uncomment model entries in
nvidia.yaml. - Ensure
providers/nvidia.yamlis in theincludelist inconfig.yaml. - Set
NVIDIA_API_KEYin~/.codefreedom/.env.proxy.secrets. - Restart the proxy.