Before deployment, please review the official vLLM documentation to verify hardware compatibility.
Supported Models
This guide applies to the following models. You only need to update the model name during deployment. The following examples use MiniMax-M1-40k.
Environment Requirements
- OS: Linux
- Python: 3.9 – 3.12
- GPU:
  - Compute capability ≥ 7.0
  - Memory requirements:
    - Model weights: 495 GB
    - Each 1M tokens of context: 38.2 GB
  - Recommended configurations (adjust based on workload):
    - 8 × 80 GB GPUs: supports up to 2M tokens of context
    - 8 × 96 GB GPUs: supports up to 5M tokens of context
- vLLM:
  - MiniMax-Text-01: vLLM ≥ 0.8.3
  - MiniMax-M1: vLLM ≥ 0.9.2
  - Versions 0.8.3 – 0.9.1 may raise unsupported-model errors or cause precision loss; see vLLM PR #19592 for details.
As a workaround on those versions, set architectures in config.json to MiniMaxText01ForCausalLM. See MiniMax-M1 Issue #21 for details.
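For reference, that field would then look like this in config.json (a sketch of only the relevant key, not the full file):

```json
{
  "architectures": ["MiniMaxText01ForCausalLM"]
}
```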
Deploy with Python
We recommend using a virtual environment (venv, conda, or uv) to avoid dependency conflicts. Install vLLM in a clean Python environment, then download the MiniMax-M1 model from Hugging Face.
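The install-and-download flow above might look as follows. This is a sketch: the Hugging Face repo id MiniMaxAI/MiniMax-M1-40k, the version pin, and the serve flags are assumptions to adapt to your setup.

```shell
# Create an isolated environment and install vLLM
python -m venv vllm-env
source vllm-env/bin/activate
pip install "vllm>=0.9.2"

# Download the model weights from Hugging Face
# (repo id is an assumption -- check the model page)
huggingface-cli download MiniMaxAI/MiniMax-M1-40k --local-dir MiniMax-M1-40k

# Launch the OpenAI-compatible server
# (adjust --tensor-parallel-size to your GPU count)
vllm serve ./MiniMax-M1-40k --trust-remote-code --tensor-parallel-size 8
```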
Deploy with Docker
Docker ensures a consistent and portable environment. First, fetch the model (make sure Git LFS is installed):
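A sketch of the fetch-and-run flow, assuming the Hugging Face repo id MiniMaxAI/MiniMax-M1-40k and the official vllm/vllm-openai image; adjust paths, image tag, and GPU count to your environment.

```shell
# Fetch the model weights with Git LFS
git lfs install
git clone https://huggingface.co/MiniMaxAI/MiniMax-M1-40k

# Start the OpenAI-compatible server inside the official vLLM image
docker run --gpus all \
  -v "$PWD/MiniMax-M1-40k:/model" \
  -p 8000:8000 \
  vllm/vllm-openai:latest \
  --model /model --trust-remote-code --tensor-parallel-size 8
```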
Verify Deployment
Once started, test the OpenAI-compatible API with the following command.
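For example, a minimal chat-completion request against the default port 8000 (the model value must match what the server was started with; the one below is an assumption):

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "MiniMaxAI/MiniMax-M1-40k",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```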
Experimental: Enable vLLM V1
Benchmarks show V1 delivers 30–50% better latency and throughput under medium-to-high concurrency, but slightly worse performance under single-threaded workloads (due to missing Full CUDA Graph support, to be addressed in future releases). This feature has not yet been released, so it must be installed from source.
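A sketch of a source install with the V1 engine enabled at launch (the clone URL is the upstream vLLM repository; pinning a specific commit may be required and is not covered here):

```shell
# Build vLLM from source to pick up unreleased V1 support
git clone https://github.com/vllm-project/vllm.git
cd vllm
pip install -e .

# Enable the V1 engine via environment variable when serving
VLLM_USE_V1=1 vllm serve MiniMaxAI/MiniMax-M1-40k \
  --trust-remote-code --tensor-parallel-size 8
```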
Troubleshooting
Hugging Face network issues
If network errors occur when downloading models, set a mirror:
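For example, routing downloads through a mirror endpoint (hf-mirror.com is a commonly used community mirror; substitute whichever mirror you trust):

```shell
# Point huggingface_hub and huggingface-cli at a mirror
export HF_ENDPOINT=https://hf-mirror.com
```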
No module named 'vllm._C'
This error means a local folder named vllm conflicts with the installed package, which commonly happens after cloning the repo to run examples/. Rename the folder to fix it. See vLLM Issue #1814 for more details.
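For instance (the vllm folder is created here only to demonstrate; in practice it is your clone, and vllm-repo is just one possible new name):

```shell
# A local folder named vllm shadows the installed package
mkdir -p vllm
# Rename it so `import vllm` resolves the installed package again
mv vllm vllm-repo
```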
MiniMax-M1 model not supported
Your vLLM version is too old; please upgrade to v0.9.2 or later. For versions between 0.8.3 and 0.9.1, see the Environment Requirements section above.
Getting Support
If you encounter issues while deploying MiniMax models:
- Contact our support team via api@minimax.io
- Open an Issue on our GitHub repository