Applicable Models
This document applies to the following models; you only need to change the model name during deployment.

Note: Transformers-compatible models use repositories with the -hf suffix. Compared to the versions without the suffix, only the config.json file differs; the weight files are identical.

The following deployment examples use MiniMax-M1-40k-hf.
Environment Setup
- Python: 3.9+
- It is recommended to use a virtual environment (venv, conda, or uv) to avoid dependency conflicts; see the sketch after this list.
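A minimal sketch of that recommendation, using the built-in venv module (the environment name `minimax-m1-env` is just an illustrative choice):

```bash
# Create and activate an isolated environment (requires Python 3.9+)
python -m venv minimax-m1-env
source minimax-m1-env/bin/activate

# Or, equivalently, with uv:
# uv venv minimax-m1-env && source minimax-m1-env/bin/activate
```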
Install Transformers, Torch, and related dependencies with the following commands:
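A typical installation looks like the following; the package list is an assumption (accelerate is included because the loading examples below use device_map="auto"), so check the model card for the exact requirements and versions:

```bash
pip install -U transformers torch accelerate
```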
Running with Python
Ensure that all dependencies are correctly installed and that the CUDA drivers are properly configured.

The following example demonstrates how to load and run the MiniMax-M1 model with Transformers:
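A minimal sketch of such a script is shown below. The repository id MiniMaxAI/MiniMax-M1-40k-hf, the dtype, the prompt, and the generation settings are illustrative assumptions; adjust them to your deployment and the model card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository id is an assumption; use the exact -hf repository you deploy.
MODEL_ID = "MiniMaxAI/MiniMax-M1-40k-hf"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # assumed dtype; follow the model card
    device_map="auto",           # shard the model across available GPUs
)

# Build a chat-formatted prompt and generate a short reply.
messages = [{"role": "user", "content": "Hello! Please introduce yourself."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```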
Accelerating Inference with Flash Attention
Flash Attention is an efficient attention implementation that accelerates model inference. Make sure your GPU supports Flash Attention, as some older GPUs are not compatible.
First, install the flash_attn package:
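A typical installation command, following the flash-attn project's own instructions:

```bash
pip install flash-attn --no-build-isolation
```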
Then enable it by passing attn_implementation="flash_attention_2" to from_pretrained:
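A sketch of the call, reusing the illustrative repository id from the example above:

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "MiniMaxAI/MiniMax-M1-40k-hf",            # assumed repository id
    torch_dtype=torch.bfloat16,               # Flash Attention requires fp16/bf16
    device_map="auto",
    attn_implementation="flash_attention_2",  # use the flash_attn kernels
)
```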
Support
If you encounter issues while deploying MiniMax models:
- Contact our technical support team at api@minimaxi.com.
- Open an Issue in our GitHub repository.