Fine-tuning LLMs using Workbench
TOC
BackgroundScopePreparationLLM Model Fine-tuning StepsCreating a Notebook/VSCode InstancePrepare the ModelPreparing the Model Output LocationPreparing the DatasetHuggingface dataset formatLLaMA-Factory FormatPrepare Runtime ImageCreate Fine-tuning VolcanoJob TaskNote about using NFS PVCManage the TaskExperiment TrackingLaunch Inference Service Using the Fine-tuned ModelRunning on Non-Nvidia GPUsPreparationVerifying the Original Vendor Solution (Optional)Converting the Vendor Solution to Run as a Kubernetes Job/Deployment (Optional)Modify the vendor solution to run as a volcano jobExperiment TrackingSummaryBackground
Model fine-tuning and training often require adapting to different model structures, hardware devices, and appropriate parallel training methods. Alauda AI Workbench provides a comprehensive approach, from model development to training task submission, management, and experiment tracking, helping model and algorithm engineers quickly adapt and complete the entire model fine-tuning and training process.
In this solution, we provide an example of fine-tuning a Qwen3-0.6B model using LLaMA-Factory.
Alauda AI Workbench creates a Notebook/VSCode (CodeServer) container environment for development and debugging in a user namespace. Multiple Notebook/VSCode instances can be created within a namespace to preserve environments for different users and development tasks. Notebooks can request only CPU resources for development and cluster task submission, using the cluster's GPU resources to run tasks. GPU resources can also be requested for Notebooks, allowing tasks such as training and fine-tuning to be completed directly within the Notebook, regardless of whether the model requires distributed training.
In addition, you can use the platform's built-in MLFlow to record various metrics for each model fine-tuning training session, making it easier to compare multiple experiments and select the final model.
We use VolcanoJob, the Kubernetes-native resource manager, to submit cluster tasks from Notebooks. The Volcano scheduler supports queues, priorities, and various scheduling policies, facilitating more efficient cluster task scheduling and improving resource utilization.
This solution uses the LLaMA-Factory to launch fine-tuning and training tasks. However, for larger-scale model fine-tuning and training scenarios requiring parallel methods like Tensor Parallelism, Context Parallelism, and Expert Parallelism to train larger models, it may be necessary to use other tools, build custom fine-tuning runtime images, and modify the task launch script to adapt to different tools and models. For more detailed LLaMA-Factory usage and parameter configuration, please refer to: https://llamafactory.readthedocs.io/en/latest/index.html
Scope
- This solution is applicable to Alauda AI 1.3 and later.
- Fine-tuning and training of LLM models. If you need to train other types of models (such as YOLOv5), you will need to use different images, startup scripts, datasets, etc.
- This solution is applicable to x86/64 CPU and NVIDIA GPU scenarios.
- NPU scenarios require building a suitable runtime image based on this solution to be compatible.
Preparation
- You must first deploy the "Alauda AI Workbench" plugin to enable Workbench support (or deploy Kubeflow Base plugin to use Notebook with Kubeflow).
- Install
MLFlowplugin for experiment tracking.
LLM Model Fine-tuning Steps
Creating a Notebook/VSCode Instance
Go to Alauda AI - Workbench (or Advanced - Kubeflow - Notebook) then create a new Notebook or use an existing one. Note that it is recommended that the Notebook only use CPU resources. Submitting a cluster task from within the Notebook will request GPU resources within the cluster to improve resource utilization.
see Creating a Workbench for detailed steps to create a Notebook/VSCode instance.
Prepare the Model
Download the model Qwen/Qwen3-0.6B from Huggingface or other open-source
model sharing websites. Then upload the model to the model repository.
See Upload Models Using Notebook for detailed steps on uploading model files to the model repository.
Preparing the Model Output Location
Create an empty model in the model repository to store the output model. When configuring the fine-tuning output location, enter the model's Git repository URL.
Preparing the Dataset
Download and push the sample identity dataset to the dataset repository. This dataset is used to fine-tune the LLM to answer user questions such as "Who are you?"
- First, create an empty dataset repository under "Datasets" - "Dataset Repository".
- Upload the zip file to the notebook, unzip it, then navigate to the dataset directory. Use git lfs to push the dataset to the dataset repository's Git URL. The steps are similar to uploading the model.
- After the push is complete, refresh the dataset page and you should see that the file has been successfully uploaded in the "File Management" tab.
Note: The dataset format must be correctly recognized by the fine-tuning framework to be used in subsequent fine-tuning tasks. The following examples illustrate two common LLM fine-tuning dataset formats:
Huggingface dataset format
You can use the following code to check whether the dataset directory format can be correctly loaded by datasets:
LLaMA-Factory Format
If you use the LLaMA-Factory tool in the examples to complete training, the dataset format must conform to the LLaMA-Factory format. Reference: data_preparation
Prepare Runtime Image
Use the following Containerfile to build the training runtime image, or use the pre-built image: alaudadockerhub/fine_tune_with_llamafactory:v0.1.0. If you wish to use a different training framework, such as YOLOv5, you may need to customize the image and install the required dependencies within it.
After building the image, you need to upload it to the image registry of the Alauda AI platform cluster and configure it in the following tasks.
Note: The
git lfscommand is required within the image to download and upload the model and dataset files.
Containerfile
Create Fine-tuning VolcanoJob Task
In Notebook, create the YAML file for the task submission. Refer to the following example:
VolcanoJob YAML File
In the YAML file above, modify the following settings before submitting the task.
- Image: Contains the dependencies required for task execution.
- Environment variables: Locations of the original model, dataset, and output model for the task:
BASE_MODEL_URL: Change to the Git URL of the prepared model.DATASET_URL: Change to the Git URL of the prepared datasetidentity-alauda.OUTPUT_MODEL_URL: Create an empty model in the model repository to store the output model, and then enter the Git URL of this model.
- Required resources for the task, including:
- model cache PVC (optional): PVC to store the base model (when running pre-training jobs, you can omit the base model download step) and the dataset. You need to manually create and specify a PVC in above YAML file. Use this "model cache PVC" to accelerate multiple fine tuning experiments on several models.
- workspace PVC (required): PVC and working directory where the job runs. The checkpoints, output model files will be stored here. In above example, we use a temporary PVC which will be deleted after the job finishes, since the output model is uploaded to model repository already.
- Shared Memory: For multi-GPU/distributed training tasks, it is recommended to allocate at least 4 Gi of shared memory.
- CPU, memory, and GPU resources required for the task (based on the GPU device plugin deployed in the cluster).
- Task Execution Script:
- The example script above includes caching the model from the model repository to the PVC, caching the training dataset to the PVC, and pushing the model to the new model repository after fine-tuning. You can customize the script to fit your own tasks.
- The example script uses the
LLaMA-Factoryto launch the fine-tuning task, which can handle most LLM fine-tuning training scenarios. - Hyperparameters: In the example above, the hyperparameters are defined directly in the startup script. You can also use environment variables to read hyperparameters that may be adjusted repeatedly, making it easier to run and configure multiple times.
Note about using NFS PVC
When using NFS as workspace or model cache PVC, you should make sure below operations are performed to ensure the mounted NFS volume have correct filesystem permissions:
-
All K8s nodes that can use the NFS PVC must install
nfs-utils, e.g.yum install -y nfs-utils. -
Add
mountPermissions: "0757"settings when creating the NFS storage class, like:
Once you have completed above settings, open a terminal in Notebook and execute: kubectl create -f vcjob_sft.yaml to submit the VolcanoJob to the cluster.
Manage the Task
In the Notebook terminal
- Run
kubectl get vcjobto view the task list, thenkubectl get vcjob <task name>to view the status of theVolcanoJobtask. - Run
kubectl get podto view the pod status, andkubectl logs <pod name>to view the task logs. Note that for distributed tasks, multiple pods may exist. - If the pod is not created, run
kubectl describe vcjob <task name>orkubectl get podgroupsto view the Volcano podgroup. You can also check theVolcanoscheduler log to determine if the scheduling issue is due to insufficient resources, an inability to mount a PVC, or other scheduling issues. - After the task successfully executes, the fine-tuned model will be automatically pushed to the model repository. Note that the task will automatically generate a repository branch for push based on the time. When using the output model, be sure to select the correct version.
Run kubectl delete vcjob <task name> to delete the task.
Experiment Tracking
In the fine-tuning example task above, we use LLaMA-Factory to launch the fine-tuning task and set report_to: mlflow in the task configuration. This automatically outputs training metrics to the mlflow server. After the task starts, you can find the experiment tracking records under Alauda AI - Advanced - MLFlow and compare multiple executions. For example, you can compare the loss of multiple experiments.
Launch Inference Service Using the Fine-tuned Model
After the fine-tuning task completes, the model is automatically pushed to the model repository. You can use the fine-tuned model to launch the inference service and access it.
Note: In the example task above, we use LoRA fine-tuning method. Before uploading the model, the LoRA adapter was merged with the original model. This allows the output model to be directly published to the inference service. Launching inference service using base model and adapters are NOT supported in current version.
Steps:
- Go to Alauda AI - Model Repository, find the fine-tuned output model, go to Model Info - File Management - Edit Metadata, select "Task Type": "Text Classification", and "Framework": "Transformers".
- Click the Publish Inference API button, then select Custom Publishing.
- On the Publish Inference Service page, select vLLM inference runtime (select vLLM with the CUDA version that is supported by the cluster GPU nodes), fill in settings of storage, resource, GPU, then click Publish.
- Wait until the inference service is fully started, click the Experience button in the upper-right corner to start a conversation with the model. (Note: Models that include the
chat_templateconfiguration only have chat capabilities.)
Running on Non-Nvidia GPUs
When using a non-Nvidia GPU environment (e.g. NPU, Intel Gaudi, AMD etc.), you can follow the common steps below to fine-tune models, launch training tasks, and manage them in AML Notebook.
Note: The following steps can also be adapt to LLM pre-training and traditional ML senarios. These are general steps for converting a vendor solution to run on Alauda AI using Notebook and VolcanoJob.
Preparation
- Prerequisite: The vendor GPU driver and Kubernetes device plugin have been deployed in the cluster. The devices can be accessed within the pod created by Kubernetes.
Note: You will need to know the vendor GPU resource name and the total number of device resources in the cluster to facilitate subsequent task submission.
For example, for Huawei NPUs, you can apply for an NPU card using:
huawei.com/Ascend910:1. - Obtain the vendor-provided solution documentation and materials for fine-tuning on the current vendor's GPU. This typically includes:
- Solution documentation and steps. This can be done on Kubernetes or in a container using
nerdctl run. - Image to run the fine-tuning. For example, the vendor provides a fine-tuning solution using
LLaMA-Factoryand a correspondingLLaMA-Factoryimage (which may be included in the image). - Model to run the fine-tuning. Typically, vendor devices support a range of models. Use models that the device supports or the models provided in the vendor solution.
- Training data. Use the sample data provided in the vendor solution documentation or construct your own dataset in the same format.
- Task launch command and parameters. For example, the
LLaMA-Factoryframework fine-tuning solution uses thellamafactory-clicommand to launch the fine-tuning task and configure various parameters, including task hyperparameters, in a YAML file.
Verifying the Original Vendor Solution (Optional)
To ensure the correct execution of the vendor solution and reduce subsequent troubleshooting, you can first run it completely according to the vendor solution to verify that it works correctly.
This step can be skipped. However, if issues with task execution arise later, you can return to this step to verify that the original solution is the problem.
Converting the Vendor Solution to Run as a Kubernetes Job/Deployment (Optional)
If the vendor solution is already running as a Kubernetes job/deployment/pod, you can skip this step.
If the vendor solution uses a container execution method, such as nerdctl run, you can first use a simple Kubernetes job to verify that the solution runs correctly in a Kubernetes environment where the vendor device plugin is deployed.
Note: This step can rule out issues with volcano jobs being unable to schedule vendor GPU devices, so it can be verified separately.
Reference:
Modify the vendor solution to run as a volcano job
Refer to the following YAML definition
VolcanoJob YAML File
Experiment Tracking
Some fine-tuning/training frameworks automatically record experiment progress to various experiment tracking services. For example, the LLaMA-Factory and Transformers framework can specify recording of experiment progress to services such as mlflow and wandb. Depending on your deployment, you can configure the following environment variables:
MLFLOW_TRACKING_URI: The URL of the mlflow tracking server.MLFLOW_EXPERIMENT_NAME: The experiment name, typically using a namespace name. This distinguishes a group of tasks.
The framework also specifies the recording destination. For example, LLaMA-Factory requires specifying report_to: mlflow in the task parameter configuration YAML file.
After a training task begins, you can find the corresponding task in the Alauda AI - "Advanced" - MLFlow interface and view the curves of each recorded metric in "Metrics" or the parameter configuration for each execution. You can also compare multiple experiments.
Summary
Using the Alauda AI Notebook development environment, you can quickly submit fine-tuning and training tasks to a cluster using YAML and command-line tools, and manage the execution status of these tasks. This approach allows you to quickly develop and customize model fine-tuning and training steps, enabling operations such as LLM SFT, preference alignment, traditional model training, and multiple experimental comparisons.