InstructLab Deep Dive Part 2

Index:
InstructLab Deep Dive Part 1
What is InstructLab and what problem does it solve?

InstructLab Deep Dive Part 2
Installing InstructLab on Fedora 40 with CUDA enabled

InstructLab Deep Dive Part 3
Installing InstructLab on Fedora 40 with Ansible

InstructLab Deep Dive Part 4
Building InstructLab Podman/Docker container and running it on runpod.io

InstructLab Deep Dive Part 5
Advanced techniques with InstructLab, AI Agents and Function calling

Introduction

In this guide, I will walk you through the process of setting up InstructLab on a Fedora 40 virtual machine, using Proxmox VE 8 as the virtualization environment. We'll cover the essential steps: provisioning the VM, passing through a GPU, installing Fedora 40, setting up the necessary NVIDIA drivers and CUDA software, and finally installing InstructLab.

Setting up a VM

Create a VM on Proxmox with the following parameters:

  - System tab
      Machine: Q35
      BIOS: UEFI
      EFI Storage: NVME0
  - Disks tab
      Disk size: 128GB
      SSD emulation: true
  - CPU tab
      Cores: 8
      Type: host
  - Memory tab
      Memory: 16384 MiB
      Ballooning Device: false

These VM parameters are selected to optimize the virtual machine for GPU passthrough. The Q35 machine type and UEFI firmware ensure compatibility with modern hardware, and UEFI in particular lets us disable Secure Boot later, which we will need to do. In general, UEFI is the better choice for GPU passthrough, and your host motherboard should preferably be set to UEFI boot as well. The 128GB disk provides ample space for the operating system, CUDA libraries, and model downloads (the unquantized Granite model alone is around 15GB), while SSD emulation improves disk performance. I've set 8 vCPU cores simply to speed things up, but make sure you set the CPU to the "host" type. Finally, 16GB of RAM without ballooning ensures consistent and reliable memory allocation, as ballooning might not work correctly with GPU passthrough.

Next, proceed with installing Fedora 40 as you normally would. During the installation, make sure to configure the storage manually: by default, Fedora doesn't allocate the full disk size to the root partition, which means you would have to resize it later. To avoid that extra step, set up the storage manually so that the root partition uses the entire disk. After the installation is complete, reboot the OS, update it, and then power it down.
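
If you did accept the defaults and need to grow the root filesystem afterwards, something like the following usually works; this is a hedged sketch assuming the default Fedora Server LVM/XFS layout with a volume group named "fedora":

# Extend the root logical volume over all free space and grow the filesystem (-r)
sudo lvextend -r -l +100%FREE /dev/fedora/root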

Next, we will add the GPU to the VM. Select the VM you created, open the "Hardware" tab, click "Add" -> "PCI Device", select "Raw Device" in the menu, find your GPU, and tick "All Functions", "ROM-Bar" and "PCI-Express". This adds the GPU to the VM.
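
If you prefer the command line, the same passthrough can be configured with Proxmox's qm tool. A rough equivalent, assuming VM ID 104 and the GPU at PCI address 01:00, would be:

# Pass through the whole GPU (all functions) as a PCIe device with ROM-Bar enabled
qm set 104 --hostpci0 0000:01:00,pcie=1,rombar=1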

Next, we need to edit the VM config. In the Proxmox web UI, click on the name of your hypervisor node and then open ">_ Shell".

nano /etc/pve/qemu-server/104.conf     # 104 is the ID of your VM

cpu: host,hidden=1,flags=+pcid          # Edit the line that starts with cpu

Next, you need to disable "Secure Boot" in the VM firmware, since the NVIDIA kernel modules are unsigned from the system's perspective and will not load while Secure Boot is enabled. Select your VM from the list of VMs in Proxmox, then go to ">_ Console" and be prepared to press ESC as soon as the VM starts booting. Boot the VM, press ESC when prompted, and you’ll enter the VM BIOS. From there, navigate to "Device Manager" -> "Secure Boot Configuration" -> "Attempt Secure Boot". Use the spacebar to untick this option, then press F10 to save your changes. Confirm by pressing 'Y', then press ESC several times and select "Continue" to boot into the VM.
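
Once the VM has booted, you can confirm from inside the guest that Secure Boot is actually off (mokutil is typically available on Fedora; install it if needed):

# Should report "SecureBoot disabled"
mokutil --sb-state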

After booting into the OS, you can verify that your GPU is recognized by running the following command:

lspci | grep NVID

You should see output similar to this:

[admin@fedora40-srv-ilab ~]$ lspci | grep NVID
01:00.0 VGA compatible controller: NVIDIA Corporation GA106 [RTX A2000 12GB] 
01:00.1 Audio device: NVIDIA Corporation GA106 High Definition Audio Controller 

Before proceeding with the NVIDIA driver installation, it's important to blacklist nouveau, the open-source NVIDIA driver, which would otherwise interfere with the proprietary driver. Run the following commands (the last one powers the VM off):

# Blacklist nouveau driver
echo "blacklist nouveau" | sudo tee -a /etc/modprobe.d/blacklist.conf
poweroff

Before you install the NVIDIA drivers, it’s a good idea to create a snapshot of the VM. This allows you to easily roll back if anything goes wrong during the installation process.

To create a snapshot in Proxmox:
1. Select your VM from the list.
2. Go to the "Snapshots" tab.
3. Click on "Take Snapshot."
4. Give your snapshot a name (e.g., "Pre-NVIDIA Driver Install").
5. Click "Create".

With the snapshot created, you can now proceed with the NVIDIA driver installation knowing you can revert to this state if needed.

To ensure your system can fully utilize the GPU, you'll need to install the official NVIDIA drivers. Start by adding the NVIDIA CUDA repository, which provides the necessary packages, then install the NVIDIA driver along with essential utilities.

# Add NVIDIA Driver and CUDA repository
# (NVIDIA's fedora39 repo is used here; it also works on Fedora 40)
sudo dnf config-manager --add-repo \
https://developer.download.nvidia.com/compute/cuda/repos/fedora39/x86_64/cuda-fedora39.repo

# Install NVIDIA driver and utilities
sudo dnf install -y nvidia-driver nvidia-modprobe dnf-plugin-nvidia

# Once the installation is complete, reboot your system to apply the changes
sudo reboot

Install NVIDIA CUDA support

# Install CUDA support 
sudo dnf install nvidia-driver-cuda nvidia-driver-cuda-libs nvidia-driver-NVML
sudo reboot

Install NVIDIA CUDA toolkit

# Install CUDA-Toolkit 12.4 
sudo dnf install cuda-toolkit-12-4 

To ensure that your system properly recognizes and uses the CUDA toolkit, you’ll need to add several environment variables to your bash profile. This step configures your environment so that the CUDA toolkit and related utilities are easily accessible for all terminal sessions.

# Append the following lines to the end of ~/.bashrc
vi ~/.bashrc

export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64
export PATH=$PATH:/usr/local/cuda/bin
export CUDACXX=/usr/local/cuda-12/bin/nvcc
export CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCMAKE_CUDA_ARCHITECTURES=all-major"
export FORCE_CMAKE=1

Save the changes and close the editor. To apply these changes, you can either restart your terminal or run:

source ~/.bashrc
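
A quick way to confirm the new variables are active in your session:

# Both should resolve to your CUDA installation
echo $CUDA_HOME
which nvcc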

Install nvtop to monitor GPU

sudo dnf install -y nvtop 

Verifying NVIDIA GPU Configuration

After setting up the NVIDIA drivers and CUDA toolkit, it’s important to verify that your GPU is configured correctly and functioning as expected. If nvidia-smi shows your GPU and nvcc reports its version, your drivers are configured correctly.

# nvtop is a real-time GPU monitoring tool similar to htop
nvtop

# nvidia-smi is a command-line tool that provides information about GPU
nvidia-smi

# Continuous display with a refresh of 1s
watch -n 1 nvidia-smi

# nvcc is the NVIDIA CUDA Compiler
nvcc --version

Installing Podman and NVIDIA Container Toolkit

This step is optional; you only need it if you intend to run InstructLab in a container. To run GPU-accelerated containers, you'll need to install Podman along with the NVIDIA Container Toolkit. These tools will allow you to create and manage containers that can access your NVIDIA GPU.

First, install Podman, a daemonless container engine that is a drop-in replacement for Docker:

# Install Podman, then enable and start its API service
sudo dnf install -y podman
sudo systemctl enable --now podman

The NVIDIA Container Toolkit enables GPU support in containers. Follow these steps to install it:

# Add the NVIDIA Container Toolkit repository to your system
curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo

# Install the NVIDIA Container Toolkit packages
sudo dnf install -y nvidia-container-toolkit

# Generate a CDI Specification
sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
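
# Optionally list the devices in the generated spec to confirm the GPU is visible
nvidia-ctk cdi list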

# Verify that everything is set up correctly by running a test container
podman run --rm \
--device nvidia.com/gpu=all \
--security-opt=label=disable \
ubuntu nvidia-smi -L

Installing Necessary Packages for InstructLab

Before you can install and use InstructLab, you need to ensure that your system has the required development tools and libraries. These tools will allow you to build and compile software from source, which is essential for installing InstructLab.

# Install packages
sudo dnf groupinstall -y "Development Tools"
sudo dnf install -y gcc gcc-c++ clang17 make git wget vim curl

When working with CUDA on Fedora 39/40, you might run into the issue that CUDA is not yet compatible with the latest GCC (v14.1+). CUDA is validated and optimized against specific host-compiler versions, and brand-new GCC releases are typically unsupported at first due to changes in the compiler's ABI (Application Binary Interface) or other underlying differences. Clang, another widely used C/C++ compiler, tends to remain compatible with CUDA in situations where the newest GCC versions are not yet supported. By using Clang 17, which is supported by CUDA, we bypass the GCC v14 incompatibility and ensure that the llama-cpp-python package, an important component for running LLaMA-based models with CUDA, is compiled correctly.
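
If you want to confirm the CUDA/Clang combination works before building anything large, a minimal hand-rolled smoke test (the file name and contents here are just an illustration) looks like this:

# Compile a trivial CUDA program with Clang 17 as the host compiler, then run it
cat > /tmp/hello.cu <<'EOF'
#include <cstdio>
__global__ void kernel() { }
int main() {
    kernel<<<1, 1>>>();
    cudaDeviceSynchronize();
    printf("CUDA OK\n");
    return 0;
}
EOF
nvcc -ccbin clang++-17 /tmp/hello.cu -o /tmp/hello && /tmp/hello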

Setting Up Miniconda for InstructLab

InstructLab currently requires Python 3.11 for proper operation and compatibility. However, the default Python on Fedora 40 is Python 3.12, which InstructLab does not yet fully support.

To overcome this challenge, we will use Miniconda to create an isolated Python environment with the specific version required by InstructLab. Miniconda is a lightweight distribution of the Anaconda package manager, which allows you to manage multiple environments with different Python versions and dependencies on a single system.

mkdir -p ~/miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh
bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
rm -rf ~/miniconda3/miniconda.sh
~/miniconda3/bin/conda init bash

# If you use zsh instead of bash
~/miniconda3/bin/conda init zsh

Creating a Python 3.11 Environment

Once Miniconda is installed and initialized, open a new shell and create a new environment specifically for InstructLab with Python 3.11. This ensures that InstructLab runs in a compatible environment, free from conflicts with other Python versions on your system.

# Create and activate virtual environment
conda create -n instructlab python=3.11
conda activate instructlab
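
With the environment activated, verify that the interpreter is the expected one:

# Should print Python 3.11.x
python --version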

Install PyTorch from Conda

conda install pytorch torchvision torchaudio pytorch-cuda=12.4 -c pytorch -c nvidia
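
Once the install finishes, a one-liner confirms that PyTorch can see the GPU and which CUDA version it was built against:

# Expect "True 12.4" (or similar) on a working setup
python -c 'import torch; print(torch.cuda.is_available(), torch.version.cuda)'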

Installing InstructLab

With your environment now properly configured, the next step is to install InstructLab. This involves using the pip package manager to install the InstructLab package along with CUDA support, ensuring that your installation can leverage GPU acceleration.

CUDAHOSTCXX=$(which clang++-17) pip install 'instructlab[cuda]' \
-C cmake.args="-DLLAMA_CUDA=on"

Explanation:

  • CUDAHOSTCXX=$(which clang++-17): This part of the command sets the CUDAHOSTCXX environment variable to the path of the Clang++ 17 compiler. This ensures that Clang 17 is used to compile any CUDA-related code, which is necessary due to the previously mentioned incompatibility between CUDA and GCC v14.
  • pip install instructlab[cuda]: This command installs the InstructLab package along with its CUDA dependencies. The [cuda] extra tells pip to include additional packages required for GPU acceleration.
  • -C cmake.args="-DLLAMA_CUDA=on": This CMake argument is passed during the installation process to explicitly enable CUDA support in the LLaMA component of InstructLab. It ensures that the installation process configures InstructLab to utilize CUDA for GPU-accelerated operations.
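
To check that llama-cpp-python was actually built with GPU offload support, recent versions of the library expose a helper you can call; this is a hedged check, as the exact function depends on your llama-cpp-python version:

# Should print True if the CUDA build succeeded
python -c 'import llama_cpp; print(llama_cpp.llama_supports_gpu_offload())'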

Verifying InstructLab Installation with CUDA

After successfully installing InstructLab, it’s important to verify that everything is working correctly, particularly with CUDA support. The following steps will guide you through initializing InstructLab, downloading a default model, and serving it while monitoring GPU usage to ensure CUDA is being utilized.

Start by initializing InstructLab, which will set up the necessary configurations and download the default taxonomy:

# Accept all defaults 
ilab config init

Next, download the default large language model (LLM) provided by InstructLab:

# Download default model
ilab model download

Before serving the model, it’s useful to monitor your GPU’s activity to confirm that CUDA is being utilized. Open two separate terminal windows:

# watch -n 1 nvidia-smi command runs nvidia-smi every second
# In the first terminal, run:
watch -n 1 nvidia-smi

# nvtop provides a real-time graphical representation of GPU usage
# In the second terminal, run:
nvtop

Now, serve the downloaded Merlinite model:

ilab model serve
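
ilab model serve exposes an OpenAI-compatible HTTP endpoint; assuming the default host and port from ilab config init (127.0.0.1:8000), you can also poke it directly:

# List the models the server is currently exposing
curl -s http://127.0.0.1:8000/v1/models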

Open another terminal window to start a conversation with the served model:

ilab model chat

While chatting with the model, observe the outputs from nvidia-smi and nvtop in the other terminal windows. As you interact with the model, you should see GPU activity increase in both nvidia-smi and nvtop. This indicates that CUDA is being used to accelerate the processing, confirming that your setup is correctly utilizing the GPU for InstructLab operations.

Conclusion

In this tutorial, we’ve walked through the complete process of setting up InstructLab on a Fedora 40 virtual machine, from configuring the environment to verifying GPU acceleration with CUDA. Whether you’re fine-tuning models or running AI tasks, this setup provides a solid foundation for leveraging the power of GPU acceleration in your projects. Now that everything is up and running, you’re ready to explore the capabilities of InstructLab and take your AI development to the next level.
