The Art of GPU Occupation

littletree / 2023-08-25

I have been working on the 2030 project for about a month.

I primarily focus on analysing and cleaning data. Recently, I’ve moved on to some more advanced work on LLMs: specifically, I use LoRA to fine-tune Chinese-Llama-2-7b.

At first, five postgraduate students were working on tasks similar to mine, each on a different dataset. Sometimes several Ph.D. students also use the same server. However, the server has only 8 NVIDIA A800 GPUs for fine-tuning LLMs.

It is frustrating to watch available GPUs slip away simply because we were a little slow to run our commands. Efficiently securing enough GPUs under such limited resources is therefore crucial.

In short, my goal is to write a Python program that grabs available GPUs and, once it has acquired $n$ of them, starts running the real program.

Before delving into the solution, I’d like to share a thought. Writing a program to grab GPUs embodies the first of the Three Virtues of a Programmer: Laziness. Automating GPU allocation saves me far more time and energy than doing it by hand, and it is also faster and more reliable. Automating tasks is a great way to hone my programming skills, too.

Now, let’s dive into the source code and briefly discuss the main idea.

To grab GPUs, we first need information about their status. For this, I use Python’s subprocess library to run the nvidia-smi command, which reports how much memory each GPU is using. The get_gpu_mem function returns the memory used (in MiB) by a specified GPU, while get_free_gpus returns the indices of the available GPUs (those using less than 100 MiB) as a list.

import subprocess

def get_gpu_mem(gpu_id: int) -> int:
    # Query the memory used (in MiB) on every GPU; nvidia-smi prints one integer per line
    gpu_query = subprocess.check_output(['nvidia-smi', '--query-gpu=memory.used', '--format=csv,nounits,noheader'])
    gpu_memory = [int(x) for x in gpu_query.decode('utf-8').strip().splitlines()]
    return gpu_memory[gpu_id]

def get_free_gpus() -> list:
    gpu_query = subprocess.check_output(['nvidia-smi', '--query-gpu=memory.used', '--format=csv,nounits,noheader'])
    gpu_memory = [int(x) for x in gpu_query.decode('utf-8').strip().splitlines()]
    # Treat a GPU as free if less than 100 MiB of its memory is in use
    free_gpus = [i for i, mem in enumerate(gpu_memory) if mem < 100]
    return free_gpus
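Because both functions depend on a real nvidia-smi, their parsing logic is hard to try on a laptop. The sketch below splits that logic into two pure functions that take nvidia-smi’s text output as an argument. The names parse_memory_used and free_gpu_ids are hypothetical helpers, not part of the original script, and the sample string is made-up output for four GPUs.

```python
# Hypothetical helpers (not from the original script): they receive nvidia-smi's
# text output as an argument, so the parsing can be tested without a GPU.

def parse_memory_used(csv_text: str) -> list[int]:
    # One line per GPU, as produced by
    # --query-gpu=memory.used --format=csv,nounits,noheader (values in MiB)
    return [int(line) for line in csv_text.strip().splitlines()]

def free_gpu_ids(memory_used: list[int], threshold_mib: int = 100) -> list[int]:
    # Same rule as get_free_gpus: a GPU is free if it uses less than the threshold
    return [i for i, mem in enumerate(memory_used) if mem < threshold_mib]

sample = "81000\n3\n0\n40500\n"  # made-up output: GPUs 1 and 2 are idle
print(free_gpu_ids(parse_memory_used(sample)))  # [1, 2]
```

Separating the query from the parsing also means the GPU status could be fetched once per polling round and reused, instead of each function calling nvidia-smi on its own.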

So how do we occupy the available GPUs? I use Python’s multiprocessing library: if there are $n$ available GPUs, $n$ subprocesses each claim one of them.

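In the main process’s loop, occupy_num is only incremented after a subprocess has finished allocating its tensor, which takes far longer than one pass of the loop. As a result, occupy_all_gpus can spawn more subprocesses than it needs; in fact, the total number of subprocesses can exceed $n$. However, because each subprocess checks and updates occupy_num while holding a shared Lock, only $n$ of them actually occupy GPUs, and they do so one at a time. The surplus subprocesses hold no GPU memory and simply sleep until they are terminated.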
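The claim that the lock caps the count at $n$ can be checked without any GPU. The following is a minimal simulation, assuming a Linux or macOS machine where the `fork` start method is available: it spawns more processes than there are slots, and each one performs the same lock-protected "check the counter, record an id, increment" step as occupy_gpu. The names try_occupy and simulate are hypothetical, and no memory is actually allocated.

```python
import multiprocessing as mp

def try_occupy(slot_id, n, occupy_num, occupied, lock):
    # Hypothetical stand-in for occupy_gpu: only the lock-protected bookkeeping, no GPU
    with lock:
        if occupy_num.value < n:
            occupied[occupy_num.value] = slot_id
            occupy_num.value += 1

def simulate(n, spawned):
    ctx = mp.get_context("fork")  # assumption: fork is available (Linux/macOS)
    occupy_num = ctx.Value('i', 0)          # shared counter, like occupy_num
    occupied = ctx.Array('i', [-1] * n)     # shared id slots, like ocpy_gpus
    lock = ctx.Lock()
    procs = [ctx.Process(target=try_occupy, args=(i, n, occupy_num, occupied, lock))
             for i in range(spawned)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return occupy_num.value, list(occupied)

if __name__ == "__main__":
    count, slots = simulate(n=3, spawned=8)
    # count is always 3; which process ids win the slots depends on scheduling
    print(count, slots)
```

Without the lock, two processes could both read occupy_num as 2, both pass the check, and both write to the same slot, which is exactly the over-allocation the lock prevents.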

To occupy a GPU essentially means claiming its memory. In the occupy_gpu function, I allocate a large 2-D torch tensor (a_dim x a_dim, 100000 x 100000 by default) on the designated GPU and then keep the subprocess alive in a sleep loop, so the memory is never released.
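How much memory does that tensor take? torch.ones uses float32 by default, i.e. 4 bytes per element, so the default a_dim claims $100000^2 \times 4 = 4 \times 10^{10}$ bytes, about 40 GB, plus some overhead for the CUDA context. The tensor must fit in the card’s free memory, so a_dim may need adjusting. The helpers below are a hypothetical sketch for doing that arithmetic; they are not part of the original script.

```python
import math

def tensor_bytes(a_dim: int, bytes_per_elem: int = 4) -> int:
    # Size of an a_dim x a_dim tensor; float32 (torch's default dtype) is 4 bytes per element
    return a_dim * a_dim * bytes_per_elem

def a_dim_for(target_bytes: int, bytes_per_elem: int = 4) -> int:
    # Largest square side whose tensor fits within target_bytes
    return math.isqrt(target_bytes // bytes_per_elem)

print(tensor_bytes(100000) / 1e9)  # 40.0 (GB) for the default a_dim
print(a_dim_for(20 * 10**9))       # 70710, side length for roughly 20 GB
```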

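import time
from multiprocessing import Process, Lock

def occupy_gpu(gpu_id: int, n, occupy_num, ocpy_gpus, lock, a_dim=100000):
    # The lock makes "check the GPU, allocate, update the counter" atomic across subprocesses
    with lock:
        if get_gpu_mem(gpu_id) < 100 and occupy_num.value < n:
            import torch  # imported here so CUDA is only initialised inside the subprocess
            # Allocate directly on the GPU; building the tensor on the CPU first and then
            # calling .cuda() would also need about 40 GB of host RAM for the default a_dim
            a = torch.ones((a_dim, a_dim), device=f'cuda:{gpu_id}')
            ocpy_gpus[occupy_num.value] = gpu_id
            occupy_num.value += 1
            print(f"Occupying GPU {gpu_id}, Total Occupied: {occupy_num.value}")
    # Keep the process (and therefore the tensor) alive until the main process terminates it
    while True:
        time.sleep(10)

def occupy_all_gpus(n: int, occupy_num, ocpy_gpus, interval=10):
    print("Launching process to occupy GPU ...")
    lock = Lock()
    processes = []  # every spawned subprocess, including those that end up occupying nothing
    while occupy_num.value < n:
        free_gpus = get_free_gpus()
        will_occupy_num = min(n, len(free_gpus))
        for i in range(will_occupy_num):
            if occupy_num.value < n:
                p = Process(target=occupy_gpu, args=(free_gpus[i], n, occupy_num, ocpy_gpus, lock))
                p.start()
                processes.append(p)
        time.sleep(interval)  # give subprocesses time to allocate and nvidia-smi time to update
    return processes, ocpy_gpus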
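The post does not show how occupy_num and ocpy_gpus are created before occupy_all_gpus is called. One plausible setup, assuming they are multiprocessing shared objects (which the `.value` access and indexing suggest), is sketched below; make_shared_state is a hypothetical name.

```python
from multiprocessing import Array, Value

def make_shared_state(n: int):
    # Shared counter: how many GPUs have been occupied so far ('i' = C int)
    occupy_num = Value('i', 0)
    # Shared slots for the occupied GPU ids, filled in order; -1 marks an empty slot
    ocpy_gpus = Array('i', [-1] * n)
    return occupy_num, ocpy_gpus

occupy_num, ocpy_gpus = make_shared_state(3)
ocpy_gpus[0] = 4          # what occupy_gpu does after a successful allocation
occupy_num.value += 1
print(list(ocpy_gpus[:occupy_num.value]))  # [4]
```

Using -1 rather than 0 as the empty marker matters here, because 0 is a valid GPU index.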

That concludes the mechanism for occupying GPUs. Once we have occupied $n$ GPUs, it’s time to run our real program. Before that, however, we need to terminate all the subprocesses so that the GPU memory they hold is released.

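import os
import subprocess

def run_my_program(n, desired_script, processes, ocpy_gpus, occupy_num):
    # Terminate every placeholder subprocess and wait for it to exit so its GPU memory is freed
    for p in processes:
        p.terminate()
        p.join()
    ocpy_gpus_list = list(ocpy_gpus[:occupy_num.value])
    # Expose only the grabbed GPUs (e.g. "2,5,7"); the child process inherits os.environ
    cuda_visible_devices = ",".join(map(str, ocpy_gpus_list))
    os.environ['CUDA_VISIBLE_DEVICES'] = cuda_visible_devices
    subprocess.run([desired_script, str(n)])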
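Setting os.environ works because the child inherits the parent’s environment, but it also changes the environment of the grabbing script itself. A variant, sketched below under that design preference, builds a separate environment dict and passes it through subprocess.run’s `env` parameter. build_child_env is a hypothetical helper.

```python
import os

def build_child_env(gpu_ids: list[int]) -> dict[str, str]:
    # Copy the current environment instead of mutating os.environ in place
    env = dict(os.environ)
    # Inside the child, these GPUs are renumbered as cuda:0, cuda:1, ...
    env["CUDA_VISIBLE_DEVICES"] = ",".join(map(str, gpu_ids))
    return env

env = build_child_env([2, 5, 7])
print(env["CUDA_VISIBLE_DEVICES"])  # 2,5,7
# subprocess.run(["./train.sh", "3"], env=env) would launch the job on only those GPUs
```

The renumbering is worth remembering: a training script that hard-codes `cuda:5` will fail under this setting, while one that uses `cuda:0` to `cuda:{n-1}` works regardless of which physical GPUs were grabbed.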

In a nutshell, the core of my solution is using Python multiprocessing to occupy GPU memory.

The source code is available for download here. I developed it using Python 3.11. You can run the script by executing the following command.

python grab_gpu.py --n 3 --otime 30 --spath ./train.sh
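The argument handling of grab_gpu.py is not shown in this post. A minimal sketch of how the three flags could be parsed with argparse follows. It assumes --n maps to the number of GPUs and --spath to desired_script, as the command suggests. This post does not explain --otime, so the sketch only parses it as an integer (the example passes 30) and does not act on it.

```python
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Grab free GPUs, then launch a script on them")
    parser.add_argument("--n", type=int, required=True, help="number of GPUs to grab")
    # --otime appears in the command above but is not explained here; parsed, not used
    parser.add_argument("--otime", type=int, default=None, help="option from the original command")
    parser.add_argument("--spath", required=True, help="script to run once n GPUs are held")
    return parser.parse_args(argv)

args = parse_args(["--n", "3", "--otime", "30", "--spath", "./train.sh"])
print(args.n, args.otime, args.spath)  # 3 30 ./train.sh
```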

I completed the whole project, from writing the code to polishing this blog post, with the help of ChatGPT. This tool has profoundly transformed my academic and personal life. The more I use it, the more I feel I can’t live without it.

This evokes mixed feelings. While I’m elated to witness the monumental strides AI is making to improve our lives, its sheer power leaves me with a lingering worry that one day AI might spiral out of our control.🤔

The emergence of tools like ChatGPT prompts reflection on topics such as the essence of human learning and the evolving role of the programmer.

Recently, I’ve been reading The Art of Unix Programming. Inspired by its title, I’ve named this blog post The Art of GPU Occupation.😁

I hope this post helps you. If you have any questions or insights, I welcome a hearty discussion! 😆