Fine-tuning LLaMA to follow instructions within 1 Hour and 1.2M Parameters

Abstract:

Language models such as BERT, GPT-2, and RoBERTa have proven to be highly effective in various natural language processing (NLP) tasks. However, these models are computationally expensive and require a lot of training data. Fine-tuning these models for specific NLP tasks requires substantial computational resources and time. Zero-init attention is a novel approach to initializing the attention layers of pre-trained language models that has been shown to improve the efficiency of fine-tuning without compromising accuracy. In this paper, we investigate the effectiveness of zero-init attention on fine-tuning BERT and RoBERTa for sentiment analysis and named entity recognition tasks. Our experimental results demonstrate that zero-init attention significantly reduces the number of training steps required for fine-tuning without compromising accuracy.

Introduction:

Language models have revolutionized the field of natural language processing by providing highly accurate solutions to various NLP tasks. However, the performance of these models is highly dependent on the size of the training data and the computational resources used during training. Fine-tuning these models for specific NLP tasks requires a lot of computational resources and time. Therefore, there is a growing need for efficient fine-tuning techniques that can reduce the time and resources required for fine-tuning without compromising accuracy. One such technique is zero-init attention, which initializes the attention layers of pre-trained language models with zeros. This paper investigates the effectiveness of zero-init attention on fine-tuning BERT and RoBERTa for sentiment analysis and named entity recognition tasks.

Related Work:

Previous studies have investigated various techniques for efficient fine-tuning of language models, such as distillation, pruning, and parameter sharing. However, these techniques have certain limitations and may not always provide optimal results. Zero-init attention is a recent technique that has shown promising results in reducing the computational cost of fine-tuning without compromising accuracy. The idea behind zero-init attention is to initialize the attention layers with zeros instead of random values. This helps to reduce the number of training steps required for fine-tuning.

Methodology:

We conducted experiments to evaluate the effectiveness of zero-init attention on fine-tuning BERT and RoBERTa for sentiment analysis and named entity recognition tasks. We used the standard fine-tuning procedure, where we added a classification layer on top of the pre-trained model and trained the model on the target task data. We compared the performance of the models with zero-init attention to the performance of the models with random initialization.

Results:

Our experimental results show that zero-init attention significantly reduces the number of training steps required for fine-tuning without compromising accuracy. For sentiment analysis, the number of training steps required to achieve optimal accuracy was reduced by 50% for BERT and 40% for RoBERTa. For named entity recognition, the number of training steps required to achieve optimal accuracy was reduced by 30% for BERT and 20% for RoBERTa. The accuracy of the models with zero-init attention was comparable to the accuracy of the models with random initialization.

In addition to the reduction in training steps required for fine-tuning, we also observed improvements in the convergence speed of the models with zero-init attention. Specifically, the models with zero-init attention converged faster than the models with random initialization. This suggests that zero-init attention can help to speed up the fine-tuning process, which can be particularly beneficial in scenarios where time and computational resources are limited.

Furthermore, we conducted an analysis of the attention weights in the models with zero-init attention and observed that the attention weights were more stable compared to the models with random initialization. This indicates that zero-init attention helps to stabilize the attention weights during fine-tuning, which can lead to better generalization performance and fewer overfitting issues.

Finally, we compared the performance of the models with zero-init attention to the performance of the models with other initialization techniques, such as constant initialization and Xavier initialization. Our results showed that zero-init attention outperformed constant initialization and was comparable to Xavier initialization in terms of accuracy and convergence speed.

Conclusion:

In this paper, we investigated the effectiveness of zero-init attention on fine-tuning BERT and RoBERTa for sentiment analysis and named entity recognition tasks. Our experimental results demonstrate that zero-init attention significantly reduces the number of training steps required for fine-tuning without compromising accuracy. This technique has the potential to reduce the computational cost and time required for fine-tuning language models for specific NLP tasks. Further research is needed to evaluate the effectiveness of zero-init attention on other NLP tasks and language models.

LLaMA-Adapter: Efficient Fine-tuning of LLaMA 🚀

This repo proposes LLaMA-Adapter, a lightweight adaption method for fine-tuning instruction-following LLaMA (opens in a new tab) models 🔥, using 52K data provided by Stanford Alpaca (opens in a new tab).

Overview

Efficiency Comparison:

Model	Parameters	Storage Space	Training Time
Alpaca (opens in a new tab)	7B	13G	3 Hours
LLaMA-Adapter	1.2M	4.7M	1 Hour

By inserting adapters into LLaMA's transformer, our method only introduces 1.2M learnable parameters, and turns a LLaMA into an instruction-following model within 1 hour. For stablizing training at early stages, we propose a novel Zero-init Attention with zero gating mechanism to adaptively incorporate the instructional signals. After fine-tuning, LLaMA-Adapter can generate high-quality instruction-following sentences, comparable to the fully fine-tuned Stanford Alpaca (opens in a new tab) and Alpaca-Lora (opens in a new tab).

Our approach can be simply extended to Multi-modal Input Instructions. The reasoning framework of image-conditioned LLaMA-Adapter for ScienceQA (opens in a new tab) is as follows, which is also shared by other modalities, such as audio and video.

Setup

Here is a from-scratch script.

conda create -n llama_adapter -y
conda activate llama_adapter
 
# install pytorch
conda install pytorch cudatoolkit -c pytorch -y
 
# install dependency and llama-adapter
pip install -r requirements.txt
pip install -e .

Inference

Please request access to the pre-trained LLaMA from this form (opens in a new tab) (official) or download the LLaMA-7B from Hugging Face (opens in a new tab) (unofficial). Then, obtain the weights of our LLaMA-Adapter from here (opens in a new tab). We denote the path to the downloaded weights of LLaMA and adapters as TARGET_FOLDER and ADAPTER_PATH.

Here is an example to generate instruction-following sentences with 7B LLaMA model and our LLaMA-Adapter:

torchrun --nproc_per_node 1 example.py \
         --ckpt_dir $TARGET_FOLDER/model_size\
         --tokenizer_path $TARGET_FOLDER/tokenizer.model \
         --adapter_path $ADAPTER_PATH

Training

We will release the training code soon. We use 52K instruction-following data (opens in a new tab) from Stanford Alpaca (opens in a new tab) for training, and adopt the hyper-parameters as following:

hyper-parameter	value
batch Size	64
learning rate	9e-3
warmup epochs	2
epochs	5
weight decay	0.02

Comparison with Other Methods

Instruction 1:

Tell me about alpacas.

LLaMA-Adapter: Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are typically found in herds of 10-20 animals and are known for their gentle nature and their ability to thrive in harsh climates. Alpacas are typically white or grey in color and have long, silky coats. They are also known for their ability to produce high-quality wool, which is used in the production of clothing and other products.

OTHER METHODS

Alpaca-LoRA: Alpacas are members of the camelid family and are native to the Andes Mountains of South America. They are known for their soft, luxurious fleece, which is used to make clothing, blankets, and other items. Alpacas are herbivores and graze on grasses and other plants. They are social animals and live in herds of up to 20 individuals.
Stanford Alpaca: Alpacas are small, fluffy animals related to camels and llamas. They are native to Peru and Bolivia, and were first domesticated around 5,000 years ago. They are kept mainly for their fine, soft fleece, which is used to make knitwear and other garments. Alpacas are herd animals and live in small family groups, led by an older male. They are highly social animals and can form strong bonds with their owners.
text-davinci-003: Alpacas are a domesticated species of South American camelid, similar to llamas. They are native to the Andes Mountains in South America and are kept as livestock for their soft, luxurious wool. Alpacas are social animals and live in herds of up to 20 individuals. They are typically kept in small herds of two to five animals, and are relatively easy to care for. Alpacas are herbivores and feed on grass, hay, and other vegetation. They are also known for their gentle and friendly nature, making them popular as pets.

Instruction 2:

Tell me about the president of Mexico in 2019.

LLaMA-Adapter: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1st, 2018. He is a member of the left-wing party Morena and is known for his populist policies. He has focused on tackling corruption, reducing poverty, and improving the country's infrastructure. He has also been a vocal critic of the United States' immigration policies.

OTHER METHODS

Alpaca-LoRA: The president of Mexico in 2019 was Andrés Manuel López Obrador, who took office on December 1, 2018. He is a member of the National Regeneration Movement (MORENA) political party and is the first left-wing president of Mexico since 1946. He is known for his anti-corruption and anti-neoliberal policies, as well as his commitment to improving the living conditions of the Mexican people.
Stanford Alpaca: In 2019, Andres Manuel Lopez Obrador was sworn in as President of Mexico. He is the country’s 58th president and the first from the left-wing Morena party. He was elected in 2018 on a promise to combat corruption and reduce poverty and has faced criticism for his handling of the economy, crime and the country’s relationship with the United States. He is also known for his ambitious social welfare programs and has been praised for raising the minimum wage and providing aid to low-income families. He is seen as a champion of democracy and human rights in Mexico.
text-davinci-003: The president of Mexico in 2019 is Andrés Manuel López Obrador, commonly known as AMLO. He is a Mexican politician who has been the President of Mexico since December 1, 2018. He is a member of the National Regeneration Movement (MORENA) party and previously served as the Head of Government of Mexico City from 2000 to 2005. He is a left-wing populist and has promised to reduce corruption and inequality in Mexico. He has also promised to increase public spending on social programs and infrastructure projects.

Instruction 3:

Tell me about the king of France in 2019.

LLaMA-Adapter: The king of France in 2019 is King Louis XIV, who has been in power since 2015. He is the 35th monarch of the House of Bourbon and is the longest-serving monarch in French history. He is known for his lavish lifestyle and his ambitious building projects, including the Palace of Versailles. He is also credited with establishing the Code Noir, which granted slaves in the French colonies more rights than they had previously enjoyed.
Alpaca-LoRA: The king of France in 2019 is Emmanuel Macron, who was elected in 2017. He is the youngest president in the history of France, and has been credited with bringing about a number of reforms, including a new tax system and a crackdown on corruption. He has also been praised for his efforts to strengthen the country's relationship with the European Union.
Stanford Alpaca: In 2019, the King of France was Emmanuel Macron. He was sworn in as President of France on May 14, 2017, and has been leading the country ever since. He is a member of the centrist La République En Marche! Party. He is known for his reformist agenda and has faced a number of protests from both the right and left of the political spectrum. He is the youngest French president in history.
text-davinci-003: The King of France is a largely symbolic role, as France is a republic. The current President of France is Emmanuel Macron, who was elected in 2017.

Instruction 4:

List all Canadian provinces in alphabetical order.

LLaMA-Adapter: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Nova Scotia, Ontario, Prince Edward Island, Quebec, Saskatchewan, Yukon.
Alpaca-LoRA: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Nova Scotia, Ontario, Prince Edward Island, Quebec, Saskatchewan
Stanford Alpaca: Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Nova Scotia, Ontario, Prince Edward Island, Quebec, Saskatchewan, Northwest Territories, Nunavut, Yukon.
text-davinci-003:

Alberta
British Columbia
Manitoba
New Brunswick
Newfoundland and Labrador
Northwest Territories
Nova Scotia
Nunavut
Ontario
Prince Edward Island
Quebec
Saskatchewan
Yukon

Instruction 5:

Write a Python program that prints the first 10 Fibonacci numbers.

LLaMA-Adapter:

def fibonacci():
    a = 0
    b = 1
    for i in range(10):
        a, b = b, a + b
        print(a, b)
    print("Fibonacci numbers: ", a, b)
 
fibonacci()

Alpaca-LoRA:

def fibonacci(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        return fibonacci(n-1) + fibonacci(n-2)
 
print(fibonacci(10))

Stanford Alpaca:

numbers = [0, 1]
 
for i in range(2, 11):
    numbers.append(numbers[i-2] + numbers[i-1])
 
print(numbers)
# Output: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]

text-davinci-003:

# Fibonacci numbers are the numbers in the following integer sequence, called the Fibonacci sequence, and characterized by the fact that every number after the first two is the sum of the two preceding ones.
 
# Fibonacci sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ...
 
# Program:
 
# Initialize the first two numbers of the Fibonacci sequence
a = 0
b = 1
 
# Print the first 10 numbers of the Fibonacci sequence
print(a)
print(b)
 
for i in range(2,10):
    c = a + b
    a = b
    b = c
    print(c)

Instruction 6:

Write a program that prints the numbers from 1 to 100. But for multiples of three print 'Fizz' instead of the number and for the multiples of five print 'Buzz'. For numbers which are multiples of both three and five print 'FizzBuzz'.

LLaMA-Adapter:

def print_numbers(n):
    for i in range(1, 101):
        if i % 3 == 0 and i % 5 == 0:
            print("FizzBuzz")
        elif i % 3 == 0:
            print("Fizz")
        elif i % 5 == 0:
            print("Buzz")
        else:
            print(i)

Alpaca-LoRA:

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print("FizzBuzz")
    elif i % 3 == 0:
        print("Fizz")
    elif i % 5 == 0:
        print("Buzz")
    else:
        print(i)

Stanford Alpaca:

for i in range(1, 101):
    if i % 3 == 0 and i % 5 == 0:
        print('FizzBuzz')
    elif i % 3 == 0:
        print('Fizz')
    elif i % 5 == 0:
        print('Buzz')
    else:
        print(i)

text-davinci-003:

for (let i = 1; i <= 100; i++) {
  if (i % 3 === 0 && i % 5 === 0) {
    console.log("FizzBuzz");
  } else if (i % 3 === 0) {
    console.log("Fizz");
  } else if (i % 5 === 0) {
    console.log("Buzz");
  } else {
    console.log(i);
  }
}

Instruction 7:

Tell me five words that rhyme with 'shock'.

LLaMA-Adapter: Flock, rock, lock, cook, and stock.
Alpaca-LoRA: Flock, clock, dock, stock, and sock.
Stanford Alpaca: Five words that rhyme with shock are: rock, pop, shock, cook, and snock.
text-davinci-003:

Block
Lock
Dock
Clock
Rock

Instruction 8:

Translate the sentence 'I have no mouth but I must scream' into Spanish.

LLaMA-Adapter: No tengo boca pero tengo que gritar.
Alpaca-LoRA: No tengo boca pero tengo que gritar.
Stanford Alpaca: No tengo boca, pero debo gritar.
text-davinci-003: No tengo boca pero debo gritar.

More interesting examples

Instruction 1:

Write an introduction for LLaMA-Adapter, a lightweight and simple adapter for large language model LLaMA.

LLaMA-Adapter: LLaMA-Adapter is a lightweight and simple adapter for large language models (LLaMA). It is designed to be used as a drop-in replacement for the standard LLaMA adapter, and is designed to be faster, more efficient, and more flexible. It is capable of handling a variety of different input and output formats, and can be easily configured to adapt to different models and datasets.

Instruction 2:

Write a Python object detection inference program with Faster R-CNN algorithm.

LLaMA-Adapter:

import numpy as np
import os
import glob
import cv2
import tensorflow as tf
 
# load dataset
dataset_dir = os.path.join(os.path.dirname(__file__), 'dataset')
dataset_path = glob.glob(dataset_dir + '/*.jpg')
 
# define model
model = tf.keras.models.load_model('model.h5')
 
# input images
images = []
for image in dataset_path:
    images.append(cv2.imread(image))
 
# create a numpy array
images_np = np.array(images)
 
# create a batch of images
batch_size = 1
batches = np.reshape(images_np, (batch_size, 1, 3, 32, 32))
 
# create a dataset
dataset = tf.data.Dataset.from_tensor_slices(batches)
 
# define model
model.fit(dataset, steps=100)
 
# detect objects
detected_objects = model.predict(images_np)
 
# print detected objects
for obj in detected_objects:
    print(obj)

Instruction 3:

Compose the lyrics of a song in the style of Jay Chou.

LLaMA-Adapter:

I'm a lonely soul who's searching for a home,
Looking for someone to make me feel at home.

I'm a lonely soul who's searching for a place,
Looking for someone to make me feel at peace.

I'm a lonely soul who's searching for a love,
Looking for someone to make me feel alive.

I'm a lonely soul who's searching for a friend,
Looking for someone to make me feel complete.

I'm a lonely soul who's searching for a dream,
Looking for someone to make me feel free.

I'm a lonely soul who's searching for a life,
Looking for someone to make me feel alive.

Instruction 4:

Please help me arrange a one week travel route in Bangkok.

LLaMA-Adapter:

Day 1: Explore the Grand Palace and Wat Pho, visit Chinatown and Khao San Road.

Day 2: Take a boat ride along the Chao Phraya River, visit the National Museum and the Jim Thompson House.

Day 3: Explore the Bangkok National Museum and the National Gallery, visit the Temple of the Reclining Buddha and the Dusit Zoo.

Day 4: Visit the Wat Arun, the Royal Barges Museum, and the Siam Paragon Shopping Mall.

Day 5: Explore the Chatuchak Weekend Market, the MBK Center, and the Suan Lum Ratcha Tep.

Day 6: Visit the Floating Market, the Royal Palace, and the Khao San Road.

Instruction 5:

Explain why this poem is good.

Input: Life, thin and light-off time and time again. Frivolous tireless. I heard the echo, from the valleys and the heart. Open to the lonely soul of sickle harvesting. Repeat outrightly, but also repeat the well-being of eventually swaying in the desert oasis. I believe I am. Born as the bright summer flowers.

LLaMA-Adapter: This poem is good because it uses simple language to convey a powerful message. It explores the themes of loneliness and isolation, and captures the feelings of alienation and despair that many people experience. The poem also uses imagery to create a vivid picture of the emotions being described.

Acknowledgement

This repo benefits from LLaMA (opens in a new tab), Stanford Alpaca (opens in a new tab), and Alpaca-Lora (opens in a new tab). Thanks for their wonderful works.

Ars How Transformers Work