Meta LLaMA 3: Use Cases, Benchmarks, and How to Get Started

Jun 6, 2024 by Acorn Labs

What Is Meta LLaMA 3?

LLaMA 3 is the latest series of open-source large language models (LLMs) from Meta. It includes models with 8B and 70B parameters, designed for a wide array of applications and featuring enhanced capabilities that rival state-of-the-art commercial LLMs. This is part of a series of articles about Meta LLaMA.

Meta presents LLaMA 3 as its most capable LLM, empowering developers to build more advanced generative AI applications while encouraging responsible AI usage and deployment. Meta plans to continue developing its series of open-source LLMs, with multilingual and multimodal capabilities, longer context windows, and additional performance improvements on the roadmap.

You can get Meta LLaMA 3 from the official GitHub repo or Hugging Face. Note that you must accept the license terms and wait for approval from Meta before downloading it.

Cool Things You Can Do with LLaMA 3

Meta LLaMA 3's capabilities open up many new applications. Here are some interesting use cases:

1. Content Creation

LLaMA 3 can assist in drafting articles, blogs, and reports by providing coherent and contextually relevant text. The model's improved understanding and vocabulary enable it to generate detailed and accurate content that closely mimics human writing.

2. Language Translation

The model's ability to handle multiple languages with precision makes it a powerful tool for language translation. LLaMA 3 can translate text between languages more accurately than its predecessors, ensuring that the meaning and nuances are preserved. This makes it useful for applications requiring reliable and efficient translation services.

3. Intelligent Virtual Assistants

LLaMA 3 can be used to develop sophisticated virtual assistants. These assistants can perform a wide range of tasks, from answering customer queries to scheduling appointments and providing personalized recommendations. The model's improved context understanding ensures that interactions are more natural and effective.

4. Advanced Data Analysis

LLaMA 3 can process and interpret large datasets, providing insights and summaries that help in decision-making. Its ability to comprehend complex information and generate accurate summaries makes it a valuable tool for analysts and researchers.

5. Creative Applications

Artists and creators can leverage LLaMA 3 for various creative endeavors. The model can generate ideas for stories, design concepts, and even compose music lyrics. Its ability to understand and build upon creative inputs allows users to explore new dimensions in their artistic projects.

6. Improved Educational Tools

LLaMA 3 can enhance educational tools by providing personalized tutoring and generating educational content tailored to individual learning needs. It can explain complex concepts in simpler terms, making it easier for students to understand and retain information. This makes it a useful resource in both formal and informal education settings.

Meta LLaMA 3 Improvements Over LLaMA 2

Meta LLaMA 3 introduces several enhancements over its predecessor, focusing on scale and data quality. Pretraining expanded from 2 trillion tokens to 15 trillion, on sequences up to 8,192 tokens long. This increase in training material allows for deeper understanding and more nuanced responses.

Additionally, Meta has refined data quality through improved filtering techniques such as heuristic filters, NSFW filters, semantic deduplication, and text classifiers designed to predict data quality. This means that LLaMA 3 is trained on cleaner, more relevant datasets.

Unlike LLaMA 2, LLaMA 3 employs an attention-mask mechanism that prevents self-attention from crossing document boundaries, enhancing context understanding and coherence in generated text. Its tokenizer has a larger vocabulary of 128K tokens, reducing the number of tokens needed by up to 15% compared to LLaMA 2 for generating equivalent text. Grouped query attention (GQA), adopted for both the 8B and 70B models, improves inference efficiency.
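To get an intuition for the tokenizer change, here is a minimal sketch comparing token counts with the Hugging Face transformers library. It assumes you have been granted access to both gated model repos, and exact counts vary by input text:

from transformers import AutoTokenizer

# Both repos are gated on Hugging Face; accept the licenses and log in first.
tok_llama2 = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
tok_llama3 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")

text = "Grouped query attention helps optimize inference performance and efficiency."
print("LLaMA 2 tokens:", len(tok_llama2(text)["input_ids"]))
print("LLaMA 3 tokens:", len(tok_llama3(text)["input_ids"]))  # typically fewer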

Meta LLaMA 3 Benchmarks: How It Compares to Google Gemma and Mistral

LLaMA 3 models have been evaluated on several standard benchmarks to compare their performance against other leading models and previous LLaMA versions.

Base Pretrained Models

The following table showcases the performance of Meta LLaMA 3 compared to other leading open and commercial models: Mistral 7B, Google Gemma 7B, Gemini Pro 1.0, and Mixtral 8x22B. The benchmarks include MMLU, AGIEval English, BIG-Bench Hard, ARC-Challenge, and DROP, using various shot settings to evaluate the models' capabilities across different tasks.

The table illustrates that Meta LLaMA 3, particularly the 70B model, outperforms its counterparts in most benchmarks, demonstrating superior general knowledge, reasoning, and language understanding.

| Benchmark | LLaMA 3 8B | Mistral 7B | Gemma 7B | LLaMA 3 70B | Gemini Pro 1.0 | Mixtral 8x22B |
|---|---|---|---|---|---|---|
| MMLU (5-shot) | 66.6 | 62.5 | 64.3 | 79.5 | 71.8 | 77.7 |
| AGIEval English (3-5 shot) | 45.9 | 44.0 | 41.7 | 63.0 | — | 61.2 |
| BIG-Bench Hard (3-shot, CoT) | 61.1 | 56.0 | 55.1 | 81.3 | 75.0 | 79.2 |
| ARC-Challenge (25-shot) | 78.6 | 78.1 | 53.2 | 93.0 | — | 90.7 |
| DROP (3-shot, F1) | 58.4 | 54.4 | 56.3 | 79.7 | 74.1 | 77.6 |

(— indicates no published result for that benchmark.)

Instruction-Tuned Models

The table below compares instruction-tuned models, evaluating performance in MMLU, GPQA, HumanEval, GSM-8K, and MATH benchmarks. Instruction tuning helps models follow user instructions more effectively, enhancing their practical application in real-world scenarios. Meta LLaMA 3 models, especially the 70B variant, show robust performance in instruction-tuned tasks, indicating their enhanced capability to follow complex instructions and deliver accurate outputs.

| Benchmark | LLaMA 3 8B | Gemma 7B - It | Mistral 7B Instruct | LLaMA 3 70B | Gemini Pro 1.5 | Claude 3 Sonnet |
|---|---|---|---|---|---|---|
| MMLU (5-shot) | 68.4 | 53.3 | 58.4 | 82.0 | 81.9 | 79.0 |
| GPQA (0-shot) | 34.2 | 21.4 | 26.3 | 39.5 | 41.5 | 38.5 |
| HumanEval (0-shot) | 62.2 | 30.5 | 36.6 | 81.7 | 71.9 | 73.0 |
| GSM-8K (8-shot, CoT) | 79.6 | 30.6 | 39.9 | 93.0 | 91.7 | 92.3 |
| MATH (4-shot, CoT) | 30.0 | 12.2 | 11.0 | 50.4 | 58.5 | 40.5 |

LLaMA 3 Prompt Format and Examples

Meta LLaMA 3 utilizes a specific prompt format to generate responses accurately. The model uses special tokens to delineate the start and end of messages, and to specify roles within a conversation. Special tokens supported by LLaMA 3 include:

  • <|begin_of_text|>: Marks the beginning of the prompt (sequence)
  • <|end_of_text|>: Signals the model to stop generating further tokens
  • <|start_header_id|> and <|end_header_id|>: Enclose the role for a particular message, with possible roles being system, user, and assistant
  • <|eot_id|>: Marks the end of a message in a turn

Let’s see an example of the use of these tokens in a system prompt with multiple turns:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a knowledgeable AI assistant for cooking recipes and tips<|eot_id|><|start_header_id|>user<|end_header_id|>

What are some easy pasta recipes?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Happy to help! Here are a few easy pasta recipes: Spaghetti Aglio e Olio, Penne Arrabbiata, and Fettuccine Alfredo.<|eot_id|><|start_header_id|>user<|end_header_id|>

Can you give me the recipe for Spaghetti Aglio e Olio?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Of course! For Spaghetti Aglio e Olio, you need spaghetti, garlic, olive oil, red pepper flakes, parsley, and Parmesan cheese. Cook the spaghetti, sauté the garlic in olive oil, add red pepper flakes, toss the spaghetti in the sauce, and finish with parsley and Parmesan.<|eot_id|><|start_header_id|>user<|end_header_id|>

What are some tips to give it more flavor?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

...

The system message sets the context for the assistant, while the alternating user and assistant messages continue the conversation. Each turn ends with the <|eot_id|> token, and the prompt finishes with a <|start_header_id|>assistant<|end_header_id|> header, which cues the model to generate the next assistant response.
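For illustration, here is a minimal Python sketch of how such a prompt can be assembled. The helper function below is ours, not part of any official API; in practice, the apply_chat_template method of the Hugging Face tokenizer builds this format for you:

def build_llama3_prompt(messages):
    # messages: list of {"role": ..., "content": ...} dicts, where role is
    # "system", "user", or "assistant".
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
    # End with an assistant header to cue the model's next response.
    return prompt + "<|start_header_id|>assistant<|end_header_id|>\n\n"

messages = [
    {"role": "system", "content": "You are a knowledgeable AI assistant for cooking recipes and tips"},
    {"role": "user", "content": "What are some easy pasta recipes?"},
]
print(build_llama3_prompt(messages))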

Getting Started with the Meta LLaMA 3 Model

Here’s an overview of how to download and start using the latest LLaMA models. The code snippets were shared in the LLaMA 3 GitHub repo.

Download the Model

To access Meta LLaMA 3 model weights and tokenizer, visit the Meta LLaMA website to accept the licensing agreement. Following approval, a signed URL will be sent via email to download the necessary files. The download uses a download.sh script, which requires the wget and md5sum utilities to be pre-installed on your system. Initiate the script by running ./download.sh in the terminal and follow the prompts to input the provided URL. Download links are valid for only 24 hours or a limited number of downloads, after which they expire. A “403: Forbidden” error message indicates that the link has expired or reached its download limit; in that case, request a new download link through the Meta LLaMA website.

Access the Hugging Face Repository

To access Meta LLaMA 3 models on Hugging Face, navigate to the desired model repository, such as meta-llama/Meta-Llama-3-8B-Instruct, and agree to the license terms. Approval grants access to all LLaMA 3 models and is typically processed within an hour, although it can take up to several days. To get the weights in their native format (for use with the original LLaMA 3 codebase), go to Files and versions and download from the original folder, or use the command line after installing huggingface-hub:

huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir meta-llama/Meta-Llama-3-8B-Instruct
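Alternatively, if you prefer scripting the download in Python, the huggingface_hub library provides an equivalent. A minimal sketch, assuming you have already authenticated and been granted access to the gated repo:

from huggingface_hub import snapshot_download

# Requires prior license approval and authentication (huggingface-cli login).
# Downloads only the native-format weights under original/.
snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    allow_patterns="original/*",
    local_dir="meta-llama/Meta-Llama-3-8B-Instruct",
)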

For integration with the Hugging Face transformers library, the following pipeline snippet automatically downloads and caches the model weights for immediate use:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)
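From there, you can generate a chat completion by rendering messages through the tokenizer's chat template and stopping on the end-of-turn token. A short sketch (the example messages are illustrative):

messages = [
    {"role": "system", "content": "You are a knowledgeable AI assistant for cooking recipes and tips"},
    {"role": "user", "content": "What are some easy pasta recipes?"},
]

# Render the messages into the LLaMA 3 prompt format.
prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Stop on either the end-of-sequence or end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(prompt, max_new_tokens=256, eos_token_id=terminators)
print(outputs[0]["generated_text"][len(prompt):])  # strip the echoed prompt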

Integrate the Model

To quickly integrate Meta LLaMA 3 into your projects, begin by cloning the repository and setting up a compatible environment. This involves creating a Conda environment with PyTorch and CUDA installed. After setting up the environment, navigate to the repository’s top-level directory and install the necessary dependencies with pip install -e . (note the trailing dot). Next, visit the Meta LLaMA website to register for downloading the model weights. Once you receive an email with a download URL, run the provided download.sh script in your repository directory. Ensure the script has execution permissions, and input the URL manually when prompted. Finally, run a command like the following to perform inference using one of the downloaded models:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

This command runs a script to perform chat completion using the 8B-parameter Meta LLaMA 3 model. It points the script at specific model checkpoint and tokenizer files, sets a maximum input sequence length of 512 tokens, and processes up to 6 sequences per batch. Make sure to adjust the paths to your downloaded model’s location, and modify --nproc_per_node, --max_seq_len, and --max_batch_size based on the model requirements and your hardware capabilities.

Building LLM Applications with LLaMA 3 and Acorn

Visit https://gptscript.ai to download GPTScript and start building today. As we expand GPTScript’s capabilities, we are also expanding our list of tools. With these tools, you can create any application imaginable: check out tools.gptscript.ai to get started.