LLaMA 3 is the latest series of open-source large language models (LLMs) from Meta. It includes models with 8B and 70B parameters, designed for a wide array of applications and featuring enhanced capabilities that rival state-of-the-art commercial LLMs. This is part of a series of articles about Meta LLaMA.
Meta presents LLaMA 3 as its most capable LLM yet, empowering developers to build more advanced generative AI applications while encouraging responsible AI usage and deployment. Meta intends to continue developing its series of open-source LLMs, with multilingual and multimodal capabilities, longer context windows, and further performance improvements on the roadmap.
You can get Meta LLaMA 3 from the official GitHub repo or Hugging Face. Note that you must accept the license terms and wait for approval from Meta before downloading it.
Meta LLaMA 3's capabilities open up many new applications. Here are some interesting use cases:
LLaMA 3 can assist in drafting articles, blogs, and reports by providing coherent and contextually relevant text. The model's improved understanding and vocabulary enable it to generate detailed and accurate content that closely mimics human writing.
The model's ability to handle multiple languages with precision makes it a powerful tool for language translation. LLaMA 3 can translate text between languages more accurately than its predecessors, ensuring that the meaning and nuances are preserved. This makes it useful for applications requiring reliable and efficient translation services.
LLaMA 3 can be used to develop sophisticated virtual assistants. These assistants can perform a wide range of tasks, from answering customer queries to scheduling appointments and providing personalized recommendations. The model's improved context understanding ensures that interactions are more natural and effective.
LLaMA 3 can process and interpret large datasets, providing insights and summaries that help in decision-making. Its ability to comprehend complex information and generate accurate summaries makes it a valuable tool for analysts and researchers.
Artists and creators can leverage LLaMA 3 for various creative endeavors. The model can generate ideas for stories, design concepts, and even compose music lyrics. Its ability to understand and build upon creative inputs allows users to explore new dimensions in their artistic projects.
LLaMA 3 can enhance educational tools by providing personalized tutoring and generating educational content tailored to individual learning needs. It can explain complex concepts in simpler terms, making it easier for students to understand and retain information. This makes it a useful resource in both formal and informal education settings.
Meta LLaMA 3 introduces several enhancements over its predecessor, focusing on scale and data quality. The model has expanded its pretraining from 2 trillion tokens to 15 trillion, working with sequences up to 8,192 tokens long. This increase in training material allows for deeper understanding and more nuanced responses.
Additionally, Meta has refined data quality through improved filtering techniques such as heuristic filters, NSFW filters, semantic deduplication, and text classifiers designed to predict data quality. This means that LLaMA 3 is trained on cleaner, more relevant datasets.
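To give a flavor of what this kind of filtering looks like in practice, here is a toy Python sketch, not Meta's actual pipeline, that applies a simple length heuristic and exact deduplication. Meta's real system also includes NSFW filters, semantic deduplication, and learned quality classifiers, which are not shown here.

```python
# Toy sketch of heuristic filtering and exact deduplication.
# Not Meta's pipeline: their system adds NSFW filters, semantic
# deduplication, and text classifiers that predict data quality.
def filter_corpus(docs, min_words=50):
    seen = set()
    kept = []
    for doc in docs:
        if len(doc.split()) < min_words:  # heuristic: drop very short docs
            continue
        key = hash(doc)
        if key in seen:                   # exact-duplicate removal
            continue
        seen.add(key)
        kept.append(doc)
    return kept
```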
Unlike LLaMA 2, LLaMA 3 employs an attention mask that prevents self-attention from crossing document boundaries, enhancing context understanding and coherence in generated text. Its tokenizer has a larger vocabulary of 128K tokens, reducing the number of tokens needed by up to 15% compared to LLaMA 2 for equivalent text. Grouped query attention (GQA) helps optimize inference performance and efficiency.
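As a quick illustration of the vocabulary change (not from the original article), you could compare how many tokens each generation's tokenizer needs for the same text. Both repositories are gated, so this sketch assumes you have been granted access and are logged in to Hugging Face:

```python
# Sketch: compare token counts between LLaMA 2 and LLaMA 3 tokenizers.
# Assumes access to both gated repos and a prior `huggingface-cli login`.
from transformers import AutoTokenizer

text = "Grouped query attention helps optimize inference efficiency."

tok3 = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
tok2 = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

print("LLaMA 3 tokens:", len(tok3(text)["input_ids"]))
print("LLaMA 2 tokens:", len(tok2(text)["input_ids"]))
```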
LLaMA 3 models have been evaluated against several benchmarks to demonstrate their superior performance compared to previous versions.
The following table showcases the performance of Meta LLaMA 3 compared to other leading open and commercial models, including Mistral 7B, Google Gemma 7B, Gemini Pro 1.0, and Mixtral 8x22B. The benchmarks include MMLU, AGIEval English, BIG-Bench Hard, ARC-Challenge, and DROP, using various shot settings to evaluate the models' capabilities across different tasks.
The table illustrates that Meta LLaMA 3, particularly the 70B model, outperforms its counterparts in most benchmarks, demonstrating superior general knowledge, reasoning, and language understanding.
Benchmark | LLaMA 3 8B | Mistral 7B | Gemma 7B | LLaMA 3 70B | Gemini Pro 1.0 | Mixtral 8x22B |
---|---|---|---|---|---|---|
MMLU (5-shot) | 66.6 | 62.5 | 64.3 | 79.5 | 71.8 | 77.7 |
AGIEval English (3-5 shot) | 45.9 | 44.0 | 41.7 | 63.0 | – | 61.2 |
BIG-Bench Hard (3-shot, CoT) | 61.1 | 56.0 | 55.1 | 81.3 | 75.0 | 79.2 |
ARC-Challenge (25-shot) | 78.6 | 78.1 | 53.2 | 93.0 | – | 90.7 |
DROP (3-shot, F1) | 58.4 | 54.4 | 56.3 | 79.7 | 74.1 | 77.6 |
The table below compares instruction-tuned models, evaluating performance in MMLU, GPQA, HumanEval, GSM-8K, and MATH benchmarks. Instruction tuning helps models follow user instructions more effectively, enhancing their practical application in real-world scenarios. Meta LLaMA 3 models, especially the 70B variant, show robust performance in instruction-tuned tasks, indicating their enhanced capability to follow complex instructions and deliver accurate outputs.
Benchmark | LLaMA 3 8B | Gemma 7B-It | Mistral 7B Instruct | LLaMA 3 70B | Gemini Pro 1.5 | Claude 3 Sonnet |
---|---|---|---|---|---|---|
MMLU (5-shot) | 68.4 | 53.3 | 58.4 | 82.0 | 81.9 | 79.0 |
GPQA (0-shot) | 34.2 | 21.4 | 26.3 | 39.5 | 41.5 | 38.5 |
HumanEval (0-shot) | 62.2 | 30.5 | 36.6 | 81.7 | 71.9 | 73.0 |
GSM-8K (8-shot, CoT) | 79.6 | 30.6 | 39.9 | 93.0 | 91.7 | 92.3 |
MATH (4-shot, CoT) | 30.0 | 12.2 | 11.0 | 50.4 | 58.5 | 40.5 |
Meta LLaMA 3 utilizes a specific prompt format to generate responses accurately. The model uses special tokens to delineate the start and end of messages and to specify roles within a conversation. Special tokens supported by LLaMA 3 include:

- `<|begin_of_text|>` – marks the start of a prompt
- `<|start_header_id|>` and `<|end_header_id|>` – enclose the role of a message (system, user, or assistant)
- `<|eot_id|>` – marks the end of a turn
- `<|end_of_text|>` – signals the model to stop generating

The following example shows a multi-turn conversation in this format:
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a knowledgeable AI assistant for cooking recipes and tips<|eot_id|><|start_header_id|>user<|end_header_id|>

What are some easy pasta recipes?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Happy to help! Here are a few easy pasta recipes: Spaghetti Aglio e Olio, Penne Arrabbiata, and Fettuccine Alfredo.<|eot_id|><|start_header_id|>user<|end_header_id|>

Can you give me the recipe for Spaghetti Aglio e Olio?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Of course! For Spaghetti Aglio e Olio, you need spaghetti, garlic, olive oil, red pepper flakes, parsley, and Parmesan cheese. Cook the spaghetti, sauté the garlic in olive oil, add red pepper flakes, toss the spaghetti in the sauce, and finish with parsley and Parmesan.<|eot_id|><|start_header_id|>user<|end_header_id|>

What are some tips to give it more flavor?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
The system message sets the context for the assistant, while the alternating user and assistant messages continue the conversation. Each turn ends with the `<|eot_id|>` token, and each new turn begins with a `<|start_header_id|>role<|end_header_id|>` header. Ending the prompt with an assistant header cues the model to generate the next reply.
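To make the format concrete, here is a minimal Python sketch (not from the LLaMA 3 repo) that assembles a prompt string in this layout; the `format_llama3_prompt` helper is hypothetical:

```python
# Hypothetical helper: builds a LLaMA 3 prompt string from chat messages
# following the special-token format described above.
def format_llama3_prompt(messages):
    prompt = "<|begin_of_text|>"
    for msg in messages:
        prompt += (
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content']}<|eot_id|>"
        )
    # End with an assistant header so the model generates the next reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_llama3_prompt([
    {"role": "system", "content": "You are a knowledgeable AI assistant for cooking recipes and tips"},
    {"role": "user", "content": "What are some easy pasta recipes?"},
]))
```

In practice, the tokenizer's built-in chat template (shown later) produces the same layout without hand-assembling tokens.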
Here’s an overview of how to download and start using the latest LLaMA models. The code snippets were shared in the LLaMA 3 GitHub repo.
To access Meta LLaMA 3 model weights and tokenizer, visit the Meta LLaMA website to accept the licensing agreement. Following approval, a signed URL will be sent via email to download the necessary files. This involves running the `download.sh` script from the repository: make sure `wget` and `md5sum` are installed, run `./download.sh`, and paste the signed URL when prompted. Keep in mind that the links expire after 24 hours and a limited number of downloads.
To access Meta LLaMA 3 models on Hugging Face, navigate to the desired model repository, such as meta-llama/Meta-Llama-3-8B-Instruct, and agree to the license terms. Approval grants access to all LLaMA 3 models and is typically processed within an hour, although it can take up to several days. For native-format weights compatible with the GitHub repository, go to Files and versions and download from the `original` folder. Alternatively, install the `huggingface-hub` package and download from the command line:

```bash
huggingface-cli download meta-llama/Meta-Llama-3-8B-Instruct --include "original/*" --local-dir meta-llama/Meta-Llama-3-8B-Instruct
```
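If you prefer to script the download, the same filter can be expressed in Python with the `huggingface_hub` library. This is a sketch of the equivalent call, assuming you are already authenticated:

```python
# Sketch: Python equivalent of the huggingface-cli command above.
# Assumes `pip install huggingface-hub` and a prior `huggingface-cli login`.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    allow_patterns="original/*",  # mirrors --include "original/*"
    local_dir="meta-llama/Meta-Llama-3-8B-Instruct",
)
```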
For integration with the Hugging Face `transformers` library, you can load the instruction-tuned model through a text-generation pipeline:

```python
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)
```
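The pipeline can then be prompted with the chat format described earlier. The following usage sketch mirrors the pattern from Meta's Hugging Face model card; the sampling values are illustrative, not prescribed:

```python
# Build a prompt with the tokenizer's chat template, then generate.
messages = [
    {"role": "system", "content": "You are a knowledgeable AI assistant for cooking recipes and tips"},
    {"role": "user", "content": "What are some easy pasta recipes?"},
]

prompt = pipeline.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

# Stop on either the default EOS token or LLaMA 3's end-of-turn token.
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]

outputs = pipeline(
    prompt,
    max_new_tokens=256,      # illustrative generation budget
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```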
To quickly integrate Meta LLaMA 3 into your projects, begin by cloning the repository and setting up a compatible environment. This involves creating a Conda environment with PyTorch and CUDA installed. After setting up the environment, navigate to the repository’s top-level directory and install the necessary dependencies with `pip install -e .`. Once the model weights have been retrieved with the `download.sh` script, you can run an example chat completion:

```bash
torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6
```
This command runs a script to perform chat completion using the 8B-parameter Meta LLaMA 3 model. It points the script at specific model checkpoint and tokenizer files, sets a maximum input sequence length of 512 tokens, and processes up to 6 sequences per batch. Make sure to adjust the paths according to your downloaded model’s location, set `--nproc_per_node` to the model-parallel value of the model you are using (1 for the 8B models, 8 for the 70B models), and tune `--max_seq_len` and `--max_batch_size` for your hardware.