Mistral AI Solution Overview: Models, Pricing, and API

What Is Mistral AI?

Mistral AI is a company focused on developing advanced large language models (LLMs) and specialized AI solutions. Founded by a team of experts in artificial intelligence and machine learning, Mistral AI aims to push the boundaries of what is possible with LLMs, offering state-of-the-art models designed to handle a wide range of tasks, from natural language processing to code generation and complex mathematical reasoning.

Mistral AI differentiates itself through its commitment to open-source principles and a focus on creating models that are not only powerful but also resource-efficient. This includes support for fine-tuning and for deploying models in various environments, from cloud-based systems to on-premise servers. By providing both general-purpose and specialized models, Mistral AI caters to a broad audience, including developers, data scientists, and AI researchers.

In this article:

  • Mistral AI Models
  • Mistral AI Pricing
  • Getting Started with Mistral AI API
  • Build LLM Applications with Mistral and Acorn

Mistral AI Models {#mistral-ai-models}

General Purpose Models

These models handle a broad spectrum of tasks, offering a balance between performance and versatility.

Mistral NeMo

Mistral NeMo is a 12-billion-parameter model developed by Mistral AI in collaboration with NVIDIA. The model features a 128k token context length, making it well suited to tasks requiring extensive context, such as complex reasoning, coding, and world knowledge. Mistral NeMo can serve as a drop-in replacement for the company's older Mistral 7B model.

The model is released under the Apache 2.0 license, with both pre-trained base and instruction-tuned checkpoints available, facilitating adoption across research and enterprise settings. Mistral NeMo is also designed with quantization awareness, supporting FP8 inference without any performance degradation.

A significant advancement in Mistral NeMo is its Tekken tokenizer, which is based on Tiktoken and trained on over 100 languages. Tekken compresses natural language text and source code roughly 30% more efficiently than previous Mistral tokenizers in many languages, and is 2x to 3x more efficient for Korean and Arabic.
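
To make tokenizer compression concrete, here is a minimal, self-contained sketch of measuring compression efficiency as bytes per token. The two toy tokenizers are illustrative stand-ins only, not the actual Tekken or SentencePiece implementations:

```python
# Minimal sketch: comparing tokenizer compression on the same text.
# The `encode` callables below are illustrative stand-ins, not the
# actual Tekken or SentencePiece tokenizers.

def compression_ratio(text: str, encode) -> float:
    """UTF-8 bytes per token: a higher ratio means better compression."""
    tokens = encode(text)
    return len(text.encode("utf-8")) / max(len(tokens), 1)

# Toy tokenizers for illustration only.
whitespace_encode = lambda s: s.split()  # coarse word-level tokens
char_encode = lambda s: list(s)          # one token per character

sample = "Mistral NeMo uses the Tekken tokenizer."
print(f"word-level: {compression_ratio(sample, whitespace_encode):.2f} bytes/token")
print(f"char-level: {compression_ratio(sample, char_encode):.2f} bytes/token")
```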

The model's fine-tuning and alignment phase has improved its ability to follow instructions, reason effectively, engage in multi-turn conversations, and generate accurate code, marking a clear improvement over its predecessor, Mistral 7B.

Mistral Large 2

Mistral Large 2 is the latest iteration of Mistral AI's flagship model, delivering strong performance in code generation, mathematics, and reasoning. The model is notable for its multilingual support and function calling capabilities. With a 128k token context window and 123 billion parameters, Mistral Large 2 is optimized for long-context applications and can run efficiently on a single node.

A key highlight of Mistral Large 2 is its performance across multiple benchmarks. It achieves an accuracy of 84.0% on the MMLU benchmark, setting a new standard for open models in terms of performance and cost efficiency. The model also excels in code generation, performing on par with leading models like GPT-4o and Llama 3 405B, thanks to extensive training on a large volume of code.

Mistral Large 2 also shows advancements in reasoning capabilities. It has been fine-tuned to reduce the likelihood of generating incorrect or irrelevant information, and it can now better recognize when it lacks sufficient information to provide a confident answer. This enhancement is evident in its improved performance on mathematical benchmarks.
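
The function calling mentioned above can be exercised through the chat API by passing tool definitions. Below is a minimal sketch using the mistralai Python SDK; the get_weather tool and its schema are hypothetical examples used for illustration:

```python
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Hypothetical tool definition: the model can ask us to call it.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.complete(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

# If the model chose to call the tool, inspect the requested call.
tool_calls = response.choices[0].message.tool_calls
if tool_calls:
    print(tool_calls[0].function.name, tool_calls[0].function.arguments)
```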

Specialized Models

Specialized models in Mistral AI are tailored for particular tasks, ensuring efficiency and performance in their respective domains.

Codestral

Codestral is Mistral AI's first model specialized for code generation, designed to help developers write and interact with code efficiently. This 22-billion-parameter model excels at generating, completing, and refining code, providing a practical tool for developers.

Codestral is trained on a dataset that spans over 80 programming languages, including popular ones like Python, Java, C++, and JavaScript, as well as more specialized languages such as Swift and Fortran. This broad linguistic range allows it to support diverse coding environments.

Codestral sets new standards in code generation, particularly in terms of performance and latency. It features a 32k token context window, larger than many competitors, which enhances its capability in long-range code completion tasks. The model demonstrates strong performance on several benchmarks, including HumanEval, MBPP, CruxEval, and RepoBench.

Codestral includes a fill-in-the-middle (FIM) mechanism, allowing it to complete partial code snippets. This feature speeds up the coding process and reduces the likelihood of errors. Developers can leverage Codestral to write tests, fill in missing code, and improve existing codebases.
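
Mistral exposes FIM through a dedicated completion method in its Python SDK. The following is a minimal sketch assuming the fim.complete method and the codestral-latest model ID; the prompt and suffix are illustrative, and depending on your account Codestral may be served from its own endpoint with a separate API key:

```python
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Fill-in-the-middle: the model generates the code that belongs
# between the prompt (prefix) and the suffix.
response = client.fim.complete(
    model="codestral-latest",
    prompt="def fibonacci(n: int):\n",
    suffix="\nprint(fibonacci(10))",
)

print(response.choices[0].message.content)
```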

Mistral Embed

Mistral Embed is a specialized model developed by Mistral AI for generating high-quality text embeddings: vector representations that capture the semantic meaning of text. These embeddings are crucial for a variety of natural language processing (NLP) tasks, such as clustering, classification, and retrieval.

Currently optimized for English text, Mistral Embed achieves a retrieval score of 55.26 on the Massive Text Embedding Benchmark (MTEB), highlighting its effectiveness in understanding and representing text semantics in a high-dimensional vector space. This makes it particularly useful for tasks requiring semantic similarity, such as information retrieval and question-answering systems.

The Mistral Embed API allows users to generate embeddings by sending input text to the API endpoint, which returns the corresponding numerical vectors. These vectors can then be used in various downstream applications, including retrieval-augmented generation (RAG) systems. In such systems, a knowledge base is embedded into vectors, stored in a vector database, and queried to find the most relevant information based on semantic similarity.
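
To illustrate the retrieval step described above, here is a minimal sketch that ranks knowledge-base entries by cosine similarity against a query vector. The four-dimensional vectors are toy placeholders standing in for real API embeddings, and a production system would use a vector database rather than a Python dictionary:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 4-d vectors standing in for real embeddings, which are
# much higher dimensional and come back from the embeddings API.
knowledge_base = {
    "Paris is the capital of France.":   np.array([0.9, 0.1, 0.0, 0.2]),
    "Python is a programming language.": np.array([0.1, 0.8, 0.3, 0.0]),
}
query_vector = np.array([0.85, 0.15, 0.05, 0.1])  # embedding of the user query

# Retrieve the passage most semantically similar to the query.
best = max(knowledge_base,
           key=lambda t: cosine_similarity(knowledge_base[t], query_vector))
print("Most relevant passage:", best)
```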

Research Models

Research models are developed with the primary goal of advancing the field of AI through experimental and cutting-edge algorithms. These models are often used in academic and corporate research to push the boundaries of what's possible with AI technology.

Codestral Mamba

Codestral Mamba is a language model specifically designed for code generation. Unlike traditional transformer-based models, Codestral Mamba is built on the Mamba2 architecture, which allows for linear time inference. This feature makes it effective for handling sequences of potentially infinite length, ensuring quick responses regardless of input size. This efficiency is valuable in code productivity scenarios.
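
The linear-time property can be seen in a toy state-space recurrence: the model carries a fixed-size state forward, so each new token costs a constant amount of work. The sketch below is a conceptual analogue only, not the Mamba2 architecture's actual selective state-space machinery:

```python
import numpy as np

rng = np.random.default_rng(0)
d_state, d_in, seq_len = 8, 4, 100

# Toy linear state-space recurrence: per-token cost is constant, so total
# inference time grows linearly with sequence length (unlike full attention,
# which is quadratic). Conceptual analogue only, not Mamba2 itself.
A = rng.normal(scale=0.1, size=(d_state, d_state))
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_in, d_state))

h = np.zeros(d_state)              # recurrent state: fixed size
for t in range(seq_len):
    x_t = rng.normal(size=d_in)    # current token's features
    h = A @ h + B @ x_t            # state update: O(1) work per token
    y_t = C @ h                    # output for this token

print("state size stays", h.shape, "regardless of sequence length")
```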

Codestral Mamba, with 7.3 billion parameters, offers strong performance in code generation and reasoning tasks, rivaling leading transformer-based models. It has been tested on in-context retrieval tasks with token lengths up to 256k, showcasing its capability to serve as a local code assistant. The model is available for deployment through various platforms, including the mistral-inference SDK and TensorRT-LLM, with support for local inference expected soon.

Mathstral

Mathstral is Mistral AI's specialized model for mathematical reasoning and scientific discovery. This 7-billion-parameter model is designed to tackle complex, multi-step logical reasoning tasks, making it useful for advanced mathematical problems and STEM-related applications. Mathstral is built on the foundation of Mistral 7B and aims for state-of-the-art mathematical reasoning within its size category.

The model features a 32k token context window, allowing it to handle extensive and intricate mathematical expressions and problems effectively. Released under the Apache 2.0 license, Mathstral is intended to support the scientific community, especially in academic research and projects requiring deep mathematical insight.

Mathstral performs well on industry-standard benchmarks, achieving 56.6% on the MATH benchmark and 63.47% on the MMLU benchmark, improving on its predecessor in these domains. Notably, when additional computation is allocated at inference time, Mathstral's MATH score increases to 68.37% with majority voting, and to 74.59% when a reward model selects among multiple candidates.

Mixtral

Mixtral is a sparse mixture-of-experts (SMoE) model developed by Mistral AI, designed to deliver strong performance while optimizing computational efficiency. Mixtral is notable for its ability to outperform much larger models, such as Llama 2 70B, on various benchmarks while providing inference up to six times faster. This makes it one of the most powerful open-weight models available.

The model is built on a sparse architecture, meaning it selectively uses different groups of parameters (experts) during processing. Specifically, Mixtral has 46.7 billion parameters in total, but only utilizes 12.9 billion of them per token. This selective usage allows the model to achieve the output quality of a much larger model while maintaining the speed and cost efficiency of a smaller one.
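
To illustrate how sparse routing works, the toy sketch below routes a single token through the top 2 of 8 experts, mirroring Mixtral's routing pattern at a conceptual level only; the shapes, weights, and gating here are illustrative, not Mixtral's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
num_experts, top_k, d_model = 8, 2, 16

x = rng.normal(size=d_model)                      # one token's hidden state
router = rng.normal(size=(d_model, num_experts))  # router (gating) weights

# The router scores the token against all experts, but only the top-2 run.
logits = x @ router
top = np.argsort(logits)[-top_k:]                 # indices of the 2 chosen experts
weights = np.exp(logits[top]) / np.exp(logits[top]).sum()  # softmax over chosen

# Each expert is a small feed-forward network; only the chosen ones execute,
# so most parameters stay idle for this token.
experts = [rng.normal(size=(d_model, d_model)) for _ in range(num_experts)]
output = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
print("active experts:", top, "output shape:", output.shape)
```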

Mixtral supports a 32k token context window, making it capable of handling lengthy inputs effectively. It excels in various tasks, including code generation, and supports multiple languages such as English, French, Italian, German, and Spanish. Additionally, Mixtral can be fine-tuned to become an instruction-following model, achieving a score of 8.3 on MT-Bench.

Legacy Models

Legacy models in Mistral AI are earlier versions that laid the groundwork for current advancements, and are still in use for certain applications.

Mixtral 8x22B

Mixtral 8x22B is a legacy model within Mistral AI's portfolio, recognized for its contributions to advancing sparse mixture-of-experts (SMoE) architectures. This model features 141 billion parameters, of which only 39 billion are active per token, optimizing computational efficiency without compromising performance. This sparse activation approach allows Mixtral 8x22B to deliver output quality comparable to much larger dense models while maintaining faster processing speeds.

Mixtral 8x22B is multilingual, supporting languages like English, French, Italian, German, and Spanish. It is particularly strong in mathematics and coding, offering native function calling capabilities that are enhanced when combined with constrained output modes on specialized platforms.

With a 64k token context window, Mixtral 8x22B excels at handling extensive documents, providing precise information recall and maintaining relevance across large inputs. Released under the Apache 2.0 license, it remains a fully open model, promoting innovation and broad use in the AI community.

Mixtral 8x7B

Mixtral 8x7B is a legacy model within Mistral AI's suite, designed as a sparse mixture-of-experts (SMoE) model that emphasizes both efficiency and performance. It features 46.7 billion parameters in total, with 12.9 billion active per token, which allows it to outperform larger models like Llama 2 70B while maintaining inference speeds up to six times faster.

This model is particularly strong in code generation, handling languages such as English, French, Italian, German, and Spanish, and it is optimized for instruction following. Mistral 8x7B supports a 32k token context window, making it capable of processing lengthy inputs effectively.

Mistral 7B

Mistral 7B is a legacy model within Mistral AI's suite, designed to deliver strong performance despite its relatively small size. With 7.3 billion parameters, it outperforms larger models like Llama 2 13B across various benchmarks, including commonsense reasoning, reading comprehension, and code generation. It even rivals the performance of Llama 34B on several tasks.

Mistral 7B utilizes techniques such as grouped-query attention (GQA) and sliding window attention (SWA). These innovations allow the model to process longer sequences more efficiently and speed up inference.
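
To illustrate SWA, the sketch below builds the causal sliding-window attention mask: each position attends only to itself and a fixed number of preceding positions, rather than the full history. The sequence length and window size are toy values (Mistral 7B's actual window is 4,096 tokens):

```python
import numpy as np

seq_len, window = 8, 3  # toy values; Mistral 7B uses a 4096-token window

# Query position i may attend to key positions j with i - window < j <= i.
i = np.arange(seq_len)[:, None]
j = np.arange(seq_len)[None, :]
mask = (j <= i) & (j > i - window)  # causal constraint + sliding window

print(mask.astype(int))  # 1 = attention allowed, 0 = masked out
```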

Mistral Medium

Mistral Medium is a mid-sized legacy model from Mistral AI, designed to offer a balanced trade-off between performance and resource efficiency. With 13 billion parameters, this model is positioned between the lightweight Mistral 7B and the more computationally intensive Mistral Large series. Mistral Medium is engineered for scenarios where users require strong performance across a range of tasks, such as natural language processing, code generation, and reasoning, without the overhead associated with larger models.

Mistral Small

Mistral Small is the most lightweight legacy model in Mistral AI's suite, designed for use cases where computational efficiency and low latency are paramount. With 2.7 billion parameters, it is tailored for deployment in resource-constrained environments, such as mobile devices, edge computing, and other applications where power consumption and memory usage are critical factors.

Mistral AI Pricing {#mistral-ai-pricing}

Mistral AI offers a range of pricing options tailored to different models and use cases, ensuring flexibility for various users, from developers to large enterprises. Pricing is divided into three main categories: input costs, output costs, and additional storage fees for fine-tuned models.

General Purpose Models

  • Mistral NeMo: The cost for processing input and generating output with Mistral NeMo is $0.3 per 1M tokens each. This model does not incur any additional storage fees unless it is fine-tuned.
  • Mistral Large 2: This model is priced at $3 per 1M tokens for input and $9 per 1M tokens for output.

Specialist Models

  • Codestral: Designed for code generation, Codestral has an input cost of $1 per 1M tokens and an output cost of $3 per 1M tokens.
  • Mistral Embed: Optimized for generating text embeddings, Mistral Embed is priced at $0.01 per 1M tokens for both input and output.

Fine-Tuning Models

For users who require custom fine-tuning of models, Mistral AI offers a one-off training cost and a monthly storage fee, depending on the model:

  • Mistral NeMo: Fine-tuning costs $1 per 1M tokens, with an additional storage fee of $2 per month per model.
  • Codestral: Fine-tuning this model costs $3 per 1M tokens, with a $2 monthly storage fee.
  • Mistral Large 2: Fine-tuning Mistral Large 2 is the most expensive, at $9 per 1M tokens, and it requires a $4 per month storage fee.

Legacy Models

Legacy models offer a more economical option with slightly lower costs:

  • Mistral 7B: Both input and output are priced at $0.25 per 1M tokens.
  • Mixtral 8x7B: This model costs $0.7 per 1M tokens for input and output.
  • Mixtral 8x22B: Priced at $2 per 1M tokens for input and $6 per 1M tokens for output.
  • Mistral Small: Input is priced at $1 per 1M tokens and output at $3 per 1M tokens.
  • Mistral Medium: Input is priced at $2.75 per 1M tokens and output at $8.10 per 1M tokens.
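
As a worked example of how these rates translate into request costs, here is a small sketch that multiplies token counts by the per-million-token prices listed above; the dictionary keys are informal labels, not official API model identifiers:

```python
# Per-1M-token rates (USD) taken from the lists above.
PRICES = {
    "mistral-nemo":    {"input": 0.30, "output": 0.30},
    "mistral-large-2": {"input": 3.00, "output": 9.00},
    "codestral":       {"input": 1.00, "output": 3.00},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request at the listed rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# e.g. a 10k-token prompt with a 2k-token completion on Mistral Large 2:
print(f"${estimate_cost('mistral-large-2', 10_000, 2_000):.4f}")  # $0.0480
```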

Getting Started with Mistral AI API {#getting-started-with-mistral-ai-api}

The Mistral AI API enables developers to integrate AI models into their applications with minimal setup. The API is accessible through La Plateforme; to get started, you must first activate payments on your account to obtain API keys. Once your API keys are active, you can begin interacting with Mistral’s models using a few lines of code.

Example: Using Mistral AI for Chat Completion

To use the Mistral AI API for chat completion, you can follow this basic example:

```python
import os
from mistralai import Mistral

# Set your API key
api_key = os.environ["MISTRAL_API_KEY"]

# Choose the model you want to use
model = "mistral-large-latest"

# Initialize the Mistral client with your API key
client = Mistral(api_key=api_key)

# Create a chat request
chat_response = client.chat.complete(
    model=model,
    messages=[
        {
            "role": "user",
            "content": "What are the benefits of open source LLMs?",
        },
    ]
)

# Print the response from the model
print(chat_response.choices[0].message.content)
```

In this example, the `Mistral` class is initialized with your API key, and a chat completion request is sent to the model specified by `model`. The model then processes the input and returns a response, which is printed to the console.
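
The SDK can also stream the response token by token instead of waiting for the full completion. A minimal sketch, assuming the chat.stream method of the same client:

```python
import os
from mistralai import Mistral

client = Mistral(api_key=os.environ["MISTRAL_API_KEY"])

# Stream tokens as they are generated rather than waiting for the full reply.
stream = client.chat.stream(
    model="mistral-large-latest",
    messages=[{"role": "user", "content": "Summarize the benefits of streaming."}],
)

for chunk in stream:
    delta = chunk.data.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```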

Example: Generating Text Embeddings

Mistral AI also provides an embeddings API, which can be used to generate text embeddings—vector representations that capture the semantic meaning of text. Below is an example of how to generate embeddings using the Mistral API:

```python
import os
from mistralai import Mistral

# Set your API key
api_key = os.environ["MISTRAL_API_KEY"]

# Specify the embedding model to use
model = "mistral-embed"

# Initialize the Mistral client
client = Mistral(api_key=api_key)

# Generate embeddings for the input text
embeddings_response = client.embeddings.create(
    model=model,
    inputs=["Embed this sentence.", "As well as this one."]
)

# Print the embeddings
print(embeddings_response)
```

In this case, the `mistral-embed` model is used to create embeddings for the provided text inputs. The API returns the embeddings as numerical vectors, which can then be utilized in various NLP tasks, such as semantic search or clustering.
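
The individual vectors can be pulled out of the response object, where each item in `data` carries the embedding for the corresponding input string. A minimal sketch, assuming the `embeddings_response` from the example above and the v1 SDK's response shape:

```python
import numpy as np

# Each item in `.data` carries the vector for the corresponding input string.
vectors = [np.array(item.embedding) for item in embeddings_response.data]

# Cosine similarity between the two embedded sentences.
a, b = vectors
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {similarity:.3f}")
```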

Build LLM Applications with Mistral and Acorn {#build-llm-applications-with-mistral-and-acorn}

Visit https://gptscript.ai to download GPTScript and start building today. As we expand GPTScript's capabilities, we are also expanding our list of tools. With these tools, you can create any application imaginable: check out tools.gptscript.ai to get started.