Google Gemini is a multi-modal large language model (LLM). It provides natural language and image processing capabilities to enable text generation, sentiment analysis, document processing, image and video analysis, and more. Using the Gemini API, developers can integrate AI functionalities into their applications without needing deep expertise in machine learning algorithms.
The Google Gemini API provides the following key features:
In this article:
The Google Gemini API offers two main pricing tiers:
The free tier is available for all versions of the Gemini models, including Gemini 1.5 Flash, Gemini 1.5 Pro, Gemini 1.0 Pro, and Text Embedding 004. This tier provides limited rate limits and free access to various capabilities. For example:
The pay-as-you-go tier charges users according to actual usage. Rates vary significantly across the different Gemini models:
Related content: Read our guide to Google Gemini Pro
The instructions in this and the following section are adapted from the Gemini documentation. We'll show how to get started with the Google Gemini API using the Python SDK. SDKs are also available for Node.js, Go, Dart (Flutter), Swift, and Android.
Before you begin, ensure that your local environment meets the following requirements:
The Google Gemini API SDK is part of the `google-generativeai` package, which you can install with `pip`:

```bash
pip install -q -U google-generativeai
```
Next, you need to set up your API key to authenticate your requests to the Gemini API. You should generate this key from the Google AI Studio.
Once you have your API key, configure it as an environment variable to keep it secure. This practice is recommended over hardcoding your API key directly into your code to prevent accidental exposure.
```bash
export API_KEY=<YOUR_API_KEY>
```
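Before configuring the client, you can confirm that the variable is actually visible to Python. The snippet below is an illustrative sanity check, not part of the official SDK setup:

```python
import os

# Read the API key from the environment rather than hardcoding it.
api_key = os.environ.get("API_KEY")

if api_key:
    print("API_KEY found in the environment")
else:
    print("API_KEY is not set; run `export API_KEY=<YOUR_API_KEY>` first")
```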
Before making any API calls, you need to import the `google.generativeai` library and initialize the model you want to use (in this example, Gemini 1.5 Flash):

```python
import google.generativeai as genai
import os

# Configure the API key
genai.configure(api_key=os.environ["API_KEY"])

# Initialize the Gemini 1.5 Flash model
model = genai.GenerativeModel('gemini-1.5-flash')
```
Now that your model is initialized, you can generate text using the API. Here's a simple example that asks the model to write a script for a sci-fi TV series.
```python
# Generate text content with the model
response = model.generate_content("Write a script for the first episode of a sci-fi TV series")

# Output the generated text
print(response.text)
```
This code sends a prompt to the Gemini API and returns a generated story based on your input. The response contains the generated text, which you can print or use in your application.
The Google Gemini API provides various methods for generating text, whether from a simple text prompt or a combination of text and images.
The simplest way to generate text using the Gemini API is by providing a single text prompt.
```python
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Write a script for the first episode of a sci-fi TV series.")
print(response.text)
```
In this example, the prompt `"Write a script…"` is passed to `generate_content`, and the model returns the generated script as plain text.
The Gemini API also supports multimodal inputs, allowing you to generate text based on a combination of text and images. This can be particularly useful when the context involves visual elements that need to be described or analyzed.
```python
import PIL.Image

model = genai.GenerativeModel("gemini-1.5-flash")
toyota = PIL.Image.open("toyota.jpg")
response = model.generate_content(["Tell me about this car", toyota])
print(response.text)
```
In this snippet, the model is provided with both a text prompt (`"Tell me about this car"`) and an image (`toyota.jpg`), and generates a description that draws on both inputs.
For scenarios where you need faster interactions, the Gemini API supports text streaming. This allows you to start receiving parts of the response before the entire generation process is complete, which can be beneficial for real-time applications.
```python
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content("Write a fairytale about an enchanted kettle.", stream=True)

for chunk in response:
    print(chunk.text)
    print("_" * 80)
```
The `stream=True` argument tells the API to return the response in chunks as they are generated, rather than waiting for the full response to complete.
The Gemini API can also be used to create interactive chat experiences. This feature is suitable for applications like customer support, tutoring systems, or any scenario that requires a back-and-forth conversation.
```python
model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat(
    history=[
        {"role": "user", "parts": "Hi there"},
        {"role": "model", "parts": "Nice to meet you. How can I help you?"},
    ]
)

response = chat.send_message("I have 3 cats in my room.")
print(response.text)

response = chat.send_message("How many paws are there in my room?")
print(response.text)
```
In this example, the chat history is initialized with a greeting from both the user and the model. The `send_message` method sends each new message within the ongoing conversation, so the model can use earlier turns (such as the number of cats) to answer follow-up questions.
The Gemini API offers several configuration options to customize how text is generated. You can control the length, randomness, and stopping conditions of the generated content using a `GenerationConfig` object:
```python
model = genai.GenerativeModel("gemini-1.5-flash")
response = model.generate_content(
    "Write a fairytale about an enchanted kettle.",
    generation_config=genai.types.GenerationConfig(
        candidate_count=1,
        stop_sequences=["x"],
        max_output_tokens=300,
        temperature=1.0,
    ),
)
print(response.text)
```
Here, `GenerationConfig` sets the following parameters:

- `candidate_count`: the number of response candidates to generate (one in this example).
- `stop_sequences`: character sequences that, when encountered, stop generation (here, "x").
- `max_output_tokens`: the maximum length of the response, capped at 300 tokens.
- `temperature`: the degree of randomness in the output; higher values produce more varied text.
These settings allow for fine-tuning the model's output, making it more suited to specific tasks or requirements. For example, lowering the temperature might be useful for generating more predictable, fact-based content, while a higher temperature could be used for creative writing.
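To illustrate the two scenarios above, here is a sketch of a low-temperature and a high-temperature configuration. The parameter values are illustrative choices, and the SDK generally accepts a plain dict in place of a `GenerationConfig` object:

```python
# Illustrative settings; the exact values are a matter of taste.
factual_config = {
    "temperature": 0.2,       # low randomness for predictable, fact-based answers
    "max_output_tokens": 150,
}

creative_config = {
    "temperature": 1.0,       # higher randomness for creative writing
    "max_output_tokens": 500,
}

# Usage sketch:
# model.generate_content("Summarize the quarterly report.", generation_config=factual_config)
# model.generate_content("Write a poem about autumn.", generation_config=creative_config)
print(factual_config["temperature"] < creative_config["temperature"])  # → True
```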
The Google Gemini API offers document processing capabilities, particularly with the Gemini 1.5 Pro and 1.5 Flash models. These models can handle up to 3,600 pages per document, with each page equating to approximately 258 tokens. Documents must be in PDF format, and while there is no explicit pixel limit, the resolution is managed to optimize performance.
For optimal results, it's essential to prepare your documents by ensuring they are correctly oriented and free of blurriness. Additionally, if you are working with single-page documents, you should place the text prompt immediately after the page.
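Using the figures above (roughly 258 tokens per page, up to 3,600 pages per document), you can do a back-of-the-envelope estimate of how many tokens a PDF will consume. The helper below is an illustrative sketch, not part of the SDK:

```python
TOKENS_PER_PAGE = 258   # approximate figure for Gemini 1.5 models
MAX_PAGES = 3600        # per-document page limit

def estimate_pdf_tokens(num_pages: int) -> int:
    """Roughly estimate the token cost of a PDF sent to a Gemini 1.5 model."""
    if num_pages > MAX_PAGES:
        raise ValueError(f"Documents are limited to {MAX_PAGES} pages")
    return num_pages * TOKENS_PER_PAGE

print(estimate_pdf_tokens(100))   # → 25800
print(estimate_pdf_tokens(3600))  # → 928800
```

Even a maximum-size document stays well under the million-token context window of the Gemini 1.5 models.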
To process documents, you first need to upload them using the File API. This API can handle files of any size, making it ideal for large documents that exceed the 20 MB limit imposed by other methods. The File API also allows for up to 20 GB of storage per project, with individual files stored for 48 hours.
Here’s how you can upload a PDF document:
```python
import google.generativeai as genai

# Download a sample PDF (notebook shell command)
!curl -o gemini.pdf https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf

sample_file = genai.upload_file(path="gemini.pdf", display_name="Gemini 1.5 PDF")
print(f"Uploaded file '{sample_file.display_name}' as: {sample_file.uri}")
```
The `upload_file` method uploads the document to the File API and returns a file object whose URI you can reference in subsequent requests.
After uploading, you can verify that the file was successfully stored and retrieve its metadata:
```python
file = genai.get_file(name=sample_file.name)
print(f"Retrieved file '{file.display_name}' as: {file.uri}")
```
This step ensures that your file is correctly uploaded and accessible for further processing.
Once your document is uploaded, you can use it in conjunction with a text prompt to generate content:
```python
model = genai.GenerativeModel(model_name="gemini-1.5-flash")
response = model.generate_content([sample_file, "Please provide a summary of this document as a list of bullets."])
print(response.text)
```
This example prompts the Gemini API to summarize the content of the uploaded document.
You can also upload multiple documents and process them together:
```python
sample_file_2 = genai.upload_file(path="example-1.pdf")
sample_file_3 = genai.upload_file(path="example-2.pdf")

prompt = "Provide a summary of the main differences between the abstracts for each thesis."
response = model.generate_content([prompt, sample_file, sample_file_2, sample_file_3])
print(response.text)
```
This approach is useful when you need to compare or aggregate information across several documents.
You can list all the files you've uploaded:
```python
for file in genai.list_files():
    print(f"{file.display_name}, URI: {file.uri}")
```
And you can manually delete files before their automatic deletion after 48 hours:
```python
genai.delete_file(sample_file.name)
print(f'Deleted file {sample_file.uri}')
```
This functionality helps you manage your file storage effectively while using the Gemini API.
The Google Gemini API includes a feature called code execution, which enables the model to generate, execute, and refine Python code as part of its response. This is useful for applications that require complex problem-solving, such as performing calculations, processing data, or running simulations. The model can use code execution to iteratively improve its output based on the results of the code it generates, making it a tool for dynamic, code-driven applications.
First, you'll need to initialize the Gemini API model with the code execution capability enabled. The following Python code demonstrates how to do this:
```python
import os
import google.generativeai as genai

genai.configure(api_key=os.environ['API_KEY'])

model = genai.GenerativeModel(
    model_name='gemini-1.5-pro',
    tools='code_execution'
)

response = model.generate_content((
    'Tell me the sum of the first 40 odd numbers. '
    'Generate and run code for this calculation, and make sure it includes all 40.'
))
print(response.text)
```
In this example, the model is instructed to calculate the sum of the first 40 odd numbers. The Gemini API generates the necessary Python code, executes it, and then returns the result.
When the API is asked to solve a problem, it generates Python code to achieve the desired outcome. Here's an example of what the generated code might look like:
```python
sum = 0
for i in range(1, 80, 2):
    sum = sum + i
print(f'{sum=}')
```
In this code, the `for` loop iterates over `range(1, 80, 2)`, which starts at `1`, stops before `80`, and advances in steps of `2`, producing the 40 odd numbers `1, 3, 5, 7, ..., 79`. Within the loop, the current number `i` is added to the running total `sum`.
Output example:

```
The sum of the first 40 odd numbers is: 1600
```
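You can verify this result locally with plain Python: the sum of the first n odd numbers is n squared, so 40 odd numbers should sum to 1600.

```python
# Reproduce the model's calculation locally.
odd_numbers = list(range(1, 80, 2))   # 1, 3, 5, ..., 79
assert len(odd_numbers) == 40         # exactly 40 odd numbers

total = sum(odd_numbers)
print(total)  # → 1600
assert total == 40 ** 2               # sum of the first n odd numbers is n**2
```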
If you prefer, you can also enable code execution directly within the `generate_content` call:
```python
response = model.generate_content(
    ('Tell me the sum of the first 40 odd numbers. '
     'Generate and run code for this calculation, and make sure it includes all 40.'),
    tools='code_execution'
)
```
This method achieves the same result but gives you flexibility in configuring code execution on a per-request basis.
You can also integrate code execution within an interactive chat session:
```python
chat = model.start_chat()
response = chat.send_message((
    'Tell me the sum of the first 40 odd numbers. '
    'Generate and run code for this calculation, and make sure it includes all 40.'
))
print(response.text)
```
In this scenario, the code execution is part of an ongoing conversation, making it suitable for applications like tutoring systems or interactive coding assistants.