Google Gen AI 5-Day Intensive: Day Four – Part 2 (4/5)

Codelab #2 – Use Google Search In Generation

This is the second assigned codelab on day four of the intensive. Download it here from Github to run locally or run in this Kaggle notebook.

"""Use Google Search in Generation

Google Gen AI 5-Day Intensive Course
Host: Kaggle

Day: 4

Codelab: https://www.kaggle.com/code/markishere/day-4-google-search-grounding
"""

import io
import os
from pprint import pprint

from google import genai
from google.api_core import retry
from google.genai import types
from IPython.display import HTML, Image, Markdown, display

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

# Define a retry policy. The model might make multiple consecutive calls automatically
# for a complex query, so this ensures the client retries if it hits quota limits.
is_retriable = lambda e: (
    isinstance(e, genai.errors.APIError) and e.code in {429, 503}
)

if not hasattr(genai.models.Models.generate_content, "__wrapped__"):
    genai.models.Models.generate_content = retry.Retry(predicate=is_retriable)(
        genai.models.Models.generate_content
    )

# To enable search grounding, specify the 'google_search' tool
# in the `GenerateContentConfig` passed to `generate_content`.

# Ask for information without search grounding
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="When and where is Billie Eilish's next concert?",
)
Markdown(response.text)

# And now rerun the same query with search grounding enabled.
config_with_search = types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())]
)


def query_with_grounding():
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents="When and where is Billie Eilish's next concert?",
        config=config_with_search,
    )
    
    # Return the first candidate so its grounding metadata is easy to access.
    return response.candidates[0]


rc = query_with_grounding()
Markdown(rc.content.parts[0].text)


# Response metadata
# Get links to search suggestions, supporting documents and information
# on how they were used.
while (
    not rc.grounding_metadata.grounding_supports
    or not rc.grounding_metadata.grounding_chunks
):
    # If incomplete grounding data was returned, retry.
    rc = query_with_grounding()

chunks = rc.grounding_metadata.grounding_chunks
for chunk in chunks:
    print(f"{chunk.web.title}: {chunk.web.url}")

HTML(rc.grounding_metadata.search_entry_point.rendered_content)
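
# Note: outside a notebook, HTML() won't render anything. A minimal workaround
# is to write the rendered search-suggestion markup to a file and open it in a
# browser (the filename here is arbitrary).
with open("search_suggestions.html", "w") as f:
    f.write(rc.grounding_metadata.search_entry_point.rendered_content)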

supports = rc.grounding_metadata.grounding_supports
for support in supports:
    pprint(support.to_json_dict())

markdown_buffer = io.StringIO()

# Print the text with footnote markers.
markdown_buffer.write("Supported text:\n\n")
for support in supports:
    markdown_buffer.write(" * ")
    markdown_buffer.write(
        rc.content.parts[0].text[
            support.segment.start_index : support.segment.end_index
        ]
    )

    for i in support.grounding_chunk_indices:
        markdown_buffer.write(f"<sup>[{i + 1}]</sup>")

    markdown_buffer.write("\n\n")

# Print footnotes.
markdown_buffer.write("Citations:\n\n")
for i, chunk in enumerate(chunks, start=1):
    markdown_buffer.write(f"{i}. [{chunk.web.title}]({chunk.web.url})\n")

Markdown(markdown_buffer.getvalue())


# Search with tools
# Use the Google Search grounding and code execution tools together
def show_response(response):
    for p in response.candidates[0].content.parts:
        if p.text:
            display(Markdown(p.text))
        elif p.inline_data:
            display(Image(p.inline_data.data))
        else:
            print(p.to_json_dict())
        
        display(Markdown('----'))
        
config_with_search = types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())],
    temperature=0.0
)

chat = client.chats.create(model='gemini-2.0-flash')

response = chat.send_message(
    message="What were the medal tallies, by top-10 countries, for the 2024 Olympics?",
    config=config_with_search
)

show_response(response)

config_with_code = types.GenerateContentConfig(
    tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    temperature=0.0
)

response = chat.send_message(
    message="Now plot this as a Seaborn chart. Break out the medals too.",
    config=config_with_code
)

show_response(response)
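
# The code-execution response can also carry the generated code and its runtime
# output as separate parts. A minimal sketch of pulling those out (these fields
# exist on `types.Part` and are None for plain-text parts):
for part in response.candidates[0].content.parts:
    if part.executable_code:
        print("Generated code:\n", part.executable_code.code)
    elif part.code_execution_result:
        print("Execution output:\n", part.code_execution_result.output)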

Google Gen AI 5-Day Intensive: Day Four – Part 1 (4/5)

Codelab #1 – Tune A Gemini Model

This is the first assigned codelab on day four of the intensive. Download it here from Github to run locally or run in this Kaggle notebook.

"""Tune Gemini Model for Custom Function

Google Gen AI 5-Day Intensive Course
Host: Kaggle

Day: 4

Codelab: https://www.kaggle.com/code/markishere/day-4-fine-tuning-a-custom-model
"""

import datetime
import email
import os
import re
import time
import warnings
from collections.abc import Iterable

import pandas as pd
import tqdm
from google import genai
from google.api_core import retry
from google.genai import types
from sklearn.datasets import fetch_20newsgroups
from tqdm.rich import tqdm as tqdmr

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

for model in client.models.list():
    if "createTunedModel" in model.supported_actions:
        print(model.name)
        
newsgroups_train = fetch_20newsgroups(subset='train')
newsgroups_test = fetch_20newsgroups(subset='test')

# View list of class names for dataset
newsgroups_train.target_names
print(newsgroups_train.data[0])

def preprocess_newsgroup_row(data):
    # Extract only the subject and body.
    msg = email.message_from_string(data)
    text = f'{msg["Subject"]}\n\n{msg.get_payload()}'
    # Strip any remaining email addresses
    text = re.sub(r"[\w\.-]+@[\w\.-]+", "", text)
    # Truncate the text to fit within the input limits
    text = text[:40000]
    
    return text
    
def preprocess_newsgroup_data(newsgroup_dataset):
    # Put the points into a DataFrame
    df = pd.DataFrame(
        {
            'Text': newsgroup_dataset.data,
            'Label': newsgroup_dataset.target
        }
    )
    #  Clean up the text
    df['Text'] = df['Text'].apply(preprocess_newsgroup_row)
    # Match label to target name index
    df['Class Name'] = df['Label'].map(lambda l: newsgroup_dataset.target_names[l])
    
    return df

# Apply preprocessing to training and test datasets
df_train = preprocess_newsgroup_data(newsgroups_train)
df_test = preprocess_newsgroup_data(newsgroups_test)

df_train.head()

def sample_data(df, num_samples, classes_to_keep):
    # Sample rows, selecting num_samples of each label.
    df = (
        df.groupby('Label')[df.columns]
        .apply(lambda x: x.sample(num_samples))
        .reset_index(drop=True)
    )
    
    df = df[df['Class Name'].str.contains(classes_to_keep)]
    df['Class Name'] = df['Class Name'].astype('category')
    
    return df

TRAIN_NUM_SAMPLES = 50
TEST_NUM_SAMPLES = 10
# Keep rec.* and sci.*
CLASSES_TO_KEEP = '^rec|^sci'

df_train = sample_data(df_train, TRAIN_NUM_SAMPLES, CLASSES_TO_KEEP)
df_test = sample_data(df_test, TEST_NUM_SAMPLES, CLASSES_TO_KEEP)

# Evaluate baseline performance
sample_idx = 0
sample_row = preprocess_newsgroup_row(newsgroups_test.data[sample_idx])
sample_label = newsgroups_test.target_names[newsgroups_test.target[sample_idx]]

print(sample_row)
print('---')
print('Label:', sample_label)

response = client.models.generate_content(
    model='gemini-1.5-flash-001',
    contents=sample_row
)
print(response.text)


# Ask the model directly in a zero-shot prompt.

prompt = "From what newsgroup does the following message originate?"
baseline_response = client.models.generate_content(
    model="gemini-1.5-flash-001",
    contents=[prompt, sample_row])
print(baseline_response.text)


# You can use a system instruction to do more direct prompting, and get a
# more succinct answer.

system_instruct = """
You are a classification service. You will be passed input that represents
a newsgroup post and you must respond with the newsgroup from which the post
originates.
"""

# Define a helper to retry when per-minute quota is reached.
is_retriable = lambda e: (isinstance(e, genai.errors.APIError) and e.code in {429, 503})

# If you want to evaluate your own technique, replace the body of this function
# with your model, prompt, and other code, and return the predicted answer.
@retry.Retry(predicate=is_retriable)
def predict_label(post: str) -> str:
    response = client.models.generate_content(
        model="gemini-1.5-flash-001",
        config=types.GenerateContentConfig(
            system_instruction=system_instruct),
        contents=post)

    rc = response.candidates[0]

    # Any errors, filters, recitation, etc we can mark as a general error
    if rc.finish_reason.name != "STOP":
        return "(error)"
    else:
        # Clean up the response.
        return response.text.strip()


prediction = predict_label(sample_row)

print(prediction)
print()
print("Correct!" if prediction == sample_label else "Incorrect.")


# Enable tqdm features on Pandas.
tqdmr.pandas()

# But suppress the experimental warning
warnings.filterwarnings("ignore", category=tqdm.TqdmExperimentalWarning)


# Further sample the test data to be mindful of the free-tier quota.
df_baseline_eval = sample_data(df_test, 2, '.*')

# Make predictions using the sampled data.
df_baseline_eval['Prediction'] = df_baseline_eval['Text'].progress_apply(predict_label)

# And calculate the accuracy.
accuracy = (df_baseline_eval["Class Name"] == df_baseline_eval["Prediction"]).sum() / len(df_baseline_eval)
print(f"Accuracy: {accuracy:.2%}")


# Tune a custom model
# Convert the data frame into a dataset suitable for tuning.
input_data = {
    'examples': df_train[['Text', 'Class Name']]
    .rename(columns={'Text': 'textInput', 'Class Name': 'output'})
    .to_dict(orient='records')
}
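
# Sanity-check one record: each tuning example is a simple
# {'textInput': ..., 'output': ...} pair.
print(input_data['examples'][0])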

# If you are re-running this lab, add your model_id here.
model_id = None

# Or try and find a recent tuning job.
if not model_id:
    queued_model = None
    # Newest models first.
    for m in reversed(client.tunings.list()):
        # Only look at newsgroup classification models.
        if m.name.startswith('tunedModels/newsgroup-classification-model'):
            # If there is a completed model, use the first (newest) one.
            if m.state.name == 'JOB_STATE_SUCCEEDED':
                model_id = m.name
                print('Found existing tuned model to reuse.')
                break

            elif m.state.name == 'JOB_STATE_RUNNING' and not queued_model:
                # If there's a model still queued, remember the most recent one.
                queued_model = m.name
    else:
        # This `else` belongs to the `for` loop: it runs only when no completed
        # model triggered a `break`, so fall back to the most recent queued job.
        if queued_model:
            model_id = queued_model
            print('Found queued model, still waiting.')


# Upload the training data and queue the tuning job.
if not model_id:
    tuning_op = client.tunings.tune(
        base_model="models/gemini-1.5-flash-001-tuning",
        training_dataset=input_data,
        config=types.CreateTuningJobConfig(
            tuned_model_display_name="Newsgroup classification model",
            batch_size=16,
            epoch_count=2,
        ),
    )

    print(tuning_op.state)
    model_id = tuning_op.name

print(model_id)


MAX_WAIT = datetime.timedelta(minutes=10)

while not (tuned_model := client.tunings.get(name=model_id)).has_ended:

    print(tuned_model.state)
    time.sleep(60)

    # Don't wait too long. Use a public model if this is going to take a while.
    if datetime.datetime.now(datetime.timezone.utc) - tuned_model.create_time > MAX_WAIT:
        print("Taking a shortcut, using a previously prepared model.")
        model_id = "tunedModels/newsgroup-classification-model-ltenbi1b"
        tuned_model = client.tunings.get(name=model_id)
        break


print(f"Done! The model state is: {tuned_model.state.name}")

if not tuned_model.has_succeeded and tuned_model.error:
    print("Error:", tuned_model.error)
    

#  Use the new model
new_text = """
First-timer looking to get out of here.

Hi, I'm writing about my interest in travelling to the outer limits!

What kind of craft can I buy? What is easiest to access from this 3rd rock?

Let me know how to do that please.
"""

response = client.models.generate_content(
    model=model_id, contents=new_text)

print(response.text)


@retry.Retry(predicate=is_retriable)
def classify_text(text: str) -> str:
    """Classify the provided text into a known newsgroup."""
    response = client.models.generate_content(
        model=model_id, 
        contents=text)
    rc = response.candidates[0]

    # Any errors, filters, recitation, etc we can mark as a general error
    if rc.finish_reason.name != "STOP":
        return "(error)"
    else:
        return rc.content.parts[0].text


# The sampling here is just to minimise your quota usage. If you can, you should
# evaluate the whole test set with `df_model_eval = df_test.copy()`.
df_model_eval = sample_data(df_test, 4, '.*')

df_model_eval["Prediction"] = df_model_eval["Text"].progress_apply(classify_text)

accuracy = (df_model_eval["Class Name"] == df_model_eval["Prediction"]).sum() / len(df_model_eval)
print(f"Accuracy: {accuracy:.2%}")


# Compare token usage
# Calculate the *input* cost of the baseline model with system instructions.
sysint_tokens = client.models.count_tokens(
    model='gemini-1.5-flash-001', contents=[system_instruct, sample_row]
).total_tokens
print(f'System instructed baseline model: {sysint_tokens} (input)')

# Calculate the input cost of the tuned model.
tuned_tokens = client.models.count_tokens(model=tuned_model.base_model, contents=sample_row).total_tokens
print(f'Tuned model: {tuned_tokens} (input)')

savings = (sysint_tokens - tuned_tokens) / tuned_tokens
print(f'Token savings: {savings:.2%}')  # Note that this is only n=1.


# Tweak output token quantity
baseline_token_output = baseline_response.usage_metadata.candidates_token_count
print('Baseline (verbose) output tokens:', baseline_token_output)

tuned_model_output = client.models.generate_content(
    model=model_id, contents=sample_row)
tuned_tokens_output = tuned_model_output.usage_metadata.candidates_token_count
print('Tuned output tokens:', tuned_tokens_output)
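
# For symmetry with the input comparison above, the output-side savings can be
# computed the same way (again, this is only n=1).
output_savings = (baseline_token_output - tuned_tokens_output) / tuned_tokens_output
print(f'Output token savings: {output_savings:.2%}')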

Google Gen AI 5-Day Intensive: Day One (1/5)

Codelab 1/2 from day one. The codelab is here in Kaggle, and you can download it from Github to run locally.

"""
Evaluation and Structured Output

Google Gen AI 5-Day Intensive Course
Host: Kaggle

Day: 1

Kaggle: https://www.kaggle.com/code/markishere/day-1-evaluation-and-structured-output
"""

import enum
import os

from google import genai
from google.api_core import retry
from google.genai import types
from IPython.display import Markdown, display

client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])

# Automated retry
is_retriable = lambda e: (
    isinstance(e, genai.errors.APIError) and e.code in {429, 503}
)

# Guard against double-wrapping if this cell is re-run.
if not hasattr(genai.models.Models.generate_content, "__wrapped__"):
    genai.models.Models.generate_content = retry.Retry(predicate=is_retriable)(
        genai.models.Models.generate_content
    )

# Evaluation
# Understand model performance
# Get the file locally first
# !wget -nv -O gemini.pdf https://storage.googleapis.com/cloud-samples-data/generative-ai/pdf/2403.05530.pdf
document_file = client.files.upload(file="gemini.pdf")
print("\n")
print(document_file)
print("\n")

print("\nSummarize a document\n")


# Summarize a document
def summarize_doc(request: str) -> str:
    """Execute the request on the uploaded document."""
    # Set the temperature low to stabilize the output.
    config = types.GenerateContentConfig(temperature=0.0)
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        config=config,
        contents=[request, document_file],
    )
    return response.text


request = "Tell me about the training process used here."
summary = summarize_doc(request)
# display(Markdown(summary + "\n-----"))
print("\n\n")


# Define an evaluator
SUMMARY_PROMPT = """\
# Instruction
You are an expert evaluator. Your task is to evaluate the quality of the responses generated by AI models.
We will provide you with the user input and an AI-generated response.
You should first read the user input carefully to analyze the task, and then evaluate the quality of the response based on the Criteria provided in the Evaluation section below.
You will assign the response a rating following the Rating Rubric and Evaluation Steps. Give step-by-step explanations for your rating, and only choose ratings from the Rating Rubric.

# Evaluation
## Metric Definition
You will be assessing summarization quality, which measures the overall ability to summarize text. Pay special attention to length constraints, such as in X words or in Y sentences. The instruction for performing a summarization task and the context to be summarized are provided in the user prompt. The response should be shorter than the text in the context. The response should not contain information that is not present in the context.

## Criteria
Instruction following: The response demonstrates a clear understanding of the summarization task instructions, satisfying all of the instruction's requirements.
Groundedness: The response contains information included only in the context. The response does not reference any outside information.
Conciseness: The response summarizes the relevant details in the original text without a significant loss in key information without being too verbose or terse.
Fluency: The response is well-organized and easy to read.

## Rating Rubric
5: (Very good). The summary follows instructions, is grounded, is concise, and fluent.
4: (Good). The summary follows instructions, is grounded, concise, and fluent.
3: (Ok). The summary mostly follows instructions, is grounded, but is not very concise and is not fluent.
2: (Bad). The summary is grounded, but does not follow the instructions.
1: (Very bad). The summary is not grounded.

## Evaluation Steps
STEP 1: Assess the response in aspects of instruction following, groundedness, conciseness, and verbosity according to the criteria.
STEP 2: Score based on the rubric.

# User Inputs and AI-generated Response
## User Inputs

### Prompt
{prompt}

## AI-generated Response
{response}
"""


# Define a structured enum class to capture the result.
class SummaryRating(enum.Enum):
    VERY_GOOD = 5
    GOOD = 4
    OK = 3
    BAD = 2
    VERY_BAD = 1


def eval_summary(prompt, ai_response):
    """Evaluate the generated summary against the prompt."""

    chat = client.chats.create(model="gemini-2.0-flash")

    # Generate the full text response
    response = chat.send_message(
        message=SUMMARY_PROMPT.format(prompt=prompt, response=ai_response)
    )
    verbose_eval = response.text

    # Coerce into desired structure
    structured_output_config = types.GenerateContentConfig(
        response_mime_type="text/x.enum", 
        response_schema=SummaryRating
    )
    response = chat.send_message(
        message="Convert the final score.", 
        config=structured_output_config
    )
    structured_eval = response.parsed

    return verbose_eval, structured_eval


text_eval, struct_eval = eval_summary(
    prompt=[request, document_file], 
    ai_response=summary
)
Markdown(text_eval)
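
# Also surface the structured value, since it's what you'd consume programmatically.
print(struct_eval)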

# Play with the summary prompt
new_prompt = "Explain like I'm 5 the training process"

# Try:
#  ELI5 the training process
#  Summarise the needle/haystack evaluation technique in 1 line
#  Describe the model architecture to someone with a civil engineering degree
#  What is the best LLM?
if not new_prompt:
    raise ValueError("Try setting a new summarization prompt.")


def run_and_eval_summary(prompt):
    """Generate and evaluate a summary using the supplied prompt."""
    summary = summarize_doc(prompt)
    display(Markdown(summary + "\n-----"))

    text, struct = eval_summary([prompt, document_file], summary)
    display(Markdown(text + "\n-----"))
    print(struct)


run_and_eval_summary(new_prompt)
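
# LLM-based evaluators are not deterministic. One common mitigation (a sketch,
# not part of the codelab) is to score several times and aggregate the numeric
# enum values; three trials here is an arbitrary choice.
ratings = []
for _ in range(3):
    _, struct = eval_summary(prompt=[request, document_file], ai_response=summary)
    if isinstance(struct, SummaryRating):
        ratings.append(struct.value)
if ratings:
    print("Mean rating over", len(ratings), "trials:", sum(ratings) / len(ratings))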

Google Gen AI 5-Day Intensive Course Overview

In late March 2025, Google held a generative AI 5-day intensive for developers. It was hosted by Kaggle and included daily codelabs for the first four days, along with daily live video sessions on YouTube featuring insights from current Google employees.

Source: Google’s The Keyword Blog

The course featured:

  • YouTube videos: Each day included a one-hour live session hosted by Google that featured a Q&A and codelab reviews.
  • Discord server: The Kaggle server gave attendees the opportunity to ask questions, share ideas, and get support.
  • Kaggle notebooks: Codelabs were hosted in Kaggle notebooks and the day one notebook is here as an example.

The course is targeted at developers who already know how to write code. If you're new to coding, the codelabs will be more challenging because they require you to read and work through the provided code.

It taught us how to use Vertex AI and the Gemini API to implement generative AI for a range of use cases. We learned about agents, function calling, creating custom models, and fine-tuning, to name a few topics. The course is structured to provide coding examples that developers can use to develop their own generative AI solutions.

To take a future offering of the course, check this page for when another session will be offered. The first session was offered in November 2024 and the second was held in late March and early April 2025.