
Hands-on LLM: Fine-Tuning Google's Gemma Model in Keras with LoRA

By AiBard123
April 18, 2024 - 2 min read

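Author: Mist君的风控与数据成长路 | Source: Mist君的风控与数据成长路

On April 13, I had the opportunity to attend Build with AI, a Google Developer Group Shanghai event, where the hands-on session used Gemma models on the Kaggle and Colab platforms. This article is compiled from Google's technical documentation on Gemma and from the notebook code used during the Build with AI in Shanghai Coding Time (Gemma | Kaggle; see the original post's Gemma-Code-Model Notebooks link to view or copy the starred document).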

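Gemma models

Gemma is Google's family of lightweight, state-of-the-art open models, built from the same research and technology used to create the Gemini models. It is developed by Google DeepMind and other teams across Google, and is named after the Latin word gemma, meaning "precious stone".

Gemma models can run in your own applications and on your own hardware, mobile devices, or hosted services, and they can be customized with tuning techniques.

Model sizes and capabilities

Gemma models are currently available in two sizes, 2B and 7B, so you can build generative AI solutions according to your available compute resources, the capabilities you need, and where you want to run them. To get started, the 2B size can run on mobile devices and laptops, which lowers resource requirements and gives you more flexibility when deploying the model.

With the multi-backend support in Keras 3.0, these models can run on TensorFlow, JAX, and PyTorch. Native implementations in JAX (based on the FLAX framework) and in PyTorch are also available.
Gemma models can be downloaded from Kaggle Models.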

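Lightweight fine-tuning methods

*Additive methods: add a small number of parameters to the model and fine-tune only those;

*Specification-based methods: adjust the model architecture or hyperparameters to make tuning efficient;

*Reparameterization-based methods: modify the model's existing parameters to achieve lightweight fine-tuning.

* For example, Low-Rank Adaptation (LoRA), which is the method used in this article's example.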

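Code time

*Setup

Access Gemma

To complete this tutorial, you first need to follow the setup instructions in Gemma setup. The Gemma setup instructions show you how to do the following:

Gemma models are hosted by Kaggle. To use Gemma, request access on Kaggle:

Sign in or register at kaggle.com
Open the Gemma model card and select "Request Access"
Fill out the consent form and accept the terms and conditions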

*Install dependencies

# Install Keras 3 last. See https://keras.io/getting_started/ for more details.
!pip install -q -U keras-nlp
# Quote the requirement so the shell does not treat ">=3" as an output redirect.
!pip install -q -U "keras>=3"

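*Select a backend

Keras is a high-level, multi-framework deep learning API designed for simplicity and ease of use. With Keras 3, you can run your workflow on one of three backends: TensorFlow, JAX, or PyTorch.

This tutorial configures the JAX backend.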

import os

os.environ["KERAS_BACKEND"] = "jax"  # Or "torch" or "tensorflow".
# Avoid memory fragmentation on JAX backend.
os.environ["XLA_PYTHON_CLIENT_MEM_FRACTION"]="1.00"
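Keras picks its backend when it is first imported, so the environment variable has to be set before `import keras` runs. The sketch below wraps that step in a small guard. The helper name `select_backend` is hypothetical and is not part of Keras; it only illustrates the ordering constraint.

```python
import os
import sys

# Backend names listed in the text above.
SUPPORTED_BACKENDS = ("jax", "tensorflow", "torch")


def select_backend(name: str) -> str:
    """Set KERAS_BACKEND, rejecting unknown names and calls made too late.

    Hypothetical helper for illustration only; not part of Keras.
    """
    if name not in SUPPORTED_BACKENDS:
        raise ValueError(
            f"unknown backend {name!r}; expected one of {SUPPORTED_BACKENDS}"
        )
    if "keras" in sys.modules:
        # Keras has already chosen its backend at import time.
        raise RuntimeError("set the backend before importing keras")
    os.environ["KERAS_BACKEND"] = name
    return name


select_backend("jax")
```

Calling the helper at the very top of a notebook, before any Keras import, keeps the backend choice in one visible place.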

*Import packages

Import Keras and KerasNLP.

import keras
import keras_nlp

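*Load the dataset

Preprocess the data. This tutorial uses a subset of 1,000 training examples so that the notebook runs faster. Consider using more training data for higher-quality fine-tuning.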

import json

# Template that formats an entire example as a single string.
template = "Instruction:\n{instruction}\n\nResponse:\n{response}"

data = []
with open('/kaggle/input/databricks-dolly-15k/databricks-dolly-15k.jsonl') as file:
    for line in file:
        features = json.loads(line)
        # Filter out examples with context, to keep it simple.
        if features["context"]:
            continue
        data.append(template.format(**features))

# Only use 1000 training examples, to keep it fast.
data = data[:1000]
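To see what the preprocessing produces without downloading the dataset, the same filter-and-format logic can be run on a couple of in-memory records. The two JSON lines below are made up for illustration and only mimic the JSONL shape (the `category` field is assumed); the function name `format_examples` is likewise hypothetical.

```python
import json

TEMPLATE = "Instruction:\n{instruction}\n\nResponse:\n{response}"

# Two made-up records in the same JSONL shape (not taken from the dataset).
sample_lines = [
    '{"instruction": "Name a primary color.", "context": "", '
    '"response": "Red.", "category": "open_qa"}',
    '{"instruction": "Summarize the passage.", "context": "Some passage.", '
    '"response": "A summary.", "category": "summarization"}',
]


def format_examples(lines):
    """Drop records that carry a context and render the rest with TEMPLATE."""
    formatted = []
    for line in lines:
        features = json.loads(line)
        if features["context"]:
            continue
        # str.format ignores the extra keys (context, category).
        formatted.append(TEMPLATE.format(**features))
    return formatted


examples = format_examples(sample_lines)
print(examples[0])
```

Only the first record survives, because the second one has a non-empty context. Each training example is a single string, which is why the same template can later be reused with an empty response to build a prompt.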

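*Load the model

KerasNLP provides implementations of many popular model architectures. In this tutorial, you create a model with GemmaCausalLM, an end-to-end Gemma model for causal language modeling. A causal language model predicts the next token based on the previous tokens.

Create the model with the from_preset method: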

gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
gemma_lm.summary()






┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓  
┃ Tokenizer (type)                                   ┃                                             Vocab # ┃  
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩  
│ gemma_tokenizer (GemmaTokenizer)                   │                                             256,000 │  
└────────────────────────────────────────────────────┴─────────────────────────────────────────────────────┘

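The from_preset method instantiates the model from a preset architecture and weights. In the code above, the string "gemma_2b_en" specifies the preset architecture: a Gemma model with 2 billion parameters.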

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓  
┃ Layer (type)                  ┃ Output Shape              ┃         Param # ┃ Connected to               ┃  
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩  
│ padding_mask (InputLayer)     │ (None, None)              │               0 │ -                          │  
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤  
│ token_ids (InputLayer)        │ (None, None)              │               0 │ -                          │  
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤  
│ gemma_backbone                │ (None, None, 2048)        │   2,506,172,416 │ padding_mask[0][0],        │  
│ (GemmaBackbone)               │                           │                 │ token_ids[0][0]            │  
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤  
│ token_embedding               │ (None, None, 256000)      │     524,288,000 │ gemma_backbone[0][0]       │  
│ (ReversibleEmbedding)         │                           │                 │                            │  
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘
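The figures in the summary tables can be cross-checked with a little arithmetic. The sketch below uses only numbers printed above. One assumption is labeled in the comments: that the embedding table is shared with the backbone (as the ReversibleEmbedding layer type suggests) and is therefore already included in the backbone's count.

```python
VOCAB_SIZE = 256_000             # "Vocab #" in the tokenizer table
HIDDEN_DIM = 2_048               # last dimension of the gemma_backbone output
BACKBONE_PARAMS = 2_506_172_416  # "Param #" of gemma_backbone
EMBEDDING_PARAMS = 524_288_000   # "Param #" of token_embedding

# One HIDDEN_DIM-sized vector per vocabulary entry.
embedding_from_shape = VOCAB_SIZE * HIDDEN_DIM
print(embedding_from_shape == EMBEDDING_PARAMS)

# Assumption: the embedding weights are part of the backbone's count.
share = EMBEDDING_PARAMS / BACKBONE_PARAMS
print(f"embedding share of backbone parameters: {share:.1%}")
```

The product of vocabulary size and hidden width matches the reported embedding size exactly, so under the stated assumption roughly a fifth of the parameters sit in the token embedding.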

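Note: A Gemma model with 7 billion parameters is also available. To run the larger model in Colab, you need access to the premium GPUs offered in the paid plans. Alternatively, you can perform distributed tuning of the Gemma 7B model on Kaggle or Google Cloud; the next post will cover the methods and steps for distributed tuning.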

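*Inference before fine-tuning

In this section, you query the model with various prompts to see how it responds.

*Europe trip prompt

Query the model for suggestions on what to do on a trip to Europe.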

prompt = template.format(
    instruction="What should I do on a trip to Europe?",
    response="",
)
print(gemma_lm.generate(prompt, max_length=256))

Instruction:

What should I do on a trip to Europe?  
  
Response:  
1. Take a trip to Europe.  
2. Take a trip to Europe.  
3. Take a trip to Europe.  
4. Take a trip to Europe.  
5. Take a trip to Europe.  
6. Take a trip to Europe.  
7. Take a trip to Europe.  
8. Take a trip to Europe.  
9. Take a trip to Europe.  
10. Take a trip to Europe.  
11. Take a trip to Europe.  
12. Take a trip to Europe.  
13. Take a trip to Europe.  
14. Take a trip to Europe.  
15. Take a trip to Europe.  
16. Take a trip to Europe.  
17. Take a trip to Europe.  
18. Take a trip to Europe.  
19. Take a trip to Europe.  
20. Take a trip to Europe.  
21. Take a trip to Europe.  
22. Take a trip to Europe.  
23. Take a trip to Europe.  
24. Take a trip to Europe.  
25. Take a trip to

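The model merely echoes the suggestion to take a trip to Europe, and it repeats the same line until the output is cut off at max_length.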
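The repetition is easy to spot by eye, but when comparing many outputs it can help to put a number on it. A minimal sketch, assuming a simple "share of distinct lines" measure is good enough for this purpose; the function name is hypothetical and the metric is an illustration, not something the notebook itself computes.

```python
def distinct_line_ratio(text: str) -> float:
    """Fraction of non-empty lines that are unique, ignoring list numbers."""
    lines = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Drop a leading "12. " style enumerator so repeats compare equal.
        head, sep, tail = line.partition(". ")
        if sep and head.isdigit():
            line = tail
        lines.append(line)
    return len(set(lines)) / len(lines) if lines else 1.0


# A looping response shaped like the output above, and a varied one.
looping = "\n".join(f"{i}. Take a trip to Europe." for i in range(1, 25))
varied = "See the Eiffel Tower.\nVisit the Acropolis.\nTour the Colosseum."

print(distinct_line_ratio(looping))
print(distinct_line_ratio(varied))
```

A value near zero indicates that the response keeps producing the same line, while a value of 1.0 means every line is different.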

*ELI5 photosynthesis prompt

Prompt the model to explain photosynthesis in terms simple enough for a 5-year-old child to understand.

prompt = template.format(
    instruction="Explain the process of photosynthesis in a way that a child could understand.",
    response="",
)
print(gemma_lm.generate(prompt, max_length=256))






Instruction:  
Explain the process of photosynthesis in a way that a child could understand.  
  
Response:  
Photosynthesis is the process by which plants use the energy from the sun to convert water and carbon dioxide into oxygen and glucose. The process begins with the absorption of light energy by chlorophyll molecules in the leaves of plants. The energy from the light is used to split water molecules into hydrogen and oxygen. The oxygen is released into the atmosphere, while the hydrogen is used to make glucose. The glucose is then used by the plant to make energy and grow.  
  
Explanation:  
Photosynthesis is the process by which plants use the energy from the sun to convert water and carbon dioxide into oxygen and glucose. The process begins with the absorption of light energy by chlorophyll molecules in the leaves of plants. The energy from the light is used to split water molecules into hydrogen and oxygen. The oxygen is released into the atmosphere, while the hydrogen is used to make glucose. The glucose is then used by the plant to make energy and grow.  
  
Explanation:  
  
Photosynthesis is the process by which plants use the energy from the sun to convert water and carbon dioxide into oxygen and glucose. The process begins with the absorption of light energy by chlorophyll molecules in the leaves of plants. The energy from

The response contains words that may not be easy for a child to understand, such as chlorophyll and glucose. It also repeats itself.

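*LoRA fine-tuning

To get better responses from the model, fine-tune it with Low-Rank Adaptation (LoRA) on the Databricks Dolly 15k dataset.

The LoRA rank determines the dimensionality of the trainable matrices that are added to the LLM's original weights. It controls the expressiveness and precision of the fine-tuning adjustments.

A higher rank allows more detailed changes but also means more trainable parameters. A lower rank means less computational overhead but potentially less precise adaptation.

This tutorial uses a LoRA rank of 4. In practice, start with a relatively small rank (such as 4, 8, or 16), which is computationally efficient for experimentation. Train the model with that rank and evaluate the performance improvement on your task. Then raise the rank gradually in subsequent trials to see whether performance improves further.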
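To make the rank trade-off concrete: for a weight matrix of shape d_in x d_out, LoRA in its standard form trains two small factors of shapes d_in x r and r x d_out instead of the full matrix, so the number of added parameters is r * (d_in + d_out). The sketch below evaluates that formula for the ranks mentioned above. The square 2048 x 2048 layer is an illustrative assumption (2048 is the hidden width shown in the model summary), not a specific Gemma layer.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters in the two low-rank factors: (d_in x rank) + (rank x d_out)."""
    return rank * (d_in + d_out)


# Illustrative square layer; not a specific layer of the Gemma model.
d_in = d_out = 2048
full = d_in * d_out

for rank in (4, 8, 16):
    added = lora_param_count(d_in, d_out, rank)
    print(f"rank={rank:>2}  added={added:>6}  share of full matrix={added / full:.2%}")
```

The added parameter count grows linearly with the rank, which is why starting small and increasing the rank step by step keeps experiments cheap.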

# Enable LoRA for the model and set the LoRA rank to 4.
gemma_lm.backbone.enable_lora(rank=4)
gemma_lm.summary()






┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓  
┃ Tokenizer (type)                                   ┃                                             Vocab # ┃  
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩  
│ gemma_tokenizer (GemmaTokenizer)                   │                                             256,000 │  
└────────────────────────────────────────────────────┴─────────────────────────────────────────────────────┘

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓  
┃ Layer (type)                  ┃ Output Shape              ┃         Param # ┃ Connected to               ┃  
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩  
│ padding_mask (InputLayer)     │ (None, None)              │               0 │ -                          │  
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤  
│ token_ids (InputLayer)        │ (None, None)              │               0 │ -                          │  
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤  
│ gemma_backbone                │ (None, None, 2048)        │   2,507,536,384 │ padding_mask[0][0],        │  
│ (GemmaBackbone)               │                           │                 │ token_ids[0][0]            │  
├───────────────────────────────┼───────────────────────────┼─────────────────┼────────────────────────────┤  
│ token_embedding               │ (None, None, 256000)      │     524,288,000 │ gemma_backbone[0][0]       │  
│ (ReversibleEmbedding)         │                           │                 │                            │  
└───────────────────────────────┴───────────────────────────┴─────────────────┴────────────────────────────┘

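Note that enabling LoRA significantly reduces the number of trainable parameters, from about 2.5 billion to about 1.36 million. The backbone's parameter count grows from 2,506,172,416 to 2,507,536,384, and only the 1,363,968 added LoRA parameters are trained.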
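The difference between the two summaries can be reproduced from layer shapes. The sketch below relies on several assumptions that are not stated in this article and are labeled in the comments: that Gemma 2B has 18 decoder layers, 8 query heads, 1 key/value head, and a head dimension of 256; that LoRA is attached only to the query and value projections; and that each projection kernel of shape (heads, hidden, head_dim) receives factors of shape (heads, hidden, rank) and (rank, head_dim). The fact that the estimate matches the tables is consistent with these assumptions but does not prove them.

```python
# Backbone parameter counts from the two summaries above.
BEFORE = 2_506_172_416
AFTER = 2_507_536_384
added = AFTER - BEFORE

# Assumed Gemma 2B shapes (not stated in this article).
NUM_LAYERS = 18
HIDDEN_DIM = 2048
HEAD_DIM = 256
NUM_QUERY_HEADS = 8
NUM_KV_HEADS = 1
RANK = 4


def lora_params(num_heads: int) -> int:
    """Assumed factor sizes for one (heads, hidden, head_dim) projection kernel."""
    factor_a = num_heads * HIDDEN_DIM * RANK  # shape (heads, hidden, rank)
    factor_b = RANK * HEAD_DIM                # shape (rank, head_dim)
    return factor_a + factor_b


# Assumption: only the query and value projections receive LoRA factors.
per_layer = lora_params(NUM_QUERY_HEADS) + lora_params(NUM_KV_HEADS)
estimate = NUM_LAYERS * per_layer

print(added, estimate, added == estimate)
```

Under these assumptions each layer contributes 75,776 LoRA parameters, and 18 layers give the 1,363,968 seen in the summary.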

# Limit the input sequence length to 512 (to control memory usage).
gemma_lm.preprocessor.sequence_length = 512
# Use AdamW (a common optimizer for transformer models).
optimizer = keras.optimizers.AdamW(
    learning_rate=5e-5,
    weight_decay=0.01,
)
# Exclude layernorm and bias terms from decay.
optimizer.exclude_from_weight_decay(var_names=["bias", "scale"])

gemma_lm.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=optimizer,
    weighted_metrics=[keras.metrics.SparseCategoricalAccuracy()],
)
gemma_lm.fit(data, epochs=1, batch_size=1)

1000/1000 ━━━━━━━━━━━━━━━━━━━━ 803s 782ms/step - loss: 0.4597 - sparse_categorical_accuracy: 0.5228
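The `exclude_from_weight_decay(var_names=["bias", "scale"])` call in the training code above selects variables by name. The sketch below illustrates the idea with plain string matching. It assumes a variable is excluded when its name contains one of the given strings, and the variable names are hypothetical examples rather than the actual names inside the Gemma model.

```python
def uses_weight_decay(variable_name: str, excluded: list[str]) -> bool:
    """Return False when the name contains any excluded string (assumed rule)."""
    return not any(pattern in variable_name for pattern in excluded)


# Hypothetical variable names, for illustration only.
names = [
    "query/kernel",
    "query/lora_kernel_a",
    "pre_attention_norm/scale",
    "output/bias",
]
excluded = ["bias", "scale"]

decayed = [name for name in names if uses_weight_decay(name, excluded)]
print(decayed)
```

Normalization scales and biases are left out of the decay, while kernels, including the LoRA factors, are still regularized.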

*Inference after fine-tuning

After fine-tuning, the responses follow the instructions given in the prompt.

*Europe trip prompt

prompt = template.format(
    instruction="What should I do on a trip to Europe?",
    response="",
)
print(gemma_lm.generate(prompt, max_length=256))






Instruction:  
What should I do on a trip to Europe?  
  
Response:  
You should plan to see the most famous sights in Europe. The Eiffel Tower, the Acropolis, and the Colosseum are just a few. You should also plan on seeing as many countries as possible. There are so many amazing places in Europe, it is a shame to not see them all.  
  
Additional Information:  
Europe is a very interesting place to visit for many reasons, not least of which is that there are so many different places to see.

The model now recommends places to visit in Europe.

*ELI5 photosynthesis prompt

prompt = template.format(
    instruction="Explain the process of photosynthesis in a way that a child could understand.",
    response="",
)
print(gemma_lm.generate(prompt, max_length=256))






Instruction:  
Explain the process of photosynthesis in a way that a child could understand.  
  
Response:  
Photosynthesis is a process in which plants and photosynthetic organisms (such as algae, cyanobacteria, and some bacteria and archaea) use light energy to convert water and carbon dioxide into sugar and release oxygen. This process requires chlorophyll, water, carbon dioxide, and energy. The chlorophyll captures the light energy and uses it to power a reaction that converts the carbon from carbon dioxide into organic molecules (such as sugar) that can be used for energy. The process also generates oxygen as a by-product.

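The model now explains photosynthesis in simpler terms.

Note: This tutorial fine-tunes the model on a small subset of the dataset, for only one epoch, and with a low LoRA rank. To get better responses from the fine-tuned model, you can try:

Increasing the size of the fine-tuning dataset
Training for more steps (epochs)
Setting a higher LoRA rank
Modifying hyperparameter values such as learning_rate and weight_decay.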
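One way to organize such experiments is to enumerate the combinations up front and run them one at a time. The sketch below only builds the list of configurations. The ranks come from the LoRA section, 5e-5 is the learning rate used in this tutorial, and the remaining values are illustrative assumptions rather than recommendations.

```python
from itertools import product

ranks = (4, 8, 16)             # ranks suggested in the LoRA section
learning_rates = (5e-5, 1e-4)  # 5e-5 is the tutorial value; 1e-4 is illustrative
epoch_counts = (1, 2)          # 1 is the tutorial value; 2 is illustrative

# One dictionary per trial, in a fixed and reproducible order.
configs = [
    {"rank": rank, "learning_rate": lr, "epochs": epochs}
    for rank, lr, epochs in product(ranks, learning_rates, epoch_counts)
]

print(len(configs))
print(configs[0])
```

Each dictionary can then drive one training run, with the resulting loss recorded next to it for comparison.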

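*Summary and next steps

This tutorial covered LoRA fine-tuning of a Gemma model with KerasNLP. Next, check out the following documents:

Learn how to generate text with a Gemma model.
Learn how to perform distributed fine-tuning and inference on a Gemma model.
Learn how to use Gemma open models with Vertex AI.
Learn how to fine-tune Gemma with KerasNLP and deploy to Vertex AI.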
