Step by Step: Raising GPT-4 Function-Calling Accuracy from 35% to 75%
Author: NLP前沿  Source: NLP前沿
https://blog.composio.dev/improving-function-calling-accuracy-for-agentic-integrations/
Background
Following the ClickUp documentation, we selected a handful of APIs and converted them into the OpenAI function schema format, as follows:
https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/clickup_space_schema.json
get_spaces(team_id:string, archived:boolean)
create_space(team_id:string, name:string, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
get_space(space_id:string)
update_space(space_id:string, name:string, color:string, private:boolean, admin_can_manage:boolean, multiple_assignees:boolean, features:(due_dates:(enabled:boolean, start_date:boolean, remap_due_dates:boolean, remap_closed_due_date:boolean), time_tracking:(enabled:boolean)))
delete_space(space_id:string)
get_space_tags(space_id:string)
create_space_tag(space_id:string, tag:(name:string, tag_fg:string, tag_bg:string))
delete_space_tag(space_id:string, tag_name:string, tag:(name:string, tag_fg:string, tag_bg:string))
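To evaluate the results effectively, we built a small benchmark dataset. Each problem requires calling one of the eight selected functions, and the problems range from simple to complex. Our evaluation is based on how accurately the function is called with the correct arguments.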
https://github.com/SamparkAI/fcalling-clickup-blog-code/blob/master/clickup_space_benchmark.json
[
{
"prompt": "As the new fiscal year begins, the management team at a marketing agency decides it's time to archive older projects to make way for new initiatives. They remember that one of their teams is called \"Innovative Solutions\" and operates under the team ID \"team123\". They want to check which spaces under this team are still active before deciding which ones to archive.",
"solution": "get_spaces(team_id=\"team123\", archived=False)"
},
{
"prompt": "Ella, the project coordinator, is setting up a new project space in ClickUp for the \"Creative Minds\" team with team ID \"cm789\". This space, named \"Innovative Campaigns 2023\", should allow multiple assignees for tasks, but keep due dates and time tracking disabled, as the initial planning phase doesn't require strict deadlines or time monitoring.",
"solution": "create_space(team_id=\"cm789\", name=\"Innovative Campaigns 2023\", multiple_assignees=True, features=(due_dates=(enabled=False, start_date=False, remap_due_dates=False, remap_closed_due_date=False), time_tracking=(enabled=False)))"
},
...
]
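The scoring idea described above (a call counts only if the right function is called with the right arguments) can be sketched as follows. This is a minimal sketch, not the authors' evaluation code: it assumes exact-match scoring, and the helper names `is_correct_call` and `accuracy`, as well as the dict layout with `name` and `arguments` keys, are hypothetical.

```python
from typing import Any, Optional


def is_correct_call(predicted: Optional[dict], expected: dict) -> bool:
    """Exact-match scoring: same function name and identical arguments."""
    if predicted is None:  # the model answered in plain text instead of calling a tool
        return False
    return (
        predicted["name"] == expected["name"]
        and predicted["arguments"] == expected["arguments"]
    )


def accuracy(predictions: list, expectations: list) -> float:
    """Fraction of benchmark items whose predicted call matches the expected call."""
    if not expectations:
        return 0.0
    hits = sum(is_correct_call(p, e) for p, e in zip(predictions, expectations))
    return hits / len(expectations)


# The first item mirrors the benchmark example; the second item is made up.
expected: list[dict[str, Any]] = [
    {"name": "get_spaces", "arguments": {"team_id": "team123", "archived": False}},
    {"name": "get_space", "arguments": {"space_id": "sp1"}},
]
predicted: list[dict[str, Any]] = [
    {"name": "get_spaces", "arguments": {"team_id": "team123", "archived": False}},
    {"name": "get_space", "arguments": {"space_id": "wrong"}},
]
print(accuracy(predicted, expected))  # 0.5
```

Exact matching is strict: a call with one wrong boolean counts as a miss, which is why the nested `features` arguments make some benchmark items much harder than others.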
Baseline (No System Prompt)
We first evaluate GPT-4's performance on its own, without the influence of any system prompt.
# Baseline call: empty system prompt, tool schemas passed through unchanged
fcalling_llm = lambda fprompt: client.chat.completions.create(
    model="gpt-4-turbo-preview",
    messages=[
        {
            "role": "system",
            "content": """""",
        },
        {
            "role": "user",
            "content": fprompt,  # the benchmark prompt passed to the lambda
        },
    ],
    temperature=0,
    max_tokens=4096,
    top_p=1,
    tools=tools,
    tool_choice="auto",
)
response = fcalling_llm(bench_data[1]["prompt"])
We set temperature=0 to reduce randomness in the results. Across 3 runs, the average accuracy was 30%.
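To score a response, the function name and arguments have to be pulled out of it first. The sketch below assumes the response follows the shape returned by the OpenAI Python SDK's chat completions (`choices[0].message.tool_calls[i].function` with `name` and a JSON-string `arguments`); the helper name `extract_tool_call` is hypothetical, and the `SimpleNamespace` object is only a stand-in so the example runs without an API key.

```python
import json
from types import SimpleNamespace


def extract_tool_call(response):
    """Return {"name": ..., "arguments": {...}} for the first tool call, or None."""
    message = response.choices[0].message
    if not message.tool_calls:  # None or empty when the model replied in plain text
        return None
    call = message.tool_calls[0].function
    try:
        arguments = json.loads(call.arguments)  # arguments arrive as a JSON string
    except json.JSONDecodeError:
        return None  # treat malformed JSON as a failed call
    return {"name": call.name, "arguments": arguments}


# Stand-in object that mimics the response shape (not a real API response).
fake_function = SimpleNamespace(
    name="get_spaces",
    arguments='{"team_id": "team123", "archived": false}',
)
fake_response = SimpleNamespace(
    choices=[
        SimpleNamespace(
            message=SimpleNamespace(tool_calls=[SimpleNamespace(function=fake_function)])
        )
    ]
)
print(extract_tool_call(fake_response))
# {'name': 'get_spaces', 'arguments': {'team_id': 'team123', 'archived': False}}
```

Returning `None` for both "no tool call" and "unparseable arguments" keeps the scoring simple, since either case counts as a miss.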
Flattening the Parameters
Some functions require their arguments to be supplied as a nested structure. Here is an example:
{
"name": "create_space",
"description": "Add a new Space to a Workspace.",
"parameters": {
"type": "object",
"properties": {
"team_id": {
"type": "string",
"description": "The ID of the team"
},
"name": {
"type": "string",
"description": "The name of the new space"
},
"multiple_assignees": {
"type": "boolean",
"description": "Enable or disable multiple assignees for tasks within the space"
},
"features": {
"type": "object",
"description": "Enabled features within the space",
"properties": {
"due_dates": {
"type": "object",
"description": "Due dates feature settings",
"properties": {
"enabled": { "type": "boolean" },
"start_date": { "type": "boolean" },
"remap_due_dates": { "type": "boolean" },
"remap_closed_due_date": { "type": "boolean" }
}
},
"time_tracking": {
"type": "object",
"description": "Time tracking feature settings",
"properties": {
"enabled": { "type": "boolean" }
}
}
}
}
},
"required": ["team_id", "name", "multiple_assignees", "features"]
}
}
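In our experience with LLMs, although the model (GPT-4) has been optimized for structured output, a complex output structure can actually reduce the performance and accuracy of the LLM's output.
We therefore flatten these parameters, prefixing each one with the names of its parents.
After flattening, the function above looks like this: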
{
"description": "Add a new Space to a Workspace.",
"name": "create_space",
"parameters": {
"properties": {
"features__due_dates__enabled": {
"description": "enabled__Due dates feature settings__Enabled features within the space__",
"type": "boolean"
},
"features__due_dates__remap_closed_due_date": {
"description": "remap_closed_due_date__Due dates feature settings__Enabled features within the space__",
"type": "boolean"
},
"features__due_dates__remap_due_dates": {
"description": "remap_due_dates__Due dates feature settings__Enabled features within the space__",
"type": "boolean"
},
"features__due_dates__start_date": {
"description": "start_date__Due dates feature settings__Enabled features within the space__",
"type": "boolean"
},
"features__time_tracking__enabled": {
"description": "enabled__Time tracking feature settings__Enabled features within the space__",
"type": "boolean"
},
"multiple_assignees": {
"description": "Enable or disable multiple assignees for tasks within the space__",
"type": "boolean"
},
"name": {
"description": "The name of the new space__",
"type": "string"
},
"team_id": {
"description": "The ID of the team__",
"type": "string"
}
},
"required": [
"team_id",
"name",
"multiple_assignees",
"features__due_dates__enabled",
"features__due_dates__start_date",
"features__due_dates__remap_due_dates",
"features__due_dates__remap_closed_due_date",
"features__time_tracking__enabled"
],
"type": "object"
}
}
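The transformation from the nested schema to the flattened one can be sketched as below. This is an illustrative reconstruction, not the authors' code: the function names are hypothetical, the description format (each level's description, or its name when none is given, followed by `__`) is inferred from the flattened example, and it assumes every leaf under a required object is itself required. `unflatten_arguments` shows how the model's flat arguments could be turned back into the nested form the real API expects.

```python
def flatten_properties(properties, name_prefix="", parent_desc=""):
    """Return {flat_name: leaf_schema} for a possibly nested 'properties' dict."""
    flat = {}
    for name, spec in properties.items():
        flat_name = f"{name_prefix}{name}"
        # Fall back to the parameter name when a level has no description.
        description = f"{spec.get('description', name)}__{parent_desc}"
        if spec.get("type") == "object" and "properties" in spec:
            flat.update(
                flatten_properties(spec["properties"], f"{flat_name}__", description)
            )
        else:
            flat[flat_name] = {"type": spec["type"], "description": description}
    return flat


def flatten_function(schema):
    """Flatten one function schema; leaves of a required object become required."""
    params = schema["parameters"]
    flat_props = flatten_properties(params["properties"])
    required = [
        flat_name
        for name in params.get("required", [])
        for flat_name in flat_props
        if flat_name == name or flat_name.startswith(f"{name}__")
    ]
    return {
        "name": schema["name"],
        "description": schema["description"],
        "parameters": {"type": "object", "properties": flat_props, "required": required},
    }


def unflatten_arguments(flat_args):
    """Rebuild nested arguments from 'a__b__c' style keys."""
    nested = {}
    for flat_name, value in flat_args.items():
        *parents, leaf = flat_name.split("__")
        node = nested
        for parent in parents:
            node = node.setdefault(parent, {})
        node[leaf] = value
    return nested


nested_schema = {
    "name": "create_space_tag",
    "description": "Add a new task Tag to a Space.",
    "parameters": {
        "type": "object",
        "properties": {
            "space_id": {"type": "string", "description": "The ID of the space"},
            "tag": {
                "type": "object",
                "description": "Tag settings",  # hypothetical description
                "properties": {"name": {"type": "string"}},
            },
        },
        "required": ["space_id", "tag"],
    },
}
flat = flatten_function(nested_schema)
print(sorted(flat["parameters"]["properties"]))  # ['space_id', 'tag__name']
print(unflatten_arguments({"space_id": "s1", "tag__name": "urgent"}))
# {'space_id': 's1', 'tag': {'name': 'urgent'}}
```

One caveat of this scheme: it relies on no original parameter name containing a double underscore, otherwise the flat names can no longer be split back unambiguously.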
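We join each parameter name to the names of its parents, for example features__due_dates__enabled and features__due_dates__remap_due_dates. The results of the 3 test runs were as follows:
[Figure: results of the 3 test runs, not available]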
Adding a System Prompt
The earlier calls used no system prompt, so the LLM was never told its role or how it should interact with the ClickUp API. Now let's add a simple system prompt.
from openai import OpenAI
client = OpenAI()
fcalling_llm = lambda fprompt : client.chat.completions.create(
model="gpt-4-turbo-preview",
messages=[
{
"role": "system",
"content": """
You are an agent who is responsible for managing various employee management platform,
one of which is ClickUp.
When you are presented with a technical situation, that a person of a team is facing,
you must give the solution utilizing your functionalities.
"""
},
{
"role": "user",
"content": fprompt
},
],
temperature=0,
max_tokens=4096,
top_p=1,
tools=tools,
tool_choice="auto"
)
response = fcalling_llm(bench_data[1]["prompt"])
Improving the System Prompt
Adding a system prompt improved performance. Next we add more detail to it to see whether the gains continue.
You are an agent who is responsible for managing various employee management platform,
one of which is ClickUp.
You are given a number of tools as functions, you must use one of those tools and fill up
all the parameters of those tools, whose answers you will get from the given situation.
When you are presented with a technical situation, that a person of a team is facing,
you must give the solution utilizing your functionalities.
First analyze the given situation to fully understand what is the intention of the user,
what they need and exactly which tool will fill up that necessity.
Then look into the parameters and extract all the relevant information to fill up the
parameter with right values.
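Adding a Schema Summary to the System Prompt
Beyond the role, we also describe what each tool does in the system prompt to strengthen it further.
Below is the concise summary of the tools that we added to the prompt.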
get_spaces - View the Spaces available in a Workspace.
create_space - Add a new Space to a Workspace.
get_space - View the details of a specific Space in a Workspace.
update_space - Rename, set the Space color, and enable ClickApps for a Space.
delete_space - Delete a Space from your Workspace.
get_space_tags - View the task Tags available in a Space.
create_space_tag - Add a new task Tag to a Space.
delete_space_tag - Delete a task Tag from a Space.
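A summary like the one above does not have to be written by hand; it can be generated from the schema itself. The sketch below is one possible way to do it. The helper names and the "Available tools:" heading are assumptions, and the schema is taken to be a list of dicts with `name` and `description` keys.

```python
def build_schema_summary(schema):
    """One 'name - description' line per function, in schema order."""
    return "\n".join(f"{fn['name']} - {fn['description']}" for fn in schema)


def build_system_prompt(base_prompt, schema):
    """Append the tool summary to the role description."""
    return f"{base_prompt.strip()}\n\nAvailable tools:\n{build_schema_summary(schema)}"


# Two entries copied from the summary above, for illustration only.
demo_schema = [
    {"name": "get_spaces", "description": "View the Spaces available in a Workspace."},
    {"name": "create_space", "description": "Add a new Space to a Workspace."},
]
print(build_system_prompt("You are an agent who manages ClickUp.", demo_schema))
```

Generating the summary from the schema keeps the prompt and the `tools` argument in sync when a function is renamed or its description changes.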
Optimising Function Names
Now let's make the function names more descriptive.
schema_func_name_dict = {
"get_spaces": "get_all_clickup_spaces_available",
"create_space": "create_a_new_clickup_space",
"get_space": "get_a_specific_clickup_space_details",
"update_space": "modify_an_existing_clickup_space",
"delete_space": "delete_an_existing_clickup_space",
"get_space_tags": "get_all_tags_of_a_clickup_space",
"create_space_tag": "assign_a_tag_to_a_clickup_space",
"delete_space_tag": "remove_a_tag_from_a_clickup_space",
}
optimized_schema = []
for sc in flattened_schema:
    # Shallow copy is enough here: only the top-level "name" key is replaced
    temp_dict = sc.copy()
    temp_dict["name"] = schema_func_name_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)
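Once the functions are renamed, the model returns the new names, so a benchmark that stores the original names needs a reverse lookup before scoring or dispatching. A minimal sketch, assuming the renaming is one-to-one; `original_name_of` and `to_original_name` are hypothetical names, and only two entries of the mapping are repeated here to keep the example self-contained.

```python
# Two entries from the renaming table, repeated so this block runs on its own.
schema_func_name_dict = {
    "get_spaces": "get_all_clickup_spaces_available",
    "create_space": "create_a_new_clickup_space",
}

# Invert the mapping: descriptive name -> original name.
original_name_of = {new: old for old, new in schema_func_name_dict.items()}

# Guard against two functions being renamed to the same descriptive name.
assert len(original_name_of) == len(schema_func_name_dict)


def to_original_name(returned_name):
    """Map a name returned by the model back to the original function name."""
    return original_name_of.get(returned_name, returned_name)


print(to_original_name("create_a_new_clickup_space"))  # create_space
```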
Optimising Function Descriptions
schema_func_decription_dict = {
"get_spaces": "Retrieves information of all the spaces available in user's Clickup Workspace.",
"create_space": "Creates a new ClickUp space",
"get_space": "Retrieves information of a specific Clickup space",
"update_space": "Modifies the name, the Space color, and the assignee management settings of a Space.",
"delete_space": "Delete an existing space from user's ClickUp Workspace",
"get_space_tags": "Retrieves all the Tags assigned on all the tasks in a Space.",
"create_space_tag": "Assigns a customized Tag in a ClickUp Space.",
"delete_space_tag": "Deletes a specific tag previously assigned in a space.",
}
optimized_schema = []
for sc in flattened_schema:
    # Replace only the top-level function description
    temp_dict = sc.copy()
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    optimized_schema.append(temp_dict)
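Optimising Function Parameter Descriptions
Previously, we flattened the schema by stacking each nested parameter's description onto the descriptions of its parents until everything was flat.
Now let's replace those stacked descriptions with: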
schema_func_params_dict = {
'create_space': {
'features__due_dates__enabled': 'If due date feature is enabled within the space. Default: True',
'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
'features__time_tracking__enabled': 'If time tracking feature is available within the space. Default: True',
'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
'name': 'The name of the new space to create',
'team_id': 'The ID of the team'
},
'create_space_tag': {
'space_id': 'The ID of the space',
'tag__name': 'The name of the tag to assign',
'tag__tag_bg': 'The background color of the tag to assign',
'tag__tag_fg': 'The foreground(text) color of the tag to assign'
},
'delete_space': {
'space_id': 'The ID of the space to delete'
},
'delete_space_tag': {
'space_id': 'The ID of the space',
'tag__name': 'The name of the tag to delete',
'tag__tag_bg': 'The background color of the tag to delete',
'tag__tag_fg': 'The foreground color of the tag to delete',
'tag_name': 'The name of the tag to delete'
},
'get_space': {
'space_id': 'The ID of the space to retrieve details'
},
'get_space_tags': {
'space_id': 'The ID of the space to retrieve all the tags from'
},
'get_spaces': {
'archived': 'A flag to decide whether to include archived spaces or not. Default: True',
'team_id': 'The ID of the team'
},
'update_space': {
'admin_can_manage': 'A flag to determine if the administrator can manage the space or not. Default: True',
'color': 'The color used for the space',
'features__due_dates__enabled': 'If due date feature is enabled within the space. Default: True',
'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
'features__time_tracking__enabled': 'If time tracking feature is available within the space. Default: True',
'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space. Default: True',
'name': 'The new name of the space',
'private': 'A flag to determine if the space is private or not. Default: False',
'space_id': 'The ID of the space'
}
}
import copy

optimized_schema = []
for sc in flattened_schema:
    # Deep-copy so the nested "parameters" dict in flattened_schema is not mutated
    temp_dict = copy.deepcopy(sc)
    temp_dict["description"] = schema_func_decription_dict[temp_dict["name"]]
    for func_param_name, func_param_description in schema_func_params_dict[temp_dict["name"]].items():
        temp_dict["parameters"]["properties"][func_param_name]["description"] = func_param_description
    optimized_schema.append(temp_dict)
Across all runs, we scored 75% or higher.
Adding Examples of Function Calls
LLMs generally perform better with few-shot examples, so we provide some example calls in each tool's description.
schema_func_decription_dict = {
"get_spaces": """\
Retrieves information of all the spaces available in user's Clickup Workspace. Example Call:
\```python
get_spaces({'team_id': 'a1b2c3d4', 'archived': False})
\```
""",
"create_space": """\
Creates a new ClickUp space. Example Call:
\```python
create_space ({
'team_id': 'abc123',
'name': 'NewWorkspace',
'multiple_assignees': True,
'features__due_dates__enabled': True,
'features__due_dates__start_date': False,
'features__due_dates__remap_due_dates': False,
'features__due_dates__remap_closed_due_date': False,
'features__time_tracking__enabled': True
})
\```
""",
}
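Unfortunately, the score seems to have dropped!
Adding Example Parameter Values
Since example calls in the function descriptions did not help, let's now try adding example values to the function parameters to give a clearer idea of the expected inputs. We adjust the parameter descriptions accordingly.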
schema_func_params_dict = {
'create_space': {
'features__due_dates__enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
'features__due_dates__start_date': 'If start date feature in due dates is available within the space. \nExample: True, False \nDefault: False',
'features__time_tracking__enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
'name': 'The name of the new space to create \nExample: \'NewWorkspace\', \'TempWorkspace\'',
'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\' '
},
'create_space_tag': {
'space_id': 'The ID of the space \nExample: \'abc123\', \'def456\'',
'tag__name': 'The name of the tag to assign \nExample: \'NewTag\', \'TempTag\'',
'tag__tag_bg': 'The background color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\'',
'tag__tag_fg': 'The foreground(text) color of the tag to assign \nExample: \'#FF0000\', \'#00FF00\''
},
'delete_space': {
'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\''
},
'delete_space_tag': {
'space_id': 'The ID of the space to delete \nExample: \'abc123\', \'def456\'',
'tag__name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\'',
'tag__tag_bg': 'The background color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
'tag__tag_fg': 'The foreground color of the tag to delete \nExample: \'#FF0000\', \'#00FF00\', \'#0000FF\'',
'tag_name': 'The name of the tag to delete \nExample: \'NewTag\', \'TempTag\''
},
'get_space': {
'space_id': 'The ID of the space to retrieve details \nExample: \'abc123\', \'def456\''
},
'get_space_tags': {
'space_id': 'The ID of the space to retrieve all the tags from \nExample: \'abc123\', \'def456\''
},
'get_spaces': {
'archived': 'A flag to decide whether to include archived spaces or not \nExample: True, False. Default: True',
'team_id': 'The ID of the team \nExample: \'abc123\', \'def456\''
},
'update_space': {
'admin_can_manage': 'A flag to determine if the administrator can manage the space or not \nExample: True, False. Default: True',
'color': 'The color used for the space \nExample: \'#FF0000\', \'#00FF00\'',
'features__due_dates__enabled': 'If due date feature is enabled within the space. \nExample: True, False \nDefault: True',
'features__due_dates__remap_closed_due_date': 'If remapping closed date feature in due dates is available within the space. Default: False',
'features__due_dates__remap_due_dates': 'If remapping due date feature in due dates is available within the space. Default: False',
'features__due_dates__start_date': 'If start date feature in due dates is available within the space. Default: False',
'features__time_tracking__enabled': 'If time tracking feature is available within the space. \nExample: True, False \nDefault: True',
'multiple_assignees': 'Enable or disable multiple assignees for tasks within the space \nExample: True, False. Default: True',
'name': 'The new name of the space \nExample: \'NewWorkspace\', \'TempWorkspace\'',
'private': 'A flag to determine if the space is private or not \nExample: True, False. Default: False',
'space_id': 'The ID of the space to update \nExample: \'abc123\', \'def456\''
}
}
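Descriptions in this "text, Example, Default" shape are repetitive to write by hand and easy to get inconsistent. The sketch below shows one way they could be generated; `describe_param` is a hypothetical helper, and the separator is chosen to mimic the strings in the dictionary above.

```python
def describe_param(text, examples=(), default=None):
    """Build a parameter description with optional example values and a default."""
    parts = [text]
    if examples:
        parts.append("Example: " + ", ".join(repr(e) for e in examples))
    if default is not None:
        parts.append(f"Default: {default!r}")
    return " \n".join(parts)


print(describe_param("The ID of the team", ["abc123", "def456"]))
print(
    describe_param(
        "If time tracking feature is available within the space.",
        [True, False],
        default=True,
    )
)
```

Using `repr` keeps string examples quoted and booleans bare, matching the way the values appear in the hand-written descriptions.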
Summary: