【GUI-Agent】阶跃星辰 GUI-MCP 解读---(6)---HITL(Human In The Loop)

OP0D4LR3u

328人浏览 · 2026-03-29 15:07:57

OP0D4LR3u · 2026-03-29 15:07:57 发布

【GUI-Agent】阶跃星辰 GUI-MCP 解读---(6)---HITL(Human In The Loop)

0x00 摘要

25年底，阶跃星辰升级发布了全新的AI Agent系列模型Step-GUI，包括云端模型Step-GUI、首个面向GUI Agent的MCP协议：GUI-MCP（Graphical User Interface - Model Context Protocol），这是首个专为图形用户界面自动化而设计的 MCP 实现，兼顾标准化与隐私保护。

因此，我们就来解读这个MCP协议，顺便看看端侧Agent的实现架构。本文是第六篇，主要是介绍Step-GUI的HITL，以及其他特殊之处。

因为是反推解读，而且时间有限，所以可能会有各种错误，还请大家不吝指出。

0x01 HITL

1.1 HITL的意义

Human-in-the-loop（简称HITL）是一种重新划分人类认知与机器能力边界、放大双方优势的系统设计理念。它的存在价值，可从三个核心维度展开：

突破技术天花板。再强大的模型，认知边界也局限于训练数据覆盖的范围——在这个范围内，它能展现出稳定的“自信”；可一旦遭遇罕见场景、长尾问题或是对抗性样本，其判断的可靠性便会急剧下降。而HITL的设计巧思正在于此：当机器的置信度低于预设阈值时，会自动将决策权移交人类。这就像给系统装上了一张“安全网”，稳稳接住机器力所不及的漏洞。
守住伦理与合规底线。算法的决策责任，永远无法转嫁到冰冷的硅片上。HITL特意保留了“人类确认”的关键环节，让整个算法决策链条中，始终存在一个可追溯、可追责的“自然人”主体，这是技术落地必须守住的伦理根基。
优化经济成本结构。HITL不是“全程人工介入”，而是一种“稀疏化参与”——用极少的人类工时投入，换取系统安全性的大幅提升，其ROI远高于全人工操作或纯机器自主决策的方案。说到底，HITL就是用“人类注意力”撬动“系统鲁棒性”的最优杠杆。

因此，若要让人类有效掌控任务走向，落实HITL理念的核心在于两点：

优化交互设计。通过合理的交互逻辑，让人类能够顺畅地参与到任务补充与推进的过程中，实现“需要时介入，介入时高效”。
保障对话连续性。在人类介入任务的过程中，不得中断当前对话链路，应采用“挂起等待”的模式，确保人机协同的连贯性与信息完整性。

1.2 Step-GUI HITL

Step-GUI中，HITL 信息获取能力如下：

上下文感知：
- INFO 操作能够根据当前上下文提出具体问题；
- auto_reply 函数利用当前截图和任务信息生成澄清问题；
信息类型多样化：
- 可以获取文本输入、确认信息、选择项等不同类型信息
- 可以通过 value 字段传递具体问题内容
- 支持多轮对话以获取复杂信息

0x02 MCP流程

2.1 流程图

流程图如下，该流程图展示了任务从启动到结束的完整流程：

任务启动后先判断会话 ID，新建 / 续领会话并初始化设备；
捕获截图→Agent 处理→生成动作，根据动作类型分 3 类处理（设备执行 / 人工介入 / 任务结束）；
人工介入支持自动回复、手动回复、转交客户端等模式，最终回到 Agent 循环；
任务结束 / 达到最大步数时，记录包含会话、设备、任务状态的日志。

hitl-1

2.2 时序图

时序图如下。该时序图清晰展示了「客户端 - MCP 服务器 - Agent 服务器 - 设备 - 人类」的协同交互流程，核心逻辑分为三阶段：

阶段 1：任务初始化

客户端向 MCP 服务器发起「启动新任务」请求（携带任务信息、设备 ID）；
MCP 服务器向 Agent 服务器申请会话 ID，Agent 服务器返回会话标识；
MCP 服务器指令设备重置环境（按 Home 键），并捕获设备初始截图。

阶段 2：Agent 动作循环（核心）

进入循环执行逻辑，直到任务结束 / 达到步数上限：

MCP 服务器将「会话 ID + 截图」传给 Agent 服务器，调用 automate_step 接口获取 Agent 动作；
分支处理：
- ✅ 若动作是INFO（需要人类介入）：
  - MCP 服务器通知客户端「任务暂停，需人工回复」，并返回会话 ID；
  - 人类向客户端提供回复内容；
  - 客户端携带「会话 ID + 人工回复」请求 MCP 服务器续跑任务；
  - MCP 服务器将「截图 + 人工回复」传给 Agent 服务器，继续执行动作；
- ❌ 若动作非INFO（如点击、输入等）：
  - MCP 服务器指令设备执行对应动作；
  - 等待指定延迟后，捕获设备新截图，进入下一轮循环。

阶段 3：任务终止

✅ 任务完成：Agent 服务器返回 COMPLETE 动作，MCP 服务器向客户端返回「任务成功完成」结果；
❌ 达到最大步数：MCP 服务器直接向客户端返回「步数超限」结果。

hitl-2

2.3 MCP 工具区别

ask_agent_continue 和 ask_agent_start_new_task 的业务逻辑区别如下：

特性	ask_agent_start_new_task	ask_agent_continue
环境重置	reset_environment = True	reset_environment = False
会话状态	创建新会话	继续现有会话
目标	开始新任务	继续已有任务
设备状态	重置到初始状态	保持当前状态

2.3.1 详细业务逻辑对比

ask_agent_start_new_task

@mcp.tool
def ask_agent_start_new_task(
    # ...参数
):
    # 启动新任务，重置环境
    # 重置设备到初始状态（按 HOME 键）
    # 创建全新的会话
    # 适用于独立的、全新的任务
    reset_environment = True  # 重置环境
    return_log = execute_task(
        device_id=device_id,
        task=task,
        reset_environment=reset_environment,  # 重置环境
    )
    return return_log

ask_agent_continue

@mcp.tool
def ask_agent_continue(
    # ... 参数
):
    # 继续任务，不重置环境
    # 保持设备当前状态
    # 基于之前的上下文继续执行
    # 适用于需要连续性的任务
    reset_environment = False  # 不重置环境
    return_log = execute_task(
        device_id=device_id,
        task=task,
        reset_environment=reset_environment,  # 不重置环境
    )
    return return_log

2.3.2 使用场景对比

ask_agent_start_new_task 使用场景

graph TD
A[用户发起新任务] --> B[是否与之前任务相关？]
B --> |无关/新任务| C[使用 ask_agent_start_new_task]
C --> D[重置环境到初始状态]
D --> E[启动全新任务]

适用场景：

完全独立的新任务
不同 App 的任务（如：从淘宝切换到微信）
需要干净环境的任务
错误后的重新开始

ask_agent_continue 使用场景

graph TD
A[用户继续任务] --> B{是否与之前任务相关?}
B -->|相关/继续| C[使用 ask_agent_continue]
C --> D[保持当前环境状态]
D --> E[基于上下文继续任务]

适用场景：

同一个任务的继续执行
需要保持当前应用状态
Human-in-the-Loop 后的继续
多步骤任务的后续步骤

2.3.3 业务逻辑实现差异

在 execute_task 中的处理

def execute_task(
# ...
 reset_environment: bool,  # 关键参数
# ...
):
 if reset_environment and session_id is None and task is not None:
     press_home_key(device_id, print_command=True)  # 重置设备
        
# session_id 为 None 时创建新会话

# session_id 存在时继续现有会话

在 gui_agent_loop 中的体现

def gui_agent_loop(
# ...
 reset_environment: bool = True,
 session_id: str = None,
# ...
):
 if reset_environment and session_id is None and task is not None:
     press_home_key(device_id, print_command=True)  # 重置环境
 if session_id is None:
    # 创建新会话
    session_id = agent_server.get_session({...})
 else:
    # 继续现有会话
    print(f"Continue Session ID: {session_id}")

Human-in-the-Loop 场景应用

任务中断后继续

# 第一步：开始任务，遇到INFO action
result = ask_agent_start_new_task(
    device_id=device_id,
    task="去淘宝帮我选一个生日礼物",
    # ...
)
# 返回：stop_reason="INFO_ACTION_NEEDS_REPLY", session_id="xxx"

# 第二步：用户提供回复后继续

result = ask_agent_continue(
    device_id=device_id,
    task=None,  # 不需要重新指定任务
    session_id="xxx",  # 使用之前的会话ID
    reply_from_client="铜苹果",  # 用户的回复
    # ...
)

多任务切换

# 开始任务A
result_a = ask_agent_start_new_task(
    device_id=device_id,
    task="打开微信并发送消息",
    # ...
)
# 完成后开始不相关的任务B
result_b = ask_agent_start_new_task(  # 使用 start_new_task 重置环境
    device_id=device_id,
    task="打开高德地图导航到公司",
    # ...
)

2.3.4 核心区别总结

环境状态：

ask_agent_start_new_task：重置设备环境到初始状态
ask_agent_continue：保持设备当前环境状态

会话管理：

ask_agent_start_new_task：创建新会话
ask_agent_continue：继续现有会话

使用时机：

ask_agent_start_new_task：新任务、独立任务、需要干净环境
ask_agent_continue：任务继续、保持上下文、Human-in-the-Loop

上下文连续性：

ask_agent_start_new_task：无上下文连续性
ask_agent_continue：保持任务上下文和应用状态

这种设计使得系统既能处理独立的离散任务，又能处理需要连续性的复杂任务，提高了任务执行的灵活性和效率。

2.4 代码

ask_agent_start_new_task 代码如下：

@mcp.tool
def ask_agent_start_new_task(

    device_id: Annotated[str, Field(description="ID of the device to perform the task on. listed by list_connected_devices tool.")],

    task: Annotated[str | None, Field(description="The task that the agent needs to perform on the mobile device. if this is not None, the agent will try to perform this task. if None, the session_id must be provided to continue the previous session.")],
    
    # reset_environment: Annotated[bool, Field(description="Whether to reset the environment before executing the task, close current app, and back to home screen. If you want to execute a independent task, set this to True will make it easy to execute. If you want to continue the previous session, set this to False.")] = False,

    max_steps: Annotated[int, Field(description="Maximum number of steps the agent can take to complete the task.")] = 20,

    # session_id: Annotated[str | None, Field(description="Optional, session ID must provide when the last task endwith INFO action and you want to reply, the session id and device id and the reply from client must be provided.")] = None,

    # When the INFO action is called, how to handle it.
    # 1. "auto_reply": the INFO action will be handled automatically by calling the caption model to generate image captions.
    # 2. "no_reply": the INFO action will be ignored. THE AGENT MAY GET STUCK IF THE INFO ACTION IS IGNORED.
    # 3. "manual_reply": the INFO action will cause an interruption, and the user needs to provide the reply manually by input things in server's console.
    # 4. "pass_to_client": the INFO action will be returned to the MCP client to handle it. 
#     reply_mode: Annotated[str, Field(description='''
#         How to handle the INFO action during task execution.
        
#         Options:
#             - "auto_reply": Automatically generate image captions for INFO actions.
#             - "no_reply": Ignore INFO actions (may cause the agent to get stuck).
#             - "manual_reply": Interrupt and require user input for INFO actions.
#             - "pass_to_client": Pass INFO actions to the MCP client for handling.
# ''')] = "auto_reply",

    # reply_from_client: Annotated[str | None, Field(description="If the last task is ended with INFO action, and you want to give GUI agent a reply, provide the reply here. If you do so, you must provide last session id and last device id.")] = None,
) -> dict:

    """
# Ask GUI Agent to start performing a new task on a connected device.

Ask the GUI agent to perform the specified task on a connected device.
The GUI Agent can be able to understand natural language instructions and interact with the device accordingly.
The agent will be able to execute a high-level task description，if you have any additional requirements, write them down in detail at tast string.
This function will reset the environment before executing the task, close current app, and back to home screen.

if you have 

## The agent has the below limited capabilities:

1. The task must be related to an app that is already installed on the device. for example, "打开微信，帮我发一条消息给张三，说今天下午三点开会"; "帮我在淘宝上搜索一款性价比高的手机，并加入购物车"; "to purchase an ea on Amazon".

2. The task must be simple and specific. for example, "do yyy in xxx app"; "find xxx information in xxx app". ONE THING AT ONE APP AT A TIME.

3. The agent may not be able to handle complex tasks that require multi-step reasoning or planning. for example. You may need to break down complex tasks into simpler sub-tasks and ask the agent to perform them sequentially. For example, instead of asking the agent to "plan a trip to Paris for xxx", you can ask it to "search for flights to Paris on xxx app", "find hotels in Paris on xxx app", make the plan yourself and ask agent to "sent the plan to xxx via IM app like wechat".

4. The agent connot accept multimodal inputs now. if you want to provide additional information like screenshot captions, please include them in the task description.

## Usage guidance：

1. you should never directly ask an Agent to pay or order anything. If user want to make a purchase, you should ask agent to stop brfore ordering/paying, and let user to order/pay.

2. tell the agent, if human verification is appeared during the task execution, the agent should ask Client. when the you see the INFO, you should ask user to handle the verification manually. after user says "done", you can continue the task with the session_id and device_id and ask the agent to continue in reply_from_client.

3. IF the last agentic call is failed or you want to perform a new task in different app, you should always use this function to start a new task, so that the environment will be reset before executing the task.

Returns:
    dict: Execution log containing details of the task execution.
    with keys including
        - device_info: Information about the device used for task execution.
        - final_action: The final action taken by the agent to complete the task.
        - global_step_idx: The total number of steps taken during the task execution.
        - local_step_idx: The number of steps taken in the current session.
        - session_id: The session ID for maintaining context across multiple tasks.
        - stop_reason: The reason for stopping the task execution (e.g., TASK_COMPLETED_SUCCESSFULLY).
        - task: The original task description provided to the agent.
    """

    reply_mode = "pass_to_client"

    # if task is not None:
    #     assert session_id is None, "If task is provided, session_id must be None."
    #     # New task, so reset_environment is True
    #     reset_environment = True
    # else:
    #     assert session_id is not None, "If task is None, session_id must be provided to continue the previous session."
    #     # Continuing previous session, so reset_environment is False
    #     reset_environment = False

    reset_environment = True
    

    return_log = execute_task(
        device_id=device_id,

        task=task,

        reset_environment=reset_environment,
        max_steps=max_steps,

        # enable_intermediate_logs=False,
        # enable_intermediate_image_caption=False,
# 
        enable_intermediate_logs=True,
        # enable_intermediate_image_caption=False,
        enable_intermediate_image_caption=True,

        enable_intermediate_screenshots=False,

        enable_final_screenshot=False,
        # enable_final_image_caption=False,
        enable_final_image_caption=True,

        reply_mode=reply_mode,

        session_id=None,
        # session_id=session_id,
        reply_from_client=None,
        # reply_from_client=reply_from_client,


    )

    return return_log

ask_agent_continue 代码如下：

@mcp.tool
def ask_agent_continue(
    device_id: Annotated[str, Field(description="ID of the device to perform the task on. listed by list_connected_devices tool.")],

    task: Annotated[str | None, Field(description="The task that the agent needs to perform on the mobile device. if this is not None, the agent will try to perform this task. if None, the session_id must be provided to continue the previous session.")],
    
    # reset_environment: Annotated[bool, Field(description="Whether to reset the environment before executing the task, close current app, and back to home screen. If you want to execute a independent task, set this to True will make it easy to execute. If you want to continue the previous session, set this to False.")] = False,

    max_steps: Annotated[int, Field(description="Maximum number of steps the agent can take to complete the task.")] = 20,

    # session_id: Annotated[str | None, Field(description="Optional, session ID must provide when the last task endwith INFO action and you want to reply, the session id and device id and the reply from client must be provided.")] = None,

    # When the INFO action is called, how to handle it.
    # 1. "auto_reply": the INFO action will be handled automatically by calling the caption model to generate image captions.
    # 2. "no_reply": the INFO action will be ignored. THE AGENT MAY GET STUCK IF THE INFO ACTION IS IGNORED.
    # 3. "manual_reply": the INFO action will cause an interruption, and the user needs to provide the reply manually by input things in server's console.
    # 4. "pass_to_client": the INFO action will be returned to the MCP client to handle it. 
#     reply_mode: Annotated[str, Field(description='''
#         How to handle the INFO action during task execution.
        
#         Options:
#             - "auto_reply": Automatically generate image captions for INFO actions.
#             - "no_reply": Ignore INFO actions (may cause the agent to get stuck).
#             - "manual_reply": Interrupt and require user input for INFO actions.
#             - "pass_to_client": Pass INFO actions to the MCP client for handling.
# ''')] = "auto_reply",

    # reply_from_client: Annotated[str | None, Field(description="If the last task is ended with INFO action, and you want to give GUI agent a reply, provide the reply here. If you do so, you must provide last session id and last device id.")] = None,
) -> dict:

    """
# Ask GUI Agent to continue performing a task on a connected device, using previous context.

Ask the GUI agent to perform the specified task on a connected device.
The GUI Agent can be able to understand natural language instructions and interact with the device accordingly.
The agent will be able to execute a high-level task description，if you have any additional requirements, write them down in detail at tast string.
This function will **NOT** reset the environment before executing the task, so that the agent can continue the previous session.

if you have 

## The agent has the below limited capabilities:

1. The task must be related to an app that is already installed on the device. for example, "打开微信，帮我发一条消息给张三，说今天下午三点开会"; "帮我在淘宝上搜索一款性价比高的手机，并加入购物车"; "to purchase an ea on Amazon".

2. The task must be simple and specific. for example, "do yyy in xxx app"; "find xxx information in xxx app". ONE THING AT ONE APP AT A TIME.

3. The agent may not be able to handle complex tasks that require multi-step reasoning or planning. for example. You may need to break down complex tasks into simpler sub-tasks and ask the agent to perform them sequentially. For example, instead of asking the agent to "plan a trip to Paris for xxx", you can ask it to "search for flights to Paris on xxx app", "find hotels in Paris on xxx app", make the plan yourself and ask agent to "sent the plan to xxx via IM app like wechat".

4. The agent connot accept multimodal inputs now. if you want to provide additional information like screenshot captions, please include them in the task description.

## Usage guidance：

1. you should never directly ask an Agent to pay or order anything. If user want to make a purchase, you should ask agent to stop brfore ordering/paying, and let user to order/pay.

2. tell the agent, if human verification is appeared during the task execution, the agent should ask Client. when the you see the INFO, you should ask user to handle the verification manually. after user says "done", you can continue the task with the session_id and device_id and ask the agent to continue in reply_from_client.

3. IF the last agentic call is successful or the last action is INFO or the new task is related to the previous task, you can use this function to continue the task, so that the agent can finish the task faster by leveraging the previous context.
    dict: Execution log containing details of the task execution.
    with keys including
        - device_info: Information about the device used for task execution.
        - final_action: The final action taken by the agent to complete the task.
        - global_step_idx: The total number of steps taken during the task execution.
        - local_step_idx: The number of steps taken in the current session.
        - session_id: The session ID for maintaining context across multiple tasks.
        - stop_reason: The reason for stopping the task execution (e.g., TASK_COMPLETED_SUCCESSFULLY).
        - task: The original task description provided to the agent.
    """

    reply_mode = "pass_to_client"

    # if task is not None:
    #     assert session_id is None, "If task is provided, session_id must be None."
    #     # New task, so reset_environment is True
    #     reset_environment = True
    # else:
    #     assert session_id is not None, "If task is None, session_id must be provided to continue the previous session."
    #     # Continuing previous session, so reset_environment is False
    #     reset_environment = False

    reset_environment = False    

    return_log = execute_task(
        device_id=device_id,

        task=task,

        reset_environment=reset_environment,
        max_steps=max_steps,

        # enable_intermediate_logs=False,
        # enable_intermediate_image_caption=False,
# 
        enable_intermediate_logs=True,
        enable_intermediate_image_caption=True,

        enable_intermediate_screenshots=False,

        enable_final_screenshot=False,
        # enable_final_image_caption=False,
        enable_final_image_caption=True,

        reply_mode=reply_mode,

        session_id=None,
        # session_id=session_id,
        reply_from_client=None,
        # reply_from_client=reply_from_client,
    )

    return return_log

0x03 INFO 操作

3.1 INFO 操作的核心特性

INFO交互模式特殊性如下：

用户输入请求：INFO 操作是唯一需要用户主动输入的交互模式，与 CLICK、TYPE、AWAKE 等自动执行操作不同，INFO 需要中断自动化流程以获取用户反馈。
任务暂停机制：当执行 INFO 操作时，自动化流程暂停，系统会等待用户提供必要信息后继续执行，防止因缺少关键信息导致的错误操作

3.2 处理策略

INFO 操作有多种处理策略，具体在 reply_mode 中设置：

auto_reply：自动调用模型生成回复
no_reply：忽略 INFO 操作，可能导致代理卡住
manual_reply：手动输入回复
pass_to_client：将 INFO 操作传递给 MCP 客户端处理

何处设置 reply_mode？具体如下：

在 execute_task 函数中定义处理模式
gui_agent_loop 函数根据 reply_mode 执行相应逻辑
支持动态调整 INFO 操作处理方式

自动回复机制的细节如下：

auto_reply 函数结合当前任务、截图和 INFO 操作内容
使用 LLM 生成合适的回复内容
减少对用户手动输入的依赖

人工回复处理的细节如下：

manual_reply 模式下，程序暂停并等待用户输入
提供中英文提示信息来帮助用户理解需要回复的内容
验证用户输入的有效性

3.3 流程控制机制

INFO 的流程控制机制如下：

会话中断与恢复：
- INFO 操作触发时，stop_reason 设置为 INFO_ACTION_NEEDS_REPLY
- 保存当前会话状态，包括 session_id
- 支持后续使用相同 session_id 继续执行
回复传递机制：
- 用户回复通过 reply_from_client 参数传递
- 在 payload 中作为 query 字段传递给代理
- 代理将用户回复作为下一步操作的输入

3.4 INFO 操作的实现细节

INFO 操作的信息传递流程如下：

从代理到用户：
- 代理生成 INFO 操作并包含 value（问题内容）
- action['value'] 被显示给用户
- 用户输入回复内容
从用户到代理：
- 用户输入通过 reply_from_client 参数传递
- reply_info 变量存储用户回复
- 作为 query 字段传递给下一次 automate_step 调用

3.5 INFO 操作的应用场景

INFO 操作的应用场景可能如下：

人机协作场景

验证码处理：
- 当遇到图形验证码或短信验证码时触发 INFO 操作
- 代理请求用户提供验证码
- 用户输入验证码后代理继续执行
敏感操作确认：
- 在执行支付、删除等敏感操作前，代理可能通过 INFO 操作请求用户确认
- 避免自动化操作导致的意外后果

信息补充场景

个性化信息获取：
- 代理需要获取用户的个人信息如姓名、地址等
- 通过 INFO 操作请求用户提供特定信息
- 完成表单填写等任务
决策支持：
- 当面临多个选项需要用户选择时
- 代理通过 INFO 操作询问用户偏好
- 根据用户选择继续执行相应路径

3.6 代码

INFO的相关代码如下：

def gui_agent_loop( # 省略代码
        ):
    """
    Evaluate a task on a device using the provided frontend action converter and action function.    
    """
	
    # 省略代码
    
        action = uiTars_to_frontend_action(action)

        if action['action_type'].upper() == "INFO":
            if reply_mode == "auto_reply":
                print(f"AUTO REPLY INFO FROM MODEL!")
                reply_info = auto_reply(image_b64_url, task, action, model_provider=agent_loop_config['model_config']['model_provider'], model_name=agent_loop_config['model_config']['model_name'])
                print(f"info: {reply_info}")

            elif reply_mode == "no_reply":
                print(f"INFO action ignored as per reply_mode=no_reply. Agent may get stuck.")
                reply_info = "Please follow the task and continue. Don't ask further questions."
                # do nothing, agent may get stuck
            elif reply_mode == "manual_reply":
                print(f"EN: Agent asks: {action['value']} Please Reply: ")
                print(f"ZH: Agent 问你: {action['value']} 回复一下：")
                reply_info = input("Your reply:")
                print(f"Replied info action: {reply_info}")
            elif reply_mode == "pass_to_client":
                print(f"Passing INFO action to client for reply.")
                # break the loop and return to client for handling
                stop_reason = "INFO_ACTION_NEEDS_REPLY"
                break
            else:
                raise ValueError(f"Unknown reply_mode: {reply_mode}")
    # 省略代码
    
        act_on_device(action, device_id, device_wm_size, print_command=True, reflush_app=reflush_app)

        history_actions.append(action)        
        
    # 省略代码

    if stop_reason in ['MANUAL_STOP_SCREEN_OFF', 'INFO_ACTION_NEEDS_REPLY', "NOT_STARTED"]:
        pass
    elif  action['action_type'].upper() == 'COMPLETE':
        stop_reason = "TASK_COMPLETED_SUCCESSFULLY"
    elif action['action_type'].upper() == 'ABORT':
        stop_reason = "TASK_ABORTED_BY_AGENT"
    elif step_idx == max_steps - 1:
        stop_reason = "MAX_STEPS_REACHED"

    return return_log

0x04 auto_reply 函数

4.1 作用

auto_reply 函数的作用如下：

信息处理功能
- 处理由 GUI Agent 发起的 INFO 操作
- 通过大语言模型自动生成对用户问题的回复
- 模拟用户角色，根据当前任务和页面内容提供简洁直接的答案
输入处理
- 接收当前页面截图的 URL
- 接收任务描述
- 接收 Agent 的询问内容
- 接收模型提供商和模型名称

输出生成
- 生成简洁明确的回复内容
- 避免多余的解释或礼貌用语

AtomGit开源社区

AtomGit 是由开放原子开源基金会联合 CSDN 等生态伙伴共同推出的新一代开源与人工智能协作平台。平台坚持“开放、中立、公益”的理念，把代码托管、模型共享、数据集托管、智能体开发体验和算力服务整合在一起，为开发者提供从开发、训练到部署的一站式体验。

更多推荐

灵境AI-大模型介绍系列：GPT-5.5重塑专业AI体验，解锁高效工作新范式

从被动应答工具到主动思考智能体，GPT-5.5 实现跨越式升级！不止是参数迭代，更是直接颠覆普通人的工作模式。清晰需求+附件赋能+实时联网，一套组合拳下来，轻松搞定各类复杂工作，生产力直接拉满！AI增效时代已然来临！GPT-5.5 用全方位硬核升级，打破传统AI的所有短板，大幅降低专业工作门槛。不管是职场新人、资深从业者还是内容创作者，都能靠它极速提质增效，玩转全新人机协作，开启人人可复刻的开挂工

AtomGit开源社区

CANN ops-transformer：MoE 路由算子的负载均衡策略

AtomGit开源社区

GEFCom2012 负荷预测数据集介绍

GEFCom2012负荷预测数据集简介该数据集来自2012年全球能源预测竞赛，包含美国某电力公司20个区域及系统总负荷的每小时电力数据（单位：kW）。数据集分为训练期（2004-2008年）和预测期（2008年7月1周），包含负荷历史数据、气温数据、节假日信息及基准模型预测结果。主要特点包括：需同时预测21条时间序列包含8个回测周和1个预测周任务采用加权均方根误差(WRMSE)评分，不同任