Deep Reinforcement Learning (7): Multi-Agent Reinforcement Learning with IPPO and MADDPG



The multi-agent setting is more complex than the single-agent one, because each agent interacts directly or indirectly with the other agents while it interacts with the environment. Multi-agent reinforcement learning can be divided into the following categories:

  • Centralized reinforcement learning
    A single global learning unit carries out the learning task: it takes the overall state of the whole multi-agent system as input and outputs an action for each agent.
  • Independent reinforcement learning
    Each agent is an independent learner that considers only its own observations and its own interests.
  • Social reinforcement learning
    Combines independent reinforcement learning with social/economic models, simulating how individuals interact in human society and using methods from sociology and management science to regulate the relationships among agents.
  • Swarm reinforcement learning
    The centralized-training, decentralized-execution (CTDE) paradigm, which combines the advantages of centralized and independent learning. During training, agents learn jointly using global information; during execution, each agent selects actions using only its own observations and local information (see the sketch below).
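As a rough illustration of the CTDE idea before the concrete algorithms below, the following sketch shows a hypothetical setup in which a centralized critic sees the concatenated observations and actions of all agents during training, while each actor acts from its own local observation at execution time. The module names, dimensions, and variables (actors, critic, obs_dim, act_dim) are made up for illustration only and do not come from the original post.

import torch
import torch.nn as nn

# Hypothetical CTDE setup: names and dimensions are illustrative only.
n_agents, obs_dim, act_dim = 2, 8, 5

# Decentralized actors: one per agent, each maps a *local* observation to action logits.
actors = [nn.Linear(obs_dim, act_dim) for _ in range(n_agents)]

# Centralized critic: during training it sees the *joint* observations and actions.
critic = nn.Linear(n_agents * (obs_dim + act_dim), 1)

# Execution phase: each agent acts from its own observation only.
local_obs = [torch.randn(1, obs_dim) for _ in range(n_agents)]
actions = [torch.softmax(actor(o), dim=-1) for actor, o in zip(actors, local_obs)]

# Training phase: the critic is given global information (all observations and actions).
joint_input = torch.cat(local_obs + actions, dim=-1)
joint_value = critic(joint_input)  # shape (1, 1): a centralized value estimate
print(joint_value.shape)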

7.1 The IPPO Algorithm

  • For $N$ agents, initialize each agent's own policy and value function
  • for iteration $k = 0, 1, 2, \cdots$ do
    • All agents interact with the environment, each collecting its own trajectory
    • For each agent, compute advantage estimates with GAE based on its current value function
    • For each agent, update its policy by maximizing its PPO-clip objective (written out below)
    • For each agent, update its value function with a mean-squared-error loss
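For reference, the advantage estimate and the objective used in the loop above are the standard GAE and PPO-clip quantities, written here per agent $i$, with $V_\phi$ its value network, $\epsilon$ the clipping parameter, and the advantage accumulated from the TD errors $\delta_t$ (the symbols $\phi$ and $\theta_i$ are added here for notation):

$$\delta_t = r_t + \gamma V_\phi(s_{t+1}) - V_\phi(s_t), \qquad \hat A_t = \sum_{l=0}^{\infty}(\gamma\lambda)^l\,\delta_{t+l}$$

$$L^{\mathrm{CLIP}}(\theta_i) = \mathbb E_t\left[\min\left(\frac{\pi_{\theta_i}(a_t\mid o_t)}{\pi_{\theta_i^{\mathrm{old}}}(a_t\mid o_t)}\hat A_t,\ \mathrm{clip}\left(\frac{\pi_{\theta_i}(a_t\mid o_t)}{\pi_{\theta_i^{\mathrm{old}}}(a_t\mid o_t)},\,1-\epsilon,\,1+\epsilon\right)\hat A_t\right)\right]$$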

The Combat Environment

Code Implementation

Import the Combat environment

git clone https://github.com/boyu-ai/ma-gym.git 
import torch
import torch.nn.functional as F
import numpy as np
from tqdm import tqdm
import matplotlib.pyplot as plt
import sys
sys.path.append("../ma-gym")
from ma_gym.envs.combat.combat import Combat

The PPO algorithm

# PPO algorithm
class PolicyNet(torch.nn.Module):
    def __init__(self, state_dim, hidden_dim, action_dim):
        super(PolicyNet, self).__init__()
        self.fc1 = torch.nn.Linear(state_dim, hidden_dim)
        self.fc2 = torch.nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = torch.nn.Linear(hidden_dim, action_dim)

    def forward(self, x):
        x = F.relu(self.fc2(F.relu(self.fc1(x))))
        return F.softmax(self.fc3(x), dim=1)


class ValueNet(torch.nn.Module):
    def __init__(self, state_dim, hidden_dim):
        super(ValueNet, self).__init__()
        self.fc1 = torch.nn.Linear(state_dim, hidden_dim)
        self.fc2 = torch.nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = torch.nn.Linear(hidden_dim, 1)

    def forward(self, x):
        x = F.relu(self.fc2(F.relu(self.fc1(x))))
        return self.fc3(x)


def compute_advantage(gamma, lmbda, td_delta):
    # Generalized Advantage Estimation: accumulate TD errors backwards in time
    td_delta = td_delta.detach().numpy()
    advantage_list = []
    advantage = 0.0
    for delta in td_delta[::-1]:
        advantage = gamma * lmbda * advantage + delta
        advantage_list.append(advantage)
    advantage_list.reverse()
    return torch.tensor(advantage_list, dtype=torch.float)


# PPO with the clipped objective
class PPO:
    def __init__(self, state_dim, hidden_dim, action_dim, actor_lr, critic_lr,
                 lmbda, eps, gamma, device):
        self.actor = PolicyNet(state_dim, hidden_dim, action_dim).to(device)
        self.critic = ValueNet(state_dim, hidden_dim).to(device)
        self.actor_optimizer = torch.optim.Adam(self.actor.parameters(), actor_lr)
        self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), critic_lr)
        self.gamma = gamma
        self.lmbda = lmbda
        self.eps = eps  # clipping range parameter of PPO
        self.device = device

    def take_action(self, state):
        state = torch.tensor([state], dtype=torch.float).to(self.device)
        probs = self.actor(state)
        action_dist = torch.distributions.Categorical(probs)
        action = action_dist.sample()
        return action.item()

    def update(self, transition_dict):
        states = torch.tensor(transition_dict['states'], dtype=torch.float).to(self.device)
        actions = torch.tensor(transition_dict['actions']).view(-1, 1).to(self.device)
        rewards = torch.tensor(transition_dict['rewards'], dtype=torch.float).view(-1, 1).to(self.device)
        next_states = torch.tensor(transition_dict['next_states'], dtype=torch.float).to(self.device)
        dones = torch.tensor(transition_dict['dones'], dtype=torch.float).view(-1, 1).to(self.device)
        td_target = rewards + self.gamma * self.critic(next_states) * (1 - dones)
        td_delta = td_target - self.critic(states)
        advantage = compute_advantage(self.gamma, self.lmbda, td_delta.cpu()).to(self.device)
        old_log_probs = torch.log(self.actor(states).gather(1, actions)).detach()
        log_probs = torch.log(self.actor(states).gather(1, actions))
        ratio = torch.exp(log_probs - old_log_probs)
        surr1 = ratio * advantage
        surr2 = torch.clamp(ratio, 1 - self.eps, 1 + self.eps) * advantage  # clipped term
        actor_loss = torch.mean(-torch.min(surr1, surr2))  # PPO-clip loss
        critic_loss = torch.mean(F.mse_loss(self.critic(states), td_target.detach()))
        self.actor_optimizer.zero_grad()
        self.critic_optimizer.zero_grad()
        actor_loss.backward()
        critic_loss.backward()
        self.actor_optimizer.step()
        self.critic_optimizer.step()

Parameter and environment setup

actor_lr = 3e-4
critic_lr = 1e-3
epochs = 10
episode_per_epoch = 1000
hidden_dim = 64
gamma = 0.99
lmbda = 0.97
eps = 0.2
team_size = 2         # number of agents per team
grid_size = (15, 15)  # size of the 2D grid
device = torch.device("cuda") if torch.cuda.is_available() else torch.device("cpu")

# Create the environment
env = Combat(grid_shape=grid_size, n_agents=team_size, n_opponents=team_size)
state_dim = env.observation_space[0].shape[0]
action_dim = env.action_space[0].n

Parameter sharing means using the same set of policy parameters for all agents. The prerequisite is that the agents are homogeneous, i.e. their state spaces and action spaces are identical and they optimize exactly the same objective.

  • Agents do not share a policy

# Create the agents (no parameter sharing)
agent1 = PPO(state_dim, hidden_dim, action_dim, actor_lr, critic_lr,
             lmbda, eps, gamma, device)
agent2 = PPO(state_dim, hidden_dim, action_dim, actor_lr, critic_lr,
             lmbda, eps, gamma, device)

  • Agents share the same policy

# Create the agent (parameter sharing)
agent = PPO(state_dim, hidden_dim, action_dim, actor_lr, critic_lr,
            lmbda, eps, gamma, device)

Training

win_list = []
for e in range(epochs):
    with tqdm(total=episode_per_epoch, desc='Epoch %d' % e) as pbar:
        for episode in range(episode_per_epoch):
            # Rollout buffer for agent1
            buffer_agent1 = {
                'states': [], 'actions': [], 'next_states': [],
                'rewards': [], 'dones': []
            }
            # Rollout buffer for agent2
            buffer_agent2 = {
                'states': [], 'actions': [], 'next_states': [],
                'rewards': [], 'dones': []
            }
            # Reset the environment
            s = env.reset()
            terminal = False
            while not terminal:
                # Take actions (no parameter sharing)
                a1 = agent1.take_action(s[0])
                a2 = agent2.take_action(s[1])
                # Take actions (parameter sharing)
                # a1 = agent.take_action(s[0])
                # a2 = agent.take_action(s[1])
                next_s, r, done, info = env.step([a1, a2])
                buffer_agent1['states'].append(s[0])
                buffer_agent1['actions'].append(a1)
                buffer_agent1['next_states'].append(next_s[0])
                # Add a reward of 100 for a win, otherwise a penalty of 0.1
                buffer_agent1['rewards'].append(
                    r[0] + 100 if info['win'] else r[0] - 0.1)
                buffer_agent1['dones'].append(False)
                buffer_agent2['states'].append(s[1])
                buffer_agent2['actions'].append(a2)
                buffer_agent2['next_states'].append(next_s[1])
                buffer_agent2['rewards'].append(
                    r[1] + 100 if info['win'] else r[1] - 0.1)
                buffer_agent2['dones'].append(False)
                s = next_s  # move to the next state
                terminal = all(done)
            # Update the policies (no parameter sharing)
            agent1.update(buffer_agent1)
            agent2.update(buffer_agent2)
            # Update the policy (parameter sharing)
            # agent.update(buffer_agent1)
            # agent.update(buffer_agent2)
            win_list.append(1 if info['win'] else 0)
            if (episode + 1) % 100 == 0:
                pbar.set_postfix({
                    'episode': '%d' % (episode_per_epoch * e + episode + 1),
                    'win prob': '%.3f' % np.mean(win_list[-100:])
                })
            pbar.update(1)

win_array = np.array(win_list)
# Average over every 100 episodes
win_array = np.mean(win_array.reshape(-1, 100), axis=1)
episode_list = np.arange(win_array.shape[0]) * 100
plt.plot(episode_list, win_array)
plt.xlabel('Episodes')
plt.ylabel('Win rate')
plt.title('IPPO on Combat')
plt.show()

[Figure: win rate curve of IPPO on the Combat environment]

7.2 The MADDPG Algorithm

  • for $e = 1 \to M$ do
    • Initialize a random process $\mathcal N$ for action exploration
    • Receive the initial observations $\mathbf x$ of all agents
    • for $t = 1 \to T$ do
      • For each agent $i$, select an action $a_i = \mu_{\theta_i}(o_i) + \mathcal N_t$ with its current policy
      • Execute the joint action $a = (a_1, \cdots, a_N)$, observe the rewards $r$ and the new observations $\mathbf x'$
      • Store $(\mathbf x, a, r, \mathbf x')$ in the replay buffer $\mathcal D$
      • $\mathbf x \leftarrow \mathbf x'$
      • for $i = 1 \to N$ do
        • Sample a random minibatch $(\mathbf x^j, a^j, r^j, \mathbf x'^j)$ from $\mathcal D$
        • Train the centralized critic network (see the loss written out below)
        • Train the agent's own actor network
        • Update the target actor network and the target critic network
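In the notation of the loop above (writing $\omega_i$ for agent $i$'s centralized critic parameters and using primes for the target networks; these symbols are added here for reference), the two training steps are the standard MADDPG updates: the critic is regressed onto a target computed with all agents' target actors, and the actor is updated through the centralized critic:

$$y = r_i + \gamma\, Q_i^{\omega_i'}\big(\mathbf x',\, \mu_1'(o_1'), \cdots, \mu_N'(o_N')\big), \qquad \mathcal L(\omega_i) = \mathbb E\big[(Q_i^{\omega_i}(\mathbf x, a_1, \cdots, a_N) - y)^2\big]$$

$$\nabla_{\theta_i} J(\mu_i) = \mathbb E\Big[\nabla_{\theta_i}\mu_i(o_i)\,\nabla_{a_i} Q_i^{\omega_i}(\mathbf x, a_1, \cdots, a_N)\big|_{a_i=\mu_i(o_i)}\Big]$$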

The MPE Environment

The scenario used here from the Multi-Agent Particle Environment (MPE) consists of one red adversarial agent (adversary), $N$ blue normal agents, and $N$ landmarks (typically $N = 2$), one of which is the target landmark (green). The normal agents know which landmark is the target, but the adversary does not. The normal agents cooperate: if any one of them gets close enough to the target landmark, every normal agent receives the same reward. The adversary is also rewarded for being close to the target landmark, but it has to guess which landmark is the target. The normal agents therefore need to cooperate and spread out over different landmarks in order to deceive the adversary.

[Figure: illustration of the simple_adversary scenario in MPE]

Gumbel-Softmax Approximate Sampling

Because each agent's action space in the MPE environment is discrete, while DDPG requires the agent's action to be differentiable with respect to its policy parameters, the Gumbel-Softmax trick is introduced to obtain an approximate, differentiable sample from a discrete distribution.

Suppose a random variable $Z$ follows a discrete distribution $\mathcal K = (a_1, \cdots, a_k)$, where $a_i \in [0, 1]$ denotes $P(Z = i)$ and $\sum_{i=1}^k a_i = 1$. Introduce a reparameterization factor $g_i$, a noise sample drawn from the Gumbel(0, 1) distribution:

$$g_i = -\log(-\log u), \quad u \sim \mathrm{Uniform}(0, 1)$$

The Gumbel-Softmax sample can then be written as

$$y_i = \frac{\exp\big((\log a_i + g_i)/\tau\big)}{\sum_{j=1}^k \exp\big((\log a_j + g_j)/\tau\big)}, \quad \forall\, i = 1, \cdots, k$$

The discrete value is obtained as $z = \arg\max_i y_i$, which is approximately equivalent to drawing a sample $z \sim \mathcal K$. Because $y$ is a differentiable function of $a$, the sample naturally carries gradients with respect to $a$. The temperature parameter $\tau > 0$ controls how closely the Gumbel-Softmax distribution approximates the discrete one: the smaller $\tau$, the closer the generated distribution is to $\mathrm{onehot}(\arg\max_i(\log a_i + g_i))$; the larger $\tau$, the closer it is to a uniform distribution.
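To make the gradient flow concrete, here is a small stand-alone sketch (independent of the implementation below) that draws one Gumbel-Softmax sample from given unnormalized log-probabilities and checks that gradients reach them; the variable names and the toy downstream objective are illustrative only.

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Unnormalized log-probabilities of a 4-way discrete distribution (illustrative values).
logits = torch.tensor([[1.0, 0.5, -0.2, 0.1]], requires_grad=True)
tau = 0.5  # temperature: smaller -> closer to one-hot, larger -> closer to uniform

# Gumbel(0, 1) noise via g = -log(-log(u)), u ~ Uniform(0, 1).
u = torch.rand_like(logits)
g = -torch.log(-torch.log(u + 1e-20) + 1e-20)

# Relaxed (differentiable) sample y and its discretized arg-max z.
y = F.softmax((logits + g) / tau, dim=-1)
z = y.argmax(dim=-1)

# A toy downstream objective that depends on the relaxed sample;
# gradients flow back to the logits even though z itself is discrete.
loss = (y * torch.tensor([[0.0, 1.0, 2.0, 3.0]])).sum()
loss.backward()
print("sample y:", y.detach().numpy(), "argmax z:", z.item())
print("grad on logits:", logits.grad.numpy())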



Code Implementation

Import the MPE environment

git clone https://github.com/boyu-ai/multiagent-particle-envs.git --quiet
pip install -e multiagent-particle-envs
# Due to version issues of multiagent-particle-envs, gym needs to be pinned to a compatible version
pip install --upgrade gym==0.10.5 -q
import os
import time
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import random
import collections
import gym
import sys
sys.path.append("../multiagent-particle-envs")  # path where the cloned package is stored
from multiagent.environment import MultiAgentEnv
import multiagent.scenarios as scenarios
import rl_utils  # helper module (ReplayBuffer, moving_average) from the Hands-on RL (boyu-ai) code; used in training below

Create the environment

def make_env(name):
    scenario = scenarios.load(f'{name}.py').Scenario()
    world = scenario.make_world()
    return MultiAgentEnv(world, scenario.reset_world, scenario.reward,
                         scenario.observation)

env_id = "simple_adversary"
env = make_env(env_id)
state_dims = [state_space.shape[0] for state_space in env.observation_space]
action_dims = [action_space.n for action_space in env.action_space]
critic_input_dim = sum(state_dims) + sum(action_dims)

Define utility functions, including the Gumbel-Softmax sampling functions that make DDPG applicable to discrete action spaces.

# Turn the best action into its one-hot form
def onehot_from_logits(logits, eps=0.01):
    argmax_acs = (logits == logits.max(1, keepdim=True)[0]).float()
    # Generate random actions and convert them to one-hot form
    rand_acs = torch.autograd.Variable(
        torch.eye(logits.shape[1])[[
            np.random.choice(range(logits.shape[1]), size=logits.shape[0])
        ]],
        requires_grad=False
    ).to(logits.device)
    # Choose between the greedy and the random action with an epsilon-greedy rule
    return torch.stack([
        argmax_acs[i] if r > eps else rand_acs[i]
        for i, r in enumerate(torch.rand(logits.shape[0]))
    ])


# Sample noise from a Gumbel(0, 1) distribution
def sample_gumbel(shape, eps=1e-20, tens_type=torch.FloatTensor):
    U = torch.autograd.Variable(tens_type(*shape).uniform_(), requires_grad=False)
    return -torch.log(-torch.log(U + eps) + eps)


# Sample from the Gumbel-Softmax distribution
def gumbel_softmax_sample(logits, temperature):
    y = logits + sample_gumbel(logits.shape, tens_type=type(logits.data)).to(logits.device)
    return F.softmax(y / temperature, dim=1)


# Sample from the Gumbel-Softmax distribution and discretize (straight-through estimator)
def gumbel_softmax(logits, temperature=1.0):
    y = gumbel_softmax_sample(logits, temperature)
    y_hard = onehot_from_logits(y)
    # Forward pass returns the one-hot action, backward pass uses the gradient of y
    y = (y_hard.to(logits.device) - y).detach() + y
    return y

Implement single-agent DDPG, including the actor network, the critic network, and the function that computes actions.

class ThreeLayerFC(torch.nn.Module):
    def __init__(self, num_in, num_out, hidden_dim):
        super().__init__()
        self.fc1 = torch.nn.Linear(num_in, hidden_dim)
        self.fc2 = torch.nn.Linear(hidden_dim, hidden_dim)
        self.fc3 = torch.nn.Linear(hidden_dim, num_out)

    def forward(self, x):
        x = F.relu(self.fc2(F.relu(self.fc1(x))))
        return self.fc3(x)


class DDPG:
    def __init__(self, state_dim, action_dim, critic_input_dim, hidden_dim,
                 actor_lr, critic_lr, device):
        self.actor = ThreeLayerFC(state_dim, action_dim, hidden_dim).to(device)
        self.target_actor = ThreeLayerFC(state_dim, action_dim, hidden_dim).to(device)
        self.critic = ThreeLayerFC(critic_input_dim, 1, hidden_dim).to(device)
        self.target_critic = ThreeLayerFC(critic_input_dim, 1, hidden_dim).to(device)
        self.target_critic.load_state_dict(self.critic.state_dict())
        self.target_actor.load_state_dict(self.actor.state_dict())
        self.actor_optimizer = torch.optim.Adam(self.actor.parameters(), actor_lr)
        self.critic_optimizer = torch.optim.Adam(self.critic.parameters(), critic_lr)

    def take_action(self, state, explore=False):
        action = self.actor(state)
        if explore:
            action = gumbel_softmax(action)
        else:
            action = onehot_from_logits(action)
        return action.detach().cpu().numpy()[0]

    def soft_update(self, net, target_net, tau):
        for param_target, param in zip(target_net.parameters(), net.parameters()):
            param_target.data.copy_(param_target.data * (1.0 - tau) + param.data * tau)

The MADDPG algorithm

class MADDPG:
    def __init__(self, env, device, actor_lr, critic_lr, hidden_dim,
                 state_dims, action_dims, critic_input_dim, gamma, tau):
        self.agents = [
            DDPG(state_dims[i], action_dims[i], critic_input_dim,
                 hidden_dim, actor_lr, critic_lr, device)
            for i in range(len(env.agents))
        ]
        self.gamma = gamma
        self.tau = tau
        self.critic_criterion = torch.nn.MSELoss()
        self.device = device

    @property
    def policies(self):
        return [agt.actor for agt in self.agents]

    @property
    def target_policies(self):
        return [agt.target_actor for agt in self.agents]

    def take_action(self, states, explore):
        # Hand each agent its own observation and let it act on it
        states = [
            torch.tensor([states[i]], dtype=torch.float, device=self.device)
            for i in range(len(env.agents))
        ]
        return [
            agent.take_action(state, explore)
            for agent, state in zip(self.agents, states)
        ]

    def update(self, sample, agent_id):
        current_agent = self.agents[agent_id]
        obs, acts, rewards, next_obs, done = sample

        # Update the critic network
        current_agent.critic_optimizer.zero_grad()
        # Compute the Q-target
        all_target_act = [
            onehot_from_logits(pi(next_obs_))
            for pi, next_obs_ in zip(self.target_policies, next_obs)
        ]
        # Concatenate the input of the target critic network
        target_critic_input = torch.cat((*next_obs, *all_target_act), dim=1)
        target_critic_value = rewards[agent_id].view(-1, 1) \
            + self.gamma * (1 - done[agent_id].view(-1, 1)) \
            * current_agent.target_critic(target_critic_input)
        # Compute the current Q-value
        critic_input = torch.cat((*obs, *acts), dim=1)
        critic_value = current_agent.critic(critic_input)
        # Update the critic network with the MSE loss
        critic_loss = self.critic_criterion(critic_value, target_critic_value.detach())
        critic_loss.backward()
        current_agent.critic_optimizer.step()

        # Update the actor network
        current_agent.actor_optimizer.zero_grad()
        logits = current_agent.actor(obs[agent_id])
        act = gumbel_softmax(logits)
        all_actor_acts = []
        for i, (pi, obs_) in enumerate(zip(self.policies, obs)):
            if i == agent_id:
                all_actor_acts.append(act)
            else:
                all_actor_acts.append(onehot_from_logits(pi(obs_)))
        vf_input = torch.cat((*obs, *all_actor_acts), dim=1)
        actor_loss = -current_agent.critic(vf_input).mean()
        actor_loss += (logits ** 2).mean() * 1e-3  # regularize the actor logits
        actor_loss.backward()
        current_agent.actor_optimizer.step()

    # Soft-update all target networks
    def update_all_target(self):
        for agt in self.agents:
            agt.soft_update(agt.actor, agt.target_actor, self.tau)
            agt.soft_update(agt.critic, agt.target_critic, self.tau)

Define a routine for evaluating the learned policies

def evaluate(env_id, maddpg, n_episode=10, episode_length=25):
    env = make_env(env_id)
    returns = np.zeros(len(env.agents))
    for _ in range(n_episode):
        obs = env.reset()
        for t_i in range(episode_length):
            actions = maddpg.take_action(obs, explore=False)
            obs, rew, done, info = env.step(actions)
            rew = np.array(rew)
            returns += rew / n_episode
    return returns.tolist()

Training

num_episodes = 5000
episode_length = 25
buffer_size = 100000  # the value was missing in the original post; 100000 is the value used in the Hands-on RL reference code
hidden_dim = 128
actor_lr = 1e-3
critic_lr = 1e-3
gamma = 0.99
tau = 0.005
batch_size = 256
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
update_interval = 50
minimal_size = 4000
epsilon = 0.3

maddpg = MADDPG(env, device, actor_lr, critic_lr, hidden_dim, state_dims,
                action_dims, critic_input_dim, gamma, tau)
replay_buffer = rl_utils.ReplayBuffer(buffer_size)

return_list = []
total_step = 0
for episode in range(num_episodes):
    state = env.reset()
    for step in range(episode_length):
        actions = maddpg.take_action(state, explore=True)
        next_state, reward, done, _ = env.step(actions)
        replay_buffer.add(state, actions, reward, next_state, done)
        state = next_state
        total_step += 1
        # Once the replay buffer holds enough samples and the update interval is reached, update the networks
        if replay_buffer.size() >= minimal_size and total_step % update_interval == 0:
            sample = replay_buffer.sample(batch_size)

            # Rearrange the sampled batch so that each agent gets its own tensors
            def stack_array(x):
                rearranged = [[sub_x[i] for sub_x in x] for i in range(len(x[0]))]
                return [
                    torch.FloatTensor(np.vstack(ra)).to(device)
                    for ra in rearranged
                ]

            sample = [stack_array(x) for x in sample]
            # Update the critic and actor networks of every agent
            for agent_id in range(len(env.agents)):
                maddpg.update(sample, agent_id)
            # Update the target networks
            maddpg.update_all_target()
    if (episode + 1) % 100 == 0:
        ep_returns = evaluate(env_id, maddpg, n_episode=100)
        return_list.append(ep_returns)
        print(f'Episode: {episode + 1}, {ep_returns}')

return_array = np.array(return_list)
for i, agent_name in enumerate(["adversary", "agent0", "agent1"]):
    plt.figure()
    plt.plot(
        np.arange(return_array.shape[0]) * 100,
        rl_utils.moving_average(return_array[:, i], 9)
    )
    plt.xlabel("Episode")
    plt.ylabel("Returns")
    plt.title(agent_name)
Episode: 100, [-41.304, -6.82515, -6.82515]
Episode: 200, [-35.2446, -2.8429, -2.8429]
Episode: 300, [-27.023, 4.085, 4.085]
Episode: 400, [-17.635, -12.409, -12.409]
Episode: 500, [-15.068, -6.9104, -6.9104]
Episode: 600, [-16.269, -3.02317, -3.02317]
Episode: 700, [-11.7778, -5.9993, -5.9993]
Episode: 800, [-13.0006, 4.8817, 4.8817]
Episode: 900, [-11.3697, 3.548, 3.548]
Episode: 1000, [-11.0582, 3.97206, 3.97206]
Episode: 1100, [-12.112, 6.1136, 6.1136]
Episode: 1200, [-10.8363, 4.40725, 4.40725]
Episode: 1300, [-12.8032, 7.06, 7.06]
Episode: 1400, [-11.8538, 7.4386, 7.4386]
Episode: 1500, [-10.0543, 6.8339, 6.8339]
Episode: 1600, [-9.2806, 7.2865, 7.2865]
Episode: 1700, [-10.0836, 7.2733, 7.2733]
Episode: 1800, [-10.4314, 7.5415, 7.5415]
Episode: 1900, [-11.0025, 7.58345, 7.58345]
Episode: 2000, [-9.2294, 6.0727, 6.0727]
Episode: 2100, [-9.4188, 6.2431, 6.2431]
Episode: 2200, [-8.2239, 6.0182, 6.0182]
Episode: 2300, [-9.8365, 6.7572, 6.7572]
Episode: 2400, [-10.6255, 6.4565, 6.4565]
Episode: 2500, [-7.5542, 5.8697, 5.8697]
Episode: 2600, [-8.7832, 6.7296, 6.7296]
Episode: 2700, [-8.0892, 6.5939, 6.5939]
Episode: 2800, [-7.6937, 5.2278, 5.2278]
Episode: 2900, [-8.4698, 6.6716, 6.6716]
Episode: 3000, [-8.2417, 5.4646, 5.4646]
Episode: 3100, [-8.0954, 6.7612, 6.7612]
Episode: 3200, [-8.7608, 5.17524, 5.17524]
Episode: 3300, [-6.0495, 4.1814, 4.1814]
Episode: 3400, [-9.0465, 5.8535, 5.8535]
Episode: 3500, [-9.3274, 5.1028, 5.1028]
Episode: 3600, [-8.9446, 6.11715, 6.11715]
Episode: 3700, [-9.0769, 5.7206, 5.7206]
Episode: 3800, [-8.6009, 5.1042, 5.1042]
Episode: 3900, [-9.6136, 5.2459, 5.2459]
Episode: 4000, [-9.0453, 5.5292, 5.5292]
Episode: 4100, [-9.785, 5.8946, 5.8946]
Episode: 4200, [-9.2312, 5.58105, 5.58105]
Episode: 4300, [-8.4968, 5.2804, 5.2804]
Episode: 4400, [-8.9002, 5.0755, 5.0755]
Episode: 4500, [-10.9779, 6.7362, 6.7362]
Episode: 4600, [-8.6367, 5.6162, 5.6162]
Episode: 4700, [-9.9247, 5.18125, 5.18125]
Episode: 4800, [-8.9647, 5.47295, 5.47295]
Episode: 4900, [-9.0698, 5.803, 5.803]
Episode: 5000, [-10.1705, 6.03605, 6.03605]

[Figures: evaluation return curves for the adversary and the two normal agents]

