See the example in the book: Random-sample one-step tabular Q-planning.
, or Q ≈ q_π for a given π. Off-policy n-step Q(σ) for estimating Q ≈ q*, or Q ≈ q_π for a given π. 8 Random-sample
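Random-sample one-step tabular Q-planning: obtains the reward from the model and computes ... Tabular Dyna-Q: if n = 0 (where n is the number of planning updates per real step), it is simply the Q-learning algorithm.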
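To make the n = 0 remark concrete, here is a minimal sketch of one Dyna-Q step. It assumes a deterministic environment and a dictionary-based table and model. The function name `dyna_q_step` and all parameter names are hypothetical, chosen for illustration only, and are not taken from the book or any library.

```python
import random
from collections import defaultdict


def dyna_q_step(Q, model, s, a, r, s_next, actions, n,
                alpha=0.1, gamma=0.95, rng=random):
    """One real-experience step of tabular Dyna-Q (illustrative sketch).

    Q      : dict mapping (state, action) -> value (e.g. defaultdict(float))
    model  : dict mapping (state, action) -> (reward, next_state);
             assumes a deterministic environment
    n      : number of planning updates after the real update
    """
    def q_update(s0, a0, r0, s1):
        # One-step tabular Q-learning update.
        best_next = max(Q[(s1, b)] for b in actions)
        Q[(s0, a0)] += alpha * (r0 + gamma * best_next - Q[(s0, a0)])

    # (1) Direct RL: learn from the real transition.
    q_update(s, a, r, s_next)

    # (2) Model learning: remember what the environment did.
    model[(s, a)] = (r, s_next)

    # (3) Planning: replay n previously observed pairs from the model.
    #     With n = 0 this loop never runs, leaving plain Q-learning.
    for _ in range(n):
        ps, pa = rng.choice(list(model.keys()))
        pr, ps_next = model[(ps, pa)]
        q_update(ps, pa, pr, ps_next)
```

With `n=0` only step (1) changes `Q`, so the result matches a single Q-learning update. With `n > 0` the same update rule is applied again to simulated transitions drawn from the learned model.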
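For example, the following introduces a simple example based on one-step tabular Q-learning applied to samples generated by a sample model. This method is called random-sample one-step tabular Q-planning, and one-step tabular Q-learning ...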
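The method described above can be sketched as follows: repeatedly pick a state-action pair at random, ask a sample model for a reward and next state, and apply the one-step tabular Q-learning update to that simulated transition. This is a hedged illustration, not the book's exact pseudocode. The names `q_planning` and `toy_model`, and the two-state toy problem, are assumptions introduced only for this example.

```python
import random
from collections import defaultdict


def q_planning(sample_model, states, actions, num_updates,
               alpha=0.1, gamma=0.9, seed=0):
    """Random-sample one-step tabular Q-planning (illustrative sketch)."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # unseen (state, action) pairs default to 0.0
    for _ in range(num_updates):
        # 1. Select a state and an action at random.
        s = rng.choice(states)
        a = rng.choice(actions)
        # 2. Query the sample model for a simulated reward and next state.
        r, s_next = sample_model(s, a, rng)
        # 3. Apply one-step tabular Q-learning to the simulated transition.
        target = r + gamma * max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
    return Q


def toy_model(s, a, rng):
    """Hypothetical deterministic sample model on a two-state chain.

    From state 1, "right" ends the episode in terminal state "T" with
    reward 1. Every other transition has reward 0.
    """
    if a == "right":
        return (1.0, "T") if s == 1 else (0.0, 1)
    return (0.0, 0)  # "left" always leads back to state 0


Q = q_planning(toy_model, states=[0, 1], actions=["left", "right"],
               num_updates=5000)
```

Because the terminal state "T" is never sampled as a starting state, its values stay at zero, which is what the update target needs. No real environment interaction happens here: all updates come from the model, which is what distinguishes planning from direct learning.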
""" Update the priority queue with the most recent (state, action) pair and perform random-sample ...
... += np.abs(P)
return priority

def _simulate_behavior(self):
    """ Perform random-sample ...