Reinforcement Learning

Reinforcement learning is a computational approach to learning from interaction; it is a form of goal-directed learning.

Goal

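Learn a mapping from environment states to actions such that the actions the agent selects obtain the maximum reward from the environment, so that the environment's evaluation of the learning system is optimal in some sense.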
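One common way to make "maximum reward" precise is to seek a policy π that maximizes the expected discounted sum of rewards. This assumes the discounted, infinite-horizon formulation with a discount factor γ ∈ [0, 1), which these notes do not specify:

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \right],
\qquad a_t \sim \pi(\cdot \mid s_t)
```

Here s_t is the state, a_t the action chosen by the policy, and R_{t+1} the reward the environment returns after that action.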

Difference from supervised learning

Supervised learning learns from labels:

  • Learning from a training set of labeled examples provided by a knowledgeable external supervisor;
  • Focusing on generalization capacity.

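Reinforcement learning learns from interaction: the agent is not told which action is correct and must instead learn from the rewards its own actions produce.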

Two key characteristics of reinforcement learning

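  1. Trial-and-error search: the agent must discover which actions yield the most reward by trying them;
  2. Delayed reward: an action can affect not only the immediate reward but also later states and all subsequent rewards.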
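Delayed reward means an action's consequences may only show up as reward several steps later. A minimal sketch, assuming a discount factor `gamma` (not part of the original notes) and a hypothetical helper `discounted_returns`, computes the return G_t = R_{t+1} + γ·G_{t+1} for every step of an episode, so a reward that arrives only at the end is still credited to the earlier steps:

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Return G_t for each time step t, where rewards[t] is R_{t+1}.

    Works backwards through the episode using G_t = R_{t+1} + gamma * G_{t+1}.
    gamma is an assumed discount factor in [0, 1].
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


# Only the final step is rewarded, yet every earlier step receives discounted credit.
print(discounted_returns([0.0, 0.0, 0.0, 1.0], gamma=0.9))
```

Iterating backwards lets each return reuse the one after it, so the whole episode is processed in a single pass.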

Challenges reinforcement learning must address

1. Exploration-exploitation dilemma:

  • Exploitation: the agent exploits what it has already experienced in order to obtain reward;
  • Exploration: the agent explores in order to make better action selections in the future.
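A common (though not the only) way to balance the two is ε-greedy action selection. The sketch below assumes a multi-armed bandit whose arms return Gaussian rewards around hypothetical means `true_means`; with probability `epsilon` the agent explores a random arm, otherwise it exploits the arm with the highest estimated value. Estimates are kept as incremental sample averages:

```python
import random
from typing import List, Tuple


def epsilon_greedy_bandit(true_means: List[float], steps: int = 2000,
                          epsilon: float = 0.1, seed: int = 0) -> Tuple[List[float], List[int]]:
    """Run epsilon-greedy on a hypothetical Gaussian multi-armed bandit.

    Returns the estimated value of each arm and how often each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k   # Q(a): sample-average reward of arm a
    counts = [0] * k        # N(a): number of times arm a was pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(k)  # explore: pick a uniformly random arm
        else:
            best = max(estimates)      # exploit: pick a greedy arm, ties broken randomly
            action = rng.choice([a for a in range(k) if estimates[a] == best])
        reward = rng.gauss(true_means[action], 1.0)  # noisy reward from the environment
        counts[action] += 1
        # Incremental sample average: Q <- Q + (R - Q) / N
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts


q, n = epsilon_greedy_bandit([0.2, 0.5, 1.5], steps=2000, epsilon=0.1)
print("estimates:", [round(x, 2) for x in q])
print("pulls:", n)
```

Setting `epsilon=0` gives a purely exploiting agent that can lock onto whichever arm happened to look best early; a larger `epsilon` explores more but spends more pulls on arms already known to be worse.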

2. Considering the target problem as a whole rather than as many isolated sub-problems:

  • Reinforcement learning generally starts with a complete, interactive, goal-seeking agent.

Elements of reinforcement learning

Entities: the agent and the environment

  • The agent and the environment interact through states, actions, and rewards;
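The agent-environment interaction can be sketched as a loop: the agent observes a state and chooses an action, and the environment returns a reward and the next state. `CorridorEnv` below is a hypothetical toy environment invented for illustration (a 1-D corridor with the goal at the right end), and the agent simply acts at random:

```python
import random
from typing import Tuple


class CorridorEnv:
    """Hypothetical toy environment: positions 0..length-1, goal at the right end."""

    def __init__(self, length: int = 5):
        self.length = length
        self.state = 0

    def reset(self) -> int:
        self.state = 0
        return self.state

    def step(self, action: int) -> Tuple[int, float, bool]:
        """action: 0 = move left, 1 = move right. Returns (next_state, reward, done)."""
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)  # walls clamp the position
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # reward only when the goal is reached
        return self.state, reward, done


def run_episode(env: CorridorEnv, seed: int = 0, max_steps: int = 100) -> Tuple[float, int]:
    """Interact with env using a uniformly random policy; return (total_reward, steps)."""
    rng = random.Random(seed)
    state = env.reset()
    total_reward = 0.0
    for t in range(1, max_steps + 1):
        action = rng.choice([0, 1])             # agent picks an action (this random policy ignores the state)
        state, reward, done = env.step(action)  # environment answers with next state and reward
        total_reward += reward
        if done:
            return total_reward, t
    return total_reward, max_steps


print(run_episode(CorridorEnv(length=5), seed=0))
```

A learning agent would replace the random choice with a policy that uses `state` and improves from the rewards it collects.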

Elements

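  • Policy: a mapping from states to actions; it can be deterministic or stochastic;
  • Reward: a function of states and actions, usually with some degree of randomness;
  • Value: the expected cumulative reward, i.e., the long-term objective;
  • Model of the environment: describes how the environment responds to actions.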
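To see how the four elements fit together, here is a minimal sketch of iterative policy evaluation, one standard way to compute values when a model of the environment is available. Everything concrete in it is a hypothetical assumption for illustration: the two-state model `MODEL` (with deterministic transitions), its reward numbers, the discount `GAMMA`, and the two policies, which contrast a deterministic policy with a stochastic one:

```python
from typing import Dict, Tuple

# Hypothetical model of the environment: MODEL[state][action] = (next_state, reward).
# Transitions are deterministic here to keep the example small.
MODEL: Dict[str, Dict[str, Tuple[str, float]]] = {
    "A": {"stay": ("A", 0.0), "switch": ("B", 1.0)},
    "B": {"stay": ("B", 2.0), "switch": ("A", 0.0)},
}
GAMMA = 0.9  # assumed discount factor

# Policies map each state to a probability distribution over actions.
DETERMINISTIC = {"A": {"switch": 1.0}, "B": {"stay": 1.0}}
STOCHASTIC = {"A": {"stay": 0.5, "switch": 0.5}, "B": {"stay": 0.5, "switch": 0.5}}


def evaluate_policy(policy: Dict[str, Dict[str, float]],
                    gamma: float = GAMMA, tol: float = 1e-10) -> Dict[str, float]:
    """Iterative policy evaluation: V(s) = sum_a pi(a|s) * [r(s, a) + gamma * V(s')]."""
    values = {s: 0.0 for s in MODEL}
    while True:
        new_values = {}
        for s in MODEL:
            total = 0.0
            for a, prob in policy[s].items():
                next_s, reward = MODEL[s][a]  # the model predicts the environment's response
                total += prob * (reward + gamma * values[next_s])
            new_values[s] = total
        delta = max(abs(new_values[s] - values[s]) for s in MODEL)
        values = new_values
        if delta < tol:  # stop once a full sweep barely changes any value
            return values


print("deterministic:", evaluate_policy(DETERMINISTIC))
print("stochastic:   ", evaluate_policy(STOCHASTIC))
```

Solving the Bellman equations by hand for these numbers gives V(A) = 19, V(B) = 20 for the deterministic policy and V(A) = 7.25, V(B) = 7.75 for the stochastic one, so the printed values should agree up to the tolerance. The gap shows how the value, unlike the one-step reward, depends on the policy followed afterwards.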

History of reinforcement learning