Reinforcement Learning

Posted on 2018-06-26

Reinforcement Learning

Reinforcement learning is a computational approach to goal-directed learning from interaction.

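Goal

Learn a mapping from environment states to actions such that the actions the agent selects earn the maximum reward from the environment, that is, such that the environment's evaluation of the learning system is optimal in some sense.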
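
One common way to make "maximum reward" precise is the discounted-return formulation below. This is a sketch under assumptions not stated in the post: an infinite horizon, a discount factor \gamma, and the symbols G_t, r_t, s_t and \pi are all introduced here for illustration.

```latex
\[
G_t = \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}, \qquad 0 \le \gamma < 1
\]
\[
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\, G_t \mid s_t = s \,\right] \quad \text{for every state } s
\]
```

Here G_t is the cumulative (discounted) reward collected from time step t onward, and \pi^{*} is a state-to-action mapping that maximizes its expectation.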

Difference from Supervised Learning

Supervised learning learns from labels:

Learning from a training set of labeled examples provided by a knowledgeable external supervisor;
Focusing on generalization capacity.

Reinforcement learning learns from interaction.

Two Key Characteristics of Reinforcement Learning

Trial-and-error search
Delayed reward

Challenges in Reinforcement Learning

1. Exploration-exploitation dilemma:

Exploitation: the agent exploits what it has already experienced in order to obtain reward;
Exploration: the agent explores in order to make better action selections in the future.
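
The trade-off can be illustrated with epsilon-greedy action selection on a multi-armed bandit. This is a minimal sketch, not a method named in the post: the function name `run_epsilon_greedy`, the Gaussian reward noise, and the arm means are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=5000, seed=0):
    """Epsilon-greedy on a Gaussian bandit (hypothetical setup)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm
    total_reward = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random arm to improve future choices.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)  # noisy reward (assumed)
        counts[arm] += 1
        # Incremental update of the sample mean for the chosen arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total_reward = run_epsilon_greedy([0.2, 0.5, 0.9])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

Setting `epsilon` to 0 gives pure exploitation, which can lock onto a poor arm early; setting it to 1 gives pure exploration, which never uses what has been learned.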

2. Focusing on the target problem as a whole rather than focusing on many isolated sub-problems:

The approach generally starts with a complete, interactive, goal-seeking agent.

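Elements of Reinforcement Learning

Main entities: the agent and the environment:

States, actions, and rewards;

Elements:

Policy: a mapping from states to actions, either deterministic or stochastic;
Reward: a function of state and action, usually subject to some uncertainty;
Value: the cumulative reward, i.e., the long-term objective;
Model of the environment: characterizes how the environment responds to actions.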
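
The four elements can be made concrete on a toy problem. The sketch below assumes a hypothetical 4-state chain whose last state is the goal; the names `model`, `reward`, `evaluate`, the discount factor `GAMMA`, and the deterministic reward (chosen for simplicity, although rewards are usually uncertain) are all illustrative assumptions.

```python
# Hypothetical 4-state chain: states 0..3, state 3 is the terminal goal.
STATES = [0, 1, 2, 3]
TERMINAL = 3
GAMMA = 0.9  # discount factor (assumed)


def model(state, action):
    """Model of the environment: next state for a state-action pair."""
    if state == TERMINAL:
        return state
    step = 1 if action == "right" else -1
    return min(max(state + step, 0), TERMINAL)


def reward(state, action):
    """Reward: function of state and action; +1 only on reaching the goal."""
    reached = state != TERMINAL and model(state, action) == TERMINAL
    return 1.0 if reached else 0.0


# Deterministic policy: exactly one action per state.
deterministic_policy = {s: "right" for s in STATES}
# Stochastic policy: a probability distribution over actions per state.
stochastic_policy = {s: {"left": 0.5, "right": 0.5} for s in STATES}


def as_stochastic(policy):
    """View a deterministic policy as a distribution with probability 1."""
    return {s: {a: 1.0} for s, a in policy.items()}


def evaluate(policy, sweeps=200):
    """Value: expected discounted cumulative reward under the policy."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s == TERMINAL:
                continue
            values[s] = sum(
                p * (reward(s, a) + GAMMA * values[model(s, a)])
                for a, p in policy[s].items()
            )
    return values


v_det = evaluate(as_stochastic(deterministic_policy))
v_rand = evaluate(stochastic_policy)
print("always-right policy:", {s: round(v, 3) for s, v in v_det.items()})
print("random policy:      ", {s: round(v, 3) for s, v in v_rand.items()})
```

The reward is received only on the final step, yet the value of earlier states is still positive: value propagates the delayed reward backward, which is why value rather than immediate reward captures the long-term objective.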

History of Reinforcement Learning