Reinforcement Learning
Reinforcement learning is a computational approach to goal-directed learning from interaction.
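Goal
Learn a mapping from environment states to actions such that the actions the agent chooses obtain the maximum reward from the environment, i.e., such that the environment's evaluation of the learning system is optimal in some sense.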
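The goal above is commonly formalized as finding a policy that maximizes expected cumulative reward. The expression below is a sketch under assumptions not stated in these notes: rewards are discounted by a factor \gamma in [0, 1), and R_{t+1} denotes the reward received after the action taken at step t.

```latex
\pi^{*} = \arg\max_{\pi} \; \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} R_{t+1} \right]
```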
Difference from supervised learning
Supervised learning learns from labels:
- Learning from a training set of labeled examples provided by a knowledgeable external supervisor;
- Focusing on generalization capacity.
Reinforcement learning learns from interaction.
Two key characteristics of reinforcement learning
- Trial-and-error search
- Delayed reward
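Delayed reward means an action may only pay off several steps later, so each step has to be credited with the rewards that follow it. A minimal sketch of that credit assignment is the discounted return; the discount factor `gamma` and the helper name `discounted_returns` are assumptions introduced here for illustration, not part of these notes.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t for every step, where G_t = r_t + gamma * G_{t+1}."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each step accumulates all later rewards.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# An episode whose only reward arrives at the final step (delayed reward).
episode_rewards = [0.0, 0.0, 0.0, 1.0]
returns = discounted_returns(episode_rewards, gamma=0.9)
```

Even though the first three steps receive no immediate reward, their returns are nonzero, which is how early actions get credit for a later payoff.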
Challenges that reinforcement learning must address
1. Exploration-exploitation dilemma:
- Exploitation: the agent exploits what it has already experienced in order to obtain reward;
- Exploration: the agent explores in order to make better action selections in the future.
2. Treating the target problem as a whole rather than as many isolated sub-problems:
- The approach generally starts with a complete, interactive, goal-seeking agent.
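One common way to balance exploration against exploitation is an epsilon-greedy rule, shown here on a multi-armed bandit. This is only a sketch: the notes do not name a specific method, and the function `run_epsilon_greedy`, the Gaussian reward noise, and the parameter values are all assumptions chosen for illustration.

```python
import random


def run_epsilon_greedy(true_means, epsilon=0.1, steps=2000, seed=0):
    """Play a Gaussian multi-armed bandit with an epsilon-greedy agent."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    estimates = [0.0] * n_arms  # running estimate of each arm's mean reward
    counts = [0] * n_arms       # how many times each arm has been pulled

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try a random arm to improve future choices.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: pick the arm that currently looks best.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        counts[arm] += 1
        # Incremental sample-average update of the chosen arm's estimate.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = run_epsilon_greedy([0.2, 0.5, 1.5])
```

With `epsilon=0` the agent never explores and can lock onto a poor arm; with `epsilon=1` it never uses what it has learned. Intermediate values trade one against the other.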
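Elements of reinforcement learning
Main entities: the agent and the environment:
- State, action, and reward;
Elements:
- Policy: a mapping from states to actions; it may be deterministic or stochastic;
- Reward: a function of state and action, usually with some uncertainty;
- Value: the cumulative reward, i.e., the long-term objective;
- Model of the environment: describes how the environment responds to actions.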
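The four elements listed above can be sketched in a toy two-state problem. Everything concrete here is hypothetical and chosen only for illustration: the states `"low"` and `"high"`, the actions `"wait"` and `"work"`, the reward table, the deterministic transition model, and the discount factor `gamma`.

```python
import random

# Deterministic policy: each state maps to exactly one action.
deterministic_policy = {"low": "work", "high": "wait"}

# Stochastic policy: each state maps to a distribution over actions.
stochastic_policy = {
    "low": {"wait": 0.2, "work": 0.8},
    "high": {"wait": 0.7, "work": 0.3},
}


def sample_action(policy, state, rng):
    """Draw an action from a stochastic policy."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def reward(state, action, rng):
    """Reward: a function of state and action, with some noise."""
    base = {
        ("low", "wait"): 0.0,
        ("low", "work"): 1.0,
        ("high", "wait"): 2.0,
        ("high", "work"): 1.5,
    }
    return base[(state, action)] + rng.gauss(0.0, 0.1)


def model(state, action):
    """Environment model: the next state that follows an action."""
    return "high" if action == "work" else "low"


def rollout_return(start_state, choose_action, rng, gamma=0.9, horizon=50):
    """Value estimate: discounted cumulative reward of one simulated episode."""
    state, total, discount = start_state, 0.0, 1.0
    for _ in range(horizon):
        action = choose_action(state)
        total += discount * reward(state, action, rng)
        state = model(state, action)
        discount *= gamma
    return total


rng = random.Random(0)
value_det = rollout_return("low", lambda s: deterministic_policy[s], rng)
value_sto = rollout_return(
    "low", lambda s: sample_action(stochastic_policy, s, rng), rng
)
```

A single rollout gives a noisy estimate of value; averaging many rollouts from the same start state would give a steadier one.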