On the Convergence of Off-line Temporal Difference Learning with Function Approximation
YEAR: 2003
PUB TYPE: Working Paper/Manuscript
WORKING PAPER NUMBER: None
PAGES: 10 p.
SUBJECT(S): off-line reinforcement learning, function approximation, linear function approximation, temporal-difference learning
DISCIPLINE: Computer Science
LANGUAGE: English
PUB ID: 103-397-080 (Last edited on 2003/11/21 20:51:22 US/Mountain)
ABSTRACT:
A standard reinforcement learning agent learns the optimal policy for achieving its goal, or the value function of a given policy, while interacting with the environment. This paper considers a variation of this standard framework in which the agent learns the value function of a control policy off-line, from a fixed set of training data. Parameterized function approximation is used, and a general update rule is derived. However, online reinforcement learning algorithms with function approximation are known to diverge in some cases. This paper examines the convergence properties of off-line reinforcement learning when function approximation is used. A more extensive analytical convergence analysis is given for the linear function approximation case.