Learning optimal values from random walk
Refereed conference paper presented and published in conference proceedings

CUHK Authors: Author(s) no longer affiliated with CUHK



Times Cited
Web of Science: 0 (as at 10/08/2020)

Other information
Abstract: In this paper we extend the random walk example of Sutton and Barto to a multistage dynamic programming optimization setting with discounted reward. Using Bellman equations on presumed action, the optimal values are derived for a general transition probability rho and discount rate gamma, and include the original random walk as a special case. Temporal difference methods with eligibility traces, TD(lambda), are effective in predicting the optimal values for different rho and gamma, but their performance is found to depend critically on the choice of truncated return in the formulation when gamma is less than 1.
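
As a rough illustration of the setting described in the abstract (not the paper's own code or derivation), the Python sketch below runs accumulating-trace TD(lambda) on a small Sutton-and-Barto-style random walk generalized with a right-move probability rho and discount rate gamma, and compares the estimates against the values obtained by solving the Bellman equations directly. The chain length, reward layout (+1 on reaching the right terminal, 0 elsewhere), and the particular rho, gamma, alpha, and lambda values are assumptions made for this example only.

    import numpy as np

    N = 5                     # non-terminal states 0..N-1; off either end is terminal
    RHO, GAMMA = 0.5, 0.9     # right-move probability and discount rate (assumed values)
    ALPHA, LAM = 0.05, 0.8    # TD step size and trace-decay parameter (assumed values)

    def bellman_values(rho=RHO, gamma=GAMMA, n=N):
        # Solve v(s) = rho*(r_right + gamma*v(s+1)) + (1-rho)*(r_left + gamma*v(s-1)),
        # with reward +1 on entering the right terminal and 0 elsewhere.
        A = np.eye(n)
        b = np.zeros(n)
        for s in range(n):
            if s + 1 < n:
                A[s, s + 1] -= gamma * rho
            else:
                b[s] += rho              # right terminal pays +1
            if s - 1 >= 0:
                A[s, s - 1] -= gamma * (1.0 - rho)
        return np.linalg.solve(A, b)

    def td_lambda(episodes=5000, seed=0):
        # Accumulating-trace TD(lambda) prediction of the state values.
        rng = np.random.default_rng(seed)
        v = np.zeros(N)
        for _ in range(episodes):
            e = np.zeros(N)              # eligibility traces
            s = N // 2                   # start in the middle state
            while True:
                s_next = s + (1 if rng.random() < RHO else -1)
                if s_next >= N:          # right terminal: reward 1, value 0
                    r, v_next, done = 1.0, 0.0, True
                elif s_next < 0:         # left terminal: reward 0, value 0
                    r, v_next, done = 0.0, 0.0, True
                else:
                    r, v_next, done = 0.0, v[s_next], False
                delta = r + GAMMA * v_next - v[s]
                e[s] += 1.0
                v += ALPHA * delta * e
                e *= GAMMA * LAM
                if done:
                    break
                s = s_next
        return v

    if __name__ == "__main__":
        print("Bellman solution:", np.round(bellman_values(), 3))
        print("TD(lambda)      :", np.round(td_lambda(), 3))

With gamma = 1 and rho = 0.5 the Bellman solution reduces to the familiar 1/6, 2/6, ..., 5/6 values of the original Sutton and Barto random walk example.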
All Author(s) List: Lam KP
Name of Conference: 17th International Conference on Tools with Artificial Intelligence
Start Date of Conference: 14/11/2005
End Date of Conference: 16/11/2005
Place of Conference: Hong Kong
Country/Region of Conference: China
Journal name: 17TH IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2005), PROCEEDINGS
Year: 2005
Month: 1
Day: 1
Publisher: IEEE COMPUTER SOC
Pages: 334 - 339
ISBN: 0-7695-2488-5
ISSN: 1082-3409
Languages: English-United Kingdom
Web of Science Subject Categories: Computer Science; Computer Science, Artificial Intelligence

Last updated on 2020-11-08 at 03:29