Full-text download notice: This thesis is authorized for on-campus access only.
The defense date of the thesis is 2008-06-23
Off-campus access: not accessible.
URN: etd-0619108-203139
Author: Ming-Po Hsieh
Author's Email Address: Not public
Department: Computer Science and Engineering
Year: 2007
Semester: 2
Degree: Master
Type of Document: Master's Thesis
Language: English
Page Count: 63
Title: A Temporal-Difference Prediction Approach for Intelligent Multiplayer Games
Keywords: Reinforcement Learning, Temporal-Difference Networks

Abstract
Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Rather than being given expert knowledge, the agent takes trial actions itself and continuously observes the feedback returned by the environment in order to find an effective way to accomplish its goal. By continually exploiting what it already knows and exploring what it has not yet experienced, the agent progressively improves its policy based on a value function that it builds incrementally. The value function records the merit of each state of the environment, so the agent can consult it to determine the best action in its current situation. Reinforcement learning faces two challenges. First, for an environment with an enormous number of states, a table of values cannot be stored because there is not enough memory. Second, traditional reinforcement learning methods focus on environments with perfect information, but many real-world problems do not provide it.
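To make the value-function idea concrete, here is a minimal sketch of tabular TD(0) learning with ε-greedy exploration. It is not the thesis's method: the corridor environment, the +1 reward at the right end, the discount `GAMMA = 0.9`, and every function name are hypothetical, and the agent is assumed to know which state each move leads to, so that it can choose moves by looking up state values.

```python
import random

# Hypothetical toy environment (not from the thesis): a corridor of states
# 0..N_STATES-1. States 0 and N_STATES-1 are terminal; stepping into the
# right end pays +1, every other step pays 0.
N_STATES = 7
GAMMA = 0.9


def reward(s_next):
    return 1.0 if s_next == N_STATES - 1 else 0.0


def is_terminal(s):
    return s == 0 or s == N_STATES - 1


def lookahead(V, s_next):
    """One-step value of moving into s_next: reward plus discounted V(s_next)."""
    future = 0.0 if is_terminal(s_next) else GAMMA * V[s_next]
    return reward(s_next) + future


def choose_move(V, s, epsilon, rng):
    moves = [s - 1, s + 1]                       # successors of "left" and "right"
    if rng.random() < epsilon:                   # explore: try a random move
        return rng.choice(moves)
    best = max(lookahead(V, m) for m in moves)   # exploit: consult the value table
    return rng.choice([m for m in moves if lookahead(V, m) == best])  # random tie-break


def train_td0(episodes=2000, alpha=0.1, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N_STATES                         # tabular value function, one entry per state
    for _ in range(episodes):
        s = N_STATES // 2                        # every episode starts in the middle
        while not is_terminal(s):
            s_next = choose_move(V, s, epsilon, rng)
            # TD(0): move V(s) toward the bootstrapped target r + GAMMA * V(s')
            V[s] += alpha * (lookahead(V, s_next) - V[s])
            s = s_next
    return V


if __name__ == "__main__":
    V = train_td0()
    policy = ["right" if choose_move(V, s, 0.0, random.Random(0)) == s + 1 else "left"
              for s in range(1, N_STATES - 1)]
    print([round(v, 2) for v in V])
    print(policy)
```

The list `V` holds one entry per state; this kind of table is exactly what becomes infeasible when the number of states is enormous.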
In this thesis, we propose a novel learning method for environments with imperfect information. In this approach, two cascaded neural networks form the learning engine. The first network is based on the concept of temporal-difference (TD) networks; its task is to predict, as probabilities, the occurrence of the events that follow a trial action. These predictions, together with some known information, are then passed to the second network, which estimates a value. The best action in the current state is the one with the most favorable estimated value. The resulting agent is able to anticipate its opponents' aggressive attacks and mount a strong defense against them.
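The cascade described above can be sketched as follows: a predictor network maps (state features, trial action) to event probabilities, and a value network maps (those probabilities, known information) to a score used for action selection. This is an illustrative sketch under stated assumptions, not the thesis's architecture: the class and function names, layer sizes, and input encodings are hypothetical, the weights are random and untrained, and the TD-network training rule for the first stage and the value learning for the second stage are omitted.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weights (plus a bias weight) for one fully connected layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, x, activation):
    x = list(x) + [1.0]                              # append the bias input
    return [activation(sum(w * xi for w, xi in zip(row, x))) for row in layer]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def identity(z):
    return z


class CascadedAgent:
    """Hypothetical two-stage learner: predictor network -> value network.

    Stage 1 maps (state features, trial action) to probabilities of
    n_events future events; stage 2 maps (those probabilities, known
    information) to a single value estimate.
    """

    def __init__(self, n_state, n_action, n_events, n_known, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.n_action = n_action
        self.pred_hidden = make_layer(n_state + n_action, n_hidden, rng)
        self.pred_out = make_layer(n_hidden, n_events, rng)
        self.val_hidden = make_layer(n_events + n_known, n_hidden, rng)
        self.val_out = make_layer(n_hidden, 1, rng)

    def predict_events(self, state, action):
        one_hot = [1.0 if a == action else 0.0 for a in range(self.n_action)]
        h = forward(self.pred_hidden, list(state) + one_hot, math.tanh)
        return forward(self.pred_out, h, sigmoid)    # each output lies in (0, 1)

    def value(self, predictions, known):
        h = forward(self.val_hidden, list(predictions) + list(known), math.tanh)
        return forward(self.val_out, h, identity)[0]

    def best_action(self, state, known, legal_actions):
        # Score every legal trial action by feeding its predictions to the value net.
        scores = {a: self.value(self.predict_events(state, a), known)
                  for a in legal_actions}
        return max(scores, key=scores.get), scores


if __name__ == "__main__":
    agent = CascadedAgent(n_state=4, n_action=3, n_events=2, n_known=1)
    action, scores = agent.best_action([0.2, 0.0, 1.0, 0.5], [0.3], [0, 1, 2])
    print(action, scores)
```

Keeping the two stages separate mirrors the abstract's description: the predictor only has to model what is likely to happen after a trial action, while the value network combines those predictions with information that is already known.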
The card game "Hearts" is our test bed. It is a typical imperfect-information game, and it is difficult enough that traditional reinforcement learning methods cannot learn it well. In 100 games played with MSHEARTS in Microsoft Windows, our well-trained agent won the championship.
Advisory Committee
Tai-Wen Yue - advisor
Ching-Long Yeh - co-chair
Y. C. Hung - co-chair
Date of Defense: 2008-05-28
Date of Submission: 2008-06-23