
Title page for etd-0619108-203139


URN etd-0619108-203139 Statistics This thesis has been viewed 3661 times and downloaded 17 times.
Author Ming-Po Hsieh
Author's Email Address Not public
Department Computer Science and Engineering
Year 2007 Semester 2
Degree Master Type of Document Master's Thesis
Language English Page Count 63
Title A Temporal-Difference Prediction Approach for Intelligent Multiplayer Games
Keyword
  • Reinforcement Learning
  • Temporal-Difference Networks
Abstract
Reinforcement learning is the problem faced by an agent that must learn behavior through trial-and-error interactions with a dynamic environment. Rather than being given expert knowledge, the agent tries actions itself and continuously observes the feedback returned by the environment in order to discover an effective way to accomplish its goal. By continually exploiting what it already knows and exploring what it has not yet experienced, the agent progressively refines its policy based on a value function that it builds incrementally. The value function is, in essence, a mechanism by which the agent records the merit of each state in the environment, so that it can consult this record to determine the best action in the current situation. There are two challenges in reinforcement learning. First, a table of values cannot be stored for an environment with an enormously large number of states, because memory space is insufficient. Second, traditional solution methods for reinforcement learning focus on perfectly observable environments, but many real-world problems are not of this kind.
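The value-function mechanism described above can be illustrated with a minimal tabular TD(0) sketch. This is only a textbook-style example on a hypothetical five-state random walk, not the thesis's own method (which replaces the table with neural networks); the parameters `episodes`, `alpha`, and `gamma` are illustrative choices.

```python
import random

def td0_random_walk(episodes=5000, alpha=0.1, gamma=1.0, seed=0):
    """Tabular TD(0) value estimation on a 5-state random walk.

    States are 1..5; each episode starts in state 3 and moves left or
    right with equal probability. Stepping off the right edge yields
    reward 1, off the left edge reward 0; all other rewards are 0.
    """
    rng = random.Random(seed)
    V = {s: 0.5 for s in range(1, 6)}  # value table, one entry per state
    for _ in range(episodes):
        s = 3
        while True:
            s2 = s + rng.choice((-1, 1))
            if s2 == 0:                 # fell off the left edge
                r, v2, done = 0.0, 0.0, True
            elif s2 == 6:               # fell off the right edge
                r, v2, done = 1.0, 0.0, True
            else:
                r, v2, done = 0.0, V[s2], False
            # TD(0) update: move V[s] toward the bootstrapped target
            V[s] += alpha * (r + gamma * v2 - V[s])
            if done:
                break
            s = s2
    return V
```

After training, the learned values approximate the true state values 1/6, 2/6, ..., 5/6, and the agent could act greedily with respect to them; the memory problem noted above is that such a table becomes infeasible when the state space is enormous.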
In this thesis, we propose a novel learning method for imperfect-information environments. In this approach, two cascaded neural networks are linked together as our learning engine. One network is designed based on the concept of the temporal-difference network; its task is to predict, in a probabilistic sense, the occurrence of the successive events corresponding to a trial action. The predictions, together with some known information, are then passed to the other network for value estimation. The best action for the current state can then be determined by choosing the one with the most beneficial value. The resulting agent is able to anticipate the opponents' aggressive attacks and mount a strong defense against them.
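A rough sketch of the cascaded data flow just described, using hypothetical layer sizes and untrained random weights as stand-ins for the two trained networks (the thesis does not publish its exact architecture on this page): the first network maps a state and a trial action to event-occurrence probabilities, and the second maps those probabilities plus known information to a scalar value.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions for illustration only.
N_STATE, N_ACTIONS, N_EVENTS, N_KNOWN = 8, 4, 6, 3

# Random weights stand in for the two trained networks.
W_pred = rng.normal(size=(N_STATE + N_ACTIONS, N_EVENTS))
W_val = rng.normal(size=(N_EVENTS + N_KNOWN,))

def predict_events(state, action_onehot):
    """First network (TD-network-style predictor): probability of each
    successive event occurring, given the state and a trial action."""
    logits = np.concatenate([state, action_onehot]) @ W_pred
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid per event

def estimate_value(event_probs, known_info):
    """Second network: value estimate from the predictions plus the
    information already known to the agent."""
    return float(np.concatenate([event_probs, known_info]) @ W_val)

def best_action(state, known_info):
    """Run every trial action through the cascade and pick the one
    whose estimated value is most beneficial."""
    values = []
    for a in range(N_ACTIONS):
        onehot = np.eye(N_ACTIONS)[a]
        values.append(estimate_value(predict_events(state, onehot), known_info))
    return int(np.argmax(values)), values
```

In the actual system both networks would be trained (the predictor by temporal-difference learning, the value network from returns); this sketch only shows how predictions for each candidate action feed the value estimate that drives action selection.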
The card game "Hearts" is our test target. It is a typical example of an imperfect-information game, and it is difficult enough that traditional reinforcement learning methods cannot learn it well. Playing 100 games against MSHEARTS in Microsoft Windows, our well-trained agent won the championship.
    Advisor Committee
  • Tai-Wen Yue - advisor
  • Ching-Long Yeh - co-chair
  • Y. C. Hung - co-chair
  • Files: in-campus access only
    Date of Defense 2008-05-28 Date of Submission 2008-06-23

