Tatung University Electronic Theses and Dissertations System

Title page for etd-1110112-003335


URN etd-1110112-003335
Statistics This thesis has been viewed 2016 times and downloaded 2414 times.
Author Tsao-Jen Hsueh
Author's Email Address Not public.
Department Computer Science and Engineering
Year 2012
Semester 1
Degree Master
Type of Document Master's Thesis
Language zh-TW.Big5 Chinese
Page Count 77
Title WEB DATA EXTRACTION SYSTEM – DATA TRANSFORMATION AND SCHEDULING
Keywords
  • Scheduling
  • Extraction
  • Web data extraction
  • Web mining
  • Wrapper generation
  • XML transformation
Abstract
The World Wide Web (WWW, or simply the Web) is like the largest library in the world: it houses a vast and varied body of knowledge and information that we can carry with us at any time, realizing the dream of “knowing everything without leaving home.” We often use search engines such as Google to look up information on the Internet by keyword. However, search engines also return much unrelated information, which must be further sorted and filtered before we get what we really want. Therefore, how to find the information we need on the vast Internet and extract useful data from it has become a widely discussed research topic.
Here we propose a Web data extraction system called “W2X”. It targets a small number of websites with similar themes, such as travel or shopping, automatically downloading the webpages that contain the information of interest and extracting the data we actually need from pages whose structures differ completely. Finally, the extracted information is transformed into a single uniform format and delivered to a portal website for querying. The portal website can therefore return query results that span multiple websites, reducing the time end users spend querying each website separately, and it provides an integrated result that makes it easier for end users to compare and analyze the information. In addition, so that a change in the structure of a target webpage does not interrupt data updates, the system provides a graphical user interface for quickly changing and adjusting the settings, enabling the portal website to provide the most up-to-date query results in the shortest possible time.
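The abstract describes W2X only at a high level: per-site extraction from pages with different structures, followed by transformation into one uniform format for the portal. The Python sketch below illustrates that extract-then-transform idea under stated assumptions; it is not the thesis's implementation. The site names (`site_a`, `site_b`), the wrapper rules, the `items`/`item` output schema, and the functions `extract` and `build_portal_feed` are all hypothetical, and pages are assumed to be well-formed XHTML so that the standard library's ElementTree can parse them (real-world HTML would need a more tolerant parser). Downloading and scheduling are not shown.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper rules, one per target site (illustrative only).
# "record" is the ElementTree path selecting one record; "fields" maps each
# output field to a path relative to that record.
WRAPPERS = {
    "site_a": {
        "record": ".//tr[@class='item']",
        "fields": {"name": "td[1]", "price": "td[2]"},
    },
    "site_b": {
        "record": ".//div[@class='product']",
        "fields": {"name": "span[@class='title']", "price": "span[@class='cost']"},
    },
}


def extract(page: str, wrapper: dict) -> list[dict[str, str]]:
    """Apply one site's (hypothetical) wrapper to a well-formed XHTML page."""
    root = ET.fromstring(page)
    records = []
    for node in root.findall(wrapper["record"]):
        record = {}
        for field, path in wrapper["fields"].items():
            element = node.find(path)
            # A missing field becomes an empty string instead of raising.
            record[field] = (element.text or "").strip() if element is not None else ""
        records.append(record)
    return records


def build_portal_feed(pages: dict[str, str]) -> ET.Element:
    """Transform records from every site into one uniform XML tree."""
    items = ET.Element("items")
    for site, page in pages.items():
        for record in extract(page, WRAPPERS[site]):
            # Same element layout regardless of the source page's structure.
            item = ET.SubElement(items, "item", source=site)
            for field, value in record.items():
                ET.SubElement(item, field).text = value
    return items


if __name__ == "__main__":
    page_a = "<table><tr class='item'><td>Tokyo tour</td><td>100</td></tr></table>"
    page_b = ("<div><div class='product'><span class='title'>Osaka tour</span>"
              "<span class='cost'>80</span></div></div>")
    feed = build_portal_feed({"site_a": page_a, "site_b": page_b})
    print(ET.tostring(feed, encoding="unicode"))
```

Keeping the wrapper rules as data rather than code is one way a settings interface could adjust extraction when a target page's layout changes without modifying the program; the approach actually used in W2X may differ.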
Advisory Committee
  • Yue-Sun Kuo - advisor
  • Ching-Long Yeh - co-chair
  • Deron Liang - co-chair
Files Available for access worldwide
Date of Defense 2012-10-26
Date of Submission 2012-11-10

