Categorizing XML documents based on page styles

JW Lee - Advanced Workshop on Content Computing, 2004 - Springer
Abstract
The self-describing feature of XML offers both challenges and opportunities in information retrieval, document management, and data mining. To process and manage XML documents effectively in XML data servers, databases, Electronic Document Management Systems (EDMS), and search engines, a new technique is needed to categorize large XML documents automatically. In this paper, we propose a new methodology for categorizing XML documents by page style that takes into account both the meanings of the elements and the nested structure of XML. Accurate categorization of XML documents by page style provides an important basis for a variety of applications that manage and process XML. Experiments with Yahoo! pages show that our methodology achieves almost 100% accuracy in categorizing XML documents by page style.
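The abstract does not describe the algorithm itself. The following Python sketch is only a rough illustration of structure-aware categorization. It represents each document by its set of root-to-element tag paths, then assigns the document to the page-style prototype with the highest Jaccard similarity.

This is a generic technique chosen for illustration and is not the method proposed in the paper. All of the following are hypothetical: the function names, the prototype labels (`listing`, `article`), and the sample documents. Tag names also stand in only crudely for the "meanings of the elements."

```python
import xml.etree.ElementTree as ET

def tag_paths(xml_text):
    """Return the set of root-to-element tag paths, e.g. 'page/header/title'."""
    root = ET.fromstring(xml_text)
    paths = set()

    def walk(elem, prefix):
        # Extend the parent's path with this element's tag name.
        path = f"{prefix}/{elem.tag}" if prefix else elem.tag
        paths.add(path)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths

def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def categorize(xml_text, prototypes):
    """Assign a document to the most structurally similar page-style prototype.

    `prototypes` (hypothetical) maps a style label to a sample XML document.
    """
    doc_paths = tag_paths(xml_text)
    scores = {label: jaccard(doc_paths, tag_paths(sample))
              for label, sample in prototypes.items()}
    return max(scores, key=scores.get)

# Hypothetical page-style prototypes and an unlabeled document.
prototypes = {
    "listing": "<page><header><title/></header><list><item><link/></item></list></page>",
    "article": "<page><header><title/></header><body><para/></body></page>",
}
doc = ("<page><header><title/></header>"
       "<list><item><link/></item><item><link/></item></list></page>")

print(categorize(doc, prototypes))  # listing
```

Path sets deliberately ignore how often a path repeats. In the example, a listing page with two items has the same structure as one with a single item, because only the shape of the page matters.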