Refining Information Extraction Rules using Data Provenance

B Liu, L Chiticariu, V Chu, HV Jagadish, F Reiss - IEEE Data Eng. Bull., 2010 - Citeseer
Abstract
Developing high-quality information extraction (IE) rules, or extractors, is an iterative and primarily manual process that is extremely time-consuming and error-prone. In each iteration, the outputs of the extractor are examined, and the erroneous ones are used to drive the refinement of the extractor in the next iteration. Data provenance explains the origins of an output data item and how it has been transformed through a query. As such, one can expect data provenance to be valuable in understanding and debugging complex IE rules. In this paper we discuss how data provenance can be used beyond understanding and debugging, to automatically refine IE rules. In particular, we give an overview of the main ideas behind a recent provenance-based solution for suggesting a ranked list of refinements to an extractor, aimed at increasing its precision, and outline several related directions for future research.
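
The idea of ranking refinements by their effect on precision can be illustrated with a small sketch. This is not the authors' algorithm; it is a toy model under stated assumptions: each extractor output carries a manually assigned correctness label and a provenance set naming the rules and dictionary entries that produced it, and a "refinement" simply disables one of those provenance units. The names Output, precision, rank_refinements, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    text: str              # extracted span
    correct: bool          # label from manual inspection (assumed available)
    provenance: frozenset  # ids of rules / dictionary entries that produced it


def precision(outputs):
    # Fraction of outputs labeled correct; an empty result is treated as 1.0.
    return sum(o.correct for o in outputs) / len(outputs) if outputs else 1.0


def rank_refinements(outputs):
    """Rank candidate refinements, where each candidate disables one provenance unit.

    Returns (unit, precision_gain, correct_outputs_lost) tuples, ordered by
    highest gain first, then fewest correct outputs lost, then unit name.
    """
    base = precision(outputs)
    units = set().union(*(o.provenance for o in outputs))
    ranked = []
    for unit in units:
        kept = [o for o in outputs if unit not in o.provenance]
        lost = sum(o.correct for o in outputs if unit in o.provenance)
        ranked.append((unit, precision(kept) - base, lost))
    ranked.sort(key=lambda r: (-r[1], r[2], r[0]))
    return ranked


# Hypothetical output of a person-name extractor, with toy provenance.
OUTPUTS = [
    Output("John Smith", True, frozenset({"rule:FirstLast", "dict:John"})),
    Output("April Fools", False, frozenset({"rule:FirstLast", "dict:April"})),
    Output("April Showers", False, frozenset({"rule:FirstLast", "dict:April"})),
    Output("Dr. Lee", True, frozenset({"rule:TitleName"})),
]

ranked = rank_refinements(OUTPUTS)
for unit, gain, lost in ranked:
    print(f"{unit}: precision gain {gain:+.2f}, correct outputs lost {lost}")
```

In this toy data, disabling the dictionary entry "April" and disabling the whole FirstLast rule both raise precision from 0.5 to 1.0, but the former loses no correct outputs and so ranks first. Provenance is what makes this possible: it links each erroneous output back to the specific parts of the extractor that could be changed.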