Bias-free hypothesis evaluation in multirelational domains

Published: 23 April 2006

Abstract

In machine learning, one typically assumes that the true classification of an object depends only on the object itself and, given the object, is independent of the classifications of other objects. Under this assumption, if a sufficiently large, randomly chosen part of the training data is set aside as a test set, the observed sample error on that test set is an unbiased estimator of the true error. In many application settings, however, this mainstream approach to model evaluation may be inappropriate. As pointed out by [2], among others, whenever there is autocorrelation, i.e., whenever the target value of one object depends not only on the object itself but also on other objects' classifications or on information shared between objects, the observed error on a randomly chosen test set may no longer be an unbiased estimator. We introduce a sampling technique, generalized subgraph sampling, that avoids this bias in error estimation by establishing the required amount of linked objects in the test set.
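To make the linkage problem concrete, the following is a minimal Python sketch of a linkage-aware hold-out split. It is not the generalized subgraph sampling procedure introduced in the paper, whose details the abstract does not give. It shows only the simpler idea of assigning whole connected components to one side of the split so that no link crosses between training and test data. The function names (`connected_components`, `component_split`, `cross_links`), the `test_fraction` parameter, and the toy graph are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def connected_components(nodes, edges):
    """Group nodes into connected components with a simple union-find."""
    parent = {n: n for n in nodes}

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for n in nodes:
        groups[find(n)].append(n)
    return list(groups.values())


def component_split(nodes, edges, test_fraction=0.3, seed=0):
    """Assign whole components to the test set until it holds at least
    test_fraction of the objects, so that no link crosses the split."""
    components = connected_components(nodes, edges)
    random.Random(seed).shuffle(components)
    target = test_fraction * len(nodes)
    train, test = [], []
    for comp in components:
        (test if len(test) < target else train).extend(comp)
    return train, test


def cross_links(edges, train, test):
    """Count links that have one endpoint in each partition."""
    train_set, test_set = set(train), set(test)
    return sum(
        (a in train_set and b in test_set) or (a in test_set and b in train_set)
        for a, b in edges
    )


# Hypothetical toy data: ten objects, linked when they share some attribute.
nodes = list(range(10))
edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (8, 9)]
train, test = component_split(nodes, edges, test_fraction=0.3, seed=0)
print(len(train), len(test), cross_links(edges, train, test))
```

A uniformly random split of the same objects would usually leave some links with one endpoint in each partition, which is the situation in which [2] reports biased error estimates. The sketch removes every such link. The method described in the abstract instead appears to control how many linked objects the test set contains, which this sketch does not attempt.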

References

[1] http://www.imdb.com.
[2] D. Jensen and J. Neville. Autocorrelation and linkage cause bias in evaluation of relational learners. In Proc. of the 12th International Conference on Inductive Logic Programming. Springer-Verlag, 2002.
[3] J. Neville and D. Jensen. Collective classification with relational dependency networks. In Proc. of the 2nd Multi-Relational Data Mining Workshop, 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003.

Published In

SAC '06: Proceedings of the 2006 ACM Symposium on Applied Computing
April 2006
1967 pages
ISBN: 1595931082
DOI: 10.1145/1141277

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. error estimation
  2. multirelational learning
  3. sampling

Qualifiers

  • Article

Conference

SAC06

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
