Pure Java Implementation of K-Nearest Neighbor Text Categorization
BBC Dataset is used for test(filenames are refactored to [categoryname]-[number] ). Dataset is under "data" folder. http://mlg.ucd.ie/datasets/bbc.html
Run:
java -jar doc_classifier_knn.jar -Dir=data -categories=business,entertainment,politics,sport,tech -trainingSize=150 -k=3
Written for experimental purpose. It could be optimized and expanded.
Concurrent implementation of this algorithm -> https://github.com/hasanfd/concurrent_doc_classifier_knn