subscribe to arXiv mailings

arXiv:1408.2031 [pdf]

Conditional Probability Tree Estimation Analysis and Algorithms

Authors: Alina Beygelzimer, John Langford, Yuri Lifshits, Gregory Sorkin, Alexander L. Strehl

Abstract: We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably co… ▽ More We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works succesfully on a dataset with roughly 106 labels. △ Less

Submitted 9 August, 2014; originally announced August 2014.

Comments: Appears in Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence (UAI2009)

Report number: UAI-P-2009-PG-51-58

arXiv:1004.0092 [pdf, ps, other]

doi 10.1007/s00224-008-9154-6

Maximal Intersection Queries in Randomized Input Models

Authors: Benjamin Hoffmann, Mikhail Lifshits, Yury Lifshits, Dirk Nowotka

Abstract: Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and… ▽ More Consider a family of sets and a single set, called the query set. How can one quickly find a member of the family which has a maximal intersection with the query set? Time constraints on the query and on a possible preprocessing of the set family make this problem challenging. Such maximal intersection queries arise in a wide range of applications, including web search, recommendation systems, and distributing on-line advertisements. In general, maximal intersection queries are computationally expensive. We investigate two well-motivated distributions over all families of sets and propose an algorithm for each of them. We show that with very high probability an almost optimal solution is found in time which is logarithmic in the size of the family. Moreover, we point out a threshold phenomenon on the probabilities of intersecting sets in each of our two input models which leads to the efficient algorithms mentioned above. △ Less

Submitted 1 April, 2010; originally announced April 2010.

Comments: 18 pages

Journal ref: Theory of Computing Systems, 46(1):104-119, 2010

arXiv:0903.4217 [pdf, ps, other]

Conditional Probability Tree Estimation Analysis and Algorithms

Authors: Alina Beygelzimer, John Langford, Yuri Lifshits, Gregory Sorkin, Alex Strehl

Abstract: We consider the problem of estimating the conditional probability of a label in time $O(\log n)$, where $n$ is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which prov… ▽ More We consider the problem of estimating the conditional probability of a label in time $O(\log n)$, where $n$ is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm which provably constructs a logarithmic depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works succesfully on a dataset with roughly $10^6$ labels. △ Less

Submitted 3 June, 2009; v1 submitted 24 March, 2009; originally announced March 2009.

arXiv:cs/0604058 [pdf, ps, other]

Solving Classical String Problems on Compressed Texts

Authors: Yury Lifshits

Abstract: Here we study the complexity of string problems as a function of the size of a program that generates input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing LZ-compressed texts. The main result is a new algorithm for pattern matching when both a text T and a pattern P are presented by SLPs (so-called fully compressed patter… ▽ More Here we study the complexity of string problems as a function of the size of a program that generates input. We consider straight-line programs (SLP), since all algorithms on SLP-generated strings could be applied to processing LZ-compressed texts. The main result is a new algorithm for pattern matching when both a text T and a pattern P are presented by SLPs (so-called fully compressed pattern matching problem). We show how to find a first occurrence, count all occurrences, check whether any given position is an occurrence or not in time O(n^2m). Here m,n are the sizes of straight-line programs generating correspondingly P and T. Then we present polynomial algorithms for computing fingerprint table and compressed representation of all covers (for the first time) and for finding periods of a given compressed string (our algorithm is faster than previously known). On the other hand, we show that computing the Hamming distance between two SLP-generated strings is NP- and coNP-hard. △ Less

Submitted 13 April, 2006; originally announced April 2006.

Comments: 10 pages, 6 figures, submitted

ACM Class: E.4; F.2.2; I.7

Showing 1–4 of 4 results for author: Lifshits, Y