-
LINKAGE: Listwise Ranking among Varied-Quality References for Non-Factoid QA Evaluation via LLMs
Authors:
Sihui Yang,
Keping Bi,
Wanqing Cui,
Jiafeng Guo,
Xueqi Cheng
Abstract:
Non-Factoid (NF) Question Answering (QA) is challenging to evaluate due to diverse potential answers and the lack of an objective criterion. Commonly used automatic evaluation metrics such as ROUGE or BERTScore cannot accurately measure semantic similarities or answers from different perspectives. Recently, Large Language Models (LLMs) have been adopted for NFQA evaluation due to their compelling performance on various NLP tasks. Common approaches include pointwise scoring of each candidate answer and pairwise comparisons between answers. Inspired by the evolution from pointwise to pairwise to listwise in learning-to-rank methods, we propose a novel listwise NFQA evaluation approach that utilizes LLMs to rank candidate answers in a list of reference answers sorted by descending quality. Moreover, for NF questions that do not have multi-grade or any golden answers, we leverage LLMs to generate a reference answer list of varying quality to facilitate the listwise evaluation. Extensive experimental results on three NFQA datasets, i.e., ANTIQUE, TREC-DL-NF, and WebGLM, show that our method has significantly higher correlations with human annotations compared to automatic scores and common pointwise and pairwise approaches.
Submitted 30 September, 2024; v1 submitted 23 September, 2024;
originally announced September 2024.
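To make the listwise idea in the LINKAGE abstract above concrete, here is a minimal Python sketch, not the authors' implementation. It assumes a hypothetical `rank_fn` callable standing in for the LLM call, which returns the 0-based position the candidate would take in a reference list sorted by descending quality. The linear mapping from position to a score in [0, 1] and the toy length-based judge are illustrative assumptions only.

```python
from typing import Callable, List


def listwise_score(candidate: str,
                   references: List[str],
                   rank_fn: Callable[[str, List[str]], int]) -> float:
    """Map the candidate's rank among the references to a score in [0, 1].

    `references` must be sorted by descending quality. `rank_fn` is a
    hypothetical stand-in for the LLM ranking call: it returns the 0-based
    insertion position of the candidate (0 = better than every reference).
    """
    if not references:
        raise ValueError("at least one reference answer is required")
    position = rank_fn(candidate, references)
    if not 0 <= position <= len(references):
        raise ValueError("rank position out of range")
    # Assumed linear mapping: position 0 -> 1.0, last position -> 0.0.
    return 1.0 - position / len(references)


def toy_rank_fn(candidate: str, references: List[str]) -> int:
    """Toy judge (assumption): longer answers count as better."""
    return sum(len(ref) > len(candidate) for ref in references)


references = ["aaaa", "aaa", "a"]          # sorted best to worst
score = listwise_score("aa", references, toy_rank_fn)
print(round(score, 3))
```

In practice `toy_rank_fn` would be replaced by a prompt that shows the model the whole list at once, which is what distinguishes a listwise judgment from pointwise scoring or pairwise comparison.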
-
Contextual Dual Learning Algorithm with Listwise Distillation for Unbiased Learning to Rank
Authors:
Lulu Yu,
Keping Bi,
Shiyu Ni,
Jiafeng Guo
Abstract:
Unbiased Learning to Rank (ULTR) aims to leverage biased implicit user feedback (e.g., click) to optimize an unbiased ranking model. The effectiveness of the existing ULTR methods has primarily been validated on synthetic datasets. However, their performance on real-world click data remains unclear. Recently, Baidu released a large publicly available dataset of their web search logs. Subsequently, the NTCIR-17 ULTRE-2 task released a subset dataset extracted from it. We conduct experiments on commonly used or effective ULTR methods on this subset to determine whether they maintain their effectiveness. In this paper, we propose a Contextual Dual Learning Algorithm with Listwise Distillation (CDLA-LD) to simultaneously address both position bias and contextual bias. We utilize a listwise-input ranking model to obtain reconstructed feature vectors incorporating local contextual information and employ the Dual Learning Algorithm (DLA) method to jointly train this ranking model and a propensity model to address position bias. As this ranking model learns the interaction information within the documents list of the training set, to enhance the ranking model's generalization ability, we additionally train a pointwise-input ranking model to learn the listwise-input ranking model's capability for relevance judgment in a listwise manner. Extensive experiments and analysis confirm the effectiveness of our approach.
Submitted 19 August, 2024;
originally announced August 2024.
-
Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence?
Authors:
Shiyu Ni,
Keping Bi,
Lulu Yu,
Jiafeng Guo
Abstract:
Large language models (LLMs) have been found to produce hallucinations when the question exceeds their internal knowledge boundaries. A reliable model should have a clear perception of its knowledge boundaries, providing correct answers within its scope and refusing to answer when it lacks knowledge. Existing research on LLMs' perception of their knowledge boundaries typically uses either the probability of the generated tokens or the verbalized confidence as the model's confidence in its response. However, these studies overlook the differences and connections between the two. In this paper, we conduct a comprehensive analysis and comparison of LLMs' probabilistic perception and verbalized perception of their factual knowledge boundaries. First, we investigate the pros and cons of these two perceptions. Then, we study how they change under questions of varying frequencies. Finally, we measure the correlation between LLMs' probabilistic confidence and verbalized confidence. Experimental results show that 1) LLMs' probabilistic perception is generally more accurate than verbalized perception but requires an in-domain validation set to adjust the confidence threshold. 2) Both perceptions perform better on less frequent questions. 3) It is challenging for LLMs to accurately express their internal confidence in natural language.
Submitted 19 August, 2024;
originally announced August 2024.
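The abstract above notes that probabilistic confidence needs an in-domain validation set to adjust the confidence threshold. The following Python sketch illustrates one way this could look. It is a sketch under stated assumptions: the geometric-mean aggregation of token probabilities and the "alignment" criterion (agreement between "confident" and "correct") are illustrative choices, not necessarily those used in the paper.

```python
import math
from typing import Sequence, Tuple


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability of a generated answer (assumed aggregation)."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def pick_threshold(confidences: Sequence[float],
                   correct: Sequence[bool]) -> Tuple[float, float]:
    """Choose the threshold that best aligns 'confident' with 'correct'.

    A response counts as confident when its confidence is >= threshold.
    Returns (threshold, alignment), where alignment is the fraction of
    validation examples on which confident == correct.
    """
    best_t, best_align = 0.0, -1.0
    for t in sorted(set(confidences)):
        agree = sum((c >= t) == ok for c, ok in zip(confidences, correct))
        align = agree / len(confidences)
        if align > best_align:
            best_t, best_align = t, align
    return best_t, best_align


# Hypothetical validation data: confidence per answer and whether it was right.
val_conf = [0.9, 0.8, 0.4, 0.3]
val_correct = [True, True, False, False]
threshold, alignment = pick_threshold(val_conf, val_correct)
print(threshold, alignment)
```

A threshold tuned this way is only as good as the validation set it was tuned on, which is consistent with the abstract's point that the validation data should be in-domain.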
-
Crystal growth and characterization of Fe$_{1+δ}$Se$_{1-x}$Te$_x$ (0.5 $\leq$ $x$ $\leq$ 1) from LiCl/KCl flux
Authors:
Qiaoyu Wang,
Kexin Bi,
Lewei Chen,
Yunqing Shi,
Junkun Yi,
Yadong Gu,
Menghu Zhou,
Binbin Ruan,
Xingye Lu,
Mingwei Ma,
Genfu Chen,
Zhian Ren
Abstract:
A eutectic LiCl/KCl flux method in a horizontal configuration has been used to grow a series of homogeneous Fe$_{1+δ}$Se$_{1-x}$Te$_x$ single crystals of high quality with 0.5 $\leq$ $x$ $\leq$ 1. Compared with the previously used melt-growth method, the stable crystallization process in LiCl/KCl flux below their peritectic temperatures results in better homogeneity and crystalline perfection, as identified by energy dispersive spectrometer and x-ray diffraction. The interstitial Fe value $δ$ remains small within 0.5 $\leq$ $x$ $\leq$ 0.85, where the superconducting temperature $T_C$ is not sensitive to the Te content, with sharp superconducting transition widths $Δ$$T_C$ < 1 K and a maximum of $T_C$ = 14.3 K at $x$ = 0.61. The value $δ$ starts to increase quickly, accompanied by a deviation from the linear behavior of the crystal lattice parameters as well as a broadening of the transition width to $Δ$$T_C$ = 2.1 K at $x$ = 0.91, then suddenly rises to $δ$ > 0.1, followed by the disappearance of superconductivity and the emergence of antiferromagnetic order at $x$ $\geq$ 0.96. We also observed a metallic to semiconducting transition in the normal state resistivity of Fe$_{1+δ}$Se$_{1-x}$Te$_x$ with increasing Te content, which is related to a localized electronic state induced by the interstitial Fe. The interstitial Fe value $δ$ might be a key physical parameter for understanding various properties of the Fe$_{1+δ}$Se$_{1-x}$Te$_x$ system.
Submitted 18 August, 2024;
originally announced August 2024.
-
Evaluating Implicit Bias in Large Language Models by Attacking From a Psychometric Perspective
Authors:
Yuchen Wen,
Keping Bi,
Wei Chen,
Jiafeng Guo,
Xueqi Cheng
Abstract:
As Large Language Models (LLMs) become an important way of information seeking, there have been increasing concerns about the unethical content LLMs may generate. In this paper, we conduct a rigorous evaluation of LLMs' implicit bias towards certain groups by attacking them with carefully crafted instructions to elicit biased responses. Our attack methodology is inspired by psychometric principles in cognitive and social psychology. We propose three attack approaches, i.e., Disguise, Deception, and Teaching, based on which we built evaluation datasets for four common bias types. Each prompt attack has bilingual versions. Extensive evaluation of representative LLMs shows that 1) all three attack methods work effectively, especially the Deception attacks; 2) GLM-3 performs the best in defending our attacks, compared to GPT-3.5 and GPT-4; 3) LLMs could output content of other bias types when being taught with one type of bias. Our methodology provides a rigorous and effective way of evaluating LLMs' implicit bias and will benefit the assessments of LLMs' potential ethical risks.
Submitted 20 June, 2024;
originally announced June 2024.
-
Iterative Utility Judgment Framework via LLMs Inspired by Relevance in Philosophy
Authors:
Hengran Zhang,
Keping Bi,
Jiafeng Guo,
Xueqi Cheng
Abstract:
Utility and topical relevance are critical measures in information retrieval (IR), reflecting system and user perspectives, respectively. While topical relevance has long been emphasized, utility is a higher standard of relevance and is more useful for facilitating downstream tasks, e.g., in Retrieval-Augmented Generation (RAG). When we incorporate utility judgments into RAG, we realize that the topical relevance, utility, and answering in RAG are closely related to the three types of relevance that Schutz discussed from a philosophical perspective. They are topical relevance, interpretational relevance, and motivational relevance, respectively. Inspired by the dynamic iterations of the three types of relevance, we propose an Iterative utiliTy judgmEnt fraMework (ITEM) to promote each step of the cycle of RAG. We conducted extensive experiments on multi-grade passage retrieval and factoid question-answering datasets (i.e., TREC DL, WebAP, and NQ). Experimental results demonstrate significant improvements in utility judgments, ranking of topical relevance, and answer generation upon representative baselines, including multiple single-shot utility judging approaches. Our code and benchmark can be found at https://anonymous.4open.science/r/ITEM-B486/.
Submitted 17 June, 2024;
originally announced June 2024.
-
MORE: Multi-mOdal REtrieval Augmented Generative Commonsense Reasoning
Authors:
Wanqing Cui,
Keping Bi,
Jiafeng Guo,
Xueqi Cheng
Abstract:
Since commonsense information is recorded significantly less frequently than it occurs, language models pre-trained by text generation have difficulty learning sufficient commonsense knowledge. Several studies have leveraged text retrieval to augment the models' commonsense ability. Unlike text, images capture commonsense information inherently, but little effort has been made to utilize them effectively. In this work, we propose a novel Multi-mOdal REtrieval (MORE) augmentation framework to leverage both text and images to enhance the commonsense ability of language models. Extensive experiments on the Common-Gen task have demonstrated the efficacy of MORE based on pre-trained models of both single and multiple modalities.
Submitted 13 June, 2024; v1 submitted 21 February, 2024;
originally announced February 2024.
-
When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation
Authors:
Shiyu Ni,
Keping Bi,
Jiafeng Guo,
Xueqi Cheng
Abstract:
Large Language Models (LLMs) have been found to have difficulty knowing they do not possess certain knowledge and tend to provide specious answers in such cases. Retrieval Augmentation (RA) has been extensively studied to mitigate LLMs' hallucinations. However, due to the extra overhead and unassured quality of retrieval, it may not be optimal to conduct RA all the time. A straightforward idea is to only conduct retrieval when LLMs are uncertain about a question. This motivates us to enhance the LLMs' ability to perceive their knowledge boundaries to help RA. In this paper, we first quantitatively measure this ability of LLMs and confirm their overconfidence. Then, we study how LLMs' certainty about a question correlates with their dependence on externally retrieved information. We propose several methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence. Additionally, equipped with these methods, LLMs can achieve comparable or even better RA performance with far fewer retrieval calls.
Submitted 11 June, 2024; v1 submitted 17 February, 2024;
originally announced February 2024.
-
Reproducibility Analysis and Enhancements for Multi-Aspect Dense Retriever with Aspect Learning
Authors:
Keping Bi,
Xiaojie Sun,
Jiafeng Guo,
Xueqi Cheng
Abstract:
Multi-aspect dense retrieval aims to incorporate aspect information (e.g., brand and category) into dual encoders to facilitate relevance matching. As an early and representative multi-aspect dense retriever, MADRAL learns several extra aspect embeddings and fuses the explicit aspects with an implicit aspect "OTHER" for final representation. MADRAL was evaluated on proprietary data and its code was not released, making it challenging to validate its effectiveness on other datasets. We failed to reproduce its effectiveness on the public MA-Amazon data, motivating us to probe the reasons and re-examine its components. We propose several component alternatives for comparisons, including replacing "OTHER" with "CLS" and representing aspects with the first several content tokens. Through extensive experiments, we confirm that learning "OTHER" from scratch in aspect fusion is harmful. In contrast, our proposed variants can greatly enhance the retrieval performance. Our research not only sheds light on the limitations of MADRAL but also provides valuable insights for future studies on more powerful multi-aspect dense retrieval models. Code will be released at: https://github.com/sunxiaojie99/Reproducibility-for-MADRAL.
Submitted 16 January, 2024; v1 submitted 7 January, 2024;
originally announced January 2024.
-
A Multi-Granularity-Aware Aspect Learning Model for Multi-Aspect Dense Retrieval
Authors:
Xiaojie Sun,
Keping Bi,
Jiafeng Guo,
Sihui Yang,
Qishen Zhang,
Zhongyi Liu,
Guannan Zhang,
Xueqi Cheng
Abstract:
Dense retrieval methods have been mostly focused on unstructured text and less attention has been drawn to structured data with various aspects, e.g., products with aspects such as category and brand. Recent work has proposed two approaches to incorporate the aspect information into item representations for effective retrieval by predicting the values associated with the item aspects. Despite their efficacy, they treat the values as isolated classes (e.g., "Smart Homes", "Home, Garden & Tools", and "Beauty & Health") and ignore their fine-grained semantic relation. Furthermore, they either enforce the learning of aspects into the CLS token, which could confuse it from its designated use for representing the entire content semantics, or learn extra aspect embeddings only with the value prediction objective, which could be insufficient especially when there are no annotated values for an item aspect. Aware of these limitations, we propose a MUlti-granulaRity-aware Aspect Learning model (MURAL) for multi-aspect dense retrieval. It leverages aspect information across various granularities to capture both coarse and fine-grained semantic relations between values. Moreover, MURAL incorporates separate aspect embeddings as input to transformer encoders so that the masked language model objective can assist implicit aspect learning even without aspect-value annotations. Extensive experiments on two real-world datasets of products and mini-programs show that MURAL outperforms state-of-the-art baselines significantly.
Submitted 16 January, 2024; v1 submitted 5 December, 2023;
originally announced December 2023.
-
HKTGNN: Hierarchical Knowledge Transferable Graph Neural Network-based Supply Chain Risk Assessment
Authors:
Zhanting Zhou,
Kejun Bi,
Yuyanzhen Zhong,
Chao Tang,
Dongfen Li,
Shi Ying,
Ruijin Wang
Abstract:
The strength of a supply chain is an important measure of a country's or region's technical advancement and overall competitiveness. Establishing supply chain risk assessment models for effective management and mitigation of potential risks has become increasingly crucial. As the number of businesses grows, the important relationships become more complicated and difficult to measure. This emphasizes the need for extracting relevant information from graph data. Previously, academics mostly employed knowledge inference to increase the visibility of links between nodes in the supply chain. However, they have not solved the data hunger problem of single node feature characteristics. We propose a hierarchical knowledge transferable graph neural network-based (HKTGNN) supply chain risk assessment model to address these issues. Our approach is based on current graph embedding methods for corporate investment risk assessment. We embed the supply chain network corresponding to individual goods in the supply chain using the graph embedding module, resulting in a directed homogeneous graph with just product nodes. This reduces the complicated supply chain network to a basic product network. A centrality-based domain difference knowledge transferable module, motivated by the premise that supply chain feature characteristics may be biased in the real world, addresses these difficulties. Meanwhile, the feature complement and message passing alleviate the data hunger problem, which is driven by domain differences. Our model achieves superior performance in experiments on a real-world supply chain dataset. We give an equation to prove that our comparative experiment is both effective and fair.
Submitted 6 November, 2023;
originally announced November 2023.
-
CAME: Competitively Learning a Mixture-of-Experts Model for First-stage Retrieval
Authors:
Yinqiong Cai,
Yixing Fan,
Keping Bi,
Jiafeng Guo,
Wei Chen,
Ruqing Zhang,
Xueqi Cheng
Abstract:
The first-stage retrieval aims to retrieve a subset of candidate documents from a huge collection both effectively and efficiently. Since various matching patterns can exist between queries and relevant documents, previous work tries to combine multiple retrieval models to find as many relevant results as possible. The constructed ensembles, whether learned independently or jointly, do not care which component model is more suitable to an instance during training. Thus, they cannot fully exploit the capabilities of different types of retrieval models in identifying diverse relevance patterns. Motivated by this observation, in this paper, we propose a Mixture-of-Experts (MoE) model consisting of representative matching experts and a novel competitive learning mechanism to let the experts develop and enhance their expertise during training. Specifically, our MoE model shares the bottom layers to learn common semantic representations and uses differently structured upper layers to represent various types of retrieval experts. Our competitive learning mechanism has two stages: (1) a standardized learning stage to train the experts equally to develop their capabilities to conduct relevance matching; (2) a specialized learning stage where the experts compete with each other on every training instance and get rewards and updates according to their performance to enhance their expertise on certain types of samples. Experimental results on three retrieval benchmark datasets show that our method significantly outperforms the state-of-the-art baselines.
Submitted 5 November, 2023;
originally announced November 2023.
-
An Implementation of Multimodal Fusion System for Intelligent Digital Human Generation
Authors:
Yingjie Zhou,
Yaodong Chen,
Kaiyue Bi,
Lian Xiong,
Hui Liu
Abstract:
With the rapid development of artificial intelligence (AI), digital humans have attracted more and more attention and are expected to achieve a wide range of applications in several industries. However, most existing digital humans still rely on manual modeling by designers, which is a cumbersome process with a long development cycle. Therefore, facing the rise of digital humans, there is an urgent need for a digital human generation system combined with AI to improve development efficiency. In this paper, an implementation scheme of an intelligent digital human generation system with multimodal fusion is proposed. Specifically, text, speech, and image are taken as inputs, and interactive speech is synthesized using a large language model (LLM), voiceprint extraction, and text-to-speech conversion techniques. Next, the input image is age-transformed and a suitable image is selected as the driving image. Then, the modification and generation of digital human video content is realized by digital human driving, novel view synthesis, and intelligent dressing techniques. Finally, we enhance the user experience through style transfer, super-resolution, and quality evaluation. Experimental results show that the system can effectively realize digital human generation. The related code is released at https://github.com/zyj-2000/CUMT_2D_PhotoSpeaker.
Submitted 31 October, 2023;
originally announced October 2023.
-
CIR at the NTCIR-17 ULTRE-2 Task
Authors:
Lulu Yu,
Keping Bi,
Jiafeng Guo,
Xueqi Cheng
Abstract:
The Chinese Academy of Sciences Information Retrieval team (CIR) participated in the NTCIR-17 ULTRE-2 task. This paper describes our approaches and reports our results on the ULTRE-2 task. We recognize that the issue of false negatives in the Baidu search data in this competition is very severe, much more severe than position bias. Hence, we adopt the Dual Learning Algorithm (DLA) to address the position bias and use it as an auxiliary model to study how to alleviate the false negative issue. We approach the problem from two perspectives: 1) correcting the labels for non-clicked items with a relevance judgment model trained from DLA, and learning a new ranker that is initialized from DLA; 2) including random documents as true negatives and documents that have partial matching as hard negatives. Both methods can enhance the model performance, and our best method achieved an nDCG@10 of 0.5355, which is 2.66% better than the best score from the organizer.
Submitted 18 October, 2023;
originally announced October 2023.
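For readers unfamiliar with the nDCG@10 metric reported in the abstract above, the following Python sketch shows one common way to compute it. It assumes the exponential-gain formulation (gain 2^rel - 1 with a log2 position discount); the task organizers may use a different variant, so treat this as an illustration of the metric rather than the official scorer.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the top-k items (exponential gain)."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Graded relevance labels of documents in ranked order (hypothetical values).
print(ndcg_at_k([3, 2, 1, 0]))   # already ideally ordered
print(ndcg_at_k([0, 1]))         # the relevant document is ranked second
```

The per-query values are typically averaged over all test queries to obtain the reported score.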
-
A Comparative Study of Training Objectives for Clarification Facet Generation
Authors:
Shiyu Ni,
Keping Bi,
Jiafeng Guo,
Xueqi Cheng
Abstract:
Due to the ambiguity and vagueness of a user query, it is essential to identify the query facets for the clarification of user intents. Existing work on query facet generation has achieved compelling performance by sequentially predicting the next facet given previously generated facets based on pre-trained language generation models such as BART. Given a query, there are mainly two types of training objectives to guide the facet generation models. One is to generate the default sequence of ground-truth facets, and the other is to enumerate all the permutations of ground-truth facets and use the sequence that has the minimum loss for model updates. The second is permutation-invariant while the first is not. In this paper, we aim to conduct a systematic comparative study of various types of training objectives, which differ not only in whether they are permutation-invariant but also in whether they conduct sequential prediction and whether they can control the count of output facets. To this end, we propose three additional training objectives with different combinations of the aforementioned properties. For comprehensive comparisons, besides the commonly used evaluation that measures the matching with ground-truth facets, we also introduce two diversity metrics to measure the diversity of the generated facets. Based on an open-domain query facet dataset, i.e., MIMICS, we conduct extensive analyses and show the pros and cons of each method, which could shed light on model training for clarification facet generation. The code can be found at https://github.com/ShiyuNee/Facet-Generation.
Submitted 1 October, 2023;
originally announced October 2023.
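The permutation-invariant objective described in the abstract above (enumerate all orderings of the ground-truth facets and update on the one with minimum loss) can be sketched in a few lines of Python. This is an illustration only: `seq_loss` is a hypothetical callable standing in for the model's sequence loss on a given facet ordering, and the toy loss below is made up so the example runs without a model.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple


def min_permutation_loss(
    facets: Sequence[str],
    seq_loss: Callable[[Tuple[str, ...]], float],
) -> Tuple[Tuple[str, ...], float]:
    """Return the facet ordering with the smallest loss, and that loss.

    Enumerates all n! orderings, so it is only practical for a handful
    of facets per query.
    """
    return min(((perm, seq_loss(perm)) for perm in permutations(facets)),
               key=lambda pair: pair[1])


def toy_loss(seq: Tuple[str, ...]) -> float:
    """Made-up loss (assumption): cheaper when longer facets come first."""
    return float(sum(i * len(facet) for i, facet in enumerate(seq)))


best_order, best_loss = min_permutation_loss(["a", "ccc", "bb"], toy_loss)
print(best_order, best_loss)
```

Because the minimum is taken over every ordering, the result does not depend on the order in which the ground-truth facets are listed, which is the sense in which this objective is permutation-invariant.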
-
L^2R: Lifelong Learning for First-stage Retrieval with Backward-Compatible Representations
Authors:
Yinqiong Cai,
Keping Bi,
Yixing Fan,
Jiafeng Guo,
Wei Chen,
Xueqi Cheng
Abstract:
First-stage retrieval is a critical task that aims to retrieve relevant document candidates from a large-scale collection. While existing retrieval models have achieved impressive performance, they are mostly studied on static data sets, ignoring that in the real-world, the data on the Web is continuously growing with potential distribution drift. Consequently, retrievers trained on static old data may not suit new-coming data well and inevitably produce sub-optimal results. In this work, we study lifelong learning for first-stage retrieval, especially focusing on the setting where the emerging documents are unlabeled since relevance annotation is expensive and may not keep up with data emergence. Under this setting, we aim to develop model updating with two goals: (1) to effectively adapt to the evolving distribution with the unlabeled new-coming data, and (2) to avoid re-inferring all embeddings of old documents to efficiently update the index each time the model is updated.
We first formalize the task and then propose a novel Lifelong Learning method for the first-stage Retrieval, namely L^2R. L^2R adopts the typical memory mechanism for lifelong learning, and incorporates two crucial components: (1) selecting diverse support negatives for model training and memory updating for effective model adaptation, and (2) a ranking alignment objective to ensure the backward-compatibility of representations to save the cost of index rebuilding without hurting the model performance. For evaluation, we construct two new benchmarks from LoTTE and Multi-CPR datasets to simulate the document distribution drift in realistic retrieval scenarios. Extensive experiments show that L^2R significantly outperforms competitive lifelong learning baselines.
Submitted 22 August, 2023;
originally announced August 2023.
-
Pre-training with Aspect-Content Text Mutual Prediction for Multi-Aspect Dense Retrieval
Authors:
Xiaojie Sun,
Keping Bi,
Jiafeng Guo,
Xinyu Ma,
Fan Yixing,
Hongyu Shan,
Qishen Zhang,
Zhongyi Liu
Abstract:
Grounded on pre-trained language models (PLMs), dense retrieval has been studied extensively on plain text. In contrast, there has been little research on retrieving data with multiple aspects using dense models. In scenarios such as product search, aspect information plays an essential role in relevance matching, e.g., category: Electronics, Computers, and Pet Supplies. A common way of leveraging aspect information for multi-aspect retrieval is to introduce an auxiliary classification objective, i.e., using item contents to predict the annotated value IDs of item aspects. However, by learning the value embeddings from scratch, this approach may not sufficiently capture the various semantic similarities between the values. To address this limitation, we leverage the aspect information as text strings rather than class IDs during pre-training so that their semantic similarities can be naturally captured in the PLMs. To facilitate effective retrieval with the aspect strings, we propose mutual prediction objectives between the text of the item aspect and content. In this way, our model makes fuller use of aspect information than conducting undifferentiated masked language modeling (MLM) on the concatenated text of aspects and content. Extensive experiments on two real-world datasets (product and mini-program search) show that our approach can outperform competitive baselines that either treat aspect values as classes or conduct the same MLM for aspect and content strings. Code and the related dataset will be available at https://github.com/sunxiaojie99/ATTEMPT.
Submitted 22 August, 2023;
originally announced August 2023.
-
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Authors:
Lingxi Xie,
Longhui Wei,
Xiaopeng Zhang,
Kaifeng Bi,
Xiaotao Gu,
Jianlong Chang,
Qi Tian
Abstract:
The AI community has been pursuing algorithms known as artificial general intelligence (AGI) that apply to any kind of real-world problem. Recently, chat systems powered by large language models (LLMs) emerge and rapidly become a promising direction to achieve AGI in natural language processing (NLP), but the path towards AGI in computer vision (CV) remains unclear. One may owe the dilemma to the fact that visual signals are more complex than language signals, yet we are interested in finding concrete reasons, as well as absorbing experiences from GPT and LLMs to solve the problem. In this paper, we start with a conceptual definition of AGI and briefly review how NLP solves a wide range of tasks via a chat system. The analysis inspires us that unification is the next important goal of CV. But, despite various efforts in this direction, CV is still far from a system like GPT that naturally integrates all tasks. We point out that the essential weakness of CV lies in lacking a paradigm to learn from environments, yet NLP has accomplished the task in the text world. We then imagine a pipeline that puts a CV algorithm (i.e., an agent) in world-scale, interactable environments, pre-trains it to predict future frames with respect to its action, and then fine-tunes it with instruction to accomplish various tasks. We expect substantial research and engineering efforts to push the idea forward and scale it up, for which we share our perspectives on future research directions.
Submitted 14 June, 2023;
originally announced June 2023.
-
Pipeline MoE: A Flexible MoE Implementation with Pipeline Parallelism
Authors:
Xin Chen,
Hengheng Zhang,
Xiaotao Gu,
Kaifeng Bi,
Lingxi Xie,
Qi Tian
Abstract:
The Mixture of Experts (MoE) model becomes an important choice of large language models nowadays because of its scalability with sublinear computational complexity for training and inference. However, existing MoE models suffer from two critical drawbacks, 1) tremendous inner-node and inter-node communication overhead introduced by all-to-all dispatching and gathering, and 2) limited scalability for the backbone because of the bound data parallel and expert parallel to scale in the expert dimension. In this paper, we systematically analyze these drawbacks in terms of training efficiency in the parallel framework view and propose a novel MoE architecture called Pipeline MoE (PPMoE) to tackle them. PPMoE builds expert parallel incorporating with tensor parallel and replaces communication-intensive all-to-all dispatching and gathering with a simple tensor index slicing and inner-node all-reduce. Besides, it is convenient for PPMoE to integrate pipeline parallel to further scale the backbone due to its flexible parallel architecture. Extensive experiments show that PPMoE not only achieves a more than $1.75\times$ speed up compared to existing MoE architectures but also reaches $90\%$ throughput of its corresponding backbone model that is $20\times$ smaller.
Submitted 22 April, 2023;
originally announced April 2023.
-
Ensemble Ranking Model with Multiple Pretraining Strategies for Web Search
Authors:
Xiaojie Sun,
Lulu Yu,
Yiting Wang,
Keping Bi,
Jiafeng Guo
Abstract:
An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data since they can indicate relevance and are cheap to collect, but they contain substantial bias and noise. There has been some work on mitigating various types of bias in simulated user clicks to train effective learning-to-rank models based on multiple features. However, how to effectively use such methods on large-scale pre-trained models with real-world click data is unknown. To alleviate the data bias in the real world, we incorporate heuristic-based features, refine the ranking objective, add random negatives, and calibrate the propensity calculation in the pre-training stage. Then we fine-tune several pre-trained models and train an ensemble model to aggregate all the predictions from various pre-trained models with human-annotation data in the fine-tuning stage. Our approaches won 3rd place in the "Pre-training for Web Search" task in WSDM Cup 2023 and are 22.6% better than the 4th-ranked team.
Submitted 18 February, 2023;
originally announced February 2023.
-
Feature-Enhanced Network with Hybrid Debiasing Strategies for Unbiased Learning to Rank
Authors:
Lulu Yu,
Yiting Wang,
Xiaojie Sun,
Keping Bi,
Jiafeng Guo
Abstract:
Unbiased learning to rank (ULTR) aims to mitigate various biases existing in user clicks, such as position bias, trust bias, presentation bias, and learn an effective ranker. In this paper, we introduce our winning approach for the "Unbiased Learning to Rank" task in WSDM Cup 2023. We find that the provided data is severely biased so neural models trained directly with the top 10 results with click information are unsatisfactory. So we extract multiple heuristic-based features for multi-fields of the results, adjust the click labels, add true negatives, and re-weight the samples during model training. Since the propensities learned by existing ULTR methods are not decreasing w.r.t. positions, we also calibrate the propensities according to the click ratios and ensemble the models trained in two different ways. Our method won the 3rd prize with a DCG@10 score of 9.80, which is 1.1% worse than the 2nd and 25.3% higher than the 4th.
Submitted 15 February, 2023;
originally announced February 2023.
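The DCG@10 score quoted in the abstract above is a discounted cumulative gain over the top 10 results. The following is a minimal sketch, assuming the common exponential-gain form, sum of (2^rel - 1) / log2(rank + 1); the abstract does not state which DCG variant the WSDM Cup evaluation used, and the function name and sample labels are hypothetical.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k graded relevance labels.

    Assumes the exponential-gain variant; other variants use `rel` directly
    as the gain instead of 2**rel - 1.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


# Hypothetical graded labels (higher is more relevant) for one ranked list.
ranked_labels = [3, 2, 0, 1]
print(round(dcg_at_k(ranked_labels), 4))
```

Only the first k labels contribute, so a longer result list does not change the score beyond the cutoff.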
-
Pangu-Weather: A 3D High-Resolution Model for Fast and Accurate Global Weather Forecast
Authors:
Kaifeng Bi,
Lingxi Xie,
Hengheng Zhang,
Xin Chen,
Xiaotao Gu,
Qi Tian
Abstract:
In this paper, we present Pangu-Weather, a deep learning based system for fast and accurate global weather forecast. For this purpose, we establish a data-driven environment by downloading $43$ years of hourly global weather data from the 5th generation of ECMWF reanalysis (ERA5) data and train a few deep neural networks with about $256$ million parameters in total. The spatial resolution of forecast is $0.25^\circ\times0.25^\circ$, comparable to the ECMWF Integrated Forecast Systems (IFS). More importantly, for the first time, an AI-based method outperforms state-of-the-art numerical weather prediction (NWP) methods in terms of accuracy (latitude-weighted RMSE and ACC) of all factors (e.g., geopotential, specific humidity, wind speed, temperature, etc.) and in all time ranges (from one hour to one week). There are two key strategies to improve the prediction accuracy: (i) designing a 3D Earth Specific Transformer (3DEST) architecture that formulates the height (pressure level) information into cubic data, and (ii) applying a hierarchical temporal aggregation algorithm to alleviate cumulative forecast errors. In deterministic forecast, Pangu-Weather shows great advantages for short to medium-range forecast (i.e., forecast time ranges from one hour to one week). Pangu-Weather supports a wide range of downstream forecast scenarios, including extreme weather forecast (e.g., tropical cyclone tracking) and large-member ensemble forecast in real-time. Pangu-Weather not only ends the debate on whether AI-based methods can surpass conventional NWP methods, but also reveals novel directions for improving deep learning weather forecast systems.
Submitted 3 November, 2022;
originally announced November 2022.
-
Quantifying the Dzyaloshinskii-Moriya Interaction Induced by the Bulk Magnetic Asymmetry
Authors:
Qihan Zhang,
Jinghua Liang,
Kaiqi Bi,
Le Zhao,
He Bai,
Qirui Cui,
Heng-An Zhou,
Hao Bai,
Hongmei Feng,
Wenjie Song,
Guozhi Chai,
O. Gladii,
H. Schultheiss,
Tao Zhu,
Junwei Zhang,
Yong Peng,
Hongxin Yang,
Wanjun Jiang
Abstract:
A broken interfacial inversion symmetry in ultrathin ferromagnet/heavy metal (FM/HM) bilayers is generally believed to be a prerequisite for accommodating the Dzyaloshinskii-Moriya interaction (DMI) and for stabilizing chiral spin textures. In these bilayers, the strength of the DMI decays as the thickness of the FM layer increases and vanishes around a few nanometers. In the present study, through synthesizing relatively thick films of compositions CoPt or FePt, CoCu or FeCu, FeGd and FeNi, contributions to DMI from the composition gradient induced bulk magnetic asymmetry (BMA) and spin-orbit coupling (SOC) are systematically examined. Using Brillouin light scattering spectroscopy, both the sign and amplitude of DMI in films with controllable direction and strength of BMA, in the presence and absence of SOC are experimentally studied. In particular, we show that a sizable amplitude of DMI (0.15 mJ/m^2) can be realized in CoPt or FePt films with BMA and strong SOC, whereas negligible DMI strengths are observed in other thick films with BMA but without significant SOC. The pivotal roles of BMA and SOC are further examined based on the three-site Fert-Levy model and first-principles calculations. It is expected that our findings may help to further understand the origin of chiral magnetism and to design novel non-collinear spin textures.
Submitted 10 March, 2022;
originally announced March 2022.
-
An Improved Mathematical Model of Sepsis: Modeling, Bifurcation Analysis, and Optimal Control Study for Complex Nonlinear Infectious Disease System
Authors:
Yuyang Chen,
Kaiming Bi,
Chih-Hang J. Wu,
David Ben-Arieh,
Ashesh Sinha
Abstract:
Sepsis is a life-threatening medical emergency, which is a major cause of death worldwide and the second highest cause of mortality in the United States. Researching the optimal control treatment or intervention strategy on the comprehensive sepsis system is key in reducing mortality. For this purpose, first, this paper improves a complex nonlinear sepsis model proposed in our previous work. Then, bifurcation analyses are conducted for each sepsis subsystem to study the model behaviors under some system parameters. The bifurcation analysis results also further indicate the necessity of control treatment and intervention therapy. If the sepsis system is without adding any control under some parameter and initial system value settings, the system will perform persistent inflammation outcomes as time goes by. Therefore, we develop our complex improved nonlinear sepsis model into a sepsis optimal control model, and then use some effective biomarkers recommended in existing clinic practices as optimization objective function to measure the development of sepsis. Besides that, a Bayesian optimization algorithm by combining Recurrent neural network (RNN-BO algorithm) is introduced to predict the optimal control strategy for the studied sepsis optimal control system. The difference between the RNN-BO algorithm from other optimization algorithms is that once given any new initial system value setting (initial value is associated with the initial conditions of patients), the RNN-BO algorithm is capable of quickly predicting a corresponding time-series optimal control based on the historical optimal control data for any new sepsis patient. To demonstrate the effectiveness and efficiency of the RNN-BO algorithm on solving the optimal control solution on the complex nonlinear sepsis system, some numerical simulations are implemented by comparing with other optimization algorithms in this paper.
Submitted 7 January, 2022;
originally announced January 2022.
-
High-dimensional Bayesian Optimization Algorithm with Recurrent Neural Network for Disease Control Models in Time Series
Authors:
Yuyang Chen,
Kaiming Bi,
Chih-Hang J. Wu,
David Ben-Arieh,
Ashesh Sinha
Abstract:
Bayesian Optimization algorithm has become a promising approach for nonlinear global optimization problems and many machine learning applications. Over the past few years, improvements and enhancements have been brought forward and they have shown some promising results in solving the complex dynamic problems, systems of ordinary differential equations where the objective functions are computationally expensive to evaluate. Besides, the straightforward implementation of the Bayesian Optimization algorithm performs well merely for optimization problems with 10-20 dimensions. The study presented in this paper proposes a new high dimensional Bayesian Optimization algorithm combining Recurrent neural networks, which is expected to predict the optimal solution for the global optimization problems with high dimensional or time series decision models. The proposed RNN-BO algorithm can solve the optimal control problems in the lower dimension space and then learn from the historical data using the recurrent neural network to learn the historical optimal solution data and predict the optimal control strategy for any new initial system value setting. In addition, accurately and quickly providing the optimal control strategy is essential to effectively and efficiently control the epidemic spread while minimizing the associated financial costs. Therefore, to verify the effectiveness of the proposed algorithm, computational experiments are carried out on a deterministic SEIR epidemic model and a stochastic SIS optimal control model. Finally, we also discuss the impacts of different numbers of the RNN layers and training epochs on the trade-off between solution quality and related computational efforts.
Submitted 1 January, 2022;
originally announced January 2022.
-
Ultra-wideband electrostrictive mechanical antenna
Authors:
Jianchun Xu,
Zhao Li,
Xuchao Pan,
Xi Wen,
Jinqing Cao,
Wen Gong,
Shaolong Yang,
Ming Lei,
Fangzhou Yao,
Ke Bi
Abstract:
Conventional mechanical antennas provide a strategy for long-wave communication with a surprisingly compact size below 1/1,000 of the wavelength. However, their narrow bandwidth and weak field intensity seriously hamper practical applications. Here, we present a mechanical antenna based on the electrostrictive effect of PMN-PT-based relaxor ferroelectric ceramic to improve radiation capacity and achieve ultra-wideband characteristics (10 kHz - 1 MHz, the relative bandwidth is beyond 196%). Owing to its different underlying mechanism, the mechanical antenna based on the electrostrictive effect exhibits excellent communication properties compared with traditional mechanical antennas. The functions of signal coding, transmitting, receiving, and decoding were experimentally demonstrated. This approach offers a promising way of constructing mechanical antennas for long-wave communication.
Submitted 30 December, 2021;
originally announced December 2021.
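The "beyond 196%" relative bandwidth quoted in the abstract above can be checked from the stated band edges. This is a small sketch, assuming the usual fractional-bandwidth definition (bandwidth divided by the arithmetic centre frequency); the abstract does not spell out the definition, and the function name is hypothetical.

```python
def relative_bandwidth(f_low_hz, f_high_hz):
    """Bandwidth divided by the arithmetic centre frequency (assumed definition)."""
    centre = (f_low_hz + f_high_hz) / 2
    return (f_high_hz - f_low_hz) / centre


# Band edges taken from the abstract: 10 kHz to 1 MHz.
rb = relative_bandwidth(10e3, 1e6)
print(f"{rb * 100:.1f}%")  # prints 196.0%
```

Under this definition the value is bounded above by 200%, so a figure just over 196% corresponds to a band spanning two decades.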
-
High-Sensitivity Electric Potential Sensors for Non-Contact Monitoring of Physiological Signals
Authors:
Xinyao Tang,
Wangbo Chen,
Soumyajit Mandal,
Kevin Bi,
Tayfun Ozdemir
Abstract:
The paper describes highly sensitive passive electric potential sensors (EPS) for non-contact detection of multiple biophysical signals, including electrocardiogram (ECG), respiration cycle (RC), and electroencephalogram (EEG). The proposed EPS uses an optimized transimpedance amplifier (TIA), a single guarded sensing electrode, and an adaptive cancellation loop (ACL) to maximize sensitivity (DC transimpedance $=150$ G$Ω$) in the presence of power line interference (PLI) and motion artifacts. Tests were performed on healthy adult volunteers in noisy and unshielded indoor environments. Useful sensing ranges for ECG, RC, and EEG measurements, as validated against reference contact sensors, were observed to be approximately 50 cm, 100 cm, and 5 cm, respectively. ECG and RC signals were also successfully measured through wooden tables for subjects in sleep-like postures. The EPS were integrated with a wireless microcontroller to realize wireless sensor nodes capable of streaming acquired data to a remote base station in real-time.
Submitted 23 October, 2021;
originally announced October 2021.
-
High dimensional Bayesian Optimization Algorithm for Complex System in Time Series
Authors:
Yuyang Chen,
Kaiming Bi,
Chih-Hang J. Wu,
David Ben-Arieh,
Ashesh Sinha
Abstract:
At present, high-dimensional global optimization problems with time-series models have received much attention from engineering fields. Since it was proposed, Bayesian optimization has quickly become a popular and promising approach for solving global optimization problems. However, the standard Bayesian optimization algorithm is insufficient for finding the global optimal solution when the model is high-dimensional. Hence, this paper presents a novel high-dimensional Bayesian optimization algorithm that considers dimension reduction and different dimension fill-in strategies. Most existing literature on Bayesian optimization algorithms does not discuss the sampling strategies used to optimize the acquisition function. This study proposes a new sampling method based on both the multi-armed bandit and random search methods for optimizing the acquisition function. Besides, based on the time-dependent or dimension-dependent characteristics of the model, the proposed algorithm can reduce the dimension evenly. Then, five different dimension fill-in strategies are discussed and compared. Finally, to increase the final accuracy of the optimal solution, the proposed algorithm adds a local search based on a series of Adam-based steps at the final stage. Our computational experiments demonstrate that the proposed Bayesian optimization algorithm can achieve reasonable solutions with excellent performance for high-dimensional global optimization problems with a time-series optimal control model.
Submitted 4 August, 2021;
originally announced August 2021.
-
A New Bayesian Optimization Algorithm for Complex High-Dimensional Disease Epidemic Systems
Authors:
Yuyang Chen,
Kaiming Bi,
Chih-Hang J. Wu,
David Ben-Arieh,
Ashesh Sinha
Abstract:
This paper presents an Improved Bayesian Optimization (IBO) algorithm to solve the optimal control problem for complex high-dimensional epidemic models. Evaluating the total objective function value for disease control models with hundreds of thousands of control time periods incurs a high computational cost. In this paper, we improve the conventional Bayesian Optimization (BO) approach in two ways. Existing BO methods run the minimizer step only once during each acquisition function update. To find a better solution for each acquisition function update, we perform more local minimization steps to tune the algorithm. When the model is high-dimensional and the objective function is complicated, a limited number of acquisition function update iterations may not find the global optimal solution. The IBO algorithm therefore adds a series of Adam-based steps at the final stage of the algorithm to increase the solution's accuracy. Comparative simulation experiments using different kernel functions and acquisition functions have shown that the Improved Bayesian Optimization algorithm is effective and suitable for handling the large-scale and complex epidemic models under study. The IBO algorithm is then compared with four other global optimization algorithms on three well-known synthetic test functions. The effectiveness and robustness of the IBO algorithm are also demonstrated through simulation experiments comparing it with the Particle Swarm Optimization algorithm and the Random Search algorithm. With its reliable convergence behavior and straightforward implementation, the IBO algorithm has great potential to solve other complex optimal control problems with high dimensionality.
Submitted 30 July, 2021;
originally announced August 2021.
-
Asking Clarifying Questions Based on Negative Feedback in Conversational Search
Authors:
Keping Bi,
Qingyao Ai,
W. Bruce Croft
Abstract:
Users often need to look through multiple search result pages or reformulate queries when they have complex information-seeking needs. Conversational search systems make it possible to improve user satisfaction by asking questions to clarify users' search intents. This, however, can take significant effort to answer a series of questions starting with "what/why/how". To quickly identify user intent and reduce effort during interactions, we propose an intent clarification task based on yes/no questions where the system needs to ask the correct question about intents within the fewest conversation turns. In this task, it is essential to use negative feedback about the previous questions in the conversation history. To this end, we propose a Maximum-Marginal-Relevance (MMR) based BERT model (MMR-BERT) to leverage negative feedback based on the MMR principle for the next clarifying question selection. Experiments on the Qulac dataset show that MMR-BERT outperforms state-of-the-art baselines significantly on the intent identification task and the selected questions also achieve significantly better performance in the associated document retrieval tasks.
Submitted 12 July, 2021;
originally announced July 2021.
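The Maximal Marginal Relevance (MMR) principle named in the abstract above trades relevance to the query against similarity to items already chosen. The following is a sketch of generic greedy MMR selection over embedding vectors, not the MMR-BERT model itself; the cosine similarity, the `lam` weight, the function names, and the toy vectors are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(query, candidates, k, lam=0.7):
    """Greedy MMR: return indices of k candidates balancing relevance and novelty."""
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def score(i):
            relevance = cosine(query, candidates[i])
            # Redundancy is the highest similarity to anything already selected.
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical 2-D embeddings: candidates 0 and 1 are near-duplicates.
query = [1.0, 0.2]
candidates = [[1.0, 0.0], [0.99, 0.05], [0.3, 1.0]]
print(mmr_select(query, candidates, k=2, lam=0.5))
```

With `lam=1.0` the selection reduces to ranking by relevance alone; lowering `lam` penalizes candidates that resemble earlier picks, which is the role negative feedback on previous questions plays in the task described.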
-
Leveraging User Behavior History for Personalized Email Search
Authors:
Keping Bi,
Pavel Metrikov,
Chunyuan Li,
Byungki Byun
Abstract:
An effective email search engine can facilitate users' search tasks and improve their communication efficiency. Users could have varied preferences on various ranking signals of an email, such as relevance and recency based on their tasks at hand and even their jobs. Thus a uniform matching pattern is not optimal for all users. Instead, an effective email ranker should conduct personalized ranking by taking users' characteristics into account. Existing studies have explored user characteristics from various angles to make email search results personalized. However, little attention has been given to users' search history for characterizing users. Although users' historical behaviors have been shown to be beneficial as context in Web search, their effect in email search has not been studied and remains unknown. Given these observations, we propose to leverage user search history as query context to characterize users and build a context-aware ranking model for email search. In contrast to previous context-dependent ranking techniques that are based on raw texts, we use ranking features in the search history. This frees us from potential privacy leakage while giving a better generalization power to unseen users. Accordingly, we propose a context-dependent neural ranking model (CNRM) that encodes the ranking features in users' search history as query context and show that it can significantly outperform the baseline neural model without using the context. We also investigate the benefit of the query context vectors obtained from CNRM on the state-of-the-art learning-to-rank model LambdaMart by clustering the vectors and incorporating the cluster information. Experimental results show that significantly better results can be achieved on LambdaMart as well, indicating that the query clusters can characterize different users and effectively turn the ranking model personalized.
Submitted 17 March, 2021; v1 submitted 14 February, 2021;
originally announced February 2021.
-
Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap
Authors:
Lingxi Xie,
Xin Chen,
Kaifeng Bi,
Longhui Wei,
Yuhui Xu,
Zhengsu Chen,
Lanfei Wang,
An Xiao,
Jianlong Chang,
Xiaopeng Zhang,
Qi Tian
Abstract:
Neural architecture search (NAS) has attracted increasing attention in both academia and industry. In the early days, researchers mostly applied individual search methods, which sample and evaluate the candidate architectures separately and thus incur heavy computational overhead. To alleviate the burden, weight-sharing methods were proposed in which exponentially many architectures share weights in the same super-network, and the costly training procedure is performed only once. These methods, though much faster, often suffer from instability. This paper provides a literature review on NAS, in particular the weight-sharing methods, and points out that the major challenge comes from the optimization gap between the super-network and the sub-architectures. From this perspective, we summarize existing approaches into several categories according to their efforts in bridging the gap, and analyze both advantages and disadvantages of these methodologies. Finally, we share our opinions on the future directions of NAS and AutoML. Due to the expertise of the authors, this paper mainly focuses on the application of NAS to computer vision problems and may be biased towards the work in our group.
Submitted 4 August, 2020; v1 submitted 4 August, 2020;
originally announced August 2020.
-
GOLD-NAS: Gradual, One-Level, Differentiable
Authors:
Kaifeng Bi,
Lingxi Xie,
Xin Chen,
Longhui Wei,
Qi Tian
Abstract:
There is a large body of literature on neural architecture search, but most existing work has made use of heuristic rules that largely constrain the search flexibility. In this paper, we first relax these manually designed constraints and enlarge the search space to contain more than $10^{160}$ candidates. In the new space, most existing differentiable search methods can fail dramatically. We then propose a novel algorithm named Gradual One-Level Differentiable Neural Architecture Search (GOLD-NAS) which introduces a variable resource constraint to one-level optimization so that the weak operators are gradually pruned out from the super-network. On standard image classification benchmarks, GOLD-NAS can find a series of Pareto-optimal architectures within a single search procedure. Most of the discovered architectures have never been studied before, yet they achieve a good tradeoff between recognition accuracy and model complexity. We believe the new space and search algorithm can advance the search of differentiable NAS.
Submitted 7 July, 2020;
originally announced July 2020.
-
A Transformer-based Embedding Model for Personalized Product Search
Authors:
Keping Bi,
Qingyao Ai,
W. Bruce Croft
Abstract:
Product search is an important way for people to browse and purchase items on E-commerce platforms. While customers tend to make choices based on their personal tastes and preferences, analysis of commercial product search logs has shown that personalization does not always improve product search quality. Most existing product search techniques, however, conduct undifferentiated personalization across search sessions. They either use a fixed coefficient to control the influence of personalization or let personalization take effect all the time with an attention mechanism. The only notable exception is the recently proposed zero-attention model (ZAM), which can adaptively adjust the effect of personalization by allowing the query to attend to a zero vector. Nonetheless, in ZAM, personalization can be at most as important as the query, and the representations of items are static across the collection regardless of the items co-occurring in the user's historical purchases. Aware of these limitations, we propose a transformer-based embedding model (TEM) for personalized product search, which can dynamically control the influence of personalization by encoding the sequence of the query and the user's purchase history with a transformer architecture. Personalization can have a dominant impact when necessary, and interactions between items can be taken into consideration when computing attention weights. Experimental results show that TEM significantly outperforms state-of-the-art personalized product retrieval models.
Submitted 18 May, 2020;
originally announced May 2020.
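The zero-vector attention idea that the abstract above attributes to ZAM can be illustrated with a small sketch. This is not the authors' implementation: the function name `zero_attention_weights` and the assumption that the zero vector receives a fixed attention logit of 0 are illustrative choices. The sketch only shows why adding such a slot lets the total weight on the purchase history shrink towards zero when no history item matches the query.

```python
import math
from typing import List, Sequence


def zero_attention_weights(scores: Sequence[float]) -> List[float]:
    """Softmax over history-item scores plus one extra slot with logit 0.

    Hypothetical helper: `scores` are query-to-item attention logits.
    The extra slot stands for the zero vector, which contributes nothing
    to the weighted sum, so only the item weights are returned. They sum
    to less than 1; the missing mass is the weight on the zero vector.
    """
    shift = max([0.0, *scores])  # subtract the max logit for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    denom = math.exp(0.0 - shift) + sum(exps)
    return [e / denom for e in exps]


# Two history items whose logits equal the zero slot's logit:
# each of the three slots gets one third of the attention.
weights = zero_attention_weights([0.0, 0.0])
personalization_mass = sum(weights)
```

When every logit is strongly negative the returned weights are close to zero, so the history is effectively ignored; when one logit is large and positive its weight approaches 1.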
-
Artemis: A Novel Annotation Methodology for Indicative Single Document Summarization
Authors:
Rahul Jha,
Keping Bi,
Yang Li,
Mahdi Pakdaman,
Asli Celikyilmaz,
Ivan Zhiboedov,
Kieran McDonald
Abstract:
We describe Artemis (Annotation methodology for Rich, Tractable, Extractive, Multi-domain, Indicative Summarization), a novel hierarchical annotation process that produces indicative summaries for documents from multiple domains. Current summarization evaluation datasets are single-domain and focused on a few domains for which naturally occurring summaries can be easily found, such as news and scientific articles. These are not sufficient for training and evaluation of summarization models for use in document management and information retrieval systems, which need to deal with documents from multiple domains. Compared to other annotation methods such as Relative Utility and Pyramid, Artemis is more tractable because judges don't need to look at all the sentences in a document when making an importance judgment for one of the sentences, while providing similarly rich sentence importance annotations. We describe the annotation process in detail and compare it with other similar evaluation systems. We also present analysis and experimental results over a sample set of 532 annotated documents.
Submitted 13 May, 2020; v1 submitted 5 May, 2020;
originally announced May 2020.
-
Learning a Fine-Grained Review-based Transformer Model for Personalized Product Search
Authors:
Keping Bi,
Qingyao Ai,
W. Bruce Croft
Abstract:
Product search has been a crucial entry point to serve people shopping online. Most existing personalized product models follow the paradigm of representing and matching user intents and items in the semantic space, where finer-grained matching is totally discarded and the ranking of an item cannot be explained further than just user/item level similarity. In addition, while some models in existing studies have created dynamic user representations based on search context, their representations for items are static across all search sessions. This makes every piece of information about the item always equally important in representing the item during matching with various user intents. Aware of the above limitations, we propose a review-based transformer model (RTM) for personalized product search, which encodes the sequence of query, user reviews, and item reviews with a transformer architecture. RTM conducts review-level matching between the user and item, where each review has a dynamic effect according to the context in the sequence. This makes it possible to identify useful reviews to explain the scoring. Experimental results show that RTM significantly outperforms state-of-the-art personalized product search baselines.
Submitted 3 June, 2021; v1 submitted 20 April, 2020;
originally announced April 2020.
-
AREDSUM: Adaptive Redundancy-Aware Iterative Sentence Ranking for Extractive Document Summarization
Authors:
Keping Bi,
Rahul Jha,
W. Bruce Croft,
Asli Celikyilmaz
Abstract:
Redundancy-aware extractive summarization systems score the redundancy of the sentences to be included in a summary either jointly with their salience information or separately as an additional sentence scoring step. Previous work shows the efficacy of jointly scoring and selecting sentences with neural sequence generation models. It is, however, not well-understood if the gain is due to better encoding techniques or better redundancy reduction approaches. Similarly, the contribution of salience versus diversity components on the created summary is not studied well. Building on the state-of-the-art encoding methods for summarization, we present two adaptive learning models: AREDSUM-SEQ that jointly considers salience and novelty during sentence selection; and a two-step AREDSUM-CTX that scores salience first, then learns to balance salience and redundancy, enabling the measurement of the impact of each aspect. Empirical results on CNN/DailyMail and NYT50 datasets show that by modeling diversity explicitly in a separate step, AREDSUM-CTX achieves significantly better performance than AREDSUM-SEQ as well as state-of-the-art extractive summarization baselines.
Submitted 2 April, 2021; v1 submitted 13 April, 2020;
originally announced April 2020.
-
Topological Lattice Metamaterials -- A Platform For Novel Electromagnetic Material Design Based On An Artificial Topological "Atom"
Authors:
Wenjin Zhang,
Ziyuan Meng,
Zidong Zhang,
Ke Bi,
Runhua Fan,
Yi Du,
Weichang Hao
Abstract:
In nature, most materials are composed of atoms with periodic structures. Hence, it is impossible to introduce topological structures into their lattice composition, because the atoms, as basic blocks, cannot be modulated. However, the lattice composition of metamaterials can be designed conveniently. In our work, we propose to introduce topologically non-trivial structures, Mobius unknots, as the basic block (the artificial chiral "atoms") to design metamaterials. A 5.95 GHz intrinsic peak, in addition to the electrical resonance peak near 11 GHz on the transmission coefficient spectrum, was confirmed by theoretical calculations, finite-difference time-domain (FDTD) simulations and experiments when electromagnetic waves transfer to a chiral Mobius unknot. Theoretical analysis indicates that this intrinsic peak originates from the phase transition caused by the electromagnetic waves propagating along the non-trivial structure of the Mobius unknot. It is similar to the state of spin-splitting of electron levels. Taking the artificial chiral "atoms", Mobius unknots, as the basic block, we can construct two-dimensional and even three-dimensional ordered metamaterials. The simulation and experimental results show that the response to electromagnetic waves in the GHz band can be modulated by the coupling between the periodic potential and the spin-like splitting of energy levels.
Submitted 4 October, 2020; v1 submitted 12 December, 2019;
originally announced December 2019.
-
Stabilizing DARTS with Amended Gradient Estimation on Architectural Parameters
Authors:
Kaifeng Bi,
Changping Hu,
Lingxi Xie,
Xin Chen,
Longhui Wei,
Qi Tian
Abstract:
DARTS is a popular algorithm for neural architecture search (NAS). Despite its great advantage in search efficiency, DARTS often suffers from weak stability, which is reflected in the large variation among individual trials as well as the sensitivity to the hyper-parameters of the search process. This paper attributes such instability to an optimization gap between the super-network and its sub-networks, namely, improving the validation accuracy of the super-network does not necessarily lead to a higher expected performance of the sampled sub-networks. Then, we point out that the gap is due to the inaccurate estimation of the architectural gradients, based on which we propose an amended estimation method. Mathematically, our method guarantees a bounded error from the true gradients while the original estimation does not. Our approach bridges the gap from two aspects, namely, amending the estimation of the architectural gradients, and unifying the hyper-parameter settings in the search and re-training stages. Experiments on CIFAR10 and ImageNet demonstrate that our approach largely improves search stability and, more importantly, enables DARTS-based approaches to explore much larger search spaces that have not been investigated before.
Submitted 4 May, 2020; v1 submitted 25 October, 2019;
originally announced October 2019.
-
Explainable Product Search with a Dynamic Relation Embedding Model
Authors:
Qingyao Ai,
Yongfeng Zhang,
Keping Bi,
W. Bruce Croft
Abstract:
Product search is one of the most popular methods for customers to discover products online. Most existing studies on product search focus on developing effective retrieval models that rank items by their likelihood to be purchased. They, however, ignore the problem that there is a gap between how systems and customers perceive the relevance of items. Without explanations, users may not understand why product search engines retrieve certain items for them, which consequently leads to an imperfect user experience and suboptimal system performance in practice. In this work, we tackle this problem by constructing explainable retrieval models for product search. Specifically, we propose to model the "search and purchase" behavior as a dynamic relation between users and items, and create a dynamic knowledge graph based on both the multi-relational product data and the context of the search session. Ranking is conducted based on the relationship between users and items in the latent space, and explanations are generated with logic inferences and entity soft matching on the knowledge graph. Empirical experiments show that our model, which we refer to as the Dynamic Relation Embedding Model (DREM), significantly outperforms the state-of-the-art baselines and has the ability to produce reasonable explanations for search results.
Submitted 16 September, 2019;
originally announced September 2019.
-
A Study of Context Dependencies in Multi-page Product Search
Authors:
Keping Bi,
Choon Hui Teo,
Yesh Dattatreya,
Vijai Mohan,
W. Bruce Croft
Abstract:
In product search, users tend to browse results on multiple search result pages (SERPs) (e.g., for queries on clothing and shoes) before deciding which item to purchase. Users' clicks can be considered as implicit feedback that indicates their preferences and can be used to re-rank subsequent SERPs. Relevance feedback (RF) techniques are usually involved to deal with such scenarios. However, these methods are designed for document retrieval, where relevance is the most important criterion. In contrast, product search engines need to retrieve items that are not only relevant but also satisfactory in terms of customers' preferences. Personalization based on users' purchase history has been shown to be effective in product search. However, this method captures users' long-term interest, which does not always align with their short-term interest, and does not benefit customers with little or no purchase history. In this paper, we study RF techniques based on both long-term and short-term context dependencies in multi-page product search. We also propose an end-to-end context-aware embedding model which can capture both types of context. Our experimental results show that short-term context leads to much better performance compared with long-term and no context. Moreover, our proposed model is more effective than state-of-the-art word-based RF models.
Submitted 9 January, 2020; v1 submitted 9 September, 2019;
originally announced September 2019.
-
Conversational Product Search Based on Negative Feedback
Authors:
Keping Bi,
Qingyao Ai,
Yongfeng Zhang,
W. Bruce Croft
Abstract:
Intelligent assistants change the way people interact with computers and make it possible for people to search for products through conversations when they have purchase needs. During the interactions, the system could ask questions on certain aspects of the ideal products to clarify the users' needs. For example, previous work proposed to ask users the exact characteristics of their ideal items before showing results. However, users may not have clear ideas about what an ideal item looks like, especially when they have not seen any item. So it is more feasible to facilitate the conversational search by showing example items and asking for feedback instead. In addition, when the users provide negative feedback for the presented items, it is easier to collect their detailed feedback on certain properties (aspect-value pairs) of the non-relevant items. By breaking down the item-level negative feedback to fine-grained feedback on aspect-value pairs, more information is available to help clarify users' intents. So in this paper, we propose a conversational paradigm for product search driven by non-relevant items, based on which fine-grained feedback is collected and utilized to show better results in the next iteration. We then propose an aspect-value likelihood model to incorporate both positive and negative feedback on fine-grained aspect-value pairs of the non-relevant items. Experimental results show that our model is significantly better than state-of-the-art product search baselines without using feedback and those baselines using item-level negative feedback.
Submitted 4 September, 2019;
originally announced September 2019.
-
Leverage Implicit Feedback for Context-aware Product Search
Authors:
Keping Bi,
Choon Hui Teo,
Yesh Dattatreya,
Vijai Mohan,
W. Bruce Croft
Abstract:
Product search serves as an important entry point for online shopping. In contrast to web search, the retrieved results in product search not only need to be relevant but also should satisfy customers' preferences in order to elicit purchases. Previous work has shown the efficacy of purchase history in personalized product search. However, customers with little or no purchase history do not benefit from personalized product search. Furthermore, preferences extracted from a customer's purchase history are usually long-term and may not always align with her short-term interests. Hence, in this paper, we leverage clicks within a query session, as implicit feedback, to represent users' hidden intents, which further act as the basis for re-ranking subsequent result pages for the query. Modeling user preference with implicit feedback has been studied extensively in recommendation tasks. However, there has been little research on modeling users' short-term interest in product search. We study whether short-term context could help promote users' ideal item in the following result pages for a query. Furthermore, we propose an end-to-end context-aware embedding model which can capture long-term and short-term context dependencies. Our experimental results on the datasets collected from the search log of a commercial product search engine show that short-term context leads to much better performance compared with long-term and no context. Our results also show that our proposed model is more effective than word-based context-aware models.
Submitted 9 January, 2020; v1 submitted 4 September, 2019;
originally announced September 2019.
-
Iterative Relevance Feedback for Answer Passage Retrieval with Passage-level Semantic Match
Authors:
Keping Bi,
Qingyao Ai,
W. Bruce Croft
Abstract:
Relevance feedback techniques assume that users provide relevance judgments for the top k (usually 10) documents and then re-rank using a new query model based on those judgments. Even though this is effective, there has been little research recently on this topic because requiring users to provide substantial feedback on a result list is impractical in a typical web search scenario. In new environments such as voice-based search with smart home devices, however, feedback about result quality can potentially be obtained during users' interactions with the system. Since there are severe limitations on the length and number of results that can be presented in a single interaction in this environment, the focus should move from browsing result lists to iterative retrieval and from retrieving documents to retrieving answers. In this paper, we study iterative relevance feedback techniques with a focus on retrieving answer passages. We first show that iterative feedback is more effective than the top-k approach for answer retrieval. Then we propose an iterative feedback model based on passage-level semantic match and show that it can produce significant improvements compared to both word-based iterative feedback models and those based on term-level semantic similarity.
Submitted 20 December, 2018;
originally announced December 2018.
-
Revisiting Iterative Relevance Feedback for Document and Passage Retrieval
Authors:
Keping Bi,
Qingyao Ai,
W. Bruce Croft
Abstract:
As more and more search traffic comes from mobile phones, intelligent assistants, and smart-home devices, new challenges (e.g., limited presentation space) and opportunities come up in information retrieval. Previously, an effective technique, relevance feedback (RF), has rarely been used in real search scenarios due to the overhead of collecting users' relevance judgments. However, since users tend to interact more with the search results shown on the new interfaces, it becomes feasible to obtain users' assessments on a few results during each interaction. This makes iterative relevance feedback (IRF) techniques look promising today. IRF has not been studied systematically in the new search scenarios and its effectiveness is mostly unknown. In this paper, we re-visit IRF and extend it with RF models proposed in recent years. We conduct extensive experiments to analyze and compare IRF with the standard top-k RF framework on document and passage retrieval. Experimental results show that IRF is at least as effective as the standard top-k RF framework for documents and much more effective for passages. This indicates that IRF for passage retrieval has huge potential.
Submitted 9 June, 2019; v1 submitted 13 December, 2018;
originally announced December 2018.
-
Unbiased Learning to Rank with Unbiased Propensity Estimation
Authors:
Qingyao Ai,
Keping Bi,
Cheng Luo,
Jiafeng Guo,
W. Bruce Croft
Abstract:
Learning to rank with biased click data is a well-known challenge. A variety of methods have been explored to debias click data for learning to rank, such as click models, result interleaving and, more recently, the unbiased learning-to-rank framework based on inverse propensity weighting. Despite their differences, most existing studies separate the estimation of click bias (namely the \textit{propensity model}) from the learning of ranking algorithms. To estimate click propensities, they either conduct online result randomization, which can negatively affect the user experience, or offline parameter estimation, which has special requirements for click data and is optimized for objectives (e.g. click likelihood) that are not directly related to the ranking performance of the system. In this work, we address those problems by unifying the learning of propensity models and ranking models. We find that the problem of estimating a propensity model from click data is a dual problem of unbiased learning to rank. Based on this observation, we propose a Dual Learning Algorithm (DLA) that jointly learns an unbiased ranker and an \textit{unbiased propensity model}. DLA is an automatic unbiased learning-to-rank framework as it directly learns unbiased ranking models from biased click data without any preprocessing. It can adapt to the change of bias distributions and is applicable to online learning. Our empirical experiments with synthetic and real-world data show that the models trained with DLA significantly outperformed the unbiased learning-to-rank algorithms based on result randomization and the models trained with relevance signals extracted by click models.
Submitted 23 April, 2018; v1 submitted 16 April, 2018;
originally announced April 2018.
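The inverse propensity weighting framework mentioned in the abstract above can be illustrated with a minimal sketch. It is not DLA itself, which learns the propensities jointly with the ranker; here the propensities are assumed to be given, and the names `ipw_weights` and `ipw_loss` are hypothetical. Each clicked result's loss is divided by the probability that its position was examined, so clicks at rarely examined positions count for more.

```python
from typing import List, Sequence


def ipw_weights(clicks: Sequence[int], propensities: Sequence[float]) -> List[float]:
    """Return click / propensity for every result in a ranked list.

    Assumption: propensities[i] is the (known) probability that the
    result at position i was examined, with 0 < p <= 1.
    """
    if len(clicks) != len(propensities):
        raise ValueError("clicks and propensities must have the same length")
    weights = []
    for click, propensity in zip(clicks, propensities):
        if not 0.0 < propensity <= 1.0:
            raise ValueError("each propensity must lie in (0, 1]")
        weights.append(click / propensity)
    return weights


def ipw_loss(
    clicks: Sequence[int],
    propensities: Sequence[float],
    per_item_losses: Sequence[float],
) -> float:
    """Propensity-weighted sum of per-item losses over clicked results."""
    weights = ipw_weights(clicks, propensities)
    return sum(w * loss for w, loss in zip(weights, per_item_losses))


# Three results; the third is clicked at a position examined only 25% of the time,
# so its loss is weighted four times as heavily as the click at the top position.
example_weights = ipw_weights([1, 0, 1], [1.0, 0.5, 0.25])
example_loss = ipw_loss([1, 0, 1], [1.0, 0.5, 0.25], [0.5, 2.0, 0.25])
```

Under the usual assumption that a click occurs only when a result is both examined and relevant, the expected weight of a result equals its relevance, which is why the weighted loss is unbiased with respect to the examination bias.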
-
Learning a Deep Listwise Context Model for Ranking Refinement
Authors:
Qingyao Ai,
Keping Bi,
Jiafeng Guo,
W. Bruce Croft
Abstract:
Learning to rank has been intensively studied and widely applied in information retrieval. Typically, a global ranking function is learned from a set of labeled data, which can achieve good performance on average but may be suboptimal for individual queries by ignoring the fact that relevant documents for different queries may have different distributions in the feature space. Inspired by the idea of pseudo relevance feedback, where top-ranked documents, which we refer to as the \textit{local ranking context}, can provide important information about the query's characteristics, we propose to use the inherent feature distributions of the top results to learn a Deep Listwise Context Model that helps us fine-tune the initial ranked list. Specifically, we employ a recurrent neural network to sequentially encode the top results using their feature vectors, learn a local context model and use it to re-rank the top results. Our model has three merits: (1) it can capture the local ranking context based on the complex interactions between top results using a deep neural network; (2) it can be built upon existing learning-to-rank methods by directly using their extracted feature vectors; (3) it is trained with an attention-based loss function, which is more effective and efficient than many existing listwise methods. Experimental results show that the proposed model can significantly improve the state-of-the-art learning-to-rank methods on benchmark retrieval corpora.
Submitted 23 April, 2018; v1 submitted 16 April, 2018;
originally announced April 2018.
-
Residues modulo powers of two in the Young-Fibonacci lattice
Authors:
N. Karimilla Bi,
Amritanshu Prasad,
P. Giftson Santhosh
Abstract:
We study the subgraph of the Young-Fibonacci graph induced by elements with odd $f$-statistic (the $f$-statistic of an element $w$ of a differential graded poset is the number of saturated chains from the minimal element of the poset to $w$). We show that this subgraph is a binary tree. Moreover, the odd residues of the $f$-statistics in a row of this tree equidistribute modulo any power of two. This is equivalent to a purely number-theoretic result about the equidistribution of residues modulo powers of two among the products of distinct odd numbers less than a fixed number.
Submitted 22 February, 2017;
originally announced February 2017.
-
Cellularity of a Larger Class of Diagram Algebras
Authors:
N. Karimilla Bi
Abstract:
In this paper, we realize the algebra of $\mathbb{Z}_2$-relations, signed partition algebras and partition algebras as tabular algebras and prove the cellularity of these algebras using the method of \cite{GM1}.
Using the results of Graham and Lehrer in \cite{GL}, we give the modular representations of the algebra of $\mathbb{Z}_2$-relations, signed partition algebras and partition algebras.
Submitted 9 June, 2015;
originally announced June 2015.
-
Eigenvalues of Gram Matrices of a class of Diagram Algebras
Authors:
N. Karimilla Bi,
M. Parvathi
Abstract:
In this paper, we introduce symmetric diagram matrices $A_{s+r,s}$ of size ${_{(s+r)}}C_s$ whose entries are $\{x_i\}_{min\{s,r\}}$. We compute the eigenvalues of symmetric diagram matrices using elementary row and column operations inductively. As a byproduct, we obtain the eigenvalues of Gram matrices of a larger class of diagram algebras like the signed partition algebras, algebra of $\mathbb{Z}_2$ relations and partition algebras.
Submitted 6 April, 2015;
originally announced April 2015.