CUPID: A Real-Time Session-Based Reciprocal Recommendation System for a One-on-One Social Discovery Platform

Beomsu Kim¹*, Sangbum Kim¹*, Minchan Kim¹*, Joonyoung Yi¹, Sungjoo Ha¹, Suhyun Lee¹, Youngsoo Lee¹, Gihoon Yeom¹, Buru Chang², Gihun Lee¹†
¹Hyperconnect, ²Sogang University
Abstract

This study introduces Cupid, a novel approach to session-based reciprocal recommendation designed for a real-time one-on-one social discovery platform. On such platforms, low latency is critical to the user experience. However, conventional session-based approaches suffer from high latency because they must model sequential user behavior for every recommendation request. Additionally, because of the reciprocal nature of the platform, where users act as items for each other, training recommendation models on large-scale datasets with conventional methods is computationally prohibitive. To address these challenges, Cupid decouples the time-intensive user session modeling from the real-time user matching process to reduce inference time. Furthermore, Cupid employs a two-phase training strategy that separates the training of the embedding and prediction layers, significantly reducing the computational burden by decreasing the number of sequential model inferences several hundredfold. Extensive experiments on large-scale Azar datasets demonstrate Cupid’s effectiveness in a real-world production environment. Notably, Cupid reduces response latency by more than 76% compared to non-asynchronous systems while significantly improving user engagement.

Index Terms:
Session-based Recommendation, Reciprocal Recommendation, Real-time One-on-one Social Discovery
* These authors contributed equally to this work.
† Corresponding author: Gihun Lee (dylan.l@hpcnt.com).

I Introduction

Figure 1: Difference between (a) conventional session-based recommendations and (b) session-based reciprocal recommendations for real-time social discovery. In (a), the representation of an item such as the earmuffs is the same whether it is recommended to Alice or to Bob. In contrast, on real-time platforms, user representations evolve continuously within each session. For instance, after Carol interacts with both Alice and Bob, her representation changes according to the timing of these interactions: when Bob is paired with Carol, her representation already reflects her previous interaction with Alice.

Azar is a leading real-time social discovery platform that connects users for one-on-one video conversations. To facilitate these interactions, the platform gathers users who signal their readiness for immediate video calls into a matching pool. The platform then matches users from this pool with the aim of maximizing overall user satisfaction, measured by the total chat duration across all pairs. Longer chat durations indicate more engaging and satisfying interactions and therefore serve as a proxy for user satisfaction. In such reciprocal recommendation systems, where both users must be satisfied, recommendations must reflect the preferences of both parties [1, 2, 3]. Furthermore, users’ preferences can change dynamically as they engage with the platform [4, 5]. For example, a user might start by wanting to chat casually about favorite hobbies but later seek deeper conversations about social issues.

In real-time social discovery platforms, adapting to evolving user preferences is crucial for maintaining engagement and satisfaction. One effective approach is session-based recommendations [6, 7, 8], where a session represents a single visit or interaction period during which the user actively engages with the platform. By focusing solely on the current session, session-based recommendations consider a user’s behavior within that session rather than building a user profile from long-term historical data. This approach leverages session-specific information, enabling the system to respond to dynamic preferences [9, 10, 11, 12, 13] and address the cold-start problem [14, 15, 16, 17, 18], where new users lack sufficient historical data, by relying on data from the current session.

However, applying session-based recommendations to reciprocal recommendation systems with strict real-time constraints poses unique and significant challenges. First, conventional session-based systems build user profiles through computationally intensive session modeling [19, 6, 20, 21], which can take several seconds, far exceeding the immediate response times required by platforms like Azar. This delay creates a bottleneck in delivering timely recommendations. Second, user behavior in reciprocal systems can evolve rapidly within a single session, even after each interaction. For example, a positive interaction might make a user more inclined toward similar profiles, while a negative experience could shift their preferences entirely. Moreover, conventional session-based recommendations mostly assume static item representations [6, 22, 23]. In reciprocal systems [24, 25, 2], however, both user preferences and the items (i.e., other users) change dynamically, since users act as both consumers and items. Consequently, each interaction updates not only a user’s own preferences but also the representation of their counterpart, complicating the recommendation algorithm. These factors make real-time session-based reciprocal recommendation more complex than conventional settings. Figure 1 illustrates the differences between conventional session-based recommendation and its application to real-time reciprocal recommendation.

To address these challenges, we propose Cupid, a session-based reciprocal recommendation system designed specifically for real-time social discovery platforms. For inference efficiency, Cupid minimizes the overall latency of the recommendation pipeline by decoupling it from the computationally intensive per-user session modeling. More specifically, Cupid adopts an asynchronous session modeling approach in which user session representations are updated separately from the recommendation process and stored in a separate embedding memory. In contrast, the feature embedding, which is computationally lightweight because it relies on static user information (e.g., country, gender) or match-related statistics, is computed synchronously. When a match request arrives, the system retrieves the precomputed session embedding from the embedding memory and combines it with the synchronously computed feature embedding to estimate the chat duration between users.

To tackle the training complexity inherent in reciprocal environments, Cupid divides the training process into two distinct phases. In phase 1, the embedding layers that model user sessions and features are trained. In phase 2, these embedding layers are frozen, and the prediction layer is trained to estimate chat duration from the pre-trained embeddings. This two-phase strategy significantly reduces the overall computational cost, which would be much higher if both components were trained jointly: with training separated, the embedding layer processes each user individually rather than modeling the interactions between users for every match. This approach lowers the training cost while maintaining high prediction performance.

In our experiments, we evaluate Cupid using large-scale, real-world data from Azar. Both offline and online production tests show that Cupid significantly reduces recommendation latency and enhances overall user satisfaction, confirming its effectiveness for real-time reciprocal recommendation. Notably, deploying Cupid in the Azar service increases the average chat duration by 6.8% for warm-start users and 5.9% for cold-start users, while reducing response latency by 79.7% at the 90th percentile and by 75.9% at the 99th percentile.

Our main contributions are summarized as follows:

  • We systematically formulate session-based reciprocal recommendation systems for real-time social discovery platforms. To the best of our knowledge, this is the first study to tackle this specific challenge. (Section II)

  • We introduce Cupid, a novel session-based recommendation system for real-time reciprocal recommendation. Using asynchronous session embedding and a two-phase training strategy, Cupid improves both inference time and training efficiency. (Section III)

  • We validate the efficacy of Cupid using large-scale real-world data from Azar. Cupid significantly enhances recommendation performance in both offline and online evaluations while meeting the strict latency constraints of real-time social discovery. (Section IV)

Figure 2: System design consideration. (a) The overall latency of the session-based recommendation pipeline largely depends on the computational time of user session modeling (session embedding layer). (b) Our recommendation system, Cupid, reduces the latency of the recommendation pipeline by asynchronously conducting user session modeling in parallel with the pipeline.

II Problem Formulation

A real-time social discovery platform connects online users, enabling immediate one-on-one conversations. Let $\mathcal{U}$ denote the set of all users on such a platform. At any given time $t$, the matching pool $\mathcal{U}^{(t)} = \{u_1, u_2, \ldots, u_n\}$ consists of the $n$ users available for matching. As illustrated in Figure 1, the matching pool $\mathcal{U}^{(t)}$ is dynamic, changing constantly as users log in or out and as conversations begin or end. Each user $u_i \in \mathcal{U}^{(t)}$ is characterized by a set of features $X_i$ (e.g., gender, country code, and other match-related statistics) and session information $S_i = [m_{i,1}, m_{i,2}, \ldots, m_{i,h}]$, which contains $h$ matching histories.
Each matching history $m_{i,k} \in S_i$ consists of the chat counterpart $u_j$ and the chat duration $y_{ij}$:

$$\textbf{Matching History:}\quad m_{i,k} = (u_i, u_j, y_{ij}). \tag{1}$$
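To make the notation concrete, the user and matching-history records defined above can be sketched as Python dataclasses. This is an illustrative sketch only: the class names, field names, and the use of seconds for $y_{ij}$ are our assumptions, not part of Cupid.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class MatchingHistory:
    """One entry m_{i,k} = (u_i, u_j, y_ij) of Eq. (1)."""
    user_id: str          # u_i, the owner of the session
    counterpart_id: str   # u_j, the chat counterpart
    chat_duration: float  # y_ij (assumed to be in seconds)


@dataclass
class User:
    user_id: str
    features: Dict[str, Union[str, float]]  # X_i, e.g. gender, country code, match statistics
    session: List[MatchingHistory] = field(default_factory=list)  # S_i, oldest first

    def record_match(self, counterpart_id: str, chat_duration: float) -> None:
        # Appending keeps S_i chronological, so session[k - 1] is m_{i,k}.
        self.session.append(MatchingHistory(self.user_id, counterpart_id, chat_duration))
```

For example, `User("alice", {"country": "KR"}).record_match("bob", 42.0)` appends the history `("alice", "bob", 42.0)` to Alice's session.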

The goal of the session-based reciprocal recommendation system is to pair suitable users from $\mathcal{U}^{(t)}$ optimally, considering their features and matching histories, so as to maximize overall user satisfaction. We define a recommendation model $f(\cdot)$ that estimates a satisfaction score $s_{ij}$ for every possible pair of users $(u_i, u_j)$:

$$\textbf{Satisfaction Score:}\quad s_{ij} = f(u_i, u_j). \tag{2}$$

We use the chat duration $y_{ij}$ as a proxy for the satisfaction score, assuming that longer conversations correlate with higher user satisfaction. The objective of the recommendation model therefore becomes predicting the chat duration $\hat{y}_{ij}$ for each user pair:

$$\textbf{Predicted Chat Duration:}\quad \hat{y}_{ij} = f(u_i, u_j). \tag{3}$$

These predictions are then fed to efficient matching algorithms, designed according to the service’s business logic, that connect users. By predicting chat durations, the system can prioritize connections that are likely to enhance user satisfaction, helping to address the challenge of dynamic user preferences on real-time social discovery platforms.
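The matching algorithm itself is left to the service’s business logic. As a purely hypothetical illustration (not Azar’s matcher), a greedy matcher could pair users in descending order of predicted chat duration, skipping anyone already matched. The sketch assumes each candidate pair already carries a single score; if $f(u_i, u_j) \neq f(u_j, u_i)$, one simple option is to score the pair by $\hat{y}_{ij} + \hat{y}_{ji}$.

```python
from typing import Dict, List, Tuple


def greedy_match(pair_scores: Dict[Tuple[str, str], float]) -> List[Tuple[str, str]]:
    """Greedily pair users by descending predicted chat duration.

    Hypothetical helper: `pair_scores` maps an unordered pair (u_a, u_b) to one
    predicted score; each user appears in at most one returned pair.
    """
    matched = set()
    pairs: List[Tuple[str, str]] = []
    # Visit candidate pairs from the highest to the lowest predicted duration.
    for (a, b), _score in sorted(pair_scores.items(), key=lambda kv: kv[1], reverse=True):
        if a != b and a not in matched and b not in matched:
            pairs.append((a, b))
            matched.update((a, b))
    return pairs
```

Greedy pairing is fast but does not always maximize the total predicted duration; an exact maximum-weight matching (e.g., Edmonds’ blossom algorithm) does, at a higher computational cost.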

III Proposed Approach: Cupid

In this section, we introduce Cupid, our session-based reciprocal recommendation system designed for real-world social discovery services with a focus on low-latency performance. We describe the implementation of Cupid and present a novel training method that efficiently captures mutual interests among users based on extensive matching histories.

III-A System Design Considerations

As highlighted earlier, delivering recommendations with minimal delay is crucial for real-time social discovery platforms. Any latency may lead to longer wait times for users, harming user experience and potentially causing them to leave the service. A key challenge is efficiently modeling short-term, dynamic user behaviors to capture real-time preferences and intents. Cupid addresses two primary considerations: (i) rapidly computing satisfaction scores for all potential user pairs in the matching pool to minimize latency, and (ii) overcoming the slower processing times associated with sequence modeling architectures, such as RNNs or transformers. To tackle these challenges, we have developed two core strategies for score computation and session modeling.

Linear Scaling Score Computation   We compute the expected satisfaction score $\hat{y}_{ij}$ (i.e., the predicted chat duration) by applying a simple linear transformation to the dot product of the user representations:

$$\hat{y}_{ij} = f(u_i, u_j) = w(\mathbf{e}_i \cdot \mathbf{e}_j) + b, \tag{4}$$

where $\mathbf{e}_i$ and $\mathbf{e}_j$ are the $d$-dimensional representations of users $u_i$ and $u_j$, respectively. This formulation allows the matrix of predicted satisfaction scores $\hat{\mathbf{Y}} \in \mathbb{R}^{n \times n}$ for all user pairs to be computed efficiently with a single matrix multiplication using optimized BLAS [26] libraries.
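Applied to every pair in the pool, Equation (4) reduces to $\hat{\mathbf{Y}} = w\,\mathbf{E}\mathbf{E}^\top + b$, where the $i$-th row of the $n \times d$ matrix $\mathbf{E}$ is $\mathbf{e}_i$ (our notation). The minimal pure-Python sketch below spells that computation out so it runs without dependencies; in practice it would be a single BLAS-backed call such as `w * (E @ E.T) + b` in NumPy.

```python
from typing import List

Matrix = List[List[float]]


def score_matrix(E: Matrix, w: float, b: float) -> Matrix:
    """Y_hat = w * (E @ E^T) + b, i.e. Eq. (4) evaluated for all n x n pairs.

    E is n x d with row i equal to e_i. The nested loops stand in for one
    optimized matrix multiplication; the cost is O(n^2 d) either way.
    """
    n = len(E)
    return [
        [w * sum(a * c for a, c in zip(E[i], E[j])) + b for j in range(n)]
        for i in range(n)
    ]
```

Because Equation (4) uses a plain dot product, the resulting matrix is symmetric, so $\hat{y}_{ij} = \hat{y}_{ji}$.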

Asynchronous Session Modeling   We decouple the computationally intensive user session modeling from the real-time matching pipeline by performing it asynchronously. As illustrated in Figure 2, this design significantly improves the responsiveness of our recommendation system, enabling Cupid to deliver the swift recommendations that are essential for maintaining user engagement. An overview of Cupid’s architecture is provided in Figure 3. Cupid is trained to minimize the mean squared error (MSE):

$$\mathcal{L}_{\texttt{MSE}} = \frac{1}{|\mathcal{D}|}\sum_{m \in \mathcal{D}} \left(\hat{y}_{ij} - y_{ij}\right)^2, \tag{5}$$

where $\mathcal{D}$ is the dataset of matching histories of all users, with each $m = (u_i, u_j, y_{ij}) \in \mathcal{D}$. Further details of Cupid are presented in the following subsections.

Figure 3: An overview of the Cupid architecture. The features $X_i$ and session information $S_i$ of user $u_i$ are encoded into the user feature representation $\mathbf{e}^u_i$ and the session representation $\mathbf{e}^s_i$ by the user feature embedding layer $f_u$ and the session embedding layer $f_s$, respectively. The session representation is computed asynchronously and stored in the embedding memory $E$.

III-B Asynchronous Session Embedding Layer $f_s$

To ensure low-latency recommendations, Cupid models user behavior within sessions asynchronously rather than synchronously with matching requests. As illustrated in Figure 2(b), when user $u_i$’s previous match ends, the session representation vector $\mathbf{e}^s_i$ is computed asynchronously by the session embedding layer $f_s$. More specifically, each matching history $m$ in user $u_i$’s session information $S_i$ is embedded into a representation $\mathbf{e}^m$ using a Wide&Deep model [27]. This embedding incorporates the features $X_i$ of user $u_i$, the features $X_j$ of the chat counterpart $u_j$, and the chat duration $y_{ij}$.
The user session representation is then formed from these matching history representations $[\mathbf{e}^m_{i,1}, \mathbf{e}^m_{i,2}, \cdots, \mathbf{e}^m_{i,h}]$ by a causal transformer, which ensures that each output $\mathbf{e}^s_{i,k}$ represents the user’s state after the $k$-th match and depends only on the first $k$ matches. The final session representation $\mathbf{e}^s_i$ is stored in an embedding memory $E$, replacing any existing representation. When user $u_i$ requests a new match, the stored embedding $\mathbf{e}^s_i$ is retrieved to predict chat durations. Note that the stored representation may not yet reflect the latest match at lookup time, because its computation can still be in progress when a new match is requested. We refer to a representation retrieved in this state as a delayed session representation.
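The causal property described above, where output $k$ depends only on the first $k$ matching-history representations, comes from masking attention to later positions. The sketch below illustrates only that mask: it is a single attention head with no learned query/key/value projections, feed-forward block, or positional encoding, so it is not Cupid’s transformer.

```python
import math
from typing import List

Vector = List[float]


def causal_self_attention(xs: List[Vector]) -> List[Vector]:
    """Single-head attention in which output k attends only to inputs 0..k.

    Simplification: queries, keys and values are the inputs themselves.
    """
    d = len(xs[0])
    outputs: List[Vector] = []
    for k, q in enumerate(xs):
        # Causal mask: position k scores positions 0..k only, never later matches.
        logits = [sum(a * b for a, b in zip(q, xs[t])) / math.sqrt(d) for t in range(k + 1)]
        top = max(logits)
        weights = [math.exp(z - top) for z in logits]  # numerically stable softmax
        total = sum(weights)
        outputs.append([sum(wt * xs[t][c] for t, wt in enumerate(weights)) / total
                        for c in range(d)])
    return outputs
```

Because of the mask, appending a new match leaves every earlier output unchanged; only one new output is added, and the last output plays the role of the current session state.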

This design provides significant advantages: it decouples the slower user session modeling from the synchronous matching pipeline, improving both recommendation speed and efficiency. However, because session representations are updated asynchronously, the representation used at inference may miss the most recent match if new match data arrives while the update is still in progress. Since this delay is only a few seconds, its impact on performance is negligible. Furthermore, handling session information asynchronously allows multiple inferences to be batched, which increases the throughput of session processing and reduces computational cost.
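The asynchronous write path and the non-blocking lookup path around the embedding memory $E$ can be sketched with a background thread and a queue. Everything here is an assumption for illustration: the class and method names are hypothetical, and `encode` stands in for the session embedding layer $f_s$, taking a batch of session histories and returning one vector per history. Draining queued jobs into one `encode` call mirrors the batching benefit described above, and `lookup` may return a delayed session representation.

```python
import queue
import threading
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]


class SessionEmbeddingMemory:
    """Hypothetical embedding memory E, written by a background worker."""

    def __init__(self, encode: Callable[[List[list]], List[Vector]], batch_size: int = 32) -> None:
        self._encode = encode  # stand-in for f_s (assumed batch interface)
        self._batch_size = batch_size
        self._memory: Dict[str, Vector] = {}
        self._lock = threading.Lock()
        self._jobs: "queue.Queue[Tuple[str, list]]" = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def on_match_end(self, user_id: str, session: Sequence) -> None:
        """Enqueue a re-encoding job; returns immediately, off the matching path."""
        self._jobs.put((user_id, list(session)))

    def lookup(self, user_id: str) -> Optional[Vector]:
        """Return the latest stored e^s_i (possibly delayed), or None if absent."""
        with self._lock:
            return self._memory.get(user_id)

    def wait_until_idle(self) -> None:
        """Block until every queued job has been written to memory."""
        self._jobs.join()

    def _run(self) -> None:
        while True:
            batch = [self._jobs.get()]  # wait for at least one job
            while len(batch) < self._batch_size:  # then drain up to batch_size
                try:
                    batch.append(self._jobs.get_nowait())
                except queue.Empty:
                    break
            vectors = self._encode([session for _, session in batch])  # one batched call
            with self._lock:
                for (user_id, _), vec in zip(batch, vectors):
                    self._memory[user_id] = vec  # replace any existing representation
            for _ in batch:
                self._jobs.task_done()
```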

III-C Synchronous User Feature Embedding Layer $f_u$

Along with session information, Cupid incorporates user features such as demographic details (e.g., gender, country) and other match statistics to capture general user preferences. We use Wide&Deep [27] as the user feature embedding layer $f_u$, which processes the user features $X_i$ to generate a representation $\mathbf{e}^u_i = f_u(X_i)$. This representation $\mathbf{e}^u_i$ is then used in the prediction layer to estimate chat durations between users.
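For readers unfamiliar with Wide&Deep [27], the toy sketch below shows the general shape of such a feature embedding layer: a wide part that is linear in one-hot categorical features, plus a deep part that feeds dense per-feature embeddings through a small MLP, summed into a $d$-dimensional $\mathbf{e}^u_i$. The layer sizes, summing the two parts, lazy parameter creation, and omitting cross-product features and numeric match statistics are all our simplifications; the weights are random and untrained.

```python
import random
from typing import Dict, List, Tuple

Vector = List[float]


class WideAndDeepEmbedder:
    """Toy stand-in for f_u: categorical features X_i -> d-dim vector."""

    def __init__(self, feature_names: List[str], d: int = 4, emb_dim: int = 3,
                 hidden: int = 8, seed: int = 0) -> None:
        self.names = feature_names
        self.d, self.emb_dim = d, emb_dim
        self.rng = random.Random(seed)
        self.wide: Dict[Tuple[str, str], Vector] = {}  # wide weights per (feature, value)
        self.emb: Dict[Tuple[str, str], Vector] = {}   # deep embeddings per (feature, value)
        in_dim = emb_dim * len(feature_names)
        self.W_h = [[self.rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.W_o = [[self.rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(d)]

    def _param(self, table: Dict[Tuple[str, str], Vector], key: Tuple[str, str], size: int) -> Vector:
        # Lazily create parameters for a feature value seen for the first time.
        if key not in table:
            table[key] = [self.rng.gauss(0.0, 0.1) for _ in range(size)]
        return table[key]

    def __call__(self, x: Dict[str, str]) -> Vector:
        keys = [(name, x[name]) for name in self.names]
        # Wide part: a linear model on one-hot features is a sum of per-value weights.
        wide = [sum(self._param(self.wide, k, self.d)[c] for k in keys) for c in range(self.d)]
        # Deep part: concatenated embeddings -> one ReLU hidden layer -> linear output.
        concat = [v for k in keys for v in self._param(self.emb, k, self.emb_dim)]
        h = [max(0.0, sum(w * v for w, v in zip(row, concat))) for row in self.W_h]
        deep = [sum(w * v for w, v in zip(row, h)) for row in self.W_o]
        return [a + b for a, b in zip(wide, deep)]
```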

III-D Chat Duration Prediction Layer $f_o$

The chat duration prediction layer $f_o$ predicts the chat duration for a user pair $(u_i, u_j)$ by first combining each user’s session and feature representations:

$$\mathbf{e}_i = \mathbf{e}^s_i + \mathbf{e}^u_i, \qquad \mathbf{e}_j = \mathbf{e}^s_j + \mathbf{e}^u_j. \tag{6}$$

A simple way to predict chat duration would be to compute the dot product of these user representations, $\mathbf{e}_i \cdot \mathbf{e}_j$. However, this tends to overestimate chat duration for users with similar profiles, leading to sub-optimal recommendations because pairing similar users is not always ideal [28, 29]. To better capture mutual interest while avoiding this overestimation, we linearly project the two user representations into separate latent spaces:

$$\bar{\mathbf{e}}_i = \mathbf{W}_1\mathbf{e}_i + \mathbf{b}_1, \qquad \bar{\mathbf{e}}_j = \mathbf{W}_2\mathbf{e}_j + \mathbf{b}_2, \tag{7}$$

where $\mathbf{W}_1$ and $\mathbf{b}_1$ are the learnable weight matrix and bias for projecting the representation of user $u_i$, and $\mathbf{W}_2$ and $\mathbf{b}_2$ are the corresponding weight matrix and bias for the chat counterpart $u_j$. The predicted chat duration is then estimated as the dot product of the projected representations $\bar{\mathbf{e}}_i$ and $\bar{\mathbf{e}}_j$:

$$\hat{y}_{ij} = \bar{\mathbf{e}}_i \cdot \bar{\mathbf{e}}_j. \tag{8}$$
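Expanding Equation (8) with Equation (7) shows why separate projections help. For the plain dot product, the Cauchy–Schwarz inequality gives $\mathbf{e}_i \cdot \mathbf{e}_j \le \|\mathbf{e}_i\|\,\|\mathbf{e}_j\|$, with equality when $\mathbf{e}_j$ is a positive multiple of $\mathbf{e}_i$, so among counterparts of equal norm the most similar one always scores highest. After projection, the interaction term becomes a bilinear form with $\mathbf{M} = \mathbf{W}_1^\top\mathbf{W}_2$ (our notation), which need be neither symmetric nor positive semidefinite; similar users are therefore not automatically favored, and $\hat{y}_{ij}$ need not equal $\hat{y}_{ji}$:

```latex
\begin{aligned}
\hat{y}_{ij} &= (\mathbf{W}_1\mathbf{e}_i + \mathbf{b}_1)^\top(\mathbf{W}_2\mathbf{e}_j + \mathbf{b}_2) \\
&= \mathbf{e}_i^\top \mathbf{M}\,\mathbf{e}_j
 + \mathbf{b}_1^\top \mathbf{W}_2\mathbf{e}_j
 + \mathbf{e}_i^\top \mathbf{W}_1^\top \mathbf{b}_2
 + \mathbf{b}_1^\top \mathbf{b}_2,
 \qquad \mathbf{M} = \mathbf{W}_1^\top \mathbf{W}_2 .
\end{aligned}
```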

Exponential Transformation   As plotted in Figure 4, actual chat durations on real-world social discovery platforms follow a long-tailed distribution (blue histogram). However, when the model is trained with a naive MSE objective, the dot-product-based predictions tend to follow a normal distribution (red histogram), which deviates from the true distribution. To align the predictions with the true distribution, we apply an exponential transformation to Equation 8:

$$\hat{y}_{ij} = f_o(\mathbf{e}_i, \mathbf{e}_j) = \exp\left(w\left(\bar{\mathbf{e}}_i \cdot \bar{\mathbf{e}}_j\right) + b\right), \tag{9}$$

where $w$ and $b$ are learnable parameters.

Figure 4: (Left): True chat duration distribution. (Right): Predicted chat duration with and without exponential transform.

As a result, the exponential transformation effectively adjusts the predicted durations to match the long-tailed distribution of actual chat durations (green histogram) with minimal computational overhead.
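Putting Equations (6) to (9) together, $f_o$ for a single pair can be sketched as below (pure Python; the function and argument names are ours). Because $\log \hat{y}_{ij} = w(\bar{\mathbf{e}}_i \cdot \bar{\mathbf{e}}_j) + b$ is linear in the dot product, a roughly normal spread of dot products becomes a log-normal, right-skewed spread of predicted durations, consistent with the long-tailed shape described for Figure 4. For a whole pool, the computation remains one matrix product followed by an elementwise exponential, $\exp(w\,\bar{\mathbf{E}}_1\bar{\mathbf{E}}_2^\top + b)$, where the rows of $\bar{\mathbf{E}}_1$ and $\bar{\mathbf{E}}_2$ hold the two projections (our notation).

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, x: Sequence[float], bias: Sequence[float]) -> Vector:
    """Return W x + bias for a dense matrix W."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(W, bias)]


def predict_duration(e_s_i: Vector, e_u_i: Vector, e_s_j: Vector, e_u_j: Vector,
                     W1: Matrix, b1: Vector, W2: Matrix, b2: Vector,
                     w: float, b: float) -> float:
    e_i = [s + u for s, u in zip(e_s_i, e_u_i)]    # Eq. (6): session + feature
    e_j = [s + u for s, u in zip(e_s_j, e_u_j)]
    bar_i = affine(W1, e_i, b1)                     # Eq. (7): projection for u_i
    bar_j = affine(W2, e_j, b2)                     # Eq. (7): projection for u_j
    dot = sum(a * c for a, c in zip(bar_i, bar_j))  # Eq. (8)
    return math.exp(w * dot + b)                    # Eq. (9): exponential transform
```

For example, with $\mathbf{W}_1 = \mathbf{I}$, $\mathbf{W}_2 = \begin{pmatrix}0 & 1\\ 0 & 0\end{pmatrix}$, zero biases, $w = 1$, $b = 0$, $\mathbf{e}_i = (1, 0)$ and $\mathbf{e}_j = (0, 1)$, the sketch gives $\hat{y}_{ij} = e$ but $\hat{y}_{ji} = 1$, illustrating the asymmetry introduced by separate projections.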

III-E Two-Phase Training

Training session-based recommendation systems in real-time contexts poses significant computational challenges due to the dynamic nature of user representations. Each user’s preferences evolve after every interaction, so the system must frequently update their session representations to reflect their current state. To make precise recommendations, the system must consider the updated session data of both users involved in each match. Traditionally, this involves separately processing the session data of both the initiating user $u_i$ and their chat counterpart $u_j$ with expensive sequence models such as transformers. For each user, the causal transformer processes the session history with a computational complexity of $O(|S|^2)$, where $|S|$ is the average session length per user, and modeling both users separately doubles this cost.

Moreover, accurately predicting matches requires jointly modeling how the sessions of $u_i$ and $u_j$ interact, which further increases the computational overhead. Every interaction in $u_i$’s session may influence, and be influenced by, every interaction in $u_j$’s session, so the interaction space grows multiplicatively with session length. In a naive approach, modeling cross-attention between the two users’ sequences could lead to a theoretical complexity of $O(|S|^4)$. Such computational demands make real-time processing prohibitive, especially on large-scale platforms with millions of users and high interaction rates such as Azar. The sequential dependencies inherent in causal transformers further exacerbate the issue, as each interaction’s representation depends on all previous interactions. To address this challenge and improve training efficiency, we propose a Two-Phase Training Strategy, outlined in Algorithm 1, which significantly reduces computational overhead during training without substantially compromising the model’s performance.
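One plausible way to account for the "several hundredfold" reduction in sequential model inferences stated in the abstract is to count transformer passes. The accounting below is our assumption, not the authors’ derivation: if joint training re-encodes both participants’ session prefixes for every match, it needs two passes per match, whereas encoding each user’s full session once (a causal transformer yields every prefix state in that single pass) needs one pass per user. If every match appears in both participants’ histories, the ratio equals the average number of matches per user. The session lengths in the usage example are hypothetical, and the count ignores that full-session passes are longer than prefix passes.

```python
from typing import Dict


def count_session_encodings(session_lengths: Dict[str, int]) -> Dict[str, float]:
    """Count causal-transformer passes over session histories (illustrative).

    Assumptions (ours): each match appears in both participants' histories,
    so #matches = sum(h_i) / 2; joint training encodes both participants'
    prefixes per match; per-user training encodes each full session once.
    """
    num_matches = sum(session_lengths.values()) / 2
    joint = 2 * num_matches          # two encodings per match
    per_user = len(session_lengths)  # one encoding per user
    return {"joint": joint, "per_user": per_user, "reduction": joint / per_user}
```

With four hypothetical users who each have 200 matches, the count gives 800 joint passes versus 4 per-user passes, a 200-fold reduction.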

Algorithm 1 Two-Phase Training Strategy
Input: feature embedding layer $f_u$, auxiliary feature embedding layer $\tilde{f}_u$, session embedding layer $f_s$, and chat duration prediction layer $f_o$
Output: the trained layers $f_u$, $f_s$, and $f_o$
# Phase 1 Training (Embedding Layer)
repeat
    for $u_i \in \mathcal{U}$ do
         compute $f_s(S_i) = [\mathbf{e}^s_{i,0}, \mathbf{e}^s_{i,1}, \mathbf{e}^s_{i,2}, \cdots, \mathbf{e}^s_{i,h}]$
         for $m_k = (u_i, u_j, y_{ij,k}) \in S_i$ do
             compute $\mathbf{e}^u_{i,k} = f_u(X_{i,k})$, $\tilde{\mathbf{e}}^u_{j,k} = \tilde{f}_u(X_{j,k})$
             compute $\mathcal{L}_{\texttt{MSE}} = (f_o(\mathbf{e}^u_{i,k} + \mathbf{e}^s_{i,k-1}, \tilde{\mathbf{e}}^u_{j,k}) - y_{ij,k})^2$
             update $f_u, \tilde{f}_u, f_s, f_o$ with $\mathcal{L}_{\texttt{MSE}}$
         end for
    end for
until CUPID converges
# Phase 2 Training (Prediction Layer)
freeze the feature embedding layer $f_u$ and session embedding layer $f_s$
compute $\mathbf{e}^u_i, \mathbf{e}^s_i, \mathbf{e}^u_j, \mathbf{e}^s_j$ in advance
repeat
    for $m = (u_i, u_j, y_{ij}) \in \mathcal{D}$ do
         compute $\mathcal{L}_{\texttt{MSE}} = (f_o(\mathbf{e}^u_i + \mathbf{e}^s_i, \mathbf{e}^u_j + \mathbf{e}^s_j) - y_{ij})^2$
         update $f_o$ with $\mathcal{L}_{\texttt{MSE}}$
    end for
until CUPID converges

Phase 1: Training Embedding Layers   The primary goal of this phase is to efficiently train the user feature embedding layer $f_u$ and the asynchronous session embedding layer $f_s$. To assist in training these layers, we introduce an auxiliary user feature embedding layer $\tilde{f}_u$ that encodes only the chat counterpart's features, excluding the counterpart's session information. This reduces the input to $(X_i; S_i, X_j)$, allowing the causal transformer to generate all session representations $\mathbf{e}^s_{i,k}$ of user $u_i$ with a single forward pass per user. This phase minimizes the following objective:

$$\mathcal{L}_{\texttt{MSE}} = \frac{1}{|\mathcal{D}|} \sum_{u_i \in \mathcal{U}} \sum_{m_k \in S_i} \left( f_o\left(\mathbf{e}^u_{i,k} + \mathbf{e}^s_{i,k-1},\, \tilde{\mathbf{e}}^u_{j,k}\right) - y_{ij,k} \right)^2, \tag{10}$$

where $\mathbf{e}^s_{i,k-1}$ is the session state of user $u_i$ after the $(k-1)$-th match, used to predict the chat duration of the $k$-th match.

Phase 2: Training the Chat Duration Prediction Layer   In this phase, we refine the chat duration prediction layer $f_o$ by fully incorporating the session information of both users in each match, $(X_i; S_i, X_j; S_j)$. The objective thus becomes:

$$\mathcal{L}_{\texttt{MSE}} = \frac{1}{|\mathcal{D}|} \sum_{u_i \in \mathcal{U}} \sum_{m_k \in S_i} \left( f_o\left(\mathbf{e}^u_{i,k} + \mathbf{e}^s_{i,k-1},\, \mathbf{e}^u_{j,k} + \mathbf{e}^s_{j,k-1}\right) - y_{ij,k} \right)^2. \tag{11}$$

In this final phase, we discard the auxiliary user feature embedding layer $\tilde{f}_u$ from the first phase and freeze the embedding layers ($f_u$, $f_s$) to improve efficiency. Because the frozen layers produce fixed outputs, the user feature and session representations can be computed once in advance and reused in every Phase 2 epoch, so only $f_o$ is updated.
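The two phases above (and Algorithm 1) can be illustrated with a deliberately tiny Python sketch. Everything here is a hypothetical stand-in rather than Cupid's actual architecture: features are scalars, $f_u$, $\tilde{f}_u$, and $f_o$ are linear, the session encoder $f_s$ is a running mean of past outcomes scaled by a learned weight instead of a causal transformer, and training uses plain SGD. What it is meant to show is the structure: Phase 1 needs one encoder pass per user per epoch because the counterpart contributes only features, while Phase 2 freezes $f_u$ and $f_s$, precomputes every session state once, and updates only $f_o$.

```python
from collections import defaultdict


def build_sessions(matches):
    """Turn chronological matches (u_i, u_j, y) into per-user sessions.

    Returns sessions[user] = [(partner, y), ...] in time order, plus one record
    (u_i, u_j, y, k_i, k_j) per match, where k_* counts that user's prior matches,
    i.e. the index of the session state e^s_{*,k-1} to use for this match.
    """
    sessions = defaultdict(list)
    records = []
    for i, j, y in matches:
        records.append((i, j, y, len(sessions[i]), len(sessions[j])))
        sessions[i].append((j, y))
        sessions[j].append((i, y))
    return dict(sessions), records


class ToyCupid:
    """Scalar stand-ins for the layers (hypothetical, for illustration only):
    f_u(x) = a * x, auxiliary f~_u(x) = a_aux * x,
    f_s: e^s_k = b * mean(first k outcomes)  (NOT a causal transformer),
    f_o(p, q) = w1 * p + w2 * q + c.
    """

    def __init__(self):
        self.a = self.a_aux = self.b = self.w1 = self.w2 = 0.1
        self.c = 0.0
        self.encoder_calls = 0  # stands in for causal-transformer inferences

    def encode(self, session):
        """One causal pass returning prefix means m_0..m_h (m_0 = 0)."""
        self.encoder_calls += 1
        means, total = [0.0], 0.0
        for k, (_, y) in enumerate(session, start=1):
            total += y
            means.append(total / k)
        return means


def train_phase1(model, feats, sessions, epochs, lr=0.01):
    """Phase 1: the counterpart uses features only, so one encode per user."""
    for _ in range(epochs):
        for u, session in sessions.items():
            means = model.encode(session)
            for k, (v, y) in enumerate(session, start=1):
                p = model.a * feats[u] + model.b * means[k - 1]  # e^u + e^s_{k-1}
                q = model.a_aux * feats[v]                       # e~^u of partner
                r = 2 * (model.w1 * p + model.w2 * q + model.c - y)  # dL/dpred
                grads = {"w1": r * p, "w2": r * q, "c": r,
                         "a": r * model.w1 * feats[u],
                         "b": r * model.w1 * means[k - 1],
                         "a_aux": r * model.w2 * feats[v]}
                for name, g in grads.items():  # SGD on f_u, f~_u, f_s, f_o
                    setattr(model, name, getattr(model, name) - lr * g)


def train_phase2(model, feats, sessions, records, epochs, lr=0.01):
    """Phase 2: f_u and f_s frozen; states precomputed once; only f_o updated."""
    reps = {u: [model.a * feats[u] + model.b * m for m in model.encode(s)]
            for u, s in sessions.items()}
    for _ in range(epochs):
        for i, j, y, k_i, k_j in records:
            p, q = reps[i][k_i], reps[j][k_j]  # both users' e^u + e^s_{k-1}
            r = 2 * (model.w1 * p + model.w2 * q + model.c - y)
            model.w1 -= lr * r * p
            model.w2 -= lr * r * q
            model.c -= lr * r


if __name__ == "__main__":
    matches = [("A", "B", 1.0), ("C", "A", 2.0), ("B", "C", 0.5)]
    feats = {"A": 0.2, "B": -0.1, "C": 0.4}
    sessions, records = build_sessions(matches)
    model = ToyCupid()
    train_phase1(model, feats, sessions, epochs=10)
    phase1_calls = model.encoder_calls
    train_phase2(model, feats, sessions, records, epochs=50)
    print(phase1_calls, model.encoder_calls)  # 10 epochs x 3 users, then +3
```

The counter `encoder_calls` plays the role of causal-transformer inferences: with three users, ten Phase 1 epochs cost 30 passes, and any number of Phase 2 epochs adds only 3. Indexing states by the number of prior matches keeps $\mathbf{e}^s_{i,k-1}$ from seeing the label $y_{ij,k}$ it is used to predict.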

Computational Complexity Analysis   We analyze how our two-phase training strategy improves training efficiency, as detailed in Table I. Let $N$ denote the total number of training epochs in standard training, and let $N_1$ and $N_2$ denote the numbers of epochs in the first and second phases, respectively. $|\mathcal{D}|$ denotes the number of matching histories in the dataset, and $\overline{|S|}$ the average session length per user. In standard training, modeling session representations for both users requires $2N|\mathcal{D}|$ inferences, since both $u_i$ and $u_j$ must be processed for every match history in every epoch. In contrast, the first phase of our method requires only $N_1|\mathcal{D}|/\overline{|S|}$ inferences: only $u_i$'s session is encoded, and a single causal-transformer pass over a session yields the session states for all of its matches, so the cost is amortized over $\overline{|S|}$ matches. During the second phase, because user representations are pre-extracted and the embedding layers $f_s$ and $f_u$ are frozen, the total number of inferences is just $2|\mathcal{D}|/\overline{|S|}$, regardless of $N_2$. Our method therefore requires $(N_1+2)|\mathcal{D}|/\overline{|S|}$ causal-transformer inferences in $f_s$, compared to $2N|\mathcal{D}|$ in conventional methods, a reduction by a factor of

$$\frac{2N\overline{|S|}}{N_1 + 2}.$$

Assuming $N = N_1 = 10$ and $\overline{|S|} = 128$, our method achieves a 213x reduction in transformer inferences ($2 \cdot 10 \cdot 128 / 12 \approx 213$). Since transformer inference accounts for the majority of training time, this substantial reduction makes it practical to train Cupid efficiently, even on large datasets.
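The counts in Table I and the 213x figure can be checked with a few lines of arithmetic. This Python sketch simply encodes the formulas stated above; the choice of $|\mathcal{D}| = 10^9$ in the usage is illustrative, since $|\mathcal{D}|$ cancels in the ratio, and the function names are hypothetical.

```python
def standard_inferences(n_epochs: int, n_matches: float) -> float:
    """Encode both u_i and u_j for every match in every epoch: 2N|D|."""
    return 2 * n_epochs * n_matches


def two_phase_inferences(n1_epochs: int, n_matches: float, avg_len: float) -> float:
    """Table I: Phase 1 N_1|D|/|S| plus Phase 2 2|D|/|S| (independent of N_2)."""
    phase1 = n1_epochs * n_matches / avg_len
    phase2 = 2 * n_matches / avg_len
    return phase1 + phase2


def reduction_factor(n_epochs: int, n1_epochs: int, avg_len: float) -> float:
    """Closed form 2N|S| / (N_1 + 2); |D| cancels out."""
    return 2 * n_epochs * avg_len / (n1_epochs + 2)


if __name__ == "__main__":
    d = 1e9  # hypothetical billion-scale |D|, for illustration
    ratio = standard_inferences(10, d) / two_phase_inferences(10, d, 128)
    print(round(ratio, 1), round(reduction_factor(10, 10, 128), 1))  # 213.3 213.3
```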

Table I: Comparison of the number of causal transformer inferences with and without our two-phase training.
| Method | Phase | Causal transformer inferences |
|---|---|---|
| w/o Two-phase | - | $2N\lvert\mathcal{D}\rvert$ |
| w/ Two-phase | Phase-1 | $N_1\lvert\mathcal{D}\rvert/\overline{\lvert S\rvert}$ |
| | Phase-2 | $2\lvert\mathcal{D}\rvert/\overline{\lvert S\rvert}$ |
| | Total | $(N_1+2)\lvert\mathcal{D}\rvert/\overline{\lvert S\rvert}$ |
| Reduction Factor | | $2N\overline{\lvert S\rvert}/(N_1+2)$ |

IV Experiments

IV-A Experimental Setups

Data Setups   We evaluate Cupid in both offline and online environments. The offline evaluation tests its performance in a controlled setting using a large-scale matching history from Azar: a billion-scale dataset spanning millions of user sessions collected over one month. Data from the last two days are used for validation and testing, and the remaining data form the training set. The online evaluation validates Cupid under real-world conditions to confirm that the gains observed offline carry over to the live service.
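A minimal sketch of the temporal split described above, in Python. The record format `(timestamp, payload)` is an assumption, and because the text does not say how the two held-out days are divided between validation and test, the sketch assigns the earlier day to validation and the later day to test purely as an illustrative choice.

```python
from datetime import datetime, timedelta


def temporal_split(records, holdout_days=2):
    """Split (timestamp, payload) records by calendar day.

    The last `holdout_days` days are held out; as an assumption, the first
    held-out day is validation and the remaining held-out days are test.
    """
    last_day = max(ts.date() for ts, _ in records)
    first_holdout = last_day - timedelta(days=holdout_days - 1)
    train = [r for r in records if r[0].date() < first_holdout]
    val = [r for r in records if r[0].date() == first_holdout]
    test = [r for r in records if r[0].date() > first_holdout]
    return train, val, test


if __name__ == "__main__":
    # Five days of made-up records, one per day at noon.
    records = [(datetime(2024, 1, d, 12), f"match-{d}") for d in range(1, 6)]
    train, val, test = temporal_split(records)
    print(len(train), len(val), len(test))
```

Splitting by time rather than at random avoids training on interactions that occur after the evaluation period, which matters when session representations depend on a user's history.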

Evaluation Setups   We compare against two baseline models based on Wide&Deep [27], both used in Azar before session-based recommendation was adopted:

  • Wide&Deep: A widely adopted recommendation method that captures higher-order interactions among input features using neural networks. It uses the user representations $\mathbf{e}_i = \mathbf{e}_i^u$ and $\mathbf{e}_j = \mathbf{e}_j^u$ in Equation 6, without the session representations $\mathbf{e}_i^s$ and $\mathbf{e}_j^s$. Consequently, it relies solely on static user features and includes no real-time information from user sessions.

  • Wide&Deep-S: A variant of Wide&Deep that incorporates real-time features generated during user sessions. It captures user behavior while maintaining low latency by combining existing user features with aggregated features from recent match histories, such as the average chat duration. Comparing against this baseline isolates the benefit of our sequential approach to modeling user sessions over simple aggregate statistics.

For performance evaluation, we use MSE and the Area Under the Receiver Operating Characteristic curve (AUROC). MSE measures the average squared difference between actual and predicted chat durations, computed on log-scaled chat durations (in ms) to minimize the impact of noise in shorter intervals. The same log-scaling is used when training our recommendation models. AUROC, in contrast, assesses the model's ability to distinguish potential matches that result in quality interactions from those that do not, where a quality match is one whose chat duration exceeds a specific threshold. Because latency is another critical factor for real-time reciprocal recommendation, we also evaluate the latency improvement achieved by adopting Cupid in the real-world deployment of Azar.
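The two offline metrics can be sketched in Python as follows. The natural logarithm, the 60,000 ms quality threshold, and the sample durations are assumptions for illustration; the text specifies only that durations are log-scaled and that a fixed threshold defines a quality match. The AUROC uses the pairwise (Mann-Whitney) definition, which is exact but quadratic in the number of examples; large evaluation sets would use a rank-based implementation or a library routine instead.

```python
import math


def log_mse(durations_ms, predicted_log):
    """MSE in log space: predictions are log-durations, targets are log(ms)."""
    errors = [(p - math.log(d)) ** 2 for d, p in zip(durations_ms, predicted_log)]
    return sum(errors) / len(errors)


def quality_labels(durations_ms, threshold_ms):
    """A 'quality' match is one whose chat duration exceeds the threshold."""
    return [d > threshold_ms for d in durations_ms]


def auroc(scores, labels):
    """P(random positive scores above random negative), ties counted as 1/2.

    O(P * N) pairwise version for clarity; use a rank-based method at scale.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    actual = [5_000, 120_000, 90_000, 2_000]  # chat durations in ms (made up)
    pred = [math.log(4_000), math.log(100_000), math.log(30_000), math.log(3_000)]
    labels = quality_labels(actual, threshold_ms=60_000)  # hypothetical threshold
    print(log_mse(actual, pred), auroc(pred, labels))
```

Using the predicted log-duration directly as the ranking score lets a single regression model serve both metrics.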

Figure 5: Offline evaluation results on four types of matches.
Table II: Online evaluation results across all user segments in Azar production. The relative performance change of Cupid compared to the baseline Wide&Deep is reported.
| User Segment | Average Chat Duration | Long Match Ratio | Short Match Ratio |
|---|---|---|---|
| All users | +6.8% | +12.6% | -2.4% |
| Warm-start users | +6.8% | +12.9% | -2.3% |
| Cold-start users | +5.9% | +9.7% | -4.1% |
Table III: Simulation of the deployment environment, where user session representations may not be up to date. We observe the performance changes when using delayed user session representations stored at time $(t - t')$, with varying delay time $t'$.
| Method | $t'$ | Entire Match MSE (↓) | Entire Match AUROC (↑) | Warm-Warm MSE (↓) | Warm-Warm AUROC (↑) | Warm-Cold MSE (↓) | Warm-Cold AUROC (↑) | Cold-Cold MSE (↓) | Cold-Cold AUROC (↑) |
|---|---|---|---|---|---|---|---|---|---|
| Cupid | - | 0.3968 | 0.8635 | 0.3815 | 0.8735 | 0.4094 | 0.8564 | 0.4214 | 0.8409 |
| Cupid | 2000ms | 0.3989 | 0.8616 | 0.3834 | 0.8719 | 0.4117 | 0.8541 | 0.4239 | 0.8389 |
| Cupid | 4000ms | 0.3990 | 0.8616 | 0.3834 | 0.8719 | 0.4118 | 0.8540 | 0.4239 | 0.8389 |
| Cupid | 8000ms | 0.3993 | 0.8614 | 0.3837 | 0.8717 | 0.4121 | 0.8538 | 0.4242 | 0.8386 |
| Cupid | 16000ms | 0.4004 | 0.8605 | 0.3848 | 0.8710 | 0.4132 | 0.8528 | 0.4254 | 0.8375 |
| Wide&Deep-S | - | 0.4197 | 0.8497 | 0.3996 | 0.8655 | 0.4359 | 0.8375 | 0.4539 | 0.8136 |
Table IV: Ablation test results. SP and ET denote the second phase in our two-phase learning and the exponential transform in Equation 9, respectively. The performance changes are observed by removing each component.
Components Entire Match Warm-Warm Match Warm-Cold Match Cold-Cold Match
$\mathbf{e}^s$ SP ET MSE (↓) AUROC (↑) MSE (↓) AUROC (↑) MSE (↓) AUROC (↑) MSE (↓) AUROC (↑)
0.4197 0.8497 0.3996 0.8655 0.4359 0.8375 0.4539 0.8136
0.4271 0.8464 0.4059 0.8649 0.4436 0.8329 0.4648 0.7995
0.3996 0.8615 0.3845 0.8712 0.4121 0.8545 0.4239 0.8394
0.3948 0.8648 0.3797 0.8745 0.4072 0.8577 0.4190 0.8431

IV-B Offline Performance Evaluation

Figure 5 presents the overall performance across four match categories: Entire Match, which encompasses all matches; Warm-Warm, for matches between warm-start users; Warm-Cold, for matches between a warm-start and a cold-start user; and Cold-Cold, for matches exclusively between cold-start users. Here, cold-start users are those with no previous matching history in the training dataset. The match distribution is 58.1% Warm-Warm, 35.5% Warm-Cold, and 6.3% Cold-Cold. Cupid consistently outperforms the baseline methods across all categories and metrics.
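The match categories above depend only on whether each participant has matching history in the training data. A minimal Python sketch of that bucketing follows, assuming matches are given as `(u_i, u_j)` pairs and that any user appearing in at least one training match counts as warm-start; the helper names are hypothetical.

```python
from collections import Counter

CATEGORIES = ("Warm-Warm", "Warm-Cold", "Cold-Cold")


def warm_users_from(train_matches):
    """Users with at least one match in the training data are warm-start."""
    return {u for i, j in train_matches for u in (i, j)}


def match_type(u_i, u_j, warm_users):
    """Bucket a match by how many of its two participants are warm-start."""
    n_warm = int(u_i in warm_users) + int(u_j in warm_users)
    return {2: "Warm-Warm", 1: "Warm-Cold", 0: "Cold-Cold"}[n_warm]


def type_distribution(eval_matches, warm_users):
    """Share of evaluation matches falling into each category."""
    counts = Counter(match_type(i, j, warm_users) for i, j in eval_matches)
    total = sum(counts.values())
    return {t: counts[t] / total for t in CATEGORIES}
```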

IV-C Online Production Performance

Table V: Response latency of CUPID in online environments under the Azar service scenario.
| Component | 90th percentile | 99th percentile |
|---|---|---|
| User representation $\mathbf{e}^u$ | 9ms | 17ms |
| Session representation $\mathbf{e}^s$ | 236ms | 290ms |
| Synchronous implementation | 236ms | 290ms |
| CUPID: Asynchronous implementation (Ours.) | 48ms (-79.7%) | 70ms (-75.9%) |

Although Cupid substantially improves the prediction of satisfaction scores in offline experiments, offline gains do not always translate into higher user engagement online. To evaluate its real-world impact, we test Cupid in the production environment of Azar against the Wide&Deep baseline. We conduct a Switchback [30] test instead of an A/B test because all users share a single matching pool, which makes it difficult to separate independent A/B groups. The results in Table II show improvements in average chat duration and in the proportions of long and short matches, each defined by a preset threshold. Across all user segments, Cupid consistently increases the average chat duration, raises the long match ratio, and lowers the short match ratio. This demonstrates that Cupid not only predicts satisfaction scores accurately but also improves user experience in a live setting. Meanwhile, Table V shows the latency improvement achieved by Cupid, reflecting its primary goal of delivering low-latency recommendations through asynchronous session modeling. In the real-world deployment of Azar, Cupid reduces latency at the 90th and 99th percentiles by 79.7% and 75.9%, respectively, compared to computing session representations synchronously within the matching pipeline. This reduction keeps latency stable, which is essential for real-time services.
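A switchback test randomizes over time rather than over users: during each time window, the entire matching pool is served by one arm, so treated and control users never share the pool at the same moment. The Python sketch below is a minimal, hypothetical schedule generator; the one-hour window in the usage, the uniform coin flip per window, and the function names are assumptions, not the configuration used in Azar.

```python
import random


def switchback_schedule(start_s, end_s, window_s, seed=0):
    """Split [start_s, end_s) into windows; each window sends the WHOLE
    matching pool to a single arm, chosen at random."""
    rng = random.Random(seed)
    schedule, t = [], start_s
    while t < end_s:
        arm = rng.choice(("control", "treatment"))
        schedule.append((t, min(t + window_s, end_s), arm))
        t += window_s
    return schedule


def arm_at(schedule, ts):
    """Arm serving every match request at timestamp ts."""
    for begin, end, arm in schedule:
        if begin <= ts < end:
            return arm
    raise ValueError("timestamp outside the experiment window")


if __name__ == "__main__":
    # One day of hypothetical one-hour windows.
    day = switchback_schedule(0, 24 * 3600, 3600, seed=42)
    print(day[:3], arm_at(day, 5400))
```

Metrics are then compared between control and treatment windows. Practical switchback designs often also discard the start of each window as a burn-in period to limit carryover effects from the previous arm.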

IV-D Effect of Delayed Session Representation

In real deployment, the session representation $\mathbf{e}^s$ might miss the latest matching histories if a user requests a new match before the update completes. The system then uses a delayed representation that lacks data from the most recent matches. To study the impact of this delay, we simulate an environment where representation updates are delayed by $t'$ milliseconds and predict chat durations for users in the matching pool $\mathcal{U}^{(t)}$ using the delayed representations. Table III summarizes the results, and two observations emerge. First, prediction performance decreases slightly as the delay grows, which is expected because the system decouples session modeling from the synchronous matching pipeline to avoid latency issues. This compromise is acceptable, as it prevents session modeling from becoming a bottleneck. Second, even with a 16000ms delay, Cupid still outperforms the Wide&Deep-S baseline by a significant margin in every match category while maintaining similar latency. This shows that decoupled session modeling strikes a favorable balance between latency and prediction performance.
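The delayed-representation simulation amounts to looking up, for each user, the most recent session representation that had been stored by time $t - t'$. The Python sketch below shows one way to do that lookup with a per-user sorted list of update times; the `SessionStore` class and its methods are hypothetical, and returning `None` when no representation exists yet (where a system might fall back to an empty session state) is an assumption.

```python
import bisect
from collections import defaultdict


class SessionStore:
    """Hypothetical versioned store of session representations e^s per user."""

    def __init__(self):
        self._times = defaultdict(list)  # user -> update times (ms), ascending
        self._reps = defaultdict(list)   # user -> representations, same order

    def put(self, user, t_ms, rep):
        # Assumes each user's updates arrive in time order.
        self._times[user].append(t_ms)
        self._reps[user].append(rep)

    def get_delayed(self, user, t_ms, delay_ms=0):
        """Latest representation stored at or before t - t'; None if none exists."""
        times = self._times.get(user, [])
        idx = bisect.bisect_right(times, t_ms - delay_ms)
        return self._reps[user][idx - 1] if idx else None
```

For example, after `put("A", 1_000, r1)` and `put("A", 5_000, r2)`, a request at t = 6,000 ms with t' = 2,000 ms receives `r1`, because `r2` was not yet available at 4,000 ms.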

IV-E Ablation Study

An ablation test is conducted to evaluate the impact of individual components on Cupid's performance, focusing on the session representation $\mathbf{e}^s$, the Exponential Transformation (ET), and the second phase of the two-phase training strategy. The results, shown in Table IV, indicate a performance drop when any component is removed. Excluding the session representation causes a significant decline, especially for cold-start users, underscoring its role in capturing mutual interests. Skipping the second-phase training also degrades performance, highlighting the importance of using session data from both users to predict chat durations. Finally, omitting the exponential transformation leads to poorer performance, confirming its value in aligning predicted chat durations with the actual distribution and stabilizing model training.

V Related Work

Reciprocal Recommendation   Reciprocal recommendation systems differ from conventional recommender systems in that they aim to enhance mutual satisfaction through user-to-user recommendations rather than item-to-user recommendations [4, 31, 32]. These systems have been widely studied, especially in contexts such as online dating [33, 34, 35, 36] and job search platforms [37, 38, 29, 39]. Our work shifts the focus to real-time reciprocal recommendation, where candidates appear and disappear dynamically. To our knowledge, this is the first comprehensive study of these dynamics in a real-time setting.

Session-Based Recommendation   Session-based recommendation systems predict the next item by capturing dynamic user behaviors and intents within a session. Various models, such as Markov Chains [40, 41], recurrent neural networks [42, 43, 22], graph neural networks [44, 45, 46, 47], transformers [48, 49, 20, 50, 51], and other attention mechanisms [52, 53, 54] have been utilized for this purpose. Our study extends session-based recommendations into the underexplored area of reciprocal recommendation tasks. While [55] examines sequential recommendations in a two-sided market, it does not address the low-latency requirements essential for real-time one-on-one social discovery platforms. In contrast, our work specifically focuses on meeting these extreme low-latency constraints, facilitating rapid and efficient user matching in reciprocal session-based recommendation systems.

VI Conclusion

To the best of our knowledge, this is the first study to develop a session-based reciprocal recommendation system optimized for real-time social discovery platforms. Our approach meets stringent latency requirements through asynchronous session modeling, which removes session encoding from the synchronous matching path. We also introduce an efficient two-phase training method that reduces the computational cost of combining session-based and reciprocal recommendation. Validated on a large-scale offline dataset and in a real-world environment, our system increases average chat duration by 6.8% for warm-start users and 5.9% for cold-start users. Moreover, it reduces latency by 79.7% and 75.9% at the 90th and 99th percentiles, respectively, compared to a purely synchronous implementation. This research opens a new direction for real-time session-based reciprocal recommendation.

Ethical Statement   By introducing Cupid, we aim to enhance user engagement and satisfaction through efficient, personalized matchmaking in social discovery. Using asynchronous session modeling and a two-phase training strategy, Cupid addresses low latency and dynamic user preferences. However, deploying such a system involves ethical considerations, including user privacy, data security, and potential algorithmic biases. To address these, we ensure strict adherence to data protection laws, implement robust security measures, and commit to developing fairness-aware algorithms with regular audits to prevent unintended discrimination.

References

  • [1] Z. Zheng, X. Hu, S. Gao, H. Zhu, and H. Xiong, “Mirror: A multi-view reciprocal recommender system for online recruitment,” in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2024, pp. 543–552.
  • [2] B. A. Potts, H. Khosravi, C. Reidsema, A. Bakharia, M. Belonogoff, and M. Fleming, “Reciprocal peer recommendation for learning purposes,” in Proceedings of the 8th international conference on learning analytics and knowledge, 2018, pp. 226–235.
  • [3] Y. Zheng, T. Dave, N. Mishra, and H. Kumar, “Fairness in reciprocal recommendations: A speed-dating study,” in Adjunct publication of the 26th conference on user modeling, adaptation and personalization, 2018, pp. 29–34.
  • [4] I. Palomares, C. Porcel, L. Pizzato, I. Guy, and E. Herrera-Viedma, “Reciprocal recommender systems: Analysis of state-of-art literature, challenges and opportunities towards social recommendation,” Information Fusion, vol. 69, pp. 103–127, 2021.
  • [5] L. Pizzato, T. Rej, T. Chung, I. Koprinska, and J. Kay, “Recon: a reciprocal recommender for online dating,” in Proceedings of the fourth ACM conference on Recommender systems, 2010, pp. 207–214.
  • [6] S. Wang, L. Cao, Y. Wang, Q. Z. Sheng, M. A. Orgun, and D. Lian, “A survey on session-based recommender systems,” ACM Computing Surveys (CSUR), vol. 54, no. 7, pp. 1–38, 2021.
  • [7] S. Wang, Q. Zhang, L. Hu, X. Zhang, Y. Wang, and C. Aggarwal, “Sequential/session-based recommendations: Challenges, approaches, applications and opportunities,” in Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2022, pp. 3425–3428.
  • [8] M. Ludewig, N. Mauro, S. Latifi, and D. Jannach, “Performance comparison of neural and non-neural approaches to session-based recommendation,” in Proceedings of the 13th ACM conference on recommender systems, 2019, pp. 462–466.
  • [9] Y. K. Tan, X. Xu, and Y. Liu, “Improved recurrent neural networks for session-based recommendations,” in Proceedings of the 1st workshop on deep learning for recommender systems, 2016, pp. 17–22.
  • [10] M. Quadrana, A. Karatzoglou, B. Hidasi, and P. Cremonesi, “Personalizing session-based recommendations with hierarchical recurrent neural networks,” in Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 130–137.
  • [11] Z. Liu, L. Zou, X. Zou, C. Wang, B. Zhang, D. Tang, B. Zhu, Y. Zhu, P. Wu, K. Wang et al., “Monolith: real time recommendation system with collisionless embedding table,” arXiv preprint arXiv:2209.07663, 2022.
  • [12] Z. Hou, F. Bu, Y. Zhou, L. Bu, Q. Ma, Y. Wang, H. Zhai, and Z. Han, “Dycars: A dynamic context-aware recommendation system,” Mathematical Biosciences and Engineering, vol. 21, no. 3, pp. 3563–3593, 2024.
  • [13] A. Mahyari, P. Pirolli, and J. A. LeBlanc, “Real-time learning from an expert in deep recommendation systems with application to mHealth for physical exercises,” IEEE Journal of Biomedical and Health Informatics, vol. 26, no. 8, pp. 4281–4290, 2022.
  • [14] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock, “Methods and metrics for cold-start recommendations,” in Proceedings of the 25th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2002, pp. 253–260.
  • [15] R. Sethi and M. Mehrotra, “Cold start in recommender systems—a survey from domain perspective,” in Intelligent Data Communication Technologies and Internet of Things: Proceedings of ICICI 2020. Springer, 2021, pp. 223–232.
  • [16] D. K. Panda and S. Ray, “Approaches and algorithms to mitigate cold start problems in recommender systems: a systematic literature review,” Journal of Intelligent Information Systems, vol. 59, no. 2, pp. 341–366, 2022.
  • [17] N. A. Abdullah, R. A. Rasheed, M. H. N. M. Nasir, and M. M. Rahman, “Eliciting auxiliary information for cold start user recommendation: A survey,” Applied Sciences, vol. 11, no. 20, p. 9608, 2021.
  • [18] F. Berisha and E. Bytyçi, “Addressing cold start in recommender systems with neural networks: a literature survey,” International Journal of Computers and Applications, vol. 45, no. 7-8, pp. 485–496, 2023.
  • [19] X. Zheng, R. Wu, Z. Han, C. Chen, L. Chen, and B. Han, “Heterogeneous information crossing on graphs for session-based recommender systems,” ACM Transactions on the Web, vol. 18, no. 2, pp. 1–24, 2024.
  • [20] G. de Souza Pereira Moreira, S. Rabhi, J. M. Lee, R. Ak, and E. Oldridge, “Transformers4Rec: Bridging the gap between NLP and sequential/session-based recommendation,” in Proceedings of the 15th ACM Conference on Recommender Systems, 2021, pp. 143–153.
  • [21] J. Wang, K. Ding, Z. Zhu, and J. Caverlee, “Session-based recommendation with hypergraph attention networks,” in Proceedings of the 2021 SIAM International Conference on Data Mining (SDM). SIAM, 2021, pp. 82–90.
  • [22] S. Liu and Y. Zheng, “Long-tail session-based recommendation,” in Proceedings of the 14th ACM Conference on Recommender Systems, 2020, pp. 509–514.
  • [23] C. Hansen, C. Hansen, L. Maystre, R. Mehrotra, B. Brost, F. Tomasi, and M. Lalmas, “Contextual and sequential user embeddings for large-scale music recommendation,” in Proceedings of the 14th ACM Conference on Recommender Systems, 2020, pp. 53–62.
  • [24] X. Cai, M. Bain, A. Krzywicki, W. Wobcke, Y. S. Kim, P. Compton, and A. Mahidadia, “Reciprocal and heterogeneous link prediction in social networks,” in Advances in Knowledge Discovery and Data Mining: 16th Pacific-Asia Conference, PAKDD 2012, Kuala Lumpur, Malaysia, May 29–June 1, 2012, Proceedings, Part II. Springer, 2012, pp. 193–204.
  • [25] P. Xia, B. Liu, Y. Sun, and C. Chen, “Reciprocal recommendation system for online dating,” in Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015, 2015, pp. 234–241.
  • [26] L. S. Blackford, A. Petitet, R. Pozo, K. Remington, R. C. Whaley, J. Demmel, J. Dongarra, I. Duff, S. Hammarling, G. Henry et al., “An updated set of basic linear algebra subprograms (BLAS),” ACM Transactions on Mathematical Software, vol. 28, no. 2, pp. 135–151, 2002.
  • [27] H.-T. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir et al., “Wide & deep learning for recommender systems,” in Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, 2016, pp. 7–10.
  • [28] J. Neve and R. McConville, “ImRec: Learning reciprocal preferences using images,” in Proceedings of the 14th ACM Conference on Recommender Systems, 2020, pp. 170–179.
  • [29] C. Yang, Y. Hou, Y. Song, T. Zhang, J.-R. Wen, and W. X. Zhao, “Modeling two-way selection preference for person-job fit,” in Proceedings of the 16th ACM Conference on Recommender Systems, 2022, pp. 102–112.
  • [30] J. Robins, “A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect,” Mathematical Modelling, vol. 7, no. 9-12, pp. 1393–1512, 1986.
  • [31] H. Abdollahpouri, G. Adomavicius, R. Burke, I. Guy, D. Jannach, T. Kamishima, J. Krasnodebski, and L. Pizzato, “Multistakeholder recommendation: Survey and research directions,” User Modeling and User-Adapted Interaction, vol. 30, pp. 127–158, 2020.
  • [32] I. Palomares, “Reciprocal recommendation: Matching users with the right users,” in Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, 2020, pp. 2429–2431.
  • [33] J. Neve and I. Palomares, “Latent factor models and aggregation operators for collaborative filtering in reciprocal recommender systems,” in Proceedings of the 13th ACM Conference on Recommender Systems, 2019, pp. 219–227.
  • [34] Y. Tomita, R. Togashi, and D. Moriwaki, “Matching theory-based recommender systems in online dating,” in Proceedings of the 16th ACM Conference on Recommender Systems, 2022, pp. 538–541.
  • [35] A. Alanazi and M. Bain, “A people-to-people content-based reciprocal recommender using hidden Markov models,” in Proceedings of the 7th ACM Conference on Recommender Systems, 2013, pp. 303–306.
  • [36] K. Tu, B. Ribeiro, D. Jensen, D. Towsley, B. Liu, H. Jiang, and X. Wang, “Online dating recommendations: matching markets and learning preferences,” in Proceedings of the 23rd International Conference on World Wide Web, 2014, pp. 787–792.
  • [37] J. Jiang, S. Ye, W. Wang, J. Xu, and X. Luo, “Learning effective representations for person-job fit by feature fusion,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 2549–2556.
  • [38] Y. Lu, S. El Helou, and D. Gillet, “A recommender system for job seeking and recruiting website,” in Proceedings of the 22nd International Conference on World Wide Web, 2013, pp. 963–966.
  • [39] R. Yan, R. Le, Y. Song, T. Zhang, X. Zhang, and D. Zhao, “Interview choice reveals your preference on the market: To improve job-resume matching through profiling memories,” in Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2019, pp. 914–922.
  • [40] S. Rendle, C. Freudenthaler, and L. Schmidt-Thieme, “Factorizing personalized Markov chains for next-basket recommendation,” in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 811–820.
  • [41] R. He and J. McAuley, “Fusing similarity models with Markov chains for sparse sequential recommendation,” in 2016 IEEE 16th International Conference on Data Mining (ICDM). IEEE, 2016, pp. 191–200.
  • [42] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, “Session-based recommendations with recurrent neural networks,” in 4th International Conference on Learning Representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2016. [Online]. Available: http://arxiv.org/abs/1511.06939
  • [43] D. Jannach and M. Ludewig, “When recurrent neural networks meet the neighborhood for session-based recommendation,” in Proceedings of the Eleventh ACM Conference on Recommender Systems, 2017, pp. 306–310.
  • [44] S. Wu, Y. Tang, Y. Zhu, L. Wang, X. Xie, and T. Tan, “Session-based recommendation with graph neural networks,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, no. 01, 2019, pp. 346–353.
  • [45] J. Guo, Y. Yang, X. Song, Y. Zhang, Y. Wang, J. Bai, and Y. Zhang, “Learning multi-granularity consecutive user intent unit for session-based recommendation,” in Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022, pp. 343–352.
  • [46] P. Zhang, J. Guo, C. Li, Y. Xie, J. B. Kim, Y. Zhang, X. Xie, H. Wang, and S. Kim, “Efficiently leveraging multi-level user intent for session-based recommendation via Atten-Mixer network,” in Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, 2023, pp. 168–176.
  • [47] Y. Pang, L. Wu, Q. Shen, Y. Zhang, Z. Wei, F. Xu, E. Chang, B. Long, and J. Pei, “Heterogeneous global graph neural networks for personalized session-based recommendation,” in Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, 2022, pp. 775–783.
  • [48] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in Neural Information Processing Systems, vol. 30, 2017.
  • [49] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, and P. Jiang, “BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer,” in Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1441–1450.
  • [50] X. Xia, J. Yu, Q. Wang, C. Yang, N. Q. V. Hung, and H. Yin, “Efficient on-device session-based recommendation,” ACM Transactions on Information Systems, vol. 41, no. 4, pp. 1–24, 2023.
  • [51] K. Zhou, H. Wang, W. X. Zhao, Y. Zhu, S. Wang, F. Zhang, Z. Wang, and J.-R. Wen, “S3-Rec: Self-supervised learning for sequential recommendation with mutual information maximization,” in Proceedings of the 29th ACM International Conference on Information & Knowledge Management, 2020, pp. 1893–1902.
  • [52] J. Li, P. Ren, Z. Chen, Z. Ren, T. Lian, and J. Ma, “Neural attentive session-based recommendation,” in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, 2017, pp. 1419–1428.
  • [53] Q. Liu, Y. Zeng, R. Mokhosi, and H. Zhang, “STAMP: short-term attention/memory priority model for session-based recommendation,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1831–1839.
  • [54] K. Zhou, H. Yu, W. X. Zhao, and J.-R. Wen, “Filter-enhanced MLP is all you need for sequential recommendation,” in Proceedings of the ACM Web Conference 2022, 2022, pp. 2388–2399.
  • [55] B. Zheng, Y. Hou, W. X. Zhao, Y. Song, and H. Zhu, “Reciprocal sequential recommendation,” in Proceedings of the 17th ACM Conference on Recommender Systems, 2023, pp. 89–100.