Showing 1–24 of 24 results for author: Łukasik, M

  1. arXiv:2403.04182

    cs.CL cs.AI

    Metric-aware LLM inference for regression and scoring

    Authors: Michal Lukasik, Harikrishna Narasimhan, Aditya Krishna Menon, Felix Yu, Sanjiv Kumar

    Abstract: Large language models (LLMs) have demonstrated strong results on a range of NLP tasks. Typically, outputs are obtained via autoregressive sampling from the LLM's underlying distribution. Building on prior work on Minimum Bayes Risk Decoding, we show that this inference strategy can be suboptimal for a range of regression and scoring tasks, and associated evaluation metrics. As a remedy, we propose…

    Submitted 4 April, 2024; v1 submitted 6 March, 2024; originally announced March 2024.

    Comments: 15 pages

  2. arXiv:2310.09250

    cs.LG cs.AI stat.ML

    It's an Alignment, Not a Trade-off: Revisiting Bias and Variance in Deep Models

    Authors: Lin Chen, Michal Lukasik, Wittawat Jitkrittum, Chong You, Sanjiv Kumar

    Abstract: Classical wisdom in machine learning holds that the generalization error can be decomposed into bias and variance, and these two terms exhibit a trade-off. However, in this paper, we show that for an ensemble of deep learning based classification models, bias and variance are aligned at a sample level, where squared bias is approximately equal to variance for correctly classif…

    Submitted 13 October, 2023; originally announced October 2023.

  3. arXiv:2310.05337

    cs.LG cs.CV

    What do larger image classifiers memorise?

    Authors: Michal Lukasik, Vaishnavh Nagarajan, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The success of modern neural networks has prompted study of the connection between memorisation and generalisation: overparameterised models generalise well, despite being able to perfectly fit (memorise) completely random labels. To carefully study this issue, Feldman proposed a metric to quantify the degree of memorisation of individual training examples, and empirically computed the correspondi…

    Submitted 8 October, 2023; originally announced October 2023.

    MSC Class: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Machine Learning (stat.ML)

  4. arXiv:2302.01576

    cs.LG cs.AI stat.ME stat.ML

    ResMem: Learn what you can and memorize the rest

    Authors: Zitong Yang, Michal Lukasik, Vaishnavh Nagarajan, Zonglin Li, Ankit Singh Rawat, Manzil Zaheer, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g. a ne…

    Submitted 20 October, 2023; v1 submitted 3 February, 2023; originally announced February 2023.

  5. arXiv:2211.05110

    cs.CL cs.AI cs.LG

    Large Language Models with Controllable Working Memory

    Authors: Daliang Li, Ankit Singh Rawat, Manzil Zaheer, Xin Wang, Michal Lukasik, Andreas Veit, Felix Yu, Sanjiv Kumar

    Abstract: Large language models (LLMs) have led to a series of breakthroughs in natural language processing (NLP), owing to their excellent understanding and generation abilities. Remarkably, what further sets these models apart is the massive amounts of world knowledge they internalize during pretraining. While many downstream applications provide the model with an informational context to aid its performa…

    Submitted 9 November, 2022; originally announced November 2022.

  6. arXiv:2211.00635

    cs.CL cs.LG

    Two-stage LLM Fine-tuning with Less Specialization and More Generalization

    Authors: Yihan Wang, Si Si, Daliang Li, Michal Lukasik, Felix Yu, Cho-Jui Hsieh, Inderjit S Dhillon, Sanjiv Kumar

    Abstract: Pretrained large language models (LLMs) are general purpose problem solvers applicable to a diverse set of tasks with prompts. They can be further improved towards a specific task by fine-tuning on a specialized dataset. However, fine-tuning usually makes the model narrowly specialized on this dataset with reduced general in-context learning performances, which is undesirable whenever the fine-tun…

    Submitted 12 March, 2024; v1 submitted 1 November, 2022; originally announced November 2022.

    Comments: ICLR 2024

  7. arXiv:2207.03833

    cs.HC

    XR Hackathon Going Online: Lessons Learned from a Case Study with Goethe-Institut

    Authors: Wiesław Kopeć, Kinga Skorupska, Anna Jaskulska, Michał Łukasik, Barbara Karpowicz, Julia Paluch, Kinga Kwiatkowska, Daniel Jabłoński, Rafał Masłyk

    Abstract: In this article we report a case study of a Language and Culture-oriented transdisciplinary XR hackathon organized with Goethe-Institut. The hackathon was hosted as an online event in November 2020 by our University Lab in collaboration with Goethe-Institut as a follow-up to our previous co-organized event within our research group Living Lab. We have improved the formula of the event based on les…

    Submitted 8 July, 2022; originally announced July 2022.

  8. arXiv:2206.06479

    cs.LG

    Robust Distillation for Worst-class Performance

    Authors: Serena Wang, Harikrishna Narasimhan, Yichen Zhou, Sara Hooker, Michal Lukasik, Aditya Krishna Menon

    Abstract: Knowledge distillation has proven to be an effective technique in improving the performance of a student model using predictions from a teacher model. However, recent work has shown that gains in average efficiency are not uniform across subgroups in the data, and in particular can often come at the cost of accuracy on rare subgroups and classes. To preserve strong performance across classes that may…

    Submitted 13 June, 2022; originally announced June 2022.

  9. arXiv:2110.15440

    cs.CR cs.LG

    HD-cos Networks: Efficient Neural Architectures for Secure Multi-Party Computation

    Authors: Wittawat Jitkrittum, Michal Lukasik, Ananda Theertha Suresh, Felix Yu, Gang Wang

    Abstract: Multi-party computation (MPC) is a branch of cryptography where multiple non-colluding parties execute a well designed protocol to securely compute a function. With the non-colluding party assumption, MPC has a cryptographic guarantee that the parties will not learn sensitive information from the computation process, making it an appealing framework for applications that involve privacy-sensitive…

    Submitted 28 October, 2021; originally announced October 2021.

  10. arXiv:2110.06821

    cs.LG cs.CL cs.CV

    Leveraging redundancy in attention with Reuse Transformers

    Authors: Srinadh Bhojanapalli, Ayan Chakrabarti, Andreas Veit, Michal Lukasik, Himanshu Jain, Frederick Liu, Yin-Wen Chang, Sanjiv Kumar

    Abstract: Pairwise dot product-based attention allows Transformers to exchange information between tokens in an input-dependent way, and is key to their success across diverse applications in language and vision. However, a typical Transformer model computes such pairwise attention scores repeatedly for the same sequence, in multiple heads in multiple layers. We systematically analyze the empirical similari…

    Submitted 13 October, 2021; originally announced October 2021.

  11. arXiv:2106.10494

    cs.LG

    Teacher's pet: understanding and mitigating biases in distillation

    Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Knowledge distillation is widely used as a means of improving the performance of a relatively simple student model using the predictions from a complex teacher model. Several works have shown that distillation significantly boosts the student's overall performance; however, are these gains uniform across all data subgroups? In this paper, we show that distillation can harm performance on certain s…

    Submitted 8 July, 2021; v1 submitted 19 June, 2021; originally announced June 2021.

    Comments: 21 pages, 8 figures

  12. arXiv:2106.08823

    cs.LG

    Eigen Analysis of Self-Attention and its Reconstruction from Partial Computation

    Authors: Srinadh Bhojanapalli, Ayan Chakrabarti, Himanshu Jain, Sanjiv Kumar, Michal Lukasik, Andreas Veit

    Abstract: State-of-the-art transformer models use pairwise dot-product based self-attention, which comes at a computational cost quadratic in the input sequence length. In this paper, we investigate the global structure of attention scores computed using this dot product mechanism on a typical distribution of inputs, and study the principal components of their variation. Through eigen analysis of full atten…

    Submitted 16 June, 2021; originally announced June 2021.

    Comments: 14 pages

  13. arXiv:2010.11851

    cs.SI

    Hawkes Process Classification through Discriminative Modeling of Text

    Authors: Rohan Tondulkar, Manisha Dubey, P. K. Srijith, Michal Lukasik

    Abstract: Social media has provided a platform for users to gather and share information and stay updated with the news. Such networks also provide a platform where users can engage in conversations. However, micro-blogging platforms like Twitter restrict the length of text. Due to the paucity of word occurrences in such posts, classification of this information is a challenging task us…

    Submitted 22 October, 2020; originally announced October 2020.

    Comments: 9 pages, 10 figures

  14. arXiv:2010.07447

    cs.CL cs.LG

    Semantic Label Smoothing for Sequence to Sequence Problems

    Authors: Michal Lukasik, Himanshu Jain, Aditya Krishna Menon, Seungyeon Kim, Srinadh Bhojanapalli, Felix Yu, Sanjiv Kumar

    Abstract: Label smoothing has been shown to be an effective regularization strategy in classification, that prevents overfitting and helps in label de-noising. However, extending such methods directly to seq2seq settings, such as Machine Translation, is challenging: the large target output space of such problems makes it intractable to apply label smoothing over all possible outputs. Most existing approache…

    Submitted 14 October, 2020; originally announced October 2020.

  15. arXiv:2007.01570

    cs.LG cs.SI stat.ML

    Scaling Graph Neural Networks with Approximate PageRank

    Authors: Aleksandar Bojchevski, Johannes Gasteiger, Bryan Perozzi, Amol Kapoor, Martin Blais, Benedek Rózemberczki, Michal Lukasik, Stephan Günnemann

    Abstract: Graph neural networks (GNNs) have emerged as a powerful approach for solving many network mining tasks. However, learning on large graphs remains a challenge - many recently proposed scalable GNN approaches rely on an expensive message-passing procedure to propagate information through the graph. We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs…

    Submitted 5 April, 2022; v1 submitted 3 July, 2020; originally announced July 2020.

    Comments: Published as a Conference Paper at ACM SIGKDD 2020. Author name changed from Johannes Klicpera to Johannes Gasteiger

  16. arXiv:2004.14535

    cs.CL

    Text Segmentation by Cross Segment Attention

    Authors: Michal Lukasik, Boris Dadachev, Gonçalo Simões, Kishore Papineni

    Abstract: Document and discourse segmentation are two fundamental NLP tasks pertaining to breaking up text into constituents, which are commonly used to help downstream tasks such as information retrieval or text summarization. In this work, we propose three transformer-based architectures and provide comprehensive comparisons with previously proposed approaches on three standard datasets. We establish a ne…

    Submitted 7 December, 2020; v1 submitted 29 April, 2020; originally announced April 2020.

    Comments: 10 pages, 4 figures

  17. arXiv:2003.02819

    cs.LG stat.ML

    Does label smoothing mitigate label noise?

    Authors: Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

    Abstract: Label smoothing is commonly used in training deep learning models, wherein one-hot training labels are mixed with uniform label vectors. Empirically, smoothing has been shown to improve both predictive performance and model calibration. In this paper, we study whether label smoothing is also effective as a means of coping with label noise. While label smoothing apparently amplifies this problem --…

    Submitted 5 March, 2020; originally announced March 2020.

  18. Discourse-Aware Rumour Stance Classification in Social Media Using Sequential Classifiers

    Authors: Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Isabelle Augenstein

    Abstract: Rumour stance classification, defined as classifying the stance of specific social media posts into one of supporting, denying, querying or commenting on an earlier post, is becoming of increasing interest to researchers. While most previous work has focused on using individual tweets as classifier inputs, here we report on the performance of sequential classifiers that exploit the discourse featu…

    Submitted 6 December, 2017; originally announced December 2017.

    Journal ref: Information Processing & Management, Volume 54, Issue 2, March 2018, Pages 273-290

  19. arXiv:1609.09028

    cs.CL cs.SI

    Stance Classification in Rumours as a Sequential Task Exploiting the Tree Structure of Social Media Conversations

    Authors: Arkaitz Zubiaga, Elena Kochkina, Maria Liakata, Rob Procter, Michal Lukasik

    Abstract: Rumour stance classification, the task that determines if each tweet in a collection discussing a rumour is supporting, denying, questioning or simply commenting on the rumour, has been attracting substantial interest. Here we introduce a novel approach that makes use of the sequence of transitions observed in tree-structured conversation threads in Twitter. The conversation threads are formed by…

    Submitted 11 October, 2016; v1 submitted 28 September, 2016; originally announced September 2016.

    Comments: COLING 2016

  20. arXiv:1609.01962

    cs.CL cs.IR cs.SI

    Using Gaussian Processes for Rumour Stance Classification in Social Media

    Authors: Michal Lukasik, Kalina Bontcheva, Trevor Cohn, Arkaitz Zubiaga, Maria Liakata, Rob Procter

    Abstract: Social media tend to be rife with rumours while new reports are released piecemeal during breaking news. Interestingly, one can mine multiple reactions expressed by social media users in those situations, exploring their stance towards rumours, ultimately enabling the flagging of highly disputed rumours as being potentially false. In this work, we set out to develop an automated, supervised classi…

    Submitted 7 September, 2016; originally announced September 2016.

  21. arXiv:1506.00468

    cs.SI cs.CL cs.LG

    Classifying Tweet Level Judgements of Rumours in Social Media

    Authors: Michal Lukasik, Trevor Cohn, Kalina Bontcheva

    Abstract: Social media is a rich source of rumours and corresponding community reactions. Rumours reflect different characteristics, some shared and some individual. We formulate the problem of classifying tweet level judgements of rumours as a supervised learning task. Both supervised and unsupervised domain adaptation are considered, in which tweets from a rumour are classified on the basis of other annot…

    Submitted 10 September, 2015; v1 submitted 1 June, 2015; originally announced June 2015.

  22. arXiv:1208.4822

    cond-mat.mtrl-sci physics.optics

    Spectral and kinetic properties of electroluminescence of ZnS:Cu powder in polymer structure

    Authors: E. Chimczak, T. Dunaj, M. Bertandt, A. Wieczorek, G. Neunert, G. Chimczak, M. Cież, M. Łukasik

    Abstract: Spectral and kinetic measurements of the light output have been made for an AC electroluminescent structure. ZnS:Cu is the luminescence-active layer in the structure. In the kinetic measurements, excitation was by a rectangular-wave voltage pulse of 1 ms duration. During the excitation the structure emits blue-green light. The maximum of the spectrum lies at about 455 nm.

    Submitted 23 August, 2012; originally announced August 2012.

  23. Conformal Yano-Killing tensors for the Taub-NUT metric

    Authors: Jacek Jezierski, Maciej Łukasik

    Abstract: Symmetric conformal Killing tensors and (skew-symmetric) conformal Yano-Killing tensors for the Euclidean Taub-NUT metric are given in explicit form. Relations between Yano and CYK tensors in terms of conformal rescaling are discussed.

    Submitted 18 October, 2006; originally announced October 2006.

    Comments: 12 pages

    Journal ref: Class.Quant.Grav.24:1331-1340,2007

  24. Conformal Yano-Killing tensor for the Kerr metric and conserved quantities

    Authors: Jacek Jezierski, Maciej Łukasik

    Abstract: Properties of (skew-symmetric) conformal Yano-Killing tensors are reviewed. Explicit forms of three symmetric conformal Killing tensors in Kerr spacetime are obtained from the Yano-Killing tensor. The relation between spin-2 fields and solutions to the Maxwell equations is used in the construction of a new conserved quantity which is quadratic in terms of the Weyl tensor. The formula obtained…

    Submitted 28 December, 2005; v1 submitted 12 October, 2005; originally announced October 2005.

    Comments: 29 pages

    Journal ref: Class.Quant.Grav.23:2895-2918,2006