short-paper

Tweet2Vec: Learning Tweet Embeddings Using Character-level CNN-LSTM Encoder-Decoder

Published: 07 July 2016

Abstract

We present Tweet2Vec, a novel method for generating general-purpose vector representations of tweets. The model learns tweet embeddings using a character-level CNN-LSTM encoder-decoder. We trained our model on 3 million randomly selected English-language tweets. The model was evaluated on two tasks, tweet semantic similarity and tweet sentiment categorization, and outperformed the previous state of the art on both. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the same method can be used to learn tweet embeddings for other languages.
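
As an illustration of what "character-level" input and embedding comparison can look like in practice, the following is a minimal sketch and not the model described above. It shows one plausible way to turn a tweet into a one-hot character matrix (the kind of input a character-level CNN consumes) and one common way to compare two embedding vectors (cosine similarity). The alphabet, the maximum length of 150 characters, and the names `ALPHABET`, `MAX_LEN`, `one_hot_encode`, and `cosine_similarity` are all assumptions made for this example; the abstract does not specify them, and the evaluation protocol used in the paper may differ.

```python
import math
import string

# Hypothetical character set; the abstract does not state which characters are used.
ALPHABET = string.ascii_lowercase + string.digits + " #@"
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

# Assumed fixed input length; shorter tweets are zero-padded, longer ones truncated.
MAX_LEN = 150


def one_hot_encode(tweet, max_len=MAX_LEN):
    """Return a max_len x len(ALPHABET) matrix of 0/1 values.

    Characters outside ALPHABET become all-zero rows.
    """
    rows = []
    for ch in tweet.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Zero-pad up to the fixed length.
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)
```

In a full pipeline, a matrix such as `one_hot_encode("Hi @bob")` would be passed to a trained encoder, and the resulting embedding vectors for two tweets could then be compared with `cosine_similarity`. The encoder itself is deliberately omitted here because its architecture details are not given in this section.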

Published In

SIGIR '16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval
July 2016
1296 pages
ISBN:9781450340694
DOI:10.1145/2911451
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cnn
  2. convolutional neural networks
  3. embedding
  4. encoder-decoder
  5. lstm
  6. tweet
  7. tweet2vec
  8. twitter

Qualifiers

  • Short-paper

Conference

SIGIR '16

Acceptance Rates

SIGIR '16 Paper Acceptance Rate: 62 of 341 submissions, 18%
Overall Acceptance Rate: 792 of 3,983 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 52
  • Downloads (Last 6 weeks): 7
Reflects downloads up to 19 Oct 2024

Cited By

  • (2024) On the Impact of Heterogeneity on Federated Learning at the Edge with DGA Malware Detection. Proceedings of the Asian Internet Engineering Conference 2024. DOI: 10.1145/3674213.3674215 (10-17). Online publication date: 9-Aug-2024
  • (2024) Modulating LSTMs of Data-Driven Domain Features for DGA Detection: A Semantic Context-Dependent Method. 2024 IEEE 2nd International Conference on Control, Electronics and Computer Technology (ICCECT). DOI: 10.1109/ICCECT60629.2024.10545717 (1508-1514). Online publication date: 26-Apr-2024
  • (2024) GPTCN: Gated Parallel Transformer Convolutional Networks for Downstream-Task User Representation Learning on App Usage. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). DOI: 10.1109/ICASSP48485.2024.10446256 (5175-5179). Online publication date: 14-Apr-2024
  • (2024) ChatGPT as a Text Annotation Tool to Evaluate Sentiment Analysis on South African Financial Institutions. IEEE Access. DOI: 10.1109/ACCESS.2024.3464374, 12 (144017-144043). Online publication date: 2024
  • (2024) Detecting Domain Names Generated by DGAs With Low False Positives in Chinese Domain Names. IEEE Access. DOI: 10.1109/ACCESS.2024.3454242, 12 (123716-123730). Online publication date: 2024
  • (2024) DDHCN. Expert Systems with Applications: An International Journal. DOI: 10.1016/j.eswa.2023.121564, 237:PB. Online publication date: 1-Feb-2024
  • (2024) Generation of Histopathological Images Caption Using CNN and LSTM. Proceedings of the Second International Conference on Computing, Communication, Security and Intelligent Systems. DOI: 10.1007/978-981-99-8398-8_1 (1-10). Online publication date: 28-Mar-2024
  • (2023) PKDGA: A Partial Knowledge-Based Domain Generation Algorithm for Botnets. IEEE Transactions on Information Forensics and Security. DOI: 10.1109/TIFS.2023.3298229, 18 (4854-4869). Online publication date: 1-Jan-2023
  • (2023) ReplaceDGA: BiLSTM-Based Adversarial DGA With High Anti-Detection Ability. IEEE Transactions on Information Forensics and Security. DOI: 10.1109/TIFS.2023.3293956, 18 (4406-4421). Online publication date: 1-Jan-2023
  • (2023) Integrating Bidirectional Long Short-Term Memory with Subword Embedding for Authorship Attribution. 2023 IEEE International Conference on Systems, Man, and Cybernetics (SMC). DOI: 10.1109/SMC53992.2023.10393898 (1910-1917). Online publication date: 1-Oct-2023
