Abstract
When traveling to a foreign city, we often need an intelligent agent that can respond to our queries promptly and informatively. Such an agent should understand our queries and possess the knowledge to generate helpful responses. If it can also interpret images, it can offer solutions from multiple perspectives. Knowledge graph-based multimodal dialog systems are a promising way to meet these requirements. In this paper, we present a solution for efficiently constructing a multimodal dialog system in the travel domain without large-scale datasets. The system's main objective is to assist users with travel-related tasks, specifically attraction recommendation and route planning, which users frequently request while traveling. We introduce the Multimodal Chinese Tourism Knowledge Graph (MCTKG) and integrate image processing and recommendation technology into a dialog system. Specifically, our approach uses a modular design to construct the dialog system and leverages the rich information in the knowledge graph to enhance the performance of each module. To the best of our knowledge, this is the first multimodal travel dialog system that provides users with personalized travel route recommendations. Multiple experiments show that our dialog system can effectively enhance the user's travel experience.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wan, J. et al. (2024). Construction of Multimodal Dialog System via Knowledge Graph in Travel Domain. In: Song, X., Feng, R., Chen, Y., Li, J., Min, G. (eds) Web and Big Data. APWeb-WAIM 2023. Lecture Notes in Computer Science, vol 14334. Springer, Singapore. https://doi.org/10.1007/978-981-97-2421-5_28
DOI: https://doi.org/10.1007/978-981-97-2421-5_28
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-2420-8
Online ISBN: 978-981-97-2421-5
eBook Packages: Computer Science (R0)