Please use this identifier to cite or link to this item:
https://elib.vku.udn.vn/handle/123456789/4019
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Nguyen, Ti Hon | - |
dc.contributor.author | Do, Thanh Nghi | - |
dc.date.accessioned | 2024-07-30T09:18:25Z | - |
dc.date.available | 2024-07-30T09:18:25Z | - |
dc.date.issued | 2024-07 | - |
dc.identifier.isbn | 978-604-80-9774-5 | - |
dc.identifier.uri | https://elib.vku.udn.vn/handle/123456789/4019 | - |
dc.description | Proceedings of the 13th International Conference on Information Technology and Its Applications (CITA 2024); pp. 100-111. | vi_VN |
dc.description.abstract | Our investigation proposes a high-performance abstractive text summarization model for the Vietnamese language. We build on the transformer network with a full encoder-decoder to learn high-quality features from the training data. We then scale down the network size to increase the number of documents the model can summarize in a given time frame. We trained the model on a large-scale dataset comprising 880,895 documents in the training set and 110,103 in the testing set. Summarization speed on the testing set improves significantly, to 5.93 hours on a multi-core CPU and 0.31 hours on a small GPU. The F1 test results are also close to the state of the art, with 51.03% ROUGE-1, 18.17% ROUGE-2, and 31.60% ROUGE-L. (An illustrative architecture sketch follows the metadata table below.) | vi_VN |
dc.language.iso | en | vi_VN |
dc.publisher | Vietnam-Korea University of Information and Communication Technology | vi_VN |
dc.relation.ispartofseries | CITA; | - |
dc.subject | Abstractive Text Summarization | vi_VN |
dc.subject | Transformer | vi_VN |
dc.subject | Vietnamese Large-scale Dataset | vi_VN |
dc.title | THASUM: Transformer for High-Performance Abstractive Summarizing Vietnamese Large-scale Dataset | vi_VN |
dc.type | Working Paper | vi_VN |
Appears in Collections: CITA 2024 (Proceeding - Vol 2)