Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/1902
Title: A Comparison of several Deep Learning based Models for Diacritic Restoration Problem in Vietnamese Text
Authors: Tran, Quang Linh
Lam, Gia Huy
Duong, Van Binh
Vuong, Cong Dat
Do, Trong Hop
Keywords: Diacritic Restoration
Neuron Network
Machine Translation
Natural Language Processing
Word Tokenization
Year of publication: 2021
Publisher: Da Nang Publishing House
Abstract: Diacritic restoration is a challenging problem in natural language processing (NLP). With diacritic restoration, one can type text faster and more easily. Diacritic restoration also helps make use of texts with missing diacritics, which are normally discarded in many NLP applications. This paper deals with the diacritic restoration problem for Vietnamese text. Three state-of-the-art deep learning models, namely Gated Recurrent Unit (GRU), Bidirectional Long Short-Term Memory and Bidirectional Gated Recurrent Unit, were examined for the problem, and the last one turned out to be the best among them. Besides the choice of deep learning model, this paper found that word tokenization, the final pre-processing step applied to the data before feeding it to the models, also influences the final accuracy. Of the two word tokenization methods examined, morpheme-based tokenization and phrase-based tokenization, the former yields better results regardless of the deep learning model applied. The experimental results show that the combination of morpheme-based tokenization and Bidirectional GRU achieves the best diacritic restoration performance, with a BLEU score of 88.06%.
Description: The 10th Conference on Information Technology and its Applications; Topic: Image and Natural Language Poster; pp. 65-74.
Identifier: http://elib.vku.udn.vn/handle/123456789/1902
Collection: CITA 2021

Use of materials in the Digital Library must comply with copyright law.