Enhancing Image Classification Capabilities in the Vision Transformer Network Model with Quaternion Algebra

Pham, Minh Tuan; Nguyen, An Hung

Please use this identifier to cite or link to this item: https://elib.vku.udn.vn/handle/123456789/4031

Keywords:	Image classification; Deep learning; Vision Transformer; Quaternion Algebra; Multilayer Perceptron; Algebra
Issue date:	2024
Publisher:	Vietnam-Korea University of Information and Communication Technology
Series/Report no.:	CITA
Abstract:	The Vision Transformer (ViT) is a novel approach in artificial intelligence for image classification. Despite its potential, ViT's emphasis on global data processing makes it harder to match the accuracy of methods that process data locally, such as Convolutional Neural Networks (CNNs). To address this, we propose two methods. The first replaces the token transformation layers with part of a Residual Network (ResNet), enabling local feature extraction and better learning of the relationships between tokens. The second converts the layers in the bottleneck component into layers that operate in the quaternion hypercomplex domain, enriching the multidimensional representation of the data. Both methods aim to combine the strengths of CNNs and ViT and thereby indirectly improve image classification accuracy.
Description:	Proceedings of the 13th International Conference on Information Technology and Its Applications (CITA 2024), pp. 174-185
ISBN:	978-604-80-9774-5
Collection:	CITA 2024 (Proceeding - Vol 2)
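The abstract's second method moves the bottleneck layers into the quaternion hypercomplex domain. This record does not include the authors' implementation, so the following is only a minimal pure-Python sketch. It assumes the common formulation from the quaternion neural network literature, in which each weight is a quaternion applied to the input through the Hamilton product. That formulation is not necessarily the one used in the paper. The function names `hamilton_product`, `quaternion_linear`, and `param_counts` are hypothetical, and the 768-to-3072 layer size in the example is an illustrative assumption borrowed from the ViT-Base MLP block.

```python
from typing import List, Tuple

# A quaternion r + x*i + y*j + z*k stored as (r, x, y, z).
Quaternion = Tuple[float, float, float, float]


def hamilton_product(p: Quaternion, q: Quaternion) -> Quaternion:
    """Return the (non-commutative) Hamilton product p * q."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def quaternion_linear(
    weights: List[List[Quaternion]],
    x: List[Quaternion],
    bias: List[Quaternion],
) -> List[Quaternion]:
    """Hypothetical quaternion dense layer: out[o] = bias[o] + sum_i W[o][i] * x[i]."""
    out: List[Quaternion] = []
    for row, b in zip(weights, bias):
        acc = list(b)
        for w, xi in zip(row, x):
            # Each quaternion weight mixes all four components of the input.
            prod = hamilton_product(w, xi)
            for c in range(4):
                acc[c] += prod[c]
        out.append((acc[0], acc[1], acc[2], acc[3]))
    return out


def param_counts(n_in: int, n_out: int) -> Tuple[int, int]:
    """Real weight counts (dense vs. quaternion) for n_in -> n_out real features.

    Assumes n_in and n_out are divisible by 4, so the features split into
    n_in // 4 and n_out // 4 quaternions; each quaternion weight holds 4 reals.
    """
    dense = n_in * n_out
    quaternion = (n_in // 4) * (n_out // 4) * 4
    return dense, quaternion


if __name__ == "__main__":
    i, j = (0, 1, 0, 0), (0, 0, 1, 0)
    print(hamilton_product(i, j))  # (0, 0, 0, 1), i.e. k
    print(hamilton_product(j, i))  # (0, 0, 0, -1), i.e. -k
    print(param_counts(768, 3072))  # (2359296, 589824)
```

Because each quaternion weight is shared across the four components of its input through the Hamilton product, a quaternion layer of this kind needs a quarter of the real-valued weights of a dense layer of the same width: `param_counts(768, 3072)` gives `(2359296, 589824)`.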
