Action Recognition in Dark Videos Using Spatio-Temporal Features and Bidirectional Encoder Representations from Transformers
Article Type
Research Article
Publication Title
IEEE Transactions on Artificial Intelligence
Abstract
Several research works have been developed in the area of action recognition. Unfortunately, when these algorithms are applied to low-light or dark videos, their performance degrades sharply. To improve action recognition in dark or low-light videos, in this article we develop an efficient deep 3-D convolutional neural network based action recognition model. The proposed algorithm recognizes actions in two stages. In the first stage, the low-light videos are enhanced using zero-reference deep curve estimation, followed by the min-max sampling algorithm. In the second stage, we propose an action classification network to recognize the actions in the enhanced videos. In this network, we explore the capabilities of R(2+1)D for spatio-temporal feature extraction. The model's overall generalization performance depends on how well it captures long-range temporal structure in videos, which is essential for action recognition. We therefore place a graph convolutional network on top of R(2+1)D as our video feature encoder to capture long-term temporal dependencies among the extracted features. Finally, a bidirectional encoder representations from transformers (BERT) module is appended to classify the actions from the 3-D features extracted from the enhanced video scenes. The effectiveness of the proposed scheme is verified on the ARID V1.0 and ARID V1.5 datasets. The proposed algorithm achieves Top-1 and Top-5 accuracies of 96.60% and 99.88%, respectively, on ARID V1.0, and 86.93% and 99.35%, respectively, on ARID V1.5. To corroborate our findings, we compare the results of the proposed scheme with those of 15 state-of-the-art action recognition techniques.
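The enhancement stage named in the abstract, zero-reference deep curve estimation (Zero-DCE), brightens an image by repeatedly applying a quadratic curve of the form x + alpha * x * (1 - x) to normalized intensities. The sketch below is only an illustration of that curve, not the article's implementation: in Zero-DCE the alpha values are per-pixel maps predicted by a network, whereas here a fixed scalar alpha, the eight-iteration count, and the names enhance_pixel, enhance_frame, and dark_frame are all assumptions made for the example.

```python
def enhance_pixel(x, alphas):
    """Apply the iterative quadratic enhancement curve to one intensity in [0, 1]."""
    for alpha in alphas:
        # Each step adds alpha * x * (1 - x); for alpha in [-1, 1] the result
        # stays in [0, 1], and the endpoints 0 and 1 are left unchanged.
        x = x + alpha * x * (1.0 - x)
    return x


def enhance_frame(frame, alphas):
    """Enhance a frame given as a list of rows of normalized intensities."""
    return [[enhance_pixel(p, alphas) for p in row] for row in frame]


# Hypothetical input: a dark 2x3 grayscale frame with intensities in [0, 1].
dark_frame = [[0.02, 0.05, 0.10], [0.00, 0.20, 1.00]]

# Assumption: one fixed scalar alpha reused for eight iterations. A trained
# Zero-DCE network would instead predict a separate alpha map per iteration.
alphas = [0.6] * 8

bright_frame = enhance_frame(dark_frame, alphas)
```

With a positive alpha, every intensity strictly between 0 and 1 is pushed upward while pure black and pure white stay fixed, which is why the curve can lift dark regions without clipping.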
First Page
1461
Last Page
1471
DOI
https://doi.org/10.1109/TAI.2022.3221912
Publication Date
12-1-2023
Recommended Citation
Singh, Himanshu; Suman, Saurabh; Subudhi, Badri Narayan; Jakhetiya, Vinit; and Ghosh, Ashish, "Action Recognition in Dark Videos Using Spatio-Temporal Features and Bidirectional Encoder Representations from Transformers" (2023). Journal Articles. 3492.
https://digitalcommons.isical.ac.in/journal-articles/3492