59.5 |
Multi-view super vector for action recognition[Cai, Z., Wang, L., Peng, X., Qiao, Y]
|
MVSV |
URL
|
Yes
|
2014
|
61.1 |
Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice[Peng, X., Wang, L., Wang, X., Qiao, Y]
|
|
URL
|
Yes
|
2016
|
61.7 |
A multi-level representation for action recognition[Wang, L., Qiao, Y., Tang, X]
|
|
URL
|
Yes
|
2016
|
59.4 |
Two-stream convolutional networks for action recognition in videos[Simonyan, K., Zisserman, A]
|
|
URL
|
Yes
|
2014
|
63.7 |
Modeling video evolution for action recognition[Fernando, B., Gavves, E., M., J.O., Ghodrati, A.]
|
|
URL
|
Yes
|
2015
|
65.5 |
Motion part regularization: Improving action recognition via trajectory group selection[Ni, B., Moulin, P., Yang, X., Yan, S]
|
|
URL
|
Yes
|
2015
|
59.1 |
Human action recognition using factorized spatio-temporal convolutional networks[Sun, L., Jia, K., Yeung, D., Shi, B.E]
|
|
URL
|
Yes
|
2015
|
63.2 |
Action recognition with trajectory-pooled deep-convolutional descriptors[Wang, L., Qiao, Y., Tang, X]
|
|
URL
|
Yes
|
2015
|
64.8 |
Long-term temporal convolutions for action recognition[Varol, G., Laptev, I., Schmid, C]
|
|
URL
|
Yes
|
2016
|
63.3 |
A key volume mining deep framework for action recognition[Zhu, W., Hu, J., Sun, G., Cao, X., Qiao, Y]
|
|
URL
|
Yes
|
2016
|
69.4 |
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition[Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool]
|
|
URL
|
Yes
|
2016
|
57.2 |
Action recognition with improved trajectories[Wang, H., Schmid, C]
|
|
URL
|
No
|
2013
|
58.9 |
Hidden Two-Stream Convolutional Networks for Action Recognition[Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann]
|
|
URL
|
No
|
2017
|
70.6 |
Action Representation Using Classifier Decision Boundaries[Jue Wang , Anoop Cherian , Fatih Porikli , Stephen Gould]
|
|
URL
|
No
|
2017
|
69.8 |
ActionVLAD: Learning spatio-temporal aggregation for action classification[Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell]
|
|
URL
|
Yes
|
2017
|
68.9 |
Spatiotemporal Pyramid Network for Video Action Recognition[Yunbo Wang, Mingsheng Long, Jianmin Wang, Philip S. Yu]
|
Spatiotemporal Pyramid Network / BN-Inception |
URL
|
Yes
|
2017
|
72.2 |
Spatiotemporal Multiplier Networks for Video Action Recognition[Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes]
|
Spatiotemporal Multiplier Networks + IDT |
URL
|
Yes
|
2017
|
66.79 |
Action Recognition with Stacked Fisher Vectors[Xiaojiang Peng, Changqing Zou, Yu Qiao, Qiang Peng]
|
Stacked Fisher Vectors (FV+SFV) |
URL
|
Yes
|
2014
|
67 |
Generalized Rank Pooling for Activity Recognition[Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould]
|
Generalized Rank Pooling + IDT-FV |
URL
|
Yes
|
2017
|
51.4 |
Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition[An-An Liu, Yu-Ting Su, Wei-Zhi Nie, Mohan Kankanhalli]
|
HC-MTL with STIP + BOW |
URL
|
Yes
|
2017
|
66.4 |
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman]
|
Two-Stream I3D, ImageNet pre-training |
URL
|
Yes
|
2017
|
80.7 |
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman]
|
Two-Stream I3D, Kinetics pre-training |
URL
|
Yes
|
2017
|
71.8 |
Pillar Networks for action recognition[Biswa Sengupta, Yu Qian]
|
ResNet/Inception + MKL-SVM |
URL
|
Yes
|
2017
|
56.59 |
Robust Action Recognition framework using Segmented Block and Distance Mean Histogram of Gradients Approach[Vikas Tripathi, Durgaprasad Gangodkar, Ankush Mittal, Vishnu Kanth]
|
segmented blocks |
URL
|
Yes
|
2017
|
56 |
Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor[Aaron Chadha, Alhabib Abbas and Yiannis Andreopoulos]
|
Codec Based |
URL
|
No
|
2017
|
63 |
Improved Rank Pooling Strategy for Complex Action Recognition[Eman Mohammadi, Q. M. Jonathan Wu, Mehrdad Saif]
|
Improved Rank Pooling |
URL
|
Yes
|
2017
|
71.7 |
Learning Long-Term Dependencies for Action Recognition With a Biologically-Inspired Deep Network[Yemin Shi, Yonghong Tian, Yaowei Wang, Wei Zeng, Tiejun Huang]
|
shuttleNet |
URL
|
Yes
|
2017
|
73.6 |
Pillar Networks++: Distributed non-parametric deep and wide networks[Biswa Sengupta, Yu Qian]
|
Pillar Networks++ (4 Networks) |
URL
|
No
|
2017
|
66.2 |
Lattice Long Short-Term Memory for Human Action Recognition[Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese]
|
Lattice LSTM |
URL
|
Yes
|
2017
|
69.7 |
Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection[Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox]
|
Chained Multi-stream Networks |
URL
|
Yes
|
2017
|
82.1 |
End-to-end Video-level Representation Learning for Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu, Lin Li]
|
DTPP (Kinetics pre-training) |
URL
|
No
|
2017
|
69 |
Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong]
|
CO2FI + ASYN |
URL
|
No
|
2017
|
72.6 |
Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong]
|
CO2FI + ASYN + IDT |
URL
|
No
|
2017
|
70.2 |
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?[Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh]
|
ResNeXt-101 (64f) |
URL
|
No
|
2017
|
69.2 |
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification[Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen]
|
Attention Cluster RGB+Flow |
URL
|
No
|
2017
|
70.9 |
Appearance-and-Relation Networks for Video Classification[Limin Wang, Wei Li, Wen Li, Luc Van Gool]
|
ARTNet with TSN (Pre-train dataset Kinetics) |
URL
|
No
|
2017
|
63.5 |
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification[Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool]
|
T3D+TSN (three splits) |
URL
|
No
|
2017
|
61.8 |
Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification[Xiaodong Yang, Pavlo Molchanov, Jan Kautz]
|
|
URL
|
Yes
|
2016
|
53.9 |
Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness[Xiaodong Yang, Ying-Li Tian]
|
|
URL
|
Yes
|
2014
|
70.2 |
Compressed Video Action Recognition[Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Kraehenbuehl]
|
CoViAR + optical flow |
URL
|
No
|
2017
|
78.7 |
A Closer Look at Spatiotemporal Convolutions for Action Recognition[Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri]
|
|
URL
|
Yes
|
2018
|
66.2 |
Activity Recognition based on a Magnitude-Orientation Stream Network[Caetano, C., de Melo, V. H. C., dos Santos, J. A., Schwartz, W. R.]
|
Compared with other neural network methods, the proposed Magnitude-Orientation Stream (MOS) outperforms many of them. Using the temporal stream alone, it outperforms the original two-stream network by 6.6 p.p. |
URL
|
Yes
|
2017
|
80.9 |
PoTion: Pose MoTion Representation for Action Recognition[Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid]
|
I3D + PoTion |
URL
|
Yes
|
2018
|
81.3 |
Video Representation Learning Using Discriminative Pooling[Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould]
|
SVMP+I3D |
URL
|
Yes
|
2018
|
72.2 |
Non-Linear Temporal Subspace Representations for Activity Recognition[Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley]
|
KRP-FS + IDT-FV |
URL
|
Yes
|
2018
|
63.8 |
MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition[Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng]
|
MiCT-Net |
URL
|
Yes
|
2018
|
70.5 |
MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition[Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng]
|
Two-stream MiCT-Net |
URL
|
Yes
|
2018
|
74.2 |
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition[Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang]
|
RGB + OFF(RGB) + OFF(optical flow) + OFF(raw-OFF) |
URL
|
Yes
|
2018
|
30.7 |
Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning[Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas]
|
GG-CNN ImageNet pretraining |
URL
|
Yes
|
2018
|
72.6 |
End-to-End Learning of Motion Representation for Video Understanding[Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang]
|
TVNets + IDT |
URL
|
Yes
|
2018
|
55.4 |
Learning and Using the Arrow of Time[Donglai Wei, Joseph Lim, Andrew Zisserman, William T. Freeman]
|
AoT (flow only) |
URL
|
Yes
|
2018
|
66.2 |
Action recognition by Latent Duration Model[Tingwei Wang, Chuancai Liu and Liantao Wang]
|
the proposed LDM+MIFS |
URL
|
Yes
|
2017
|
69.5 |
Procedural Generation of Videos to Train Deep Action Recognition Networks[César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel López Peña]
|
Leveraging our synthetic dataset and multi-task models, we increase the performance from 66.6 to 69.5 |
URL
|
No
|
2017
|
81.1 |
Unsupervised Universal Attribute Modelling for Action Recognition[Debaditya Roy, K. Sri Rama Murty, C. Krishna Mohan]
|
|
URL
|
No
|
2018
|
80.1 |
DA-VLAD: Discriminative Action Vector of Locally Aggregated Descriptors for Action Recognition[Fiza Murtaza, Muhammad Haroon Yousaf, Sergio A. Velastin]
|
iDT+DA-VLAD |
URL
|
No
|
2018
|
74.8 |
IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition[Ke Yang, Peng Qiao, Dongsheng Li, Yong Dou]
|
Full IF-TTN |
URL
|
No
|
2019
|
70 |
IF-TTN: Information Fused Temporal Transformation Network for Video Action Recognition[Ke Yang, Peng Qiao, Dongsheng Li, Yong Dou]
|
MV-IF-TTN |
URL
|
No
|
2019
|
76.2 |
Holistic Large Scale Video Understanding[Ali Diba, Mohsen Fayyaz, Vivek Sharma, Manohar Paluri, Jurgen Gall, Rainer Stiefelhagen, Luc Van Gool]
|
HATNet (32 frames) |
URL
|
No
|
2019
|
78.7 |
Hidden Two-Stream Convolutional Networks for Action Recognition[Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander Hauptmann]
|
Hidden Two-stream(I3D) |
URL
|
Yes
|
2018
|
74.8 |
Spatial-Temporal Pyramid Based Convolutional Neural Network for Action Recognition[Zhenxing Zheng, Gaoyun An, Dapeng Wu, Qiuqi Ruan]
|
S-TPNet + iDT |
URL
|
No
|
2019
|
82.48 |
Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs[Lei Wang, Piotr Koniusz, Du Q. Huynh]
|
HAF+BoW/FV halluc. |
URL
|
Yes
|
2019
|
65.9 |
Moments in Time Dataset: one million videos for event understanding[Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, Aude Oliva]
|
ResNet50 I3D pretrained on Moments and Kinetics |
URL
|
Yes
|
2019
|
82.1 |
PA3D: Pose-Action 3D Machine for Video Recognition[An Yan, Yali Wang, Zhifeng Li, Yu Qiao]
|
PA3D + I3D |
URL
|
Yes
|
2019
|
74.9 |
Spatio-Temporal Channel Correlation Networks for Action Classification[Ali Diba*, Mohsen Fayyaz*, Vivek Sharma, M Mahdi Arzani, Rahman Yousefzadeh, Juergen Gall, Luc Van Gool]
|
STC-ResNext 101 (64 frames) RGB Only |
URL
|
Yes
|
2018
|
63.5 |
Temporal 3D ConvNets Using Temporal Transition Layer[Ali Diba, Mohsen Fayyaz, Vivek Sharma, A Hossein Karami, M Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool]
|
|
URL
|
No
|
2018
|
82.3 |
Evolving Space-Time Neural Architectures for Videos[AJ Piergiovanni, Anelia Angelova, Alexander Toshev, and Michael Ryoo]
|
|
URL
|
Yes
|
2019
|
81.1 |
Representation Flow for Action Recognition[AJ Piergiovanni and Michael Ryoo]
|
|
URL
|
Yes
|
2019
|
79.5 |
MARS: Motion-Augmented RGB Stream for Action Recognition[Nieves Crasto, Philippe Weinzaepfel, Karteek Alahari, Cordelia Schmid]
|
input = RGB frames (Pretrained on Kinetics) |
URL
|
Yes
|
2019
|
81.1 |
Global and Local Knowledge-Aware Attention Network for Action Recognition[Zhenxing Zheng, Gaoyun An, Dapeng Wu, Qiuqi Ruan]
|
global and local attention + I3D |
URL
|
No
|
2019
|
74.6 |
Multi-Fiber Networks for Video Recognition[Yunpeng Chen, Yannis Kalantidis, Jianshu Li, Shuicheng Yan, Jiashi Feng]
|
|
URL
|
No
|
2018
|