Disclaimer
If you're planning to use the information provided on this site, please keep in mind that all numbers and papers are added by authors without double-checking. We of course try to keep results as accurate as possible, and whenever we are notified of an error it will be fixed, but this does not release you from the obligation to read the papers and double-check the numbers listed here before using them.

UCF101

Dataset URL

Description : UCF101 is an action recognition data set of realistic action videos, collected from YouTube, with 101 action categories. It is an extension of the UCF50 data set, which has 50 action categories.

Number of Videos : 13320

Number of Classes : 101

Evaluation: UCF101 Eval

Description: Three train/test splits as defined by the dataset authors
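
A result for a three-split protocol is commonly summarized as the mean of the per-split test accuracies. The sketch below illustrates that arithmetic only. It assumes top-1 accuracy in percent and an unweighted mean over exactly three splits; the function names are hypothetical, and individual papers listed here may report something different (for example, split 1 only), so check each paper before comparing numbers.

```python
from statistics import mean


def split_accuracy(predictions, labels):
    """Top-1 accuracy in percent for a single test split.

    `predictions` and `labels` are equal-length sequences of class ids.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def mean_over_splits(per_split_accuracies):
    """Unweighted mean of the per-split accuracies.

    Assumption: exactly three splits, matching the evaluation description.
    """
    if len(per_split_accuracies) != 3:
        raise ValueError("expected accuracies for exactly three splits")
    return mean(per_split_accuracies)
```

For example, `mean_over_splits([split_accuracy(p, y) for p, y in splits])` would give one summary number, where `splits` is a hypothetical list of three `(predictions, labels)` pairs.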

Results


Result Paper Description URL Peer Reviewed Year
83.5 Multi-view super vector for action recognition[Cai, Z., Wang, L., Peng, X., Qiao, Y] MVSV URL Yes 2014
87.9 Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice[Peng, X., Wang, L., Wang, X., Qiao, Y] URL Yes 2016
88.3 A multi-level representation for action recognition[Wang, L., Qiao, Y., Tang, X] URL Yes 2016
88 Two-stream convolutional networks for action recognition in videos[Simonyan, K., Zisserman, A] URL Yes 2014
88.1 Human action recognition using factorized spatio-temporal convolutional networks[Sun, L., Jia, K., Yeung, D., Shi, B.E] URL Yes 2015
90.3 Action recognition with trajectory-pooled deep-convolutional descriptors[Wang, L., Qiao, Y., Tang, X] URL Yes 2015
91.7 Long-term temporal convolutions for action recognition[Varol, G., Laptev, I., Schmid, C] URL Yes 2016
93.1 A key volume mining deep framework for action recognition[Zhu, W., Hu, J., Sun, G., Cao, X., Qiao, Y] URL Yes 2016
94.2 Temporal Segment Networks: Towards Good Practices for Deep Action Recognition[Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool] URL Yes 2016
88.6 Beyond short snippets: Deep networks for video classification[Ng, J.Y.H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G] URL Yes 2015
85.2 Learning spatiotemporal features with 3d convolutional networks[Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., Paluri, M] URL No 2015
90.3 Hidden Two-Stream Convolutional Networks for Action Recognition[Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann] URL No 2017
94.6 Action Representation Using Classifier Decision Boundaries[Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould] URL No 2017
93.6 ActionVLAD: Learning spatio-temporal aggregation for action classification[Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell] URL Yes 2017
94.6 Spatiotemporal Pyramid Network for Video Action Recognition[Yunbo Wang, Mingsheng Long, Jianmin Wang, Philip S. Yu] Spatiotemporal Pyramid Network / BN-Inception URL Yes 2017
94.9 Spatiotemporal Multiplier Networks for Video Action Recognition[Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes] Spatiotemporal Multiplier Networks + IDT URL Yes 2017
92.3 Generalized Rank Pooling for Activity Recognition[Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould] Generalized Rank Pooling + IDT-FV URL Yes 2017
76.3 Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition[An-An Liu, Yu-Ting Su, Wei-Zhi Nie, Mohan Kankanhalli] HC-MTL with STIP + BOW URL Yes 2017
93.4 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman] Two-Stream I3D, ImageNet pre-training URL Yes 2017
98 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman] Two-Stream I3D, Kinetics pre-training URL Yes 2017
89.8 Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor[Aaron Chadha, Alhabib Abbas and Yiannis Andreopoulos] Codec Based URL No 2017
94.5 Learning Gating ConvNet for Two-Stream based Methods in Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu] Gated TSN URL No 2017
95.4 Learning Long-Term Dependencies for Action Recognition With a Biologically-Inspired Deep Network[Yemin Shi, Yonghong Tian, Yaowei Wang, Wei Zeng, Tiejun Huang] shuttleNet URL Yes 2017
95.8 Eigen Evolution Pooling for Human Action Recognition[Yang Wang, Vinh Tran, Minh Hoai] Eigen TSN + DTD URL No 2017
93.6 Lattice Long Short-Term Memory for Human Action Recognition[Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese] Lattice LSTM URL Yes 2017
91.1 Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection[Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox] Chained Multi-stream Networks URL Yes 2017
98 End-to-end Video-level Representation Learning for Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu, Lin Li] DTPP (Kinetics pre-training) URL No 2017
94.3 Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong] CO2FI + ASYN URL No 2017
95.2 Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong] CO2FI + ASYN+IDT URL No 2017
94.5 Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?[Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh] ResNeXt-101 (64f) URL No 2017
94.6 Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification[Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen] Attention Cluster RGB+Flow URL No 2017
94.3 Appearance-and-Relation Networks for Video Classification[Limin Wang, Wei Li, Wen Li, Luc Van Gool] ARTNet with TSN (Pre-train dataset Kinetics) URL No 2017
93.2 Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification[Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool] T3D+TSN (Three splits) URL No 2017
91.6 Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification[Xiaodong Yang, Pavlo Molchanov, Jan Kautz] URL Yes 2016
94.9 Compressed Video Action Recognition[Chao-Yuan Wu and Manzil Zaheer and Hexiang Hu and R. Manmatha and Alexander J. Smola and Philipp Kraehenbuehl] CoViAR + optical flow URL No 2017
97.3 A Closer Look at Spatiotemporal Convolutions for Action Recognition[Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri] URL Yes 2018
79 What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets[De-An Huang, Vignesh Ramanathan, Dhruv Mahajan, Lorenzo Torresani, Manohar Paluri, Li Fei-Fei, and Juan Carlos Niebles] URL Yes 2018
93.8 Activity Recognition based on a Magnitude-Orientation Stream Network[Caetano, C., de Melo, V. H. C., dos Santos, J. A., Schwartz, W. R.] According to the results, just using our Magnitude-Orientation Stream (MOS), we outperform many methods. In comparison with C3D, we outperform them by 5.3 p.p. using our temporal stream and 8.6 p.p. when combining it with Very Deep Two-Stream. This indicates that our magnitude-orientation approach learns temporal information better than the approaches that perform 3D convolution operations directly. It is worth mentioning that we also improved the results achieved by the original two-stream by URL Yes 2017
98.2 PoTion: Pose MoTion Representation for Action Recognition[Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid] I3D + PoTion URL Yes 2018
88.9 MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition[Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng] MiCT-Net URL Yes 2018
94.7 MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition[Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng] Two-stream MiCT-Net URL Yes 2018
96 Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition[Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang] RGB + OFF(RGB) + OFF(optical flow) + OFF(raw-OFF) URL Yes 2018
95.9 Recognize Actions by Disentangling Components of Dynamics[Yue Zhao, Yuanjun Xiong, Dahua Lin] Disen w. pretrained ImageNet+Kinetics URL Yes 2018
66.1 Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning[Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas] GG-CNN ImageNet pretraining URL Yes 2018
94.3 Making Convolutional Networks Recurrent for Visual Sequence Learning[Xiaodong Yang, Pavlo Molchanov, Jan Kautz] PreRNN-SIH URL Yes 2018
95.4 End-to-End Learning of Motion Representation for Video Understanding[Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang] TVNets + IDT URL Yes 2018
87.8 Learning and Using the Arrow of Time[Donglai Wei, Joseph Lim, Andrew Zisserman, William T. Freeman] AoT (flow only) URL Yes 2018
94.2 Procedural Generation of Videos to Train Deep Action Recognition Networks[César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel López Peña] Leveraging our synthetic dataset and multi-task models, we increase the performance from 93.6 to 94.2 URL No 2017
