If you are planning to use information provided on this site, please keep in mind that all numbers and papers are added by authors without double-checking. We of course try to keep the results as accurate as possible, and whenever we are notified of an error, it will be fixed, but this does not release you from the obligation of reading the papers and double-checking the numbers listed here before using them.


Dataset URL

Description : UCF101 is an action recognition data set of realistic action videos collected from YouTube, with 101 action categories. It is an extension of the UCF50 data set, which has 50 action categories.

Number of Videos : 13320

Number of Classes : 101

Evaluation: UCF101 Eval

Description: Three train/test splits as defined by the authors of the data set
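
As an illustration of how a single number in the results table might be derived under this protocol, here is a minimal Python sketch. It assumes the reported result is the top-1 accuracy averaged over the three splits and rounded to one decimal; this page does not state that explicitly, so please check each paper. The function names and the sample accuracies are hypothetical and are not taken from any listed paper.

```python
from statistics import mean


def split_accuracy(predicted, actual):
    """Top-1 accuracy in percent for one test split (hypothetical helper)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("predicted and actual must be non-empty and equally long")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


def mean_over_splits(per_split_accuracies):
    """Average the per-split accuracies (assumed protocol: exactly three splits)."""
    if len(per_split_accuracies) != 3:
        raise ValueError("expected one accuracy per split, three in total")
    return round(mean(per_split_accuracies), 1)


# Made-up per-split accuracies, for illustration only.
example_result = mean_over_splits([90.0, 92.0, 94.0])
```

A paper that reports a result on only one split is not directly comparable to an average computed this way, which is one more reason to verify each number against the paper.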


Result Paper Description URL Peer Reviewed Year
83.5 Multi-view super vector for action recognition[Cai, Z., Wang, L., Peng, X., Qiao, Y] MVSV URL Yes 2014
87.9 Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice[Peng, X., Wang, L., Wang, X., Qiao, Y] URL Yes 2016
88.3 A multi-level representation for action recognition[Wang, L., Qiao, Y., Tang, X] URL Yes 2016
88 Two-stream convolutional networks for action recognition in videos[Simonyan, K., Zisserman, A] URL Yes 2014
88.1 Human action recognition using factorized spatio-temporal convolutional networks[Sun, L., Jia, K., Yeung, D., Shi, B.E] URL Yes 2015
90.3 Action recognition with trajectory-pooled deep-convolutional descriptors[Wang, L., Qiao, Y., Tang, X] URL Yes 2015
91.7 Long-term temporal convolutions for action recognition[Varol, G., Laptev, I., Schmid, C] URL Yes 2016
93.1 A key volume mining deep framework for action recognition[Zhu, W., Hu, J., Sun, G., Cao, X., Qiao, Y] URL Yes 2016
94.2 Temporal Segment Networks: Towards Good Practices for Deep Action Recognition[Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool] URL Yes 2016
88.6 Beyond short snippets: Deep networks for video classification[Ng, J.Y.H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G] URL Yes 2015
85.2 Learning spatiotemporal features with 3d convolutional networks[Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., Paluri, M] URL No 2015
90.3 Hidden Two-Stream Convolutional Networks for Action Recognition[Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann] URL No 2017
94.6 Action Representation Using Classifier Decision Boundaries[Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould] URL No 2017
93.6 ActionVLAD: Learning spatio-temporal aggregation for action classification[Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell] URL Yes 2017
94.6 Spatiotemporal Pyramid Network for Video Action Recognition[Yunbo Wang, Mingsheng Long, Jianmin Wang, Philip S. Yu] Spatiotemporal Pyramid Network / BN-Inception URL Yes 2017
94.9 Spatiotemporal Multiplier Networks for Video Action Recognition[Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes] Spatiotemporal Multiplier Networks + IDT URL Yes 2017
92.3 Generalized Rank Pooling for Activity Recognition[Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould] Generalized Rank Pooling + IDT-FV URL Yes 2017
76.3 Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition[An-An Liu, Yu-Ting Su, Wei-Zhi Nie, Mohan Kankanhalli] HC-MTL with STIP + BOW URL Yes 2017
93.4 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman] Two-Stream I3D, ImageNet pre-training URL Yes 2017
98 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman] Two-Stream I3D, Kinetics pre-training URL Yes 2017
89.8 Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor[Aaron Chadha, Alhabib Abbas and Yiannis Andreopoulos] Codec Based URL No 2017
94.5 Learning Gating ConvNet for Two-Stream based Methods in Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu] Gated TSN URL No 2017
95.4 Learning Long-Term Dependencies for Action Recognition With a Biologically-Inspired Deep Network[Yemin Shi, Yonghong Tian, Yaowei Wang, Wei Zeng, Tiejun Huang] shuttleNet URL Yes 2017
95.8 Eigen Evolution Pooling for Human Action Recognition[Yang Wang, Vinh Tran, Minh Hoai] Eigen TSN + DTD URL No 2017
93.6 Lattice Long Short-Term Memory for Human Action Recognition[Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese] Lattice LSTM URL Yes 2017
91.1 Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection[Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox] Chained Multi-stream Networks URL Yes 2017
98 End-to-end Video-level Representation Learning for Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu, Lin Li] DTPP (Kinetics pre-training) URL No 2017
94.3 Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong] CO2FI + ASYN URL No 2017
95.2 Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong] CO2FI + ASYN + IDT URL No 2017
96 Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition[Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang] Three splits URL No 2017
94.5 Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?[Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh] ResNeXt-101 (64f) URL No 2017
94.6 Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification[Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen] Attention Cluster RGB+Flow URL No 2017
94.3 Appearance-and-Relation Networks for Video Classification[Limin Wang, Wei Li, Wen Li, Luc Van Gool] ARTNet with TSN (Pre-train dataset Kinetics) URL No 2017
93.2 Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification[Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool] T3D+TSN (Three splits) URL No 2017
91.6 Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification[Xiaodong Yang, Pavlo Molchanov, Jan Kautz] URL Yes 2016
94.9 Compressed Video Action Recognition[Chao-Yuan Wu and Manzil Zaheer and Hexiang Hu and R. Manmatha and Alexander J. Smola and Philipp Kraehenbuehl] CoViAR + optical flow URL No 2017
