If you're planning to use information provided on this site, please keep in mind that all numbers and papers are added by authors without double checking. We of course try to keep results as accurate as possible, and whenever we are notified of an error we will fix it, but this does not release you from the obligation to read the papers and double-check the numbers listed here before using them.


Dataset URL

Description : HMDB was collected from various sources, mostly from movies, with a small proportion from public databases such as the Prelinger archive, YouTube and Google videos. The dataset contains 6849 clips divided into 51 action categories, each containing a minimum of 101 clips.

Number of Videos : 6849

Number of Classes : 51

Evaluation: HMDB Eval



Result Paper Description URL Peer Reviewed Year
59.5 Multi-view super vector for action recognition[Cai, Z., Wang, L., Peng, X., Qiao, Y] MVSV URL Yes 2014
61.1 Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice[Peng, X., Wang, L., Wang, X., Qiao, Y] URL Yes 2016
61.7 A multi-level representation for action recognition[Wang, L., Qiao, Y., Tang, X] URL Yes 2016
59.4 Two-stream convolutional networks for action recognition in videos[Simonyan, K., Zisserman, A] URL Yes 2014
63.7 Modeling video evolution for action recognition[Fernando, B., Gavves, E., Oramas M., J., Ghodrati, A., Tuytelaars, T.] URL Yes 2015
65.5 Motion part regularization: Improving action recognition via trajectory group selection[Ni, B., Moulin, P., Yang, X., Yan, S] URL Yes 2015
59.1 Human action recognition using factorized spatio-temporal convolutional networks[Sun, L., Jia, K., Yeung, D., Shi, B.E] URL Yes 2015
63.2 Action recognition with trajectory-pooled deep-convolutional descriptors[Wang, L., Qiao, Y., Tang, X] URL Yes 2015
64.8 Long-term temporal convolutions for action recognition[Varol, G., Laptev, I., Schmid, C] URL Yes 2016
63.3 A key volume mining deep framework for action recognition[Zhu, W., Hu, J., Sun, G., Cao, X., Qiao, Y] URL Yes 2016
69.4 Temporal Segment Networks: Towards Good Practices for Deep Action Recognition[Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool] URL Yes 2016
57.2 Action recognition with improved trajectories[Wang, H., Schmid, C] URL No 2013
58.9 Hidden Two-Stream Convolutional Networks for Action Recognition[Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann] URL No 2017
70.6 Action Representation Using Classifier Decision Boundaries[Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould] URL No 2017
69.8 ActionVLAD: Learning spatio-temporal aggregation for action classification[Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell] URL Yes 2017
68.9 Spatiotemporal Pyramid Network for Video Action Recognition[Yunbo Wang, Mingsheng Long, Jianmin Wang, Philip S. Yu] Spatiotemporal Pyramid Network / BN-Inception URL Yes 2017
72.2 Spatiotemporal Multiplier Networks for Video Action Recognition[Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes] Spatiotemporal Multiplier Networks + IDT URL Yes 2017
66.79 Action Recognition with Stacked Fisher Vectors[Xiaojiang Peng, Changqing Zou, Yu Qiao, Qiang Peng] Stacked Fisher Vectors (FV+SFV) URL Yes 2014
67 Generalized Rank Pooling for Activity Recognition[Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould] Generalized Rank Pooling + IDT-FV URL Yes 2017
51.4 Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition[An-An Liu, Yu-Ting Su, Wei-Zhi Nie, Mohan Kankanhalli] HC-MTL with STIP + BOW URL Yes 2017
66.4 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman] Two-Stream I3D, ImageNet pre-training URL Yes 2017
80.7 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman] Two-Stream I3D, Kinetics pre-training URL Yes 2017
71.8 Pillar Networks for action recognition[Biswa Sengupta, Yu Qian] ResNet/Inception + MKL-SVM URL Yes 2017
56.59 Robust Action Recognition framework using Segmented Block and Distance Mean Histogram of Gradients Approach[Vikas Tripathi, Durgaprasad Gangodkar, Ankush Mittal, Vishnu Kanth] segmented blocks URL Yes 2017
56 Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor[Aaron Chadha, Alhabib Abbas and Yiannis Andreopoulos] Codec Based URL No 2017
63 Improved Rank Pooling Strategy for Complex Action Recognition[Eman Mohammadi, Q. M. Jonathan Wu, Mehrdad Saif] Improved Rank Pooling URL Yes 2017
71.7 Learning Long-Term Dependencies for Action Recognition With a Biologically-Inspired Deep Network[Yemin Shi, Yonghong Tian, Yaowei Wang, Wei Zeng, Tiejun Huang] shuttleNet URL Yes 2017
73.6 Pillar Networks++: Distributed non-parametric deep and wide networks[Biswa Sengupta, Yu Qian] Pillar Networks++ (4 Networks) URL No 2017
66.2 Lattice Long Short-Term Memory for Human Action Recognition[Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese] Lattice LSTM URL Yes 2017
69.7 Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection[Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox] Chained Multi-stream Networks URL Yes 2017
82.1 End-to-end Video-level Representation Learning for Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu, Lin Li] DTPP (Kinetics pre-training) URL No 2017
69 Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong] CO2FI + ASYN URL No 2017
72.6 Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong] CO2FI + ASYN+IDT URL No 2017
70.2 Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?[Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh] ResNeXt-101 (64f) URL No 2017
69.2 Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification[Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen] Attention Cluster RGB+Flow URL No 2017
70.9 Appearance-and-Relation Networks for Video Classification[Limin Wang, Wei Li, Wen Li, Luc Van Gool] ARTNet with TSN (pre-trained on Kinetics) URL No 2017
63.5 Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification[Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool] T3D+TSN (three splits) URL No 2017
61.8 Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification[Xiaodong Yang, Pavlo Molchanov, Jan Kautz] URL Yes 2016
53.9 Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness[Xiaodong Yang, Ying-Li Tian] URL Yes 2014
70.2 Compressed Video Action Recognition[Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Kraehenbuehl] CoViAR + optical flow URL No 2017
78.7 A Closer Look at Spatiotemporal Convolutions for Action Recognition[Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri] URL Yes 2018
66.2 Activity Recognition based on a Magnitude-Orientation Stream Network[Caetano, C., de Melo, V. H. C., dos Santos, J. A., Schwartz, W. R.] When compared with neural network methods, we were able to outperform many methods using the proposed Magnitude Orientation Stream (MOS). Furthermore, we were able to outperform the original two-stream by 6.6 p.p. using only our temporal stream. URL Yes 2017
80.9 PoTion: Pose MoTion Representation for Action Recognition[Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid] I3D + PoTion URL Yes 2018
81.3 Video Representation Learning Using Discriminative Pooling[Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould] SVMP+I3D URL Yes 2018
72.2 Non-Linear Temporal Subspace Representations for Activity Recognition[Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley] KRP-FS + IDT-FV URL Yes 2018
63.8 MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition[Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng] MiCT-Net URL Yes 2018
70.5 MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition[Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng] Two-stream MiCT-Net URL Yes 2018
74.2 Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition[Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang] RGB + OFF(RGB) + OFF(optical flow) + OFF(raw-OFF) URL Yes 2018
30.7 Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning[Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas] GG-CNN ImageNet pretraining URL Yes 2018
72.6 End-to-End Learning of Motion Representation for Video Understanding[Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang] TVNets + IDT URL Yes 2018
55.4 Learning and Using the Arrow of Time[Donglai Wei, Joseph Lim, Andrew Zisserman, William T. Freeman] AoT (flow only) URL Yes 2018
66.2 Action recognition by Latent Duration Model[Tingwei Wang, Chuancai Liu, Liantao Wang] LDM+MIFS URL Yes 2017
69.5 Procedural Generation of Videos to Train Deep Action Recognition Networks[César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel López Peña] Leveraging our synthetic dataset and multi-task models, we increase the performance from 66.6 to 69.5 URL No 2017
81.1 Unsupervised Universal Attribute Modelling for Action Recognition[Debaditya Roy, K. Sri Rama Murty, C. Krishna Mohan] URL No 2018
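
Since the results above are listed as plain text, one way to double-check or re-sort them is to parse each row into fields first. The Python sketch below is only an illustration under stated assumptions: it assumes every row has the shape `result title[authors] description URL Yes|No year`, as the rows above appear to, and the names `Entry`, `ROW` and `parse_row` are hypothetical helpers, not something this site provides.

```python
import re
from typing import NamedTuple, Optional


class Entry(NamedTuple):
    """One leaderboard row (hypothetical structure, for illustration only)."""
    result: float
    title: str
    authors: str
    description: str
    peer_reviewed: bool
    year: int


# Assumed row shape: result title[authors] description URL Yes|No year
ROW = re.compile(
    r"^(?P<result>\d+(?:\.\d+)?)\s+"   # reported result, e.g. 80.7
    r"(?P<title>[^\[]+?)\s*"           # paper title, up to the first '['
    r"\[(?P<authors>[^\]]*)\]\s*"      # author list inside brackets
    r"(?P<description>.*?)\s*"         # optional method description
    r"URL\s+(?P<peer>Yes|No)\s+"       # link placeholder and peer-review flag
    r"(?P<year>\d{4})$"                # publication year
)


def parse_row(line: str) -> Optional[Entry]:
    """Parse one row; return None for headers or lines in another shape."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    return Entry(
        result=float(m["result"]),
        title=m["title"].strip(),
        authors=m["authors"].strip(),
        description=m["description"],
        peer_reviewed=(m["peer"] == "Yes"),
        year=int(m["year"]),
    )


if __name__ == "__main__":
    sample = [
        "Result Paper Description URL Peer Reviewed Year",
        "59.4 Two-stream convolutional networks for action recognition in videos"
        "[Simonyan, K., Zisserman, A] URL Yes 2014",
        "80.7 Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset"
        "[Joao Carreira, Andrew Zisserman] Two-Stream I3D, Kinetics pre-training URL Yes 2017",
    ]
    entries = [e for e in map(parse_row, sample) if e is not None]
    # List peer-reviewed entries, highest result first.
    for e in sorted(entries, key=lambda e: e.result, reverse=True):
        if e.peer_reviewed:
            print(f"{e.result:5.1f}  {e.year}  {e.title}")
```

Rows that do not fit the assumed shape come back as `None`, so they can be collected and checked by hand rather than silently dropped.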

If you want to show these results on your own web page, please insert the following HTML code: