Paper Overview


Paper Title
Weakly Supervised Action Learning with RNN based Fine-to-coarse Modeling [Alexander Richard, Hilde Kuehne, Juergen Gall]
NeuralNetwork-Viterbi: A Framework for Weakly Supervised Video Learning [Alexander Richard, Hilde Kuehne, Ahsan Iqbal, Juergen Gall]
Procedural Generation of Videos to Train Deep Action Recognition Networks [César Roberto de Souza, Adrien Gaidon, Yohann Cabon, Antonio Manuel López Peña]
Action recognition by Latent Duration Model[Tingwei Wang, Chuancai Liu and Liantao Wang]
Adding Attentiveness to the Neurons in Recurrent Neural Networks[Pengfei Zhang, Jianru Xue, Cuiling Lan, Wenjun Zeng, Zhanning Gao, Nanning Zheng]
Learning and Using the Arrow of Time[Donglai Wei, Joseph Lim, Andrew Zisserman, William T. Freeman]
End-to-End Learning of Motion Representation for Video Understanding[Lijie Fan, Wenbing Huang, Chuang Gan, Stefano Ermon, Boqing Gong, Junzhou Huang]
Making Convolutional Networks Recurrent for Visual Sequence Learning[Xiaodong Yang, Pavlo Molchanov, Jan Kautz]
Learning Latent Super-Events to Detect Multiple Activities in Videos[AJ Piergiovanni, Michael S. Ryoo]
Geometry Guided Convolutional Neural Networks for Self-Supervised Video Representation Learning[Chuang Gan, Boqing Gong, Kun Liu, Hao Su, Leonidas J. Guibas]
Recognize Actions by Disentangling Components of Dynamics[Yue Zhao, Yuanjun Xiong, Dahua Lin]
Weakly-Supervised Action Segmentation with Iterative Soft Boundary Assignment[Li Ding, Chenliang Xu]
Optical Flow Guided Feature: A Fast and Robust Motion Representation for Video Action Recognition[Shuyang Sun, Zhanghui Kuang, Wanli Ouyang, Lu Sheng, Wei Zhang]
MiCT: Mixed 3D/2D Convolutional Tube for Human Action Recognition[Yizhou Zhou, Xiaoyan Sun, Zheng-Jun Zha, Wenjun Zeng]
Non-Linear Temporal Subspace Representations for Activity Recognition[Anoop Cherian, Suvrit Sra, Stephen Gould, Richard Hartley]
Video Representation Learning Using Discriminative Pooling[Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould]
PoTion: Pose MoTion Representation for Action Recognition[Vasileios Choutas, Philippe Weinzaepfel, Jérôme Revaud, Cordelia Schmid]
Activity Recognition based on a Magnitude-Orientation Stream Network[Caetano, C., de Melo, V. H. C., dos Santos, J. A., Schwartz, W. R.]
What Makes a Video a Video: Analyzing Temporal Information in Video Understanding Models and Datasets[De-An Huang, Vignesh Ramanathan, Dhruv Mahajan, Lorenzo Torresani, Manohar Paluri, Li Fei-Fei, and Juan Carlos Niebles]
A Closer Look at Spatiotemporal Convolutions for Action Recognition[Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri]
Temporal Dynamic Graph LSTM for Action-driven Video Object Detection[Yuan Yuan, Xiaodan Liang, Xiaolong Wang, Dit-Yan Yeung, Abhinav Gupta]
Compressed Video Action Recognition[Chao-Yuan Wu, Manzil Zaheer, Hexiang Hu, R. Manmatha, Alexander J. Smola, Philipp Kraehenbuehl]
An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos[Rui Hou, Chen Chen, Mubarak Shah]
Action Recognition Using Super Sparse Coding Vector with Spatio-Temporal Awareness[Xiaodong Yang, Ying-Li Tian]
Multilayer and Multimodal Fusion of Deep Neural Networks for Video Classification[Xiaodong Yang, Pavlo Molchanov, Jan Kautz]
Non-local Neural Networks[Xiaolong Wang, Ross Girshick, Abhinav Gupta, Kaiming He]
Temporal 3D ConvNets: New Architecture and Transfer Learning for Video Classification[Ali Diba, Mohsen Fayyaz, Vivek Sharma, Amir Hossein Karami, Mohammad Mahdi Arzani, Rahman Yousefzadeh, Luc Van Gool]
Attend and Interact: Higher-Order Object Interactions for Video Understanding[Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, and Hans Peter Graf]
Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion[Weiyao Lin, Yang Mi, Jianxin Wu, Ke Lu, Hongkai Xiong]
Appearance-and-Relation Networks for Video Classification[Limin Wang, Wei Li, Wen Li, Luc Van Gool]
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification[Xiang Long, Chuang Gan, Gerard de Melo, Jiajun Wu, Xiao Liu, Shilei Wen]
Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?[Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh]
Learning Spatio-Temporal Features with 3D Residual Networks for Action Recognition[Kensho Hara, Hirokatsu Kataoka, Yutaka Satoh]
End-to-end Video-level Representation Learning for Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu, Lin Li]
Chained Multi-stream Networks Exploiting Pose, Motion, and Appearance for Action Classification and Detection[Mohammadreza Zolfaghari, Gabriel L. Oliveira, Nima Sedaghat, Thomas Brox]
Lattice Long Short-Term Memory for Human Action Recognition[Lin Sun, Kui Jia, Kevin Chen, Dit Yan Yeung, Bertram E. Shi, Silvio Savarese]
ActionVLAD: Learning spatio-temporal aggregation for action classification[Rohit Girdhar, Deva Ramanan, Abhinav Gupta, Josef Sivic, Bryan Russell]
Asynchronous Temporal Fields for Action Recognition[Gunnar A. Sigurdsson, Santosh Divvala, Ali Farhadi, Abhinav Gupta]
Predictive-Corrective Networks for Action Detection[Achal Dave, Olga Russakovsky, Deva Ramanan]
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection[Huijuan Xu, Abir Das, Kate Saenko]
Pillar Networks++: Distributed non-parametric deep and wide networks[Biswa Sengupta, Yu Qian]
Eigen Evolution Pooling for Human Action Recognition[Yang Wang, Vinh Tran, Minh Hoai]
Learning Long-Term Dependencies for Action Recognition With a Biologically-Inspired Deep Network[Yemin Shi, Yonghong Tian, Yaowei Wang, Wei Zeng, Tiejun Huang]
AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions[Chunhui Gu, Chen Sun, Sudheendra Vijayanarasimhan, Caroline Pantofaru, David A. Ross, George Toderici, Yeqing Li, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, Jitendra Malik]
Multi-Label Zero-Shot Human Action Recognition via Joint Latent Embedding[Qian Wang, Ke Chen]
Recurrent Assistance: Cross-Dataset Training of LSTMs on Kitchen Tasks[Toby Perrett, Dima Damen]
Learning Gating ConvNet for Two-Stream based Methods in Action Recognition[Jiagang Zhu, Wei Zou, Zheng Zhu]
Improved Rank Pooling Strategy for Complex Action Recognition[Eman Mohammadi, Q. M. Jonathan Wu, Mehrdad Saif]
Video Classification With CNNs: Using The Codec As A Spatio-Temporal Activity Sensor[Aaron Chadha, Alhabib Abbas and Yiannis Andreopoulos]
Robust Action Recognition framework using Segmented Block and Distance Mean Histogram of Gradients Approach[Vikas Tripathi, Durgaprasad Gangodkar, Ankush Mittal, Vishnu Kanth]
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding[Gunnar A. Sigurdsson, Gül Varol, Xiaolong Wang, Ali Farhadi, Ivan Laptev, Abhinav Gupta]
Pillar Networks for action recognition[Biswa Sengupta, Yu Qian]
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset[Joao Carreira, Andrew Zisserman]
Hierarchical Clustering Multi-Task Learning for Joint Human Action Grouping and Recognition[An-An Liu, Yu-Ting Su, Wei-Zhi Nie, Mohan Kankanhalli]
Higher-order Pooling of CNN Features via Kernel Linearization for Action Recognition[A. Cherian, P. Koniusz, S. Gould]
Action Recognition with Stacked Fisher Vectors[Xiaojiang Peng, Changqing Zou, Yu Qiao, Qiang Peng]
P-CNN: Pose-based CNN Features for Action Recognition[Guilhem Cheron, Ivan Laptev, Cordelia Schmid]
Spatiotemporal Multiplier Networks for Video Action Recognition[Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes]
The Language of Actions: Recovering the Syntax and Semantics of Goal-Directed Human Activities[H. Kuehne, A. B. Arslan and T. Serre]
An end-to-end generative framework for video segmentation and recognition[Hilde Kuehne, Juergen Gall, Thomas Serre]
Finding Action Tubes[Georgia Gkioxari, Jitendra Malik]
Multi-region two-stream R-CNN for action detection[Xiaojiang Peng, Cordelia Schmid]
Weakly supervised learning of actions from transcripts[Hilde Kuehne, Alexander Richard, Juergen Gall]
Spatiotemporal Pyramid Network for Video Action Recognition[Yunbo Wang, Mingsheng Long, Jianmin Wang, Philip S. Yu]
Generalized Rank Pooling for Activity Recognition[Anoop Cherian, Basura Fernando, Mehrtash Harandi, Stephen Gould]
Connectionist temporal modeling for weakly supervised action labeling[D.-A. Huang, L. Fei-Fei, and J. C. Niebles]
Weakly supervised action labeling in videos under ordering constraints[P. Bojanowski, R. Lajugie, F. Bach, I. Laptev, J. Ponce, C. Schmid, and J. Sivic]
Towards understanding action recognition[H. Jhuang, J. Gall, S. Zuffi, C. Schmid, M. J. Black]
Action Representation Using Classifier Decision Boundaries[Jue Wang, Anoop Cherian, Fatih Porikli, Stephen Gould]
Hidden Two-Stream Convolutional Networks for Action Recognition[Yi Zhu, Zhenzhong Lan, Shawn Newsam, Alexander G. Hauptmann]
Learning spatiotemporal features with 3d convolutional networks[Tran, D., Bourdev, L.D., Fergus, R., Torresani, L., Paluri, M.]
Action recognition with improved trajectories[Wang, H., Schmid, C.]
Beyond short snippets: Deep networks for video classification[Ng, J.Y.H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.]
Temporal Segment Networks: Towards Good Practices for Deep Action Recognition[Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool]
A key volume mining deep framework for action recognition[Zhu, W., Hu, J., Sun, G., Cao, X., Qiao, Y.]
Long-term temporal convolutions for action recognition[Varol, G., Laptev, I., Schmid, C.]
Action recognition with trajectory-pooled deep-convolutional descriptors[Wang, L., Qiao, Y., Tang, X.]
Human action recognition using factorized spatio-temporal convolutional networks[Sun, L., Jia, K., Yeung, D., Shi, B.E.]
Motion part regularization: Improving action recognition via trajectory group selection[Ni, B., Moulin, P., Yang, X., Yan, S.]
Modeling video evolution for action recognition[Fernando, B., Gavves, E., Oramas M., J., Ghodrati, A., Tuytelaars, T.]
Two-stream convolutional networks for action recognition in videos[Simonyan, K., Zisserman, A.]
A multi-level representation for action recognition[Wang, L., Qiao, Y., Tang, X.]
Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice[Peng, X., Wang, L., Wang, X., Qiao, Y.]
Multi-view super vector for action recognition[Cai, Z., Wang, L., Peng, X., Qiao, Y.]