
Predicting Personal Traits from Facial Images using Convolutional Neural Networks Augmented with Facial Landmark Information

Yoad Lewenberg, Yoram Bachrach, Sukrit Shankar, Antonio Criminisi

IJCAI 2016    &    AAAI 2016 Demo

We consider the task of predicting various traits of a person given an image of their face. We estimate both objective traits, such as gender, ethnicity, and hair color, as well as subjective traits, such as the emotion a person expresses or whether they are humorous or attractive. For sizeable experimentation, we contribute a new Face Attributes Dataset (FAD), containing roughly 200,000 attribute labels for the above traits across more than 10,000 facial images. Motivated by the recent surge of research on Deep Convolutional Neural Networks (CNNs), we begin by using a CNN architecture for estimating facial attributes and show that it indeed provides an impressive baseline performance. To improve performance further, we propose a novel approach that incorporates facial landmark information for input images as an additional channel, helping the CNN learn better attribute-specific features, since the landmarks then hold correspondence across the various training images.
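The landmark-channel idea can be sketched as follows. This is a minimal illustration, not the paper's exact rendering: hypothetical landmark coordinates are rasterized as a Gaussian heatmap and stacked onto the RGB image as a fourth input channel, so the same landmark positions align across training images.

```python
import numpy as np

def landmark_channel(h, w, landmarks, sigma=2.0):
    """Render facial landmarks as a single heatmap channel.

    `landmarks` is a list of (row, col) points; each contributes a
    Gaussian bump so the CNN sees soft landmark locations rather
    than hard pixel coordinates.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    for (r, c) in landmarks:
        heat += np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2 * sigma ** 2))
    return np.clip(heat, 0.0, 1.0)

def augment_with_landmarks(image, landmarks):
    """Append the landmark heatmap as a 4th input channel -> (H, W, 4)."""
    heat = landmark_channel(image.shape[0], image.shape[1], landmarks)
    return np.concatenate([image, heat[..., None]], axis=-1)

# Example: a 64x64 RGB face with two hypothetical landmark points (e.g. eyes).
img = np.random.rand(64, 64, 3).astype(np.float32)
aug = augment_with_landmarks(img, [(20, 22), (20, 42)])
```

The augmented tensor can then be fed to the first convolutional layer in place of the plain RGB input.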



Refining Architectures of Deep Convolutional Neural Networks

Sukrit Shankar, Duncan Robertson, Yani Ioannou, Antonio Criminisi, Roberto Cipolla

CVPR 2016

Deep Convolutional Neural Networks (CNNs) have recently demonstrated immense success on various image recognition tasks. However, a question of paramount importance remains largely unanswered in deep learning research: is the selected CNN optimal for the dataset in terms of accuracy and model size? In this paper, we set out to answer this question and introduce a novel strategy that alters the architecture of a given CNN for a specified dataset, potentially enhancing the original accuracy while possibly reducing the model size. We use two operations for architecture refinement: stretching and symmetric splitting. Our procedure starts with a pre-trained CNN for the given dataset and optimally decides the stretch and split factors across the network to refine the architecture.
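The two refinement operations can be illustrated on a toy architecture description. The `(filters, groups)` representation and the functions below are illustrative assumptions, not the paper's implementation:

```python
def stretch(arch, layer, factor):
    """Stretch: multiply a layer's filter count by `factor` (>1 widens it)."""
    out = list(arch)
    filters, groups = out[layer]
    out[layer] = (int(round(filters * factor)), groups)
    return out

def symmetric_split(arch, layer, parts=2):
    """Symmetric split: divide a layer's filters into `parts` equal groups.

    Each group then convolves only its share of the input channels,
    cutting that layer's parameters roughly by `parts` while keeping
    the total output width unchanged.
    """
    out = list(arch)
    filters, groups = out[layer]
    assert filters % (groups * parts) == 0, "filters must divide evenly"
    out[layer] = (filters, groups * parts)
    return out

# Toy architecture: (filters, groups) per conv layer of a pre-trained net.
arch = [(64, 1), (128, 1), (256, 1)]
arch = stretch(arch, 1, 1.5)         # widen layer 1: 128 -> 192 filters
arch = symmetric_split(arch, 2, 2)   # split layer 2 into two symmetric groups
```

In the paper's setting, the refined architecture would then be re-trained on the target dataset and compared against the original in accuracy and model size.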



Deep-carving: Discovering Visual Attributes by Carving Deep Neural Nets

Sukrit Shankar, Vikas K. Garg, Roberto Cipolla

CVPR 2015    &    CVPR 2015 Scene Understanding Workshop

Deep Convolutional Neural Networks (CNNs) have enjoyed remarkable success in vision applications recently. However, in a weakly supervised scenario, widely used CNN training procedures do not learn a robust model for predicting multiple attribute labels simultaneously. The primary reason is that attributes highly co-occur within the training data and, unlike objects, generally lack well-defined spatial boundaries within the image. To ameliorate this limitation, we propose Deep-Carving, a novel training procedure for CNNs that helps the net efficiently carve itself for the task of multiple attribute prediction. During training, the responses of the feature maps are exploited in an ingenious way to provide the net with multiple pseudo-labels (for training images) for subsequent iterations. The process is repeated periodically after a fixed number of iterations, and enables the net to carve itself iteratively for efficiently disentangling features.
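The pseudo-labelling step can be sketched roughly as follows. The simple thresholding rule and names here are illustrative assumptions, not the paper's exact procedure: where ground-truth attribute labels exist they are kept, and elsewhere the net's current responses are thresholded to yield pseudo-labels for the next rounds of training.

```python
import numpy as np

def carve_pseudo_labels(scores, known_mask, known_labels, thresh=0.5):
    """Deep-carving-style pseudo-labels for multi-attribute training.

    scores:       net's current per-attribute responses in [0, 1]
    known_mask:   True where a ground-truth label is available
    known_labels: ground-truth labels (only valid where mask is True)

    Keeps ground truth where available; elsewhere thresholds the net's
    own responses, which become the targets for subsequent iterations.
    """
    return np.where(known_mask, known_labels,
                    (scores >= thresh).astype(known_labels.dtype))

# Example: 3 attributes, only the first one labelled for this image.
scores = np.array([0.9, 0.2, 0.7])
mask   = np.array([True, False, False])
labels = np.array([0.0, 0.0, 0.0])
targets = carve_pseudo_labels(scores, mask, labels)
```

Recomputing the targets periodically (rather than every iteration) is what lets the net stabilise between carving steps.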



Part Bricolage: Flow-Assisted Part-Based Graphs for Detecting Activities in Videos

Sukrit Shankar, Vijay Badrinarayanan, Roberto Cipolla

ECCV 2014

This paper introduces a unified formulation for robustly detecting activities in videos. Our first contribution is the formulation of the detection task as an undirected node- and edge-weighted graphical structure called Part Bricolage (PB), where the node weights represent the type of features along with their importance, and the edge weights incorporate the probability of the features belonging to a known activity class, while also accounting for the trustworthiness of the features the edge connects. The Prize-Collecting Steiner Tree (PCST) problem is then solved on this graph, yielding the best connected subgraph, which comprises the activity of interest. Our second contribution is a novel technique for robust body part estimation, which uses two types of state-of-the-art pose detectors and resolves plausible detection ambiguities with pre-trained classifiers that predict the trustworthiness of the pose detectors. Our third contribution is the proposal of fusing the low-level descriptors with the mid-level ones, while maintaining the spatial structure between the features.
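For intuition, the PCST objective can be solved by brute force on a toy graph. The formulation below (maximize collected node prizes minus the cost of the edges connecting them, over connected subgraphs) is a standard PCST variant given here only as a sketch; it is exponential and toy-sized, whereas real instances need a dedicated PCST solver.

```python
import itertools

def pcst_brute_force(prizes, edges):
    """Brute-force Prize-Collecting Steiner Tree on a toy graph.

    prizes: dict node -> prize;  edges: dict (u, v) -> cost.
    Returns (best_score, best_nodes): the connected subgraph maximizing
    total node prize minus total edge cost.
    """
    # Single nodes are connected subgraphs with no edges.
    best = max(((p, frozenset([n])) for n, p in prizes.items()),
               key=lambda t: t[0])
    for k in range(1, len(edges) + 1):
        for subset in itertools.combinations(edges, k):
            nodes = set()
            for (u, v) in subset:
                nodes |= {u, v}
            # Connectivity check (DFS) over the chosen edges only.
            adj = {n: [] for n in nodes}
            for (u, v) in subset:
                adj[u].append(v)
                adj[v].append(u)
            start = next(iter(nodes))
            seen, stack = {start}, [start]
            while stack:
                x = stack.pop()
                for y in adj[x]:
                    if y not in seen:
                        seen.add(y)
                        stack.append(y)
            if seen != nodes:
                continue
            score = sum(prizes[n] for n in nodes) - sum(edges[e] for e in subset)
            if score > best[0]:
                best = (score, frozenset(nodes))
    return best

# Toy instance: node prizes play the role of feature importance,
# edge costs penalise untrustworthy connections.
score, chosen = pcst_brute_force(
    {'a': 5, 'b': 4, 'c': 1},
    {('a', 'b'): 2, ('b', 'c'): 3},
)
```

Here the low-prize node `c` is dropped because connecting it costs more than its prize, mirroring how weak or untrustworthy features are excluded from the detected activity subgraph.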



Semantic Transform: Weakly Supervised Semantic Inference for Relating Visual Attributes

Sukrit Shankar, Joan Lasenby, Roberto Cipolla

ICCV 2013

Relative (comparative) attributes are promising for thematic ranking of visual entities, which also aids recognition tasks. However, attribute rank learning often requires a substantial amount of relational supervision, which is highly tedious and apparently impractical for real-world applications. In this paper, we introduce the Semantic Transform, which, under minimal supervision, adaptively finds a semantic feature space along with a class ordering that are related in the best possible way. Such a semantic space is found for every attribute category. To relate the classes under weak supervision, the class ordering needs to be refined according to a cost function in an iterative procedure. This problem is NP-hard in general, and we thus propose a constrained search tree formulation for it.
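The constrained search over class orderings can be sketched as a branch-and-bound tree. The monotone chain cost below is a hypothetical stand-in for the paper's cost function, used only to show how partial orderings are pruned:

```python
def best_ordering(classes, cost):
    """Branch-and-bound over class orderings.

    `cost` scores a partial ordering (prefix) and is assumed monotone
    non-decreasing as the prefix grows, so any branch whose partial
    cost already meets the best complete cost can be pruned.
    """
    best = [float('inf'), None]

    def dfs(prefix, remaining):
        c = cost(prefix)
        if c >= best[0]:
            return  # prune: this subtree cannot beat the incumbent
        if not remaining:
            best[0], best[1] = c, tuple(prefix)
            return
        for x in sorted(remaining):
            dfs(prefix + [x], remaining - {x})

    dfs([], frozenset(classes))
    return best[1], best[0]

# Hypothetical pairwise dissimilarity between attribute-strength classes.
dist = {('a', 'b'): 1, ('b', 'c'): 1, ('a', 'c'): 5}
d = lambda u, v: dist.get((u, v), dist.get((v, u), 0))
chain_cost = lambda p: sum(d(u, v) for u, v in zip(p, p[1:]))

order, c = best_ordering(['a', 'b', 'c'], chain_cost)
```

The search keeps adjacent classes semantically close ('a' next to 'b', 'b' next to 'c') while pruning orderings such as ('a', 'c', ...) as soon as their partial cost exceeds the incumbent.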