SPCOM2020

Welcome to SPCOM 2020!

INSTITUTIONAL SPONSOR

IISc

TECHNICAL CO-SPONSORS

IEEE

DIAMOND SPONSORS

Qualcomm
Cisco

GOLD SPONSORS

Texas Instruments
Amazon

SILVER SPONSOR

Tejas Networks

AWARDS SPONSOR

Springer Nature

DEEP LEARNING AND INFORMATION THEORY

Organizer: Prathosh A.P., Department of Electrical Engineering, Indian Institute of Technology (IIT) Delhi

Prathosh received his PhD degree from IISc (EE Dept.) in 2014. During his Ph.D., he was a recipient of the TCS Research Fellowship. He worked as a Research Scientist in the Data Analytics group of Xerox Research Center India (2014-2016) and as a Data Scientist at Philips Innovation Campus, India (2016-2017). Since July 2017, he has been an Assistant Professor in the Department of Electrical Engineering, Indian Institute of Technology, Delhi. He received the Qualcomm Innovation Fellowship in 2018. His research is in the theory and practice of deep learning.

SPEAKER | TITLE OF THE TALK
Vineeth N. Balasubramanian | Explainable Deep Learning
Himanshu Asnani | The Landscape of Conditional Mutual Information Estimation
Sreeram Kannan | Learning in Gated Neural Networks
Rishabh Iyer | Combinatorial Information Measures with Applications to Machine Learning
Prathosh A.P. | Optimality of the Latent Space in Variational Neural Generative Models

Explainable Deep Learning

Vineeth N. Balasubramanian, Indian Institute of Technology Hyderabad

Vineeth N. Balasubramanian is an Associate Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Hyderabad (IIT-H), and also serves as the Head of the Department of Artificial Intelligence at IIT-H. His research interests include deep learning, machine learning, and computer vision. His research has resulted in ~100 peer-reviewed publications at various international venues, including top-tier ones such as ICML, CVPR, ICCV, KDD, ICDM, and IEEE TPAMI. His PhD dissertation at Arizona State University on the Conformal Predictions framework was nominated for the Outstanding PhD Dissertation award at the Department of Computer Science. He is an active reviewer/contributor at many conferences such as NeurIPS, CVPR, ICCV, AAAI, and IJCAI (with a recent Outstanding Reviewer award at CVPR 2019), as well as at journals including IEEE TPAMI, IEEE TNNLS, JMLR, and Machine Learning. He currently serves as the Secretary of the AAAI India Chapter. For more details, please see https://iith.ac.in/~vineethnb/.

Abstract: As neural network (deep learning) models are absorbed into real-world applications every day, there is a pressing need to explain the decisions of these models. This talk will begin with an introduction to the need for explaining neural network models, summarize existing efforts in this regard, and present a few of our efforts in this direction. In particular, while existing methods for neural network attributions (for explanations) are largely statistical, we propose a new attribution method for neural networks developed from first principles of causality (to the best of our knowledge, the first such method). The neural network architecture is viewed as a Structural Causal Model, and a methodology to compute the causal effect of each feature on the output is presented. With reasonable assumptions on the causal structure of the input data, we propose algorithms to efficiently compute the causal effects and to scale the approach to data with large dimensionality. We also show how this method can be used for recurrent neural networks. We report experimental results on both simulated and real datasets showcasing the promise and usefulness of the proposed algorithm. This work was presented as a Long Oral at ICML 2019 (http://proceedings.mlr.press/v97/chattopadhyay19a.html).
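As a rough, hedged illustration of the interventional view of attribution (a minimal sketch under simplifying assumptions, not the authors' ICML 2019 implementation; the function names, the reference batch, the baseline construction, and the implicit assumption that input features can be intervened on independently are all choices made here for clarity):

```python
import torch

@torch.no_grad()
def interventional_expectation(model, ref_batch, feature_idx, alpha):
    """Approximate E[f(X) | do(X_i = alpha)] by clamping feature i to alpha
    for every sample in a reference batch and averaging the model's output."""
    x = ref_batch.clone()
    x[:, feature_idx] = alpha
    return model(x).mean(dim=0)

@torch.no_grad()
def causal_attribution(model, ref_batch, feature_idx, alpha):
    """Illustrative attribution of feature i at value alpha: the interventional
    expectation minus a baseline that averages it over observed values of X_i."""
    baseline = torch.stack([
        interventional_expectation(model, ref_batch, feature_idx, v.item())
        for v in ref_batch[:, feature_idx]
    ]).mean(dim=0)
    return interventional_expectation(model, ref_batch, feature_idx, alpha) - baseline
```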

The Landscape of Conditional Mutual Information Estimation

Himanshu Asnani, School of Technology and Computer Science, Tata Institute of Fundamental Research (TIFR), Mumbai, India.

Himanshu Asnani is currently a Reader (equivalent to a tenure-track Assistant Professor) in the School of Technology and Computer Science (STCS) at the Tata Institute of Fundamental Research (TIFR), Mumbai, and an Affiliate Assistant Professor in the Electrical and Computer Engineering Department at the University of Washington, Seattle. His research interests include information and coding theory, statistical learning and inference, and machine learning. Dr. Asnani is the recipient of the 2014 Marconi Society Paul Baran Young Scholar Award and was named an Amazon Catalyst Fellow for the year 2018.

He received his Ph.D. from the Electrical Engineering Department at Stanford University in 2014, working under Professor Tsachy Weissman, where he was a Stanford Graduate Fellow. Following his graduate studies, he worked at Ericsson Silicon Valley as a System Architect for a couple of years, focusing on designing next-generation networks with an emphasis on network redundancy elimination and load balancing. Driven by a deep desire to innovate and contribute in the education space with the aid of technology, Dr. Asnani left his corporate role and became involved for a while in his education startups (where he currently holds a Founding Advisor role) to bring the promise of quality education in vernacular languages to underdeveloped and developing countries, in places without access to English, the Internet, or electricity.

Moving from industry and the entrepreneurial world back to academia, before joining TIFR Dr. Asnani worked as a Research Associate in the Electrical and Computer Engineering Department at the University of Washington, Seattle. In the past, he has also held visiting faculty appointments in the Electrical Engineering Department at Stanford University and the Electrical Engineering Department at IIT Bombay. He was the recipient of the Best Paper Award at MobiHoc 2009 and was also a finalist for the Student Paper Award at ISIT 2011, Saint Petersburg, Russia. Prior to that, he received his B.Tech. from IIT Bombay in 2009 and his M.S. from Stanford University in 2011, both in Electrical Engineering.

Abstract: Conditional Mutual Information (CMI) is a measure of the conditional dependence between random variables X and Y, given another random variable Z. It can be used to quantify conditional dependence among variables in many data-driven inference problems such as graphical models, causal learning, feature selection, and time-series analysis. This talk reviews the landscape of CMI estimation, given i.i.d. samples of the random variables.
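For reference, the quantity being estimated is the standard conditional mutual information, which in LaTeX notation reads

I(X; Y \mid Z) = \mathbb{E}_{p(x,y,z)}\left[ \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)} \right],

and which is zero exactly when X and Y are conditionally independent given Z.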

First, we review the traditional k-nearest neighbor (kNN) based estimators as well as the kernel-based methods that have been widely used for CMI estimation, noting that they suffer severely from the curse of dimensionality.

Second, leveraging advances in classifiers and generative models to design methods for CMI estimation, we introduce an estimator for the KL-divergence based on the likelihood ratio, obtained by training a classifier to distinguish the observed joint distribution from the product distribution. We then show how to construct several CMI estimators using this basic divergence estimator by drawing ideas from conditional generative models. We demonstrate that the estimates from our proposed approaches do not degrade in performance with increasing dimension and obtain significant improvement over the widely used KSG estimator.
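As a rough illustration of the classifier-based divergence idea (a minimal sketch, not the speakers' implementation; the classifier architecture, the clamping constant, and the batch-permutation trick for forming product-distribution samples are assumptions made here):

```python
import torch
import torch.nn as nn

def classifier_kl_estimate(clf: nn.Module, samples_p: torch.Tensor) -> torch.Tensor:
    """Plug-in KL estimate D(P || Q) ~= E_P[log r], where the likelihood ratio
    r(x) = p(x) / q(x) is read off a binary classifier trained to separate
    samples of P (label 1) from samples of Q (label 0):
    r(x) = gamma(x) / (1 - gamma(x)), with gamma(x) = sigmoid(clf(x))."""
    gamma = torch.sigmoid(clf(samples_p)).clamp(1e-6, 1 - 1e-6)
    return torch.log(gamma / (1 - gamma)).mean()

def product_samples(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Approximate samples from the product of marginals p(x)p(y) by permuting
    y within the batch, which breaks its dependence on x; the classifier's
    negative class is trained on these."""
    return torch.cat([x, y[torch.randperm(y.shape[0])]], dim=1)
```

With such a divergence estimator, I(X; Y) can be estimated as D(P_XY || P_X P_Y); the conditional variants discussed in the talk additionally need samples that respect p(x|z)p(y|z), e.g. from a conditional generative model.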

Finally, we focus on CMI estimation by utilizing its formulation as a min-max optimization problem. Such a formulation leads to a joint training procedure similar to that of generative adversarial networks. We find that our proposed estimator provides better estimates than existing approaches on a variety of simulated datasets comprising linear and non-linear relations between variables. As an application of CMI estimation, we deploy our estimator for conditional independence (CI) testing on real data and obtain better results than state-of-the-art CI testers.
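One standard variational route to such a min-max objective (a generic illustration, not necessarily the exact objective used in this work) is the Donsker-Varadhan representation of the KL divergence,

D_{KL}(P \| Q) = \sup_{T} \left( \mathbb{E}_{P}[T] - \log \mathbb{E}_{Q}\left[e^{T}\right] \right),

where T is a neural "critic"; maximizing over T while jointly training a generative model for the remaining distributions yields a GAN-like training loop.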

This talk is based on joint work with a number of collaborators at IIT Delhi and UW-Seattle, which appeared at NeurIPS 2018, UAI 2019, and UAI 2020.

Learning in Gated Neural Networks

Sreeram Kannan, Department of EECS, University of Washington, Seattle, WA, USA.

I have been an assistant professor at UW since 2014. Before that, I spent two years as a postdoctoral scholar at the University of California, Berkeley and Stanford University, working with Prof. David Tse, and collaborating with Prof. Lior Pachter. I received my Ph.D. in Electrical Engineering and M.S. in Mathematics from the University of Illinois, Urbana-Champaign, where I was supervised by Prof. Pramod Viswanath and also worked closely with Prof. Chandra Chekuri. I received my M.E. in Telecommunications from the Indian Institute of Science, Bangalore under the guidance of Prof. P. Vijay Kumar. I spent my delightful undergraduate years at the College of Engineering, Guindy, Anna University, where I was part of a team, led by Prof. P.V. Ramakrishna, that developed and successfully launched ANUSAT, the first student-designed micro-satellite in India.

I have spent two summers at Qualcomm Corporate Research and Development, San Diego, and another wonderful summer at Microsoft Research, New England, Cambridge, MA, with Prof. Madhu Sudan. I have also been a visiting researcher for several months each at Stanford University, the University of Southern California, the Indian Institute of Science, Bangalore, and the Indian Institute of Technology, Kanpur.

I am a recipient of the 2019 UW ECE Outstanding Teaching Award, the 2018 Amazon Catalyst Award, the 2017 NSF Faculty Early CAREER Award, the 2015 Washington Research Foundation Early Career Faculty Award, and the Van Valkenburg Outstanding Graduate Research Award from UIUC (2013); a co-recipient of the Qualcomm Cognitive Radio Contest first prize (2010); a recipient of the Qualcomm (CTO) Roberto Padovani Outstanding Intern Award (2010); a recipient of the S.V.C. Aiya Medal from the Indian Institute of Science (2008); and a co-recipient of the Intel India Student Research Contest first prize (2006).

Abstract: Gating is a key feature in modern neural networks, including LSTMs, GRUs, and sparsely-gated deep neural networks. The backbone of such gated networks is a mixture-of-experts layer, where several experts make regression decisions and gating controls how to weigh those decisions in an input-dependent manner. Despite having such a prominent role in both modern and classical machine learning, very little is understood about parameter recovery in mixture-of-experts models, since gradient descent and EM algorithms are known to get stuck in local optima in such models.

In this paper, we perform a careful analysis of the optimization landscape and show that with appropriately designed loss functions, gradient descent can indeed learn the parameters accurately. A key idea underpinning our results is the design of two distinct loss functions, one for recovering the expert parameters and another for recovering the gating parameters. We demonstrate the first sample complexity results for parameter recovery in this model for any algorithm and demonstrate significant performance gains over standard loss functions in numerical experiments.
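For concreteness, here is a minimal sketch of the gated mixture-of-experts backbone described above (linear experts with softmax gating; the two specially designed loss functions from the talk are not shown, and the class and parameter names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class MixtureOfExperts(nn.Module):
    """Minimal gated mixture-of-experts regression layer: k linear experts whose
    predictions are combined with input-dependent softmax gating weights."""
    def __init__(self, in_dim: int, out_dim: int, num_experts: int):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(num_experts)]
        )
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)              # (batch, k)
        preds = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, k, out_dim)
        return (weights.unsqueeze(-1) * preds).sum(dim=1)          # (batch, out_dim)
```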

Combinatorial Information Measures with Applications to Machine Learning

Rishabh Iyer, Department of CS, University of Texas at Dallas, Texas, USA.

Rishabh Iyer holds a B.Tech. from IIT Bombay (2011) and a Ph.D. from the University of Washington, Seattle (2015). He was a postdoctoral researcher at the University of Washington (2016). He is presently an Assistant Professor in the Department of Computer Science, UT Dallas. His research interests are broadly in the areas of Artificial Intelligence and Machine Learning. He received the Microsoft Research Fellowship Award (2014) and was also selected for the Facebook Fellowship Award (2014). He won Best Paper Awards at the International Conference on Machine Learning (ICML) 2013 and at Neural Information Processing Systems 2013.

Abstract: Information-theoretic quantities like entropy and mutual information have found numerous uses in machine learning. It is well known that there is a strong connection between these entropic quantities and submodularity, since the entropy over a set of random variables is submodular. In this talk, we will study combinatorial information measures that generalize independence, (conditional) entropy, (conditional) mutual information, and total correlation, defined over sets of (not necessarily random) variables. These measures strictly generalize the corresponding entropic measures, since they are all parameterized via submodular functions that strictly generalize entropy. Critically, we show that, unlike entropic mutual information in general, the submodular mutual information is actually submodular in one argument, holding the other fixed, for a large class of submodular functions whose third-order partial derivatives satisfy a non-negativity property. This class turns out to include a number of practically useful cases, such as the facility-location and set-cover functions. We study specific instantiations of the submodular information measures on these, as well as on the probabilistic coverage, graph-cut, and saturated coverage functions, and see that they all have mathematically intuitive and practically useful expressions. Regarding applications, we connect the maximization of submodular (conditional) mutual information to problems such as robust, query-based, and privacy-preserving summarization, and we connect optimizing the multi-set submodular mutual information to clustering and robust partitioning.
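As a small, hedged illustration (not taken from the talk): with a submodular function f over a ground set, the submodular mutual information between two subsets A and B can be written I_f(A; B) = f(A) + f(B) - f(A ∪ B); the sketch below instantiates it with the facility-location function over an assumed similarity matrix.

```python
import numpy as np

def facility_location(S: np.ndarray, A: set) -> float:
    """Facility-location function f(A) = sum_i max_{j in A} S[i, j], where S is
    an n x n similarity matrix over the ground set; f of the empty set is 0."""
    if not A:
        return 0.0
    return float(S[:, sorted(A)].max(axis=1).sum())

def submodular_mutual_information(S: np.ndarray, A: set, B: set) -> float:
    """Submodular mutual information I_f(A; B) = f(A) + f(B) - f(A union B)."""
    return facility_location(S, A) + facility_location(S, B) - facility_location(S, A | B)

# Toy usage on a random similarity matrix over a ground set of 6 items.
S = np.random.rand(6, 6)
print(submodular_mutual_information(S, {0, 1}, {1, 2}))
```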

Optimality of the Latent Space in Variational Neural Generative Models

Prathosh A.P., Department of EE, Indian Institute of Technology Delhi (IITD), New Delhi, India.

Prathosh received his PhD degree from IISc (EE Dept.) in 2014. During his Ph.D., he was a recipient of the TCS Research Fellowship. He worked as a Research Scientist in the Data Analytics group of Xerox Research Center India (2014-2016) and as a Data Scientist at Philips Innovation Campus, India (2016-2017). Since July 2017, he has been an Assistant Professor in the Department of Electrical Engineering, Indian Institute of Technology, Delhi. He received the Qualcomm Innovation Fellowship in 2018. His research is in the theory and practice of deep learning.

Abstract: Regularized Auto-Encoders (AEs) form a rich class of methods within the landscape of neural generative models. They effectively model the joint distribution between the data and a latent space using an Encoder-Decoder combination, with regularization imposed in terms of a prior over the latent space. In this work, we hypothesise that the dimensionality of the AE model's latent space has a critical effect on the quality of generated data. Under the assumption that nature generates data by sampling from a "true" generative latent space followed by a deterministic function, we show that the optimal performance is obtained when the dimensionality of the latent space of the AE model matches that of the "true" generative latent space. Further, we propose an algorithm called the Mask Adversarial Auto-Encoder (MaskAAE), in which the dimensionality of the latent space of an adversarial autoencoder is brought closer to that of the "true" generative latent space via a procedure that masks the spurious latent dimensions. Next, we examine the effect of the latent prior on the generation quality of AE models. We show that there is no single fixed prior which is optimal for all data distributions, given a Gaussian Decoder. Further, with finite data, we show that there exists a bias-variance trade-off that comes with prior imposition. As a remedy, we optimize a generalized ELBO objective with an additional state space over the latent prior. We implicitly learn this flexible prior jointly with the AE training (FlexAE) using an adversarial learning technique, which facilitates operation at different points of the bias-variance curve. Our experiments on multiple datasets show that FlexAE is the new state-of-the-art for AE-based generative models.
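A tiny sketch of the latent-masking idea behind MaskAAE (an illustrative reading, not the paper's objective; the gate parameterization and the regularizer below are assumptions made here):

```python
import torch
import torch.nn as nn

class MaskedLatent(nn.Module):
    """One trainable gate per latent dimension; gates pushed toward {0, 1} prune
    spurious dimensions of the autoencoder's latent space during training."""
    def __init__(self, latent_dim: int):
        super().__init__()
        self.mask_logits = nn.Parameter(torch.zeros(latent_dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return z * torch.sigmoid(self.mask_logits)  # soft mask in (0, 1)

    def sparsity_penalty(self) -> torch.Tensor:
        # Illustrative regularizer encouraging each gate toward 0 or 1;
        # the actual MaskAAE training objective differs.
        m = torch.sigmoid(self.mask_logits)
        return (m * (1 - m)).sum()
```

In a setup like this, the masked code would feed the decoder, with the adversarial prior matching and the pressure on the gates trained jointly with the autoencoder.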

Copyright © SPCOM 2020