I consider Information Theory my home field; it is the science of modeling information mathematically and breaking it into elementary units (like the bits used in digital communication). Over the years I have applied Information Theory to problems in cryptography, statistics, privacy, federated learning, and networks. I am also super excited about engineering (implementation, not theory alone) and have been building systems at scale for various research and para-research projects. I am involved in the following research threads:
I love discussing ideas with, and working in various roles for, deep tech startups. If you are working on a 5G, blockchain, or privacy startup and would like to have a technology, research, or VC-related discussion with me, please do drop me a line. Keep in mind that I am overcommitted and easily excited; I even have a startup of my own. So kindly moderate your expectations.
When the COVID-19 pandemic hit us, I started working with the Bangalore Municipality (BBMP) on COVID-19 data modeling. I learned two things: first, Indian government agencies work very hard for us, and second, the problems they deal with are so complex that hard work alone may not suffice. I now try to work with government agencies and help them whenever possible. I have been providing technical advice on 5G, the role of blockchain in decentralized markets, legal aspects of privacy, and data analytics.
We consider over-the-air convex optimization on a $d$-dimensional space where coded gradients are sent over an additive Gaussian noise channel with variance $\sigma^2$. The codewords satisfy an average power constraint $P$, resulting in a signal-to-noise ratio (SNR) of $P/\sigma^2$. We derive bounds for the convergence rates for over-the-air optimization. Our first result is a lower bound for the convergence rate showing that any code must slow down the convergence rate by a factor of roughly $\sqrt{d/\log(1+SNR)}$. Next, we consider a popular class of schemes called \emph{analog coding}, where a linear function of the gradient is sent. We show that a simple scaled transmission analog coding scheme results in a slowdown in convergence rate by a factor of $\sqrt{d(1+1/SNR)}$. This matches the previous lower bound up to constant factors for low SNR, making the scaled transmission scheme optimal at low SNR. However, we show that this slowdown is necessary for any analog coding scheme. In particular, a slowdown in convergence by a factor of $\sqrt{d}$ for analog coding remains even when the SNR tends to infinity. Remarkably, we present a simple quantize-and-modulate scheme that uses \emph{Amplitude Shift Keying} and almost attains the optimal convergence rate at all SNRs.
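To make the scaled transmission idea concrete, here is a minimal NumPy sketch of one round of analog coding over the AWGN channel. The normalization by the realized gradient norm is a simplification chosen for illustration, not necessarily the exact power-control rule analyzed in the paper.

```python
import numpy as np

def scaled_transmission(grad, P, sigma, rng):
    """One round of scaled-transmission analog coding over an AWGN channel.

    The (nonzero) gradient is scaled to meet the average power constraint P,
    transmitted over a channel with per-coordinate noise variance sigma**2,
    and rescaled at the receiver.
    """
    d = grad.shape[0]
    scale = np.sqrt(d * P) / np.linalg.norm(grad)      # meet the power budget
    y = scale * grad + sigma * rng.standard_normal(d)  # AWGN channel output
    return y / scale  # noisy gradient estimate used by the optimizer

rng = np.random.default_rng(0)
g_hat = scaled_transmission(rng.standard_normal(128), P=1.0, sigma=0.5, rng=rng)
```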
In the problem of multiple support recovery, we are given access to linear measurements of multiple sparse samples in $\mathbb{R}^{d}$. These samples can be partitioned into $\ell$ groups, with samples having the same support belonging to the same group. For a given budget of $m$ measurements per sample, the goal is to recover the $\ell$ underlying supports, in the absence of the knowledge of group labels. We study this problem with a focus on the measurement-constrained regime where $m$ is smaller than the support size $k$ of each sample. We design a two-step procedure that estimates the union of the underlying supports first, and then uses a spectral algorithm to estimate the individual supports. Our proposed estimator can recover the supports with $m < k$ measurements per sample, from $\tilde{O}(k^{4}\ell^{4}/m^{4})$ samples. Our guarantees hold for a general, generative model assumption on the samples and measurement matrices. We also provide results from experiments conducted on synthetic data and on the MNIST dataset.
Intelligent electronic devices for power systems often entail high-frequency sampling of electric signals, which enables capturing anomalous signal behavior. However, in normal operation this oversampling is redundant and leads to excessive data being stored or transmitted. This gives rise to a new compression problem where the collected samples should be further subsampled and quantized based on the presence of an anomaly in the underlying signal. We propose an Anomaly-aware Compressive Sampler (ACS), which tests the signal for the presence of an anomaly in a block of samples and subsamples in a hierarchical manner to retain the desired sampling rate. ACS has been designed keeping hardware constraints in mind, using integer operations, an appropriate bit-packing, a simple iterated delta filter, and a streaming data pipeline. We present a mathematical formulation of the problem and analyze the performance of ACS, establishing theoretically its ability to identify anomalies in the signal and adapt the sampling rate. ACS competes with the state-of-the-art algorithm on the better-behaved transmission system data from DOE/EPRI, and outperforms it significantly on real-time distribution system data recorded in our laboratory. Finally, ACS is lightweight and was implemented on an ARM processor.
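The sketch below illustrates the flavor of anomaly-aware subsampling with an iterated delta filter. The single threshold and single subsampling factor are illustrative stand-ins (it assumes blocks longer than the filter order); the actual ACS uses a hierarchical rate ladder, integer arithmetic, and bit-packing.

```python
import numpy as np

def iterated_delta(block, order=2):
    """Apply the delta (successive difference) filter `order` times."""
    for _ in range(order):
        block = np.diff(block)
    return block

def acs_block(block, threshold, keep_every=8):
    """Anomaly-aware subsampling of one block of samples (illustrative).

    If the iterated-delta residual is large, the block is flagged anomalous
    and kept at full rate; otherwise it is subsampled by `keep_every`.
    """
    residual = iterated_delta(block.astype(np.int64))
    anomalous = np.max(np.abs(residual)) > threshold
    return block if anomalous else block[::keep_every]
```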
We consider distributed inference using sequentially interactive protocols. We obtain lower bounds for the minimax sample complexity of interactive protocols under local information constraints, a broad family of resource constraints which captures communication constraints, local differential privacy, and noisy binary queries as special cases. We focus on the inference tasks of learning (density estimation) and identity testing (goodness-of-fit) for discrete distributions under total variation distance, and establish general lower bounds that take into account the local constraints modeled as a channel family. Our main technical contribution is an approach to handle the correlation that builds due to interactivity and quantifies how effectively one can exploit this correlation in spite of the local constraints. Using this, we characterize the optimal sample complexity of learning and testing discrete distributions under total variation distance, for both communication and local differential privacy constraints, in a unified manner. Finally, we provide the first instance of a natural family of local constraints under which interactive protocols strictly outperform noninteractive ones for distribution testing.
Data samples from $\mathbb{R}^d$ with a common support of size $k$ are accessed through $m$ random linear projections per sample. It is well-known that roughly $k$ measurements from a single sample are sufficient to recover the support. In the multiple sample setting, do $k$ overall measurements still suffice when only $m< k$ measurements per sample are allowed? We answer this question in the negative by considering a generative model setting with independent samples drawn from a subgaussian prior. We show that $n=\Theta((k^2/m^2)\cdot\log(k(d-k)))$ samples are necessary and sufficient to recover the support exactly. In turn, this shows that $k$ overall measurements are insufficient for support recovery when $m< k$; instead, we need about $k^{2}/m$ overall measurements. Our proposed sample-optimal estimator has a closed-form expression, has a computational complexity of $O(dnm)$, and is of independent interest.
Two parties observe independent copies of a $d$-dimensional vector and a scalar. They seek to test if their data is correlated or not, namely they seek to test whether the norm $\|\rho\|_2$ of the correlation vector $\rho$ between their observations exceeds $\tau$ or is $0$. To that end, they communicate interactively and declare the output of the test. We show that roughly order $d/\tau^2$ bits of communication are sufficient and necessary for resolving the distributed correlation testing problem above. Furthermore, we establish a lower bound of roughly $d^2/\tau^2$ bits for the communication needed for distributed correlation estimation, rendering the estimate-and-test approach suboptimal in the communication required for distributed correlation testing. For the one-dimensional case with one-way communication, our bounds are tight even in the constant factor and provide a precise dependence of the communication complexity on the probabilities of error of the two types.
We present Rotated Adaptive Tetra-iterated Quantizer (RATQ), a fixed-length quantizer for gradients in first order stochastic optimization. RATQ is easy to implement and involves only a Hadamard transform computation and adaptive uniform quantization with appropriately chosen dynamic ranges. For noisy gradients with almost surely bounded Euclidean norms, we establish an information theoretic lower bound for optimization accuracy using finite precision gradients and show that RATQ almost attains this lower bound. For mean square bounded noisy gradients, we use a gain-shape quantizer which separately quantizes the Euclidean norm and uses RATQ to quantize the normalized unit norm vector. We establish lower bounds for the performance of any optimization procedure and shape quantizer, when used with a uniform gain quantizer. Finally, we propose an adaptive quantizer for gain which, when used with RATQ as the shape quantizer, outperforms uniform gain quantization and is, in fact, close to optimal.
As a by-product, we show that our fixed-length quantizer RATQ has almost the same performance as the optimal variable-length quantizers for distributed mean estimation. Also, we obtain an efficient quantizer for Gaussian vectors which attains a rate very close to the Gaussian rate-distortion function and is, in fact, universal for subgaussian input vectors.
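As a rough illustration of the two ingredients named above, the sketch below applies a randomized Hadamard rotation followed by uniform quantization with a fixed dynamic range (it assumes SciPy and a power-of-two dimension). RATQ's tetra-iterated, adaptively chosen dynamic ranges are precisely the part this sketch deliberately omits.

```python
import numpy as np
from scipy.linalg import hadamard

def rotate(v, signs):
    """Randomized Hadamard rotation: (1/sqrt(d)) * H @ diag(signs) @ v."""
    d = v.shape[0]  # d must be a power of two for hadamard()
    return hadamard(d) @ (signs * v) / np.sqrt(d)

def uniform_quantize(v, M, k):
    """Uniform quantizer with dynamic range [-M, M] and k levels per coordinate."""
    levels = np.linspace(-M, M, k)
    return levels[np.argmin(np.abs(v[:, None] - levels[None, :]), axis=1)]

rng = np.random.default_rng(1)
d = 64
g = rng.standard_normal(d)
signs = rng.choice([-1.0, 1.0], size=d)
q = uniform_quantize(rotate(g, signs), M=3.0, k=16)
g_hat = signs * (hadamard(d) @ q) / np.sqrt(d)  # invert the rotation
```

The rotation spreads the energy of the vector roughly evenly across coordinates, which is what makes a uniform quantizer with a modest dynamic range effective.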
Samples from a high-dimensional first-order auto-regressive process generated by an independently and identically distributed random innovation sequence are observed by a sender which can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant, in real-time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses communication between sampling instants to refine estimates of the latest sample, and we study the following question: Is it better to accumulate communication over multiple slots to send better refined estimates, making the receiver wait longer for every refinement, or to be fast but loose and send new information at every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is universally optimal asymptotically for large dimensions. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality; typically they have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.
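A minimal sketch of the "fast but loose" successive update loop follows: in every slot the sender quantizes the current estimation error and the receiver adds the quantized correction. The inner quantizer here is a generic stand-in, not the ideal spherical code analyzed above.

```python
import numpy as np

def successive_update(x, slots, quantize):
    """Refine the receiver's estimate of sample x once per communication slot."""
    estimate = np.zeros_like(x)
    for _ in range(slots):
        estimate += quantize(x - estimate)  # send only a quantized correction
    return estimate

def toy_quantizer(err, levels=64):
    """Per-coordinate uniform quantizer with a data-dependent dynamic range."""
    scale = np.max(np.abs(err))
    if scale == 0:
        return err
    step = 2 * scale / levels
    return np.round(err / step) * step

x = np.random.default_rng(4).standard_normal(16)
x_hat = successive_update(x, slots=6, quantize=toy_quantizer)
```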
This paper presents a low-cost, open-source, heterogeneous, resource-constrained hardware platform called "Parallella" as a measurement platform for edge computing applications research in the smart grid. The unique hardware architecture of the Parallella provides a multitude of edge compute resources in the form of a Zynq SoC (dual-core ARM + FPGA) and a 16-core co-processor called Epiphany. A multi-functional IED design is demonstrated to showcase the capabilities of the platform. A custom I/O board has been developed for the desktop and embedded versions of Parallella which can interface external daughter boards for measurements and custom peripherals. One such daughter board is an analog sensing board which can measure 3-phase voltages and 4 line currents using a 16-bit synchronous ADC set at 32 kHz. The ADC samples are synchronized with a GPS PPS time clock as the global time reference. This 7-channel high-rate raw waveform data is sent to a cloud server over a bandwidth-limited communication channel using a custom anomaly-aware data compression algorithm implemented on the ARM. A phasor measurement algorithm using the Teager Energy Operator is implemented on the FPGA. A parallel power quality measurement algorithm is implemented on the 16-core co-processor Epiphany.
We study goodness-of-fit and independence testing of discrete distributions in a setting where samples are distributed across multiple users. The users wish to send messages that preserve the privacy of their data while enabling a central server to perform the tests. Under the notion of local differential privacy, we propose simple, sample-optimal, and communication-efficient protocols for these two questions in the noninteractive setting, where in addition users may or may not share a common random seed. In particular, we show that the availability of shared (public) randomness greatly reduces the sample complexity. Underlying our public-coin protocols are privacy-preserving mappings which, when applied to the samples, minimally contract the distance between their respective probability distributions.
The number of confirmed cases of COVID-19 is often used as a proxy for the actual number of ground truth COVID-19 infected cases in both public discourse and policy making. However, the number of confirmed cases depends on the testing policy, and it is important to understand how the number of positive cases obtained using different testing policies reveals the unknown ground truth. We develop an agent-based simulation framework in Python that can simulate various testing policies as well as interventions such as lockdown based on them. The interaction between the agents can take into account various communities and mobility patterns. A distinguishing feature of our framework is the presence of another flu-like illness with symptoms similar to COVID-19, that allows us to model the noise in selecting the pool of patients to be tested. We instantiate our model for the city of Bengaluru in India, using census data to distribute agents geographically, and traffic flow mobility data to model long-distance interactions and mixing. We use the simulation framework to compare the performance of three testing policies: Random Symptomatic Testing (RST), Contact Tracing (CT), and a new Location Based Testing policy (LBT). We observe that if a sufficient fraction of symptomatic patients come out for testing, then RST can capture the ground truth quite closely even with very few daily tests. However, CT consistently captures more positive cases. Interestingly, our new LBT, which is operationally less intensive than CT, gives performance that is comparable with CT. In another direction, we compare the efficacy of these three testing policies in enabling lockdown, and observe that CT flattens the ground truth curve maximally, followed closely by LBT, and significantly better than RST.
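As a toy illustration of how the flu-like illness injects noise into testing, the sketch below implements one day of Random Symptomatic Testing under assumed pool sizes; the actual framework layers geography, mobility, and interventions on top of this.

```python
import numpy as np

def random_symptomatic_testing(n_covid_sympt, n_flu_sympt, daily_tests, rng):
    """One day of Random Symptomatic Testing (RST), illustrative only.

    The testing pool mixes COVID-19 symptomatic agents with agents showing
    similar flu-like symptoms; tests are drawn uniformly from this pool and
    only the COVID-19 cases return positive.
    """
    pool = np.array([1] * n_covid_sympt + [0] * n_flu_sympt)
    tested = rng.choice(pool, size=min(daily_tests, pool.size), replace=False)
    return int(tested.sum())  # confirmed cases reported that day

rng = np.random.default_rng(5)
confirmed = random_symptomatic_testing(200, 1000, daily_tests=100, rng=rng)
```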
A central server needs to perform statistical inference based on samples that are distributed over multiple users who can each send a message of limited length to the center. We study problems of distribution learning and identity testing in this distributed inference setting and examine the role of shared randomness as a resource. We propose a general-purpose simulate-and-infer strategy that uses only private-coin communication protocols and is sample-optimal for distribution learning. This general strategy turns out to be sample-optimal even for distribution testing among private-coin protocols. Interestingly, we propose a public-coin protocol that outperforms simulate-and-infer for distribution testing and is, in fact, sample-optimal. Underlying our public-coin protocol is a random hash that when applied to the samples minimally contracts the chi-squared distance of their distribution to the uniform distribution.
We consider a distributed inference problem where only limited information is allowed from each sample. We present a general framework where multiple players are given one independent sample each, about which they can only provide limited information to a central referee. Motivated by important instances of communication and privacy constraints, in our abstraction we allow each player to describe its observed sample to the referee using a channel from a prespecified family of channels $\mathcal{W}$. This family $\mathcal{W}$ can be instantiated to capture both the communication and privacy constrained settings and beyond. The central referee uses the messages from the players to solve an inference problem for the unknown underlying distribution that generated the samples of the players. We derive lower bounds for the sample complexity of learning and testing discrete distributions in this information-constrained setting.
Underlying our lower bounds is a quantitative characterization of the contraction in chi-square distance between the observed distributions of the samples when an information constraint is placed. This contraction is captured in a local neighborhood in terms of chi-square and decoupled chi-square fluctuations of a given channel, two quantities we introduce in this work. The former captures the average distance between two product distributions and the latter the distance of a product distribution to a mixture of product distributions. These quantities may be of independent interest. For discrete distribution learning and testing, our quantitative characterization of this contraction entails a new measure of information for channel families that captures the restriction they pose on statistical inference. As a corollary, we quantify the sample complexity blow-up in the learning and testing settings when enforcing the corresponding local information constraints. In contrast to prior work in this area, we systematically study the role of randomness and consider both private- and public-coin protocols. Our approach leads to bounds that allow us to quantitatively separate the performance of private- and public-coin protocols for testing problems: the sample complexity of testing is orderwise higher when restricted to private-coin protocols.
A transmitter observing a sequence of independent and identically distributed random variables seeks to keep a receiver updated about its latest observations. The receiver need not be apprised of each symbol seen by the transmitter, but needs to output a symbol at each time instant $t$. If at time $t$ the receiver outputs the symbol seen by the transmitter at time $U(t) \leq t$, the age of information at the receiver at time $t$ is $t - U(t)$. We study the design of lossless source codes that enable transmission with minimum average age at the receiver. We show that the asymptotic minimum average age can be roughly attained by Shannon codes for a tilted version of the original pmf generating the symbols, which can be computed easily by solving an optimization problem. Furthermore, we exhibit an example with alphabet $\mathcal{X}$ where the Shannon codes for the original pmf incur an asymptotic average age a factor of $O(\log |\mathcal{X}|)$ more than that achieved by our codes. Underlying our prescription for optimal codes is a new variational formula for integer moments of random variables, which may be of independent interest. Also, we discuss possible extensions of our formulation to randomized schemes and the erasure channel, and include a treatment of the related problem of source coding for minimum average queuing delay.
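Purely as an illustration of "Shannon codes for a tilted pmf": the sketch below builds Shannon code lengths for a power-tilting $q \propto p^{\beta}$, where the exponent $\beta$ is a hypothetical stand-in; the paper obtains the actual tilt by solving the optimization problem mentioned above.

```python
import numpy as np

def shannon_lengths(q):
    """Shannon code lengths: ceil(log2(1/q_x)) for each symbol."""
    return np.ceil(-np.log2(q)).astype(int)

def tilted_pmf(p, beta):
    """A power-tilted version of p: q_x proportional to p_x ** beta.

    Note: beta is a hypothetical stand-in here; the paper derives the
    actual tilt from an optimization problem.
    """
    q = p ** beta
    return q / q.sum()

p = np.array([0.7, 0.2, 0.05, 0.05])
print(shannon_lengths(p))                    # lengths for the original pmf
print(shannon_lengths(tilted_pmf(p, 0.5)))   # flatter lengths for a tilted pmf
```

Tilting flattens the pmf, shortening the longest codewords; intuitively this limits the worst-case transmission time that drives up the age.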
The strong converse for a coding theorem shows that the optimal asymptotic rate possible with vanishing error cannot be improved by allowing a fixed error. Building on a method introduced by Gu and Effros for centralized coding problems, we develop a general and simple recipe for proving strong converses that is applicable to distributed problems as well. Heuristically, our proof of the strong converse mimics the standard steps for proving a weak converse, except that we apply those steps to a modified distribution obtained by conditioning the original distribution on the event that no error occurs. A key component of our recipe is the replacement of the hard Markov constraints implied by the distributed nature of the problem with a soft information cost, using a variational formula introduced by Oohama. We illustrate our method by providing a short proof of the strong converse for the Wyner-Ziv problem and new strong converse theorems for interactive function computation, common randomness and secret key agreement, and the wiretap channel.
The task of manipulating correlated random variables in a distributed setting has received attention in the fields of both Information Theory and Computer Science. Often, shared correlations can be converted, using a small amount of communication, into perfectly shared uniform random variables. Such perfect shared randomness, in turn, enables the solution of many tasks. Even the reverse conversion of perfectly shared uniform randomness into variables with a desired form of correlation turns out to be insightful and technically useful. In this article, we describe progress-to-date on such problems and lay out pertinent measures, achievability results, limits of performance, and point to new directions.
Two parties observing correlated data seek to exchange their data using interactive communication. How many bits must they communicate? We propose a new interactive protocol for data exchange which increases the communication size in steps until the task is done. We also derive a lower bound on the minimum number of bits that is based on relating the data exchange problem to the secret key agreement problem. Our single-shot analysis applies to all discrete random variables and yields upper and lower bounds of a similar form. In fact, the bounds are asymptotically tight and lead to a characterization of the optimal rate of communication needed for data exchange for a general source sequence, such as a mixture of IID random variables, as well as the optimal second-order asymptotic term in the length of communication needed for data exchange for IID random variables, when the probability of error is fixed. This gives a precise characterization of the asymptotic reduction in the length of optimal communication due to interaction; in particular, two-sided Slepian-Wolf compression is strictly suboptimal.
A simulation of an interactive protocol entails the use of interactive communication to produce the output of the protocol to within a fixed statistical distance $\epsilon$. Recent works have proposed that the information complexity of the protocol plays a central role in characterizing the minimum number of bits that the parties must exchange for a successful simulation, namely the distributional communication complexity of simulating the protocol. Several simulation protocols have been proposed with communication complexity depending on the information complexity of the simulated protocol. However, in the absence of any general lower bounds for distributional communication complexity, the conjectured central role of information complexity is far from settled. We fill this gap and show that the distributional communication complexity of $\epsilon$-simulating a protocol is bounded below by the $\epsilon$-tail $\lambda_\epsilon$ of the information complexity density, a random variable with information complexity as its expected value. For protocols with a bounded number of rounds, we give a simulation protocol that yields a matching upper bound. Thus, it is not information complexity but $\lambda_\epsilon$ that governs the distributional communication complexity. As applications of our bounds, in the amortized regime for product protocols, we identify the exact second-order term, together with the precise dependence on $\epsilon$. For general protocols, such as a mixture of two product protocols or for the amortized case when the repetitions are not independent, we derive a general formula for the leading asymptotic term. These results sharpen and significantly extend known results in the amortized regime. In the single-shot regime, our lower bound clarifies the dependence of communication complexity on $\epsilon$. We illustrate this with an example that exhibits an arbitrary separation between distributional communication complexity and information complexity for all sufficiently small $\epsilon$.
Multiple parties observing correlated data seek to recover each other's data and attain omniscience. To that end, they communicate interactively over a noiseless broadcast channel: each bit transmitted over this channel is received by all the parties. We give a universal interactive communication protocol for omniscience which requires communication of rate only $\mathcal{O}\left(n^{-1/2}\sqrt{\log n}\right)$ more than the optimal rate for every independent and identically distributed (in time) sequence of data. Drawing on the duality between omniscience and secret key agreement due to Csiszár and Narayan, as a by-product, we obtain a universal protocol for generating a multiparty secret key of rate at most $\mathcal{O}\left(n^{-1/2}\sqrt{\log n}\right)$ less than the maximum rate possible.
It was shown recently that estimating the Shannon entropy $H(p)$ of a discrete $k$-symbol distribution $p$ requires $\Theta(k/\log k)$ samples, a number that grows near-linearly in the support size. In many applications $H(p)$ can be replaced by the more general Rényi entropy of order $\alpha$, $H_\alpha(p)$. We determine the number of samples needed to estimate $H_\alpha(p)$ for all $\alpha$, showing that $\alpha < 1$ requires a super-linear number of samples, roughly $k^{1/\alpha}$; noninteger $\alpha>1$ requires a near-linear number, roughly $k$; but, perhaps surprisingly, integer $\alpha>1$ requires only $\Theta(k^{1-1/\alpha})$ samples. Furthermore, building on a recently established connection between polynomial approximation and estimation of additive functions of the form $\sum_x f(p_x)$, we reduce the sample complexity for noninteger values of $\alpha$ by a factor of $\log k$ compared to the empirical estimator. The estimators achieving these bounds are simple and run in time linear in the number of samples. Our lower bounds provide explicit constructions of distributions with different Rényi entropies that are hard to distinguish.
We revisit the problem of secret key agreement using interactive public communication for two parties and propose a new secret key agreement protocol. The protocol attains the secret key capacity for general observations and attains the second-order asymptotic term in the maximum length of a secret key for independent and identically distributed observations. In contrast to the previously suggested secret key agreement protocols, the proposed protocol uses interactive communication. In fact, the standard one-way communication protocol used prior to this work fails to attain the asymptotic results above. Our converse proofs rely on a recently established upper bound for secret key lengths. Both our lower and upper bounds are derived in a single-shot setup and the asymptotic results are obtained as corollaries.
The information theoretic approach to security entails harnessing the correlated randomness available in nature to establish security. It uses tools from information theory and coding and yields provable security, even against an adversary with unbounded computational power. However, the feasibility of this approach in practice depends on the development of efficiently implementable schemes. In this article, we review a special class of practical schemes for information theoretic security that are based on 2-universal hash families. Specific cases of secret key agreement and wiretap coding are considered, and general themes are identified. The scheme presented for wiretap coding is modular and can be implemented easily by including an extra pre-processing layer over the existing transmission codes.
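For concreteness, here is the textbook Carter-Wegman construction of a 2-universal hash family, the kind of family such schemes build on; the specific parameters below are illustrative.

```python
import random

def carter_wegman(p, m):
    """Draw h(x) = ((a*x + b) mod p) mod m from a 2-universal family.

    For a prime p and any fixed x != y (both below p), the collision
    probability over the random draw of (a, b) is at most 1/m.
    """
    a = random.randrange(1, p)
    b = random.randrange(0, p)
    return lambda x: ((a * x + b) % p) % m

h = carter_wegman(p=2_147_483_647, m=256)  # 2**31 - 1 is prime
digest = h(123456789)
```

In secret key agreement, such a hash is applied to the correlated observations to extract bits that are nearly uniform and independent of the eavesdropper's view (privacy amplification).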
We consider information theoretic secret key agreement and secure function computation by multiple parties observing correlated data, with access to an interactive public communication channel. Our main result is an upper bound on the secret key length, which is derived using a reduction of binary hypothesis testing to multiparty secret key agreement. Building on this basic result, we derive new converses for multiparty secret key agreement. Furthermore, we derive converse results for the oblivious transfer problem and the bit commitment problem by relating them to secret key agreement. Finally, we derive a necessary condition for the feasibility of secure computing by trusted parties that seek to compute a function of their collective data, using interactive public communication that by itself does not give away the value of the function. In many cases, we strengthen and improve upon previously known converse bounds. Our results are single-shot and do not assume that the observations are independent and identically distributed. For the case when the observations are indeed independent and identically distributed, we derive strong versions of previously known converses.
A set of $m$ terminals, observing correlated signals, communicate interactively to generate common randomness for a given subset of them. Knowing only the communication, how many direct queries of the value of the common randomness will resolve it? A general upper bound, valid for arbitrary signal alphabets, is developed for the number of such queries by using a query strategy that applies to all common randomness and associated communication. When the underlying signals are independent and identically distributed repetitions of $m$ correlated random variables, the number of queries can be exponential in the signal length. For this case, the mentioned upper bound is tight and leads to a single-letter formula for the largest query exponent, which coincides with the secret key capacity of a corresponding multiterminal source model. In fact, the upper bound constitutes a strong converse for the optimum query exponent, and implies also a new strong converse for secret key capacity. A key tool, estimating the size of a large probability set in terms of Rényi entropy, is interpreted separately also as a lossless block coding result for general sources. As a particularization, it yields the classic result for a discrete memoryless source.
We study the generation of a secret key of maximum rate by a pair of terminals observing correlated sources and with the means to communicate over a noiseless public communication channel. Our main result establishes a structural equivalence between the generation of a maximum rate secret key and the generation of a common randomness that renders the observations of the two terminals conditionally independent. The minimum rate of such common randomness, termed {\it interactive common information}, is related to Wyner's notion of common information, and serves to characterize the minimum rate of interactive public communication required to generate an optimum rate secret key. This characterization yields a single-letter expression for the aforementioned communication rate when the number of rounds of interaction is bounded. An application of our results shows that interaction does not reduce this rate for binary symmetric sources. Further, we provide an example for which interaction does reduce the rate of communication above. Also, certain invariance properties of common information quantities are established that may be of independent interest.
A set of terminals observe correlated data and seek to compute functions of the data using interactive public communication. At the same time, it is required that the value of a private function of the data remains concealed from an eavesdropper observing this communication. In general, the private function and the functions computed by the nodes can all be different. We show that a class of functions is securely computable if and only if the conditional entropy of the data given the value of the private function is greater than the least rate of interactive communication required for a related multiterminal source-coding task. A single-letter formula is provided for this rate in special cases.
We consider an information theoretic model of a communication channel with a time-varying probability law. Specifically, our model consists of a state dependent discrete memoryless channel, in which the underlying state process is independent and identically distributed with known probability distribution, and for which the channel output at any time instant depends on the inputs and states only through their current values. For this channel, we provide a strong converse result for its capacity, explaining the structure of optimal transmission codes. Exploiting this structure, we obtain upper bounds for the reliability function when the transmitter is provided channel state information causally and noncausally. Instrumental to our proofs is a new technical result which provides an upper bound on the rate of codes with codewords that are "conditionally typical over large message dependent subsets of a typical set of state sequences." This technical result is a nonstraightforward extension of an analogous result for a discrete memoryless channel without states; the latter provides a bound on the rate of a good code with codewords of a fixed composition.
A subset of a set of terminals that observe correlated signals seek to compute a function of the signals using public communication. It is required that the value of the function be concealed from an eavesdropper with access to the communication. We show that the function is securely computable if and only if its entropy is less than the capacity of a new secrecy generation model, for which a single-letter characterization is provided.
A finite sequence of nonnegative integers is called graphic if its terms can be realized as the degrees of vertices of a finite simple graph. We present two new characterizations of graphic sequences. The first of these is similar to a result of Havel and Hakimi, and the second is equivalent to a result of Erdős and Gallai, thus providing a short proof of the latter result. We also show how some known results concerning degree sets and degree sequences follow from our results.
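Since the Erdős-Gallai condition is easy to state, here is a short Python check for graphic sequences based on it (this is the classical test, not the new characterizations from the paper):

```python
def is_graphic(seq):
    """Erdős-Gallai test: a nonincreasing sequence of nonnegative integers
    d_1 >= ... >= d_n is graphic iff the sum is even and, for each k,
        sum_{i<=k} d_i <= k*(k-1) + sum_{i>k} min(d_i, k).
    """
    d = sorted(seq, reverse=True)
    if sum(d) % 2 != 0:
        return False
    for k in range(1, len(d) + 1):
        lhs = sum(d[:k])
        rhs = k * (k - 1) + sum(min(x, k) for x in d[k:])
        if lhs > rhs:
            return False
    return True

assert is_graphic([3, 3, 2, 2, 2])   # a 5-cycle plus one chord realizes it
assert not is_graphic([3, 3, 3, 1])  # fails the condition at k = 2
```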
We study high-dimensional sparse estimation under three natural constraints: communication constraints, local privacy constraints, and linear measurements (compressive sensing). Without sparsity assumptions, it has been established that interactivity cannot improve the minimax rates of estimation under these information constraints. The question of whether interactivity helps with natural inference tasks has been a topic of active research. We settle this question in the affirmative for the prototypical problems of high-dimensional sparse mean estimation and compressive sensing, by demonstrating a gap between interactive and noninteractive protocols. We further establish that the gap increases when we have more structured sparsity: for block sparsity this gap can be as large as polynomial in the dimensionality. Thus, the more structured the sparsity is, the greater is the advantage of interaction. Proving the lower bounds requires a careful breaking of a sum of correlated random variables into independent components using Baranyai’s theorem on decomposition of hypergraphs, which might be of independent interest.
Consider distributed optimization of smooth convex functions over $\mathbb{R}^d$ where $K$ independent clients can provide estimates of the gradient. Assume that all the gradient estimates are within Euclidean distance $\sigma$ of the true gradient and that each oracle's output must be compressed to $r$ bits. For this problem, in the centralized setting with one client, the optimal convergence rate using $T$ iterations is known to be roughly $\sqrt{\sigma^2/ T}$. We show that in the distributed setting the optimal convergence rate for large $K$ is roughly $\sqrt{\sigma^2/ T}\cdot\sqrt{d/Kr}$. Our main contribution is an algorithm which attains this rate by exploiting the fact that the gradient estimates are close to each other. Specifically, our gradient compression scheme first uses half of the parties to form side information and then uses a Wyner-Ziv compression scheme to compress the remaining half of the gradient estimates.
We demonstrate tele-driving operation for an electric vehicle capable of stopping itself in case of system failure, over a captive LTE network deployed in a university campus. Our electronically controlled vehicle is driven remotely by an operator from a control room which receives the multi-camera real-time video feed from the vehicle over this network. Our primary contributions include the responsive emergency braking mechanism for the vehicle, a modular vehicle design based on the CAN bus, a low-latency LTE MAC scheduler design, and modifications to the popular video tool FFmpeg to support low-latency real-time video streaming. Our demonstration shows complete integration of the different components, i.e., the vehicle, the LTE network, and the remote driving application. Another salient feature of our system is the O-RAN compliant RAN awareness module and KPI (Key Performance Indicator) application, which enables real-time network performance monitoring.
We consider the problem of distributed estimation and testing of discrete distributions under local information constraints that include communication and privacy as special cases. Our main result is a unified method that establishes tight bounds for interactive protocols under both constraints and for both problems. Our main technical contribution is an average information bound which connects learning and testing and handles correlations due to interactivity. While we establish that interactivity does not help for learning and testing under both constraints above, we also illustrate a natural family of "leaky query" local constraints under which interactive protocols strictly outperform noninteractive ones for identity testing.
We study the problem of recovering the common $k$-sized support of a set of $n$ samples of dimension $d$, using $m$ noisy linear measurements per sample. Most prior work has focused on the case when $m$ exceeds $k$, in which case $n$ of the order $(k/m)\log(d/k)$ is both necessary and sufficient. Thus, in this regime, only the total number of measurements across the samples matters, and there is not much benefit in getting more than $k$ measurements per sample. In the measurement-constrained regime where we have access to fewer than $k$ measurements per sample, we show an upper bound of $O((k^{2}/m^{2})\log d)$ on the sample complexity for successful support recovery when $m\ge 2\log d$. Along with the lower bound from our previous work, this shows a sharp phase transition for the sample complexity of this problem around $k/m=1$. In fact, our proposed algorithm is sample-optimal in both regimes. It follows that, in the $m\ll k$ regime, multiple measurements from the same sample are more valuable than measurements from different samples.
In the problem of multiple support recovery, we are given access to linear measurements of multiple sparse samples in $\mathbb{R}^{d}$. These samples can be partitioned into $\ell$ groups, with samples having the same support belonging to the same group. For a given budget of $m$ measurements per sample, the goal is to recover the $\ell$ underlying supports, in the absence of the knowledge of group labels. We study this problem with a focus on the measurement-constrained regime where $m$ is smaller than the support size $k$ of each sample. We design a two-step procedure that estimates the union of the underlying supports first, and then uses a spectral algorithm to estimate the individual supports. Our proposed estimator can recover the supports with $m < k$ measurements per sample, from $\tilde{O}(k^{4}\ell^{4}/m^{4})$ samples. Our guarantees hold for a general, generative model assumption on the samples and measurement matrices.
We consider convex optimization on a $d$-dimensional space where coded gradients are sent over an additive Gaussian noise channel with variance $\sigma^2$. The codewords satisfy an average power constraint $P$, resulting in a signal-to-noise ratio (SNR) of $P/\sigma^2$. Many schemes have been proposed for this problem, termed over-the-air optimization, in recent years. We present lower and upper bounds for the convergence rates for over-the-air optimization. Our first result is a lower bound for the convergence rate showing that any code must slow down the convergence rate by a factor of roughly $\sqrt{d/\log(1+SNR)}$. Next, we consider a popular class of schemes called analog coding, where a linear function of the gradient is sent. We show that a simple scaled transmission analog coding scheme results in a slowdown in convergence rate by a factor of $\sqrt{d(1+1/SNR)}$. This matches the previous lower bound up to constant factors for low SNR, making the scaled transmission scheme optimal at low SNR. However, we show that this slowdown is necessary for any analog coding scheme. In particular, a slowdown in convergence by a factor of $\sqrt{d}$ remains even when the SNR tends to infinity, a clear shortcoming of analog coding schemes at high SNR. Remarkably, we present a simple quantize-and-modulate scheme that uses Amplitude Shift Keying and almost attains the optimal convergence rate at all SNRs.
We obtain tight minimax rates for the problem of distributed estimation of discrete distributions under communication constraints, where $n$ users observing $m$ samples each can broadcast only $\ell$ bits. Our main result is a full characterization of the rate as a function of $m$, $\ell$, the domain size, and the number of users. While previous work has focused on the setting where each user only holds one sample, we show that as $m$ grows the $\ell_1$ error rate gets reduced by a factor of $\sqrt{m}$. However, for large $m$ we observe an interesting phase transition: the dependence of the error rate on the communication constraint $\ell$ changes from \smash{$1/\sqrt{2^{\ell}}$} to \smash{$1/\sqrt{\ell}$}.
We consider density estimation for Besov spaces when the estimator is restricted to use only a limited number of bits about each sample. We provide a noninteractive adaptive estimator which exploits the sparsity of wavelet bases, along with a simulate-and-infer technique from parametric estimation under communication constraints. We show that our estimator is nearly rate-optimal by deriving minimax lower bounds that hold even when interactive protocols are allowed. Interestingly, while our wavelet-based estimator is almost rate-optimal for Sobolev spaces as well, it is unclear whether the standard Fourier basis, which arises naturally for those spaces, can be used to achieve the same performance.
We revisit first-order optimization under local information constraints such as local privacy, gradient quantization, and computational constraints limiting access to a few coordinates of the gradient. In this setting, the optimization algorithm is not allowed to directly access the complete output of the gradient oracle, but only gets limited information about it subject to the local information constraints. We study the role of adaptivity in processing the gradient output to obtain this limited information from it. We consider optimization for both convex and strongly convex functions and obtain tight or nearly tight lower bounds for the convergence rate, when adaptive gradient processing is allowed. Prior work was restricted to convex functions and allowed only nonadaptive processing of gradients. For both of these function classes and for the three information constraints mentioned above, our lower bound implies that adaptive processing of gradients cannot outperform nonadaptive processing in most regimes of interest. We complement these results by exhibiting a natural optimization problem under information constraints for which adaptive processing of gradient strictly outperforms nonadaptive processing.
Communication efficient distributed mean estimation is an important primitive that arises in many distributed learning and optimization scenarios such as federated learning. Without any probabilistic assumptions on the underlying data, we study the problem of distributed mean estimation where the server has access to side information. We propose Wyner-Ziv estimators, which are communication and computationally efficient and near-optimal when an upper bound for the distance between the side information and the data is known. As a corollary, we also show that our algorithms provide efficient schemes for the classic Wyner-Ziv problem in information theory. In a different direction, when there is no knowledge assumed about the distance between side information and the data, we present an alternative Wyner-Ziv estimator that uses correlated sampling. This latter setting offers universal recovery guarantees, and perhaps will be of interest in practice when the number of users is large, where keeping track of the distances between the data and the side information may not be possible.
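To illustrate the flavor of Wyner-Ziv estimators when a distance bound is known, here is a per-coordinate modulo (binning) quantizer sketch: the encoder sends only the bin index modulo a small number, and the decoder resolves the ambiguity using its side information. This is the standard binning idea, not necessarily the exact estimator from the paper.

```python
import numpy as np

def wz_encode(x, step, num_bins):
    """Send only the per-coordinate grid index modulo num_bins."""
    return np.round(x / step).astype(int) % num_bins

def wz_decode(msg, side_info, step, num_bins):
    """Pick the grid point with the received residue closest to the side info."""
    base = np.round(side_info / step).astype(int)
    offsets = (msg - base) % num_bins
    offsets = np.where(offsets > num_bins // 2, offsets - num_bins, offsets)
    return (base + offsets) * step

rng = np.random.default_rng(2)
y = rng.standard_normal(8)              # side information at the decoder
x = y + 0.05 * rng.standard_normal(8)   # data close to the side information
x_hat = wz_decode(wz_encode(x, 0.02, 16), y, 0.02, 16)
# Decoding is exact up to the quantization step whenever |x - y| stays below
# roughly step * num_bins / 2 in each coordinate.
```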
We present Rotated Adaptive Tetra-iterated Quantizer (RATQ), a fixed-length quantizer for gradients in first order stochastic optimization. RATQ is easy to implement and involves only a Hadamard transform computation and adaptive uniform quantization with appropriately chosen dynamic ranges. For noisy gradients with almost surely bounded Euclidean norms, we establish an information theoretic lower bound for optimization accuracy using finite precision gradients and show that RATQ almost attains this lower bound. For mean square bounded noisy gradients, we use a gain-shape quantizer which separately quantizes the Euclidean norm and uses RATQ to quantize the normalized unit norm vector. We establish lower bounds for the performance of any optimization procedure and shape quantizer, when used with a uniform gain quantizer. Finally, we propose an adaptive quantizer for gain which, when used with RATQ as the shape quantizer, outperforms uniform gain quantization and is, in fact, close to optimal.
We study goodness-of-fit of discrete distributions in the distributed setting, where samples are divided between multiple users who can only release a limited amount of information about their samples due to various information constraints. Recently, a subset of the authors showed that having access to a common random seed (i.e., shared randomness) leads to a significant reduction in the sample complexity of this problem. In this work, we provide a complete understanding of the interplay between the amount of shared randomness available, the stringency of information constraints, and the sample complexity of the testing problem by characterizing a tight trade-off between these three parameters. We provide a general distributed goodness-of-fit protocol that, as a function of the amount of shared randomness, interpolates smoothly between the private- and public-coin sample complexities. We complement our upper bound with a general framework to prove lower bounds on the sample complexity of this testing problem under limited shared randomness. Finally, we instantiate our bounds for the two archetypal information constraints of communication and local privacy, and show that our sample complexity bounds are optimal as a function of all the parameters of the problem, including the amount of shared randomness.
Independent draws from a $d$-dimensional spherical Gaussian distribution are distributed across users, each holding one sample. A central server seeks to distinguish between the two hypotheses: the distribution has zero mean, or the mean has $\ell_2$-norm at least $\varepsilon$, a pre-specified threshold. However, the users can each transmit at most $\ell$ bits to the server. This is the problem of detecting whether an observed signal is simply white noise in a distributed setting. We study this distributed testing problem with and without the availability of a common randomness shared by the users. We design schemes with and without such shared randomness and bound the sample complexities they achieve. We then obtain lower bounds for protocols with public randomness, which are tight when $\ell=O(1)$. We finally conclude with several conjectures and open problems.
Samples from a high-dimensional AR[1] process are quantized and sent over a time-slotted communication channel of finite capacity. The receiver seeks to form an estimate of the process in real-time. We consider the slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses communication between sampling instants to update the estimates of the latest sample. We show that there exist quantizers that render the fast but loose version of this scheme, which updates estimates in every slot, universally optimal asymptotically. However, we provide evidence that most practical quantizers will require a judiciously chosen update frequency.
We consider stochastic optimization over $\ell_p$ spaces using access to a first-order oracle. We ask: What is the minimum precision required for oracle outputs to retain the unrestricted convergence rates? We characterize this precision for every $p\geq 1$ by deriving information theoretic lower bounds and by providing quantizers that (almost) achieve these lower bounds. Our quantizers are new and easy to implement. In particular, our results are exact for $p=2$ and $p=\infty$, showing that the minimum precision needed in these settings is $\Theta(d)$ and $\Theta(\log d)$, respectively. The latter result is surprising since recovering the gradient vector will require $\Omega(d)$ bits.
We consider the use of a Wi-Fi (IEEE 802.11n/r) network for remote control of a vehicle using video transmission on the uplink and control signals for the actuator on the downlink. We have set up a network with multiple access points (AP) providing indoor and outdoor coverage, which connects an unmanned ground vehicle (UGV) to a remote command center. Additionally, our setup includes a redundant IEEE 802.11p link for sending control messages over the downlink with high reliability and low latency. We study the end-to-end communication delay and complete a latency profiling for each sub-component, including the video codec and the Wi-Fi links. Furthermore, we provide guidelines for practical design choices, including the optimal configuration of the scanning process during handoffs and the codec parameters for delay optimization. Overall, our proposed configuration reduces the end-to-end delay significantly in comparison with the default configuration.
This paper describes CORNET, a co-simulation middleware for applications involving multi-robot systems like a network of Unmanned Aerial Vehicle (UAV) systems. Design of such systems requires knowledge of the flight dynamics of UAVs and of the communication links connecting UAVs with each other or with the ground control station. Moreover, UAV networks are dynamic, distinct from other ad-hoc networks, and require protocols that can adapt to high mobility, dynamic topology, and changing link quality on power- and resource-constrained platforms. Therefore, it is necessary to co-design the UAV path planning algorithms and the communication protocols. The proposed co-simulation framework integrates existing tools to simulate flight dynamics and network-related aspects. Gazebo with the robot operating system (ROS) is used as the physical-system UAV simulator and NS-3 is used as the network simulator, to jointly capture the cyber-physical system (CPS) aspects of multi-UAV systems. A particular aspect we address is synchronizing time and position across the two simulation environments, and we provide APIs to allow easy migration of the algorithms to real platforms.
In the multiparty data exchange problem, parties with correlated data seek to recover each other's data. We study practical, universal schemes for this problem that accomplish data exchange using optimal rate communication for any distribution of observations. Our focus in this work is on binary symmetric distributions where each user observes bit sequences with uniform marginals. We consider binary symmetric Markov trees as a natural multiparty extension of the binary symmetric source and seek universally rate-optimal algorithms for this family. Our main theoretical result is a completeness theorem which shows that any universal Slepian-Wolf scheme can be converted efficiently to a universal data exchange scheme for a subfamily of binary symmetric Markov trees. We instantiate this result using Polar codes. In particular, we provide a universal Slepian-Wolf code using Polar codes and use our reduction algorithm to convert it to a multiparty data exchange protocol. The resulting scheme provides the first practical construction of codes for universal data exchange, which we evaluate numerically.
Data samples from $\mathbb{R}^d$ with common support of size $k$ are accessed through $m$ linear projections per sample. In the measurement-starved regime of $m < k$, how many samples are needed to recover the common support? We answer this question for a generative model with independent samples drawn from a subgaussian prior. We show that $n = \Theta((k^2/m^2) \log(k(d - k)))$ samples are necessary and sufficient to exactly recover the support. Our proposed sample-optimal estimator has a closed-form expression and has computational complexity of $O(dnm)$.
We present an information theoretic proof of the nonsignalling multiprover parallel repetition theorem, a recent extension of its two-prover variant that underlies many hardness of approximation results. The original proofs used de Finetti type decomposition for strategies. We present a new proof that is based on a technique we introduced recently for proving strong converse results in multiuser information theory and entails a change of measure after replacing hard information constraints with soft ones.
Multiple users, each getting one sample from an unknown distribution, seek to enable a central server to conduct statistical inference. However, each user can only provide a limited amount of information about its sample to the server. We propose a unified framework to study such distributed inference problems under local information constraints. We model the local information constraints by a set of channels $\mathcal{W}$: each user chooses a channel from $\mathcal{W}$, and then passes their data through this channel before transmitting the output to the server. The goal in this distributed setting is to understand the blow-up in data requirement imposed by the information constraints, compared to the centralized setting where all data samples are available to the server.
A central server needs to perform statistical inference based on samples that are distributed over multiple users who can each send a message of limited length to the center. We study problems of distribution learning and identity testing in this distributed inference setting and examine the role of shared randomness as a resource. We propose a general purpose simulate-and-infer strategy that uses only private-coin communication protocols and is sample-optimal for distribution learning. This general strategy turns out to be sample-optimal even for distribution testing among private-coin protocols. Interestingly, we propose a public-coin protocol that outperforms simulate-and-infer for distribution testing and is, in fact, sample-optimal. Underlying our public-coin protocol is a random hash that when applied to the samples minimally contracts the chi-squared distance of their distribution from the uniform distribution.
We study the problem of distribution testing when the samples can only be accessed using a locally differentially private mechanism, and consider two representative testing questions of identity (goodness-of-fit) and independence testing for discrete distributions. First, we construct tests that use existing, general-purpose locally differentially private mechanisms, such as the popular Rappor or the recently introduced Hadamard Response, for collecting data, and propose tests that are sample-optimal when we insist on using these mechanisms. Next, we allow bespoke mechanisms designed specifically for testing and introduce the Randomized Aggregated Private Testing Optimal Response (Raptor) mechanism, which is remarkably simple and requires only one bit of communication per sample. We show that our proposed mechanism yields sample-optimal tests, and in particular, outperforms any test based on Rappor or Hadamard Response. A distinguishing feature of our optimal mechanism is that, in contrast to existing mechanisms, it uses public randomness.
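A minimal sketch of a one-bit, public-coin, locally private mechanism in the spirit of Raptor follows; the membership-set construction here is an assumed reading of the mechanism and should be treated as a sketch, not the exact protocol.

```python
import numpy as np

def one_bit_report(x, S, eps, rng):
    """Report an eps-LDP randomized-response bit for the event {x in S}.

    S is assumed to be a random subset of the domain drawn from shared
    (public) randomness; the exact choice of S in Raptor may differ.
    """
    bit = int(x in S)
    keep = rng.random() < np.exp(eps) / (np.exp(eps) + 1)
    return bit if keep else 1 - bit

rng = np.random.default_rng(3)
S = set(rng.choice(100, size=50, replace=False))  # drawn from public coins
reports = [one_bit_report(int(x), S, eps=1.0, rng=rng)
           for x in rng.integers(0, 100, size=500)]
```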
We present a construction for a universal channel code with feedback using Polar codes. Our construction includes an error detection mechanism that is used to compute the ACK/NACK feedback directly from the received vector, without a higher-layer CRC. Our scheme, termed the Repeat-Top Polar Code (RT-Polar), builds on a rate-compatible Polar code and retransmits the $t$ message bits sent over the most reliable polarized good channels over the least reliable good channels. At the decoder, these two $t$-bit strings are decoded and compared to detect an error. Through simulations, we illustrate the universal performance of our scheme for a binary symmetric channel with an unknown flipover probability. Our scheme performs comparably with a genie-aided scheme, where the detection mechanism is assumed to be error-free, for practically relevant message lengths of roughly 512 bits; this is the first instance of such universal performance reported in the literature. The proposed scheme is suitable for use as a HARQ scheme in low-latency communication, where including a higher-layer CRC would induce computational delays.
The strong converse for a coding theorem shows that the optimal asymptotic rate possible with vanishing error cannot be improved by allowing a fixed error. Building on a method introduced by Gu and Effros for centralized coding problems, we develop a simple and general recipe for proving strong converses that applies to distributed problems as well. Heuristically, our proof of the strong converse mimics the standard steps for proving a weak converse, except that we apply those steps to a modified distribution obtained by conditioning the original distribution on the event that no error occurs. A key component of our recipe is the replacement of the hard Markov constraints with a soft information cost, using a variational formula introduced by Oohama. We illustrate our method by providing a short proof of the strong converse for the Wyner-Ziv problem and a strong converse theorem for the interactive function computation problem; the latter result was not available prior to our work.
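A standard fact that makes such conditioning arguments work, stated here as illustrative background rather than as a step quoted from the paper: if $\tilde{P} = P(\cdot \mid \mathcal{E})$ for an error-free event $\mathcal{E}$ with $P(\mathcal{E}) \geq 1-\epsilon$, then $D(\tilde{P}\|P) = \log\frac{1}{P(\mathcal{E})} \leq \log\frac{1}{1-\epsilon}$. Thus, single-letter quantities computed under $\tilde{P}$ differ from those under $P$ by terms that remain bounded for any fixed $\epsilon < 1$, which is what allows the weak-converse steps to survive without requiring $\epsilon \to 0$.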
A transmitter observing a sequence of independent and identically distributed random variables seeks to keep a receiver updated about its latest observations. The receiver need not be apprised of each symbol seen by the transmitter, but needs to output a symbol at each time instant $t$. If at time $t$ the receiver outputs the symbol seen by the transmitter at time $U(t)\leq t$, the age of information at the receiver at time $t$ is $t-U(t)$. We study the design of lossless source codes that enable transmission with minimum average age at the receiver. We show that the asymptotic minimum average age can be attained (up to a gap of a constant number of bits) by Shannon codes for a tilted version of the original pmf generating the symbols, which can be computed easily by solving an optimization problem. Underlying our construction of minimum average age codes is a new variational formula for integer moments of random variables, which may be of independent interest.
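A minimal numeric sketch of the construction's flavor, with two loudly flagged assumptions: the tilt is taken to be of the exponential form $q_\beta(x) \propto p(x)^\beta$ (the paper derives the actual tilt from the optimization problem mentioned above), and the objective below is a common age proxy built from the first two moments of the code length, not the paper's exact expression.

```python
import numpy as np

def tilted_shannon_lengths(p, beta):
    """Shannon code lengths for the assumed tilted pmf q_beta(x) ~ p(x)^beta."""
    q = p ** beta
    q /= q.sum()
    return np.ceil(np.log2(1.0 / q))

def age_proxy(p, lengths):
    """Stand-in objective E[L^2]/(2 E[L]) + E[L]/2, a common proxy for the
    average age when updates of length L are sent back to back."""
    EL = np.dot(p, lengths)
    EL2 = np.dot(p, lengths ** 2)
    return EL2 / (2 * EL) + EL / 2

p = np.array([0.5, 0.25, 0.125, 0.0625, 0.0625])
betas = np.linspace(0.1, 2.0, 20)
best = min(betas, key=lambda b: age_proxy(p, tilted_shannon_lengths(p, b)))
print(best, age_proxy(p, tilted_shannon_lengths(p, best)))
```

The point of the search over $\beta$ is that the age objective penalizes long codewords through $\mathbb{E}[L^2]$, so the age-optimal code is generally not the Shannon code for $p$ itself ($\beta = 1$).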
It is known that a processor with limited memory consisting of an $m$-state machine can distinguish two coins with biases that differ by $1/m$. On the other hand, the best additive accuracy with which the same processor can estimate the bias of a coin is only $1/\sqrt{m}$. We demystify this apparent loss of resolution by showing that for any such estimator using an $m$-state machine, there exist two values of the bias that are $1/\sqrt{m}$ apart but for which the effective number of states available to resolve them is only $O(\sqrt{m})$. Building on this result, we show that the number of bits of memory required to estimate a bias in the interval $(a, a2^{\alpha})$ with a multiplicative accuracy of $2^{\pm\delta}$ is $\log(\alpha/\delta^2)$, up to an additive constant. In fact, we show that the lower bound is attained by a {\em Gaussian counter}, namely a probabilistic counter whose stationary distribution has a Gaussian form. This gives a precise characterization of the memory complexity of bias estimation, along with a heuristically appealing family of optimal estimators. Underlying our results are new bounds for estimation of the natural parameter of a discrete exponential family, which may be of independent interest.
Two parties observing sequences of bits want to determine if their bits were generated independently or not. To that end, the first party communicates to the second. A simple communication scheme involves taking as many sample bits as the sample complexity of independence testing dictates and sending them to the second party. But is there a scheme that uses fewer bits of communication than the sample complexity, perhaps by observing more sample bits? We show that the answer to this question is in the affirmative when the joint distribution is a binary symmetric source. More generally, for any given joint distribution, we present a distributed independence test that uses linear correlation between functions of the observed random variables. Furthermore, we provide lower bounds for the general setting that use hypercontractivity and reverse hypercontractivity to obtain a measure change bound between the joint and the independent distributions. The resulting bounds are tight for both a binary symmetric source and a Gaussian symmetric source.
Multiple parties observe correlated data generated independently and identically (in time) from a known joint distribution. Parties communicate with each other interactively to enable each party to recover the data observed by all the other parties and attain omniscience. We characterize the asymptotic growth of the number of bits of interactive communication required by the parties to attain omniscience up to the second-order term. For the converse, we provide a single-shot lower bound for the required number of bits of communication, which yields the asymptotic result as a special case. It is shown that the knowledge of the distribution can be used to modify the recently proposed recursive universal data exchange protocol to render it optimal up to the second-order term. As a corollary, we provide a precise characterization of the reduction in communication for omniscience due to interaction.
Multiple parties observing correlated data seek to recover each other's data and attain omniscience. To that end, they communicate interactively over a noiseless broadcast channel: Each bit transmitted over this channel is received by all the parties. We give a universal interactive protocol for omniscience which requires communication of rate only $\mathcal{O}\left(n^{-1/2}\sqrt{\log n}\right)$ more than the optimal rate for every independent and identically distributed (in time) sequence of data.
A simulation of an interactive protocol entails the use of interactive communication to produce the output of the protocol to within a fixed statistical distance $\epsilon$. Recent works have proposed that the information complexity of the protocol plays a central role in characterizing the minimum number of bits that the parties must exchange for a successful simulation, namely the distributional communication complexity of simulating the protocol. Several simulation protocols have been proposed with communication complexity depending on the information complexity of the simulated protocol. However, in the absence of any general lower bounds for distributional communication complexity, the conjectured central role of information complexity is far from settled. We fill this gap and show that the distributional communication complexity of $\epsilon$-simulating a protocol is bounded below by the $\epsilon$-tail $\lambda_\epsilon$ of the information complexity density, a random variable with information complexity as its expected value. For protocols with a bounded number of rounds, we give a simulation protocol that yields a matching upper bound. Thus, it is not information complexity but $\lambda_\epsilon$ that governs the distributional communication complexity. As applications of our bounds, in the amortized regime for product protocols, we identify the exact second-order term, together with the precise dependence on $\epsilon$. For general protocols, such as a mixture of two product protocols, or for the amortized case when the repetitions are not independent, we derive a general formula for the leading asymptotic term. These results sharpen and significantly extend known results in the amortized regime. In the single-shot regime, our lower bound clarifies the dependence of communication complexity on $\epsilon$. We illustrate this with an example that exhibits an arbitrary separation between distributional communication complexity and information complexity for all sufficiently small $\epsilon$.
Two parties observing correlated data seek to exchange their data using interactive communication. How many bits must they communicate? We propose a new interactive protocol for data exchange which increases the communication size in steps until the task is done. Next, we derive a lower bound on the minimum number of bits that is based on relating the data exchange problem to the secret key agreement problem. Our single-shot analysis applies to all discrete random variables and yields upper and lower bounds of a similar form. In fact, the bounds are asymptotically tight and lead to a characterization of the optimal rate of communication needed for data exchange for a general sequence, such as a mixture of IID random variables, as well as the optimal second-order asymptotic term in the minimum length of communication needed for data exchange for IID random variables, when the probability of error is fixed. This gives a precise characterization of the asymptotic reduction in the length of optimal communication due to interaction; in particular, two-sided Slepian-Wolf compression is strictly suboptimal.
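One way to picture a protocol that grows its communication in steps is the following toy, one-directional rateless sketch in Python. It is our illustration, not the paper's protocol: party 1 releases random parity bits of its string in increments, and party 2 attempts to decode after each increment, stopping once exactly one candidate string near its own observation matches all parities received so far.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(2)

def exchange_one_way(x, y, step=4, radius=3):
    """Toy rateless data exchange (one direction, exhaustive decoding).

    Party 1 sends `step` fresh random parities of x per round; party 2,
    believing x lies within Hamming distance `radius` of y, stops as soon
    as a unique candidate is consistent with all parities so far.
    """
    n = len(x)
    candidates = []
    for r in range(radius + 1):                 # enumerate the Hamming ball around y
        for flips in combinations(range(n), r):
            c = y.copy()
            c[list(flips)] ^= 1
            candidates.append(c)
    rows, bits = [], []
    while True:
        A = rng.integers(0, 2, size=(step, n))  # fresh random parity checks
        rows.append(A)
        bits.append(A @ x % 2)
        H, b = np.vstack(rows), np.concatenate(bits)
        consistent = [c for c in candidates if np.array_equal(H @ c % 2, b)]
        if len(consistent) == 1:                # party 2 signals success
            return consistent[0], H.shape[0]    # decoded x, parities used

x = rng.integers(0, 2, size=16)
y = x.copy(); y[[3, 9]] ^= 1                    # correlated observation of party 2
x_hat, parities_used = exchange_one_way(x, y)
print(np.array_equal(x_hat, x), parities_used)
```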
We derive impossibility (converse) bounds for the efficiency of implementing information theoretically secure oblivious transfer and bit commitment using correlated observations. Our approach is based on relating these problems to that of testing if the observations of the parties are conditionally independent given the adversary's observation. The resulting bounds strengthen and improve upon several previously known results.
We revisit A.C. Yao's classic problem of secure function computation by interactive communication, in an information theoretic setting. Our approach, based on examining the underlying common randomness, provides a new proof of the characterization of a securely computable function by deterministic protocols. This approach also yields a characterization of the minimum communication needed for secure computability.
We consider the estimation of a standard Gaussian random variable under an observation attack, where an adversary may add a zero-mean Gaussian noise with variance in a bounded, closed interval to an otherwise noiseless observation. A straightforward approach would entail either ignoring the attack and simply using an optimal estimator under normal operation, or taking the worst-case attack into account and using a minimax estimator that minimizes the cost under the worst-case attack. In contrast, we seek to characterize the optimal tradeoff between the mean squared error (MSE) under normal operation and the MSE under the worst-case attack. Equivalently, we seek a minimax estimator for any fixed prior probability of attack. Our main result shows that a unique minimax estimator exists for every fixed probability of attack and is given by the Bayesian estimator for a least-favorable prior on the set of possible variances. Furthermore, the least-favorable prior is unique and has a finite support. While the minimax estimator is linear when the probability of attack is 0 or 1, our numerical results show that the minimax linear estimator is far from optimal for all other probabilities of attack and a simple nonlinear estimator does much better.
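The structure of the Bayesian estimator for a finite-support prior is easy to write down, and a small sketch shows why it is nonlinear. The two-point support and its weights below are placeholders chosen for illustration; the actual least-favorable prior is characterized in the paper.

```python
import numpy as np
from scipy.stats import norm

def bayes_estimate(y, p_attack, sig2_support, weights):
    """Posterior mean of X ~ N(0,1) given Y, where with probability p_attack
    the adversary added N(0, sig2) noise, sig2 drawn from a finite-support
    prior (support points and weights are illustrative placeholders)."""
    sig2 = np.asarray(sig2_support, dtype=float)
    w = np.asarray(weights, dtype=float)
    f0 = norm.pdf(y, scale=1.0)                  # no attack: Y = X exactly
    fj = norm.pdf(y, scale=np.sqrt(1.0 + sig2))  # attack: Y ~ N(0, 1 + sig2)
    # Conditional means: E[X | Y, no attack] = y, E[X | Y, sig2] = y/(1+sig2).
    num = (1 - p_attack) * f0 * y + p_attack * np.dot(w, fj * (y / (1 + sig2)))
    den = (1 - p_attack) * f0 + p_attack * np.dot(w, fj)
    return num / den

for y in [0.5, 1.0, 2.0, 4.0]:
    print(y, round(bayes_estimate(y, 0.3, [1.0, 4.0], [0.5, 0.5]), 3))
```

Because the posterior weights depend on $y$, the estimate interpolates between the shrinkage factors $1/(1+\sigma_j^2)$ in a $y$-dependent way, which no single linear map $y \mapsto cy$ can reproduce.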
We study the number of independent samples from a $k$-symbol distribution needed to estimate its Rényi entropy of order $\alpha$ and show that super-linearly many (in $k$) samples are needed for $\alpha < 1$, near-linearly many ($\approx k$) samples are needed for noninteger $\alpha > 1$, but, perhaps surprisingly, for integer $\alpha > 1$ only $k^{1-1/\alpha}$ samples suffice (up to constants).
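For integer $\alpha$, the favorable $k^{1-1/\alpha}$ behavior comes from the fact that the power sum $\sum_i p_i^{\alpha}$ admits an unbiased estimator built from falling factorials of the empirical counts. The sketch below illustrates this idea; the surrounding glue code and the toy check are our own, not taken verbatim from the paper.

```python
import numpy as np
from collections import Counter

def renyi_entropy_estimate(samples, alpha):
    """Estimate H_alpha = log(sum_i p_i^alpha)/(1 - alpha) for integer alpha >= 2.

    For multinomial counts, E[prod_{j<alpha}(N_i - j)] = n(n-1)...(n-alpha+1) p_i^alpha,
    so the normalized sum of falling factorials is unbiased for the power sum.
    """
    n = len(samples)
    counts = np.array(list(Counter(samples).values()), dtype=float)
    falling = lambda x, a: np.prod([x - j for j in range(a)], axis=0)
    power_sum = falling(counts, alpha).sum() / falling(n, alpha)
    return np.log(power_sum) / (1 - alpha)

# Toy check: for the uniform distribution on k symbols, H_2 = log k (in nats).
rng = np.random.default_rng(1)
k = 100
samples = rng.integers(0, k, size=2000)
print(renyi_entropy_estimate(samples, 2), np.log(k))
```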
We revisit the problem of secret key agreement in channel models, where in addition to a noisy, albeit secure, channel, the terminals have access to a noiseless public communication channel. We show a strong converse for the secret key capacity in the point-to-point model and give upper bounds for the general case. Underlying our proofs is a recently discovered single-shot converse for secret key rates in multiterminal source models.
We establish an upper bound on the rate of codes for a wiretap channel with public feedback for a fixed probability of error and secrecy parameter. As a corollary, we obtain a strong converse for the capacity of a degraded wiretap channel with public feedback. Our converse proof is based on a reduction of active hypothesis testing for discriminating between two channels to coding for wiretap channel with feedback.
We extend the Bellare-Tessaro coding scheme for a discrete, degraded, symmetric wiretap channel to a Gaussian wiretap channel. Denoting by $\mathrm{SNR}$ the signal-to-noise ratio of the eavesdropper's channel, the proposed scheme converts a transmission code of rate $R$ for the channel of the legitimate receiver into a code of rate $R - \frac{1}{2}\log(1 + \mathrm{SNR})$ for the Gaussian wiretap channel. The conversion has a polynomial complexity in the codeword length and the proposed scheme achieves strong security. In particular, when the underlying transmission code is capacity achieving, this scheme achieves the secrecy capacity of the Gaussian wiretap channel. In effect, this work allows us to convert any constructive scheme for transmitting over a Gaussian channel into a constructive scheme for the Gaussian wiretap channel.
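The last claim is a one-line calculation. Writing $\mathrm{SNR}_B$ for the legitimate receiver's signal-to-noise ratio (notation introduced here only for this remark), a capacity-achieving transmission code has $R = \frac{1}{2}\log(1+\mathrm{SNR}_B)$, so the converted code has rate $\frac{1}{2}\log(1+\mathrm{SNR}_B) - \frac{1}{2}\log(1+\mathrm{SNR})$, which is exactly the secrecy capacity of the degraded Gaussian wiretap channel.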
We revisit the problem of secret key agreement using interactive public communication for two parties. When the underlying observations are independent and identically distributed, we establish the second-order asymptotic term in the maximum length of a secret key. Furthermore, for general observations, we establish the secret key capacity. Underlying our proofs is a new secret key agreement scheme and a recently established upper bound on secret key lengths.
We consider secret key agreement by multiple parties observing correlated data and communicating interactively over an insecure communication channel. Our main contribution is a single-shot upper bound on the length of the secret keys that can be generated, without making any assumptions on the distribution of the underlying data. Heuristically, we bound the secret key length in terms of ``how far'' the joint distribution of the initial observations of the parties and the eavesdropper is from a distribution that renders the observations of the parties conditionally independent across some partition, when conditioned on the eavesdropper's side information. The closeness of the two distributions is measured in terms of the exponent of the probability of error of type II for a binary hypothesis testing problem, thus bringing out a structural connection between secret key agreement and binary hypothesis testing. When the underlying data consists of an independent and identically distributed sequence, an application of our bound recovers several known upper bounds for the asymptotic rate of a secret key that can be generated, without requiring the agreement error probability or the security index to vanish to 0 asymptotically. Also, we consider the following problem of secure function computation with trusted parties: Multiple parties observing correlated data seek to compute a function of their collective data. To this end, they communicate interactively over an insecure communication channel. It is required that the value of the function be concealed from an eavesdropper with access to the communication. When is such a secure computation of a given function feasible? Using the aforementioned upper bound, we derive a necessary condition for the existence of a communication protocol that allows the parties to reliably recover the value of a given function, while keeping this value concealed from an eavesdropper with access to (only) the communication.
We consider the generation of a secret key (SK) by the inputs and the output of a secure multiple-access channel (MAC) that additionally have access to a noiseless public communication channel. Under specific restrictions on the protocols, we derive various upper bounds on the rate of such SKs. Specifically, if the public communication consists of only the feedback from the output terminal, then the rate of SKs that can be generated is bounded above by the maximum symmetric rate $R_f^*$ in the capacity region of the MAC with feedback. On the other hand, if the public communication is allowed only before and after the transmission over the MAC, then the rate of SKs is bounded above by the maximum symmetric rate $R^*$ in the capacity region of the MAC without feedback. Furthermore, for a symmetric MAC, we present a scheme that generates an SK of rate $R_f^*$, improving upon the best previously known achievable rate $R^*$. An application of our results establishes the SK capacity for the adder MAC, without any restrictions on the protocols.
A set of m terminals, observing correlated signals, communicate interactively to generate common randomness for a given subset of them. Knowing only the communication, how many direct queries of the value of the common randomness will resolve it? A general upper bound, valid for arbitrary signal alphabets, is developed for the number of such queries by using a query strategy that applies to all common randomness and associated communication. When the underlying signals are independent and identically distributed repetitions of m correlated random variables, the number of queries can be exponential in signal length. For this case, the mentioned upper bound is tight and leads to a single-letter formula for the largest query exponent, which coincides with the secret key capacity of a corresponding multiterminal source model. In fact, the upper bound constitutes a strong converse for the optimum query exponent, and implies also a new strong converse for secret key capacity. A key lemma, estimating the size of a large probability set in terms of Rényi entropy, is interpreted separately also as a lossless block coding result for a general source. As a particularization, it yields the classic result for a discrete memoryless source.
Mobile nodes observing correlated data communicate using an insecure bidirectional switch to generate a secret key, which remains concealed from the switch. We are interested in fault-tolerant secret key rates, i.e., the rates of secret key generated even if a subset of nodes drop out before the completion of the communication protocol. We formulate a new notion of fault-tolerant secret key capacity, and present an upper bound on it. This upper bound is shown to be tight when the random variables corresponding to the observations of nodes are exchangeable. Further, it is shown that one round of interaction achieves the fault-tolerant secret key capacity in this case. The upper bound is also tight for the case of a pairwise independent network model consisting of a complete graph, and can be attained by a noninteractive protocol.
A set of terminals that observe correlated data seek to compute functions of the data using interactive public communication. At the same time, it is required that this communication, observed by an eavesdropper, does not reveal the value of a private function of the data. In general, the private function and the functions computed by the nodes can all be different. We show that a class of functions is securely computable if and only if the conditional entropy of the data given the value of the private function is greater than the least rate of interactive communication required for an appropriately chosen multiterminal source coding task. A single-letter formula is provided for this rate in special cases.
Secret key generation is considered for a pair of terminals that observe correlated sources and communicate interactively over a public channel. It is argued that optimum rate secret key generation is linked inherently to Wyner's notion of common information between two dependent random variables. The minimum rate of interactive public communication required to generate an optimum rate secret key is characterized in terms of a variant of this notion of common information.
A subset of a set of terminals that observe correlated signals seek to compute a given function of the signals using public communication. It is required that the value of the function be kept secret from an eavesdropper with access to the communication. We show that the function is securely computable if and only if its entropy is less than the aided secret key capacity of an associated secrecy generation model, for which a single-letter characterization is provided.
We study a problem of secure computation by multiple parties of a given function of their cumulative observations, using public communication but without revealing the value of the function to an eavesdropper with access to this communication. A Shannon theoretic formulation is introduced to characterize necessary and sufficient conditions for secure computability. Drawing on innate connections of this formulation to the problem of secret key generation by the same parties using public communication, we show that a function is securely computable if and only if its entropy is smaller than the secret key capacity. Conditions for secure computability at a lone terminal are also derived by association with an appropriate secret key generation problem.
We consider a Gelfand-Pinsker discrete memoryless channel (DMC) model and provide a strong converse for its capacity. The strong converse is then used to obtain an upper bound on the reliability function. Instrumental in our proofs is a new technical lemma which provides an upper bound for the rate of codes with codewords that are conditionally typical over large message-dependent subsets of a typical set of state sequences. This technical result is a non-straightforward analog of a known result for a DMC without states that provides an upper bound on the rate of a good code with codewords of a fixed type (to be found, for instance, in the Csiszár-Körner book).
We derive the structure of the optimal receiver for MPSK signaling over a correlated Rayleigh fading channel. The channel is estimated using the minimum mean square error criterion by means of pilot symbols. For the case of high signal-to-noise ratio, an approximate expression for the symbol error probability (SEP) of this scheme is obtained as an integral and compared with the SEP of a suboptimal receiver which uses maximal-ratio combining. Numerical results show that the performance gap between the optimal and suboptimal receivers increases with increasing channel correlation and number of diversity branches, whereas it decreases with increasing pilot-to-signal ratio.
All of my students so far have had a deep interest in information theory, but they have applied it to many different areas. These days I am looking for students who want to work in (i) privacy or (ii) blockchains for decentralized networks. If you want to write a Ph.D. thesis that builds interesting theory on these topics or leads to an exciting startup (with you as the founder), do reach out to me.