SPCOM 2024

Information Theory Session

Title: Computationally efficient codes for some adversarial channels

Sidharth Jaggi, University of Bristol

Sidharth (Sid) Jaggi received his B.Tech. from IIT Bombay (2000) and his M.S./Ph.D. from Caltech (2006), all in electrical engineering, and was a Post-Doctoral Associate at MIT (2006). He joined The Chinese University of Hong Kong in 2007, and the School of Mathematics at the University of Bristol in 2020, where he is currently a Professor of Information and Coding Theory. His research group (somewhat unwillingly) calls itself the CAN-DO-IT Team (Codes, Algorithms, Networks: Design and Optimization for Information Theory). Topics he has worked on include sparse recovery/group-testing, covert communication, network coding, and adversarial channels.

Abstract: We propose a concatenated code construction for a class of discrete-alphabet oblivious arbitrarily varying channels (AVCs) with cost constraints. The code has time and space complexity polynomial in the blocklength $n$. It uses a Reed-Solomon outer code, logarithmic blocklength random inner codes, and stochastic encoding by permuting the codeword before transmission. When the channel satisfies a condition called strong DS-nonsymmetrizability (a modified version of nonsymmetrizability originally due to Dobrushin and Stambler), we show that the code achieves a rate that, for a variety of oblivious AVCs (such as classically studied error/erasure channels), matches the known capacities.

Title: The out-of-sample prediction error of the square-root lasso and related estimators

Cynthia Rush, Columbia University

Cynthia Rush is the Howard Levene Associate Professor of Statistics at Columbia University. She received a Ph.D. and M.A. in Statistics from Yale University in 2016 and 2011, respectively, and she completed her undergraduate coursework at the University of North Carolina at Chapel Hill where she obtained a B.S. in Mathematics in 2010. She received an NSF CRII award in 2019, was a finalist for the 2016 IEEE Jack K. Wolf ISIT Student Paper Award, was an NTT Research Fellow at the Simons Institute for the Theory of Computing for the program on Probability, Computation, and Geometry in High Dimensions in Fall 2020, and was a Google Research Fellow at the Simons Institute for the Theory of Computing for the program on Computational Complexity of Statistical Inference in Fall 2021. Her research focuses on message passing algorithms, statistical robustness, and applications to wireless communications.

Abstract: We study the classical problem of predicting an outcome variable, $Y$, using a linear combination of a $d$-dimensional covariate vector, $X$. We are interested in linear predictors whose coefficients solve: $\inf_\beta \left(\mathbb{E}[(Y - \langle \beta, X \rangle)^r]\right)^{1/r} + \delta \|\beta\|$, where $r > 1$ and $\delta > 0$ is a regularization parameter. We provide conditions under which linear predictors based on these estimators minimize the worst-case prediction error over a ball of distributions determined by a type of max-sliced Wasserstein metric. A detailed analysis of the statistical properties of this metric yields a simple recommendation for the choice of regularization parameter. The suggested order of $\delta$, after a suitable normalization of the covariates, is typically $d/n$, up to logarithmic factors. Our recommendation is computationally straightforward to implement, pivotal, has provable out-of-sample performance guarantees, and does not rely on sparsity assumptions about the true data generating process. This is joint work with Jose Montiel Olea, Amilcar Velez and Johannes Wiesel.
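
As a rough illustration of the objective in the abstract above, its empirical counterpart on a finite sample can be evaluated as in the sketch below. The function name `regularized_objective`, the use of the absolute residual, and the default choice of the l1 norm for the penalty are assumptions made here for illustration; the abstract does not specify them.

```python
def regularized_objective(beta, X, y, delta, r=2.0, norm_order=1):
    """Empirical version of (E[|Y - <beta, X>|^r])^(1/r) + delta * ||beta||.

    Assumptions (not from the abstract): residuals enter through their
    absolute value, and the penalty is an l_p norm with p = norm_order.
    """
    n = len(y)
    # Residual of the linear predictor on each sample.
    residuals = [
        yi - sum(b * xij for b, xij in zip(beta, xi))
        for xi, yi in zip(X, y)
    ]
    # r-th root of the empirical r-th moment of the residuals.
    loss = (sum(abs(e) ** r for e in residuals) / n) ** (1.0 / r)
    # Norm penalty on the coefficients.
    penalty = sum(abs(b) ** norm_order for b in beta) ** (1.0 / norm_order)
    return loss + delta * penalty


if __name__ == "__main__":
    X = [[1.0, 0.0], [0.0, 1.0]]
    y = [3.0, 4.0]
    print(regularized_objective([0.0, 0.0], X, y, delta=0.5))
    print(regularized_objective([3.0, 4.0], X, y, delta=0.5))
```

With r = 2 and the l1 norm, this corresponds to the usual square-root lasso form; other choices of r and norm give the related estimators the title refers to.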

Title: Quantum secure non-malleable cryptography

Rahul Jain, National University of Singapore

Rahul Jain obtained his Ph.D. in computer science from TIFR, India, in 2003. He was a Research Fellow at U.C. Berkeley, USA, and then at IQC, University of Waterloo, Canada. In 2008, he joined the Centre for Quantum Technologies (CQT) as a Principal Investigator (PI) and the Computer Science (CS) Department, National University of Singapore (NUS), Singapore, as an Assistant Professor. He is presently a Professor at the CS Department, NUS, and a PI at CQT. His research interests are in the areas of quantum and classical information theory, cryptography, and complexity theory.

Abstract: In this talk, we survey some recent results in quantum secure non-malleable cryptography. We look at some recent constructions of non-malleable extractors and their applications in privacy amplification (e.g., for QKD), constructing non-malleable randomness-encoders, non-malleable codes, and non-malleable secret sharing schemes (both for classical and quantum secrets). This talk is based on the following works: ArXiv:2308.06466 (QCrypt 2023), ArXiv:2308.07340 (QCrypt 2023), ArXiv:2202.13354 (IEEE-TIT 2023), ArXiv:2109.03097 (TQC 2023) and ArXiv:2106.02766 (IEEE-TIT 2023).

Title: Sample complexity of parameter estimation in logistic regression

Arya Mazumdar, UC San Diego

Arya Mazumdar is an Associate Professor of Data Science and Computer Science at UC San Diego. He is the Deputy Director and the Associate Director for Research in the NSF AI Institute TILOS, and also the UCSD Site-Lead of NSF TRIPODS Institute EnCORE. Arya obtained his Ph.D. degree from the University of Maryland, College Park, specializing in information theory. Subsequently, Arya was a postdoctoral scholar at the Massachusetts Institute of Technology, an assistant professor at the University of Minnesota, and an assistant and then associate professor at the University of Massachusetts Amherst. Arya is a recipient of a Distinguished Dissertation Award for his Ph.D. thesis, the NSF CAREER award, an EURASIP Best Paper Award, and the ISIT Jack K. Wolf Student Paper Award. He is also a Distinguished Lecturer of the IEEE Information Theory Society, 2023-24. He is currently serving as an Associate Editor for the IEEE Transactions on Information Theory and as an Area Editor for the Now Publishers journal Foundations and Trends in Communications and Information Theory. Arya’s research interests include information theory, coding theory, statistical learning and optimization.

Abstract: The logistic regression model is one of the most popular data generation models in noisy binary classification problems. In this work, we study the sample complexity of estimating the parameters of the logistic regression model up to a given $\ell_2$ error, in terms of the dimension and the inverse temperature, with standard normal covariates. The inverse temperature controls the signal-to-noise ratio of the data generation process. While both generalization bounds and asymptotic performance of the maximum-likelihood estimator for logistic regression are well-studied, the non-asymptotic sample complexity that shows the dependence on error and the inverse temperature for parameter estimation is absent from previous analyses. We show that the sample complexity curve has two change-points in terms of the inverse temperature, clearly separating the low, moderate, and high temperature regimes.

Joint work with Daniel Hsu.

Communications Session

Title: Artificial General Intelligence (AGI)-Native Wireless Systems with Common Sense: A Journey to 6G and Beyond

Walid Saad, Virginia Tech (USA)

Walid Saad (S’07, M’10, SM’15, F’19) received his Ph.D. degree from the University of Oslo, Norway in 2010. He is currently a Professor at the Department of Electrical and Computer Engineering at Virginia Tech, where he leads the Network sciEnce, Wireless, and Security (NEWS) laboratory. His research interests include wireless networks (5G/6G/beyond), machine learning, game theory, security, UAVs, semantic communications, cyber-physical systems, and network science. Dr. Saad is a Fellow of the IEEE. He is also the recipient of the NSF CAREER award in 2013, the AFOSR summer faculty fellowship in 2014, and the Young Investigator Award from the Office of Naval Research (ONR) in 2015. He was the (co-)author of twelve papers that received conference best paper awards, at IEEE WiOpt in 2009, ICIMP in 2010, IEEE WCNC in 2012, IEEE PIMRC in 2015, IEEE SmartGridComm in 2015, EuCNC in 2017, IEEE GLOBECOM (2018 and 2020), IFIP NTMS in 2019, IEEE ICC (2020 and 2022), and IEEE QCE in 2023. He is the recipient of the 2015 and 2022 Fred W. Ellersick Prize from the IEEE Communications Society, of the IEEE Communications Society Marconi Prize Award in 2023, and of the IEEE Communications Society Award for Advances in Communication in 2023. He was also a co-author of the papers that received the IEEE Communications Society Young Author Best Paper award in 2019, 2021, and 2023. He was also an IEEE Distinguished Lecturer in 2019-2020. He has been annually listed in the Clarivate Web of Science Highly Cited Researcher List since 2019. Dr. Saad is the Editor-in-Chief for the IEEE Transactions on Machine Learning in Communications and Networking.

Abstract: Next-generation wireless systems, such as 6G and beyond, are expected to tightly embed artificial intelligence (AI) into their design, giving rise to what is termed AI-native wireless systems. Remarkably, despite significant academic, industrial, and standardization efforts dedicated to AI-native wireless systems in the past few years, even the very definition of such systems remains ambiguous. Presently, most endeavors in this domain represent incremental extensions of conventional "AI for wireless" paradigms, employing classical tools like autoencoders, diffusion models, or large-language models to replicate established wireless functionalities. However, such approaches suffer from inherent limitations, including the opaque nature of the adopted AI models, their tendency toward curve-fitting, reliance on extensive training data, energy inefficiency, and limited generalizability to novel, unseen scenarios and out-of-domain/out-of-distribution data points. To surmount these challenges, in this talk, we unveil a bold, pioneering framework for the development of artificial general intelligence (AGI)-native wireless systems. We particularly show how the fusion of wireless systems, digital twins, and AI can catalyze a transformative, revolutionary paradigm shift in both wireless and AI technologies by conceptualizing a next-generation AGI architecture imbued with "common sense" capabilities, akin to human cognition. This architecture is envisioned to empower networks with reasoning, planning, and other human-like cognitive faculties such as imagination and deep thinking. We first define the technical tenets of common sense and, subsequently, we demonstrate how the proposed AGI architecture can instill a new level of generalizability, explainability, and reasoning into tomorrow’s wireless networks, liberating them from their conventional physical constraints.
We then discuss how AGI-native wireless systems can unleash novel use cases such as digital twins with analogical reasoning, resilient experiences for cognitive avatars, and brain-level holographic experiences. Following the establishment of the foundational principles and components of AGI-native wireless systems, we take a significant stride forward by forging a link with the emerging concept of semantic communications. In doing so, we demonstrate how the integration of causal reasoning (a key component of our AGI vision) with semantic communication can usher in a new era of knowledge-driven, reasoning AGI-native wireless systems. These systems represent a major departure from today’s data-driven, knowledge-agnostic models, offering enhanced sustainability and resilience in their design and operation. We present our recent key results, rooted in AI, theory of mind, digital twins, and game theory, laying the groundwork for the realization of AGI-native wireless systems, and illustrating how our designed framework reduces data volume in networks while enhancing reliability, crucial for next-generation wireless services like connected intelligence and holography. We conclude with a discussion on the exciting opportunities in this field that can help redefine the intersection of wireless communications and AI.

Title: Secure Cell-Free Massive MIMO for 6G

Hien-Quoc Ngo, Queen’s University Belfast (UK)

Hien Quoc Ngo is currently a Reader (Associate Professor) at Queen's University Belfast, UK, and a UK Research & Innovation (UKRI) Future Leaders Fellow. His main research interests include cellular/cell-free massive MIMO systems, physical layer security, reconfigurable intelligent surfaces, and cooperative communications. He has co-authored more than 150 research papers in wireless communications, one Cambridge University Press textbook “Fundamentals of Massive MIMO” (2016), and 5 book chapters, and holds one patent. Dr. Hien Quoc Ngo received the IEEE ComSoc Stephen O. Rice Prize in 2015, the IEEE ComSoc Leonard G. Abraham Prize in 2017, the Best PhD Award from EURASIP in 2018, and the IEEE CTTC Early Achievement Award in 2023. He also received the IEEE Sweden VT-COM-IT Joint Chapter Best Student Journal Paper Award in 2015. He received the QUB Vice-Chancellor’s Early Career Researcher Prize in 2020. He was awarded the IEEE Wireless Communications Letters Best Editor Award (2019-2020). He was an IEEE Communications Letters exemplary reviewer for 2014, an IEEE Transactions on Communications exemplary reviewer for 2015, and an IEEE Wireless Communications Letters exemplary reviewer for 2016. In 2019, Dr. Ngo was awarded a UKRI Future Leaders Fellowship (2019-2023). Dr. Hien Quoc Ngo currently serves as an Editor for the IEEE Transactions on Communications, the IEEE Transactions on Wireless Communications, Digital Signal Processing, and Elsevier Physical Communication. He was a Guest Editor of the IET Communications special issue on “Recent Advances on 5G Communications” and a Guest Editor of the IEEE Access special issue on “Modelling, Analysis, and Design of 5G Ultra-Dense Networks”, in 2017. He has been a member of Technical Program Committees for several IEEE conferences such as ICC, GLOBECOM, WCNC, and VTC. More information can be found at https://sites.google.com/site/nqhienqn/

Abstract: Cell-free massive multiple-input multiple-output (MIMO) is a system where many (hundreds or thousands) access points or base stations coherently serve many (tens or hundreds) users. Unlike current cellular (mobile) networks where the coverage area is divided into cells, in cell-free massive MIMO, there are no fixed cells or cell boundaries. Cell-free massive MIMO is expected to overcome the boundary effect—the inherent limitation of current cellular networks which has persisted for the last 50 years. It is expected to ensure connectivity everywhere, fulfilling the key requirements of next-generation wireless communication systems (beyond 5G and towards 6G). In this talk, we will first focus on the fundamentals of cell-free massive MIMO. We will then discuss the PHY security aspects of cell-free massive MIMO, along with a range of important topics and future directions.

Title: Convertible Codes with Local Repair

Rashmi Korlakai Vinayak, Carnegie Mellon University

Rashmi Vinayak is an associate professor in the Computer Science department at Carnegie Mellon University. Rashmi received her Ph.D. from UC Berkeley in 2016, and was a postdoctoral scholar at UC Berkeley from 2016-17. Rashmi is a recipient of the Sloan Research Fellowship 2023, IEEE ITSoc Goldsmith Lecturer award 2023, VMware Systems Research Award 2021, NSF CAREER Award 2020-25, TIFR Memorial Lecture Award 2020, several research awards from Meta and Google, and the UC Berkeley Eli Jury Dissertation Award 2016. Her work has received multiple best paper awards from both the computer systems and information theory communities, and has been adopted by industry, including at VMware, Twitter, Google, and Meta, and by numerous open source libraries. Her research interests broadly lie in information/coding theory and computer/networked systems, and the wide spectrum of intersection between the two areas.

Abstract: Erasure codes are a popular choice for distributed storage systems as they provide protection against failures and unavailabilities with low storage overhead. Two key properties of codes that are of interest in storage systems are (1) the ability to decode a lost code symbol by accessing a small number of other code symbols (local repair) and (2) the ability to modify the parameters of the code over time as the failure rate of disks and popularity of data change (code conversion). Two classes of codes that possess these two properties are Locally repairable codes (LRCs) and Convertible codes, respectively. In this talk, I will introduce a new class of codes that possess both of these properties simultaneously, termed Locally Repairable Convertible Codes.
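
To make the notion of local repair in the abstract above concrete, here is a toy sketch in which a small group of symbols is protected by a single XOR parity, so any one lost symbol is recovered by reading only the other symbols in its group. This is an illustrative simplification, not the construction presented in the talk; the helper names `encode_group` and `repair` are hypothetical.

```python
from functools import reduce
from operator import xor


def encode_group(data):
    """Append one XOR parity symbol to a small local group of integer symbols."""
    return list(data) + [reduce(xor, data, 0)]


def repair(group, lost):
    """Recover the symbol at index `lost` by XORing the other symbols in the group.

    Only the symbols of this one local group are read, which is the point
    of local repair: the rest of the stored data is never touched.
    """
    return reduce(xor, (s for i, s in enumerate(group) if i != lost), 0)


if __name__ == "__main__":
    group = encode_group([5, 9, 3])   # three data symbols plus one parity
    print(repair(group, 1))           # recovers the symbol at index 1
```

Code conversion, the second property in the abstract, would change how many symbols sit in each group or how many parities protect them; that step is not modelled here.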

Title: Fundamentals of Vision-Based Geolocation

Harpreet S. Dhillon, Virginia Tech

Harpreet S. Dhillon received the B.Tech. degree in electronics and communication engineering from IIT Guwahati in 2008, the M.S. degree in electrical engineering from Virginia Tech in 2010, and the Ph.D. degree in electrical engineering from the University of Texas at Austin in 2013. After serving as a Viterbi Postdoctoral Fellow at the University of Southern California for a year, he joined Virginia Tech in 2014, where he is currently the W. Martin Johnson Professor of Engineering and the incoming interim department head of the Bradley Department of Electrical and Computer Engineering. His research interests include communication theory, wireless networks, geolocation, and stochastic geometry. He is a fellow of IEEE, a fellow of AAIA, a fellow of AIIA, and a Clarivate Analytics (Web of Science) Highly Cited Researcher. He has received six best paper awards including the 2014 IEEE Leonard G. Abraham Prize, the 2015 IEEE ComSoc Young Author Best Paper Award, and the 2016 IEEE Heinrich Hertz Award. He has also received Early Achievement Awards from three IEEE ComSoc Technical Committees, namely, the Communication Theory Technical Committee (CTTC) in 2020, the Radio Communications Committee (RCC) in 2020, and the Wireless Communications Technical Committee (WTC) in 2021. He was named the 2017 Outstanding New Assistant Professor, the 2018 Steven O. Lane Junior Faculty Fellow, the 2018 College of Engineering Faculty Fellow, the 2019 Turner Faculty Fellow, and the recipient of the 2020 Dean's Award for Excellence in Research by Virginia Tech. He has served as the TPC Co-chair for IEEE WCNC 2022 and IEEE PIMRC 2024, and as a symposium TPC Co-chair for many IEEE conferences. He has also served on the Editorial boards of several IEEE journals with his current appointments being on the Executive Editorial Committee for IEEE Transactions on Wireless Communications and as a Senior Editor for IEEE Wireless Communications Letters.

Abstract: Many modern wireless devices with accurate positioning requirements have access to various vision sensors, such as a camera, radar, and Light Detection and Ranging (LiDAR). In numerous scenarios where wireless-based positioning is either inaccurate or unavailable, using information from vision sensors becomes highly desirable. While vision-based localization has seen major advances from an algorithmic perspective, the mathematical underpinnings of this problem space remain largely unexplored, which is the topic of this talk. Due to limitations in sensor resolution, the level of detail in prior information, and computational resources, we may not be able to distinguish between landmarks that are similar in appearance, such as trees, lampposts, and bus stops. For instance, if a target is close to a tree, we may not necessarily know which exact tree (out of potentially many that may appear similar) is close to the target unless some additional information is available. While one cannot accurately determine the absolute target position using a single "non-unique" landmark, it is possible to obtain an approximate position fix if the target can see multiple landmarks whose geometric placement on the map is unique. Modeling the locations of these indistinguishable landmarks as a point process, we present a new approach to analyzing geolocation performance, specifically localizability, in this setting. We define localizability as the ability of the target to determine the correct set of indistinguishable landmarks around it from the vision data. Our analysis reveals that the localizability probability approaches one when the landmark intensity tends to infinity, which means that error-free localization is achievable in this limiting regime. This is joint work with Haozhou Hu and R. Michael Buehrer.

Networking Session

Title: Strongly Tail-Optimal Scheduling in the Light-Tailed M/G/1

Ziv Scully, Cornell University, USA

Ziv Scully is an assistant professor at Cornell ORIE (Operations Research and Information Engineering). He completed his PhD in Computer Science at CMU in 2022, advised by Mor Harchol-Balter and Guy Blelloch, and obtained his BS from MIT in 2016. Between graduating from CMU and starting at Cornell, Ziv was a research fellow at the UC Berkeley Simons Institute for the Data-Driven Decision Processes program, and then an NSF FODSI postdoc at Harvard SEAS and MIT CSAIL, mentored by Michael Mitzenmacher and Piotr Indyk.
Broadly, Ziv researches the theory of decision making under uncertainty, including stochastic control, resource allocation, and performance evaluation. A particular emphasis of his work is scheduling and load balancing in queueing systems, as motivated by the needs of cloud computing data centers and service systems. Ziv’s work has been recognized by awards from INFORMS, ACM SIGMETRICS, and IFIP PERFORMANCE, including winning the 2022 George Nicholson Student Paper Competition and receiving the 2022 SIGMETRICS Doctoral Dissertation Award.

Abstract: We study the problem of scheduling jobs in a queueing system, specifically an M/G/1 with light-tailed job sizes, to asymptotically optimize the response time tail. For some time, the best known policy was First-Come First-Served (FCFS), which has an asymptotically exponential tail. FCFS achieves the optimal exponential decay rate, but its leading constant is suboptimal. Only recently have policies that improve upon FCFS’s leading constant been discovered. But it is a long-standing open problem to find a strongly tail-optimal policy, namely a policy that minimizes the leading constant.
In this work, we resolve the problem of strongly tail-optimal scheduling in the light-tailed M/G/1. We characterize the best possible leading constant, and we introduce a new scheduling policy, called 𝛾-Boost, that achieves it. Roughly speaking, 𝛾-Boost operates similarly to FCFS, but it pretends that small jobs arrive earlier than their true arrival times. This reduces the response time of small jobs without unduly delaying large jobs, leading to a significantly better response time tail. In addition to proving 𝛾-Boost's asymptotic optimality, we show via simulation that 𝛾-Boost has excellent practical performance.
The 𝛾-Boost policy as described above requires knowledge of job sizes. We also generalize 𝛾-Boost to work with unknown job sizes, proving an analogous asymptotic optimality result in the unknown-size setting. Our generalization reveals that 𝛾-Boost can be viewed as a type of Gittins index policy, but with an unusual feature: it uses a negative discount rate.
Joint work with George Yu and Amit Harlev.
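
The idea of "pretending that small jobs arrive earlier" can be sketched as a single-server queue that always serves the waiting job with the smallest boosted arrival time (true arrival time minus a size-dependent boost). The sketch below is a simplified illustration under stated assumptions: service is non-preemptive, job sizes are known, and the boost function is a caller-supplied placeholder rather than the specific boost used by 𝛾-Boost. The name `boost_order` is hypothetical.

```python
import heapq


def boost_order(jobs, boost):
    """Return job indices in the order a boosted-FCFS server would serve them.

    jobs  : list of (arrival_time, size) pairs
    boost : function mapping a job size to a non-negative boost (assumption:
            supplied by the caller; not the talk's specific boost function)
    """
    by_arrival = sorted(range(len(jobs)), key=lambda i: jobs[i][0])
    queue = []      # min-heap of (boosted_arrival_time, job_index)
    order = []
    clock = 0.0
    nxt = 0         # next position in by_arrival not yet admitted
    while nxt < len(by_arrival) or queue:
        # If the server is idle, jump ahead to the next arrival.
        if not queue and jobs[by_arrival[nxt]][0] > clock:
            clock = jobs[by_arrival[nxt]][0]
        # Admit every job that has arrived by now, keyed by boosted arrival.
        while nxt < len(by_arrival) and jobs[by_arrival[nxt]][0] <= clock:
            i = by_arrival[nxt]
            arrival, size = jobs[i]
            heapq.heappush(queue, (arrival - boost(size), i))
            nxt += 1
        # Serve the job with the smallest boosted arrival time to completion.
        _, i = heapq.heappop(queue)
        order.append(i)
        clock += jobs[i][1]
    return order


if __name__ == "__main__":
    jobs = [(0.0, 5.0), (1.0, 4.0), (2.0, 1.0)]
    print(boost_order(jobs, lambda s: 0.0))                       # plain FCFS
    print(boost_order(jobs, lambda s: 3.0 if s <= 1.0 else 0.0))  # small jobs boosted
```

With a zero boost the policy reduces to FCFS; with a positive boost for small jobs, the size-1 job that arrives last is served ahead of the size-4 job that arrived before it.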

Title: On Optimal Server Allocation for Parallelisable Jobs

Arpan Mukhopadhyay, University of Warwick, UK

Dr. Arpan Mukhopadhyay received the B.E. degree in electronics and telecommunication engineering from Jadavpur University, Calcutta, India, in 2009, the M.E. degree in telecommunications from the Indian Institute of Science, Bengaluru, India, in 2011, and the Ph.D. degree in electrical and computer engineering from the University of Waterloo, Canada, in 2016. He is currently an Associate Professor with the Department of Computer Science, University of Warwick, U.K. His research interests include applied probability, stochastic processes, algorithm design, and optimisation with applications to cloud computing systems, caching systems, wireless networks, social networks, and smart grids. He was a recipient of Best Paper Awards from the IFIP Performance 2015 Conference and the International Teletraffic Congress (ITC) 2015, and he received the Rising Scholar Award at the International Teletraffic Congress 2018 for his contributions to mean field analysis of large heterogeneous networks.

Abstract: A large proportion of jobs submitted to modern computing clusters and data centers are parallelisable and capable of running on a flexible number of computing cores or servers. Examples of such jobs include training of large machine learning models and simulation of climate models. Although allocating more servers to such a job results in a higher speed-up in the job’s execution, it reduces the number of servers available to other jobs which, in the worst case, can result in an incoming job not finding any available server to run immediately upon arrival. Hence, a natural question to ask is: how to optimally allocate servers to jobs such that (i) the average execution time across jobs is minimised and (ii) almost all jobs find at least one server immediately upon arrival. To address this question, we consider a loss system where jobs with arbitrary concave speed-up functions arrive and an online server allocation scheme is used to allocate one or more servers from the system to each arriving job based on the availability of the servers. If a job does not find any available servers upon entry, then it is blocked and lost. We propose a simple server allocation scheme that achieves the minimum average execution time of accepted jobs while ensuring that the blocking probability of jobs vanishes as the system becomes large. To prove our result, we employ Stein’s method which also yields non-asymptotic bounds on the blocking probability and the mean execution time.
The talk will be based on joint work with Samira Ghanbarian (uWaterloo), Ravi R. Mazumdar (uWaterloo), and Fabrice Guillemin (Orange Labs, France).

Title: Incentivizing Client Participation in Federated Learning

Gauri Joshi, Carnegie Mellon University

Gauri Joshi is a faculty member in the ECE department at Carnegie Mellon University. Gauri completed her Ph.D. at MIT EECS, and received her B.Tech and M.Tech from the Indian Institute of Technology (IIT) Bombay. Her awards include the MIT Technology Review 35 under 35 Award, ONR Young Investigator and NSF CAREER Award, Best Paper awards at MobiHoc 2022 and SIGMETRICS 2020, and the Institute Gold Medal of IIT Bombay (2010).

Abstract: Federated learning (FL) facilitates collaboration between a group of clients who seek to train a common machine learning model without directly sharing their local data. Although there is an abundance of research on improving the speed, efficiency, and accuracy of federated training, most works implicitly assume that all clients are willing to participate in the FL framework. Due to data heterogeneity, however, the global model may not work well for some clients, and they may instead choose to use their own local model. Such disincentivization of clients can be problematic from the server's perspective because having more participating clients yields a better global model, and offers better privacy guarantees to the participating clients. In this paper, we propose an algorithm called MaxFL that explicitly maximizes the fraction of clients who are incentivized to use the global model by dynamically adjusting the aggregation weights assigned to their updates. Our experiments show that MaxFL increases the number of incentivized clients by 30-55% compared to standard federated training algorithms, and can also improve the generalization performance of the global model on unseen clients.

Speech and Language Session

Title: Bootstrapping ASR for new languages

Sambuddha Bhattacharya, Amazon

Sambuddha Bhattacharya earned his BE in EEE from BITS, Pilani, and PhD in EE from University of Washington, Seattle. He then joined Synopsys and, for over a decade, built industry-leading VLSI layout optimization software. With his interest shifting to machine learning, he joined Zendrive, a Telematics startup, where he led the machine learning team that developed predictive models to detect driving behavior from mobile sensor data. He subsequently joined Amazon in 2022 and now leads a team of applied scientists building ASR systems. Sambuddha has authored 25 technical papers in peer-reviewed IEEE conferences and journals, and has 3 granted and 3 pending patents.

Abstract: Building automatic speech recognition (ASR) systems for new languages is a challenging task. Acceptable ASR performance is hard to achieve without a large labelled data corpus. Labelled data collection is expensive and time consuming. In this talk, we will discuss how we address these challenges via different approaches such as utilizing synthetic data, multilingual training, distillation from a powerful ASR model, and fine-tuning. We will elaborate on how our ASR framework enables bootstrapping new languages with 10x less labelled data, less human effort, and faster turn-around time.

Title: Few-shot Learning based E2E ASR: Towards Foundation Models with small-scale pretraining

V Ramasubramanian, IIIT Bangalore

Ramasubramanian (Ram) obtained his B.S. degree from University of Madras in 1981, B.E. degree from Indian Institute of Science, Bangalore in 1984 and the Ph.D. degree from Tata Institute of Fundamental Research (TIFR), Bombay in 1992. He has been engaged in research in speech processing and related areas (e.g., Speech coding, Speech recognition, Speaker and language recognition, Speech enhancement) for nearly 4 decades. He has worked in various institutions and universities: TIFR, Bombay (1984-99) as Research Scholar, Fellow and Reader; University of Valencia, Spain as Visiting Scientist (1991-92); Advanced Telecommunications Research (ATR) Laboratories, Kyoto, Japan as Invited Researcher (1996-97); Indian Institute of Science (IISc), Bangalore as Research Associate (2000-04); Siemens Corporate Research & Technology (2005-13) as Senior Member Technical Staff and as Head of Professional Speech Processing - India (2006-09); and PES University, South Campus, Bangalore (2013-2017) as Professor. He has been with IIIT Bangalore, as Professor, since Feb 2017.
He has over 70 research publications in peer-reviewed international journals and conferences. His current research interests include automatic speech recognition, machine learning, deep learning, few-shot learning, self-supervised learning, and associative memory formulations.

Abstract: This talk will focus on the design of E2E ASR models and systems for ultra-low resource speech recognition based on the recently emerging theory of Few-shot Learning (FSL). FSL-informed pre-training and inference of ASR models ensure that the pre-training conditions and objective functions match the downstream inference objectives. This contrasts with current Self-supervised Learning approaches to Foundation Model design, which have a clear dichotomy between pre-training objectives (self-supervised loss functions) and downstream fine-tuning objectives (supervised loss). This talk will first dwell on a class of FSL frameworks termed ‘meta-/metric-learning’ within which we set our “Matching Networks – Connectionist Temporal Classification” (MN-CTC) model for E2E ASR in cross-lingual, ultra-low resource speech recognition scenarios for Indic languages. We will demonstrate the advantage accrued by such an FSL paradigm in enabling low-data pre-training and fine-tuning at low model complexities in comparison to current de facto Foundation Models for ASR. Importantly, we conclude with how this sets the basis for a new class of FSL-informed Foundation Models, which can be pre-trained at small scales in either a supervised setting or a self-supervised setting from fully unsupervised data.

Title: Evaluating LLMs on Languages Beyond English: Challenges and Opportunities

Sunayana Sitaram, Microsoft Research India

Sunayana Sitaram is a Principal Researcher at Microsoft Research India. Her research goal is to make AI more inclusive of everyone on the planet. Her current area of research is measuring and improving the performance of Large Language Models on non-English languages. Sunayana served as the director of the MSR India Research Fellow program from 2022 to 2024. Prior to joining MSRI as a Post Doc Researcher, Sunayana completed her MS and PhD at the Language Technologies Institute, Carnegie Mellon University in 2015. Sunayana’s research has been published in top NLP and Speech conferences including ACL, EMNLP, Interspeech, and ICASSP, and she regularly serves on the organizing committees of these conferences.

Abstract: The assessment of capabilities and limitations of Large Language Models (LLMs) through the lens of evaluation has emerged as a significant area of study. In this talk, I will discuss our research over the last 1.5 years on evaluating LLMs in a multilingual context, highlighting the lessons we learned and the general trends observed across various models. I will also discuss our recent efforts to evaluate Indic LLMs using a hybrid approach of human and LLM evaluators. Lastly, I will touch upon the challenges that remain in both advancing evaluation research and improving multilingual models.

Applied Machine Learning and Computer Vision Session

Title: Efficient LLM Inference with HiRE and Tandem Transformers

Praneeth Netrapalli, Google Research India

Praneeth Netrapalli is a research scientist at Google Research India, Bengaluru. He is also an adjunct professor at CMInDS, IIT Bombay and TIFR, Mumbai and a faculty associate of ICTS, Bengaluru. Prior to this, he was a researcher at Microsoft Research. He obtained his MS and PhD in ECE from UT Austin, and his B.Tech in EE from IIT Bombay. He is a co-recipient of the IEEE Signal Processing Society Best Paper Award 2019 and a recipient of the Indian National Science Academy (INSA) Medal for Young Scientists 2021, and was an associate of the Indian Academy of Sciences (IASc) from 2019 to 2022. His current research focuses on making training and inference of large language models more efficient.

Abstract: Large Language Models (LLMs) often suffer from memory-bound inference on accelerators, impacting latency of all the key layers: feedforward, attention, and softmax. Despite significant sparsity within these layers, efficient exploitation is hindered by a lack of accelerator support for unstructured sparsity and the computational cost of identifying important elements. We introduce HiRE, a novel technique that utilizes dimensionality reduction and quantization to predict the significant elements with high recall, followed by focused computation and an efficient approximate top-k operator. Applied to softmax and a group-sparse FFN layer, HiRE significantly reduces computational cost while preserving accuracy, leading to improved end-to-end inference latency.
Furthermore, we tackle the inherent sequential generation bottleneck of LLMs with tandem transformers. This architecture combines a small autoregressive model with a large block-mode model, where the small model leverages the large model's representations for improved accuracy. This results in enhanced prediction, faster inference, and the option of a verification step to ensure quality. Our approach demonstrates superior performance compared to standalone models and addresses the limitations of existing parallel decoding techniques.
Based on joint work with Yashas Samaga B L, Varun Yerram, Aishwarya P S, Pranav Nair, Srinadh Bhojanapalli, Chong You, Toby Boyd, Sanjiv Kumar and Prateek Jain.

Title: Going Beyond the Known: Detecting Unknown Categories in Computer Vision

Vineeth N Balasubramanian, IIT Hyderabad

Vineeth N Balasubramanian is a Professor in the Department of Computer Science and Engineering at the Indian Institute of Technology, Hyderabad (IIT-H), India, and was recently a visiting faculty member at Carnegie Mellon University under the Fulbright-Nehru Fellowship in 2022-23. He was also the Founding Head of the Department of Artificial Intelligence at IIT-H from 2019 to 2022. His research interests include deep learning, machine learning, and computer vision with a focus on explainability, continual learning and learning with limited labeled data. His research has been published at premier venues including ICML, CVPR, NeurIPS, ICCV, KDD, AAAI, and IEEE TPAMI, with Best Paper Awards at recent venues such as CODS-COMAD 2022 and the CVPR 2021 Workshop on Causality in Vision. He was the General Chair of ACML 2022 held in India, and regularly serves in senior roles for conferences such as CVPR, ICCV, AAAI, IJCAI, and ECCV. He is listed among the World's Top 2% Scientists (2022, 2023), is an INSA Associate Fellow (2024), and is a recipient of the Research Excellence Award at IIT-H (2022), Google Research Scholar Award (2021), NASSCOM AI Gamechanger Award (2022), Teaching Excellence Award at IIT-H (2017 and 2021), and Outstanding Reviewer Award (IJCAI 2023, ICLR 2021, CVPR 2019, etc.), among others.

Abstract: Supervised learning, the harbinger of machine learning over the last decade, has had tremendous impact across application domains in recent years. However, the notion of a static trained machine learning model that identifies one among a pre-specified set of classes is becoming increasingly limiting, as these models are deployed in changing and evolving environments. Among several related settings, open-set and open-world learning have gained interest among practitioners as ways to address this need to learn from new information, including the much-needed ability to say "I don't know". In this talk, we will briefly discuss these settings and highlight their importance in addressing real-world challenges. The talk will cover some of our recent research on open-world object detection (CVPR 2021), novel class discovery (ECCV 2022), and open-set object detection (WACV 2024), and will also share interesting real-world use cases of these efforts. The talk will conclude with pointers to possible ways to move forward as a community in this direction.

Title: Learning to Constrain and Control Text-to-image Diffusion Models

Srikrishna Karanam, Adobe Research India

Srikrishna Karanam earned his PhD in Computer and Systems Engineering from Rensselaer Polytechnic Institute, where he worked with Prof. Rich Radke on video analytics problems in wide-area camera networks. Subsequently, he worked as a research scientist at Siemens Corporate Research in Princeton, NJ, USA and United Imaging Intelligence in Boston, MA, USA, where his research focused on problems in computer vision and machine learning with applications in industrial automation, digitization, and healthcare.

Abstract: In this talk, I will present results from our recent efforts in controlling and customizing text-to-image models to satisfy various user-supplied constraints. First, I will discuss how the concept of visual attention maps can help improve the alignment of the final generated image with the input text while also enabling layout control. Next, I will discuss results from a series of works where we constrain outputs of text-to-image diffusion models with attributes extracted from reference images, e.g., color, style, and layout. We will particularly focus on techniques that do not need retraining of the base diffusion model but instead rely on specific token optimization or completely training-free approaches.

Medical Imaging Session

Title: Optimizing Quantitative Imaging: Deep Learning Meets Data Consistency

Raji Susan Mathew, IISER Thiruvananthapuram

Raji Susan Mathew is currently an Assistant Professor in the School of Data Science at IISER Thiruvananthapuram. Prior to this, she was a Postdoctoral Fellow in the Department of Computational and Data Sciences, Indian Institute of Science, Bangalore. She received the M.Tech. degree in signal processing and Ph.D. in medical image reconstruction, both from the Cochin University of Science and Technology, Cochin, India. Her research interests include medical image reconstruction, image analysis, and computational methods of medical imaging. She was also a recipient of the C. V. Raman Postdoctoral Fellowship, awarded by the Indian Institute of Science, Bangalore.

Abstract: This talk focuses on advanced techniques for quantitative image reconstruction. Traditional MRI, while powerful, does not provide detailed information about the extent of degradation or changes occurring in specific regions of interest within the body. To overcome this limitation, a recent post-processing method called Quantitative Susceptibility Mapping (QSM) has been developed. QSM enhances MRI capabilities by calculating the susceptibility values of underlying tissues. This talk will cover the fundamentals of quantitative image reconstruction and recent advancements in the field. We will explore various reconstruction methods, with a particular focus on deep learning-based approaches. The presentation will conclude with a discussion of current challenges and future directions.

Title: Generative Models for Simulating Medical Imaging from Ultrasound to CT via MRI

Debdoot Sheet, IIT Kharagpur

Debdoot Sheet is an Assistant Professor of Electrical Engineering at the Indian Institute of Technology Kharagpur and founder of SkinCurate Research, a smart medical imaging devices startup based in Kharagpur, India. He was born in Kharagpur, India in 1986 and has spent his life across Kharagpur and Kolkata in India and Munich in Germany. He received the BTech degree in electronics and communication engineering in 2008 from the West Bengal University of Technology, Kolkata for studies at the Institute of Engineering and Management, Kolkata, and the MS and PhD degrees from the Indian Institute of Technology Kharagpur in 2010 and 2014 respectively for studies in ultrasonic and optical imaging and machine learning for developing in situ histopathology. His current research interests include computational medical imaging, machine learning, image and multidimensional signal processing, and social implications of technology. He is also a DAAD alumnus and was a visiting scholar at the Technical University of Munich during 2011-12. He is also the recipient of the IEEE Computer Society Richard E. Merwin Student Scholarship in 2012, the Fraunhofer Applications Award at the Indo-German Grand Science Slam in 2012, and the GE Edison Challenge 2013. He is a member of IEEE, SPIE and ACM and has served as Regional Editor of IEEE Pulse and Editor-in-Chief of IEEE Technology and Engineering Education since 2014. He has also served as Chair of the Technical Program Committee for IEEE TechSym (2011, 2014) and Publications Chair for IEEE TechSym in 2010.

Abstract: A prime challenge in building data-driven inference models is the unavailability of a statistically significant amount of labelled data. Datasets are typically designed for a specific purpose, and accordingly are categorised or weakly labelled for only a single class of tasks instead of being exhaustively annotated. This talk will focus on learning with imaging-physics concept reinforcements applied to generative models, which enables the simulation of ultrasound images over a wide spectrum of scanner-equivalent operating conditions, viz. variation in transducer geometry and operating frequency, and 3D simulation by learning from 2D images. The talk will examine the impact such understanding of imaging physics has on lowering the amount of data required for learning. Further, this talk will discuss some recent work on simulating missing pulse sequences in MRI, as well as on simulating radiation dosimetry equivalent CT from MRI images alone, even though the two modalities are completely unrelated from the perspective of sensing physics. This talk will also introduce the "relativistic Visual Turing Test (rVTT) Loss", which is a key driver in solving optimization problems that require perception loss minimization rather than classical distortion loss minimization.

Title: Noise-Aware Kernel Synthesis with self-supervision for Improved X-ray Computed Tomography Imaging

Rajesh Langoju, GE Healthcare

Dr. Rajesh LVVL earned his B.Tech in Electronics and Instrumentation Engineering from Nagarjuna University, Guntur in 2001, followed by an M.Tech in Instrumentation from the Indian Institute of Science, Bangalore in 2003. He completed his PhD in Optical Signal Processing at the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland in 2006. Beginning his professional career in 2007, he joined GE Global Research Centre Bangalore as a Research Scientist, and currently serves as a Principal Data Scientist at the Advanced Technology Group, GE Healthcare Bangalore. Dr. Rajesh has authored over 21 papers in peer-reviewed international journals, presented 8 conference papers, and holds 27 US patents. He co-authored two book chapters on Signal Processing techniques in phase measurement and Biomedical Signal Processing for Taylor and Francis (CRC Press). His research interests encompass Computed Tomography Imaging, Ultrasound Imaging, Image Quality Enhancement, Deep Imaging Methods, and Signal/Image Analytics.

Abstract: The process of image-based kernel synthesis (KS) transforms X-ray computed tomography (CT) images originally reconstructed with one type of kernel into images using another type of kernel, all without requiring the original sinogram data. This approach enhances computer-aided detection, improves low-contrast distinguishability, and facilitates quantitative analysis. However, it also has the potential to amplify noise in the input images, thereby degrading overall image quality. Addressing noise-aware kernel synthesis presents a significant challenge, necessitating prior knowledge or a regularization function to manage the inherent ill-posedness and noise associated with the inverse problem. In this study, we introduce a novel method called self-supervised kernel synthesis (SSKS), which explicitly incorporates the physics underlying kernel synthesis. This method integrates information about the modulation transfer function (MTF), resolution, and display fields of view (DFoV). Our approach leverages Neumann networks trained with deep image prior techniques to achieve noise-aware self-supervised kernel synthesis. Specifically, the Neumann architecture utilizes dense skip connections to enhance image sharpness while learning the weights of a deep regularizer. Deep image prior (DIP) methodology enables learning of priors in a self-supervised manner, effectively controlling noise through a deep network acting as an implicit prior. We validated our proposed method by applying it to lung and bone kernel synthesis tasks using standard and detailed kernel images, respectively. The results demonstrate that our method effectively reconstructs images that combine the strengths of both input and output kernels, thereby improving overall image quality and suppressing noise.

Welcome to the 2024 International Conference on Signal Processing and Communications (SPCOM)