Learning from Diverse and Small Data – Ramya Korlakai Vinayak

Machine learning (ML) algorithms are becoming ubiquitous in application domains such as public
health, genomics, psychology, and the social sciences. In these domains, data is often obtained from
populations that are diverse, e.g., in demographics, phenotypes, and preferences. Many ML
algorithms focus on learning model parameters that work well on average over the population but do not
capture the diversity. At the same time, such datasets usually have few observations per individual, which
limits our ability to learn about each individual separately. The question of interest in these scenarios is: how
can we reliably capture the diversity in the data in small-data settings?

In this talk, we will address this question in the following settings:
(i) In many applications, we observe count data that can be modeled as Binomial (e.g., polling, surveys,
epidemiology) or Poisson (e.g., single-cell RNA data). Because a single parameter or a finite set of
parameters does not capture the diversity of the population in such datasets, they are often modeled as
nonparametric mixtures. In this setting, we will address the following question: “How well can we learn the
distribution of parameters over the population without learning the individual parameters?” We will show
that nonparametric maximum likelihood estimators are in fact minimax optimal.
(ii) Learning preferences from human judgements using comparison queries plays a crucial role in
cognitive and behavioral psychology, crowdsourcing democracy, surveys in social science applications,
and recommendation systems. Models in the literature often focus on learning the average preference over
the population because of the limited amount of data available per individual. We will discuss some
recent results on how we can reliably capture diversity in preferences while pooling together data from
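
To make setting (i) concrete, here is a minimal sketch of the Binomial mixture model: each individual has an unknown success probability drawn from a population distribution, and we observe only a handful of trials per individual. The sketch approximates the nonparametric maximum likelihood estimate of that distribution by running EM over a fixed grid of candidate probabilities. The grid, the EM updates, and the helper names (simulate_counts, npmle_on_grid) are illustrative assumptions, not the algorithm or analysis presented in the talk.

```python
import math
import random


def binomial_pmf(x, t, p):
    """P(X = x) for X ~ Binomial(t, p)."""
    return math.comb(t, x) * (p ** x) * ((1.0 - p) ** (t - x))


def simulate_counts(n, t, seed=0):
    """Hypothetical population: half the individuals have p = 0.2, half p = 0.8.

    Returns one success count (out of t trials) per individual.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        p = 0.2 if rng.random() < 0.5 else 0.8
        counts.append(sum(1 for _ in range(t) if rng.random() < p))
    return counts


def npmle_on_grid(counts, t, grid_size=51, iterations=500):
    """Approximate the NPMLE of the mixing distribution by EM on a fixed grid.

    Assumption: restricting the support to an evenly spaced grid on [0, 1]
    is a simplification chosen for this sketch.
    Returns (grid, weights), where weights[j] is the estimated mass at grid[j].
    """
    grid = [j / (grid_size - 1) for j in range(grid_size)]
    weights = [1.0 / grid_size] * grid_size
    n = len(counts)

    # The likelihood depends on the data only through the histogram of counts.
    hist = [0] * (t + 1)
    for x in counts:
        hist[x] += 1

    # likelihood[x][j] = P(X = x | p = grid[j])
    likelihood = [[binomial_pmf(x, t, p) for p in grid] for x in range(t + 1)]

    for _ in range(iterations):
        new_weights = [0.0] * grid_size
        for x in range(t + 1):
            if hist[x] == 0:
                continue
            # Marginal probability of observing x under the current mixture.
            marginal = sum(w * l for w, l in zip(weights, likelihood[x]))
            # Add the posterior mass of each grid point, weighted by frequency.
            for j in range(grid_size):
                new_weights[j] += hist[x] * weights[j] * likelihood[x][j] / marginal
        weights = [w / n for w in new_weights]
    return grid, weights
```

For example, calling npmle_on_grid(simulate_counts(2000, 10), 10) returns a grid and a weight vector that sums to one. The point of the sketch is that the population-level distribution is estimated directly from the pooled counts, without first estimating each individual's parameter from its few observations.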

Ramya Korlakai Vinayak is an assistant professor in the Dept. of ECE and affiliated faculty in the Dept. of
Computer Science and the Dept. of Statistics at the UW-Madison. Her research interests span the areas
of machine learning, statistical inference, and crowdsourcing. Her work focuses on addressing theoretical
and practical challenges that arise when learning from societal data. Prior to joining UW-Madison, Ramya
was a postdoctoral researcher in the Paul G. Allen School of Computer Science and Engineering at the
University of Washington. She received her Ph.D. in Electrical Engineering from Caltech. She obtained
her Master's from Caltech and her Bachelor's from IIT Madras. She received the Schlumberger
Foundation Faculty of the Future fellowship (2013-15), was an invited participant at the Rising Stars in
EECS workshop in 2019, and is a recipient of the NSF CAREER Award (2023-2028).

Join Zoom Link – https://ucsd.zoom.us/j/99334315002



May 19 2023


10:00 am - 11:00 am


SDSC Room 408
9836 Hopkins Dr, La Jolla, CA 92093


