Ananth Balashankar
Research Scientist, Google Research
New York
ananth(at)nyu(dot)edu
[CV][Research Statement][Teaching Statement]

I'm a Research Scientist at Google on the Responsible AI team in New York. I recently graduated with a Ph.D. in Computer Science, advised by Prof. Lakshminarayanan Subramanian at NYU's Courant Institute of Mathematical Sciences and Dr. Alex Beutel, Senior Staff Research Scientist at Google. I work on improving the robustness and safety of foundation models (LLMs) that power applications such as Bard and Search Generative Experience. More broadly, I am interested in building domain-faithful ML models in the domains of natural language, socio-economics, policy, and privacy. I was a Student Researcher at Google AI (2019-22), where I worked on counterfactual text robustness. Previously, I was a Software Engineer at Google for 3 years, where I worked on recommendations for the Google Play Store and the Play Developer Console in Mountain View and London. I graduated from the Indian Institute of Technology, Kharagpur with a B.Tech/M.Tech in Computer Science, advised by Prof. Niloy Ganguly.

Honors

  1. NYU Janet Fabri Prize for best Ph.D. dissertation in Computer Science - 2023
  2. Google Student Research Advisor Program Fellowship (2019-2022)
  3. Spot bonus for research contributions in the Google Responsible AI team - 2021
  4. NYU Harold Grad Memorial Prize for promising Ph.D. achievement - 2019
  5. Best Paper Award at the ICML Workshop on AI for Social Good - 2019
  6. MacCracken Fellowship (2017-22)

Research Interests

I am interested in building safe and responsible ML models through methodologies including domain-faithful optimization, data augmentation, and causal feature selection. Broadly, my work has had demonstrable business and research impact across five real-world application domains:
  1. Safety and Responsibility in AI
    Automated detection of toxic online comments improves the quality of interaction on social media. However, variations in the context of comments make it hard to protect specific demographic groups from disparate impact. By explicitly modeling such nuances through counterfactual data augmentation, we improved the accuracy of detecting toxicity by 6%. Through this publication at EMNLP '21, a premier NLP conference, I have fostered deep engagements with Google's Responsible ML team. I have also deployed ML models that optimize business objectives like diversity at Google Play.

  2. ML Robustness
    Robust ML models are critical for building high-stakes applications and require a shift from traditional ML models that focus only on optimizing accuracy over the observed but limited test data. By incorporating rules and data from the real world, we improved the accuracy of state-of-the-art transformer-based models by 12% in this publication at WSDM '21, the premier data mining conference.

  3. Causal-Aware ML
    Causality-based question answering lies at the core of customer support tools like chatbots. Prior ML models fail to capture the directed nature of causality: for example, rain causes traffic delays, and not vice versa. By learning asymmetric causal embeddings faithful to causal graphs, we improved accuracy on Yahoo! Answers by 21% in this paper at ACL '21, a premier NLP conference.

  4. AI for Social Good
    Forecasting famine is critical for the mobilization of aid to millions of people, but it is hard to solve due to data scarcity in fragile countries. By building a news-based, causal-aware forecasting framework that extracts causal features from 11.2 million news articles across two decades in 21 fragile countries, we improved forecasting accuracy by 32% compared to state-of-the-art predictive models. This work was accepted at IC2S2 '21, the premier computational social science conference, and at Science Advances in 2023. The tool will be used by the World Bank Data Science group for aid allocation on food security. Based on this research, a few co-authors have founded a socio-economic inference start-up, Velai, Inc.

  5. ML in Privacy
    Corporate privacy compliance policies are legally prescriptive, but not directly enforceable in computer systems. By using the theory of contextual integrity through post-processing mappings, we improved the accuracy of BERT-based deep learning models by 6% in extracting privacy parameters for SQL-based enforcement in this paper at WWW '19, the premier web research conference.

Publications

    Peer Reviewed Conference and Journal Papers

  1. Break it, Imitate it, Fix it: Robustness by Generating Human-Like Attacks
    Aradhana Sinha*, Ananth Balashankar*, Ahmad Beirami, Thi Avrahami, Jilin Chen, Alex Beutel. TMLR 2023. [pdf]

  2. Improving Robustness through Pairwise Generative Counterfactual Data Augmentation
    Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Ed Chi, Jilin Chen, and Alex Beutel. EMNLP 2023 (Findings). [pdf]

  3. Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling
    Yan Shvartzshnaider, Ananth Balashankar, Thomas Wies, Lakshminarayanan Subramanian. 5th Natural Legal Language Processing (NLLP) Workshop at EMNLP 2023. [arxiv]

  4. Predicting food crises using news streams
    Ananth Balashankar, Lakshmi Subramanian, Samuel Fraiberger
    Science Advances, 2023 [pdf]

  5. Learning Conditional Granger Causal Temporal Networks
    Ananth Balashankar, Srikanth Jagabathula, Lakshminarayanan Subramanian.
    Causal Learning and Reasoning Conference (CLeaR) 2023. [pdf]

  6. Spatio-temporal modeling of urban air quality using low-cost monitors
    Shiva Iyer, Ananth Balashankar, William Aeberhard, Sameeksha Jain, Sujoy Bhattacharya, Guiditta Rusconi, Anant Sudarshan, Rohini Pande, Lakshmi Subramanian
    NPJ Climate and Atmospheric Science (2022) [link]

  7. Fine-grained prediction of food insecurity using news streams
    Ananth Balashankar, Lakshminarayanan Subramanian, Samuel Fraiberger
    International Conference on Computational Social Science (IC2S2) 2022

  8. Targeted Policy Recommendations using Outcome-aware Clustering
    Joint work with World Bank collaborators - Samuel Fraiberger, Marelize Görgens, Clara Ivanescu, Andrew Longosz, Shaffiq Somani, Tushar Malik, Theo Hawkins; Lakshmi Subramanian, Eric Deregt (NYU) and David Wilson (Bill and Melinda Gates Foundation)
    ACM COMPASS 2022. [pdf] World Bank Technical Report 2018. [pdf]

  9. The need for transparent demographic group trade-offs in Credit Risk and Income Classification
    Ananth Balashankar, Alyssa Lees
    iConference 2022. [pdf]

  10. Can We Improve Model Robustness through Secondary Attribute Counterfactuals?
    Ananth Balashankar, Xuezhi Wang, Ben Packer, Nithum Thain, Ed Chi and Alex Beutel
    Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021 (Main Conference Acceptance Rate: 22.4%) [pdf]

  11. Quantifying Risks of Food Insecurity by Analyzing News Media
    Ananth Balashankar, Lakshminarayanan Subramanian, Samuel Fraiberger
    International Conference on Computational Social Science (IC2S2) 2020. Contributed Talk at INFORMS 2021, World Bank Conference on AI in economic development, 2018.

  12. Learning Faithful Representations of Causal Graphs
    Ananth Balashankar, Lakshminarayanan Subramanian
    Conference of the Association for Computational Linguistics (ACL) 2021 (Oral Paper Acceptance Rate: <6%) [pdf]

  13. Enhancing Neural Recommender Models through Domain-Specific Concordance
    Ananth Balashankar, Alex Beutel, Lakshminarayanan Subramanian
    International Conference on Web Search and Data Mining (WSDM) 2021 (Acceptance Rate: 18.6%) [pdf]

  14. Identifying Predictive Causal Factors from News Streams
    Ananth Balashankar, Sunandan Chakraborty, Samuel Fraiberger, Lakshminarayanan Subramanian
    Conference on Empirical Methods in Natural Language Processing (EMNLP) 2019 (Oral Paper Acceptance Rate: <7%) [pdf]

  15. VACCINE: Using Contextual Integrity for Data Leakage Detection
    Yan Shvartzshnaider, Zvonimir Pavlinovic, Ananth Balashankar, Thomas Wies, Lakshminarayanan Subramanian, Helen Nissenbaum and Prateek Mittal
    The Web Conference (WWW) 2019 (Acceptance Rate: 18%). [pdf]

  16. Reconstructing the MERS Disease Outbreak from News
    Ananth Balashankar, Aashish Dugar, Lakshmi Subramanian, Samuel Fraiberger
    ACM Computing and Sustainable Societies (COMPASS) 2019. [pdf]

    Working Journal and Conference Papers

  18. Improving Few-shot Generalization of Safety Classifiers via Data Augmented Parameter-Efficient Fine-Tuning
    Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin, Jilin Chen, Alex Beutel. Under Review. [pdf]

  19. Predicting Angiographic Disease Status: Where to draw the line between demographically decoupled and jointly trained models?
    Ananth Balashankar, Alyssa Lees, Srikanth Jagabathula, Lakshminarayanan Subramanian. Under Review. [pdf]

    Peer Reviewed Workshop Papers

  21. Pareto Efficient Fairness for Skewed Subgroup Data
    Ananth Balashankar, Alyssa Lees, Chris Welty, Lakshmi Subramanian
    ICML workshop on AI for Social Good 2019. [Best Paper!] [arxiv]

  22. Fairness Sample Complexity and the Case for Human Intervention
    Ananth Balashankar, Alyssa Lees
    International Conference on Human Factors in Computing Systems (CHI) 2019 - Bridging the Gap Between AI and HCI Workshop. [pdf]

  23. Unsupervised Word Influencer Networks from news streams
    Ananth Balashankar, Sunandan Chakraborty, Lakshmi Subramanian.
    ACL Workshop on Economics and Natural Language Processing (ECONLP) 2018 [pdf]

  24. Towards Applying Open Domain Question Answering to Privacy Policies
    Yan Shvartzshnaider, Ananth Balashankar, Thomas Wies, Lakshminarayanan Subramanian.
    ACL Workshop on Machine Reading for Question Answering (MRQA) 2018 [pdf]

  25. Causal Inference from News Streams
    Ananth Balashankar, Sunandan Chakraborty, Samuel Fraiberger, Srikanth Jagabathula, Lakshminarayanan Subramanian.
    ICML Workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML) 2018 [pdf]

    Other Research

  27. Stable virtual landmarks: Spatial dropbox to enhance retail experience
    Swadhin Pradhan, Ananth Balashankar, Niloy Ganguly and Bivas Mitra
    Communication Systems and Networks (COMSNETS) 2014 [pdf]

  28. Signal-aware data transfer in cellular networks
    Vishnu Navda, Ramachandran Ramjee, Sahil Suneja, Ananth Balashankar
    US Patent 8843169

  29. App Discovery with Google Play: Personalized Recommendations with Related Apps
    Ananth Balashankar, Levent Koc, Norberto Guimaraes
    Google AI Blog 2016. [blog]

Teaching Experience

  1. Big Data and ML Systems (CSCI-GA. 3033-016, Spring 2019 - New York University) Designed and taught lab sessions for a class of MS in Computer Science and Computer Engineering, Entrepreneurship and Innovation (MS-CEI) students, covering the Spark distributed ML computing platform, the PageRank algorithm, deep learning neural network models for text processing, image recognition, graph learning, multi-armed bandits, recommender systems, and healthcare inference.

  2. Foundations of Networks and Mobile Systems (CSCI-GA. 2630-001 and 002, Fall 2021 - New York University) Designed and taught hands-on lab sessions for a class of 100+ students from the Tech MBA and MS-CEI programs on internet technologies like DNS, HTML, JavaScript, SQL, PHP, React, etc. This was an introductory course that exposed students to the fundamentals of computer networks and mobile systems.

  3. Operating Systems (CS30002, Spring 2014, IIT Kharagpur) Designed and conducted lab sessions for 120+ undergraduate CS students on topics such as file systems, schedulers, etc. in Ubuntu OS.

  4. Programming and Data Structures (CS 11001, Fall 2013, IIT Kharagpur) Designed and conducted lab sessions for 150+ undergraduate CS students on introductory topics in C programming.