Ananth Balashankar
Ph.D Candidate, New York University
Courant Institute of Mathematical Sciences
60 5th Avenue, New York, NY 10011.

[CV][Research Statement][Teaching Statement]
I'm on the job market!

I'm a Research Scientist at Google in the Responsible AI team, in New York. I recently graduated with a Ph.D in Computer Science advised by Prof. Lakshminarayanan Subramanian at NYU's Courant Institute of Mathematical Sciences and Dr. Alex Beutel, Senior Staff Research Scientist at Google. I work on enhancing the robustness of machine learning models by incorporating domain-specific structure for applications in natural language, socio-economics, public health, and privacy. I was also a Student Researcher at Google AI (2019-22), where I worked on counterfactual text robustness. Previously, I was a Software Engineer at Google for 3 years, where I worked on recommendations at Google Play Store and the Play Developer Console in Mountain View and London. I graduated from the Indian Institute of Technology, Kharagpur with a B.Tech/M.Tech in Computer Science advised by Prof. Niloy Ganguly.


Awards

  1. Google Student Research Advisor Program Fellowship (2019-2022)
  2. Spot bonus for research contributions in the Google Responsible AI team - 2021
  3. NYU Harold Grad Memorial Prize for promising Ph.D achievement - 2019
  4. Best Paper Award at the ICML Workshop on AI for Social Good - 2019
  5. MacCracken Fellowship (2017-22)

Research Interests

I am interested in building domain-faithful ML models through methodologies including constrained optimization, data augmentation, and causal feature selection. Broadly, my work has had demonstrable business and research impact across five real-world application domains:
  1. Famine Forecasting
    Forecasting famine is critical for the mobilization of aid to millions of people, but hard to solve due to data scarcity in fragile countries. By building a news-based causal-aware forecasting framework that extracts causal features from 11.2 million news articles across 2 decades in 15 fragile countries, we have improved forecasting accuracy by 32% compared to state-of-the-art predictive models. This paper was accepted at IC2S2 '21, the premier computational social science conference, and is under review at a Science sub-journal. The tool will be used by the World Bank Data Science group for aid allocation on food security. Based on this research, a few co-authors have founded a socio-economic inference start-up, Velai, Inc.

  2. Toxic Comments Detection
    Automated detection of online toxic comments improves the quality of interaction in social media. However, the variations in context of comments make it hard to protect specific demographic groups from disparate impact. By explicitly modeling such nuances through counterfactual data augmentation, we improved the accuracy of detecting toxicity by 6%. Through this publication at EMNLP '21, a premier NLP conference, I have fostered deep engagements with Google's Responsible ML team. I have also deployed ML models that optimize business objectives like diversity at Google Play.

  3. Causal Question Answering
    Causality-based question answering lies at the core of customer support tools like chatbots. Prior ML models fail to capture the directed nature of causality: for example, rain causes traffic delays, and not vice versa. By learning asymmetric causal embeddings faithful to causal graphs, we improved accuracy on Yahoo! Answers by 21% in this paper at ACL '21, a premier NLP conference.

  4. Health Recommendations
    Trustworthy ML models in health recommendations need to be faithful to medical concepts over unseen patient data, while traditional ML models focus only on optimizing accuracy over the observed but limited test data. By incorporating trust through doctor-specified mapping rules between diagnoses and medications via data augmentation, we have improved the accuracy of state-of-the-art end-to-end neural models by 12% in this publication at WSDM '21, the premier data mining conference.

  5. Privacy Enforcement
    Corporate privacy compliance policies are legally prescriptive, but not directly enforceable in computer systems. By using the theory of contextual integrity through post-processing mappings, we have improved the accuracy of BERT-based deep learning models by 6% to extract privacy parameters for SQL-based enforcement in this paper at WWW '19, the premier web research conference.

Teaching Experience

  1. Big Data and ML Systems (CSCI-GA. 3033-016, Spring 2019 - New York University) Designed and taught lab sessions for a class of MS in Computer Science and Computer Engineering, Entrepreneurship and Innovation (MS-CEI) students, on the Spark distributed ML computing platform, the PageRank algorithm, deep learning neural network models for text processing, image recognition, graph learning, multi-armed bandits, recommender systems, and healthcare inference.

  2. Foundations of Networks and Mobile Systems (CSCI-GA. 2630-001 and 002, Fall 2021 - New York University) Designed and taught hands-on lab sessions for a class of 100+ students from the Tech MBA and MS-CEI programs on internet technologies like DNS, HTML, JavaScript, SQL, PHP, React, etc. This was an introductory course that exposed students to the fundamentals of computer networks and mobile systems.

  3. Operating Systems (CS30002, Spring 2014, IIT Kharagpur) Designed and conducted lab sessions for 120+ undergraduate CS students on topics such as file systems and schedulers in Ubuntu OS.

  4. Programming and Data Structures (CS 11001, Fall 2013, IIT Kharagpur) Designed and conducted lab sessions for 150+ undergraduate CS students on introductory topics in C programming.


Peer Reviewed Conference Papers

  1. Fine-grained prediction of food insecurity using news streams
    Ananth Balashankar, Lakshminarayanan Subramanian, Samuel Fraiberger
    International Conference on Computational Social Science (IC2S2) 2022

  2. Targeted Policy Recommendations using Outcome-aware Clustering
    Joint work with World Bank collaborators - Samuel Fraiberger, Marelize Görgens, Clara Ivanescu, Andrew Longosz, Shaffiq Somani, Tushar Malik, Theo Hawkins; Lakshmi Subramanian, Eric Deregt (NYU) and David Wilson (Bill and Melinda Gates Foundation)
    ACM COMPASS 2022. [pdf] World Bank Technical Report 2018. [pdf]

  3. The need for transparent demographic group trade-offs in Credit Risk and Income Classification
    Ananth Balashankar, Alyssa Lees
    iConference 2022. [pdf]

  4. Can We Improve Model Robustness through Secondary Attribute Counterfactuals?
    Ananth Balashankar, Xuezhi Wang, Ben Packer, Nithum Thain, Ed Chi and Alex Beutel
    Conference on Empirical Methods in Natural Language Processing (EMNLP) 2021 (Main Conference Acceptance Rate: 22.4%) [pdf]

  5. Quantifying Risks of Food Insecurity by Analyzing News Media
    Ananth Balashankar, Lakshminarayanan Subramanian, Samuel Fraiberger
    International Conference on Computational Social Science (IC2S2) 2020. Contributed Talk at INFORMS 2021, World Bank Conference on AI in economic development, 2018.

  6. Learning Faithful Representations of Causal Graphs
    Ananth Balashankar, Lakshminarayanan Subramanian
    Conference of Association of Computational Linguistics (ACL) 2021 (Oral Paper Acceptance Rate: <6%) [pdf]

  7. Enhancing Neural Recommender Models through Domain-Specific Concordance
    Ananth Balashankar, Alex Beutel, Lakshminarayanan Subramanian
    International Conference on Web Search and Data Mining (WSDM) 2021 (Acceptance Rate: 18.6%) [pdf]

  8. Identifying Predictive Causal Factors from News Streams
    Ananth Balashankar, Sunandan Chakraborty, Samuel Fraiberger, Lakshminarayanan Subramanian
    Conference on Empirical Methods in Natural Language Processing (EMNLP) 2019 (Oral Paper Acceptance Rate: <7%) [pdf]

  9. VACCINE: Using Contextual Integrity for Data Leakage Detection
    Yan Shvartzshnaider, Zvonimir Pavlinovic, Ananth Balashankar, Thomas Wies, Lakshminarayanan Subramanian, Helen Nissenbaum and Prateek Mittal
    The Web Conference (WWW) 2019 (Acceptance Rate: 18%). [pdf]

  10. Reconstructing the MERS Disease Outbreak from News
    Ananth Balashankar, Aashish Dugar, Lakshmi Subramanian, Samuel Fraiberger
    ACM Computing and Sustainable Societies (COMPASS) 2019. [pdf]

Working Journal and Conference Papers

  1. Fine-grained prediction of food crisis using news
    Ananth Balashankar, Samuel Fraiberger, Lakshmi Subramanian
    Under Revision at Science Advances [pdf]

  2. Spatio-temporal modeling of urban air quality using low-cost monitors
    Shiva Iyer, Ananth Balashankar, William Aeberhard, Sameeksha Jain, Sujoy Bhattacharya, Guiditta Rusconi, Anant Sudarshan, Rohini Pande, Lakshmi Subramanian
    Under Review at NPJ Climate and Atmospheric Science [pdf]

  3. Predicting Angiographic Disease Status: Where to draw the line between demographically decoupled and jointly trained models?
    Ananth Balashankar, Alyssa Lees, Srikanth Jagabathula, Lakshminarayanan Subramanian.
    Under Review at the Journal of American Medical Informatics Association (JAMIA) [pdf]

  4. Beyond The Text: Analysis of Privacy Statements through Syntactic and Semantic Role Labeling
    Yan Shvartzshnaider, Ananth Balashankar, Vikas Patidar, Thomas Wies, Lakshminarayanan Subramanian.
    Under Review at a premier NLP conference. [arxiv]

  5. Improving Robustness through Pairwise Generative Counterfactual Data Augmentation
    Ananth Balashankar, Xuezhi Wang, Yao Qin, Ben Packer, Nithum Thain, Ed Chi and Alex Beutel.
    Under Review at a premier NLP conference.

  6. Granger-Causal Link Discovery in Large Temporal Networks through Conditional Models
    Ananth Balashankar, Srikanth Jagabathula, Lakshminarayanan Subramanian.
    Under Review at Proceedings of the National Academy of Sciences (PNAS). [pdf]

  7. Localized Pollution Hotspots: Inferences from a Three-year Fine-grained Air Quality Monitoring Study in Delhi
    Shiva Iyer, Ananth Balashankar, Rohini Pande, Anant Sudarshan, Lakshminarayanan Subramanian.
    Under Preparation.

Peer Reviewed Workshop Papers

  1. Pareto Efficient Fairness for Skewed Subgroup Data
    Ananth Balashankar, Alyssa Lees, Chris Welty, Lakshmi Subramanian
    ICML workshop on AI for Social Good 2019. [Best Paper!] [arxiv]

  2. Fairness Sample Complexity and the Case for Human Intervention
    Ananth Balashankar, Alyssa Lees
    International Conference on Human Factors in Computing Systems (CHI) 2019 - Bridging the Gap Between AI and HCI Workshop. [pdf]

  3. Unsupervised Word Influencer Networks from news streams
    Ananth Balashankar, Sunandan Chakraborty, Lakshmi Subramanian.
    ACL Workshop on Economics and Natural Language Processing (ECONLP) 2018 [pdf]

  4. Towards Applying Open Domain Question Answering to Privacy Policies
    Yan Shvartzshnaider, Ananth Balashankar, Thomas Wies, Lakshminarayanan Subramanian.
    ACL Workshop on Machine Reading for Question Answering (MRQA) 2018 [pdf]

  5. Causal Inference from News Streams
    Ananth Balashankar, Sunandan Chakraborty, Samuel Fraiberger, Srikanth Jagabathula, Lakshminarayanan Subramanian.
    ICML Workshop on Machine Learning for Causal Inference, Counterfactual Prediction, and Autonomous Action (CausalML) 2018 [pdf]

Other Research

  1. Stable virtual landmarks: Spatial dropbox to enhance retail experience
    Swadhin Pradhan, Ananth Balashankar, Niloy Ganguly and Bivas Mitra
    Communication Systems and Networks (COMSNETS) 2014 [pdf]

  2. Signal-aware data transfer in cellular networks
    Vishnu Navda, Ramachandran Ramjee, Sahil Suneja, Ananth Balashankar
    US Patent 8843169

  3. App Discovery with Google Play: Personalized Recommendations with Related Apps
    Ananth Balashankar, Levent Koc, Norberto Guimaraes
    Google AI Blog 2016. [blog]