Hello World!

I love mining data and the possibilities hidden underneath!

Major interests include computational linguistics, machine learning, deep learning, and climate science. At times I write for my personal blog as well.

  • MS, Data Science
    Columbia University, 2022
  • Ex-Associate
    Bain & Co., Gurugram, IN
  • B.Tech, Mathematics & Computing
    Delhi Technological University, 2019
  • Resume

Publications

Projects

Capstone Project in collaboration with Accenture: Building a knowledge graph and natural language query system from unstructured text documents

Worked in a team of 5 to develop a pipeline for building a knowledge graph from unstructured text documents. Owned the research and development of a natural language query system for retrieving information from this knowledge graph.

Using Shapley Values to understand irregularities in Earth Systems (LEAP, Columbia University)

Explored how closely an Earth system agrees with real-world physics, specifically when predicting surface pCO2.

Implementing a physics-guided RNN for lake temperature stratification

Implemented a physics-guided RNN as described by Jia et al. (2019), in which a physics-based loss term is added to the usual RNN loss so that the predicted outputs agree with energy conservation laws. Bayesian optimization was used to tune the models.
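The combined objective can be sketched roughly as below. This is a minimal illustration and not the project's code: the function name, the arguments `energy_change` and `net_flux`, the weight `lam`, and the tolerance `tol` are all hypothetical, and the penalty is a simplified stand-in for an energy-conservation term.

```python
def physics_guided_loss(y_true, y_pred, energy_change, net_flux, lam=0.1, tol=0.0):
    """Supervised MSE plus a penalty for violating an energy balance.

    All names and default values here are illustrative assumptions.
    energy_change[t] is the change in stored energy implied by the
    predictions at step t; net_flux[t] is the net energy entering the
    system over the same step.
    """
    # Usual data-fit term: mean squared error against observations.
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

    # Physics term: penalise only the part of the imbalance above `tol`.
    violations = [
        max(abs(de - f) - tol, 0.0)
        for de, f in zip(energy_change, net_flux)
    ]
    penalty = sum(violations) / len(violations)

    # `lam` trades off fitting the data against respecting the physics.
    return mse + lam * penalty
```

Because the penalty depends only on predictions and energy fluxes, a term of this kind can in principle also be evaluated on time steps that have no observed temperatures.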

Hurricane economic loss prediction (LEAP, Columbia University)

Analysed relationships between hurricane characteristics and performed feature engineering, then built a linear regression model to predict the economic loss caused by hurricanes.

Spectral Representation for CNN (Final Project, Neural Networks & Deep Learning, Columbia University)

Used Keras to implement complex-coefficient, spectrally parameterized convolution filters as well as a mechanism for performing Fourier convolutions.
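The idea behind a Fourier convolution can be sketched with the convolution theorem: a circular convolution in the spatial domain is a pointwise product in the frequency domain. The snippet below is a minimal NumPy illustration, not the Keras implementation from the project; both function names are hypothetical, and it covers only a single-channel 2-D input.

```python
import numpy as np

def fft_circular_conv2d(image, kernel):
    """Circular 2-D convolution via a pointwise product of spectra."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Zero-pad the kernel to the image size so both spectra share a shape.
    kernel_spectrum = np.fft.fft2(kernel, s=image.shape)
    product = np.fft.fft2(image) * kernel_spectrum
    # Inputs are real, so the imaginary part is only round-off noise.
    return np.fft.ifft2(product).real

def direct_circular_conv2d(image, kernel):
    """Reference implementation: sum of shifted, weighted copies of the image."""
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    out = np.zeros_like(image)
    for i in range(kernel.shape[0]):
        for j in range(kernel.shape[1]):
            # np.roll wraps around the edges, giving circular boundaries.
            out += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(6, 5))
ker = rng.normal(size=(3, 3))
same = np.allclose(fft_circular_conv2d(img, ker), direct_circular_conv2d(img, ker))
```

A spectrally parameterized filter goes one step further and learns the complex frequency-domain coefficients directly, rather than transforming a spatial kernel as done here.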

Mercari Price Prediction (Final Project, Applied ML, Columbia University)

Led a team of 5 through this end-to-end applied ML project, which involved exploratory data analysis, data cleaning, feature engineering, model selection, and model training.

Exploring legal business licensing in NYC (Final Project, Exploratory Data Analysis & Visualization, Columbia University)

Analyzed business application data provided by DCA to understand the distribution of businesses across the city and to identify disparities in the issuance of licenses, especially in marginalized neighborhoods.

Speech Emotion Recognition

Experimented with statistical (GMM-HMM), machine learning (logistic regression, SVM), and deep learning (DNN, CNN) models, as well as different representations of audio signals, while building a Speech Emotion Recognition system. Achieved the best accuracy (~85%) with a DNN architecture applied to audio signals compressed using PCA.

Object Localisation (Flipkart GRiD 2019)

Built a CNN-regression model for detecting bounding boxes around independent objects in images (accuracy ~87%). Advanced to the 3rd (pre-final) round of the contest.

SATARK Crime mapper

Built a digital geospatial crime mapper and data visualization tool using R-Shiny to provide police officials with a comprehensive overview of the crime trends prevailing in their jurisdiction in real time.

Retail analytics & Customer Segmentation using Partitioning Around Medoids: EXL EQ 2018

Worked on this project for the EXL EQ 2018 contest. We were given data from a shopping center in San Francisco and asked to draw insights from it and recommend ways to increase the center's revenue.

Recent posts

All about Alluvial Diagrams (and plotting them in R using ggalluvial)

Quantifying the impact of lockdown & social distancing on COVID-19 using SEIR

Spatial Data Mapping in R

A few accomplishments

  • Sep 2022, 2021: Two-time recipient of the "Programs Fellowship" from International House, NYC
  • Dec 2020: Research work (KeyGames) was one of 16 submissions out of 600+ to be designated an "Outstanding Paper" at COLING 2020
  • Apr 2020: Overall Winner, Global Baincubator Hackathon 2020. 35 teams from 21 Bain & Co. offices across the globe participated in the event
  • Jun 2019: Graduated with a bachelor's in Mathematics & Computing from Delhi Technological University (CGPA: 9.03; Department Rank 2)
  • Mar 2018: Led a team of 6 in the national finals of the Indian government-sponsored Smart India Hackathon 2018. Made it to the final 15 teams
  • Feb 2018: National Runner Up & Award for "Best Visualization", EXL EQ 2018. Competed against 680 teams from top 20 engineering colleges (IIT/NIT) in the annual national data science contest organized by EXL Analytics