Hello World!

I'm Ashwin John Chempolil, an aspiring Data Scientist with a passion for finding patterns in everyday life as well as in the underlying data.

Get in touch: chempolil.a@northeastern.edu

Education
Sep 2019 - Dec 2021
M.S. in Data Analytics Engineering

Coursework: Engineering Probability and Statistics, Machine Learning in Engineering, Neural Networks and Deep Learning, Data Mining in Engineering, Data Management and Database Design, Fundamentals of Cloud Computing

Aug 2014 - May 2018
B.Tech in Production Engineering

Coursework: Product Design and Development, Advanced Decision Modeling, Robotics, Marketing Management, Computer Programming and Numerical Methods, Mechatronics

Skills
Languages
  • Python
  • SQL
  • R
Libraries
  • pandas
  • numpy
  • scikit-learn
  • tensorflow
  • keplergl
  • nltk
Tools
  • Git & Github
  • Tableau
  • SPSS
  • MATLAB
  • MS Office Suite
Databases
  • MySQL
  • mongoDB
Experience
Sep 2021 - Dec 2021
Graduate Teaching Assistant (IE 6200)
Jan 2021 - Jul 2021
Data Scientist Intern
Jan 2017 - Jan 2019
Creative Director
View My Resume
Projects

Collaborated on and took ownership of developing and implementing a serverless architecture using AWS services to automate the inspection step of a widget manufacturing process and ensure the widgets are within compliance standards.

AWS IAM Amazon CloudWatch Amazon S3 Amazon Rekognition AWS Lambda Serverless Compute data lake Transfer Learning Amazon SNS

Developed an end-to-end Image Caption Generator in Python with the TensorFlow library, combining a deep CNN (ResNet101) for image classification with an RNN (LSTM) for sequence modeling, using soft attention in an encoder-decoder framework, to generate image descriptions on the large MSCOCO dataset.

Python Google Colab Tensorflow LSTM CNN RNN ResNet101 Transfer Learning seq2seq soft attention Deep Learning encoder-decoder
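
The soft attention step mentioned above can be illustrated with a minimal sketch. It is not the project's TensorFlow code: it assumes the alignment scores are already given (in a real model they would come from a learned network), and the names `softmax` and `soft_attention` are hypothetical. It only shows how scores become weights and how the weights blend image-region features into one context vector.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(features, scores):
    """Blend region feature vectors into one context vector.

    features: list of equal-length feature vectors, one per image region
    scores:   one alignment score per region (assumed given here)
    """
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * f[k] for w, f in zip(weights, features))
        for k in range(dim)
    ]
    return weights, context

# Two toy regions with equal scores, so each gets half of the weight.
weights, context = soft_attention([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

Because every region contributes with a non-zero weight, the whole step stays differentiable, which is what distinguishes soft attention from hard attention.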

Built a database modeled on an insurance service provider's enterprise-level data repository. Collected tweets from various insurance providers using the Twitter API and transformed the data to fit a uniform dataset template.

Python Jupyter Notebook MySQL Workbench AWS RDS sqlalchemy Twitter API

A web app that classifies emails as spam or ham using a Multinomial Naive Bayes model, trained and hosted on the Heroku platform.

Python streamlit heroku Machine Learning nltk multinomial naive bayes
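
For readers unfamiliar with Multinomial Naive Bayes, here is a minimal pure-Python sketch of the idea. It is not the app's code: the whitespace tokenizer, the four toy messages, and the names `train_multinomial_nb` and `predict` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate log priors and smoothed log word likelihoods per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer
        word_counts[label].update(tokens)
        vocab.update(tokens)
    model = {"priors": {}, "loglik": {}, "vocab": vocab}
    for label, n_docs in class_docs.items():
        # Laplace smoothing: add alpha to every word count
        total = sum(word_counts[label].values()) + alpha * len(vocab)
        model["priors"][label] = math.log(n_docs / len(docs))
        model["loglik"][label] = {
            w: math.log((word_counts[label][w] + alpha) / total)
            for w in vocab
        }
    return model

def predict(model, text):
    """Return the class with the highest log posterior score."""
    # Words never seen in training are ignored
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    scores = {
        label: prior + sum(model["loglik"][label][t] for t in tokens)
        for label, prior in model["priors"].items()
    }
    return max(scores, key=scores.get)

# Toy training data, invented for this sketch
docs = [
    "win free prize now",
    "free cash offer",
    "meeting at noon tomorrow",
    "lunch tomorrow with team",
]
labels = ["spam", "spam", "ham", "ham"]
model = train_multinomial_nb(docs, labels)
```

Calling `predict(model, "free prize offer")` scores the message against both classes and returns the more likely label.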

Developed a NoSQL database by web scraping the top 300 video game records from vgchartz.com using the BeautifulSoup and requests libraries, combining them with data obtained from the rawg.io API, and storing the result on a MongoDB Atlas cluster.

Python requests Web Scraping BeautifulSoup mongoDB Twitter API

Analyzed an A/B test for the mobile game 'Cookie Cats' to determine whether the first gate should be placed at level 30 or at level 40 for player retention, covering over 90,000 players, and used bootstrap analysis to quantify the confidence in the difference in retention rate between the two groups.

Python pandas A/B Testing bootstrap analysis
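
The bootstrap comparison described above can be sketched as follows. This is a minimal illustration, not the project's pandas code: the retention flags are synthetic, the numbers are not results from the Cookie Cats data, and the function names are hypothetical.

```python
import random

def bootstrap_retention_diff(group_a, group_b, n_boot=500, seed=0):
    """Resample both groups with replacement and record the
    difference in retention rate (A minus B) for each resample."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        sample_a = rng.choices(group_a, k=len(group_a))
        sample_b = rng.choices(group_b, k=len(group_b))
        rate_a = sum(sample_a) / len(sample_a)
        rate_b = sum(sample_b) / len(sample_b)
        diffs.append(rate_a - rate_b)
    return diffs

def prob_a_better(diffs):
    """Share of resamples in which group A retained more players."""
    return sum(d > 0 for d in diffs) / len(diffs)

def percentile_interval(diffs, level=0.95):
    """Percentile bootstrap interval for the difference."""
    ordered = sorted(diffs)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(tail * len(ordered))]
    hi = ordered[int((1.0 - tail) * len(ordered)) - 1]
    return lo, hi

# Synthetic flags (1 = player retained), invented for this sketch only
gate_30 = [1] * 600 + [0] * 400
gate_40 = [1] * 400 + [0] * 600
diffs = bootstrap_retention_diff(gate_30, gate_40)
```

`prob_a_better(diffs)` then reads as the bootstrap confidence that the first group retains better, and `percentile_interval(diffs)` gives a range for the size of the difference.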

Conducted and illustrated statistical analyses of seasonal trends in taxi fare and ridership density across 5 boroughs. Trained 3 ML models to predict the taxi fare, with the Gradient Boosting model improving accuracy by 33% over the baseline model (multi-linear regression).

Python pandas Jupyter Notebook statistical analysis plotly cufflinks Machine Learning Gradient Boosting regression Multi-linear Regression Random Forest Regression

Implemented logistic regression using an iterative gradient-based approach on a dataset of 50,000 records and 61 features, after oversampling the minority class with SMOTE, to predict whether a client will subscribe to a term deposit.

Python Visual Studio Code imblearn statsmodels
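
A minimal sketch of logistic regression fitted iteratively by gradient steps is shown below. It reads the "gradient iterative approach" as plain batch gradient descent, which is an assumption. The SMOTE oversampling step is left out, the one-feature toy data is invented, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Batch gradient descent on the mean log loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one record drives the gradient
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability of the positive class for one record."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Toy data, invented for this sketch: one feature, two balanced classes
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
```

On real data the features would normally be scaled first, and oversampling would be applied to the training split only so that synthetic records do not leak into evaluation.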

Applied advanced statistical methods to understand user behaviors and attributes in order to retain customers, and employed feature selection techniques (LassoCV, Random Forest) to identify the top 18 features that contribute to customer dissatisfaction and support fact-based decision making. Deployed 6 machine learning classification algorithms to predict customers' willingness to churn and evaluated the models on F1 score, AUC score, and log loss.

Python Visual Studio Code scikit-learn re seaborn LassoCV Random Forest