XXXX@XXXX.XXX; (XXX) XXX-XXXX; https://github.com/jsheng0901
Highly self-motivated Data Scientist with two years of data science experience focusing on machine learning, NLP,
and computer vision problem solving. Strong command of Python data manipulation driven by business
requirements. Immigration status: Pending adjustment of status. Fluent in both English and Chinese.
TECHNICAL SKILLS
Advanced: Python (scikit-learn, TensorFlow, Keras), R (e1071, caret)
Intermediate: GCP, AWS, SQL, Spark, Tableau, Linux
EDUCATION
Columbia University New York City, NY
Master of Arts in Statistics (GPA: 3.5) Sep 2018 – Dec 2019
Relevant Coursework: Statistical Machine Learning, Deep Learning, Applied Data Science, NLP
Indiana University Bloomington Bloomington, IN
Bachelor of Science in Math, Minor in Finance and Computer Science Aug 2014 – Aug 2018
Major GPA: 3.9/4.0; Overall GPA: 3.7/4.0; Honors: Awarded for academic excellence (top 5%)
WORK EXPERIENCE
UTOFUN New York, USA
Data Scientist Intern Jun 2019 – Aug 2019
Daily responsibilities: ensured data integrity with Excel and Python, extracted data with SQL, and built data
visualization reports in Tableau.
Project: Analyzed customer search behavior on the company website to identify which features most influence
a customer's decision to contact an agent.
Result: The analysis guided engineers in changing the website design, which improved monthly website
search volume by 10% and increased agent contact volume.
Method: Extracted and merged data in SQL, then performed data cleaning and EDA. Built Logistic Regression,
Random Forest, and XGBoost models to analyze feature importance.
Francis Peltast Partners New York, USA
Data Analytics Intern Oct 2018 – Dec 2018
Used gender, full- or part-time status, and other features to predict the enrollment trend of undergraduate
students in the USA.
Method: Applied stepwise selection with a t-test stopping rule, then built a Linear Regression model.
PROJECTS
Movie Review Sentiment Analysis Aug 2019 – Sep 2019
Applied NLP models to movie review text and obtained 70% accuracy across 5 sentiment categories.
Method: Combined a pre-trained embedding matrix with LSTM, GRU, and CNN to build 3 models, then ensembled
their results.
Identify Toxicity in Online Conversations Aug 2019 – Sep 2019
Used online comments to classify six toxicity subgroup labels by applying an NLP model.
Result: Achieved over 95% on whether comments are toxic and over 85% on the six toxicity subgroups.
Method: Combined two pre-trained word embedding matrices with an LSTM model.
Object Detection Jul 2019 – Aug 2019
Predicted tight bounding boxes around object instances across 500 categories on 9M images.
Method: Applied pre-trained SSD and Inception-ResNet TensorFlow models from TensorFlow Hub.
CNN in Humpback Whale Identification Jun 2019 – Jul 2019
Identified whales across 5,005 categories from over 25,000 images. Achieved 72% multiclass accuracy.
Method: Built a CNN with data augmentation and regularization methods, and also applied transfer learning
with ResNet50.
Machine Learning in Helping Navigate Robots May 2019 – Jun 2019
Used sensor data to predict which of nine floor types the robot is on. Achieved above 82% multiclass
classification accuracy.
Method: Aggregated sensor data to build Random Forest, LGB, and DNN models.
Machine Learning for Forecasting Apple Stock Price and Trend Feb 2019 – May 2019
Used fundamental financial data and quarterly stock price data to predict quarterly stock price and trend.
Result: Achieved RMSE near 20 for regression and above 80% accuracy for trend prediction.
Method: Built linear, ridge, and lasso regression, SVR, RF, XGB, and DNN models to forecast stock price.
Method: Built logistic regression, SVM, KNN, RF, and XGB models to forecast stock trend.