Professional Summary
Data Scientist with experience in applying statistics and machine learning techniques to solve
real-world problems, managing large-scale database systems, wrangling raw data from various sources
and formats, and extracting business insights from vast amounts of data. With strong problem-solving and
project management skills, I am always excited to combine business thinking, analytical skills, and
technical expertise to tackle challenges and deliver key results.
Python / R / SQL / Excel / Tableau / AWS SageMaker / Hive
Related Experience
R4 Technologies, Inc. – Ridgefield, CT 05/2019 - present
Data Scientist

Created a customer segmentation model based on consumer purchasing behavior and demographic
information using the K-medoids algorithm. Increased the gross revenue of an online retail company by 8%
by providing customized recommendations based on the segmentation results, and conducted an A/B test to
measure recommendation performance across different customer groups.

Generated a 6% lift in net revenue for a beverage company by calculating opportunity scores that
reflect the market potential of various products across different locations and guide the development of
customized marketing strategies.

Built a Linear Regression model to forecast market demand for a panel manufacturer based
on weather, housing starts, and historical transaction datasets, reducing the demand estimation
error rate from 15% to 9.5%.

Built a Random Forest model for an online sports shop to estimate customers’ likelihood of
responding to mail and e-mail promotions, yielding a 12% increase in response rate by targeting
high-potential customers.

Improved the sales volume of an electronics retailer by developing a Taste & Preference model that
produces more accurate lists of target customers for upcoming promotions.

Developed a scalable end-to-end Data Science system that meets the needs of clients across different
industries by incorporating Data Ingestion, Attribute Model Creation, and Performance Measurement
in a single pipeline.

Citibank, Beijing Branch – Beijing, China 10/2016 - 12/2016
Analyst Trainee

Handled gigabytes of customer credit records and performed data manipulation and
analysis using SQL and R.

Constructed a Logistic Regression risk model to evaluate customers’ credit card applications
and conducted client-centric analysis.

Yelp Analytics Project (with Python, SQL, Tableau)

Loaded the Yelp Open Dataset (around 7.5 GB, 5 files, millions of records) into a local Postgres relational
database and manipulated it with SQL.

Performed Exploratory Data Analysis with Tableau to visualize geographical, textual, and categorical

Applied Text Analytics techniques, such as stopwords removal, lemmatization, and stemming, to
process 6 million pieces of review data, and built a Bag-of-Words sentiment analysis model.

Selected an L1-regularized Logistic Regression model over Random Forest, SVM, and Naive Bayes, and achieved 97%

Columbia University – New York, NY 9/2017 - 2/2019
Master of Arts: Statistics (STEM)
GPA: 3.7
Beijing Normal University (BNU) – Beijing, China 9/2013 - 7/2017
Bachelor of Science: Mathematics
GPA: 3.8


