XXXXXX Unit 2, Cambridge, MA 02139 | (XXX) XXX-XXXX | XXXX@XXXX.XXX
https://www.linkedin.com/in/vignesh-murali/ | https://github.com/vigneshmXXXXXX with prior experience in fast-paced startup environments and an analytical approach to problem-solving. I aim
to revolutionize industries through data-driven decision making.
EDUCATION
Master of Science in Information Systems, Northeastern University, Boston, MA
Bachelor of Engineering in Information Science, Ramaiah Institute of Technology, Bangalore, India
SKILLS
● Programming Languages: Python, SQL, Java, C#
● Machine Learning Libraries: Scikit-learn, Keras, H2O.ai, AutoML, Spark MLlib
● Data Analysis Tools: Pandas, NumPy, SciPy, R, Excel, Apache Spark
● Data Visualization Tools: Matplotlib, Seaborn, Tableau, Google Data Studio
● Natural Language Processing: spaCy, NLTK, Regular Expressions, LSTMs, GRUs, IBM Watson NLU
● Databases: Oracle 18c, MS SQL Server, MySQL, MongoDB
● Cloud Technologies: Amazon Web Services, Google Cloud Platform, IBM Cloud
WORK EXPERIENCE
EMR.AI Inc., San Francisco, CA
NLP Data Scientist, (07/2018)-(01/2019)
● Engineered semi-supervised data from medical text corpora using spaCy, NLTK and regular expressions to train
deep learning models for detection of sections in medical reports.
● Built a text classifier using Keras to classify unlabeled data into their respective medical categories with Recurrent
Neural Networks and pre-trained word embeddings, thereby achieving 71% accuracy.
● Identified emotional polarization in healthcare by performing sentiment analysis on medical dictations using Apache
Spark’s MLlib to build binary classifiers (Logistic Regression, Support Vector Machine) and IBM Watson’s NLU
API to further gain insights from our medical data.
● Designed Jupyter dashboards using Seaborn and Matplotlib to showcase my findings and data insights to the rest o
● Preprocessed and cleaned data using Pandas and NumPy for predicting medical codes using convolutional neural
● Stored data on all anomalous events with screenshots during the process automation workflow into a NoSQL
database via MongoDB (used GridFS for images).
PROJECTS
E-Commerce Sentiment Analysis using Apache Spark and MLlib: Performed EDA, preprocessed a
on 20,XXXXXX reviews. Built classification models such as Logistic Regression, Support Vector Machines,
Gradient Boosted Trees and Naive Bayes algorithm to classify comments as positive or negative. The best model
obtained an accuracy of 90% and an AUC-ROC score of 0.84.
Prediction of Cryptocurrency prices using Machine Learning: Performed Exploratory Data Analysis and
cleaned a dataset containing Litecoin price data. Utilized Ridge Regression, Random Forest Regression and a Deep
Neural Network Regressor (Keras) to predict Litecoin prices. Obtained an RMSE of 1.14 and an R² score of 0.999.
Classification of Alzheimer's patients using Machine Learning: Applied various conventional m
classifiers as well as neural networks on a dataset containing MRI data from thousands of patients with dementia
over the age of 60. The best classifier was a custom deep feed-forward multilayer perceptron which achieved an
accuracy of 92.55%.