Cameron Stewart

Data Scientist | Six Sigma Black Belt

About Me

Hi, my name is Cameron Stewart. I am tremendously passionate about Data Science and Machine Learning, and I am focused on transitioning into a full-time role in this field. While facing diverse technical challenges as an Industrial and Systems Engineer for industry leaders such as Toyota, Schlumberger, and Amazon, I became fascinated with analyzing complex data and using predictive modeling to extract key business insights. This motivated me to enhance my skills by becoming a Six Sigma Black Belt in 2019, which has given me a valuable perspective that combines a statistical approach with a project management mindset. In August 2022, I will earn my Master’s degree in Data Science from Southern Methodist University (SMU), where I have learned to apply the latest tools and methodologies in the field. I currently apply my skills by supporting SMU’s Student Affairs Division with Data Science initiatives.

Education/Certifications

MS Data Science with Machine Learning Specialization
Southern Methodist University

Expected Graduation - Aug 2022
GPA 4.0
Key Classes: Machine Learning, Natural Language Processing, Time Series, Statistics

Certified Six Sigma Black Belt
American Society for Quality

Received Mar 2019

BS Industrial and Systems Engineering
Texas A&M University

Graduated Dec 2014
GPA 3.4
Minor in Mathematics

Languages/Tools:

Python
R
SAS
SQL
MySQL
MongoDB
RStudio
VS Code
Jupyter Notebooks
Tableau
Power BI
VBA
GitHub

Commonly Used Packages:

Pandas
NumPy
SciPy
Scikit-learn
Matplotlib
BeautifulSoup
NLTK
spaCy
tidyverse
caret
dplyr
ggplot2
Plotly
tswge
tseries

Projects

Below, I have included both personal and academic projects. For high-level information on my career experience and on other projects that cannot be publicly shared, please review my resume. Feel free to reach out if you have any questions or would like to learn more.

Time Series Forecasting of Texas Covid Cases

This project uses the R time series packages tswge and tseries to forecast Texas Covid cases both one and three weeks ahead. Using ARIMA, Vector Autoregressive (VAR), Multi-Layer Perceptron, and Ensemble models, I was able to forecast the count of Covid cases and provide the associated confidence interval. Relevant features were extracted from Texas Department of State Health Services data and Google Mobility data.

GitHub Repository

Hong Kong Horse Racing Performance Classification using Machine Learning Methods

Using Python's Scikit-Learn, this project classifies whether a horse will win or show in a given race from Kaggle's Hong Kong Horse Racing Dataset. Methods used: Logistic Regression, KNN, Naive Bayes, Random Forest, and Support Vector Machines. To create additional features for classification, clustering methods such as K-Means, DBSCAN, and Spectral Clustering were used.

GitHub Repository

Predicting Income from Census Data

Using U.S. Census Data, this project predicts whether an individual earned more or less than $50K in 1994. The data was analyzed in R using methods such as Logistic Regression, Quadratic Discriminant Analysis (QDA), and Random Forest.

GitHub Repository

Natural Language Processing Projects

This is a series of projects that use web scraping and Natural Language Processing (NLP) tools to gather insights. The projects are written in Python and use packages such as NLTK, spaCy, Flair, and BeautifulSoup. The NLP activities include Part of Speech Tagging, Chunking, Stemming, Lemmatizing, Text Similarity Comparison, Corpus Clustering, and Sentiment Analysis.

GitHub Repository

Predicting Employee Attrition and Income

This project predicts attrition and income for employees at a hypothetical company. Using R to create KNN and Naive Bayes models, I was able to classify which employees were planning to leave in the near future. By creating a Multiple Linear Regression model, I was able to predict employee income to better understand their fair market value.

GitHub Repository

Data Lake vs. Data Warehouse Use Case Analysis

This project dives into the differences between Data Lakes and Data Warehouses and the use cases for each. Using MySQL to create an example Data Warehouse and MongoDB to create a Data Lake, the project demonstrates the functionality differences. The repositories contain Twitter data that was collected using Python's tweepy and automatically transferred to each repository using the packages mysql.connector and pymongo. With SQL and MongoShell, I was able to compare queries of the data.
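As a rough illustration of the schema difference described above (not the project's actual code), the sketch below uses Python's built-in sqlite3 as a stand-in for MySQL and a list of JSON strings as a stand-in for a MongoDB collection. The sample records, table name, and field names are all hypothetical.

```python
import json
import sqlite3

# Hypothetical sample records standing in for collected tweets.
tweets = [
    {"id": 1, "user": "alice", "text": "Data lakes store raw data", "hashtags": ["data", "lake"]},
    {"id": 2, "user": "bob", "text": "Warehouses enforce a schema", "retweets": 3},
]

# Warehouse-style storage: the schema is declared up front, so only the
# agreed-upon columns are loaded (sqlite3 stands in for MySQL here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tweets (id INTEGER PRIMARY KEY, user TEXT NOT NULL, text TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO tweets (id, user, text) VALUES (?, ?, ?)",
    [(t["id"], t["user"], t["text"]) for t in tweets],
)
warehouse_rows = conn.execute("SELECT user, text FROM tweets ORDER BY id").fetchall()

# Lake-style storage: each raw document is kept as-is, including fields that
# only some records have (a list of JSON strings stands in for MongoDB).
lake = [json.dumps(t) for t in tweets]
lake_docs = [json.loads(doc) for doc in lake]

# Structure is applied at query time instead of at load time.
with_hashtags = [d["user"] for d in lake_docs if "hashtags" in d]

print(warehouse_rows)
print(with_hashtags)
```

In this sketch the warehouse table drops the optional `hashtags` and `retweets` fields at load time, while the lake keeps them and leaves it to each query to decide which fields matter.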

GitHub Repository

Database Management Projects

This is a series of mini-projects that explore database management in MySQL and MongoDB. The projects include creating entity-relationship diagrams, creating and manipulating databases, and querying databases. The queries are performed using SQL and MongoShell.

GitHub Repository

Contact

For any professional inquiries or project questions, reach out through my email or LinkedIn.