The Fun Begins

This blog documents my journey from a 13-year career in patent law to a career in data science. I am presently studying for a Master of Science in Physics at the University of Washington, where I have been a part-time student since January of 2018. Next quarter, I will switch to full-time study to finish my M.S. sooner. For summer quarter, I will be working on my capstone project. I intend to complete my studies in August of 2019.

Faces built from Singular Value Decomposition modes of a face from the Yale faces data set. The first image uses a single mode, and each subsequent image adds more modes until the face becomes recognizable.

So far, I have completed a data science and machine learning boot camp and started some machine learning research with the ATLAS group at the Large Hadron Collider. I am also taking an applied math course for my M.S. program that uses machine learning techniques, and I regularly attend meetups on various data science topics hosted by groups such as Metis and Thinkful.

I will be chronicling my projects and experiences here.

In June of 2018, I started Jose Portilla’s Python for Data Science and Machine Learning Bootcamp on Udemy. Over the summer I learned many libraries for data science, visualization, and machine learning, including NumPy, pandas, Matplotlib, seaborn, scikit-learn, Plotly, and Cufflinks, and I learned to draw choropleth maps. I learned machine learning techniques including Linear Regression, Logistic Regression, K-Means Clustering, K Nearest Neighbors, Decision Trees and Random Forests, Principal Component Analysis, Support Vector Machines, and Natural Language Processing, and I was introduced to PySpark and TensorFlow. Some of these relate closely to techniques I learned in physics. For example, the logistic function at the heart of Logistic Regression has the same mathematical form as the Fermi-Dirac distribution, so Logistic Regression is essentially Fermi-Dirac statistics applied to macroscopic, mundane things instead of electrons.

From Wikipedia.
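To make the comparison with Fermi-Dirac statistics concrete, here is a minimal sketch in Python. The function names and the values of mu and kT are my own illustrative choices, not anything from the bootcamp. It only shows that the Fermi-Dirac occupation probability is the logistic function evaluated at z = -(E - mu) / kT.

```python
import math


def logistic(z):
    """Standard logistic (sigmoid) function: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def fermi_dirac(energy, mu, kT):
    """Fermi-Dirac occupation probability: 1 / (exp((E - mu) / kT) + 1)."""
    return 1.0 / (math.exp((energy - mu) / kT) + 1.0)


# Hypothetical values in eV: chemical potential mu and thermal energy kT.
mu, kT = 1.0, 0.025

for energy in (0.9, 1.0, 1.1):
    z = -(energy - mu) / kT  # the "feature" the logistic function sees
    print(energy, fermi_dirac(energy, mu, kT), logistic(z))
```

In Logistic Regression, z is a fitted linear combination of the input features rather than an energy, but the curve that turns z into a probability is the same.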

My work with the bootcamp gave me the basic Python skills that I then used in September of 2018 to begin some training for my research with the ATLAS experiment at CERN. I used Python to write a Monte Carlo simulation of a high-energy particle decaying into jets of partons. The initial particle decayed into two partons with energies and three-dimensional momenta drawn from a probability distribution. Each of those partons decayed into two more partons, and so forth, until they reached an energy low enough for hadronization. I then applied clustering algorithms to the data to group the final partons into jets based on their relative angular separation and energies. The clustering algorithms included kT, anti-kT, and Cambridge/Aachen. I plotted the number of jets for various parameters for each of these algorithms.
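The three algorithms differ only in one exponent of a shared distance measure. The sketch below is not my research code. It is a minimal illustration that assumes the standard hadron-collider formulation from the literature, d_ij = min(pT_i^2p, pT_j^2p) * dR_ij^2 / R^2 and d_iB = pT_i^2p, with p = 1 for kT, p = 0 for Cambridge/Aachen, and p = -1 for anti-kT. The function names, the four-vector layout (E, px, py, pz), the simple four-momentum recombination, and the example particles are all assumptions made for illustration. It also assumes every particle has nonzero transverse momentum.

```python
import math

# Exponent p in the generalized kT distance measure (assumed formulation).
ALGORITHMS = {"kt": 1, "cambridge_aachen": 0, "anti_kt": -1}


def from_pt_y_phi(pt, y, phi):
    """Build a massless four-vector (E, px, py, pz) from pT, rapidity, phi."""
    return (pt * math.cosh(y), pt * math.cos(phi),
            pt * math.sin(phi), pt * math.sinh(y))


def pt2(v):
    """Squared transverse momentum."""
    return v[1] ** 2 + v[2] ** 2


def rapidity(v):
    return 0.5 * math.log((v[0] + v[3]) / (v[0] - v[3]))


def delta_r2(a, b):
    """Squared separation in the rapidity-azimuth plane."""
    dphi = abs(math.atan2(a[2], a[1]) - math.atan2(b[2], b[1]))
    if dphi > math.pi:
        dphi = 2.0 * math.pi - dphi
    return (rapidity(a) - rapidity(b)) ** 2 + dphi ** 2


def cluster(particles, algorithm="anti_kt", radius=0.4):
    """Sequential recombination: merge the closest pair, or promote a
    pseudojet to a final jet when its beam distance is the smallest."""
    p = ALGORITHMS[algorithm]
    pseudojets = [tuple(v) for v in particles]
    jets = []
    while pseudojets:
        best = None  # (distance, i, j); j is None for a beam distance
        for i, a in enumerate(pseudojets):
            d_ib = pt2(a) ** p
            if best is None or d_ib < best[0]:
                best = (d_ib, i, None)
            for j in range(i + 1, len(pseudojets)):
                b = pseudojets[j]
                d_ij = (min(pt2(a) ** p, pt2(b) ** p)
                        * delta_r2(a, b) / radius ** 2)
                if d_ij < best[0]:
                    best = (d_ij, i, j)
        _, i, j = best
        if j is None:
            jets.append(pseudojets.pop(i))
        else:
            b = pseudojets.pop(j)  # j > i, so remove j first
            a = pseudojets.pop(i)
            pseudojets.append(tuple(x + y for x, y in zip(a, b)))
    return jets


# Hypothetical event: two nearby particles on each side of the detector.
event = [from_pt_y_phi(50.0, 0.0, 0.0), from_pt_y_phi(30.0, 0.1, 0.1),
         from_pt_y_phi(40.0, 0.0, 3.0), from_pt_y_phi(20.0, -0.1, 3.1)]

for name in ALGORITHMS:
    print(name, len(cluster(event, algorithm=name)))
```

This brute-force loop recomputes every distance at each step, which is fine for a handful of particles but far slower than what dedicated jet-finding libraries do.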

I just finished a course in applied math at UW, AMATH 582 “Computational Methods for Data Analysis.” We used MATLAB for data analysis and machine learning. It would not be proper to post homework solutions online. However, I will share a few details and plots that do not give away solutions. The projects are described in: Kutz, J. Nathan. Data-Driven Modeling and Scientific Computation: Methods for Complex Systems and Big Data, Oxford University Press, 2013.

My first project included processing noisy 3D ultrasound images in the Fourier domain and filtering the results in order to locate a marble in a dog’s digestive system.

My second project included using Gabor Filtering to analyze spectral content of music data.

My third project included Principal Component Analysis of three videos taken by a cell phone of a paint can suspended by a spring bouncing in a vertical direction.

My fourth project included Singular Value Decomposition of the widely used Yale face data set which is often used for training facial recognition models. Additionally, I analyzed spectrograms and applied machine learning algorithms including Naïve Bayes in order to train a model to identify artists and genres of music.

My fifth project included Dynamic Mode Decomposition of videos to separate foreground and background components of each video frame.

My special project included using Naïve Bayes classification as a machine learning model to characterize aluminum workpieces as being defective or non-defective based on images, and characterizing lead legs of integrated circuits as being defective or non-defective based on images.

I will describe each project and share a few screenshots of my results (but no code) in the following posts.

Next quarter I will be taking another applied math course, “High Performance Scientific Computing.” We will be covering parallel programming, threading, MPI, AWS, and other topics. I will also be taking an applied math course, “Inferring Structure of Complex Systems.” We will be covering more advanced statistical inference and machine learning techniques, with a heavy emphasis on dimensionality reduction.
