Data Analysis

Both in my scientific research and in other projects, I enjoy working with data: it is exciting to identify patterns in it, to visualize it, and to draw conclusions from it.

I am interested in data science and am actively expanding my technical skills by diving into Python programming (using NumPy, pandas, Matplotlib, scikit-learn, and Jupyter) as well as machine learning algorithms and data visualization.

You can find a selection of my data analysis work below, both from my work in physics and from side projects. Additional project work can also be found on GitHub and on my blog.

Examples of Data Analysis Work

House Sales in King County

Based on data on house transactions in the Seattle area, I used different regressors, feature selection and feature generation techniques to build a regression model capable of predicting the sales price based on the other properties of a house. Read more…

New York Subway Data

In this project, I analyzed a data set of turnstile entry and exit counts from New York’s subway stations to study commuting patterns in the city as well as the distribution of ridership over the course of a day and a year. I also used a linear regression model to predict ridership based on holidays, weather, and other features. Read more…

Machine Learning: Classifying Wine

I used a variety of classification algorithms implemented in scikit-learn to build models capable of predicting which of three cultivars a wine sample belongs to based on 13 chemical constituents. In order to visualize the different models, I used principal component analysis to reduce the dimensionality of the features. Read more…
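As a rough illustration of this kind of workflow, here is a minimal sketch, not the project's actual code. It assumes the data resembles the classic wine data set bundled with scikit-learn (three cultivars, 13 chemical features), and it picks logistic regression and a two-component PCA purely as examples.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled wine data stands in for the project's data.
X, y = load_wine(return_X_y=True)

# Hold out part of the samples, keeping the class proportions the same.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Standardize the features, project onto two principal components,
# then fit a classifier in the reduced space.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=2),
    LogisticRegression(),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)

# The 2-D coordinates that could be used to plot samples and decision regions.
X_test_2d = model[:-1].transform(X_test)

print(f"Test accuracy with 2 principal components: {accuracy:.2f}")
```

Reducing the features to two components before classification makes it possible to draw each model's decision regions in a plane, at the cost of discarding some information from the original 13 features.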

Evaluation of GPS Tracks

This Python program imports GPX files recorded with the Runkeeper app, extracts statistics, visualizes runs (as elevation and pace profiles, and on maps), and compares multiple runs. Read more…
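The statistics step can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the program itself: it assumes track points in the GPX 1.1 namespace with `ele` and `time` children, uses a made-up three-point track, and the names `haversine_m` and `summarize_track` are hypothetical.

```python
import math
import xml.etree.ElementTree as ET
from datetime import datetime

# Assumption: the file uses the GPX 1.1 namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

# Made-up sample track, for illustration only.
SAMPLE_GPX = """<gpx version="1.1" creator="example" xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="47.6000" lon="-122.3300"><ele>10.0</ele><time>2020-01-01T10:00:00Z</time></trkpt>
    <trkpt lat="47.6010" lon="-122.3300"><ele>12.0</ele><time>2020-01-01T10:00:40Z</time></trkpt>
    <trkpt lat="47.6020" lon="-122.3300"><ele>11.0</ele><time>2020-01-01T10:01:20Z</time></trkpt>
  </trkseg></trk>
</gpx>"""


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def summarize_track(gpx_text):
    """Return distance, duration, elevation gain, and average pace of a track."""
    root = ET.fromstring(gpx_text)
    points = []
    for pt in root.findall(".//gpx:trkpt", GPX_NS):
        lat, lon = float(pt.get("lat")), float(pt.get("lon"))
        ele = float(pt.find("gpx:ele", GPX_NS).text)
        time = datetime.strptime(pt.find("gpx:time", GPX_NS).text, "%Y-%m-%dT%H:%M:%SZ")
        points.append((lat, lon, ele, time))

    distance = 0.0
    gain = 0.0
    for (lat1, lon1, ele1, _), (lat2, lon2, ele2, _) in zip(points, points[1:]):
        distance += haversine_m(lat1, lon1, lat2, lon2)
        gain += max(ele2 - ele1, 0.0)  # count only uphill segments

    duration = (points[-1][3] - points[0][3]).total_seconds()
    pace = (duration / 60.0) / (distance / 1000.0)  # minutes per kilometer
    return {
        "distance_m": distance,
        "duration_s": duration,
        "elevation_gain_m": gain,
        "pace_min_per_km": pace,
    }


stats = summarize_track(SAMPLE_GPX)
print(stats)
```

Real recordings are noisier than this sample, so elevation and pace values would typically be smoothed before plotting.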

King Analysis

This program automatically performs a complex analysis of an experimental data set. The resulting information is valuable for benchmarking new models that are being developed in computational physics. Read more…

Multidimensional Spectroscopy Data

This is a brief overview of the types of analyses I’ve performed on multidimensional data obtained from spectroscopy experiments. Read more…