Projects

Music Artist Recommendation with Implicit Feedback Data

Implementing Alternating Least Squares on Last.fm data to create an artist recommendation system for users

Recent Posts

Motivation One of my colleagues has been working on a project analysing a particular type of student survey data. The data is partitioned into a handful of sub-surveys measuring (essentially) different categorizations of non-cognitive skills and personality/learning traits. One aim of the project was to evaluate how combinations of metrics from the individual tests could be used to predict student outcomes. There were over 100 variables he was intending to analyze, many of which had seemingly overlapping definitions, so collinearity was a big concern going into it.

CONTINUE READING

Chapter 2: The Experimental Ideal Angrist and Pischke begin building the foundation for concepts fundamental to causal inference in the second chapter of Mostly Harmless Econometrics. It opens with an anecdote about a 1962 experiment conducted to assess the effects of an early-intervention in preschoolers, highlighting the project’s excellent randomized design. The study itself was used as evidence for the creation of the Head Start preschool program but for us it matters because it reflects the value of random assignment in research design.

CONTINUE READING

Motivation This will be part of a series of posts I am writing to help bring my chops on causal inference up to par. Since I have not taken a formal course on the subject and I won’t be able to for some time, self-study is my only option. Fortunately I’ve been recommended quite a few resources from my instructors, one of which being Mostly Harmless Econometrics by Joshua D.

CONTINUE READING

Recently I’ve been working on a project with the goal of understanding how students choose between course modalities (online or in-person). My goal so far has been to test a handful of models to come up with the most accurate way of predicting the likelihood that a student will take a course online. Hopefully, this best-performing model would offer some interpretability so that I can see which variables offer the most predictive power.

CONTINUE READING