jbrnbrg

Predicting House Sale Price with the Ames Housing Data

In this post I’m going to perform a simplified bottom-up EDA, model development, prediction, and model evaluation using a public data set that contains data for houses that sold in Ames, Iowa. This post essentially revisits a previous analysis I performed in a group for my MS degree program at CUNY. The main differences here are that the analysis is abbreviated to illustrate my skillset, I’ll be using Python instead of R, and I’ll be relying on Python’s fantastic scikit-learn library to perform the regressions.

Fraudulent Transaction Detection with GBM

Introduction: In this post I will be creating a predictive model to identify fraudulent credit card transactions on a public data set from Kaggle. Along the way, I will be reviewing some of the functionality of R’s gbm package for predictive modelling. Data Overview: The data set contains 280K+ records of credit card transactions from a two-week period in which a small percentage of transactions have been labeled as fraudulent.

EMS Call Volume Forecasting with Tensorflow BSTS

In this post I will be creating a call-volume forecast with my NYC EMS data using Python’s tensorflow_probability library. In particular, I’ll employ that library’s sts module, which provides the ability to create Bayesian structural time series (BSTS) forecasts. Structural Time Series Forecasting: Unlike traditional time-series models, BSTS models can forecast multiple correlated time series while handling large short-term variations. In the multivariate setting, the Bayesian component helps the model avoid over-fitting and identify correlations among the variables.

LSTM for EMS Call Volume Prediction

Multivariate time-series forecasting is a non-trivial task when it comes to complex seasonality. Forecasting: Principles and Practice by R.J. Hyndman and G. Athanasopoulos gives several powerful examples if you’re using R and handling seasonality with Fourier terms for each seasonal period (kinda like I did in this post). Here I’ll be using Keras’s LSTM layer in Python to forecast the next 24 hours of call volume, hour by hour, using the past 24 hours of my EMS data along with hourly weather data from Central Park via NOAA.
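
To make the "past 24 hours in, next 24 hours out" setup concrete, here is a minimal sketch of how an hourly series could be cut into training pairs before it reaches an LSTM. The function name `make_windows`, its parameters, and the `calls` data are all hypothetical and made up for illustration; this is not the code from the post, and it uses a single series for simplicity, whereas the post also includes weather variables.

```python
def make_windows(series, n_in=24, n_out=24):
    """Split an hourly series into (past n_in hours, next n_out hours) pairs.

    Hypothetical helper for illustration only, not taken from the post.
    """
    inputs, targets = [], []
    last_start = len(series) - n_in - n_out
    for start in range(last_start + 1):
        # Model input: n_in consecutive hourly values.
        inputs.append(series[start:start + n_in])
        # Model target: the n_out hours that immediately follow.
        targets.append(series[start + n_in:start + n_in + n_out])
    return inputs, targets


# Made-up hourly call counts covering 72 hours (assumption, not real EMS data).
calls = list(range(72))
X, y = make_windows(calls)
```

Each window slides forward by one hour, so consecutive training pairs overlap heavily. A multivariate version would keep the same slicing but store one row of features (call volume plus weather) per hour.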