jbrnbrg

Tag: example

Mapping NYC's 23K Trashcan Locations

The New York City Department of Sanitation (DSNY) - the largest department of its kind in the world - is responsible for the city’s garbage-collection operation. One component of this operation is the regular emptying of 23,000+ street-level trashcans - or, as they call them, “litter baskets.” DSNY offers up its geo-coded litter-basket inventory - refreshed monthly - through the NYC OpenData portal. In today’s post I’ll walk through how to create an interactive, 3D fly-over map of these litter-basket locations with the help of Mapbox’s Mapbox GL JS API in R.

Forecasting National Park Visits

Following full vaccination, folks have begun venturing outdoors for the first time in a long while - myself included! In support of outdoor-activity planning, today’s post is going to cover forecasting national park visits for each month of 2021. Since there are 63 National Parks in the US across 30 states and two territories, I will employ the sweep package to handle the number of forecasts to be made and reviewed, all while keeping the data in a tidy format.

Basic EDA for Multilevel Data in R

Multilevel data is data that includes repeated measures of the same subject or variables over some period of time, e.g.: a five-year study on the test scores of students grouped by cohorts and classes across a state’s K-12 schools; patient satisfaction of care grouped by attending doctors and their respective practices; or police use-of-force incidents by race of suspect grouped by precinct, patrol, and arresting officer. Analysts can encounter data of this type in just about any industry that produces data, and the grouping structure must be fully understood in order to properly explore, model, and simulate the data before creating any usable insight.

Web Scraping for Public Health Data

Part of the new normal for many data analytics professionals is being able to obtain Covid-19 data by geographic location. To address this need, today’s post is going to walk through how to scrape the repo for the NYC Department of Health and Mental Hygiene using the flexible rvest library to create a time-series of positive cases by NYC zip code.

RStudio in the Cloud via Docker & AWS EC2

In today’s post I am going to go through the steps needed to get RStudio up-and-running in the cloud using Amazon’s free-tier EC2 services, Docker, and rocker/tidyverse (free-tier setups can still result in expenses so please read the documentation carefully!). If you’ve never done anything with AWS before, the first thing you’ll need to do is create and activate an AWS account. After that, as long as you’ve got RStudio, Git BASH, and a Win 10 PC, the remainder of the instructions should work without issue.

Translating Tract-Level ACS Data to NYPD Precincts

In this post I’m going to demonstrate how to use tract-level NYC census data to estimate census data at the NYPD-precinct level. There are 77 NYPD precincts serving the five boroughs of NYC and each precinct contains multiple census tracts. To get census demographics at the precinct level, I’ll need a way to aggregate the tract data into precincts - let’s get started!

Choropleths with 311 Data in R

In today’s post I build upon the last post to demonstrate the use of the tmap package to make a choropleth map in R. I also include instructions on how to use the sf library to obtain the block-level census tract IDs included in the tidycensus ACS data for the State Plane coordinates within the NYC 311 data. If you’re new to making maps, it’s important to recognize the differences between geographic and projected coordinate systems.

Equity and 311 Heat & Hot Water Service Requests

In this post I’ll be investigating equity between different census tracts in NYC based on economic measures and the open-to-close service request (SR) duration for 311 heat/hot water complaints. Census Tract Granularity: If you’ve checked out my previous posts, you know I’ve done a bunch of work with NYC’s OpenData at the ZIP-code level. Today I’m going to go a step further and obtain NYC data with latitude and longitude coordinates and merge it with census data by census tract from the 5-year American Community Survey (ACS) from 2016.

Linear Mixed Effect Models with lme4

This post is going to follow along and expand on a well-known tutorial for Mixed Effect Linear Models by Bodo Winter, a lecturer in Cognitive Linguistics at the University of Birmingham, Dept. of English Language and Applied Linguistics. While this tutorial will generally follow Winter’s, I have added detail for EDA, expanded the details re: code & errors that are encountered, and added notes from many other sources on Mixed Effect Models (see the Sources section at the end for links) - let’s get started!

Plotly Scattermapbox with R and Python

I recently tried out plot.ly’s open source graphing library and found it to be challenging but worth the effort: challenging in that the documentation has some gaps, but worth it in that the standard plots are responsive (via .js) and feature-rich right out of the box. I tried out both the Python and R versions of plot.ly and I found the R version to be the more straightforward to use and deploy via shinyapps.

Binary Response and GGally

If I am working on data with a binary response, I like to use the GGally package for its ggpairs function. It provides a way to look at a lot of different data types at the same time but the setup and customization can be a little daunting. In this example, which leverages this crime data, I demonstrate how ggpairs can be used to reveal a lot of information in a single figure.