January, 2018

Introduction

  • Day 1 - Getting started
  • Day 2 - Functions & Spark
  • Day 3 - Tidyverse
  • Day 4 - Plotly
  • Day 5 - Shiny Introduction
  • Day 6 - Reactivity
  • Day 7 - Modules
  • Day 8 - Shiny Project

Day 4 - Plotly

Day 4 - Agenda

  • Scatter plots
  • Line plots
  • Bar charts
  • Heatmaps
  • Box plots
  • Histograms

Data Science Toolchain

Scatter plots

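library(plotly)   # attaches ggplot2 (for diamonds) and re-exports %>%

# pal_deloitte is a colour palette assumed to be defined earlier in the course
diamonds %>%
  dplyr::sample_n(1000) %>%   # subsample to keep the plot responsive
  plot_ly(colors = pal_deloitte) %>%
  add_markers(
    x = ~carat,
    y = ~price,
    color = ~color,
    size = ~carat,
    text = ~paste("Clarity:", clarity)   # shown on hover
  )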
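
The snippets in this deck pass `colors = pal_deloitte` (and later `pal_deloitte2`), but neither object is defined on these slides. To run the examples on their own, a stand-in could look like the sketch below. The hex codes are placeholder values chosen for illustration, not the actual course palette.

```r
library(plotly)

# Placeholder palettes: the hex values are assumptions, not the course's real colours.
# Seven entries, one per level of diamonds$color (D to J).
pal_deloitte <- c("#86BC25", "#43B02A", "#00A3E0", "#0076A8",
                  "#012169", "#75787B", "#000000")
# Two end points for continuous scales (2D histograms, contours).
pal_deloitte2 <- c("#FFFFFF", "#86BC25")

# Small scatter plot to confirm the palette is accepted
p <- plot_ly(
  data = ggplot2::diamonds[1:200, ],
  x = ~carat, y = ~price, color = ~color,
  colors = pal_deloitte,
  type = "scatter", mode = "markers"
)
# print(p) renders the chart in the viewer or browser
```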

Line plots

economics_long %>%
  plot_ly(
    x = ~date,
    y = ~value,
    color = ~variable,
    colors = pal_deloitte,
    type = "scatter",
    mode = "lines"
  )
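
The series in `economics_long` sit on very different scales, so the ones with large values dominate a shared y-axis. The dataset also carries a `value01` column with each series rescaled to [0, 1]. A minimal sketch using it, with `add_lines()` as shorthand for `type = "scatter", mode = "lines"` and the default colours instead of the course palette:

```r
library(plotly)

# economics_long ships with ggplot2; value01 is each series rescaled to [0, 1]
p_lines <- ggplot2::economics_long %>%
  plot_ly(x = ~date, y = ~value01, color = ~variable) %>%
  add_lines()
# print(p_lines) renders the chart
```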

Bar charts

diamonds %>%
  count(cut, clarity) %>%
  plot_ly(colors = pal_deloitte) %>%
  add_bars(
    x = ~cut, 
    y = ~n, 
    color = ~clarity
  )

Bar charts

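diamonds %>%
  group_by(color, clarity) %>%
  summarise(n = n()) %>%   # result is still grouped by color
  mutate(
    nn = sum(n),           # total count per color
    prop = n / nn          # share of each clarity within its color
  ) %>%
  plot_ly(x = ~color, colors = pal_deloitte) %>%
  add_bars(
    y = ~prop,
    color = ~clarity
  ) %>%
  layout(barmode = "stack")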
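
The `mutate()` step works because `summarise()` peels off only the last grouping variable, leaving the data grouped by `color`, so `sum(n)` is a per-colour total. A quick check of that, written with `count()` and an explicit `group_by()` so the grouping is visible (a sketch of the same calculation, not a replacement for the slide code):

```r
library(dplyr)

props <- ggplot2::diamonds %>%
  count(color, clarity) %>%      # one row per color/clarity pair
  group_by(color) %>%            # totals are taken within each color
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Each color's clarity shares should add up to 1
check <- props %>%
  group_by(color) %>%
  summarise(total = sum(prop))

stopifnot(all(abs(check$total - 1) < 1e-9))
```

If the shares did not sum to 1 within each colour, the stacked bars would not all reach the same height.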

Heatmaps

diamonds %>%
  group_by(cut, clarity) %>%
  summarise(N = n()) %>%
  plot_ly() %>%
  add_heatmap(
    x = ~cut,
    y = ~clarity,
    z = ~N
  )

Box plots

diamonds %>%
  plot_ly(colors = pal_deloitte) %>%
  add_boxplot(
    x = ~cut, 
    y = ~price, 
    color = ~clarity
  ) %>%
  layout(boxmode = "group")

Histograms

plot_ly(alpha = 0.6, colors = pal_deloitte) %>%
  add_histogram(x = ~rnorm(500)) %>%
  add_histogram(x = ~rnorm(500) + 1) %>%
  layout(barmode = "overlay")
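
Because `rnorm()` is called inside the formula, every run draws a new sample. To get repeatable output, one option is to draw the samples once after `set.seed()` and name the traces. The seed value and trace names below are arbitrary choices, and the palette argument is left out to keep the sketch self-contained:

```r
library(plotly)

set.seed(42)            # arbitrary seed, only for repeatability
a <- rnorm(500)         # centred at 0
b <- rnorm(500) + 1     # same spread, shifted to mean 1

p_hist <- plot_ly(alpha = 0.6) %>%
  add_histogram(x = a, name = "N(0, 1)") %>%
  add_histogram(x = b, name = "N(1, 1)") %>%
  layout(barmode = "overlay")   # "stack" piles the bars instead of overlapping them
# print(p_hist) renders the chart
```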

Histograms

plot_ly(alpha = 0.6, colors = pal_deloitte) %>%
  add_histogram(x = ~rnorm(500)) %>%
  add_histogram(x = ~rnorm(500) + 1) %>%
  layout(barmode = "stack")

Histograms

diamonds %>%
  plot_ly(colors = pal_deloitte2) %>%
  add_histogram2d(x = ~carat, y = ~price)

Histograms

diamonds %>%
  plot_ly(colors = pal_deloitte2) %>%
  add_histogram2dcontour(x = ~carat, y = ~price)

Exercises

  1. Read the loans data into Spark and compare the actual and expected balance over time.
  2. Using the same data, calculate the hazard rate term structure and compare it with our expectation (i.e. the PD we simulated from).

HINT: The calculation is similar to the transition rate, except that this time you use made_payment instead of cd_bucket.
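
As a starting point for the hazard rate, the idea can be sketched on a tiny local data frame: per period, divide the number of loans that default by the number still at risk. Everything about the data here is an assumption made for illustration: the toy rows, the `loan_id` and `period` column names, the coding of `made_payment` (1 = paid, 0 = missed), and the rule that a loan drops out after its first missed payment. The sketch uses plain local dplyr with no Spark connection.

```r
library(dplyr)

# Toy data (hypothetical): one row per loan per period, up to the first missed payment
loans_toy <- tibble::tribble(
  ~loan_id, ~period, ~made_payment,
  1, 1, 1,
  1, 2, 1,
  1, 3, 0,
  2, 1, 1,
  2, 2, 0,
  3, 1, 1,
  3, 2, 1,
  3, 3, 1,
  4, 1, 0
)

hazard <- loans_toy %>%
  group_by(period) %>%
  summarise(
    at_risk  = n(),                     # loans still observed in this period
    defaults = sum(made_payment == 0),  # loans that miss the payment
    hazard   = defaults / at_risk       # conditional default rate for the period
  )
```

Plotting `hazard` against `period` gives the term structure, which can then be set beside the PD used in the simulation.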