Ronan's #TidyTuesday blog: Adjusting variable distribution and exploring data using mass linear regression

Ronan Harrington

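# Assumes `tt` (the #TidyTuesday data, including a chain_investment table)
# was already loaded in the Setup section
library(tidyverse)

# Creating tbl_df with gross_inv_chain values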
untransformed_tbl_df <- tibble(
  gross_inv_chain = tt$chain_investment$gross_inv_chain,
  transformation = "Untransformed"
  )

# Creating tbl_df with log10(gross_inv_chain) values
log10_tbl_df <- tibble(
  gross_inv_chain = log10(tt$chain_investment$gross_inv_chain),
  transformation = "Log10"
)

# Combining the above tibbles into one tbl_df
gross_inv_chain_tbl_df <- rbind(untransformed_tbl_df, log10_tbl_df)

# Plotting distribution of inflation-adjusted infrastructure investments
gross_inv_chain_tbl_df %>%
  ggplot(aes(x = gross_inv_chain, fill = transformation)) +
  geom_histogram(show.legend = FALSE, position = "identity",
                 bins = 12, colour = "black") +
  facet_wrap(~transformation, scales = "free") +
  labs(y = NULL,
       x = "Gross infrastructure investments adjusted for inflation (millions USD)",
       title = "Distributions of untransformed and log transformed infrastructure investments",
       subtitle = "Log transformed investments are more normally distributed") +
  scale_fill_brewer(palette = "Set1") +
  theme_classic()
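
A quick numerical check can complement the histograms above. The sketch below is not part of the original analysis: it uses simulated log-normal values as a stand-in for `gross_inv_chain`, and `sample_skewness()` is a hypothetical helper written for this illustration. It compares skewness before and after a log10 transformation, on the assumption that skewness closer to 0 indicates a more symmetric, more normal-looking distribution.

```r
# Simulated stand-in for gross_inv_chain: strictly positive, right-skewed values.
# These numbers are synthetic and are NOT taken from the #TidyTuesday data.
set.seed(42)
simulated_inv <- rlnorm(1000, meanlog = 8, sdlog = 1.5)

# Hypothetical helper: sample skewness, i.e. the third central moment
# divided by the cubed (population-style) standard deviation.
sample_skewness <- function(x) {
  x <- x[is.finite(x)]  # drop NA, NaN and +/-Inf (log10(0) gives -Inf)
  centred <- x - mean(x)
  mean(centred^3) / mean(centred^2)^1.5
}

skew_untransformed <- sample_skewness(simulated_inv)
skew_log10 <- sample_skewness(log10(simulated_inv))

# Skewness nearer 0 suggests a more symmetric distribution
print(c(untransformed = skew_untransformed, log10 = skew_log10))
```

To try this on the real data, `simulated_inv` could be swapped for `tt$chain_investment$gross_inv_chain`. Because the helper drops non-finite values, any zero or negative investments (which log10 turns into `-Inf` or `NaN`) would be excluded from the transformed skewness.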


