forked from etc5523-2020/tutorial-10
---
title: "Report: Fertility"
author: "YOUR NAME"
output: html_document
---
```{r pkgs, include = FALSE}
library(tidyverse)
library(readxl)
library(janitor)
library(broom)
```
Here we use the fertility data from the [Gapminder website](https://www.gapminder.org/data/), which contains the number of babies
born per woman across countries in the world. The documentation for this data is available [here](https://www.gapminder.org/data/documentation/gd008/).
First we read in the data and process it, but you will need to make some changes to do the full analysis.
```{r raw_data}
fertility_raw <- read_xlsx(
  "data/indicator-undata-total-fertility.xlsx"
) %>%
  clean_names() %>%
  rename(country = total_fertility_rate)
fertility_raw
```
We then pivot the data into long form, parse the year as a number, keep the years from 1950 onwards, and add a year variable centred at 1950.
```{r fertility}
fertility <-
  fertility_raw %>%
  pivot_longer(cols = -c(country), # everything but country
               # what is the name of the new column that will hold the
               # old column names?
               names_to = "year",
               # what is the name of the new column that will hold the values?
               values_to = "babies_per_woman") %>%
  # extract the year as a number
  mutate(year = parse_number(year)) %>%
  # keep only years from 1950 onwards
  filter(year >= 1950) %>%
  # centre year at 1950 (assumes the earliest remaining year is 1950)
  mutate(year1950 = year - min(year))
```
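To see what the pivot step does, here is a minimal sketch on a made-up two-country table. The tibble `toy_wide`, its column names (`x1950`, `x1951`) and its values are hypothetical; they only roughly mimic what `clean_names()` produces for numeric column headers and are not the Gapminder data.

```{r toy-pivot}
library(tidyverse)

# Hypothetical wide table: one row per country, one column per year.
toy_wide <- tibble(
  country = c("A", "B"),
  x1950 = c(3.1, 6.0),
  x1951 = c(3.0, 5.9)
)

toy_long <- toy_wide %>%
  pivot_longer(cols = -c(country),
               names_to = "year",
               values_to = "babies_per_woman") %>%
  # parse_number() drops the non-numeric prefix, so "x1950" becomes 1950
  mutate(year = parse_number(year))

toy_long
```

Each country-year combination becomes its own row, which is the shape `ggplot()` and `lm()` expect.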
# Line plot for all countries
```{r line-plot-all}
ggplot(fertility,
       aes(x = year,
           y = babies_per_woman,
           group = country)) +
  geom_line(alpha = 0.1)
```
# How has fertility changed in Australia?
```{r aus-line-plot}
oz <- fertility %>% filter(country == "Australia")

ggplot(oz,
       aes(x = year,
           y = babies_per_woman,
           group = country)) +
  geom_line()
```
## Fit a linear model to Australia
```{r oz_mod}
mod <- lm(babies_per_woman ~ year1950, data = oz)
```
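Because the model uses `year1950` rather than `year`, the intercept is the fitted number of babies per woman in 1950 rather than in year 0. The sketch below illustrates this on invented numbers (`toy_years` and its values are hypothetical, not the Australian data): centring changes the intercept but leaves the slope untouched.

```{r toy-centring}
# Hypothetical data: made-up values, for illustration only.
toy_years <- data.frame(
  year = 1950:1954,
  babies_per_woman = c(3.2, 3.1, 3.1, 2.9, 2.8)
)
toy_years$year1950 <- toy_years$year - min(toy_years$year)

mod_raw <- lm(babies_per_woman ~ year, data = toy_years)
mod_centred <- lm(babies_per_woman ~ year1950, data = toy_years)

# The slopes agree; only the intercept changes.
slopes_match <- all.equal(
  unname(coef(mod_raw)["year"]),
  unname(coef(mod_centred)["year1950"])
)
slopes_match

# The centred intercept equals the raw model's prediction for 1950.
intercept_is_1950 <- all.equal(
  unname(coef(mod_centred)["(Intercept)"]),
  unname(predict(mod_raw, newdata = data.frame(year = 1950)))
)
intercept_is_1950
```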
## Plot the model fit against the data
```{r oz_line_plot_mod_overlay}
# augment() adds .fitted, .resid and related columns to the original data
mod %>%
  augment(data = oz) %>%
  ggplot(aes(x = year, y = babies_per_woman)) +
  geom_line() +
  geom_point(aes(y = .fitted))
```
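The overlay relies on the `.fitted` column that broom adds to the data. As a minimal sketch using broom's `augment()` on invented numbers (the data frame `toy_xy` is hypothetical), the check below suggests that this column matches base R's `fitted()`.

```{r toy-augment}
library(broom)

# Hypothetical data: made-up values, for illustration only.
toy_xy <- data.frame(x = 0:4, y = c(3.2, 3.1, 3.1, 2.9, 2.8))
toy_xy_mod <- lm(y ~ x, data = toy_xy)

# augment() returns the original columns plus model-based ones such as .fitted
toy_aug <- augment(toy_xy_mod, data = toy_xy)
names(toy_aug)

fitted_match <- all.equal(unname(toy_aug$.fitted), unname(fitted(toy_xy_mod)))
fitted_match
```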
## Summarise the fit
```{r mod-summary}
glance(mod)
```
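`glance()` returns a one-row summary of the model, including `r.squared`. As a rough guide to reading it, the sketch below (hypothetical data in `toy_g`, not the Australian series) checks that this value is the same R-squared that `summary()` reports.

```{r toy-glance}
library(broom)

# Hypothetical data: made-up values, for illustration only.
toy_g <- data.frame(x = 0:4, y = c(3.2, 3.1, 3.1, 2.9, 2.8))
toy_g_mod <- lm(y ~ x, data = toy_g)

toy_glance <- glance(toy_g_mod)
toy_glance

# One row per model, with r.squared matching summary()
r2_match <- all.equal(toy_glance$r.squared, summary(toy_g_mod)$r.squared)
r2_match
```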