Protoextrapolate #1112
Changes from all commits
30e6c68
df42c51
0520fa9
dcf4df5
f2672a4
d7f5a59
42d7440
ee8c56e
685834f
3492f43
384f3bb
36283c8
bde8436
c66dc39
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,83 @@ | ||
| # extrapolators.R | ||
| # helper extrapolator functions for use in base year changes | ||
|
|
||
| #' extrapolate_constant | ||
| #' | ||
| #' computes the mean of last n original year values (with | ||
| #' \code{mean(., na.rm = TRUE)}) and uses this constant value to fill in NA | ||
| #' values corresponding to the extrapolation years at the end of the time series. | ||
| #' NOTE that this extrapolator does not touch any of the original data. It ONLY | ||
| #' fills in data corresponding to the extrapolation years. It is the user's | ||
| #' responsibility to account for this behavior in preparing raw data to be | ||
| #' extrapolated. | ||
| #' | ||
| #' @param x Vector of values with NA's corresponding to the extrapolation years | ||
| #' to be filled in via constant extrapolation of the mean of last n original | ||
| #' year values. | ||
| #' @param n Number of final original year values to be averaged to provide the | ||
| #' filler value. Averaging is done with \code{na.rm = TRUE}. | ||
| #' Defaults to n = 1: using the last recorded year's value to constantly fill in | ||
| #' the tail of vector missing values corresponding to extrapolation years. | ||
| #' @param numExtrapYrs The number of NA values at the tail end of each vector that | ||
| #' correspond to the extrapolation years and will be filled in. This will always | ||
| #' be known for each data set in each chunk. | ||
| #' @details Computes the mean of last n original year values of input vector x | ||
| #' and uses this constant value to fill in NA values in x that correspond to the | ||
| #' added extrapolation years. | ||
| #' @return Vector with the trailing \code{numExtrapYrs} values replaced with the specified mean. | ||
| #' @importFrom assertthat assert_that is.scalar | ||
| #' @importFrom utils tail | ||
| #' @author ACS June 2019 | ||
| extrapolate_constant <- function(x, n=1, numExtrapYrs){ | ||
|
|
||
| # Some assertion tests to make sure working on right data types | ||
| assert_that(is.numeric(x)) | ||
| assert_that(is.scalar(n)) | ||
| assert_that(is.integer(numExtrapYrs)) | ||
|
|
||
|
|
||
| # The constant value to fill in all extrapolation year NA's with. | ||
| # = mean(. , na.rm = TRUE) of the last n values in the original | ||
| # data. | ||
| index_last_n_orig_yrs <- (length(x) - numExtrapYrs - n + 1):(length(x) - numExtrapYrs) | ||
| meanval <- mean(x[index_last_n_orig_yrs], na.rm = TRUE) | ||
|
|
||
|
|
||
| # fill in only the tail end, extrapolation years | ||
| # NA values with this constant. | ||
| index_extrap_yrs <- (length(x) - numExtrapYrs + 1):length(x) | ||
| x[index_extrap_yrs] <- meanval | ||
|
|
||
| return(x) | ||
| } | ||
|
|
||
|
|
||
|
|
||
| #' last_n_nonNA | ||
| #' | ||
| #' finds the last n non-NA values in an input vector. | ||
| #' A convenience function for users who wish to customize | ||
| #' their extrapolations beyond the default or who wish to | ||
| #' identify NA values in their original (unextrapolated) | ||
| #' data. | ||
| #' | ||
| #' @param x Vector with some NA values | ||
| #' @param n The number of non-NA values sought. | ||
| #' @details finds the last n non-NA values in an input vector. | ||
| #' @return A vector with the last n non-NA values from input | ||
| #' vector x. | ||
| #' @importFrom assertthat assert_that is.scalar | ||
| #' @author ACS June 2019 | ||
| last_n_nonNA <- function(x, n){ | ||
|
|
||
| assert_that(is.scalar(n)) | ||
|
|
||
| if(n > length(x[!is.na(x)])){ | ||
| stop('asking for more nonNA years than you have.') | ||
|
Member
😆 |
||
| } | ||
|
|
||
|
|
||
| return(tail(x[!is.na(x)], n)) | ||
| } | ||
|
|
||
|
|
||
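The behavior documented for `extrapolate_constant()` and `last_n_nonNA()` can be illustrated with a small example. This is a minimal sketch, not the PR's code verbatim: both functions are restated in base R, with `stopifnot()` standing in for `assertthat::assert_that()` only to keep the example dependency-free, and the input vector `x` is hypothetical.

```r
# Base-R restatements of the two helpers from the diff above.
# stopifnot() replaces assert_that() here (an assumption of this sketch).
extrapolate_constant <- function(x, n = 1, numExtrapYrs) {
  stopifnot(is.numeric(x), length(n) == 1, is.integer(numExtrapYrs))
  last_orig <- length(x) - numExtrapYrs          # index of last original year
  meanval <- mean(x[(last_orig - n + 1):last_orig], na.rm = TRUE)
  x[(last_orig + 1):length(x)] <- meanval        # fill only the tail
  x
}

last_n_nonNA <- function(x, n) {
  stopifnot(length(n) == 1)
  if (n > length(x[!is.na(x)])) {
    stop("asking for more nonNA years than you have.")
  }
  utils::tail(x[!is.na(x)], n)
}

# Hypothetical series: five original years, then three extrapolation years.
x <- c(10, 12, 14, NA, 18, NA, NA, NA)

# n = 1 repeats the last original value; n = 3 averages 14, NA, 18 with na.rm.
filled_1 <- extrapolate_constant(x, n = 1, numExtrapYrs = 3L)
filled_3 <- extrapolate_constant(x, n = 3, numExtrapYrs = 3L)
print(filled_1)
print(filled_3)

# The NA inside the original data (position 4) is left untouched in both cases.
print(last_n_nonNA(x, 2))
```

Note that `numExtrapYrs` has to be passed as an integer (`3L`, not `3`) because of the `is.integer()` check, and that the interior `NA` in the original years is not filled, which matches the NOTE in the roxygen header.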
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -89,6 +89,18 @@ test_that("matches old data system output", { | |
| olddata <- COMPDATA[[oldf]] | ||
| expect_is(olddata, "data.frame", info = paste("No comparison data found for", oldf)) | ||
|
|
||
| # During the base year update development process, some outputs from the data | ||
| # system will be extended and therefore have different dimensions from the | ||
| # old comparison data. This causes the old-new test to fail. | ||
| # The extended extrapolation years will be dropped from the newdata | ||
| # so that comparison can be made to the old data. This also serves to check | ||
| # that the extrapolation procedure does not touch original data. | ||
| if(max(newdata$year) == BYU_YEAR){ # one way to check that it's a BYU without flags. | ||
| newdata %>% | ||
| filter(year <= max(HISTORICAL_YEARS)) -> | ||
|
Contributor
Author
I think the The problem is that the last year of data in an input file may not actually be the I cannot think of an endogenous way to detect what the last recorded year was before extrapolation without adding additional information. I think I have an idea to incorporate it into a new FLAG with some Alternatively, if we don't think this part of the test will be kept beyond the actual base year updating process, we could also do We will lose a few years of data from the oldnew test but there's less space for user error. So, do we want to go flag or slightly less robust oldnew test?
Contributor
There were only 10 or so that failed. Could we just flag them as
Contributor
One thing to keep in mind, the FAO Aquastat data are typically written out in (at best) 5-year intervals, with the data falling in years like 2008, 2012, 2017. Often there will be 20-year lapses in a time series. So, at least for that specific data source we don't want to be filtering years prior to filling in the missing data.
Member
@pkyle Thanks! But here we're talking about the test code, not filtering before filling data. @abigailsnyder @pralitp I'm also leery of adding extra logic/steps the user is responsible for. Another possible way to address this: weaken the oldnew test so it filters |
||
| newdata | ||
| } | ||
|
|
||
| # Finally, test (NB rounding numeric columns to a sensible number of | ||
| # digits; otherwise spurious mismatches occur) | ||
| # Also first converts integer columns to numeric (otherwise test will | ||
|
|
||
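The concern raised in the review thread can be reproduced with toy data. The sketch below is only an illustration under stated assumptions: the values of `HISTORICAL_YEARS` and `BYU_YEAR` and both data frames are hypothetical, and base R subsetting is used in place of `dplyr::filter()` so the example runs without extra packages.

```r
# Hypothetical constants; the real values come from the data system.
HISTORICAL_YEARS <- 1971:2010
BYU_YEAR <- 2015

# Hypothetical old output whose last recorded year (2008) is earlier than
# max(HISTORICAL_YEARS).
olddata <- data.frame(year = 2006:2008, value = c(1, 2, 3))

# Hypothetical new output after constant extrapolation out to BYU_YEAR.
newdata <- data.frame(year = 2006:BYU_YEAR,
                      value = c(1, 2, 3, rep(3, BYU_YEAR - 2008)))

# Same condition and filter as in the diff, written with base R subsetting.
if (max(newdata$year) == BYU_YEAR) {
  newdata <- newdata[newdata$year <= max(HISTORICAL_YEARS), ]
}

# Years that survive the filter but were never in the old data.
extra_years <- setdiff(newdata$year, olddata$year)
print(extra_years)
print(nrow(newdata) == nrow(olddata))
```

In this toy case the filter keeps two extrapolated years that have no counterpart in `olddata`, so the dimensions still differ and the old-new comparison would fail, which appears to be the situation the thread is debating.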