This blog has relocated to https://coolbutuseless.github.io and associated packages are now hosted at https://github.com/coolbutuseless.

29 April 2018

mikefc

Problem: I don’t like the default arguments of a function

Say I don’t like the default arguments of a particular function. What can I do about it?

For example, 99.999% of the time I’d be happier with mean() defaulting to na.rm=TRUE. Instead I get NAs when I don’t want them, and I have to remember to override the default na.rm value on every call.

# I usually forget to set `na.rm` on the first call
mean(c(1, 3, 5, NA))
## [1] NA
mean(c(1, 3, 5, NA), na.rm=TRUE)
## [1] 3

Notes:

Write a wrapper function

  • Write a wrapper function that has a different default
  • Pros:
    • Easy
  • Cons:
    • Have to keep the wrapper’s arguments in sync with the original function’s signature
    • Creates a new function in the global environment which masks the function in the package
mean.default <- function(x, trim=0, na.rm=TRUE, ...) {
  # Pass arguments by name so they cannot be mismatched positionally
  base::mean.default(x, trim=trim, na.rm=na.rm, ...)
}

mean(c(1, 3, 5, NA))
## [1] 3
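
One way to sidestep the masking downside is to give the wrapper its own name. This is a minimal sketch, assuming a hypothetical name `mean_narm` (it does not come from any package). Everything except `na.rm` is forwarded through `...`, so the wrapper does not have to mirror the full signature.

```r
# Hypothetical wrapper: behaves like mean(), but drops NAs by default.
mean_narm <- function(x, ..., na.rm = TRUE) {
  mean(x, ..., na.rm = na.rm)
}

mean_narm(c(1, 3, 5, NA))                 # NAs removed by default
mean_narm(c(1, 3, 5, NA), na.rm = FALSE)  # the default can still be overridden
```

The trade-off is that existing code calling mean() is unaffected, so this only helps in code that uses the new name.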

Create a partial function using purrr::partial()

  • Similar to writing a wrapper function except more of the work is done automatically

  • Pros:
    • Easy
  • Cons:
    • Creates a new function in the global environment which masks the function in the package
mean.default <- purrr::partial(base::mean.default, na.rm=TRUE)

mean(c(1, 3, 5, NA))
## [1] 3
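
To show roughly what partial application automates, here is a base R sketch. The helper `partial_sketch` is hypothetical and is not how purrr implements `partial()`; it only captures some arguments in a closure and supplies them on every later call.

```r
# Hypothetical helper: fix some arguments of `f` now, supply the rest later.
partial_sketch <- function(f, ...) {
  fixed <- list(...)
  function(...) do.call(f, c(list(...), fixed))
}

mean_fixed <- partial_sketch(base::mean.default, na.rm = TRUE)
mean_fixed(c(1, 3, 5, NA))  # na.rm = TRUE is supplied automatically
```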

Update the formal arguments for a function

  • Rewrite the formal arguments for the function
  • See Hadley’s Advanced R for more on functions and formal arguments.
  • Pros
    • No need to change existing R code i.e. still calling the same function name
  • Cons
    • Fiddly to implement
    • Actually creates a new function in the global environment which masks the original mean.default function
    • This won’t change the behaviour when the function is from a package and called via :: e.g. base::mean.default()
fargs                 <- formals(mean.default)
fargs$na.rm           <- TRUE
formals(mean.default) <- fargs

mean(c(1, 3, 5, NA))
## [1] 3
base::mean.default(c(1, 3, 5, NA))
## [1] NA
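
The masking behaviour can be checked directly. The sketch below makes the copy explicit, then removes it again to show that the original in base was never modified. The variable names `masked`, `original` and `restored` are only for illustration.

```r
x <- c(1, 3, 5, NA)

# Make an explicit copy in the current environment and change its default.
mean.default <- base::mean.default
formals(mean.default)$na.rm <- TRUE

masked   <- mean(x)                # dispatches to the modified copy
original <- base::mean.default(x)  # the version in base is untouched

# Removing the copy restores the stock behaviour of mean().
rm(mean.default)
restored <- mean(x)
```

Here `masked` is 3, while `original` and `restored` are both NA.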

Use the default package

  • Pros
    • Nicer syntax than just manually hacking at formals()
  • Cons
    • Actually creates a new function in the global environment which masks the original mean.default function
    • This won’t change the behaviour when the function is from a package and called via :: e.g. base::mean.default()
    • This won’t let you set an argument to have no default i.e. you can’t change na.rm to have no default such that it must always be specified when the function is called
library(default)
default(mean.default) <- list(na.rm=TRUE)

mean(c(1, 3, 5, NA))
## [1] 3
base::mean.default(c(1, 3, 5, NA))
## [1] NA

Use a sledgehammer

  • I wrote a sledgehammer function to solve this problem with enough power to corrupt all the functions in R if you really wanted to.
  • Pros
    • You can do a lot of evil with this function.
    • This changes functions inside packages in-situ, e.g. base::mean.default() will be changed in place within the namespace of the package.
    • Works with locked environments
  • Cons
    • You can do a lot of evil with this function
    • Unlocking and changing base R packages is probably frowned upon.
    • Editing functions in-place within a package doesn’t always work e.g. stats::sd doesn’t get updated correctly with this method
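
One plausible explanation for that last limitation, offered as an assumption rather than a confirmed diagnosis: the function below assigns into the attached `package:<name>` environment, whereas `pkg::fun` resolves against the package namespace. For base these share bindings, but for other packages they are separate environments, as this small check suggests.

```r
library(stats)  # normally attached already; made explicit for this sketch

pkg_env <- as.environment("package:stats")  # what the search path sees
ns_env  <- asNamespace("stats")             # what stats::sd resolves against

# Two distinct environments...
identical(pkg_env, ns_env)

# ...that initially hold the same function object.
identical(get("sd", envir = pkg_env), get("sd", envir = ns_env))
```

Replacing `sd` in only one of the two environments would leave the other still pointing at the original function.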
#-----------------------------------------------------------------------------
#' Change the formal arguments of a function
#'
#' If the function is specified within a package, then update the function
#' within the namespace of the package.
#'
#' @param function_name character name of function
#' @param package_name character name of package. Set this if you want the function 
#'                     to be updated within the package namespace.  If this is unset
#'                     then the changed function is placed in the specified environment
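#' @param envir environment to place the function if package_name not set
#' @param ... named arguments giving the new defaults, e.g. `na.rm=TRUE`. An empty
#'            value (e.g. `na.rm=`) removes the default
#'
#' @return Invisibly returns TRUE; otherwise an error is thrown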
#'
#' @importFrom rlang dots_definitions f_rhs
#-----------------------------------------------------------------------------
update_function_arguments <- function(function_name, package_name=NULL, envir=parent.frame(), ...) {

  #---------------------------------------------------------------------------
  # rlang::dots_definitions() always captures empty values whereas
  # rlang::quos() drops an empty argument if it's the last one.
  #---------------------------------------------------------------------------
  dots  <- rlang::dots_definitions(...)$dots

  #---------------------------------------------------------------------------
  # Get the named function and its formal arguments.
  # If `package_name` is defined, then get the function from within that
  # package, otherwise look it up in `envir`.
  #---------------------------------------------------------------------------
  if (is.null(package_name)) {
    func <- get(function_name, envir = envir)
  } else {
    func  <- getFromNamespace(function_name, ns=package_name)
  }
  fargs <- formals(func)

  #---------------------------------------------------------------------------
  # For each named item in ..., set the formal argument
  #---------------------------------------------------------------------------
  for (i in seq_along(dots)) {
    argument_name <- names(dots)[i]
    fargs[argument_name] <- list(rlang::f_rhs(dots[[i]]))
  }

  #---------------------------------------------------------------------------
  # Update the function with new formal arguments
  #---------------------------------------------------------------------------
  formals(func) <- fargs

  #---------------------------------------------------------------------------
  # If no `package_name` is defined, then just assign the function in `envir`.
  # Otherwise if `package_name` is set:
  #  - Get the attached package environment (`package:<package_name>`)
  #  - Unlock the binding for the function if it is locked
  #  - Assign the new function into the package environment
  #  - Re-lock the binding if it was initially locked
  #
  # This seems to mostly work. 
  #---------------------------------------------------------------------------
  if (is.null(package_name)) {
    assign(function_name, func, envir=envir)
  } else {
    package_env <- as.environment(paste0('package:', package_name))

    locked <- bindingIsLocked(function_name, package_env)
    if (locked) { unlockBinding(function_name, package_env) }

    package_env[[function_name]] <- func

    if (locked) { lockBinding(function_name, package_env) }
  }

  #---------------------------------------------------------------------------
  # Return quietly
  #---------------------------------------------------------------------------
  invisible(TRUE)
}
update_function_arguments('mean.default', 'base', na.rm=TRUE)
mean(c(1, 3, 5, NA))
## [1] 3
base::mean.default(c(1, 3, 5, NA))
## [1] 3
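
The unlock, assign and re-lock steps used above can be tried safely on a throwaway environment instead of a real package. This is a sketch only: `toy_env`, `greet` and `replace_locked` are hypothetical names invented for the illustration.

```r
# Toy environment standing in for a package environment.
toy_env <- new.env()
toy_env$greet <- function(name = "world") paste("hello", name)
lockBinding("greet", toy_env)

# Hypothetical helper mirroring the unlock / assign / re-lock steps.
replace_locked <- function(name, value, env) {
  locked <- bindingIsLocked(name, env)
  if (locked) {
    unlockBinding(name, env)
    on.exit(lockBinding(name, env))  # re-lock even if the assignment fails
  }
  assign(name, value, envir = env)
  invisible(TRUE)
}

# Change the default on a copy, then swap it in behind the lock.
new_greet <- toy_env$greet
formals(new_greet)$name <- "R"
replace_locked("greet", new_greet, toy_env)

toy_env$greet()                    # now uses the new default
bindingIsLocked("greet", toy_env)  # the binding is locked again
```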

Removing the default value for an argument to a function

By using rlang and formals() you can actually remove the default value of a function argument. This means that a function call must now specify what was previously an assumed default.

update_function_arguments('mean.default', 'base', na.rm=)
mean(c(1, 3, 5, NA))
## Error in mean.default(c(1, 3, 5, NA)): argument "na.rm" is missing, with no default
mean(c(1, 3, 5, NA), na.rm=TRUE)
## [1] 3
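
The same effect can be sketched in base R without rlang, since `alist()` accepts empty arguments. The toy function `strict_mean` below is hypothetical and stands in for a real function, so nothing in base is modified.

```r
# Toy function with a default for `na.rm`.
strict_mean <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  sum(x) / length(x)
}

# alist() allows empty values, so both arguments end up with no default.
formals(strict_mean) <- alist(x = , na.rm = )

# Omitting `na.rm` is now an error; capture the message rather than stopping.
outcome <- tryCatch(
  strict_mean(c(1, 3, 5, NA)),
  error = function(e) conditionMessage(e)
)

strict_mean(c(1, 3, 5, NA), na.rm = TRUE)  # works once na.rm is supplied
```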

Maliciously updating default arguments

Being able to change default arguments is a dangerous power to hold!

You can:

  • set defaults where there shouldn’t be any
  • set the argument to a meaningless value and watch the chaos ensue
  • set the argument to an expression
  • set the argument to generate a random value

Slightly evil: Nonsensical default leads to idiotic error

update_function_arguments('mean.default', 'base', na.rm=c(1, 3, 5))
mean(c(1, 3, 5, NA))
## Warning in if (na.rm) x <- x[!is.na(x)]: the condition has length > 1 and
## only the first element will be used
## [1] 3

Slightly evil: Add default arguments to things which shouldn’t have them

update_function_arguments('mean.default', 'base', x = 1:4)
mean()  # mean of nothing!
## Warning in if (na.rm) x <- x[!is.na(x)]: the condition has length > 1 and
## only the first element will be used
## [1] 2.5

Pure evil: Make a default argument dependent on another variable

update_function_arguments('mean.default', 'base', na.rm=length(x) > 3)
mean(c(1, 3, 5, NA))
## [1] 3
mean(c(1, 3, NA))
## [1] NA
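
This works because R evaluates a default argument lazily, inside the function, at the point where it is first used, so the default expression can refer to the other arguments. A small sketch with a hypothetical toy function `describe`:

```r
# The default for `verbose` is an expression that depends on `x`.
describe <- function(x, verbose = length(x) > 3) {
  if (verbose) "long input" else "short input"
}

describe(1:5)                  # default evaluates to TRUE
describe(1:2)                  # default evaluates to FALSE
describe(1:2, verbose = TRUE)  # an explicit value bypasses the expression
```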

Pure evil: Change a default argument to be set randomly

update_function_arguments('mean.default', 'base', na.rm=rnorm(1) > 0.5)
mean(c(1, 3, 5, NA))
## [1] NA
mean(c(1, 3, 5, NA))
## [1] 3
mean(c(1, 3, 5, NA))
## [1] NA

Pure evil: Change a default argument to give random errors

update_function_arguments('mean.default', 'base', 
                          na.rm=ifelse(rnorm(1)>0.5, stop("My hovercraft is full of eels", call.=FALSE), TRUE))
mean(c(1, 3, 5, NA))
## [1] 3
mean(c(1, 3, 5, NA))
## Error: My hovercraft is full of eels
mean(c(1, 3, 5, NA))
## [1] 3
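
The intermittent error relies on ifelse() only evaluating its `yes` argument when the test is TRUE, so stop() is not reached otherwise. The sketch below replaces the random draw with an explicit flag to make that visible; `maybe_fail` is a hypothetical name.

```r
# Same ifelse()/stop() pattern, but driven by a flag instead of rnorm().
maybe_fail <- function(fail) {
  ifelse(fail, stop("My hovercraft is full of eels", call. = FALSE), TRUE)
}

ok  <- maybe_fail(FALSE)  # stop() is never evaluated
msg <- tryCatch(maybe_fail(TRUE), error = function(e) conditionMessage(e))
```

With `fail = FALSE` the call returns TRUE; with `fail = TRUE` the error is raised and `msg` holds its message.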

Conclusion: With great power comes great responsibility

If you were putting together a list of things not to do in R, this would probably be on it.