Compute the utility of a model score on a classification data set. For each threshold of interest, we compute the utility of the classification rule that takes (predicts positive for) all items whose model score is greater than or equal to the threshold. The user specifies the outcome (a binary classification target), a model score (numeric), and the per-row utility values (positive, negative, or zero) of each kind of case: true positives, false positives, true negatives, and false negatives. The function returns a table of model thresholds and the total value of using this model score with each threshold as a classification rule. A threshold of NA marks the rule that selects no rows.
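To make the rule concrete, here is a minimal sketch of the utility at a single threshold. It is not sigr's implementation; `utility_at_threshold()` is a hypothetical helper. Each row contributes exactly one of its four values, depending on whether it is taken (score >= threshold) and whether it is a positive. The data are those of the first example below.

```r
# Hypothetical helper (not part of sigr): total utility of the rule
# "take every row whose score is >= threshold".
# The four value arguments may be scalars or per-row vectors.
utility_at_threshold <- function(score, truth, threshold,
                                 tp_value, fp_value, tn_value, fn_value) {
  taken <- score >= threshold
  # Each row falls into exactly one of the four cases.
  sum(ifelse(taken & truth, tp_value, 0)) +
    sum(ifelse(taken & !truth, fp_value, 0)) +
    sum(ifelse(!taken & !truth, tn_value, 0)) +
    sum(ifelse(!taken & truth, fn_value, 0))
}

score <- c(0, 0.5, 0.5, 0.5)
truth <- c(FALSE, TRUE, FALSE, FALSE)

# Threshold 0.25 takes the three 0.5 rows: 1 true positive, 2 false
# positives, and leaves 1 true negative, so 95 - 10 + 0.001 = 85.001.
utility_at_threshold(score, truth, 0.25,
                     tp_value = 95, fp_value = -5,
                     tn_value = 0.001, fn_value = -0.01)
```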

model_utility(
  d,
  model_name,
  outcome_name,
  ...,
  outcome_target = TRUE,
  true_positive_value_column_name = "true_positive_value",
  false_positive_value_column_name = "false_positive_value",
  true_negative_value_column_name = "true_negative_value",
  false_negative_value_column_name = "false_negative_value"
)

Arguments

d

A data.frame containing all data and outcome values.

model_name

Name of the column containing model predictions.

outcome_name

Name of the column containing the truth values.

...

Not used; forces later arguments to be specified by name.
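In R, an argument listed after `...` can only be matched by its exact name; a value passed positionally in that slot is absorbed by `...` instead. The toy function below (hypothetical, not part of sigr) has the same argument layout as `model_utility()` and simply returns the `outcome_target` it received. Whether `model_utility()` also raises an error on unexpected `...` arguments is not covered by this sketch.

```r
# Hypothetical stand-in with the same argument layout as model_utility().
show_target <- function(d, model_name, outcome_name, ..., outcome_target = TRUE) {
  outcome_target
}

show_target("d", "score", "y", FALSE)                   # TRUE: FALSE went into `...`
show_target("d", "score", "y", outcome_target = FALSE)  # FALSE: matched by name
```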

outcome_target

Outcome value to be treated as the positive (TRUE) class.
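As an illustration, assuming rows whose outcome equals `outcome_target` are the positive class and all other rows are negative, a character outcome column maps to positives like this. The column values here are made up for the sketch.

```r
# Hypothetical outcome column with character labels.
outcome <- c("churned", "stayed", "churned", "stayed")
outcome_target <- "churned"

# Assumed mapping: rows equal to outcome_target are positives.
is_positive <- outcome == outcome_target
is_positive  # TRUE FALSE TRUE FALSE
```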

true_positive_value_column_name

Name of the column holding per-row values for true positive cases. Only used on positive instances.

false_positive_value_column_name

Name of the column holding per-row values for false positive cases. Only used on negative instances.

true_negative_value_column_name

Name of the column holding per-row values for true negative cases. Only used on negative instances.

false_negative_value_column_name

Name of the column holding per-row values for false negative cases. Only used on positive instances.
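Because the four value columns are per-row, each row can carry its own payoffs (for example, a true positive worth that customer's revenue). The sketch below shows the assumed selection logic at one threshold: a row uses only the value that matches its outcome and the rule's decision. The column names mirror the defaults above; the data and the threshold of 0.5 are made up, and this is not sigr's code.

```r
d <- data.frame(
  score = c(0.9, 0.8, 0.3, 0.1),
  truth = c(TRUE, FALSE, TRUE, FALSE),
  true_positive_value  = c(200, 999, 50, 999),  # 999s are on negative rows: never used
  false_positive_value = c(-5, -5, -5, -5),
  true_negative_value  = c(0, 0, 0, 0),
  false_negative_value = c(-1, -1, -1, -1)
)

taken <- d$score >= 0.5
# Taken rows use the TP or FP value; rows not taken use the FN or TN value.
value_used <- ifelse(taken,
                     ifelse(d$truth, d$true_positive_value, d$false_positive_value),
                     ifelse(d$truth, d$false_negative_value, d$true_negative_value))
value_used       # 200 -5 -1 0
sum(value_used)  # 194
```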

Value

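A data.frame with one row per threshold, including a row with threshold NA for the rule that selects no rows. As shown in the examples, the columns include model, threshold, count_taken, fraction_taken, and the true/false positive/negative counts; when the value columns are supplied, the per-case value columns and total_value are included as well (the last example, whose d has no value columns, returns only the counts).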
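The shape of this table can be sketched with a threshold sweep. This is an assumption-laden sketch, not sigr's code: `threshold_table()` is hypothetical, and it assumes the thresholds are the lowest score, the midpoints between consecutive distinct scores, and a final NA row meaning "take nothing". That choice reproduces the thresholds 0.00, 0.25, NA of the last example, but it is not confirmed to be how sigr picks thresholds.

```r
# Hypothetical sweep: one row of counts per candidate threshold.
threshold_table <- function(score, truth) {
  s <- sort(unique(score))
  # Assumed thresholds: lowest score plus midpoints of consecutive distinct scores.
  thresholds <- c(s[1], (s[-1] + s[-length(s)]) / 2)
  rows <- lapply(thresholds, function(t) {
    taken <- score >= t
    data.frame(threshold = t,
               count_taken = sum(taken),
               fraction_taken = mean(taken),
               true_positive_count = sum(taken & truth),
               false_positive_count = sum(taken & !truth),
               true_negative_count = sum(!taken & !truth),
               false_negative_count = sum(!taken & truth))
  })
  # Final row: NA threshold, no rows taken.
  none <- data.frame(threshold = NA_real_, count_taken = 0L, fraction_taken = 0,
                     true_positive_count = 0L, false_positive_count = 0L,
                     true_negative_count = sum(!truth),
                     false_negative_count = sum(truth))
  do.call(rbind, c(rows, list(none)))
}

tt <- threshold_table(c(0, 0.5, 0.5, 0.5), c(FALSE, TRUE, FALSE, FALSE))
tt
```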

Details

A worked example can be found here: https://github.com/WinVector/sigr/blob/main/extras/UtilityExample.md.

Examples

d <- data.frame(
  predicted_probability = c(0, 0.5, 0.5, 0.5),
  made_purchase = c(FALSE, TRUE, FALSE, FALSE),
  false_positive_value = -5,    # acting on any predicted positive costs $5
  true_positive_value = 95,     # revenue on a true positive is $100 minus action cost
  true_negative_value = 0.001,  # true negatives have no value in our application
                                # but just give ourselves a small reward for being right
  false_negative_value = -0.01  # adding a small notional tax for false negatives,
                                # don't want our competitor getting these accounts.
)
values <- model_utility(d, 'predicted_probability', 'made_purchase')
best_strategy <- values[values$total_value >= max(values$total_value), ][1, ]
t(best_strategy)
#>                      2
#> model                "predicted_probability"
#> threshold            "0.25"
#> count_taken          "3"
#> fraction_taken       "0.75"
#> true_positive_value  "95"
#> false_positive_value "-10"
#> true_negative_value  "0.001"
#> false_negative_value "0"
#> total_value          "85.001"
#> true_negative_count  "1"
#> false_negative_count "0"
#> true_positive_count  "1"
#> false_positive_count "2"
# a bigger example (random data, so the results vary from run to run)
d <- data.frame(
  predicted_probability = stats::runif(100),
  made_purchase = sample(c(FALSE, TRUE), replace = TRUE, size = 100),
  false_positive_value = -5,    # acting on any predicted positive costs $5
  true_positive_value = 95,     # revenue on a true positive is $100 minus action cost
  true_negative_value = 0.001,  # true negatives have no value in our application
                                # but just give ourselves a small reward for being right
  false_negative_value = -0.01  # adding a small notional tax for false negatives,
                                # don't want our competitor getting these accounts.
)
values <- model_utility(d, 'predicted_probability', 'made_purchase')
# plot the estimated total utility as a function of threshold
plot(values$threshold, values$total_value)
best_strategy <- values[values$total_value >= max(values$total_value), ][1, ]
t(best_strategy)
#>                      2
#> model                "predicted_probability"
#> threshold            "0.0240117"
#> count_taken          "99"
#> fraction_taken       "0.99"
#> true_positive_value  "5130"
#> false_positive_value "-225"
#> true_negative_value  "0.001"
#> false_negative_value "0"
#> total_value          "4905.001"
#> true_negative_count  "1"
#> false_negative_count "0"
#> true_positive_count  "54"
#> false_positive_count "45"
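The best-row selection used in these examples keeps all rows tied at the maximum and then takes the first. Base R's `which.max()` returns the index of the first maximum, so it picks the same row whenever total_value has no NA entries. The small table below is hand-computed from the first example's data (treating the NA row as "take nothing"); it stands in for a real `model_utility()` result.

```r
# Hand-computed stand-in for the first example's result.
values <- data.frame(
  threshold   = c(0, 0.25, NA),
  total_value = c(80, 85.001, -0.007)
)

best_strategy <- values[which.max(values$total_value), ]
best_strategy$threshold  # 0.25
```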
# without utilities example
d <- data.frame(
  predicted_probability = c(0, 0.5, 0.5, 0.5),
  made_purchase = c(FALSE, TRUE, FALSE, FALSE))
model_utility(d, 'predicted_probability', 'made_purchase')
#>                   model threshold count_taken fraction_taken
#> 1 predicted_probability      0.00           4           1.00
#> 2 predicted_probability      0.25           3           0.75
#> 3 predicted_probability        NA           0           0.00
#>   true_negative_count false_negative_count true_positive_count
#> 1                   0                    0                   1
#> 2                   1                    0                   1
#> 3                   3                    1                   0
#>   false_positive_count
#> 1                    3
#> 2                    2
#> 3                    0
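When every row of a class carries the same value, the total value at each threshold is just a weighted sum of these count columns. This sketch copies the counts from the output above and applies the unit values from the first example; it is plain arithmetic, not a sigr call, and it recovers the 85.001 reported there for threshold 0.25.

```r
# Counts copied from the "without utilities" output.
counts <- data.frame(
  threshold            = c(0, 0.25, NA),
  true_positive_count  = c(1, 1, 0),
  false_positive_count = c(3, 2, 0),
  true_negative_count  = c(0, 1, 3),
  false_negative_count = c(0, 0, 1)
)

# Unit values from the first example: TP 95, FP -5, TN 0.001, FN -0.01.
counts$total_value <- 95 * counts$true_positive_count +
  -5 * counts$false_positive_count +
  0.001 * counts$true_negative_count +
  -0.01 * counts$false_negative_count
counts$total_value  # 80, 85.001, -0.007
```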