Package 'mildsvm' reference manual

Title:	Multiple-Instance Learning with Support Vector Machines
Description:	Weakly supervised (WS), multiple instance (MI) data lives in numerous interesting applications such as drug discovery, object detection, and tumor prediction on whole slide images. The 'mildsvm' package provides an easy way to learn from this data by training Support Vector Machine (SVM)-based classifiers. It also contains helpful functions for building and printing multiple instance data frames. The core methods from 'mildsvm' come from the following references: Kent and Yu (2022) <arXiv:2206.14704>; Xiao, Liu, and Hao (2018) <doi:10.1109/TNNLS.2017.2766164>; Muandet et al. (2012) <https://proceedings.neurips.cc/paper/2012/file/9bf31c7ff062936a96d3c8bd1f8f2ff3-Paper.pdf>; Chu and Keerthi (2007) <doi:10.1162/neco.2007.19.3.792>; and Andrews et al. (2003) <https://papers.nips.cc/paper/2232-support-vector-machines-for-multiple-instance-learning.pdf>. Many functions use the 'Gurobi' optimization back-end to improve the optimization problem speed; the 'gurobi' R package and associated software can be downloaded from <https://www.gurobi.com> after obtaining a license.
Authors:	Sean Kent [aut, cre] , Yifei Liou [aut]
Maintainer:	Sean Kent <[email protected]>
License:	MIT + file LICENSE
Version:	0.4.0.9000
Built:	2025-03-21 02:47:24 UTC
Source:	https://github.com/skent259/mildsvm

Coerce to MI data frame

Description

as_mi_df() turns an existing object, such as a data frame, into a MI data frame, a data frame with 'mi_df'. This is in contrast with mi_df(), which builds a MI data frame from individual columns.

Usage

as_mi_df(
  x,
  bag_label = "bag_label",
  bag_name = "bag_name",
  instance_label = "instance_label",
  ...
)
as_mi_df(
  x,
  bag_label = "bag_label",
  bag_name = "bag_name",
  instance_label = "instance_label",
  ...
)

Arguments

`x`	A data-frame or similar to convert.
`bag_label`	A character (default `'bag_label'`) describing which column refers to the bag label.
`bag_name`	A character (default `'bag_name'`) describing which column refers to the bag name.
`instance_label`	A character (default `'instance_label'`) describing which column refers to the instance labels. If NULL, no instance_labels will be used.
`...`	Arguments reserved for other methods.

Value

A 'mi_df' object. This data.frame-like has columns bag_label, bag_name, and potentially others. It also inherits from the 'tbl_df' and 'tbl' classes.

Author(s)

Sean Kent

Examples

x = data.frame('bag_LABEL' = factor(c(1, 1, 0)),
               'bag_name' = c(rep('bag_1', 2), 'bag_2'),
               'X1' = c(-0.4, 0.5, 2),
               'instance_label' = c(0, 1, 0))

df <- as_mi_df(x)

x = data.frame('bag_LABEL' = factor(c(1, 1, 0)),
               'bag_name' = c(rep('bag_1', 2), 'bag_2'),
               'X1' = c(-0.4, 0.5, 2),
               'instance_label' = c(0, 1, 0))

df <- as_mi_df(x)

as_mild_df() turns an existing object, such as a data frame, into a MILD data frame, a data frame with 'mild_df'. This is in contrast with mild_df(), which builds a MILD data frame from individual columns.

Usage

as_mild_df(
  x,
  bag_label = "bag_label",
  bag_name = "bag_name",
  instance_name = "instance_name",
  instance_label = "instance_label",
  ...
)
as_mild_df(
  x,
  bag_label = "bag_label",
  bag_name = "bag_name",
  instance_name = "instance_name",
  instance_label = "instance_label",
  ...
)

Arguments

`x`	A data-frame or similar to convert.
`bag_label`	A character (default `'bag_label'`) describing which column refers to the bag label.
`bag_name`	A character (default `'bag_name'`) describing which column refers to the bag name.
`instance_name`	A character (default `'instance_name'`) describing which column refers to the instance name.
`instance_label`	A character (default `'instance_label'`) describing which column refers to the instance labels. If NULL, no instance_labels will be used.
`...`	Arguments reserved for other methods.

Value

A 'mild_df' object. This data.frame-like has columns bag_label, bag_name, instance_name, and potentially others. It also inherits from the 'tbl_df' and 'tbl' classes.

Author(s)

Sean Kent

Examples

x <- data.frame('bag_LABEL' = factor(c(1, 1, 0)),
               'bag_name' = c(rep('bag_1', 2), 'bag_2'),
               'instance_name' = c('bag_1_inst_1', 'bag_1_inst_2', 'bag_2_inst_1'),
               'X1' = c(-0.4, 0.5, 2),
               'instance_label' = c(0, 1, 0))

df <- as_mild_df(x)

x <- data.frame('bag_LABEL' = factor(c(1, 1, 0)),
               'bag_name' = c(rep('bag_1', 2), 'bag_2'),
               'instance_name' = c('bag_1_inst_1', 'bag_1_inst_2', 'bag_2_inst_1'),
               'X1' = c(-0.4, 0.5, 2),
               'instance_label' = c(0, 1, 0))

df <- as_mild_df(x)

Sample `mild_df` object by bags and instances

Description

From a mild_df object, return a sample that evenly pulls from the unique bags and unique instances from each bag as much as possible. This is a form of stratified sampling to avoid randomly sampling many rows from a few bags.

Usage

bag_instance_sampling(data, size)
bag_instance_sampling(data, size)

Arguments

`data`	A `mild_df` object containing the data.
`size`	A non-negative integer giving the number of rows to choose from `data`.

Value

A numeric vector of length size indicating which rows were sampled.

Author(s)

Sean Kent

Examples

mil_data <- generate_mild_df(positive_dist = "mvnormal",
                             nbag = 2,
                             ninst = 2,
                             nsample = 2)

rows <- bag_instance_sampling(mil_data, 6)
table(mil_data$bag_name[rows])
table(mil_data$instance_name[rows])

rows <- bag_instance_sampling(mil_data, 4)
table(mil_data$bag_name[rows])
table(mil_data$instance_name[rows])

mil_data <- generate_mild_df(positive_dist = "mvnormal",
                             nbag = 2,
                             ninst = 2,
                             nsample = 2)

rows <- bag_instance_sampling(mil_data, 6)
table(mil_data$bag_name[rows])
table(mil_data$instance_name[rows])

rows <- bag_instance_sampling(mil_data, 4)
table(mil_data$bag_name[rows])
table(mil_data$instance_name[rows])

Build a feature map on new data

Description

Feature maps provide a set of covariates in a transformed space. The build_fm() function creates these covariates based on an object that specifies the feature map and a provided dataset.

Usage

build_fm(kfm_fit, new_data, ...)

## S3 method for class 'kfm_exact'
build_fm(kfm_fit, new_data, ...)

## S3 method for class 'kfm_nystrom'
build_fm(kfm_fit, new_data, ...)
build_fm(kfm_fit, new_data, ...)

## S3 method for class 'kfm_exact'
build_fm(kfm_fit, new_data, ...)

## S3 method for class 'kfm_nystrom'
build_fm(kfm_fit, new_data, ...)

Arguments

`kfm_fit`	An object from a function in the `⁠kfm_*⁠` family, such as `kfm_nystrom()`.
`new_data`	The data to generate features from.
`...`	Additional arguments for methods.

Value

A matrix of covariates in the feature space, with the same number of rows as new_data. If new_data is a mild_df object, build_fm() will also return the columns containing 'bag_label', 'bag_name', 'instance_name'.

Methods (by class)

kfm_exact: Method for kfm_exact class.
kfm_nystrom: Method for kfm_nystrom class.

Author(s)

Sean Kent

Examples

df <- data.frame(
  X1 = c(2,   3,   4,   5,   6, 7, 8),
  X2 = c(1, 1.2, 1.3, 1.4, 1.1, 7, 1),
  X3 = rnorm(7)
)

fit <- kfm_nystrom(df, m = 7, r = 6, kernel = "radial", sigma = 0.05)
fm <- build_fm(fit, df)

fit <- kfm_exact(kernel = "polynomial", degree = 2, const = 1)
fm <- build_fm(fit, df)

df <- data.frame(
  X1 = c(2,   3,   4,   5,   6, 7, 8),
  X2 = c(1, 1.2, 1.3, 1.4, 1.1, 7, 1),
  X3 = rnorm(7)
)

fit <- kfm_nystrom(df, m = 7, r = 6, kernel = "radial", sigma = 0.05)
fm <- build_fm(fit, df)

fit <- kfm_exact(kernel = "polynomial", degree = 2, const = 1)
fm <- build_fm(fit, df)

Flatten `mild_df` data to the instance level

Description

Flatten mild_df type of data to regular multiple instance data where each instance is a vector by extracting distribution sample quantiles, mean and sd.

Usage

build_instance_feature(
  data,
  qtls = seq(0.05, 0.95, length.out = 10),
  mean = TRUE,
  sd = TRUE
)
build_instance_feature(
  data,
  qtls = seq(0.05, 0.95, length.out = 10),
  mean = TRUE,
  sd = TRUE
)

Arguments

`data`	A `mild_df` object.
`qtls`	Quantiles to be extracted from each instance empirical distribution.
`mean`	A logical for whether or not to extract mean.
`sd`	A logical for whether or not to extract standard deviation.

Value

A summarized data.frame at the instance level.

Author(s)

Yifei Liu

Examples

mild_df1 <- generate_mild_df(positive_degree = 3, nbag = 3)
df1 <- build_instance_feature(mild_df1, seq(0.05, 0.95, length.out = 10))

mild_df1 <- generate_mild_df(positive_degree = 3, nbag = 3)
df1 <- build_instance_feature(mild_df1, seq(0.05, 0.95, length.out = 10))

Classify y from bags

Description

Formally, this function applies max() on y for each level of bags.

Usage

classify_bags(y, bags, condense = TRUE)
classify_bags(y, bags, condense = TRUE)

Arguments

`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.
`condense`	A logical (default `TRUE`) for whether to return classification at the level of unique bags or not.

Value

a named vector of length length(unique(b)) which gives the classification for each bag. Names come from bags.

Author(s)

Sean Kent

Examples

y <- c(1, 0, 0, 1, 1, 1, 0, 0, 0)
bags <- rep(1:3, each = 3)

classify_bags(y, bags)
classify_bags(y, bags, condense = FALSE)

# works with regular vector too
scores <- 1:9
classify_bags(scores, bags)

y <- c(1, 0, 0, 1, 1, 1, 0, 0, 0)
bags <- rep(1:3, each = 3)

classify_bags(y, bags)
classify_bags(y, bags, condense = FALSE)

# works with regular vector too
scores <- 1:9
classify_bags(scores, bags)

Fit MI-SVM model to the data using cross-validation

Description

Cross-validation wrapper on the misvm() function to fit the MI-SVM model over a variety of specified cost parameters. The optimal cost parameter is chosen by the best AUC of the cross-fit models. See ?misvm for more details on the fitting function.

Usage

## Default S3 method:
cv_misvm(
  x,
  y,
  bags,
  cost_seq,
  n_fold,
  fold_id,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = 1, nystrom_args = list(m = nrow(x), r =
    nrow(x), sampling = "random"), max_step = 500, type = "C-classification", scale =
    TRUE, verbose = FALSE, time_limit = 60, start = FALSE),
  ...
)

## S3 method for class 'formula'
cv_misvm(formula, data, cost_seq, n_fold, fold_id, ...)

## S3 method for class 'mi_df'
cv_misvm(x, ...)
## Default S3 method:
cv_misvm(
  x,
  y,
  bags,
  cost_seq,
  n_fold,
  fold_id,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = 1, nystrom_args = list(m = nrow(x), r =
    nrow(x), sampling = "random"), max_step = 500, type = "C-classification", scale =
    TRUE, verbose = FALSE, time_limit = 60, start = FALSE),
  ...
)

## S3 method for class 'formula'
cv_misvm(formula, data, cost_seq, n_fold, fold_id, ...)

## S3 method for class 'mi_df'
cv_misvm(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents a sample.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.
`cost_seq`	A sequence of `cost` arguments (default `2^(-2:2)`) in `misvm()`.
`n_fold`	The number of folds (default 5). If this is specified, `fold_id` need not be specified.
`fold_id`	The ids for the specific the fold for each instance. Care must be taken to ensure that ids respect the bag structure to avoid information leakage. If `n_fold` is specified, `fold_id` will be computed automatically.
`method`	The algorithm to use in fitting (default `'heuristic'`). When `method = 'heuristic'`, which employs an algorithm similar to Andrews et al. (2003). When `method = 'mip'`, the novel MIP method will be used. When `⁠method = 'qp-heuristic⁠`, the heuristic algorithm is computed using the dual SVM. See details.
`weights`	named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of `y`.
`control`	list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `nystrom_args` a list of parameters to pass to `kfm_nystrom()`. This is used when `method = 'mip'` and `kernel = 'radial'` to generate a Nystrom approximation of the kernel features. `max_step` argument used when `method = 'heuristic'`. Maximum steps of iteration for the heuristic algorithm. `type`: argument used when `method = 'heuristic'`. The `type` argument is passed to `e1071::svm()`. `scale` argument used for all methods. A logical for whether to rescale the input before fitting. `verbose` argument used when `method = 'mip'`. Whether to message output to the console. `time_limit` argument used when `method = 'mip'`. `FALSE`, or a time limit (in seconds) passed to `gurobi()` parameters. If `FALSE`, no time limit is given. `start` argument used when `method = 'mip'`. If `TRUE`, the mip program will be warm_started with the solution from `method = 'qp-heuristic'` to potentially improve speed.
`...`	Arguments passed to or from other methods.
`formula`	a formula with specification `mi(y, bags) ~ x` which uses the `mi` function to create the bag-instance structure. This argument is an alternative to the `⁠x, y, bags⁠` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted.

Value

An object of class cv_misvm. The object contains the following components:

misvm_fit: A fit object of class misvm trained on the full data with the cross-validated choice of cost parameter. See misvm() for details.
cost_seq: the input sequence of cost arguments
cost_aucs: estimated AUC for the models trained for each cost_seq parameter. These are the average of the fold models for that cost, excluding any folds that don't have both levels of y in the validation set.
best_cost: The optimal choice of cost parameter, chosen as that which has the maximum AUC. If there are ties, this will pick the smallest cost with maximum AUC.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.

Author(s)

Sean Kent, Yifei Liu

Examples

set.seed(8)
mil_data <- generate_mild_df(nbag = 20,
                             positive_prob = 0.15,
                             dist = rep("mvnormal", 3),
                             mean = list(rep(1, 10), rep(2, 10)),
                             sd_of_mean = rep(0.1, 3))
df <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))
cost_seq <- 2^seq(-5, 7, length.out = 3)

# Heuristic method
mdl1 <- cv_misvm(x = df[, 4:123], y = df$bag_label,
                 bags = df$bag_name, cost_seq = cost_seq,
                 n_fold = 3, method = "heuristic")
mdl2 <- cv_misvm(mi(bag_label, bag_name) ~ X1_mean + X2_mean + X3_mean, data = df,
                 cost_seq = cost_seq, n_fold = 3)

if (require(gurobi)) {
  # solve using the MIP method
  mdl3 <- cv_misvm(x = df[, 4:123], y = df$bag_label,
                   bags = df$bag_name, cost_seq = cost_seq,
                   n_fold = 3, method = "mip")
}

predict(mdl1, new_data = df, type = "raw", layer = "bag")

# summarize predictions at the bag layer
suppressWarnings(library(dplyr))
df %>%
  bind_cols(predict(mdl2, df, type = "class")) %>%
  bind_cols(predict(mdl2, df, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

set.seed(8)
mil_data <- generate_mild_df(nbag = 20,
                             positive_prob = 0.15,
                             dist = rep("mvnormal", 3),
                             mean = list(rep(1, 10), rep(2, 10)),
                             sd_of_mean = rep(0.1, 3))
df <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))
cost_seq <- 2^seq(-5, 7, length.out = 3)

# Heuristic method
mdl1 <- cv_misvm(x = df[, 4:123], y = df$bag_label,
                 bags = df$bag_name, cost_seq = cost_seq,
                 n_fold = 3, method = "heuristic")
mdl2 <- cv_misvm(mi(bag_label, bag_name) ~ X1_mean + X2_mean + X3_mean, data = df,
                 cost_seq = cost_seq, n_fold = 3)

if (require(gurobi)) {
  # solve using the MIP method
  mdl3 <- cv_misvm(x = df[, 4:123], y = df$bag_label,
                   bags = df$bag_name, cost_seq = cost_seq,
                   n_fold = 3, method = "mip")
}

predict(mdl1, new_data = df, type = "raw", layer = "bag")

# summarize predictions at the bag layer
suppressWarnings(library(dplyr))
df %>%
  bind_cols(predict(mdl2, df, type = "class")) %>%
  bind_cols(predict(mdl2, df, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

Printing multiple instance data frames

Description

Specialized print methods for the mi_df, mild_df classes. These return helpful information such as the number of rows, columns, bags, and instances (for mild_df objects).

These methods print the data frame based on the underlying subclass. This allows for additional arguments that can be passed to print.tbl() when the subclass is a tibble (tbl_df, tbl), documented below.

Usage

## S3 method for class 'mi_df'
print(x, ...)

## S3 method for class 'mild_df'
print(x, ...)
## S3 method for class 'mi_df'
print(x, ...)

## S3 method for class 'mild_df'
print(x, ...)

Arguments

`x`	Object to format or print.
`...`	Passed to other methods. See `print.tbl()` or details for more information.

Details

The following extra arguments are available when x has subclass tbl:

n: Number of rows to show. If NULL, the default, will print all rows if less than the print_max option. Otherwise, will print as many rows as specified by the print_min option.
width: Width of text output to generate. This defaults to NULL, which means use the width option.
max_extra_cols: Number of extra columns to print abbreviated information for, if the width is too small for the entire tibble. If NULL, the max_extra_cols option is used. The previously defined n_extra argument is soft-deprecated.
max_footer_lines: Maximum number of footer lines. If NULL, the max_footer_lines option is used.

Value

The object passed in x, invisibly. Primarily called to print the object to the console.

Examples

data("ordmvnorm")
print(as_mi_df(ordmvnorm, instance_label = "inst_label"))

print(as_mi_df(ordmvnorm, instance_label = "inst_label"), n = 2)

data("ordmvnorm")
print(as_mi_df(ordmvnorm, instance_label = "inst_label"))

print(as_mi_df(ordmvnorm, instance_label = "inst_label"), n = 2)

Generate mild_df using multivariate t and normal distributions.

Description

This function samples multiple instance distributional data (a mild_df object) where each row corresponds to a sample from a given instance distribution. Instance distributions can be multivariate t and normal, with mean and variance parameters that can be fixed or sampled based on prior parameters. These instances are grouped into bags and the bag labels follow the standard MI assumption.

Usage

generate_mild_df(
  nbag = 50,
  ninst = 4,
  nsample = 50,
  ncov = 10,
  nimp_pos = 1:ncov,
  nimp_neg = 1:ncov,
  positive_prob = 0.2,
  dist = c("mvt", "mvnormal", "mvnormal"),
  mean = list(rep(0, length(nimp_pos)), rep(0, length(nimp_neg)), 0),
  sd_of_mean = c(0.5, 0.5, 0.5),
  cov = list(diag(1, nrow = length(nimp_pos)), diag(1, nrow = length(nimp_neg)), 1),
  sample_cov = FALSE,
  df_wishart_cov = c(length(nimp_pos), length(nimp_neg), ncov - length(nimp_pos)),
  degree = c(3, NA, NA),
  positive_bag_prob = NULL,
  n_noise_inst = NULL,
  ...
)
generate_mild_df(
  nbag = 50,
  ninst = 4,
  nsample = 50,
  ncov = 10,
  nimp_pos = 1:ncov,
  nimp_neg = 1:ncov,
  positive_prob = 0.2,
  dist = c("mvt", "mvnormal", "mvnormal"),
  mean = list(rep(0, length(nimp_pos)), rep(0, length(nimp_neg)), 0),
  sd_of_mean = c(0.5, 0.5, 0.5),
  cov = list(diag(1, nrow = length(nimp_pos)), diag(1, nrow = length(nimp_neg)), 1),
  sample_cov = FALSE,
  df_wishart_cov = c(length(nimp_pos), length(nimp_neg), ncov - length(nimp_pos)),
  degree = c(3, NA, NA),
  positive_bag_prob = NULL,
  n_noise_inst = NULL,
  ...
)

Arguments

`nbag`	The number of bags (default 50).
`ninst`	The number of instances for each bag (default 4).
`nsample`	The number of samples for each instance (default 50).
`ncov`	The number of total covariates (default 10).
`nimp_pos`	An index of important covariates for positve instances (default `1:ncov`).
`nimp_neg`	An index of important covariates for negative instances (default `1:ncov`). (default `1:ncov`).
`positive_prob`	A numeric value between 0 and 1 indicating the probability of an instance being positive (default 0.2).
`dist`	A vector (length 3) of distributions for the positive, negative, and remaining instances, respectively. Distributions can be one of `'mvnormal'` for multivariate normal or `'mvt'` for multivariate student's t.
`mean`	A list (length 3) of mean vectors for the positive, negative, and remaining distributions. `mean[[1]]` should match `nimp_pos` in length; `mean[[2]]` should match `nimp_neg` in length.
`sd_of_mean`	A vector (length 3) of standard deviations in sampling the mean for positive, negative, and remaining distributions, where the prior is given by `mean`. Use `sd_of_mean = c(0, 0, 0)` to keep the mean consistent across all instances.
`cov`	A list (length 3) of covariance matrices for the positive, negative, and remaining distributions. `cov[[3]]` should be an integer since the dimension of remaining features can vary depending on if the important distribution is positive or negative.
`sample_cov`	A logical value for whether to sample the covariance for each distribution. If `FALSE` (the default), each covariance is fixed at `cov`. If `TRUE`, the prior is given by `cov` and sampled from a Wishart distribution with `df_wishart_cov` degrees of freedom to have an expectation of `cov`.
`df_wishart_cov`	A vector (length 3) of degrees-of-freedom to use in the Wishart covariance matrix sampling.
`degree`	A vector (length 3) of degrees-of-freedom used when any of `dist` is `'mvt'`. This parameter is ignored when `dist[i] == 'mvnormal'`, in which case `NA` can be specified.
`positive_bag_prob`	A numeric value between 0 and 1 indicating the probability of a bag being positive. Must be specified jointly with `n_noise_inst`, in which case `positive_prob` is ignored. If `NULL` (the default), instance labels are sampled first according to `positive_prob`.
`n_noise_inst`	An integer indicating the number of negative instances in a positive bag. Must be specified jointly with `positive_bag_prob`. `n_noise_inst` should be less than `ninst`.
`...`	Arguments passed to or from other methods.

Details

The first consideration to use this function is to determine the number of bags, instances per bag, and samples per instance using the nbag, ninst, and nsample arguments. Next, one must consider the number of covariates ncov, and how those covariates will differ between instances with positive and negative labels. Some covariates can be common between the positive and negative instances, which we call the remainder distribution. Use nimp_pos and nimp_neg to specify the index of the important (non-remainder) covariates in the distributions with positive and negative instance labels.

The structure of how many instances/bags are positive and negative is determined by positive_prob or the joint specification of positive_bag_prob and n_noise_inst. In the first case, instances labels have independent Bernoulli draws based on positive_prob and bag labels are determined by the standard MI assumption (i.e. positive if any instance in the bag is positive). In the second case, bag labels are drawn independently as Bernoilli with positive_bag_prob chance of success. Each positive bag will be given n_noise_inst values with instance label of 0, and the remaining with instance label of 1.

The remaining arguments are used to determine the distributions used for the positive, negative, and remaining features. Each argument will be a vector of list of length 3 corresponding to these 3 different groups. To create different distributions, the strategy is to first draw the mean parameter from Normal(mean, sd_of_mean * I) and the covariance parameter from Wishart(df_wishart_cov, cov), with expectation equal to cov. Then we can sample i.i.d. draws from the specified distribution (either multivariate normal or student's t). To ensure that each instance distribution has the same mean, set sd_of_mean to 0. To ensure that each instance distribution has the same covariance, set sample_cov = FALSE.

The final data.frame will have nsample * nbag * ninst rows and ncov + 3 columns including the bag_label, bag_name, instance_name, and ncov sampled covariates.

Value

A mild_df object.

Author(s)

Yifei Liu, Sean Kent

Examples

set.seed(8)
mild_data <- generate_mild_df(nbag = 7, ninst = 3, nsample = 20,
                              ncov = 2,
                              nimp_pos = 1,
                              dist = rep("mvnormal", 3),
                              mean = list(
                                rep(5, 1),
                                rep(15, 2),
                                0
                              ))

library(dplyr)
distinct(mild_data, bag_label, bag_name, instance_name)
split(mild_data[, 4:5], mild_data$instance_name) %>%
  sapply(colMeans) %>%
  round(2) %>%
  t()
set.seed(8)
mild_data <- generate_mild_df(nbag = 7, ninst = 3, nsample = 20,
                              ncov = 2,
                              nimp_pos = 1,
                              dist = rep("mvnormal", 3),
                              mean = list(
                                rep(5, 1),
                                rep(15, 2),
                                0
                              ))

library(dplyr)
distinct(mild_data, bag_label, bag_name, instance_name)
split(mild_data[, 4:5], mild_data$instance_name) %>%
  sapply(colMeans) %>%
  round(2) %>%
  t()

Create an exact kernel feature map

Description

For some kernels, it is possible to create the exact features from given data. This function stores the information needed to build those exact features.

Usage

kfm_exact(kernel = "polynomial", degree = 2, const = 1)
kfm_exact(kernel = "polynomial", degree = 2, const = 1)

Arguments

`kernel`	A character determining the kernel to use. Currently, only `'radial'` is implemented.
`degree`	A numeric value (default 2) that provides the degree for `kernel` = 'polynomial'
`const`	A numeric value (default 1) for the constant term when `kernel = 'polynomial'`.

Details

Currently, the following kernels are supported:

'polynomial', with degree = d and const = c

Value

An object of class kfm_exact with the following components, returned from the inputs:

kernel
degree
const

Author(s)

Sean Kent

Examples

df <- data.frame(
  X1 = c(2,   3,   4,   5,   6, 7, 8),
  X2 = c(1, 1.2, 1.3, 1.4, 1.1, 7, 1),
  X3 = rnorm(7)
)

fit <- kfm_exact(kernel = "polynomial", degree = 2, const = 1)
fm <- build_fm(fit, df)

df <- data.frame(
  X1 = c(2,   3,   4,   5,   6, 7, 8),
  X2 = c(1, 1.2, 1.3, 1.4, 1.1, 7, 1),
  X3 = rnorm(7)
)

fit <- kfm_exact(kernel = "polynomial", degree = 2, const = 1)
fm <- build_fm(fit, df)

Fit a Nyström kernel feature map approximation

Description

Use the Nyström method to fit a feature map that approximates a given kernel.

Usage

kfm_nystrom(df, m, r, kernel, sampling, ...)

## Default S3 method:
kfm_nystrom(
  df,
  m = nrow(df),
  r = m,
  kernel = "radial",
  sampling = "random",
  ...
)

## S3 method for class 'mild_df'
kfm_nystrom(
  df,
  m = nrow(df),
  r = m,
  kernel = "radial",
  sampling = "random",
  ...
)
kfm_nystrom(df, m, r, kernel, sampling, ...)

## Default S3 method:
kfm_nystrom(
  df,
  m = nrow(df),
  r = m,
  kernel = "radial",
  sampling = "random",
  ...
)

## S3 method for class 'mild_df'
kfm_nystrom(
  df,
  m = nrow(df),
  r = m,
  kernel = "radial",
  sampling = "random",
  ...
)

Arguments

`df`	An object containing covariates for training. Usually a data.frame or matrix.
`m`	The number of examples from `df` to sample in fitting.
`r`	The rank of matrix approximation to use. Must be less than or equal to `m`, the default.
`kernel`	A character determining the kernel to use. Currently, only `'radial'` is implemented.
`sampling`	A character determining how to sample instances. Default is `'random'`. For `kfm_nystrom.mild_df()`, one can specify `sampling = 'stratified'` to ensure that samples are chosen evenly from bags and instances. `sampling` can also be a numeric vector of length `m` of pre-determined samples.
`...`	additional parameters needed for the kernels. See details.

Details

For the ... argument, the additional parameters depend on which kernel is used:

For kernel = 'radial', specify sigma to define kernel bandwidth.

Value

an object of class kfm_nystrom with the following components:

df_sub the sub-sampled version of df
dv pre-multiplication matrix which contains information on the eigenvalues and eigenvectors of df_sub
method 'nystrom'
kernel the input parameter kernel
kernel_params parameters passed to ...

Methods (by class)

default: For use on objects of class data.frame or matrix.
mild_df: Ignore the information columns 'bag_label', 'bag_name', and 'instance_name' when calculating kernel approximation.

Author(s)

Sean Kent

References

Williams, C., & Seeger, M. (2001). Using the Nyström Method to Speed Up Kernel Machines. Advances in Neural Information Processing Systems, 13, 682–688.

Kent, S., & Yu, M. (2022). Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment arXiv preprint arXiv:2206.14704

Examples

df <- data.frame(
  X1 = c(2,   3,   4,   5,   6, 7, 8),
  X2 = c(1, 1.2, 1.3, 1.4, 1.1, 7, 1),
  X3 = rnorm(7)
)

fit <- kfm_nystrom(df, m = 7, r = 6, kernel = "radial", sigma = 0.05)
fm <- build_fm(fit, df)

df <- data.frame(
  X1 = c(2,   3,   4,   5,   6, 7, 8),
  X2 = c(1, 1.2, 1.3, 1.4, 1.1, 7, 1),
  X3 = rnorm(7)
)

fit <- kfm_nystrom(df, m = 7, r = 6, kernel = "radial", sigma = 0.05)
fm <- build_fm(fit, df)

Calculate the kernel mean embedding matrix

Description

Function to calculate the kernel mean embedding for to distributional data sets. It uses the empirical approximation for the integral

$\int_{\mathcal X} \int_{\mathcal Y} K(x, y) d P_X d Q_Y$

for a given kernel $K(\cdot, \cdot)$ . Currently only supports radial basis function kernel for fast computation.

Usage

## Default S3 method:
kme(df, df2 = NULL, sigma = 0.05, ...)

## S3 method for class 'mild_df'
kme(df, df2 = NULL, sigma = 0.05, ...)
## Default S3 method:
kme(df, df2 = NULL, sigma = 0.05, ...)

## S3 method for class 'mild_df'
kme(df, df2 = NULL, sigma = 0.05, ...)

Arguments

`df`	A data.frame of `mild_df` object, must have column `'instance_name'` which defines the instances.
`df2`	A data.frame, `mild_df` object, or `NULL` (default `NULL`).
`sigma`	The parameter for `'radial'` kernel (default `0.05`).
`...`	Additional arguments passed to methods.

Details

If df2 = NULL, calculate the kernel mean embedding matrix of (df, df) otherwise calculate (df, df2)

Value

A matrix of kernel mean embedding at the instance level.

Methods (by class)

default: Default S3 method
mild_df: S3 method for class mild_df

Author(s)

Yifei Liu, Sean Kent

Examples

x = data.frame('instance_name' = c('inst_1', 'inst_2', 'inst_1'),
               'X1' = c(-0.4, 0.5, 2))
kme(x)

mild_df1 <- generate_mild_df(nbag = 10, positive_degree = 3)
kme(mild_df1)

x = data.frame('instance_name' = c('inst_1', 'inst_2', 'inst_1'),
               'X1' = c(-0.4, 0.5, 2))
kme(x)

mild_df1 <- generate_mild_df(nbag = 10, positive_degree = 3)
kme(mild_df1)

Create an `mi` object

Description

Create an mi object, usually used as a response variable in a model formula.

Usage

mi(bag_label, bag_name)
mi(bag_label, bag_name)

Arguments

`bag_label`	The bag label or response, recorded as 0 = negative, 1 = positive.
`bag_name`	A unique bag identifier for each instance.

Value

An object of class mi. Currently, no methods are implemented for this.

Author(s)

Sean Kent

Examples

mil_data <- generate_mild_df(positive_degree = 3, nbag = 10)
with(mil_data, head(mi(bag_label, bag_name)))
df <- get_all_vars(mi(bag_label, bag_name) ~ X1 + X2, data = mil_data)
head(df)

mil_data <- generate_mild_df(positive_degree = 3, nbag = 10)
with(mil_data, head(mi(bag_label, bag_name)))
df <- get_all_vars(mi(bag_label, bag_name) ~ X1 + X2, data = mil_data)
head(df)

Build a multiple instance (MI) data frame

Description

mi_df() constructs a data frame that corresponds to Multiple Instance (MI) data. A mi_df object must have two special columns:

bag_label, determines the label of each bag, typically from c(0, 1)
bag_name, character or factor that specifies the bag that each sample belongs to.

Usage

mi_df(
  bag_label = character(),
  bag_name = character(),
  ...,
  instance_label = NULL
)
mi_df(
  bag_label = character(),
  bag_name = character(),
  ...,
  instance_label = NULL
)

Arguments

`bag_label`	A `character`, `factor`, or `numeric` vector.
`bag_name`	A `character` or `factor` vector.
`...`	A set of name-value pairs. These construct the covariates for a `mi_df`.
`instance_label`	A `character`, `factor`, or `numeric` vector, or `NULL`.

Details

We refer to the rows of a mi_df as instances. Each instance is contained in a bag, with a corresponding label. Bags will typically have several instances within them. Instance labels can be provided, but they will be pulled in as an attribute.

Value

A 'mi_df' object. This data.frame-like has columns bag_label, bag_name, and those specified in .... It also inherits from the 'tbl_df' and 'tbl' classes.

Author(s)

Sean Kent

Examples

mi_df('bag_label' = factor(c(1, 1, 0)),
      'bag_name' = c(rep('bag_1', 2), 'bag_2'),
      'X1' = c(-0.4, 0.5, 2),
      'instance_label' = c(0, 1, 0))

mi_df('bag_label' = factor(c(1, 1, 0)),
      'bag_name' = c(rep('bag_1', 2), 'bag_2'),
      'X1' = c(-0.4, 0.5, 2),
      'instance_label' = c(0, 1, 0))

Create a mild object

Description

Create a mild object, usually used as a response variable in a model formula.

Usage

mild(bag_label, bag_name, instance_name)
mild(bag_label, bag_name, instance_name)

Arguments

`bag_label`	The bag label or response, recorded as 0 = negative, 1 = positive.
`bag_name`	A unique bag identifier for each instance.
`instance_name`	A unique instance identifier for each sample.

Value

An object of class mild. Currently, no methods are implemented for this.

Author(s)

Sean Kent

Examples


mil_data <- generate_mild_df(positive_degree = 3, nbag = 10)
with(mil_data, head(mild(bag_label, bag_name, instance_name)))
df <- get_all_vars(mild(bag_label, bag_name) ~ X1 + X2, data = mil_data)
head(df)

mil_data <- generate_mild_df(positive_degree = 3, nbag = 10)
with(mil_data, head(mild(bag_label, bag_name, instance_name)))
df <- get_all_vars(mild(bag_label, bag_name) ~ X1 + X2, data = mil_data)
head(df)

Build a MILD data frame

Description

mild_df() constructs a data frame that corresponds to Multiple Instance Learning with Distributional Instances (MILD) data. A mild_df object must have three special columns:

bag_label, determines the label of each bag, typically from c(0, 1)
bag_name, character or factor that specifies the bag that each sample belongs to.
instance_name, character or factor that specifies the instance that each sample belongs to.

Usage

mild_df(
  bag_label = character(),
  bag_name = character(),
  instance_name = character(),
  ...,
  instance_label = NULL
)
mild_df(
  bag_label = character(),
  bag_name = character(),
  instance_name = character(),
  ...,
  instance_label = NULL
)

Arguments

`bag_label`	A `character`, `factor`, or `numeric` vector.
`bag_name`	A `character` or `factor` vector.
`instance_name`	A `character` or `factor` vector.
`...`	A set of name-value pairs. These construct the covariates for a `mild_df`.
`instance_label`	A `character`, `factor`, or `numeric` vector, or `NULL`.

Details

We refer to the rows of a mild_df as samples, since they are thought of as draws from the distribution that determines each instance. Each instance is contained in a bag, with a corresponding label. Instance labels can be provided, but they will be pulled in as an attribute.

Value

A 'mild_df' object. This data.frame-like has columns bag_label, bag_name, instance_name, and those specified in .... It also inherits from the 'tbl_df' and 'tbl' classes.

Author(s)

Yifei Liu, Sean Kent

Examples

mild_df('bag_label' = factor(c(1, 1, 0)),
        'bag_name' = c(rep('bag_1', 2), 'bag_2'),
        'instance_name' = c('bag_1_inst_1', 'bag_1_inst_2', 'bag_2_inst_1'),
        'X1' = c(-0.4, 0.5, 2),
        'instance_label' = c(0, 1, 0))

mild_df('bag_label' = factor(c(1, 1, 0)),
        'bag_name' = c(rep('bag_1', 2), 'bag_2'),
        'instance_name' = c('bag_1_inst_1', 'bag_1_inst_2', 'bag_2_inst_1'),
        'X1' = c(-0.4, 0.5, 2),
        'instance_label' = c(0, 1, 0))

Fit MIOR model to the data

Description

This function fits the MIOR model, proposed by Xiao Y, Liu B, and Hao Z (2018) in "Multiple-instance Ordinal Regression". MIOR is a modified SVM framework with parallel, ordered hyperplanes where the error terms are based only on the instance closest to a midpoint between hyperplanes.

Usage

## Default S3 method:
mior(
  x,
  y,
  bags,
  cost = 1,
  cost_eta = 1,
  method = "qp-heuristic",
  weights = NULL,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    max_step = 500, scale = TRUE, verbose = FALSE, time_limit = 60, option =
    c("corrected", "xiao")),
  ...
)

## S3 method for class 'formula'
mior(formula, data, ...)

## S3 method for class 'mi_df'
mior(x, ...)
## Default S3 method:
mior(
  x,
  y,
  bags,
  cost = 1,
  cost_eta = 1,
  method = "qp-heuristic",
  weights = NULL,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    max_step = 500, scale = TRUE, verbose = FALSE, time_limit = 60, option =
    c("corrected", "xiao")),
  ...
)

## S3 method for class 'formula'
mior(formula, data, ...)

## S3 method for class 'mi_df'
mior(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a `mi_df` object is passed, `⁠y, bags⁠` are automatically extracted, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.
`cost`	The cost parameter in SVM. If `method = 'heuristic'`, this will be fed to `kernlab::ksvm()`, otherwise it is similarly in internal functions.
`cost_eta`	The additional cost parameter in MIOR which controls how far away the first and last separating hyperplanes are relative to other costs.
`method`	The algorithm to use in fitting (default `'heuristic'`). When `method = 'heuristic'`, which employs an algorithm similar to Andrews et al. (2003). When `method = 'mip'`, the novel MIP method will be used. When `⁠method = 'qp-heuristic⁠`, the heuristic algorithm is computed using the dual SVM. See details.
`weights`	named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of `y`.
`control`	list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `max_step` argument used when `method = 'heuristic'`. Maximum steps of iteration for the heuristic algorithm. `scale` argument used for all methods. A logical for whether to rescale the input before fitting. `verbose` argument used when `method = 'mip'`. Whether to message output to the console. `time_limit` argument used when `method = 'mip'`. `FALSE`, or a time limit (in seconds) passed to `gurobi()` parameters. If `FALSE`, no time limit is given. `option` argument the controls the constraint calculation. See details.
`...`	Arguments passed to or from other methods.
`formula`	a formula with specification `mi(y, bags) ~ x` which uses the `mi` function to create the bag-instance structure. This argument is an alternative to the `⁠x, y, bags⁠` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted

Details

Predictions (see predict.mior()) are determined by considering the smallest distance from each point to the midpoint hyperplanes across all instances in the bag. The prediction corresponds to the hyperplane having such a minimal distance.

It appears as though an error in Equation (12) persists to the dual form in (21). A corrected version of this dual formulation can be used with control$option = 'corrected', or the formulation as written can be used with control$option = 'xiao'.

Value

An object of class mior The object contains at least the following components:

gurobi_fit: A fit from model optimization that includes relevant components.
call_type: A character indicating which method misvm() was called with.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.
cost: The cost parameter from function inputs.
weights: The calculated weights on the cost parameter.
repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.
n_step: If method %in% c('heuristic', 'qp-heuristic'), the total steps used in the heuristic algorithm.
x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.

Author(s)

Sean Kent

References

Xiao, Y., Liu, B., & Hao, Z. (2017). Multiple-instance ordinal regression. IEEE Transactions on Neural Networks and Learning Systems, 29(9), 4398-4413. doi:10.1109/TNNLS.2017.2766164

Examples

if (require(gurobi)) {
  set.seed(8)
  # make some data
  n <- 15
  X <- rbind(
    mvtnorm::rmvnorm(n/3, mean = c(4, -2, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(0, 0, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(-2, 1, 0))
  )
  score <- X %*% c(2, -1, 0)
  y <- as.numeric(cut(score, c(-Inf, quantile(score, probs = 1:2 / 3), Inf)))
  bags <- 1:length(y)

  # add in points outside boundaries
  X <- rbind(
    X,
    mvtnorm::rmvnorm(n, mean = c(6, -3, 0)),
    mvtnorm::rmvnorm(n, mean = c(-6, 3, 0))
  )
  y <- c(y, rep(-1, 2*n))
  bags <- rep(bags, 3)
  repr <- c(rep(1, n), rep(0, 2*n))

  y_bag <- classify_bags(y, bags, condense = FALSE)

  mdl1 <- mior(X, y_bag, bags)
  predict(mdl1, X, new_bags = bags)
}

if (require(gurobi)) {
  set.seed(8)
  # make some data
  n <- 15
  X <- rbind(
    mvtnorm::rmvnorm(n/3, mean = c(4, -2, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(0, 0, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(-2, 1, 0))
  )
  score <- X %*% c(2, -1, 0)
  y <- as.numeric(cut(score, c(-Inf, quantile(score, probs = 1:2 / 3), Inf)))
  bags <- 1:length(y)

  # add in points outside boundaries
  X <- rbind(
    X,
    mvtnorm::rmvnorm(n, mean = c(6, -3, 0)),
    mvtnorm::rmvnorm(n, mean = c(-6, 3, 0))
  )
  y <- c(y, rep(-1, 2*n))
  bags <- rep(bags, 3)
  repr <- c(rep(1, n), rep(0, 2*n))

  y_bag <- classify_bags(y, bags, condense = FALSE)

  mdl1 <- mior(X, y_bag, bags)
  predict(mdl1, X, new_bags = bags)
}

Fit MILD-SVM model to the data

Description

This function fits the MILD-SVM model, which takes a multiple-instance learning with distributions (MILD) data set and fits a modified SVM to it. The MILD-SVM methodology is based on research in progress.

Usage

## Default S3 method:
mismm(
  x,
  y,
  bags,
  instances,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "radial", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    scale = TRUE, verbose = FALSE, time_limit = 60, start = FALSE),
  ...
)

## S3 method for class 'formula'
mismm(formula, data, ...)

## S3 method for class 'mild_df'
mismm(x, ...)
## Default S3 method:
mismm(
  x,
  y,
  bags,
  instances,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "radial", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    scale = TRUE, verbose = FALSE, time_limit = 60, start = FALSE),
  ...
)

## S3 method for class 'formula'
mismm(formula, data, ...)

## S3 method for class 'mild_df'
mismm(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents a sample. If a `mild_df` object is passed, `⁠y, bags, instances⁠` are automatically extracted, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.
`instances`	A vector specifying which samples belong to each instance. Can be a string, numeric, of factor.
`cost`	The cost parameter in SVM. If `method = 'heuristic'`, this will be fed to `kernlab::ksvm()`, otherwise it is similarly in internal functions.
`method`	The algorithm to use in fitting (default `'heuristic'`). When `method = 'heuristic'`, the algorithm iterates between selecting positive witnesses and solving an underlying `smm()` problem. When `method = 'mip'`, the novel MIP method will be used. When `method = 'qp-heuristic'`, the heuristic algorithm is computed using a slightly modified dual SMM. See details
`weights`	named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of `y`.
`control`	list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `nystrom_args` a list of parameters to pass to `kfm_nystrom()`. This is used when `method = 'mip'` and `kernel = 'radial'` to generate a Nystrom approximation of the kernel features. `max_step` argument used when `method = 'heuristic'`. Maximum steps of iteration for the heuristic algorithm. `scale` argument used for all methods. A logical for whether to rescale the input before fitting. `verbose` argument used when `method = 'mip'`. Whether to message output to the console. `time_limit` argument used when `method = 'mip'`. `FALSE`, or a time limit (in seconds) passed to `gurobi()` parameters. If `FALSE`, no time limit is given. `start` argument used when `method = 'mip'`. If `TRUE`, the mip program will be warm_started with the solution from `method = 'qp-heuristic'` to potentially improve speed.
`...`	Arguments passed to or from other methods.
`formula`	A formula with specification `mild(y, bags, instances) ~ x` which uses the `mild` function to create the bag-instance structure. This argument is an alternative to the `⁠x, y, bags, instances ⁠` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted.

Details

Several choices of fitting algorithm are available, including a version of the heuristic algorithm proposed by Andrews et al. (2003) and a novel algorithm that explicitly solves the mixed-integer programming (MIP) problem using the gurobi package optimization back-end.

Value

An object of class mismm The object contains at least the following components:

⁠*_fit⁠: A fit object depending on the method parameter. If method = 'heuristic', this will be a ksvm fit from the kernlab package. If method = 'mip' this will be gurobi_fit from a model optimization.
call_type: A character indicating which method misvm() was called with.
x: The training data needed for computing the kernel matrix in prediction.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.
cost: The cost parameter from function inputs.
weights: The calculated weights on the cost parameter.
sigma: The radial basis function kernel parameter.
repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.
n_step: If method %in% c('heuristic', 'qp-heuristic'), the total steps used in the heuristic algorithm.
useful_inst_idx: The instances that were selected to represent the bags in the heuristic fitting.
inst_order: A character vector that is used to modify the ordering of input data.
x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mild_df: Method for mild_df objects

Author(s)

Sean Kent, Yifei Liu

References

Kent, S., & Yu, M. (2022). Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment arXiv preprint arXiv:2206.14704

Examples

set.seed(8)
mil_data <- generate_mild_df(nbag = 15, nsample = 20, positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))

# Heuristic method
mdl1 <- mismm(mil_data)
mdl2 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1 + X2 + X3, data = mil_data)

# MIP method
if (require(gurobi)) {
  mdl3 <- mismm(mil_data, method = "mip", control = list(nystrom_args = list(m = 10, r = 10)))
  predict(mdl3, mil_data)
}

predict(mdl1, new_data = mil_data, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
mil_data %>%
  bind_cols(predict(mdl2, mil_data, type = "class")) %>%
  bind_cols(predict(mdl2, mil_data, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)


set.seed(8)
mil_data <- generate_mild_df(nbag = 15, nsample = 20, positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))

# Heuristic method
mdl1 <- mismm(mil_data)
mdl2 <- mismm(mild(bag_label, bag_name, instance_name) ~ X1 + X2 + X3, data = mil_data)

# MIP method
if (require(gurobi)) {
  mdl3 <- mismm(mil_data, method = "mip", control = list(nystrom_args = list(m = 10, r = 10)))
  predict(mdl3, mil_data)
}

predict(mdl1, new_data = mil_data, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
mil_data %>%
  bind_cols(predict(mdl2, mil_data, type = "class")) %>%
  bind_cols(predict(mdl2, mil_data, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

Fit MI-SVM model to the data

Description

This function fits the MI-SVM model, first proposed by Andrews et al. (2003). It is a variation on the traditional SVM framework that carefully treats data from the multiple instance learning paradigm, where instances are grouped into bags, and a label is only available for each bag.

Usage

## Default S3 method:
misvm(
  x,
  y,
  bags,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    type = "C-classification", scale = TRUE, verbose = FALSE, time_limit = 60, start =
    FALSE),
  ...
)

## S3 method for class 'formula'
misvm(formula, data, ...)

## S3 method for class 'mi_df'
misvm(x, ...)

## S3 method for class 'mild_df'
misvm(x, .fns = list(mean = mean, sd = stats::sd), cor = FALSE, ...)
## Default S3 method:
misvm(
  x,
  y,
  bags,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    type = "C-classification", scale = TRUE, verbose = FALSE, time_limit = 60, start =
    FALSE),
  ...
)

## S3 method for class 'formula'
misvm(formula, data, ...)

## S3 method for class 'mi_df'
misvm(x, ...)

## S3 method for class 'mild_df'
misvm(x, .fns = list(mean = mean, sd = stats::sd), cor = FALSE, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a `mi_df` object is passed, `⁠y, bags⁠` are automatically extracted, and all other columns will be used as predictors. If a `mild_df` object is passed, `⁠y, bags, instances⁠` are automatically extracted, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.
`cost`	The cost parameter in SVM. If `method = 'heuristic'`, this will be fed to `kernlab::ksvm()`, otherwise it is similarly in internal functions.
`method`	The algorithm to use in fitting (default `'heuristic'`). When `method = 'heuristic'`, which employs an algorithm similar to Andrews et al. (2003). When `method = 'mip'`, the novel MIP method will be used. When `⁠method = 'qp-heuristic⁠`, the heuristic algorithm is computed using the dual SVM. See details.
`weights`	named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of `y`.
`control`	list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `nystrom_args` a list of parameters to pass to `kfm_nystrom()`. This is used when `method = 'mip'` and `kernel = 'radial'` to generate a Nystrom approximation of the kernel features. `max_step` argument used when `method = 'heuristic'`. Maximum steps of iteration for the heuristic algorithm. `type`: argument used when `method = 'heuristic'`. The `type` argument is passed to `e1071::svm()`. `scale` argument used for all methods. A logical for whether to rescale the input before fitting. `verbose` argument used when `method = 'mip'`. Whether to message output to the console. `time_limit` argument used when `method = 'mip'`. `FALSE`, or a time limit (in seconds) passed to `gurobi()` parameters. If `FALSE`, no time limit is given. `start` argument used when `method = 'mip'`. If `TRUE`, the mip program will be warm_started with the solution from `method = 'qp-heuristic'` to potentially improve speed.
`...`	Arguments passed to or from other methods.
`formula`	a formula with specification `mi(y, bags) ~ x` which uses the `mi` function to create the bag-instance structure. This argument is an alternative to the `⁠x, y, bags⁠` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted.
`.fns`	(argument for `misvm.mild_df()` method) list of functions to summarize instances over.
`cor`	(argument for `misvm.mild_df()` method) logical, whether to include correlations between all features in the summarization.

Details

Value

An object of class misvm. The object contains at least the following components:

⁠*_fit⁠: A fit object depending on the method parameter. If method = 'heuristic', this will be an svm fit from the e1071 package. If ⁠method = 'mip', 'qp-heuristic'⁠ this will be gurobi_fit from a model optimization.
call_type: A character indicating which method misvm() was called with.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.
cost: The cost parameter from function inputs.
weights: The calculated weights on the cost parameter.
repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.
n_step: If method %in% c('heuristic', 'qp-heuristic'), the total steps used in the heuristic algorithm.
x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.
mild_df: Method for mild_df objects. Summarize samples to the instance level based on specified functions, then perform misvm() on instance level data.

Author(s)

Sean Kent, Yifei Liu

References

Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. Advances in neural information processing systems, 15.

Kent, S., & Yu, M. (2022). Non-convex SVM for cancer diagnosis based on morphologic features of tumor microenvironment arXiv preprint arXiv:2206.14704

Examples

set.seed(8)
mil_data <- generate_mild_df(nbag = 20,
                             positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))
df <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))

# Heuristic method
mdl1 <- misvm(x = df[, 4:123], y = df$bag_label,
              bags = df$bag_name, method = "heuristic")
mdl2 <- misvm(mi(bag_label, bag_name) ~ X1_mean + X2_mean + X3_mean, data = df)

# MIP method
if (require(gurobi)) {
  mdl3 <- misvm(x = df[, 4:123], y = df$bag_label,
                bags = df$bag_name, method = "mip")
}

predict(mdl1, new_data = df, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
df %>%
  bind_cols(predict(mdl2, df, type = "class")) %>%
  bind_cols(predict(mdl2, df, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)


set.seed(8)
mil_data <- generate_mild_df(nbag = 20,
                             positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))
df <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))

# Heuristic method
mdl1 <- misvm(x = df[, 4:123], y = df$bag_label,
              bags = df$bag_name, method = "heuristic")
mdl2 <- misvm(mi(bag_label, bag_name) ~ X1_mean + X2_mean + X3_mean, data = df)

# MIP method
if (require(gurobi)) {
  mdl3 <- misvm(x = df[, 4:123], y = df$bag_label,
                bags = df$bag_name, method = "mip")
}

predict(mdl1, new_data = df, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
df %>%
  bind_cols(predict(mdl2, df, type = "class")) %>%
  bind_cols(predict(mdl2, df, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

Fit MI-SVM model to ordinal outcome data using One-vs-All

Description

This function uses the one-vs-all multiclass classification strategy to fit a series of MI-SVM models for predictions on ordinal outcome data. For an ordinal outcome with K levels, we fit K MI-SVM models to predict an individual level vs not.

Usage

## Default S3 method:
misvm_orova(
  x,
  y,
  bags,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    type = "C-classification", scale = TRUE, verbose = FALSE, time_limit = 60, start =
    FALSE),
  ...
)

## S3 method for class 'formula'
misvm_orova(formula, data, ...)

## S3 method for class 'mi_df'
misvm_orova(x, ...)
## Default S3 method:
misvm_orova(
  x,
  y,
  bags,
  cost = 1,
  method = c("heuristic", "mip", "qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    nystrom_args = list(m = nrow(x), r = nrow(x), sampling = "random"), max_step = 500,
    type = "C-classification", scale = TRUE, verbose = FALSE, time_limit = 60, start =
    FALSE),
  ...
)

## S3 method for class 'formula'
misvm_orova(formula, data, ...)

## S3 method for class 'mi_df'
misvm_orova(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a `mi_df` object is passed, `⁠y, bags⁠` are automatically extracted, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.
`cost`	The cost parameter in SVM. If `method = 'heuristic'`, this will be fed to `kernlab::ksvm()`, otherwise it is similarly in internal functions.
`method`	The algorithm to use in fitting (default `'heuristic'`). When `method = 'heuristic'`, which employs an algorithm similar to Andrews et al. (2003). When `method = 'mip'`, the novel MIP method will be used. When `⁠method = 'qp-heuristic⁠`, the heuristic algorithm is computed using the dual SVM. See details.
`weights`	named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of `y`.
`control`	list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `nystrom_args` a list of parameters to pass to `kfm_nystrom()`. This is used when `method = 'mip'` and `kernel = 'radial'` to generate a Nystrom approximation of the kernel features. `max_step` argument used when `method = 'heuristic'`. Maximum steps of iteration for the heuristic algorithm. `type`: argument used when `method = 'heuristic'`. The `type` argument is passed to `e1071::svm()`. `scale` argument used for all methods. A logical for whether to rescale the input before fitting. `verbose` argument used when `method = 'mip'`. Whether to message output to the console. `time_limit` argument used when `method = 'mip'`. `FALSE`, or a time limit (in seconds) passed to `gurobi()` parameters. If `FALSE`, no time limit is given. `start` argument used when `method = 'mip'`. If `TRUE`, the mip program will be warm_started with the solution from `method = 'qp-heuristic'` to potentially improve speed.
`...`	Arguments passed to or from other methods.
`formula`	a formula with specification `mi(y, bags) ~ x` which uses the `mi` function to create the bag-instance structure. This argument is an alternative to the `⁠x, y, bags⁠` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted

Value

An object of class misvm_orova The object contains at least the following components:

fits: a list of misvm objects with length equal to the number of classes in y. See misvm() for details on the misvm object.
call_type: A character indicating which method misvm_orova() was called with.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.

Author(s)

Sean Kent

References

Andrews, S., Tsochantaridis, I., & Hofmann, T. (2002). Support vector machines for multiple-instance learning. Advances in neural information processing systems, 15.

Examples

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- ordmvnorm$bag_label
bags <- ordmvnorm$bag_name

mdl1 <- misvm_orova(x, y, bags)
predict(mdl1, x, new_bags = bags)

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- ordmvnorm$bag_label
bags <- ordmvnorm$bag_name

mdl1 <- misvm_orova(x, y, bags)
predict(mdl1, x, new_bags = bags)

Fit MI-SVM-OR model to ordinal outcome data

Description

This function fits a modification of MI-SVM to ordinal outcome data based on the research method proposed by Kent and Yu.

Usage

## Default S3 method:
omisvm(
  x,
  y,
  bags,
  cost = 1,
  h = 1,
  s = Inf,
  method = c("qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    max_step = 500, type = "C-classification", scale = TRUE, verbose = FALSE, time_limit
    = 60),
  ...
)

## S3 method for class 'formula'
omisvm(formula, data, ...)

## S3 method for class 'mi_df'
omisvm(x, ...)
## Default S3 method:
omisvm(
  x,
  y,
  bags,
  cost = 1,
  h = 1,
  s = Inf,
  method = c("qp-heuristic"),
  weights = TRUE,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    max_step = 500, type = "C-classification", scale = TRUE, verbose = FALSE, time_limit
    = 60),
  ...
)

## S3 method for class 'formula'
omisvm(formula, data, ...)

## S3 method for class 'mi_df'
omisvm(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a `mi_df` object is passed, `⁠y, bags⁠` are automatically extracted, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`bags`	A vector specifying which instance belongs to each bag. Can be a string, numeric, of factor.
`cost`	The cost parameter in SVM. If `method = 'heuristic'`, this will be fed to `kernlab::ksvm()`, otherwise it is similarly in internal functions.
`h`	A scalar that controls the trade-off between maximizing the margin and minimizing distance between hyperplanes.
`s`	An integer for how many replication points to add to the dataset. If `k` represents the number of labels in y, must have `⁠1 <= s <= k-1⁠`. The default, `Inf`, uses the maximum number of replication points, `k-1`.
`method`	The algorithm to use in fitting (default `'heuristic'`). When `method = 'heuristic'`, which employs an algorithm similar to Andrews et al. (2003). When `method = 'mip'`, the novel MIP method will be used. When `⁠method = 'qp-heuristic⁠`, the heuristic algorithm is computed using the dual SVM. See details.
`weights`	named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of `y`.
`control`	list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `nystrom_args` a list of parameters to pass to `kfm_nystrom()`. This is used when `method = 'mip'` and `kernel = 'radial'` to generate a Nystrom approximation of the kernel features. `max_step` argument used when `method = 'heuristic'`. Maximum steps of iteration for the heuristic algorithm. `type`: argument used when `method = 'heuristic'`. The `type` argument is passed to `e1071::svm()`. `scale` argument used for all methods. A logical for whether to rescale the input before fitting. `verbose` argument used when `method = 'mip'`. Whether to message output to the console. `time_limit` argument used when `method = 'mip'`. `FALSE`, or a time limit (in seconds) passed to `gurobi()` parameters. If `FALSE`, no time limit is given. `start` argument used when `method = 'mip'`. If `TRUE`, the mip program will be warm_started with the solution from `method = 'qp-heuristic'` to potentially improve speed.
`...`	Arguments passed to or from other methods.
`formula`	a formula with specification `mi(y, bags) ~ x` which uses the `mi` function to create the bag-instance structure. This argument is an alternative to the `⁠x, y, bags⁠` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted

Details

Currently, the only method available is a heuristic algorithm in linear SVM space. Additional methods should be available shortly.

Value

An object of class omisvm. The object contains at least the following components:

⁠*_fit⁠: A fit object depending on the method parameter. If method = 'qp-heuristic' this will be gurobi_fit from a model optimization.
call_type: A character indicating which method omisvm() was called with.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.
cost: The cost parameter from function inputs.
weights: The calculated weights on the cost parameter.
repr_inst: The instances from positive bags that are selected to be most representative of the positive instances.
n_step: If method == 'qp-heuristic', the total steps used in the heuristic algorithm.
x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates.

Author(s)

Sean Kent

Examples

if (require(gurobi)) {
  data("ordmvnorm")
  x <- ordmvnorm[, 3:7]
  y <- ordmvnorm$bag_label
  bags <- ordmvnorm$bag_name

  mdl1 <- omisvm(x, y, bags, weights = NULL)
  predict(mdl1, x, new_bags = bags)
}

if (require(gurobi)) {
  data("ordmvnorm")
  x <- ordmvnorm[, 3:7]
  y <- ordmvnorm$bag_label
  bags <- ordmvnorm$bag_name

  mdl1 <- omisvm(x, y, bags, weights = NULL)
  predict(mdl1, x, new_bags = bags)
}

Sample ordinal MIL data using mvnorm

Description

A data set that demonstrates the ordinal multiple-instance learning structure with feature columns randomly sampled from a multivariate normal distribution.

Usage

ordmvnorm
ordmvnorm

Format

An MI data frame with 1000 rows 8 variables, and 5 bags. Instance labels can be accessed via attr(ordmvnorm, "instance_label").

bag_label: outcome label at the bag level. This is the maximum of the inst_label for each bag
bag_name: indicator of each bag
V1: Variable with mean equal to 2 * inst_label
V2: Variable with mean equal to -1 * inst_label
V3: Variable with mean equal to 1 * inst_label
V4: Variable with mean 0, essentially noise
V5: Variable with mean 0, essentially noise

Predict method for `cv_misvm` object

Description

Predict method for cv_misvm object

Usage

## S3 method for class 'cv_misvm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)
## S3 method for class 'cv_misvm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)

Arguments

`object`	An object of class `cv_misvm`.
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values with threshold of 0 as -1 or +1. If `'raw'`, return the raw predicted scores.
`layer`	If `'bag'`, return predictions at the bag level. If `'instance'`, return predictions at the instance level.
`new_bags`	A character or character vector. Can specify a singular character that provides the column name for the bag names in `new_data` (default `'bag_name'`). Can also specify a vector of length `nrow(new_data)` that has bag name for each row.
`...`	Arguments passed to or from other methods.

Value

A tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column '.pred_class'. If type = 'raw', the tibble will have a column '.pred'.

Author(s)

Sean Kent

Examples

mil_data <- generate_mild_df(
  nbag = 10,
  nsample = 20,
  positive_degree = 3
)
df1 <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))
mdl1 <- cv_misvm(x = df1[, 4:123], y = df1$bag_label,
                 bags = df1$bag_name, cost_seq = 2^(-2:2),
                 n_fold = 3, method = "heuristic")

predict(mdl1, new_data = df1, type = "raw", layer = "bag")

# summarize predictions at the bag layer
suppressWarnings(library(dplyr))
df1 %>%
  bind_cols(predict(mdl1, df1, type = "class")) %>%
  bind_cols(predict(mdl1, df1, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

mil_data <- generate_mild_df(
  nbag = 10,
  nsample = 20,
  positive_degree = 3
)
df1 <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))
mdl1 <- cv_misvm(x = df1[, 4:123], y = df1$bag_label,
                 bags = df1$bag_name, cost_seq = 2^(-2:2),
                 n_fold = 3, method = "heuristic")

predict(mdl1, new_data = df1, type = "raw", layer = "bag")

# summarize predictions at the bag layer
suppressWarnings(library(dplyr))
df1 %>%
  bind_cols(predict(mdl1, df1, type = "class")) %>%
  bind_cols(predict(mdl1, df1, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

Predict method for `mior` object

Description

Predict method for mior object

Usage

## S3 method for class 'mior'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)
## S3 method for class 'mior'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)

Arguments

`object`	An object of class `mior`
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values with threshold of 0 as -1 or +1. If `'raw'`, return the raw predicted scores.
`layer`	If `'bag'`, return predictions at the bag level. If `'instance'`, return predictions at the instance level.
`new_bags`	A character or character vector. Can specify a singular character that provides the column name for the bag names in `new_data` (default `'bag_name'`). Can also specify a vector of length `nrow(new_data)` that has bag name for each row.
`...`	Arguments passed to or from other methods.

Details

When the object was fitted using the formula method, then the parameters new_bags and new_instances are not necessary, as long as the names match the original function call.

Value

A tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column .pred_class. If type = 'raw', the tibble will have a column .pred.

Author(s)

Sean Kent

Examples

if (require(gurobi)) {
  set.seed(8)
  # make some data
  n <- 15
  X <- rbind(
    mvtnorm::rmvnorm(n/3, mean = c(4, -2, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(0, 0, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(-2, 1, 0))
  )
  score <- X %*% c(2, -1, 0)
  y <- as.numeric(cut(score, c(-Inf, quantile(score, probs = 1:2 / 3), Inf)))
  bags <- 1:length(y)

  # add in points outside boundaries
  X <- rbind(
    X,
    mvtnorm::rmvnorm(n, mean = c(6, -3, 0)),
    mvtnorm::rmvnorm(n, mean = c(-6, 3, 0))
  )
  y <- c(y, rep(-1, 2*n))
  bags <- rep(bags, 3)
  repr <- c(rep(1, n), rep(0, 2*n))

  y_bag <- classify_bags(y, bags, condense = FALSE)

  mdl1 <- mior(X, y_bag, bags)
  # summarize predictions at the bag layer
  library(dplyr)
  df1 <- bind_cols(y = y_bag, bags = bags, as.data.frame(X))
  df1 %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "class")) %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "raw")) %>%
    distinct(y, bags, .pred_class, .pred)
}

if (require(gurobi)) {
  set.seed(8)
  # make some data
  n <- 15
  X <- rbind(
    mvtnorm::rmvnorm(n/3, mean = c(4, -2, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(0, 0, 0)),
    mvtnorm::rmvnorm(n/3, mean = c(-2, 1, 0))
  )
  score <- X %*% c(2, -1, 0)
  y <- as.numeric(cut(score, c(-Inf, quantile(score, probs = 1:2 / 3), Inf)))
  bags <- 1:length(y)

  # add in points outside boundaries
  X <- rbind(
    X,
    mvtnorm::rmvnorm(n, mean = c(6, -3, 0)),
    mvtnorm::rmvnorm(n, mean = c(-6, 3, 0))
  )
  y <- c(y, rep(-1, 2*n))
  bags <- rep(bags, 3)
  repr <- c(rep(1, n), rep(0, 2*n))

  y_bag <- classify_bags(y, bags, condense = FALSE)

  mdl1 <- mior(X, y_bag, bags)
  # summarize predictions at the bag layer
  library(dplyr)
  df1 <- bind_cols(y = y_bag, bags = bags, as.data.frame(X))
  df1 %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "class")) %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "raw")) %>%
    distinct(y, bags, .pred_class, .pred)
}

Predict method for `mismm` object

Description

Predict method for mismm object

Usage

## S3 method for class 'mismm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  new_instances = "instance_name",
  kernel = NULL,
  ...
)
## S3 method for class 'mismm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  new_instances = "instance_name",
  kernel = NULL,
  ...
)

Arguments

`object`	An object of class `mismm`.
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values with threshold of 0 as -1 or +1. If `'raw'`, return the raw predicted scores.
`layer`	If `'bag'`, return predictions at the bag level. If `'instance'`, return predictions at the instance level.
`new_bags`	A character or character vector. Can specify a singular character that provides the column name for the bag names in `new_data` (default `'bag_name'`). Can also specify a vector of length `nrow(new_data)` that has bag name for each row.
`new_instances`	A character or character vector. Can specify a singular character that provides the column name for the instance names in `new_data` (default `'instance_name'`). Can also specify a vector of length `nrow(new_data)` that has instance name for each row.
`kernel`	An optional pre-computed kernel matrix at the instance level or `NULL` (default `NULL`). The rows should correspond to instances in the new data to predict, and columns should correspond to instances in the original training data, such as a call to `kme()`.
`...`	Arguments passed to or from other methods.

Details

When the object was fitted using the formula method, then the parameters new_bags and new_instances are not necessary, as long as the names match the original function call.

Value

A tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column .pred_class. If type = 'raw', the tibble will have a column .pred.

Author(s)

Sean Kent

Examples

mil_data <- generate_mild_df(nbag = 15, nsample = 20, positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))

mdl1 <- mismm(mil_data, control = list(sigma = 1/5))

# bag level predictions
library(dplyr)
mil_data %>%
    bind_cols(predict(mdl1, mil_data, type = "class")) %>%
    bind_cols(predict(mdl1, mil_data, type = "raw")) %>%
    distinct(bag_name, bag_label, .pred_class, .pred)

# instance level prediction
mil_data %>%
    bind_cols(predict(mdl1, mil_data, type = "class", layer = "instance")) %>%
    bind_cols(predict(mdl1, mil_data, type = "raw", layer = "instance")) %>%
    distinct(bag_name, instance_name, bag_label, .pred_class, .pred)

mil_data <- generate_mild_df(nbag = 15, nsample = 20, positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))

mdl1 <- mismm(mil_data, control = list(sigma = 1/5))

# bag level predictions
library(dplyr)
mil_data %>%
    bind_cols(predict(mdl1, mil_data, type = "class")) %>%
    bind_cols(predict(mdl1, mil_data, type = "raw")) %>%
    distinct(bag_name, bag_label, .pred_class, .pred)

# instance level prediction
mil_data %>%
    bind_cols(predict(mdl1, mil_data, type = "class", layer = "instance")) %>%
    bind_cols(predict(mdl1, mil_data, type = "raw", layer = "instance")) %>%
    distinct(bag_name, instance_name, bag_label, .pred_class, .pred)

Predict method for `misvm` object

Description

Predict method for misvm object

Usage

## S3 method for class 'misvm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)
## S3 method for class 'misvm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)

Arguments

`object`	An object of class `misvm`.
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values with threshold of 0 as -1 or +1. If `'raw'`, return the raw predicted scores.
`layer`	If `'bag'`, return predictions at the bag level. If `'instance'`, return predictions at the instance level.
`new_bags`	A character or character vector. Can specify a singular character that provides the column name for the bag names in `new_data` (default `'bag_name'`). Can also specify a vector of length `nrow(new_data)` that has bag name for each row.
`...`	Arguments passed to or from other methods.

Details

When the object was fitted using the formula method, then the parameters new_bags and new_instances are not necessary, as long as the names match the original function call.

Value

A tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column .pred_class. If type = 'raw', the tibble will have a column .pred.

Author(s)

Sean Kent

Examples

mil_data <- generate_mild_df(nbag = 20,
                             positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))
df1 <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))
mdl1 <- misvm(x = df1[, 4:63], y = df1$bag_label,
              bags = df1$bag_name, method = "heuristic")

predict(mdl1, new_data = df1, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
df1 %>%
  bind_cols(predict(mdl1, df1, type = "class")) %>%
  bind_cols(predict(mdl1, df1, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

mil_data <- generate_mild_df(nbag = 20,
                             positive_prob = 0.15,
                             sd_of_mean = rep(0.1, 3))
df1 <- build_instance_feature(mil_data, seq(0.05, 0.95, length.out = 10))
mdl1 <- misvm(x = df1[, 4:63], y = df1$bag_label,
              bags = df1$bag_name, method = "heuristic")

predict(mdl1, new_data = df1, type = "raw", layer = "bag")

# summarize predictions at the bag layer
library(dplyr)
df1 %>%
  bind_cols(predict(mdl1, df1, type = "class")) %>%
  bind_cols(predict(mdl1, df1, type = "raw")) %>%
  distinct(bag_name, bag_label, .pred_class, .pred)

Predict method for `misvm_orova` object

Description

Predict method for misvm_orova object. Predictions use the K fitted MI-SVM models. For class predictions, we return the class whose MI-SVM model has the highest raw predicted score. For raw predictions, a full matrix of predictions is returned, with one column for each model.

Usage

## S3 method for class 'misvm_orova'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)
## S3 method for class 'misvm_orova'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)

Arguments

`object`	An object of class `misvm_orova`
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values based on the highest output of an individual model. If `'raw'`, return the raw predicted scores for each model.
`layer`	If `'bag'`, return predictions at the bag level. If `'instance'`, return predictions at the instance level.
`new_bags`	A character or character vector. Can specify a singular character that provides the column name for the bag names in `new_data` (default `'bag_name'`). Can also specify a vector of length `nrow(new_data)` that has bag name for each row.
`...`	Arguments passed to or from other methods.

Details

When the object was fitted using the formula method, then the parameters new_bags and new_instances are not necessary, as long as the names match the original function call.

Value

A tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column .pred_class. If type = 'raw', the tibble will have K columns ⁠.pred_{class_name}⁠ corresponding to the raw predictions of the K models.

Author(s)

Sean Kent

Examples

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- ordmvnorm$bag_label
bags <- ordmvnorm$bag_name

mdl1 <- misvm_orova(x, y, bags)

# summarize predictions at the bag layer
library(dplyr)
df1 <- bind_cols(y = y, bags = bags, as.data.frame(x))
df1 %>%
  bind_cols(predict(mdl1, df1, new_bags = bags, type = "class")) %>%
  bind_cols(predict(mdl1, df1, new_bags = bags, type = "raw")) %>%
  select(-starts_with("V")) %>%
  distinct()

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- ordmvnorm$bag_label
bags <- ordmvnorm$bag_name

mdl1 <- misvm_orova(x, y, bags)

# summarize predictions at the bag layer
library(dplyr)
df1 <- bind_cols(y = y, bags = bags, as.data.frame(x))
df1 %>%
  bind_cols(predict(mdl1, df1, new_bags = bags, type = "class")) %>%
  bind_cols(predict(mdl1, df1, new_bags = bags, type = "raw")) %>%
  select(-starts_with("V")) %>%
  distinct()

Predict method for `omisvm` object

Description

Predict method for omisvm object

Usage

## S3 method for class 'omisvm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)
## S3 method for class 'omisvm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("bag", "instance"),
  new_bags = "bag_name",
  ...
)

Arguments

`object`	An object of class `omisvm`
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values with threshold of 0 as -1 or +1. If `'raw'`, return the raw predicted scores.
`layer`	If `'bag'`, return predictions at the bag level. If `'instance'`, return predictions at the instance level.
`new_bags`	A character or character vector. Can specify a singular character that provides the column name for the bag names in `new_data` (default `'bag_name'`). Can also specify a vector of length `nrow(new_data)` that has bag name for each row.
`...`	Arguments passed to or from other methods.

Details

When the object was fitted using the formula method, then the parameters new_bags and new_instances are not necessary, as long as the names match the original function call.

Value

A tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column .pred_class. If type = 'raw', the tibble will have a column .pred.

Author(s)

Sean Kent

Examples

if (require(gurobi)) {
  data("ordmvnorm")
  x <- ordmvnorm[, 3:7]
  y <- ordmvnorm$bag_label
  bags <- ordmvnorm$bag_name

  mdl1 <- omisvm(x, y, bags, weights = NULL)

  # summarize predictions at the bag layer
  library(dplyr)
  df1 <- bind_cols(y = y, bags = bags, as.data.frame(x))
  df1 %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "class")) %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "raw")) %>%
    distinct(y, bags, .pred_class, .pred)
}

if (require(gurobi)) {
  data("ordmvnorm")
  x <- ordmvnorm[, 3:7]
  y <- ordmvnorm$bag_label
  bags <- ordmvnorm$bag_name

  mdl1 <- omisvm(x, y, bags, weights = NULL)

  # summarize predictions at the bag layer
  library(dplyr)
  df1 <- bind_cols(y = y, bags = bags, as.data.frame(x))
  df1 %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "class")) %>%
    bind_cols(predict(mdl1, df1, new_bags = bags, type = "raw")) %>%
    distinct(y, bags, .pred_class, .pred)
}

Predict method for `smm` object

Description

Predict method for smm object

Usage

## S3 method for class 'smm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = "instance",
  new_instances = "instance_name",
  new_bags = "bag_name",
  kernel = NULL,
  ...
)
## S3 method for class 'smm'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = "instance",
  new_instances = "instance_name",
  new_bags = "bag_name",
  kernel = NULL,
  ...
)

Arguments

`object`	an object of class `smm`
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values with threshold of 0 as -1 or +1. If `'raw'`, return the raw predicted scores.
`layer`	If `'instance'`, return predictions at the instance level. Option `'bag'` returns predictions at the bag level, but only if the model was fit with `smm.mild_df()`,
`new_instances`	A character or character vector. Can specify a singular character that provides the column name for the instance names in `new_data` (default `'instance_name'`). Can also specify a vector of length `nrow(new_data)` that has instance name for each row.
`new_bags`	A character or character vector. Only relevant when fit with `smm.mild_df()`, which contains bag level information. Can specify a singular character that provides the column name for the bag names in `new_data`, default = "bag_name". Can also specify a vector of length `nrow(new_data)` that has bag name for each instance.
`kernel`	An optional pre-computed kernel matrix at the instance level or `NULL` (default `NULL`). The rows should correspond to instances in the new data to predict, and columns should correspond to instances in the original training data, such as a call to `kme()`.
`...`	Arguments passed to or from other methods.

Details

When the object was fitted using the formula method, then the parameters new_bags and new_instances are not necessary, as long as the names match the original function call.

Value

tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column named .pred_class. If type = 'raw', the tibble will have a column name .pred.

Author(s)

Sean Kent

Examples

set.seed(8)
n_instances <- 10
n_samples <- 20
y <- rep(c(1, -1), each = n_samples * n_instances / 2)
instances <- as.character(rep(1:n_instances, each = n_samples))
x <- data.frame(x1 = rnorm(length(y), mean = 1*(y==1)),
                x2 = rnorm(length(y), mean = 2*(y==1)),
                x3 = rnorm(length(y), mean = 3*(y==1)))

mdl <- smm(x, y, instances, control = list(sigma = 1/3))

# instance level predictions (training data)
suppressWarnings(library(dplyr))
data.frame(instance_name = instances, y = y, x) %>%
  bind_cols(predict(mdl, type = "raw", new_data = x, new_instances = instances)) %>%
  bind_cols(predict(mdl, type = "class", new_data = x, new_instances = instances)) %>%
  distinct(instance_name, y, .pred, .pred_class)

# test data
new_inst <- rep(c("11", "12"), each = 30)
new_y <- rep(c(1, -1), each = 30)
new_x <- data.frame(x1 = rnorm(length(new_inst), mean = 1*(new_inst=="11")),
                    x2 = rnorm(length(new_inst), mean = 2*(new_inst=="11")),
                    x3 = rnorm(length(new_inst), mean = 3*(new_inst=="11")))

# instance level predictions (test data)
data.frame(instance_name = new_inst, y = new_y, new_x) %>%
  bind_cols(predict(mdl, type = "raw", new_data = new_x, new_instances = new_inst)) %>%
  bind_cols(predict(mdl, type = "class", new_data = new_x, new_instances = new_inst)) %>%
  distinct(instance_name, y, .pred, .pred_class)

set.seed(8)
n_instances <- 10
n_samples <- 20
y <- rep(c(1, -1), each = n_samples * n_instances / 2)
instances <- as.character(rep(1:n_instances, each = n_samples))
x <- data.frame(x1 = rnorm(length(y), mean = 1*(y==1)),
                x2 = rnorm(length(y), mean = 2*(y==1)),
                x3 = rnorm(length(y), mean = 3*(y==1)))

mdl <- smm(x, y, instances, control = list(sigma = 1/3))

# instance level predictions (training data)
suppressWarnings(library(dplyr))
data.frame(instance_name = instances, y = y, x) %>%
  bind_cols(predict(mdl, type = "raw", new_data = x, new_instances = instances)) %>%
  bind_cols(predict(mdl, type = "class", new_data = x, new_instances = instances)) %>%
  distinct(instance_name, y, .pred, .pred_class)

# test data
new_inst <- rep(c("11", "12"), each = 30)
new_y <- rep(c(1, -1), each = 30)
new_x <- data.frame(x1 = rnorm(length(new_inst), mean = 1*(new_inst=="11")),
                    x2 = rnorm(length(new_inst), mean = 2*(new_inst=="11")),
                    x3 = rnorm(length(new_inst), mean = 3*(new_inst=="11")))

# instance level predictions (test data)
data.frame(instance_name = new_inst, y = new_y, new_x) %>%
  bind_cols(predict(mdl, type = "raw", new_data = new_x, new_instances = new_inst)) %>%
  bind_cols(predict(mdl, type = "class", new_data = new_x, new_instances = new_inst)) %>%
  distinct(instance_name, y, .pred, .pred_class)

Predict method for `svor_exc` object

Description

Predict method for svor_exc object

Usage

## S3 method for class 'svor_exc'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("instance", "bag"),
  new_bags = "bag_name",
  ...
)
## S3 method for class 'svor_exc'
predict(
  object,
  new_data,
  type = c("class", "raw"),
  layer = c("instance", "bag"),
  new_bags = "bag_name",
  ...
)

Arguments

`object`	An object of class `svor_exc`.
`new_data`	A data frame to predict from. This needs to have all of the features that the data was originally fitted with.
`type`	If `'class'`, return predicted values with threshold of 0 as -1 or +1. If `'raw'`, return the raw predicted scores.
`layer`	If `'bag'`, return predictions at the bag level. If `'instance'`, return predictions at the instance level.
`new_bags`	A character or character vector. Can specify a singular character that provides the column name for the bag names in `new_data` (default `'bag_name'`). Can also specify a vector of length `nrow(new_data)` that has bag name for each row.
`...`	Arguments passed to or from other methods.

Details

When the object was fitted using the formula method, then the parameter new_bags is not necessary, as long as the names match the original function call.

Value

A tibble with nrow(new_data) rows. If type = 'class', the tibble will have a column .pred_class. If type = 'raw', the tibble will have a column .pred.

Author(s)

Sean Kent

Examples

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- attr(ordmvnorm, "instance_label")

mdl1 <- svor_exc(x, y)
predict(mdl1, x)
predict(mdl1, x, type = "raw")

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- attr(ordmvnorm, "instance_label")

mdl1 <- svor_exc(x, y)
predict(mdl1, x)
predict(mdl1, x, type = "raw")

Fit SMM model to the data

Description

Function to carry out support measure machines algorithm which is appropriate for multiple instance learning. The algorithm calculates the kernel matrix of different empirical measures using kernel mean embedding. The data set should be passed in with rows corresponding to samples from a set of instances. SMM will compute a kernel on the instances and pass that to kernlab::ksvm() to train the appropriate SVM model.

Usage

## Default S3 method:
smm(
  x,
  y,
  instances,
  cost = 1,
  weights = TRUE,
  control = list(kernel = "radial", sigma = if (is.vector(x)) 1 else 1/ncol(x), scale =
    TRUE),
  ...
)

## S3 method for class 'formula'
smm(formula, data, instances = "instance_name", ...)

## S3 method for class 'mild_df'
smm(x, ...)
## Default S3 method:
smm(
  x,
  y,
  instances,
  cost = 1,
  weights = TRUE,
  control = list(kernel = "radial", sigma = if (is.vector(x)) 1 else 1/ncol(x), scale =
    TRUE),
  ...
)

## S3 method for class 'formula'
smm(formula, data, instances = "instance_name", ...)

## S3 method for class 'mild_df'
smm(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents a sample. If a `mild_df` object is passed, `⁠y, instances⁠` are automatically extracted, `bags` is ignored, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`instances`	A vector specifying which samples belong to each instance. Can be a string, numeric, of factor.
`cost`	The cost parameter in SVM, fed to the `C` argument in `kernlab::ksvm()`.
`weights`	named vector, or `TRUE`, to control the weight of the cost parameter for each possible y value. Weights multiply against the cost vector. If `TRUE`, weights are calculated based on inverse counts of instances with given label, where we only count one positive instance per bag. Otherwise, names must match the levels of `y`.
`control`	A list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `scale` argument used for all methods. A logical for whether to rescale the input before fitting.
`...`	Arguments passed to or from other methods.
`formula`	A formula with specification `y ~ x`. This argument is an alternative to the `x`, `y` arguments, but requires the `data` and `instances` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted.

Value

An object of class smm The object contains at least the following components:

ksvm_fit: A fit of class ksvm from the kernlab package.
call_type: A character indicating which method smm() was called with.
x: The training data needed for computing the kernel matrix in prediction.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.
cost: The cost parameter from function inputs.
sigma: The radial basis function kernel parameter.
weights: The calculated weights on the cost parameter, if applicable.
x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mild_df: Method for mild_df objects. Use the bag_label as y at the instance level, then perform smm() ignoring the MIL structure.

Author(s)

Sean Kent, Yifei Liu

References

Muandet, K., Fukumizu, K., Dinuzzo, F., & Schölkopf, B. (2012). Learning from distributions via support measure machines. Advances in neural information processing systems, 25.

Examples

set.seed(8)
n_instances <- 10
n_samples <- 20
y <- rep(c(1, -1), each = n_samples * n_instances / 2)
instances <- as.character(rep(1:n_instances, each = n_samples))
x <- data.frame(x1 = rnorm(length(y), mean = 1*(y==1)),
                x2 = rnorm(length(y), mean = 2*(y==1)),
                x3 = rnorm(length(y), mean = 3*(y==1)))

df <- data.frame(instance_name = instances, y = y, x)

mdl <- smm(x, y, instances)
mdl2 <- smm(y ~ ., data = df)

# instance level predictions
suppressWarnings(library(dplyr))
df %>%
  dplyr::bind_cols(predict(mdl, type = "raw", new_data = x, new_instances = instances)) %>%
  dplyr::bind_cols(predict(mdl, type = "class", new_data = x, new_instances = instances)) %>%
  dplyr::distinct(instance_name, y, .pred, .pred_class)

set.seed(8)
n_instances <- 10
n_samples <- 20
y <- rep(c(1, -1), each = n_samples * n_instances / 2)
instances <- as.character(rep(1:n_instances, each = n_samples))
x <- data.frame(x1 = rnorm(length(y), mean = 1*(y==1)),
                x2 = rnorm(length(y), mean = 2*(y==1)),
                x3 = rnorm(length(y), mean = 3*(y==1)))

df <- data.frame(instance_name = instances, y = y, x)

mdl <- smm(x, y, instances)
mdl2 <- smm(y ~ ., data = df)

# instance level predictions
suppressWarnings(library(dplyr))
df %>%
  dplyr::bind_cols(predict(mdl, type = "raw", new_data = x, new_instances = instances)) %>%
  dplyr::bind_cols(predict(mdl, type = "class", new_data = x, new_instances = instances)) %>%
  dplyr::distinct(instance_name, y, .pred, .pred_class)

Summarize data across functions

Description

Summarize a numeric data frame based on specified grouping columns and a list of functions. This is useful in summarizing a mild_df object from the sample level to the instance level.

Usage

## Default S3 method:
summarize_samples(data, group_cols, .fns = list(mean = mean), cor = FALSE, ...)

## S3 method for class 'mild_df'
summarize_samples(data, ...)
## Default S3 method:
summarize_samples(data, group_cols, .fns = list(mean = mean), cor = FALSE, ...)

## S3 method for class 'mild_df'
summarize_samples(data, ...)

Arguments

`data`	A data.frame, 'mild_df' object, or similar of data to summarize.
`group_cols`	A character vector of column(s) that describe groups to summarize across.
`.fns`	A list of functions (default `list(mean = mean)`).
`cor`	A logical (default `FALSE`) for whether to include correlations between all features in the summarization.
`...`	Arguments passed to or from other methods.

Value

A tibble with summarized data. There will be one row for each set of distinct groups specified by group_cols. There will be one column for each of the group_cols, plus length(.fns) columns for each of the features in data, plus correlation columns if specified.

Methods (by class)

default: Method for data.frame-like objects.
mild_df: Method for mild_df objects.

Author(s)

Sean Kent

Examples

fns <- list(mean = mean, sd = sd)
summarize_samples(mtcars, group_cols = c("cyl", "gear"), .fns = fns)
summarize_samples(mtcars, group_cols = c("cyl", "gear"), .fns = fns, cor = TRUE)

fns <- list(mean = mean, sd = sd)
summarize_samples(mtcars, group_cols = c("cyl", "gear"), .fns = fns)
summarize_samples(mtcars, group_cols = c("cyl", "gear"), .fns = fns, cor = TRUE)

Fit SVOR-EXC model to ordinal outcome data

Description

This function fits the Support Vector Ordinal Regression with Explicit Constraints based on the research of Chu and Keerthi (2007).

Usage

## Default S3 method:
svor_exc(
  x,
  y,
  cost = 1,
  method = c("smo"),
  weights = NULL,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    max_step = 500, scale = TRUE, verbose = FALSE),
  ...
)

## S3 method for class 'formula'
svor_exc(formula, data, ...)

## S3 method for class 'mi_df'
svor_exc(x, ...)
## Default S3 method:
svor_exc(
  x,
  y,
  cost = 1,
  method = c("smo"),
  weights = NULL,
  control = list(kernel = "linear", sigma = if (is.vector(x)) 1 else 1/ncol(x),
    max_step = 500, scale = TRUE, verbose = FALSE),
  ...
)

## S3 method for class 'formula'
svor_exc(formula, data, ...)

## S3 method for class 'mi_df'
svor_exc(x, ...)

Arguments

`x`	A data.frame, matrix, or similar object of covariates, where each row represents an instance. If a `mi_df` object is passed, `y` is automatically extracted, `bags` is ignored, and all other columns will be used as predictors.
`y`	A numeric, character, or factor vector of bag labels for each instance. Must satisfy `length(y) == nrow(x)`. Suggest that one of the levels is 1, '1', or TRUE, which becomes the positive class; otherwise, a positive class is chosen and a message will be supplied.
`cost`	The cost parameter in SVM.
`method`	The algorithm to use in fitting (default `'smo'`). When `method = 'smo'`, the modified SMO algorithm from Chu and Keerthi (2007) is used.
`weights`	`NULL`, since weights are not implemented for this function.
`control`	list of additional parameters passed to the method that control computation with the following components: `kernel` either a character the describes the kernel ('linear' or 'radial') or a kernel matrix at the instance level. `sigma` argument needed for radial basis kernel. `max_step` argument used when `method = 'heuristic'`. Maximum steps of iteration for the heuristic algorithm. `scale` argument used for all methods. A logical for whether to rescale the input before fitting. `verbose` argument used when `method = 'mip'`. Whether to message output to the console.
`...`	Arguments passed to or from other methods.
`formula`	A formula with specification `y ~ x`. This argument is an alternative to the `x`, `y` arguments, but requires the `data` argument. See examples.
`data`	If `formula` is provided, a data.frame or similar from which formula elements will be extracted.

Value

An object of class svor_exc The object contains at least the following components:

smo_fit: A fit object from running the modified ordinal smo algorithm.
call_type: A character indicating which method svor_exc() was called with.
features: The names of features used in training.
levels: The levels of y that are recorded for future prediction.
cost: The cost parameter from function inputs.
n_step: The total steps used in the heuristic algorithm.
x_scale: If scale = TRUE, the scaling parameters for new predictions.

Methods (by class)

default: Method for data.frame-like objects
formula: Method for passing formula
mi_df: Method for mi_df objects, automatically handling bag names, labels, and all covariates. Use the bag_label as y at the instance level, then perform svor_exc() ignoring the MIL structure and bags.

Author(s)

Sean Kent

References

Chu, W., & Keerthi, S. S. (2007). Support vector ordinal regression. Neural computation, 19(3), 792-815. doi:10.1162/neco.2007.19.3.792

Examples

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- attr(ordmvnorm, "instance_label")

mdl1 <- svor_exc(x, y)
predict(mdl1, x)

data("ordmvnorm")
x <- ordmvnorm[, 3:7]
y <- attr(ordmvnorm, "instance_label")

mdl1 <- svor_exc(x, y)
predict(mdl1, x)

Package 'mildsvm'

Help Index

Coerce to MI data frame

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Coerce to MILD data frame

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Sample mild_df object by bags and instances

Description

Usage

Arguments

Value

Author(s)

Examples

Build a feature map on new data

Description

Usage

Arguments

Value

Methods (by class)

Author(s)

See Also

Examples

Flatten mild_df data to the instance level

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Classify y from bags

Description

Usage

Arguments

Value

Author(s)

Examples

Fit MI-SVM model to the data using cross-validation

Description

Usage

Arguments

Value

Methods (by class)

Author(s)

See Also

Examples

Printing multiple instance data frames

Description

Usage

Arguments

Details

Value

Examples

Generate mild_df using multivariate t and normal distributions.

Description

Usage

Arguments

Details

Value

Author(s)

Examples

Create an exact kernel feature map

Description

Usage

Arguments

Details

Value

Author(s)

Sample `mild_df` object by bags and instances

Flatten `mild_df` data to the instance level

Create an `mi` object