EpsilonGreedyUntargeted

class robustcheck.EpsilonGreedyUntargeted.EpsilonGreedyUntargeted(model, img, label, pixel_groups, epsilon=0.1, pixel_space_int_flag=True, pixel_space_min=0, pixel_space_max=255, steps=1000, verbose=False)

Black-box, untargeted adversarial attack against image classifiers.

It encapsulates the target model and image and provides a method to run the adversarial attack. The attack chooses groups of pixels to perturb adversarially according to a classic epsilon-greedy strategy. The reward is the decrease in the probability that the target model classifies the target image correctly. With probability 1-epsilon, the attack samples a pixel from the group that has provided the highest average reward so far; with probability epsilon, it samples a pixel from a random group.

model

Target model to be attacked. It must expose a predict method that returns the output probability distributions for a batch of input images.
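For experimenting without a trained network, any object with a compatible predict method can serve as the target model. The sketch below is a hypothetical stand-in (ToyLinearModel is not part of robustcheck); it assumes predict receives a NumPy batch of shape (N, H, W, C) and must return an (N, num_classes) array whose rows are probability distributions.

```python
import numpy as np


class ToyLinearModel:
    """Hypothetical stand-in model: a fixed random linear layer followed by softmax."""

    def __init__(self, image_shape, num_classes, seed=0):
        rng = np.random.default_rng(seed)
        n_features = int(np.prod(image_shape))
        self.weights = rng.normal(size=(n_features, num_classes))

    def predict(self, batch):
        # batch: array of shape (N, H, W, C); flatten each image into a feature vector.
        batch = np.asarray(batch, dtype=np.float64)
        logits = batch.reshape(len(batch), -1) @ self.weights
        # Numerically stable softmax so that each row is a probability distribution.
        logits -= logits.max(axis=1, keepdims=True)
        exp = np.exp(logits)
        return exp / exp.sum(axis=1, keepdims=True)
```

A real classifier can be wrapped the same way: as long as predict maps a batch of images to per-class probabilities (for example, by applying a softmax to logits), it fits the interface described above.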

img

An array (HxWxC) representing the target image to be perturbed.

label

An integer representing the correct class index of the image.

pixel_groups

An array of arrays of integer pairs. Each inner array holds the indices of the pixels that are attacked together as one pixel group. Groups are usually built from objectness or from spatial proximity (e.g. in a grid-like layout).
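A grid-like grouping, one of the approaches mentioned above, can be sketched as follows. make_grid_pixel_groups is a hypothetical helper, and it assumes each integer pair is a (row, column) coordinate; check the library's expectations for the pair order and container type before passing the result as pixel_groups.

```python
def make_grid_pixel_groups(height, width, cell_size):
    """Split an HxW image into square cells; each cell becomes one pixel group.

    Returns a list of groups, each a list of (row, col) integer pairs.
    """
    groups = []
    for top in range(0, height, cell_size):
        for left in range(0, width, cell_size):
            # Clip the cell at the image border so edge cells may be smaller.
            group = [
                (row, col)
                for row in range(top, min(top + cell_size, height))
                for col in range(left, min(left + cell_size, width))
            ]
            groups.append(group)
    return groups
```

For a 4x4 image with cell_size=2, this yields four groups of four pixels each; when the image size is not a multiple of cell_size, the cells along the right and bottom edges are smaller.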

epsilon

A float representing the probability of exploration (choosing a random group of pixels to be perturbed) in the classic epsilon-greedy strategy.

pixel_space_max

A number (integer or float) representing the maximum value pixels can take in the image space. This is used for extracting normalised metrics about the attack success later on.

verbose

A boolean flag that, when set to True, enables printing information about the attack results.

get_best_candidate(self)

Returns the fittest individual in the active generation.

is_perturbed(self)

Returns a boolean representing whether a successful adversarial perturbation has been achieved in the active generation.

run_adversarial_attack(self, steps=100)

Runs the adversarial attack based on the evolutionary strategy until a successful adversarial perturbation is found or until steps generations have been explored. Returns the total number of generations explored before the stopping condition was reached.

explore_attack_group(group_index)

Explores the potential reward obtained by sampling the attacked pixel from a fixed group.

Parameters:

group_index – An integer representing the index of the pixel group that the method will attempt perturbing.

Returns:

A dictionary containing information about the perturbation attempt, with the following fields:

  • "potential_reward": A float representing the expected reward from perturbing the target group.

  • "altered_image": A three-dimensional array representing the perturbed image after applying the group_index group perturbation.

  • "prob_before": A float representing the probability that the target model classifies the perturbed image correctly before the group_index group perturbation is applied.

  • "prob_after": A float representing the probability that the target model classifies the perturbed image correctly after the group_index group perturbation is applied.

  • "pred_after": An array of floats representing the probability distribution output by the target model for the perturbed image after the group_index group perturbation is applied.
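The returned fields can be illustrated with a hypothetical, self-contained sketch of a single exploration step. It is not robustcheck's implementation: it assumes integer pixels, (row, column) pairs in pixel_groups, and a perturbation that replaces every channel of one randomly chosen pixel in the group with a uniformly random value in [pixel_min, pixel_max]. Here potential_reward is a single sampled drop in the correct-class probability rather than an expectation.

```python
import numpy as np


def explore_attack_group(model, image, label, pixel_groups, group_index,
                         pixel_min=0, pixel_max=255, rng=None):
    """Hypothetical sketch: perturb one random pixel of a group and measure the reward."""
    rng = np.random.default_rng() if rng is None else rng
    group = pixel_groups[group_index]
    # Pick one (row, col) pixel uniformly at random from the chosen group.
    row, col = group[rng.integers(len(group))]

    # Probability of the correct class before this group's perturbation.
    prob_before = float(model.predict(image[np.newaxis])[0][label])

    # Assumption: every channel of the chosen pixel gets a uniform random integer value.
    altered_image = image.copy()
    altered_image[row, col] = rng.integers(pixel_min, pixel_max + 1, size=image.shape[-1])

    pred_after = model.predict(altered_image[np.newaxis])[0]
    prob_after = float(pred_after[label])
    return {
        # Reward: drop in the correct-class probability caused by this perturbation.
        "potential_reward": prob_before - prob_after,
        "altered_image": altered_image,
        "prob_before": prob_before,
        "prob_after": prob_after,
        "pred_after": pred_after,
    }
```

Working on a copy keeps the original image unchanged, so a caller can decide whether to keep altered_image based on the reward.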

is_perturbed()

Returns:

A boolean representing whether the adversarial attack has been successful.
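For an untargeted attack, a common success criterion is that the model's top-1 prediction no longer matches the correct label. The helper is_untargeted_success is a hypothetical sketch of that criterion; the exact check used by robustcheck is an assumption here.

```python
import numpy as np


def is_untargeted_success(pred, label):
    """Untargeted success: the top-1 predicted class differs from the correct label."""
    return int(np.argmax(pred)) != label
```

For example, with label 0, a prediction of [0.2, 0.7, 0.1] counts as a success, while [0.6, 0.3, 0.1] does not.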

run_adversarial_attack()

Runs the adversarial attack.

Returns:

An integer representing the number of attack steps until either the attack was successful or the maximum steps threshold was reached.
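The stopping logic can be sketched as a small driver loop. run_until_success, attack_step, and is_successful are hypothetical names: attack_step stands for one select-perturb-update iteration and is_successful for a check such as is_perturbed. This sketch checks for success after each step, which may differ from the library's ordering.

```python
def run_until_success(attack_step, is_successful, max_steps):
    """Hypothetical driver: run attack steps until success or until max_steps is reached.

    Returns the number of steps that were executed.
    """
    for step in range(1, max_steps + 1):
        attack_step()
        if is_successful():
            return step
    # No success within the budget: every step was used.
    return max_steps
```

Returning the step count in both cases matches the documented return value; a caller can tell the two outcomes apart by calling the success check again.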

select_group()

This is the core method that trades off between exploration and exploitation, as expected in classic epsilon-greedy strategies. Here, exploration is represented by sampling a random group of pixels, while exploitation means selecting a group of pixels with the highest average reward observed so far.
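A minimal sketch of epsilon-greedy selection, assuming the average rewards are held in a plain sequence indexed by group. select_group here is a standalone function taking a NumPy random Generator, not the method itself. Ties during exploitation go to the lowest index, which is how numpy.argmax behaves.

```python
import numpy as np


def select_group(values, epsilon, rng):
    """Epsilon-greedy choice over pixel groups.

    values: sequence of average rewards observed so far, one per group.
    rng: a numpy.random.Generator.
    """
    if rng.random() < epsilon:
        # Explore: pick any group uniformly at random.
        return int(rng.integers(len(values)))
    # Exploit: pick the group with the highest average reward (ties -> lowest index).
    return int(np.argmax(values))
```

With epsilon=0.1, roughly one selection in ten is a uniformly random exploration step; the rest exploit the best-performing group found so far.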

update(chosen_group, reward)

Updates the pixel group chosen_group with a newly observed reward, updating the group's historical average value and its observation count. This modifies the instance fields _values and _counts.

Parameters:
  • chosen_group – An integer representing the index of the pixel group that will get updated after a new reward was observed.

  • reward – The reward used to update the pixel group value and count.

Returns:

A float representing the updated value of the historical average reward of the chosen group.
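The historical average can be maintained without storing past rewards by using an incremental mean. The sketch assumes values and counts are plain lists indexed by group, standing in for the _values and _counts fields; it is an illustration rather than the library's code.

```python
def update(values, counts, chosen_group, reward):
    """Fold a new reward into the running average of one pixel group (in place).

    Returns the updated average reward of chosen_group.
    """
    counts[chosen_group] += 1
    n = counts[chosen_group]
    # Incremental mean: new_avg = old_avg + (reward - old_avg) / n
    values[chosen_group] += (reward - values[chosen_group]) / n
    return values[chosen_group]
```

For example, rewards of 0.4 and then 0.2 for the same group leave its average at 0.3 (up to floating-point rounding) and its count at 2.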