using adversarial attacks to confuse deep-chicken-terminator
These blocks are made by starting out with a dark grey image and then backpropagating on the image through the pre-trained network with a negative epsilon, in order to minimise the loss for the target class. A more negative epsilon will not necessarily give a better result; the response follows a bell curve instead, so the epsilon is optimised by looking for the local maximum of the target-class probability in the domain [lower_limit, 0).
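A minimal sketch of that search, assuming a PyTorch image classifier; the function name, grey level, image size, and the lower_limit default here are illustrative placeholders rather than the actual project code:

```python
import torch
import torch.nn.functional as F

def best_negative_epsilon(model, target_class, lower_limit=-0.5, steps=50,
                          image_shape=(1, 3, 224, 224)):
    """Grid-search epsilon in [lower_limit, 0) for the highest target-class probability."""
    model.eval()
    grey = torch.full(image_shape, 0.3, requires_grad=True)      # dark grey starting image
    loss = F.cross_entropy(model(grey), torch.tensor([target_class]))
    loss.backward()                                              # gradient w.r.t. the pixels
    grad_sign = grey.grad.sign()

    best_eps, best_prob = None, 0.0
    for epsilon in torch.linspace(lower_limit, 0, steps)[:-1]:   # exclude 0 itself
        # a negative epsilon steps *against* the gradient, lowering the target-class loss
        adv = torch.clamp(grey.detach() + epsilon * grad_sign, 0, 1)
        with torch.no_grad():
            prob = F.softmax(model(adv), dim=1)[0, target_class].item()
        if prob > best_prob:
            best_eps, best_prob = float(epsilon), prob
    return best_eps, best_prob
```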
These adversarial blocks can be generated for any animal class.
sign(data_gradients) gives the element-wise signs of the data gradient, and epsilon defines the "strength" of the perturbation of the image. In a nutshell, instead of optimizing the model to reduce the loss, we're un-optimizing the input image to maximise loss.
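Concretely, one perturbation step is just a forward pass, a backward pass onto the pixels, and a single update. A rough sketch, again assuming a PyTorch model (the function name is made up):

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, image, label, epsilon):
    """Return `image` shifted by epsilon * sign(gradient of the loss w.r.t. the pixels)."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)   # loss of the image against `label`
    loss.backward()                               # data gradients land in image.grad
    data_gradients = image.grad
    # epsilon > 0 pushes the image away from `label` (maximises loss);
    # epsilon < 0 pulls it towards `label` (minimises loss)
    perturbed = image + epsilon * data_gradients.sign()
    return torch.clamp(perturbed, 0, 1).detach()
```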
This works primarily because of the piecewise-linear nature of deep neural networks. ReLU and maxout activations, for example, are piecewise linear, and even a carefully tuned sigmoid is approximately linear when taken piecewise.
With varying values of epsilon, we see an approximately linear relationship between the model's "confidence" and epsilon.
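To see why, treat one linear region of the network as a single linear score (this little derivation is the standard linear-model argument, not something specific to this project): perturbing the input by epsilon times the sign of the weights shifts the score by exactly epsilon times the L1 norm of the weights, so the logit, and hence the confidence, moves roughly linearly with epsilon.

```latex
w^{\top}\bigl(x + \epsilon \operatorname{sign}(w)\bigr)
  = w^{\top}x + \epsilon\, w^{\top}\operatorname{sign}(w)
  = w^{\top}x + \epsilon \lVert w \rVert_{1}
```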
Adding gradient * epsilon to each of those pixels made the image deviate further and further from the class it actually belongs to, maximising the loss in the process. Note that this was done with a positive epsilon value. For our current objective, however, we will try to "optimize" the image towards a different class. This can be done by backpropagating on the image x, but with a label of y, where y is the class to which we want to convert our image. With that label and a sufficiently negative epsilon value, the image gets mis-classified as the target class, as sketched below.
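Reusing the fgsm_perturb sketch from above, the targeted version is just a call with the target label y and a negative epsilon (the class index and epsilon value here are made up for illustration):

```python
import torch

# assumes `model`, `image`, and the fgsm_perturb sketch above are in scope;
# the class index (42) and epsilon value are illustrative placeholders
target_y = torch.tensor([42])
adversarial = fgsm_perturb(model, image, target_y, epsilon=-0.07)

probs = torch.softmax(model(adversarial), dim=1)
print("target-class probability:", probs[0, target_y.item()].item())
```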
If you didn't read the boring stuff above, just remember that