First things first: you can find the project poster here!
The core idea of this project was to derive local robustness certificates for image patches by mapping each patch to its nearest neighbor in a fixed vocabulary of image patches. A local certificate here specifies the maximum L2 norm of any perturbation that can be applied to a specific patch without flipping that patch's nearest neighbor. If no patch's nearest neighbor can flip, the input to the downstream model is guaranteed to stay constant, and therefore the model output is guaranteed to be constant as well.
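Concretely, such a per-patch certificate follows from the triangle inequality: as long as the perturbation's L2 norm stays below half the gap between the patch's distances to its nearest and second-nearest vocabulary entries, the nearest neighbor cannot change. A minimal sketch of this computation, assuming patches and vocabulary entries are flattened vectors (all function and variable names are mine, not from the project code):

```python
import numpy as np

def certify_patch(patch, vocab):
    """Return (index of nearest vocab entry, certified L2 radius).

    Any perturbation with L2 norm strictly below the returned radius
    cannot change which vocabulary entry is the patch's nearest neighbor.
    """
    # Distances from the patch to every vocabulary entry.
    dists = np.linalg.norm(vocab - patch, axis=1)
    order = np.argsort(dists)
    d1, d2 = dists[order[0]], dists[order[1]]
    # Triangle inequality: the nearest neighbor is stable for
    # perturbations smaller than half the margin between d1 and d2.
    return order[0], (d2 - d1) / 2.0
```

Applying this to every patch of an image yields the per-patch certificates described above; the image-level guarantee holds when every patch's perturbation stays within its own radius.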
Compared with traditional approaches like randomized smoothing, this approach allows overall tighter certificates, although they are heavily spatially constrained: randomized smoothing yields a single (scalar) certificate per image, while our approach yields one certificate per image patch. Additionally, since the approach essentially guarantees constant model input, it is independent of the model type (classification or regression). It also offers some protection against evasive, gradient-based attacks.
The main drawback is that certificate tightness and model performance depend heavily on the vocabulary's quality. We experimented with two types of vocabularies. First, learned vocabularies, obtained by optimizing a reconstruction loss over a fixed set of image patches. Second, noise vocabularies, which initially consist of a very large number of random noise patches and are then pruned by usage counts over a reconstruction of the training set. Despite many experiments, we were unable to find a way to learn decent RGB vocabularies, which caused downstream performance to suffer. Noise vocabularies resulted in less tight certificates, and although the performance drop they caused was less severe than that of learned vocabularies, it was still not acceptable.
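The noise-vocabulary pruning described above can be sketched roughly as follows: draw a large set of random patches, count how often each one is used as a nearest neighbor when reconstructing a set of training patches, and keep only the most-used entries. This is an illustrative sketch under my own naming, not the project's actual implementation:

```python
import numpy as np

def prune_noise_vocab(vocab, train_patches, n_keep):
    """Keep the n_keep vocabulary entries used most often as nearest neighbors."""
    counts = np.zeros(len(vocab), dtype=int)
    for p in train_patches:
        # Assign each training patch to its nearest vocabulary entry.
        nearest = np.argmin(np.linalg.norm(vocab - p, axis=1))
        counts[nearest] += 1
    # Retain the entries with the highest usage counts.
    keep_idx = np.argsort(-counts)[:n_keep]
    return vocab[keep_idx]
```

Pruning by usage keeps the vocabulary small enough to be practical while discarding noise patches that never win a nearest-neighbor assignment, at the cost of the looser certificates mentioned above.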
My conclusion is that the project's general idea is interesting and worth further investigation. Unfortunately, our experiments were heavily limited by time and resource constraints. But if a way to learn decent vocabularies could be found, the framework would offer protection against evasive attacks and allow deriving very tight (but locally constrained!) robustness certificates for images.