Adversarial Clustering

Clustering algorithms are increasingly adopted in security applications to spot dangerous or illicit activities. For instance, clustering of malware and computer viruses aims to identify and categorize existing malware families, and to generate specific signatures for their detection by anti-virus software or by signature-based intrusion detection systems like Snort.

However, clustering algorithms were not originally devised to deal with deliberate attacks that aim to subvert the clustering process itself. Whether clustering can be safely adopted in such settings thus remains questionable.

In recent work, we have defined a general framework that allows one to identify potential attacks against clustering algorithms, and to evaluate their impact, by making specific assumptions on the adversary’s goal, knowledge of the attacked system, and capability to manipulate the input data. We have shown that an attacker may significantly poison the whole clustering process by adding a relatively small percentage of attack samples to the input data, and that some attack samples may be obfuscated so that they remain hidden within existing clusters. Our analysis has focused on the single- and complete-linkage hierarchical clustering algorithms, but we are devising poisoning and obfuscation attacks that can efficiently target other clustering algorithms as well. We have also recently shown that poisoning can significantly degrade the performance of real-world behavioral malware clustering tools.

To give an idea of how a poisoning attack works, we report a simple example below.


The left plot shows four initial clusters obtained on a two-dimensional data set of 80 samples. The right plot shows the final clustering obtained after adding 20 attack samples (highlighted with red circles), that is, when the attacker controls only 20% of the data. Note how the number of final clusters has increased significantly, and how points from the initial clusters have been grouped in a completely different manner. This shows that our poisoning attack can effectively subvert the single-linkage clustering process. An animation of the attack's progress is also available.
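
To make the mechanism more concrete, the following Python sketch reproduces the effect on a much smaller scale. It is not the attack from our work, only a minimal illustration under stated assumptions: the toy data set, the hand-placed "bridge" points, the fixed number of clusters k = 3, and all function names are hypothetical. Single-linkage merges clusters according to their closest pair of points, so a short chain of attack samples placed between two groups makes them merge early; with the number of clusters held fixed, a different group is then forced to split.

```python
import math
from itertools import combinations


def single_linkage(points, k):
    """Merge the two closest clusters until k remain; return one label per point."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Single linkage merges clusters in order of the shortest inter-point distance.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    n_clusters = len(points)
    for _, i, j in edges:
        if n_clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            n_clusters -= 1
    return [find(i) for i in range(len(points))]


def square(x, y):
    """Four points on a unit square whose lower-left corner is (x, y)."""
    return [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]


def same_cluster(labels, i, j):
    return labels[i] == labels[j]


# Hypothetical toy data: compact groups A and B, and a looser group C made of two halves.
# Indices: A = 0..3, B = 4..7, left half of C = 8..11, right half of C = 12..15.
clean = square(0, 0) + square(6, 0) + square(0, 10) + square(4, 10)

# Hypothetical attack samples: a chain bridging the gap between A and B.
bridge = [(2.25, 0.5), (3.5, 0.5), (4.75, 0.5)]

before = single_linkage(clean, k=3)
after = single_linkage(clean + bridge, k=3)

print("A and B merged:", same_cluster(before, 0, 4), "->", same_cluster(after, 0, 4))
print("C kept together:", same_cluster(before, 8, 12), "->", same_cluster(after, 8, 12))
```

In this toy setting, three attack samples out of 19 points are enough to merge A with B and to split C, even though no original sample has been modified. Choosing where to place attack samples so as to maximize the damage is the harder problem, and this sketch does not address it.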