Classification of multi-label data

A huge amount of text documents, images, videos and other multimedia data is currently available in digital form. Annotating these data with semantic labels is necessary for their effective management and retrieval. Since manual annotation at this scale has become infeasible, automatic annotation techniques have been the subject of considerable research effort over the past ten years in the machine learning and pattern recognition communities.
PRA Lab is working on techniques for multi-label classification, which is the task of deciding which labels, among a predefined set, best describe the content of a given document, image, etc. For instance, a news article can be labelled as "sport" and "economy" if it covers both topics; an image can be labelled as "beach" and "sunset" if it depicts a sunset on a beach. Multi-label classification becomes a challenging task when the number of labels is large, as usually happens in practice. Techniques for multi-label classification can be exploited in a number of relevant applications related to the organization, filtering or mining of large amounts of text, images, etc.
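As a rough illustration of the task described above, a multi-label classifier can be viewed as producing one relevance score per label and keeping every label whose score reaches a threshold. This is the simple "binary relevance" scheme (one independent yes/no decision per label), swapped in here for illustration; it is not necessarily the technique used by PRA Lab. The sketch assumes scores in [0, 1] and a single global threshold of 0.5; the function name `predict_labels` and the example scores are hypothetical.

```python
from typing import Dict, Set


def predict_labels(scores: Dict[str, float], threshold: float = 0.5) -> Set[str]:
    """Return the subset of labels whose relevance score reaches the threshold.

    Hypothetical helper: each label is decided independently (binary
    relevance), so zero, one or several labels may be assigned to an input.
    """
    return {label for label, score in scores.items() if score >= threshold}


# Hypothetical scores for a news article that covers both sport and economy.
article_scores = {"sport": 0.91, "economy": 0.74, "politics": 0.12, "weather": 0.03}
print(sorted(predict_labels(article_scores)))  # ['economy', 'sport']
```

Because every label gets its own decision, the output is a set of labels rather than a single class, which is what distinguishes multi-label from ordinary single-label classification.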

PRA Lab is also working on methods for trading off manual annotation effort against the accuracy of automatic annotation in multi-label classification tasks where classification algorithms do not attain the desired accuracy. To this end, we allow a multi-label classifier to answer "don't know" for one or more labels when it is uncertain whether or not to assign them to an input sample, so that only the uncertain decisions are subsequently handled by human annotators.
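One simple way to realise such a "don't know" option, assumed here for illustration only, is to give each label a reject zone defined by two thresholds: the label is assigned if its score is at least `t_high`, rejected if it is at most `t_low`, and deferred to a human annotator otherwise. Widening the zone defers more decisions (more manual effort) and leaves fewer, more reliable automatic ones. The names `decide`, `t_low`, `t_high` and the example scores are hypothetical, and this is a sketch rather than the lab's actual method.

```python
from typing import Dict, List, Tuple


def decide(scores: Dict[str, float], t_low: float = 0.3, t_high: float = 0.7
           ) -> Tuple[List[str], List[str], List[str]]:
    """Split labels into (assigned, rejected, deferred) using a reject zone.

    Hypothetical helper: scores strictly between t_low and t_high are treated
    as uncertain, so the classifier says "don't know" and a human decides.
    """
    if not 0.0 <= t_low <= t_high <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= t_low <= t_high <= 1")
    assigned, rejected, deferred = [], [], []
    for label, score in sorted(scores.items()):
        if score >= t_high:
            assigned.append(label)
        elif score <= t_low:
            rejected.append(label)
        else:
            deferred.append(label)
    return assigned, rejected, deferred


# Hypothetical per-label scores for one image.
image_scores = {"beach": 0.88, "sunset": 0.55, "city": 0.05, "forest": 0.40}
assigned, rejected, deferred = decide(image_scores)
print(assigned, rejected, deferred)  # ['beach'] ['city'] ['forest', 'sunset']

# Fraction of label decisions handed to human annotators (manual effort).
print(len(deferred) / len(image_scores))  # 0.5
```

Setting `t_low == t_high` removes the reject zone and recovers an ordinary single-threshold decision with no manual effort; moving the two thresholds apart is what trades annotation effort for accuracy.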