Machine Learning: Multi-Label Classification

Q. What is the difference between multi-class classification and multi-label classification?

Ans. In multi-class problems the classes are mutually exclusive, whereas in multi-label problems each label represents a different classification task, but the tasks are somehow related.

Overview of Multi-Label Classification:


    Multi-label classification originated from the investigation of the text categorisation problem, where each document may belong to several predefined topics simultaneously.

    Multi-label classification of textual data is an important problem. Examples range from news articles to emails. For instance, this can be employed to find the genres that a movie belongs to, based on the summary of its plot.

    For example, multi-class classification assumes that each sample is assigned one and only one label: a fruit can be either an apple or a pear, but not both at the same time. In multi-label classification, by contrast, a text might be about any of religion, politics, finance or education at the same time, or none of these.

    Problem Definition & Evaluation Metrics:


    Evaluation Metrics:
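    The evaluation measures for single-label classification are usually different from those for multi-label classification. In single-label classification we use simple metrics such as precision, recall, accuracy, etc. For example, in single-label classification, accuracy is just the fraction of samples that are classified correctly:

    $$\text{Accuracy} = \frac{\text{number of correctly classified samples}}{\text{total number of samples}}$$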




    In multi-label classification, a prediction is no longer simply right or wrong. A prediction containing a subset of the actual classes should be considered better than a prediction that contains none of them, i.e., predicting two of the three labels correctly is better than predicting none of them.

    Micro-averaging & Macro-averaging (Label based measures):

    To measure a multi-class classifier we have to average out the classes somehow. There are two different methods of doing this called micro-averaging and macro-averaging.

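    In micro-averaging, you sum up the individual true positives (TP), false positives (FP) and false negatives (FN) of the system over the different sets, and then compute the metric from the pooled counts. For k sets (or classes):

    $$\text{Micro-Precision} = \frac{\sum_{i=1}^{k} TP_i}{\sum_{i=1}^{k} TP_i + \sum_{i=1}^{k} FP_i}, \qquad \text{Micro-Recall} = \frac{\sum_{i=1}^{k} TP_i}{\sum_{i=1}^{k} TP_i + \sum_{i=1}^{k} FN_i}$$

    The micro-average F1-score is simply the harmonic mean of the two quantities above.

    Macro-averaging is straightforward. We compute the precision P_i and recall R_i of the system on each set separately and take their unweighted average:

    $$\text{Macro-Precision} = \frac{1}{k} \sum_{i=1}^{k} P_i, \qquad \text{Macro-Recall} = \frac{1}{k} \sum_{i=1}^{k} R_i$$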

    The macro-average is useful when you want to know how the system performs overall across the sets of data, with every set weighted equally. It should not be used on its own to draw specific conclusions. The micro-average, on the other hand, is a useful measure when the sets vary in size, because larger sets contribute more to the pooled counts.
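    To make the difference concrete, here is a small sketch in plain Python. The per-class counts are made-up numbers for a hypothetical three-class system, chosen so that one class is much larger than the others.

```python
# Hypothetical per-class confusion counts (made-up numbers for illustration).
counts = {
    "A": {"tp": 90, "fp": 10, "fn": 10},  # large class
    "B": {"tp": 8, "fp": 2, "fn": 12},    # small class
    "C": {"tp": 2, "fp": 8, "fn": 3},     # small class
}


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Micro-averaging: pool the counts over all classes, then compute once.
tp = sum(c["tp"] for c in counts.values())
fp = sum(c["fp"] for c in counts.values())
fn = sum(c["fn"] for c in counts.values())
micro_p = precision(tp, fp)
micro_r = recall(tp, fn)
micro_f1 = f1(micro_p, micro_r)

# Macro-averaging: compute per class, then take the unweighted mean.
macro_p = sum(precision(c["tp"], c["fp"]) for c in counts.values()) / len(counts)
macro_r = sum(recall(c["tp"], c["fn"]) for c in counts.values()) / len(counts)

print(f"micro: P={micro_p:.3f} R={micro_r:.3f} F1={micro_f1:.3f}")
print(f"macro: P={macro_p:.3f} R={macro_r:.3f}")
```

    With these numbers the micro-averaged precision (100/120) is dominated by the large class A, while the macro-averaged precision (the mean of 0.9, 0.8 and 0.2) is pulled down by the weak small class C.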


    Hamming-Loss (Example based measure):

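    In the simplest terms, Hamming loss is the fraction of labels that are incorrectly predicted, i.e., the number of wrong labels divided by the total number of labels. For N samples and L labels:

    $$\text{Hamming Loss} = \frac{1}{N L} \sum_{i=1}^{N} \sum_{j=1}^{L} \mathbb{1}\left[ y_{ij} \neq \hat{y}_{ij} \right]$$

    where y_ij and ŷ_ij are the true and predicted values of label j for sample i.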
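    As a rough illustration, Hamming loss can be computed by hand in a few lines of plain Python. The label matrices are made-up toy data (three samples, three labels), and the function name is chosen here for illustration.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    n_samples = len(y_true)
    n_labels = len(y_true[0])
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / (n_samples * n_labels)


# Toy data: rows are samples, columns are labels (1 = label present).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

loss = hamming_loss(y_true, y_pred)  # 2 wrong slots out of 9
print(f"Hamming loss: {loss:.3f}")
```

    Only two of the nine label slots are wrong here, so the loss stays low even though two of the three samples are not predicted perfectly.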



    Exact Match Ratio (Subset accuracy):

    It is the strictest metric, indicating the percentage of samples that have all their labels classified correctly. For N samples, with Y_i and Ŷ_i the true and predicted label sets of sample i:

    $$\text{Exact Match Ratio} = \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ Y_i = \hat{Y}_i \right]$$

    Fig-7: Exact Match Ratio
    The disadvantage of this measure is that multi-label predictions can be partially correct, but here those partially correct matches are ignored.
    scikit-learn implements subset accuracy in its accuracy_score function.

    Note: We will be using the accuracy_score function to evaluate all our models in this project.
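    To show how strict this metric is, here is a hand-rolled sketch in plain Python on made-up toy data. The function name subset_accuracy is chosen here for illustration.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    matches = sum(list(t) == list(p) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Toy data: rows are samples, columns are labels (1 = label present).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 1], [0, 1, 1], [1, 0, 0]]

exact = subset_accuracy(y_true, y_pred)  # only the first sample matches fully
print(f"Exact match ratio: {exact:.3f}")
```

    The second and third samples each get two of their three label slots right, yet they count as complete misses. On label indicator matrices like these, scikit-learn's accuracy_score computes the same quantity.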



    Techniques for Solving a Multi-Label classification problem:

    Basically, there are three methods to solve a multi-label classification problem, namely:

    1. Problem Transformation
    2. Adapted Algorithm
    3. Ensemble approaches

    4.1 Problem Transformation

    In this method, we will try to transform our multi-label problem into single-label problem(s).

    This method can be carried out in three different ways as:

    1. Binary Relevance
    2. Classifier Chains
    3. Label Powerset

    4.1.1 Binary Relevance

    This is the simplest technique. It treats each label as a separate binary (single-label) classification problem.

    For example, consider a data set where X is the independent feature and the Y's are the target variables.

    In binary relevance, this problem is broken into 4 different binary classification problems, one per label.

    We don't have to do this manually: the scikit-multilearn library provides an implementation in Python. So, let us quickly look at how it works on randomly generated data.

    Now, in a multi-label classification problem, we can't simply use our normal metrics to calculate the accuracy of our predictions. For that purpose, we will use the accuracy_score metric. This function calculates subset accuracy, meaning that the predicted set of labels must exactly match the true set of labels.
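    A minimal sketch of binary relevance follows. It is an assumption-laden stand-in: it uses scikit-learn's MultiOutputClassifier (which fits one independent classifier per label column) rather than scikit-multilearn, and the data, the weight matrix W and the choice of logistic regression are all made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputClassifier

# Randomly generated data: 300 samples, 6 features, 4 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
W = rng.normal(size=(6, 4))      # hypothetical weights linking features to labels
Y = (X @ W > 0).astype(int)      # label indicator matrix

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# Binary relevance: one independent logistic regression per label column.
br = MultiOutputClassifier(LogisticRegression(max_iter=1000))
br.fit(X_train, Y_train)
Y_pred = br.predict(X_test)

# On label indicator matrices, accuracy_score is the subset accuracy.
acc = accuracy_score(Y_test, Y_pred)
print("subset accuracy:", acc)
```

    After fitting, br.estimators_ holds four separate models, one per label, and none of them sees the other labels.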

    Pros and cons
    It is the simplest and most efficient method. Its only drawback is that it doesn't consider label correlations, because it treats every target variable independently.

    4.1.2 Classifier Chains

    In this, the first classifier is trained just on the input data, and then each subsequent classifier is trained on the input space plus the labels handled by all the previous classifiers in the chain.

    Let's try to understand this with an example. Consider a dataset with X as the input space and the Y's as the labels.

    In classifier chains, this problem would be transformed into 4 different single-label problems. In each one, the input space is X together with the labels that come earlier in the chain, and the next label in the chain is the target variable.


    This is quite similar to binary relevance, the only difference being that it forms chains in order to preserve label correlation. So, let's try to implement this using the scikit-multilearn library.
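    Here is a hedged sketch of the same idea. It uses scikit-learn's ClassifierChain rather than scikit-multilearn, on made-up random data. The weight matrix W, the base estimator and the chain order are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain

# Randomly generated data: 300 samples, 6 features, 4 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
W = rng.normal(size=(6, 4))      # hypothetical weights linking features to labels
Y = (X @ W > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# Chain the labels in the order y1 -> y2 -> y3 -> y4.
chain = ClassifierChain(LogisticRegression(max_iter=1000), order=[0, 1, 2, 3])
chain.fit(X_train, Y_train)
Y_pred = chain.predict(X_test).astype(int)

# Each link sees the original features plus all earlier labels in the chain.
n_inputs = [est.coef_.shape[1] for est in chain.estimators_]
print("inputs per link:", n_inputs)
print("subset accuracy:", accuracy_score(Y_test, Y_pred))
```

    The number of inputs grows by one at every link of the chain (6, 7, 8, 9 here), which is how earlier labels reach later classifiers.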

    4.1.3 Label Powerset

    In this, we transform the problem into a multi-class problem, in which one multi-class classifier is trained on all unique label combinations found in the training data.

    Let’s understand it by an example.

    In this example, x1 and x4 have the same labels, and similarly x3 and x6 have the same set of labels. So, label powerset transforms this problem into a single multi-class problem in which each distinct label combination becomes one class.

    So, label powerset has given a unique class to every possible label combination that is present in the training set.
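    The transformation itself can be sketched in plain Python. The label matrix is hypothetical: the rows are invented so that x1 matches x4 and x3 matches x6, as in the example, and the class numbering (by order of first appearance) is an arbitrary choice.

```python
# Hypothetical label matrix: six samples (x1..x6), four labels (y1..y4).
Y = [
    (0, 1, 1, 0),  # x1
    (1, 0, 0, 0),  # x2
    (0, 1, 0, 0),  # x3
    (0, 1, 1, 0),  # x4 (same combination as x1)
    (1, 1, 1, 1),  # x5
    (0, 1, 0, 0),  # x6 (same combination as x3)
]

# Give every distinct label combination its own class id,
# numbered in order of first appearance.
combo_to_class = {}
for row in Y:
    combo_to_class.setdefault(row, len(combo_to_class) + 1)

# The multi-label targets become a single multi-class target.
y_multiclass = [combo_to_class[row] for row in Y]
print(y_multiclass)

# Inverse mapping, used to turn predicted classes back into label vectors.
class_to_combo = {cls: combo for combo, cls in combo_to_class.items()}
recovered = [class_to_combo[cls] for cls in y_multiclass]
```

    Any multi-class classifier can then be trained on y_multiclass. A combination that never occurs in the training data has no class id, so it can never be predicted.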

    4.2 Adapted Algorithm

    An adapted algorithm, as the name suggests, adapts the learning algorithm itself to perform multi-label classification directly, rather than transforming the problem into different subsets of problems.

    For example, the multi-label version of kNN is MLkNN. So, let us quickly implement this on our randomly generated data set.
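    MLkNN itself ships with scikit-multilearn. As a stand-in that needs only scikit-learn, the sketch below fits KNeighborsClassifier directly on the label matrix. Note that this is plain neighbour voting per label, not the MLkNN algorithm. It only illustrates a learner that accepts multi-label targets without any problem transformation. The data and the weight matrix W are made up.

```python
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Randomly generated data: 300 samples, 6 features, 4 binary labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 6))
W = rng.normal(size=(6, 4))      # hypothetical weights linking features to labels
Y = (X @ W > 0).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.3, random_state=0
)

# A single kNN model is fitted on the full label matrix at once.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, Y_train)
Y_pred = knn.predict(X_test)

acc = accuracy_score(Y_test, Y_pred)
print("subset accuracy:", acc)
```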

    4.3 Ensemble Approaches

    Ensembles often produce better results than a single model. The scikit-multilearn library provides several ensemble classification functions that you can use to try to improve your results.

    5. Case Studies

    Multi-label classification problems are very common in the real world. So, let us look at some of the areas where we can find the use of them.

    1. Audio Categorization

    We have already seen songs being classified into different genres. They have also been classified on the basis of emotions or moods like “relaxing-calm”, “sad-lonely”, etc.


    2. Image Categorization

    Multi-label classification of images also has a wide range of applications. Images can be labeled to indicate different objects, people or concepts.

    3. Bioinformatics

    Multi-Label classification has a lot of use in the field of bioinformatics, for example, classification of genes in the yeast data set.

    It is also used to predict multiple functions of proteins using several unlabeled proteins. You can check this paper for more information.

     

    4. Text Categorization

    You have probably used Google News at some point. Google News assigns every news item to one or more categories, so that it is displayed under several different categories.

    For example, the same news item can appear under the categories India, Technology, Latest, etc., because it has been classified into these different labels. This makes it a multi-label classification problem.

    There are plenty of other areas, so explore and comment down below if you wish to share it with the community.

     

    6. End Notes

    In this article, I introduced you to the concept of multi-label classification problems. I have also covered the approaches to solving this problem using the scikit-multilearn library in Python, and the practical use cases where you may have to handle it.
    I hope this article will give you a head start when you face these kinds of problems. If you have any doubts/suggestions, feel free to reach out to me below!


