1.

How do you handle the class imbalance problem in a binary classification context? Explain briefly.


Response: 

Imbalanced data refers to a classification problem in which the classes are not represented equally. For example, in a credit card fraud detection scenario, we may have a two-class (binary) classification problem with 100 instances (rows). Of these, 95 instances are labelled Class-1 (genuine transactions) and the remaining 5 are labelled Class-2 (fraudulent transactions).

This is an imbalanced dataset: the ratio of Class-1 to Class-2 instances is 95:5. Class imbalance can occur in two-class classification problems as well as in multi-class classification problems.
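To make the 95:5 example concrete, the short Python sketch below counts the labels and expresses the imbalance as a majority-to-minority ratio. The function name `imbalance_ratio` and the string labels are illustrative choices, not part of any library.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (class counts, largest-class count / smallest-class count)."""
    counts = Counter(labels)
    largest = max(counts.values())
    smallest = min(counts.values())
    return counts, largest / smallest


# The 100-row fraud example: 95 genuine (Class-1) and 5 fraudulent (Class-2) rows.
labels = ["Class-1"] * 95 + ["Class-2"] * 5
counts, ratio = imbalance_ratio(labels)
print(dict(counts))  # {'Class-1': 95, 'Class-2': 5}
print(ratio)         # 19.0
```

A ratio of 19.0 means there are 19 genuine transactions for every fraudulent one. Because the function compares only the largest and smallest classes, the same sketch also works for multi-class labels.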

We can handle it in various ways. 

  1. See whether we can collect more data, especially for the minority class, so that the split becomes less skewed (for example, 75:25).
  2. Try resampling the dataset, i.e. over-sampling or under-sampling. Adding copies of instances from the under-represented class is called over-sampling; more formally, this is sampling with replacement. Alternatively, deleting instances from the over-represented class is called under-sampling.
  3. Change the performance metric used to evaluate the model. Accuracy is misleading with imbalanced classes like the ones in this example: a model that labels every transaction as genuine is 95% accurate yet never catches a single fraud. Metrics such as precision, recall, F1 score, and the confusion matrix give a more informative picture.
  4. Experiment with different algorithms to see how the outcomes differ.
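The resampling idea in point 2 can be sketched with only the Python standard library. This is a minimal illustration that assumes rows and labels are held in two parallel lists; the helper names `random_oversample` and `random_undersample` are hypothetical. Over-sampling draws extra minority rows with replacement (`random.choices`), while under-sampling keeps a random subset of each class without replacement (`random.sample`).

```python
import random
from collections import Counter


def _group_by_class(rows, labels):
    """Map each label to the list of rows that carry it."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    return groups


def random_oversample(rows, labels, seed=0):
    """Duplicate random rows of the smaller classes (with replacement)
    until every class matches the size of the largest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = max(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        extra = rng.choices(members, k=target - len(members))
        out_rows.extend(members + extra)
        out_labels.extend([label] * target)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset (without replacement) of every class,
    sized to match the smallest class."""
    rng = random.Random(seed)
    groups = _group_by_class(rows, labels)
    target = min(len(members) for members in groups.values())
    out_rows, out_labels = [], []
    for label, members in groups.items():
        out_rows.extend(rng.sample(members, target))
        out_labels.extend([label] * target)
    return out_rows, out_labels


# Toy data mirroring the 95:5 example; each "row" is just an integer id here.
rows = list(range(100))
labels = ["genuine"] * 95 + ["fraud"] * 5

over_rows, over_labels = random_oversample(rows, labels)
under_rows, under_labels = random_undersample(rows, labels)
print(dict(Counter(over_labels)))   # {'genuine': 95, 'fraud': 95}
print(dict(Counter(under_labels)))  # {'genuine': 5, 'fraud': 5}
```

In practice it is generally advisable to resample only the training split and leave the test set at its natural class balance, so that evaluation still reflects the real distribution. Libraries such as imbalanced-learn provide ready-made `RandomOverSampler` and `RandomUnderSampler` classes for these same two strategies.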

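Point 3 can be illustrated with the fraud example itself. The plain-Python sketch below, using hypothetical helper names, scores a naive model that labels every transaction as genuine. Treating fraud as the positive class, it reports the confusion-matrix counts along with accuracy, precision, recall, and F1 score; returning 0.0 when a denominator is zero is a convention chosen here for simplicity.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn) with `positive` treated as the positive class."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(num, den):
    """Divide, returning 0.0 instead of raising when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred, positive):
    """Compute confusion counts and the usual imbalance-aware metrics."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "counts": (tp, fp, fn, tn),  # (tp, fp, fn, tn)
        "accuracy": safe_div(tp + tn, len(y_true)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# 95 genuine and 5 fraudulent transactions; the naive model never predicts fraud.
y_true = ["genuine"] * 95 + ["fraud"] * 5
y_pred = ["genuine"] * 100
for name, value in evaluate(y_true, y_pred, positive="fraud").items():
    print(name, value)
```

Accuracy comes out at 0.95, while precision, recall, and F1 are all 0.0 because the model misses all 5 fraudulent transactions. This gap is exactly why accuracy alone is a poor guide on imbalanced data.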
Many other aspects can be considered as well; the right approach depends on the context, the dataset, and the domain being analyzed.
