Introduction to SVM, hyperplane, TF-IDF and BoW

Upasana | August 05, 2019


Explain SVM (Support Vector Machine)

Support Vector Machine is commonly known by its short form, SVM. SVM is an extension of an algorithm known as the support vector classifier, which is itself an extension of the maximal margin classifier. The relationship is similar to random forest, which is an improved extension of decision trees and bagging. SVM helps in classification problems where the boundary between the classes is well defined, and it performs particularly well on binary classification problems. It uses a hyperplane as the decision boundary: it finds the hyperplane that maximizes the margin between the two labels.
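To make the idea of a margin-maximizing hyperplane concrete, here is a minimal sketch of a linear soft-margin SVM trained by sub-gradient descent on the regularised hinge loss. It is an illustration only: the function names, the toy data and the values of `lam`, `lr` and `epochs` are assumptions made for this example, and real libraries use dedicated solvers instead of this simple loop.

```python
def dot(u, v):
    """Dot product of two equal-length sequences."""
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(points, labels, lam=0.01, lr=0.01, epochs=200):
    """Sub-gradient descent on the L2-regularised hinge loss.

    labels must be +1 or -1. Returns the hyperplane parameters (w, b).
    """
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(points, labels):
            # A point with y * (w.x + b) < 1 is inside the margin or misclassified.
            inside_margin = y * (dot(w, x) + b) < 1
            # Regularisation step: shrinking w widens the margin 2 / ||w||.
            w = [wi * (1 - lr * lam) for wi in w]
            if inside_margin:
                # Hinge-loss step: push the hyperplane away from this point.
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
    return w, b


def predict(w, b, x):
    """Label a point by the side of the hyperplane it falls on."""
    return 1 if dot(w, x) + b >= 0 else -1


# Toy, linearly separable data with two labels (assumed for illustration).
points = [(-2, -1), (-1, -2), (-3, -3), (2, 1), (1, 2), (3, 3)]
labels = [-1, -1, -1, 1, 1, 1]

w, b = train_linear_svm(points, labels)
predictions = [predict(w, b, x) for x in points]
```

After training, `w` and `b` describe the separating hyperplane, and `predict` labels a new point according to the side it lies on.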

What is hyperplane?

A hyperplane is a subspace whose dimension is always one less than that of the space it lies in. For example, if we are in a 3-D vector space, then a hyperplane is a 2-D subspace, that is, a plane.
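As a small illustration, a hyperplane can be written as the set of points x satisfying w . x + b = 0. The sketch below uses an assumed example, the plane z = 0 in 3-D space, and a hypothetical helper `signed_distance` to show which side of the hyperplane a point lies on.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


# In 3-D space, w = (0, 0, 1) and b = 0 describe the 2-D plane z = 0.
w, b = (0.0, 0.0, 1.0), 0.0

on_plane = signed_distance(w, b, (1.0, 2.0, 0.0))   # point lies on the hyperplane
above = signed_distance(w, b, (1.0, 2.0, 3.0))      # positive side
below = signed_distance(w, b, (1.0, 2.0, -3.0))     # negative side
```

The sign of the result is what a classifier such as SVM uses to decide the label, and its magnitude is the distance to the boundary.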

Why didn’t you normalise the dataset?

The project was a classification project. The features being used already showed good correlation without standardizing or normalizing them, so there was no need to normalize the data. Note that normalization is not always necessary, and that standardization and normalization are two different things.
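To illustrate the difference, the sketch below follows the common convention in which normalization means min-max scaling to the range [0, 1] and standardization means rescaling to zero mean and unit standard deviation. The terminology varies between sources, and the function names and sample values are assumptions made for this example.

```python
import statistics


def min_max_normalize(values):
    """Rescale values to the range [0, 1]. Assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


feature = [1.0, 2.0, 3.0, 4.0, 5.0]

normalized = min_max_normalize(feature)   # bounded between 0 and 1
standardized = standardize(feature)       # centred on 0, not bounded
```

Normalized values always stay between 0 and 1, while standardized values are centred on 0 and can be negative or larger than 1.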

How to build a model on textual data?

Textual data cannot be fed to algorithms as it is, because it is not in the form of numbers, so we convert the text into fixed-length vectors.

How to convert text data to vector format?

We can use techniques like Bag-of-Words and TF-IDF to convert text to vectors.

What is tf-idf?

The full form of TF-IDF is Term Frequency - Inverse Document Frequency. The TF part gives higher weight to the words which are frequent in a text, and the IDF part down-weights the words which occur frequently across all the text data. So in conclusion, TF-IDF brings out the words which characterize the context of the text and converts the text into a fixed-length vector.
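The weighting can be sketched in a few lines of plain Python. This is a minimal version that assumes whitespace tokenization, TF as the relative frequency of a word in a document and IDF as log(N / df). Library implementations often use smoothed variants, so their numbers will differ. The function name and sample documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) with one fixed-length vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for tokens in tokenized for word in set(tokens))
    idf = {word: math.log(n_docs / doc_freq[word]) for word in vocabulary}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # TF (relative frequency in this document) times IDF, per vocabulary word.
        vectors.append(
            [counts[word] / len(tokens) * idf[word] for word in vocabulary]
        )
    return vocabulary, vectors


docs = ["the cat sat", "the dog barked", "the cat purred"]
vocabulary, vectors = tf_idf_vectors(docs)

the_idx = vocabulary.index("the")
sat_idx = vocabulary.index("sat")
cat_idx = vocabulary.index("cat")
```

In this toy corpus "the" occurs in every document, so its weight is zero everywhere, while "sat", which occurs in only one document, gets a higher weight than "cat", which occurs in two.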

What is the difference between Bag of Words and TF-IDF?

TF-IDF weights each word by how frequent it is in the text (TF) and by how rare it is across all the text data (IDF). So in conclusion, TF-IDF brings out the words which characterize the context of the text.

Bag-of-Words (BoW), on the other hand, just assigns a unique number to every word, counts how often each word occurs in the text and converts the text into a fixed-length vector of those counts.
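A minimal Bag-of-Words sketch, assuming whitespace tokenization and lower-casing (the function name and sample documents are made up for illustration):

```python
from collections import Counter


def bag_of_words(documents):
    """Return (index, vectors): a word-to-number mapping and count vectors."""
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({word for tokens in tokenized for word in tokens})

    # Assign a unique number (a position in the vector) to every word.
    index = {word: i for i, word in enumerate(vocabulary)}

    vectors = []
    for tokens in tokenized:
        vector = [0] * len(vocabulary)
        for word, count in Counter(tokens).items():
            vector[index[word]] = count
        vectors.append(vector)
    return index, vectors


docs = ["the cat sat on the mat", "the dog sat"]
index, vectors = bag_of_words(docs)
```

Here the common word "the" gets the largest count in the first vector. Under a plain log(N / df) form of IDF, the same word would get a weight of zero because it appears in every document, which is the practical difference between the two approaches.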

