


Semi-supervised text classification using doc2vec and label spreading

·2 mins
Here is a simple way to classify text with little human effort and still get impressive performance. It can be divided into two steps:

1. Get training data by using keyword classification.
2. Build a more accurate classification model by using doc2vec and label spreading.

Keyword-based Classification

Keyword-based classification is a simple but effective method.
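As a rough illustration of the first step, the sketch below assigns a label to a document only when exactly one category's keywords appear in it, and marks everything else as unlabeled. The keyword lists, the `keyword_label` helper, and the sample documents are hypothetical and not taken from the post. The value `-1` is used for unlabeled documents because that is the marker scikit-learn's `LabelSpreading` expects for unlabeled samples.

```python
import re

# Hypothetical keyword lists, for illustration only.
KEYWORDS = {
    "sports": {"football", "match", "goal"},
    "tech": {"software", "python", "laptop"},
}

# Marker for documents that the keyword rules cannot label.
UNLABELED = -1


def keyword_label(text, keywords=KEYWORDS):
    """Return a category if exactly one keyword set matches, else UNLABELED."""
    tokens = set(re.findall(r"[a-z0-9]+", text.lower()))
    hits = [label for label, words in keywords.items() if tokens & words]
    # Ambiguous (several categories) or empty matches stay unlabeled.
    return hits[0] if len(hits) == 1 else UNLABELED


docs = [
    "The match ended with a late goal.",
    "New Python software released today.",
    "The weather is nice today.",
    "Football analytics software is growing.",
]
labels = [keyword_label(d) for d in docs]
print(labels)
```

The labeled documents can then serve as seed training data, while the unlabeled ones are left for the second step (doc2vec vectors plus label spreading) to resolve.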

Parameters in doc2vec

·2 mins
Here are some parameters in gensim’s doc2vec class.

window

window is the maximum distance between the predicted word and the context words used for prediction within a document. It looks both behind and ahead. In the skip-gram model, if the window size is 2, the training samples will be as follows (the blue word is the input word):
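To make the window concrete, here is a minimal sketch that enumerates the (input word, context word) pairs a skip-gram model would see with a fixed window of 2. The `skipgram_pairs` helper and the sample sentence are hypothetical. This is not gensim's code, and gensim's actual training may sample a smaller effective window for each word, which this sketch ignores.

```python
def skipgram_pairs(tokens, window=2):
    """Return (input_word, context_word) pairs for a fixed window size."""
    pairs = []
    for i, center in enumerate(tokens):
        # Look up to `window` positions behind and ahead of the input word.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the quick brown fox jumps".split()
pairs = skipgram_pairs(sentence, window=2)
for input_word, context_word in pairs:
    print(input_word, "->", context_word)
```

Words near the start or end of the document simply get fewer context words, since the window is clipped at the boundaries.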