Posts

2018

Deploy Nikola Org Mode on Travis

3 November 2018·3 mins

Recently I have been enjoying Spacemacs, so I decided to switch from Markdown to Org files for writing my blog. After several attempts, I managed to get Travis to convert Org files to HTML. Here are the steps.

Install Org Mode plugin

First, install the Org Mode plugin on your computer by following the official guide: Nikola orgmode plugin.

Using Chinese Characters in Matplotlib

4 October 2018·1 min

After searching on Google, here is the easiest solution I found. It should also work for other languages:

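# Run in a Jupyter notebook: the two % lines are IPython magics.
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

import matplotlib.font_manager as fm
# Path to a font file that contains the glyphs you need (PingFang ships with macOS).
f = "/System/Library/Fonts/PingFang.ttc"
prop = fm.FontProperties(fname=f)

# Pass the font explicitly to each text element that needs it.
plt.title("你好", fontproperties=prop)
plt.show()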

Output:

LSTM and GRU

22 April 2018·1 min

LSTM

To avoid the vanishing and exploding gradient problems of vanilla RNNs, LSTM was proposed; it can remember information over longer periods of time.

Here is the structure of LSTM:
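In code, a single LSTM step can be sketched roughly like this. It is a minimal pure-Python illustration of the standard gate equations, not taken from any particular library; the names `lstm_step`, `affine` and the `params` layout are my own assumptions for the example.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def affine(W, b, v):
    """Return W @ v + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step on plain Python lists.

    params maps each gate name ("f", "i", "o", "g") to a (W, b) pair,
    where W has shape (hidden, input + hidden). This layout is hypothetical.
    """
    v = x + h_prev  # concatenate the input and the previous hidden state
    f = [sigmoid(z) for z in affine(*params["f"], v)]    # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], v)]    # input gate
    o = [sigmoid(z) for z in affine(*params["o"], v)]    # output gate
    g = [math.tanh(z) for z in affine(*params["g"], v)]  # candidate values
    # Additive cell update: old memory scaled by f plus new content scaled by i
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c
```

The cell state `c` is updated by addition rather than by repeated matrix multiplication, which is the part that helps information survive over many time steps.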

Models and Architectures in Word2vec

5 January 2018·3 mins

Generally, word2vec is a language model that predicts the probability of a word based on its context. While building the model, it creates a word embedding for each word, and these embeddings are widely used in many NLP tasks.

Models

CBOW (Continuous Bag of Words)

Use the context to predict the probability of the current word. (In the picture, each word is one-hot encoded, \(W_{V \times N}\) is the word embedding matrix, and \(W_{V \times N}^{\prime}\), the output weight matrix of the hidden layer, is the same as \(\hat{\upsilon}\) in the following equations.)
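The CBOW forward pass can be sketched in a few lines. This is a minimal illustration that assumes the textbook formulation (average the context embeddings, score every vocabulary word, apply softmax); `cbow_forward`, `W_in` and `W_out` are hypothetical names, with both matrices stored as V rows of N values to match the notation above.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cbow_forward(context_ids, W_in, W_out):
    """Return P(word | context) for every word in the vocabulary.

    W_in:  V x N input embeddings (one row per word).
    W_out: V x N output weights (one row per word).
    """
    n = len(W_in[0])
    # Multiplying a one-hot vector by W_in just selects a row, so the
    # hidden layer is the average of the context words' rows.
    h = [sum(W_in[i][k] for i in context_ids) / len(context_ids) for k in range(n)]
    # Score each vocabulary word by the dot product of its output row with h.
    scores = [sum(wk * hk for wk, hk in zip(row, h)) for row in W_out]
    return softmax(scores)

# Toy vocabulary of 3 words with 2-dimensional embeddings
W_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W_out = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
probs = cbow_forward([0, 1], W_in, W_out)
```

Real implementations avoid the full softmax over the vocabulary (for example with negative sampling or hierarchical softmax), so treat this only as a picture of the data flow.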

2017

Semi-supervised text classification using doc2vec and label spreading

10 September 2017·2 mins

Here is a simple way to classify text without much human effort and still get impressive performance.

It can be divided into two steps:

Get training data by using keyword classification
Generate a more accurate classification model by using doc2vec and label spreading

Keyword-based Classification

Keyword-based classification is a simple but effective method. Extracting the target keywords is monotonous work. I use this method to automatically extract keyword candidates.
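To make the idea concrete, keyword-based labeling itself can be sketched as below. This is only an illustration under simple assumptions (whitespace tokenization, counting exact matches); `label_by_keywords` and the example keyword lists are hypothetical.

```python
def label_by_keywords(text, keywords_by_label):
    """Return the label whose keywords occur most often in text, or None."""
    tokens = text.lower().split()
    counts = {
        label: sum(tokens.count(keyword) for keyword in keywords)
        for label, keywords in keywords_by_label.items()
    }
    best = max(counts, key=counts.get)
    # Texts with no keyword hit stay unlabeled
    return best if counts[best] > 0 else None

keywords_by_label = {
    "sports": ["match", "goal"],
    "tech": ["python", "code"],
}
print(label_by_keywords("The goal decided the match", keywords_by_label))  # → sports
print(label_by_keywords("Nothing relevant here", keywords_by_label))       # → None
```

Texts that come back as `None` are the ones left for the second step, where a semi-supervised model fills in the missing labels.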

Parameters in doc2vec

3 August 2017·2 mins

Here are some of the parameters of gensim’s Doc2Vec class.

window

window is the maximum distance between the predicted word and the context words used for prediction within a document. It looks both behind and ahead of the predicted word.
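A tiny sketch of what "behind and ahead" means for a given window size. This is not gensim's implementation, only an illustration of the definition; `context_window` is a hypothetical helper.

```python
def context_window(tokens, position, window):
    """Return the words at most `window` positions before or after tokens[position]."""
    start = max(0, position - window)
    before = tokens[start:position]
    after = tokens[position + 1:position + 1 + window]
    return before + after

tokens = "the quick brown fox jumps over the lazy dog".split()
print(context_window(tokens, 4, 2))  # → ['brown', 'fox', 'over', 'the']
print(context_window(tokens, 0, 2))  # → ['quick', 'brown']
```

Near the start or end of a document the context is simply shorter, as the second call shows.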

Brief Introduction of Label Propagation Algorithm

16 July 2017·2 mins

As I said before, I’m working on a text classification project. I use doc2vec to convert text into vectors, then I use LPA to classify the vectors.

LPA is a simple, effective semi-supervised algorithm. It uses the density of the unlabeled data to find a decision boundary that splits the data.
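Here is a minimal sketch of the idea, assuming a fully connected graph with Gaussian similarities and labeled points clamped after every iteration. It is a simplified version for illustration, not the exact algorithm from any library; `label_propagation`, `sigma` and `n_iter` are names I made up for the example.

```python
import math

def label_propagation(points, labels, n_classes, sigma=1.0, n_iter=100):
    """Simplified label propagation.

    labels[i] is a class index for a labeled point, or None otherwise.
    Returns one predicted class index per point.
    """
    n = len(points)

    # Gaussian similarity between every pair of points, row-normalized so
    # each row holds the weights a point gives to the other points' labels.
    T = []
    for p in points:
        row = [math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
        total = sum(row)
        T.append([w / total for w in row])

    def initial(label):
        if label is None:
            return [1.0 / n_classes] * n_classes
        return [1.0 if c == label else 0.0 for c in range(n_classes)]

    Y = [initial(label) for label in labels]
    for _ in range(n_iter):
        # Each point takes the weighted average of all points' label scores.
        Y = [[sum(T[i][j] * Y[j][c] for j in range(n)) for c in range(n_classes)]
             for i in range(n)]
        # Clamp: labeled points keep their known label.
        Y = [row if label is None else initial(label) for label, row in zip(labels, Y)]

    return [max(range(n_classes), key=row.__getitem__) for row in Y]

# Two clusters on a line, with one labeled point in each
points = [(0.0,), (0.2,), (0.4,), (5.0,), (5.2,), (5.4,)]
labels = [0, None, None, None, None, 1]
predicted = label_propagation(points, labels, n_classes=2)
```

Because nearby points exchange most of the weight, labels spread quickly inside a dense cluster and hardly at all across the gap between clusters.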

Enable C Extension for gensim on Windows

10 June 2017·1 min

These days I’m working on some text classification tasks, and I use gensim’s doc2vec.

When using gensim, it shows this warning message:

C extension not loaded for Word2Vec, training will be slow.

I searched for this online and found that gensim has rewritten some parts of its code in Cython rather than NumPy for better performance. A compiler is required to enable this feature.

I hope this pushes me to review what I have learned, and it would be even better if someone else can benefit from it.