Skip to main content

Machine Learning


Near-duplicate with SimHash

·4 mins
Before talking about SimHash, let’s review some other methods which can also identify duplication. Longest Common Subsequence(LCS) #This is the algorithm used by diff command. It is also edit distance with insertion and deletion as the only two edit operations.

The Annotated The Annotated Transformer

·4 mins
Thanks for the articles I list at the end of this post, I understand how transformers works. These posts are comprehensive, but there are some points that confused me. First, this is the graph that was referenced by almost all of the post related to Transformer.

Different types of Attention

·1 min
\(s_t\) and \(h_i\) are source hidden states and target hidden state, the shape is (n,1). \(c_t\) is the final context vector, and \(\alpha_{t,s}\) is alignment score. \[\begin{aligned} c_t&=\sum_{i=1}^n \alpha_{t,s}h_i \\ \alpha_{t,s}&= \frac{\exp(score(s_t,h_i))}{\sum_{i=1}^n \exp(score(s_t,h_i))} \end{aligned}\] Global(Soft) VS Local(Hard) #Global Attention takes all source hidden states into account, and local attention only use part of the source hidden states.

Using Dueling DQN to Play Flappy Bird

·5 mins
PyTorch provide a simple DQN implementation to solve the cartpole game. However, the code is incorrect, it diverges after training (It has been discussed here). The official code’s training data is below, it’s high score is about 50 and finally diverges.


TextCNN with PyTorch and Torchtext on Colab

·3 mins
PyTorch is a really powerful framework to build the machine learning models. Although some features is missing when compared with TensorFlow (For example, the early stop function, History to draw plot), its code style is more intuitive. Torchtext is a NLP package which is also made by pytorch team.


·1 min
LSTM #The avoid the problem of vanishing gradient and exploding gradient in vanilla RNN, LSTM was published, which can remember information for longer periods of time. Here is the structure of LSTM: The calculate procedure are: \[\begin{aligned} f_t&=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)\\ i_t&=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)\\ o_t&=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)\\ \tilde{C_t}&=tanh(W_C\cdot[h_{t-1},x_t]+b_C)\\ C_t&=f_t\ast C_{t-1}+i_t\ast \tilde{C_t}\\ h_t&=o_t \ast tanh(C_t) \end{aligned}\]

Models and Architectures in Word2vec

·3 mins
Generally, word2vec is a language model to predict the words probability based on the context. When build the model, it create word embedding for each word, and word embedding is widely used in many NLP tasks. Models #CBOW (Continuous Bag of Words) #Use the context to predict the probability of current word.


Semi-supervised text classification using doc2vec and label spreading

·2 mins
Here is a simple way to classify text without much human effort and get a impressive performance. It can be divided into two steps: Get train data by using keyword classification Generate a more accurate classification model by using doc2vec and label spreading Keyword-based Classification #Keyword based classification is a simple but effective method.

Parameters in doc2vec

·2 mins
Here are some parameter in gensim’s doc2vec class. window #window is the maximum distance between the predicted word and context words used for prediction within a document. It will look behind and ahead. In skip-gram model, if the window size is 2, the training samples will be this:(the blue word is the input word)

Brief Introduction of Label Propagation Algorithm

·2 mins
As I said before, I’m working on a text classification project. I use doc2vec to convert text into vectors, then I use LPA to classify the vectors. LPA is a simple, effective semi-supervised algorithm. It can use the density of unlabeled data to find a hyperplane to split the data.