
Posts

2020


Import custom package or module in PySpark

·1 min
First, zip all of the dependencies into a zip file like this. Then you can use one of the following methods to import it. |-- kk.zip | |-- kk.py Using --py-files in spark-submit #When submitting a Spark job, add the --py-files=kk.zip parameter. kk.zip will be distributed with the main script file, and kk.
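A minimal sketch of both approaches, assuming the layout above (kk.zip containing kk.py; main.py and the paths are illustrative):

```python
# Option 1: attach the zip at submit time.
#   spark-submit --py-files=kk.zip main.py
#
# Inside main.py the zipped module can then be imported normally:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("demo").getOrCreate()

# Option 2: ship the zip at runtime instead of at submit time.
spark.sparkContext.addPyFile("kk.zip")

import kk  # resolved from kk.zip on both driver and executors
```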

Time boundary in InfluxDB Group by Time Statement

·4 mins
These days I use InfluxDB to save some time series data. I love the features it provides: High Performance #According to its hardware guide, a single node supports more than 750k point writes per second, 100 moderate queries per second, and 10M series cardinality.
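For the time-boundary behavior the title refers to, here is a rough sketch using the influxdb Python client (the database "mydb" and measurement "cpu" are illustrative names):

```python
from influxdb import InfluxDBClient

client = InfluxDBClient(host="localhost", port=8086, database="mydb")

# InfluxDB aligns each GROUP BY time() bucket to multiples of the
# interval since the Unix epoch, not to the start time in the WHERE
# clause, so the first and last buckets may cover less than 10 minutes.
result = client.query(
    "SELECT mean(usage) FROM cpu "
    "WHERE time >= now() - 1h GROUP BY time(10m)"
)
for point in result.get_points():
    print(point["time"], point["mean"])
```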

C3 Linearization and Python MRO(Method Resolution Order)

·3 mins
Python supports multiple inheritance: a class can be derived from more than one base class. If an attribute or method is not found in the current class, how is the search order over the superclasses decided? In simple scenarios, we know it is left-to-right, bottom-to-top.
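A minimal sketch of the classic diamond, showing why a naive left-to-right, bottom-to-top search is not the whole story (class names are illustrative):

```python
class A:
    def who(self):
        return "A"

class B(A):
    pass

class C(A):
    def who(self):
        return "C"

class D(B, C):
    pass

# A naive depth-first search would reach A via B before ever trying C.
# C3 linearization keeps B before C, but also C before A, so the
# lookup finds C.who.
print([cls.__name__ for cls in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
print(D().who())                            # 'C'
```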

2019


Difference between Value and Pointer variable in Defer in Go

·3 mins
defer is a useful statement for cleanup, as deferred calls execute in LIFO order before the surrounding function returns. If you don't know how it works, the execution result may sometimes confuse you. How it Works and Why Value or Pointer Receiver Matters #I found an interesting piece of code on Stack Overflow:

Near-duplicate with SimHash

·4 mins
Before talking about SimHash, let's review some other methods that can also identify duplication. Longest Common Subsequence (LCS) #This is the algorithm used by the diff command. It is also the edit distance with insertion and deletion as the only two edit operations.
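A rough sketch of a 64-bit SimHash over word features (illustrative, not necessarily the exact variant discussed in the post):

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    weights = [0] * bits
    for word in text.split():
        # Hash each feature, then vote +1/-1 on every bit position.
        h = int(hashlib.md5(word.encode()).hexdigest(), 16) % (1 << bits)
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1
    # Keep the sign of each accumulated bit.
    return sum(1 << i for i in range(bits) if weights[i] > 0)

def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")

# Near-duplicate texts end up with a small Hamming distance.
print(hamming(simhash("the quick brown fox"),
              simhash("the quick brown foxes")))
```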

Jaeger Code Structure

·1 min
Here is the main logic for the jaeger agent and jaeger collector (based on Jaeger 1.13.1). Jaeger Agent #Collects UDP packets from port 6831, converts them to model.Span, and sends them to the collector over gRPC. Jaeger Collector #Processes spans received over gRPC or from Zipkin (port 9411).

The Annotated The Annotated Transformer

·4 mins
Thanks to the articles I list at the end of this post, I understand how transformers work. These posts are comprehensive, but some points confused me. First, this is the graph referenced by almost all posts related to the Transformer.

Different types of Attention

·1 min
\(h_i\) are the source hidden states and \(s_t\) is the target hidden state; each has shape (n,1). \(c_t\) is the final context vector, and \(\alpha_{t,i}\) is the alignment score. \[\begin{aligned} c_t&=\sum_{i=1}^n \alpha_{t,i}h_i \\ \alpha_{t,i}&= \frac{\exp(score(s_t,h_i))}{\sum_{j=1}^n \exp(score(s_t,h_j))} \end{aligned}\] Global(Soft) VS Local(Hard) #Global attention takes all source hidden states into account, while local attention only uses part of the source hidden states.
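As a quick worked example, here is global attention with a dot-product score function (one of several possible score choices; numpy and the shapes below are illustrative):

```python
import numpy as np

n, steps = 4, 5                    # hidden size n, 5 source positions
h = np.random.randn(steps, n)      # source hidden states h_i (one per row)
s_t = np.random.randn(n)           # target hidden state s_t

scores = h @ s_t                   # score(s_t, h_i) = s_t . h_i
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax alignment weights
c_t = alpha @ h                    # context vector, shape (n,)

print(alpha, c_t.shape)
```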

Torchtext snippets

·1 min
Load separate files #data.Field parameters are listed here. When calling build_vocab, torchtext will add <unk> to the vocabulary list. Set unk_token=None if you want to remove it. If sequential=True (the default), it will also add <pad> to the vocab. <unk> and <pad> are added at the beginning of the vocabulary list by default.
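A minimal sketch of those defaults, assuming the legacy torchtext API (torchtext < 0.9; newer versions moved it to torchtext.legacy.data) and an illustrative train.tsv file:

```python
from torchtext.data import Field, TabularDataset

TEXT = Field(sequential=True, lower=True)        # defaults: unk_token='<unk>', pad_token='<pad>'
LABEL = Field(sequential=False, unk_token=None)  # drop <unk> for the label vocab

dataset = TabularDataset(path="train.tsv", format="tsv",
                         fields=[("text", TEXT), ("label", LABEL)])
TEXT.build_vocab(dataset)
LABEL.build_vocab(dataset)

# Special tokens land at the start of the vocabulary list.
print(TEXT.vocab.itos[:2])  # ['<unk>', '<pad>']
```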

Build Your Own Tiny Tiny RSS Service

·3 mins
After Inoreader changed its free plan, limiting the maximum number of subscriptions to 150, I began looking for an alternative. Finally, I found Tiny Tiny RSS. It has a nice website and a Fever API plugin that is supported by most RSS reader apps, so you can read RSS on all of your devices.