2019


Near-duplicate with SimHash

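Before talking about SimHash, let's review some other methods that can also identify duplication.

Longest Common Subsequence (LCS)

This is the problem the diff command solves. It is also equivalent to edit distance when insertion and deletion are the only two edit operations allowed: for two sequences a and b, that distance equals len(a) + len(b) - 2 * LCS(a, b), because every element outside the common subsequence must be either deleted from a or inserted from b.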
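
To make the link between LCS and insert/delete-only edit distance concrete, here is a minimal sketch using the textbook dynamic-programming recurrence. The function names `lcs_length` and `indel_distance` are illustrative choices, not from any library, and this is not how the diff command itself is implemented.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] holds the LCS length of the prefix of a seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Matching elements extend the LCS of both shorter prefixes.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise drop one element from either a or b.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def indel_distance(a, b):
    """Edit distance when only insertions and deletions are allowed."""
    # Everything outside the common subsequence is deleted or inserted.
    return len(a) + len(b) - 2 * lcs_length(a, b)


if __name__ == "__main__":
    print(lcs_length("ABCBDAB", "BDCABA"))
    print(indel_distance("ABCBDAB", "BDCABA"))
```

This runs in O(len(a) * len(b)) time, which hints at why pairwise LCS becomes expensive when comparing many long documents.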