Esquire Theme by Matthew Buchanan
Social icons by Tim van Damme

hit counter
hit counter

06

Dec

What data structures/matching algorithims does vimdiff use?

Answer by Anirudh Joshi:

Diffs are part of the Longest common subsequence problem (http://en.wikipedia.org/wiki/Longest_common_subsequence_problem). This is an NP-hard problem that basically tries to find continuous sequences in arbitrary data - i.e. many source files/DNA shot gun sequencing.

However diffs only operates on 2 sequences reducing the complexity of the search to O(m*n) with m/n being the length of the files (thanks to Peter Scott).

I’m pretty sure vimdiff uses the standard diff tool (http://en.wikipedia.org/wiki/Diff#Algorithm) to attack this problem:

This only works when a standard “diff” command is available.  See ‘diffexpr’.

Source: http://vimdoc.sourceforge.net/htmldoc/diff.html

On Unix-based systems, Vim should work without problem because there should be a “standard” diff program available

Source: http://vim.wikia.com/wiki/Running_diff

Diff mainly uses the algorithm described in the paper, An O(ND) Difference Algorithm and its Variations (http://www.xmailserver.org/diff2.pdf), found from http://stackoverflow.com/questions/805626/diff-algorithm.

Other references are included in the source code found here http://ftp.gnu.org/gnu/diffutils/ :

The basic algorithm is described in “An O(ND) Difference Algorithm and its Variations”, Eugene W. Myers, 'Algorithmica’ Vol. 1 No. 2, 1986, pp. 251-266; and in “A File Comparison Program”, Webb Miller and Eugene W. Myers, 'Software–Practice and Experience’ Vol. 15 No. 11, 1985, pp. 1025-1040. The algorithm was independently discovered as described in “Algorithms for Approximate String Matching”, E. Ukkonen, `Information and Control’ Vol. 64, 1985, pp. 100-118.

Code for various languages can be found at http://code.google.com/p/google-diff-match-patch/

More discussion here: http://c2.com/cgi/wiki?DiffAlgorithm
View Answer on Quora