Recreating old versions of documents using stored diffs

arunproff

Hi,
I'm trying to build a simple wiki in Python (yes, I know there are a couple out there already). I also want to track document history. Instead of saving every revision in full, I thought it would take less space to store, for each old version, only its diff against the next newer version.

I have found a Python library for calculating the diff between document versions, but I can't find anything for reconstructing an old version from the diffs and the current version of the document.

Has anyone come across a library (preferably pure Python) that can do this? Or does anyone have a simple algorithm they can post?

Thanks,
arun
 

misson

From the command line, the tool for this is `patch`, so python-patch may be the way to do it in Python.
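For the diff side, here is a rough sketch of producing a diff that `patch` could consume, using `difflib` from the standard library rather than python-patch. The function name, sample texts, and file labels are made up for illustration. Note that the diff runs from the current version back to the previous one.

```python
import difflib

def reverse_unified_diff(new_text, old_text):
    """Return a unified diff that turns new_text back into old_text."""
    new_lines = new_text.splitlines(keepends=True)
    old_lines = old_text.splitlines(keepends=True)
    # "current" and "previous" are only labels for the diff header.
    return "".join(difflib.unified_diff(
        new_lines, old_lines, fromfile="current", tofile="previous"))

# Hypothetical page contents.
current = "Welcome to the wiki.\nEdit any page.\nHave fun!\n"
previous = "Welcome to the wiki.\nEdit this page.\n"

diff_text = reverse_unified_diff(current, previous)
print(diff_text)
```

If that output is saved to a file, GNU patch should be able to rebuild the older text with something like `patch -o previous.txt current.txt rev.diff` (the file names are placeholders).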
 

Mr. DOS

I would be very careful about storing just the complete original version and diffs from there on up. If you store incremental diffs (i.e., rev 1 -> rev 2 -> rev 3), you have to rebuild the document from the first revision every time a later one is requested, and if you instead diff every revision against the original, the diffs will probably end up very large. Look into how Subversion handles diffs; I believe it keeps a full copy of the most recent revision and applies backward incremental diffs when the user requests an older version.
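A minimal sketch of that backward-diff layout, assuming line-based deltas built with the standard library's `difflib.SequenceMatcher`. The `History` class, `make_delta`, and `apply_delta` are illustrative names, not part of any library, and this is not a description of Subversion's actual storage format.

```python
import difflib

def make_delta(source, target):
    """Describe how to build `target` from `source` (both lists of lines).

    Unchanged runs are stored as index ranges into `source`; only lines
    that appear in `target` but not in `source` are stored as text.
    """
    delta = []
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("add", target[j1:j2]))
        # "delete": lines exist only in source, so nothing is stored.
    return delta

def apply_delta(source, delta):
    """Rebuild the target lines from `source` and a delta from make_delta."""
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(source[op[1]:op[2]])
        else:
            out.extend(op[1])
    return out

class History:
    """Keeps the latest revision in full plus one reverse delta per older one."""

    def __init__(self, text):
        self.latest = text.splitlines(keepends=True)
        # reverse_deltas[k] turns revision k+2 back into revision k+1.
        self.reverse_deltas = []

    def commit(self, text):
        new = text.splitlines(keepends=True)
        self.reverse_deltas.append(make_delta(new, self.latest))
        self.latest = new

    def get(self, rev):
        """Return revision `rev` (1-based) by walking back from the latest."""
        if not 1 <= rev <= len(self.reverse_deltas) + 1:
            raise ValueError("no such revision")
        lines = self.latest
        for delta in reversed(self.reverse_deltas[rev - 1:]):
            lines = apply_delta(lines, delta)
        return "".join(lines)

history = History("a\nb\n")
history.commit("a\nb\nc\n")
history.commit("a\nx\nc\n")
print(history.get(1))
```

With this layout the current page needs no reconstruction at all, and the cost of fetching an old revision grows with how far back it is.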

--- Mr. DOS
 

xav0989

My suggestion would be to use both complete copies and incremental diffs. For instance, save a complete version every 50 revisions and store diffs in between. This would shorten the time needed to recover an old revision.
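To put rough numbers on that, here is a small sketch. It assumes full copies are kept at revisions 1, 51, 101, and so on, with forward diffs in between; the function name `restore_plan` and that numbering are my own assumptions.

```python
def restore_plan(rev, interval=50):
    """Return (snapshot_rev, diffs_to_apply) for 1-based revision `rev`.

    Assumes a full copy is stored at revisions 1, 1 + interval,
    1 + 2 * interval, ... and a forward diff for every other revision.
    """
    if rev < 1:
        raise ValueError("revisions are numbered from 1")
    steps = (rev - 1) % interval
    return rev - steps, steps

snapshot, steps = restore_plan(137)
print(snapshot, steps)
```

Under these assumptions, no lookup ever applies more than `interval - 1` diffs, whereas a single chain starting at revision 1 would need 136 diffs to reach revision 137.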
 