DevDocDeletionMitigationPlan

From StatSVN


Deletion Mitigation Plan

Problem statement

Proposed solution

  • StatSVN will produce statistics only for the files which are in the current revision.

Consequences

  1. Behaviour differs from StatCVS: our metrics will not count any work on deleted files, nor any work done on replaced files before they were replaced.
  2. Performance will improve, because we no longer query dead files.
  3. No more problems with binary files or directories, since they can be read from the working copy.
  4. Simplified / cleaner code.

Implementation issues

  • To implement this solution, we need the following:
    • We produce statistics on live files only. If we see a deletion in the repository, we ignore that file completely.
    • The file must exist in the checked-out version. (The working folder must be a clean representation of what is in the repository.)
  • Therefore, I recommend we read svn info at startup AND verify that all files exist on disk.
  • When a folder is deleted, svn log shows a single deletion entry for the folder, not one for each file it contained, so we must handle this case. It imposes a two-pass process: we don't know which files existed in the deleted folder until we encounter them later in the log, because revisions are listed newest first.
  • Alternative 1
    • Phase 1: Parse the log as we do now, but record each deletion as a (path, revision) pair, e.g. ("/path/folder", 505), and no longer mark files as binary here.
    • Phase 2:
 
for each deletion in deletions {
   for each filebuilder in filebuilders {
      if (filebuilder.isDead())
         continue;
      if (filebuilder.path.startsWith(deletion.path + "/")) {
         // the file lived inside the deleted folder
         if (filebuilder.latestRevision <= deletion.revision)
            insert deletion revision at the appropriate location in filebuilder and mark it as dead;
         else
            // the file has revisions after the folder deletion (it was re-created), so it stays alive
            insert deletion revision at the appropriate location in filebuilder;
      }
      // if filebuilder.path == deletion.path, the deletion revision has already been added
   }
}
for each filebuilder in filebuilders {
   if (filebuilder.isDead() || filebuilder.isDirectory())
      continue;
   add filebuilder to good filebuilder list;
   filebuilder.switch revisions that are replacements to additions; // possibly already done like this
   filebuilder.remove all revisions older than the most recent addition;
}

replace filebuilders with good filebuilder list

for each filebuilder in filebuilders {
   set isBinary on filebuilder (from current revision);
}
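The first loop of Alternative 1 (propagating a folder deletion to the files it contained) can be sketched in Java, the language StatSVN is written in. This is a minimal sketch, not StatSVN code: `Deletion` and `FileHistory` are hypothetical, simplified stand-ins for the logged (path, revision) pairs and for a filebuilder, and revision numbers are assumed to be stored newest first, as svn log reports them. Switching replacements to additions and setting the binary flag are left out.

```java
import java.util.ArrayList;
import java.util.List;

public class FolderDeletionPass {

    // Hypothetical stand-in for a deletion recorded in phase 1, e.g. ("/path/folder", 505).
    static final class Deletion {
        final String path;
        final long revision;

        Deletion(String path, long revision) {
            this.path = path;
            this.revision = revision;
        }
    }

    // Hypothetical, simplified stand-in for a filebuilder: revision numbers are
    // kept newest first, the order in which svn log reports them.
    static final class FileHistory {
        final String path;
        final boolean directory;
        final List<Long> revisions = new ArrayList<>();
        boolean dead;

        FileHistory(String path, boolean directory, long... newestFirst) {
            this.path = path;
            this.directory = directory;
            for (long r : newestFirst) {
                revisions.add(r);
            }
        }

        // Insert a revision number, keeping the list sorted newest first.
        void insertRevision(long revision) {
            int i = 0;
            while (i < revisions.size() && revisions.get(i) > revision) {
                i++;
            }
            revisions.add(i, revision);
        }
    }

    // Phase 2 of Alternative 1: propagate each folder deletion to the files that
    // lived under that folder, then keep only live, non-directory entries.
    static List<FileHistory> applyFolderDeletions(List<FileHistory> files, List<Deletion> deletions) {
        for (Deletion deletion : deletions) {
            // The trailing "/" keeps "/a/old" from matching "/a/older/X".
            String prefix = deletion.path + "/";
            for (FileHistory file : files) {
                // Dead files are skipped; a file whose own path was deleted does not
                // match the prefix, since that deletion was already recorded.
                if (file.dead || !file.path.startsWith(prefix)) {
                    continue;
                }
                // No revision after the folder deletion means the file is gone.
                boolean deadNow = file.revisions.get(0) <= deletion.revision;
                file.insertRevision(deletion.revision);
                if (deadNow) {
                    file.dead = true;
                }
            }
        }
        List<FileHistory> live = new ArrayList<>();
        for (FileHistory file : files) {
            if (!file.dead && !file.directory) {
                live.add(file);
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<FileHistory> files = new ArrayList<>();
        files.add(new FileHistory("/trunk/old/A.java", false, 480, 300));
        files.add(new FileHistory("/trunk/old/B.java", false, 510, 200));
        files.add(new FileHistory("/trunk/src/C.java", false, 520, 100));
        List<Deletion> deletions = new ArrayList<>();
        deletions.add(new Deletion("/trunk/old", 505));

        // Prints "/trunk/old/B.java [510, 505, 200]" then "/trunk/src/C.java [520, 100]".
        for (FileHistory f : applyFolderDeletions(files, deletions)) {
            System.out.println(f.path + " " + f.revisions);
        }
    }
}
```

In the example, A.java is dropped because nothing happened to it after the folder was deleted in revision 505, while B.java survives with 505 recorded as an interior deletion.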
  • Alternative 2
    • Phase 1: Parse the log as we do now, but no longer mark files as binary here.
    • Phase 2:
for each filebuilder in filebuilders {
   if (!(filebuilder.path exists in working directory) || filebuilder.isDirectory())
      continue;
   add filebuilder to good filebuilder list;
   filebuilder.switch revisions that are replacements to additions; // possibly already done like this
   filebuilder.remove all revisions older than the most recent addition;
}

replace filebuilders with good filebuilder list

for each filebuilder in filebuilders {
   set isBinary on filebuilder (from current revision);
}
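Alternative 2 replaces the deletion bookkeeping with a check against the working copy. A minimal Java sketch follows, under stated assumptions: `FileHistory` is a hypothetical stand-in for a filebuilder whose path is already relative to the working copy root (mapping repository paths to working-copy paths is not shown), and revisions are stored newest first. `Action` mirrors the A/M/R/D change codes of `svn log -v`. It relies on `java.nio.file.Files.isRegularFile`, which returns false both for missing paths and for directories, so one check covers both conditions of the pseudocode. Setting the binary flag is not shown.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class LiveFileFilter {

    // Change kinds reported by `svn log -v`: A, M, R and D.
    enum Action { ADDED, MODIFIED, REPLACED, DELETED }

    static final class Revision {
        final long number;
        Action action;

        Revision(long number, Action action) {
            this.number = number;
            this.action = action;
        }
    }

    // Hypothetical, simplified stand-in for a filebuilder. The path is assumed to be
    // relative to the working copy root; revisions are stored newest first.
    static final class FileHistory {
        final String path;
        final List<Revision> revisions = new ArrayList<>();

        FileHistory(String path) {
            this.path = path;
        }

        FileHistory rev(long number, Action action) {
            revisions.add(new Revision(number, action));
            return this;
        }
    }

    // Phase 2 of Alternative 2: keep only regular files present in the working copy.
    static List<FileHistory> keepLiveFiles(List<FileHistory> files, Path workingCopyRoot) {
        List<FileHistory> live = new ArrayList<>();
        for (FileHistory file : files) {
            // isRegularFile is false for missing paths and for directories alike.
            if (!Files.isRegularFile(workingCopyRoot.resolve(file.path))) {
                continue;
            }
            trimToLatestAddition(file);
            live.add(file);
        }
        return live;
    }

    // Treat a replacement as an addition, then drop every revision older than the
    // most recent addition (with newest-first storage, those follow it in the list).
    static void trimToLatestAddition(FileHistory file) {
        List<Revision> revs = file.revisions;
        for (int i = 0; i < revs.size(); i++) {
            Revision r = revs.get(i);
            if (r.action == Action.REPLACED) {
                r.action = Action.ADDED;
            }
            if (r.action == Action.ADDED) {
                revs.subList(i + 1, revs.size()).clear();
                return;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("wc");
        Files.createDirectories(root.resolve("src"));
        Files.write(root.resolve("src/A.java"), new byte[0]);

        List<FileHistory> files = new ArrayList<>();
        files.add(new FileHistory("src/A.java")
                .rev(40, Action.MODIFIED).rev(30, Action.REPLACED)
                .rev(20, Action.MODIFIED).rev(10, Action.ADDED));
        files.add(new FileHistory("src/Gone.java").rev(15, Action.ADDED)); // not on disk
        files.add(new FileHistory("src").rev(5, Action.ADDED));            // a directory

        for (FileHistory f : keepLiveFiles(files, root)) {
            StringBuilder line = new StringBuilder(f.path + ":");
            for (Revision r : f.revisions) {
                line.append(' ').append(r.number).append('/').append(r.action);
            }
            System.out.println(line); // src/A.java: 40/MODIFIED 30/ADDED
        }
    }
}
```

Only `src/A.java` survives, and its history is cut at revision 30, where the replacement was turned into an addition.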

Committed Solution

  • I planned to implement Alternative 2 but ran into issues that led me to implement a solution taking ideas from both alternatives.
  • After implementing Alternative 2, I discovered I could simply skip creating a filebuilder during log file parsing if the file did not exist, which performs better.
  • I also realized that, if I treated replacements as additions, I could stop building revisions for a file once its last recorded revision was an addition.
    • I had to create a stateAdd in RevisionData because I could not use hasNoLines for this purpose (line counts are filled in later).
  • I also realized that additions can be implicit. For example, copying a whole directory from one branch to another can add only one entry to the svn log (for the directory, rather than one per file).
  • Therefore, I get a list of the directories and check whether each file was replaced by a directory copy during its lifespan. If so, I add a new revision at the appropriate location and discard the revisions before it.
  • All appears well now.
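The committed approach can be sketched as two passes over a log listed newest first. This is a hypothetical Java sketch, not the committed StatSVN code: `LogEntry` stands in for one changed path from `svn log -v`, the `workingCopyFiles` set stands in for the check against the working copy, `Revision.stateAdd` mirrors the stateAdd idea in RevisionData, and `directoryAdds` is assumed to be the list of directory additions and replacements (how directories are identified is not shown). The reading of "during their lifespan" is also an assumption: a directory copy at revision R applies to a file only if the file has a recorded revision at or before R.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CommittedApproach {

    // One changed path from `svn log -v`; action is svn's A, M, R or D letter.
    static final class LogEntry {
        final long revision;
        final String path;
        final char action;

        LogEntry(long revision, String path, char action) {
            this.revision = revision;
            this.path = path;
            this.action = action;
        }
    }

    // Hypothetical stand-in for RevisionData, reduced to a number and an add flag.
    static final class Revision {
        final long number;
        final boolean stateAdd;

        Revision(long number, boolean stateAdd) {
            this.number = number;
            this.stateAdd = stateAdd;
        }
    }

    // Pass 1 (log parsing, newest first): no history for paths missing from the
    // working copy, and no further revisions once an addition has been recorded.
    static Map<String, List<Revision>> build(List<LogEntry> log, Set<String> workingCopyFiles) {
        Map<String, List<Revision>> histories = new LinkedHashMap<>();
        for (LogEntry e : log) {
            if (!workingCopyFiles.contains(e.path)) {
                continue;
            }
            List<Revision> revs = histories.computeIfAbsent(e.path, p -> new ArrayList<>());
            if (!revs.isEmpty() && revs.get(revs.size() - 1).stateAdd) {
                continue; // older entries predate the most recent addition
            }
            // Replacements are treated as additions.
            revs.add(new Revision(e.revision, e.action == 'A' || e.action == 'R'));
        }
        return histories;
    }

    // Pass 2: a directory added or replaced at revision R implicitly re-adds every
    // file beneath it; cut the file's history there and record an addition at R.
    static void applyDirectoryCopies(Map<String, List<Revision>> histories, List<LogEntry> directoryAdds) {
        for (LogEntry dir : directoryAdds) {
            String prefix = dir.path + "/";
            for (Map.Entry<String, List<Revision>> h : histories.entrySet()) {
                if (!h.getKey().startsWith(prefix)) {
                    continue;
                }
                List<Revision> revs = h.getValue();
                int i = 0; // first position at or before the copy revision
                while (i < revs.size() && revs.get(i).number > dir.revision) {
                    i++;
                }
                if (i == revs.size()) {
                    continue; // the copy predates this file's recorded history
                }
                revs.subList(i, revs.size()).clear();
                revs.add(new Revision(dir.revision, true));
            }
        }
    }

    public static void main(String[] args) {
        List<LogEntry> log = Arrays.asList(
                new LogEntry(50, "trunk/src/A.java", 'M'),
                new LogEntry(45, "trunk/src/B.java", 'R'),
                new LogEntry(40, "trunk/src", 'R'),
                new LogEntry(30, "trunk/src/A.java", 'M'),
                new LogEntry(20, "trunk/src/A.java", 'A'),
                new LogEntry(20, "trunk/src/B.java", 'A'),
                new LogEntry(15, "trunk/old/C.java", 'A'));
        Set<String> workingCopyFiles = new HashSet<>(Arrays.asList("trunk/src/A.java", "trunk/src/B.java"));

        Map<String, List<Revision>> histories = build(log, workingCopyFiles);
        applyDirectoryCopies(histories, Arrays.asList(new LogEntry(40, "trunk/src", 'R')));

        // Prints "trunk/src/A.java: 50M 40A" then "trunk/src/B.java: 45A".
        for (Map.Entry<String, List<Revision>> h : histories.entrySet()) {
            StringBuilder line = new StringBuilder(h.getKey() + ":");
            for (Revision r : h.getValue()) {
                line.append(' ').append(r.number).append(r.stateAdd ? "A" : "M");
            }
            System.out.println(line);
        }
    }
}
```

B.java stops after its replacement at 45 during parsing, while A.java only loses its older history in the second pass, when the directory copy at revision 40 is found.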