Compare two files

If you've been writing JSL for a while, you've probably saved multiple copies of a script and eventually wondered how they differ. Windows has no great built-in way to answer that question, so here's a way to do it in JSL using the Shortest Edit Script function, which does almost all the work.

CompareTextFiles = Function( {pathA, pathB},
	{textA, textB, v, gray, red, green, ipart, parts, kind, text, vv, bb},
	textA = Load Text File( pathA );
	textB = Load Text File( pathB );

	// compare line by line, ignoring differences in white space
	parts = Shortest Edit Script( lines( textA, textB, ignorewhitespace() ) );
	v = V List Box();
	gray = RGB Color( .9, .9, .9 );
	red = RGB Color( 1, .7, .7 );
	green = RGB Color( .7, 1, .7 );
	For( ipart = 1, ipart <= N Items( parts ), ipart += 1,
		// each part is a {kind, text} pair
		{kind, text} = parts[ipart];
		bb = MouseBox(
			Border Box( Left( 10 ), Right( 100 ), 
				// trim removes a final CR that double spaces the joint
				// between text boxes in the vlistbox. the 3 spaces added
				// back hides a rendering error that clips the last chars
				// in a line...sometimes.
				Text Box( Trim( text, right ) || "   ", <<setwrap( 1000 ) )
			)
		);
		// common parts keep the gray background of the enclosing vlistbox
		If( kind != "Common",
			If(
				kind == "Remove",
					bb << backgroundcolor( red ) << setTooltip( "removed from " || pathA )//
			, /*else if*/kind == "Insert",
					bb << backgroundcolor( green ) << setTooltip( "inserted from " || pathB )//
			, // else
				Throw( "what? " || Char( kind ) )
			)
		);
		v << append( bb );
	);
	v << backgroundcolor( gray );
	v;// return	
);

The function expects two file paths and returns a V List Box suitable for embedding in a window. The V List Box holds a stack of text boxes, each drawn on a gray, red, or green background. The edit script returned by the Shortest Edit Script function is a list of commands: keep data common to both strings (gray), delete data from the first string (red), or insert data from the second string (green). Using just those three commands, the script is "a" (not "the") shortest possible script to convert string A to string B, since more than one script of the same minimal length can exist.
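
To make the shape of the edit script concrete, here is a minimal sketch that prints each command to the log instead of drawing it. It relies only on what the function above already uses: each item in the returned list is a {kind, text} pair, and kind is "Common", "Remove", or "Insert". The two short strings and the variable names a, b, and i are made up for illustration.

```
// Two small multi-line strings standing in for file contents (illustrative only).
a = "one\!ntwo\!nthree\!n";
b = "one\!nthree\!nfour\!n";

// Same call shape as in CompareTextFiles.
parts = Shortest Edit Script( lines( a, b, ignorewhitespace() ) );

// Each item is a {kind, text} pair; write it to the log.
For( i = 1, i <= N Items( parts ), i += 1,
	{kind, text} = parts[i];
	Write( Char( kind ) || ": " || Trim( text, right ) || "\!n" );
);
```

Reading the log from top to bottom and applying each command in order turns the first string into the second.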

The MouseBox adds a tool tip to the red and green sections explaining which file lost that data or added that data. 
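
A minimal usage sketch: the returned box can be placed directly in a window, and hovering over a red or green section then shows its tool tip. The two paths below are placeholders, not files from this post, so substitute your own.

```
// Placeholder paths; replace with two real script or text files.
New Window( "Compare two files",
	CompareTextFiles( "$DOCUMENTS/script_v1.jsl", "$DOCUMENTS/script_v2.jsl" )
);
```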

The Shortest Edit Script function has some other parameters (not used here) that are useful for longer files, especially the limit parameter. By setting a reasonable limit of 500 or 1000 differences, you can just say "these files are really different" without waiting on a slow matching process that is unlikely to produce interesting results. The function is called here with the Ignore White Space parameter because I'm not interested in lines that differ only in indentation.

Two versions of Jabberwocky downloaded from the web

At least one of those versions was modified from the original.

Script attached.
