BookmarkSubscribe
Choose Language Hide Translation Bar
Rajat
Contributor

Concatenate Multiple tables with different column order

Team,

I have 5 csv files each having 200column (not each file have same number of columns). Each file has columns in different order. I want to write a scirpt which will let user to select all 5 files then combine all 5 files (dropping mismatched columns) and save the combined csv file to the required location. I want to use command similar to (data = Open(Pick File("Choose the File", x)), x is the directory), to pick the files.

 

Thanks

0 Kudos
1 ACCEPTED SOLUTION

Accepted Solutions
txnelson
Super User

Re: Concatenate Multiple tables with different column order

Here is a rework of the script, selecting all of the files to be concatenated in one selection, and then running a little script to put them together, keeping only columns that find a match.  I have built it to your specifications, however it could easily be changed to allow the concatenation of any number of files

Names Default To Here( 1 );

inputDirectory = "C:\";

// Select the files to be read in......this can be any number of files
filePath = Pick File(
	"Choose the File to be Concatenated to:",
	inputDirectory,
	{"CSV Files|csv;", "All Files|*"},
	1,
	0,
	""
);
// Open the first file
dtMain = Open( filePath, invisible );


files = Pick File(
	"Choose the Additional 4 Files to Read In:",
	inputDirectory,
	{"CSV Files|csv;", "All Files|*"},
	1,
	0,
	"",
	multiple
);

// Validate that files have been read in
If( Try( N Items( files ), 0 ) == 4,
	Dialog( "Incorrect number of files were read in."
		"Only 4 additional files can be read in");
	Throw( "Incorrect number of files were read in" );
);

// Read in the remaining files and process them
For( i = 2, i <= N Items( Files ), i++,
	Try( dtConCat = Open( Files[i], invisible ) );
	
	mainColList = dtMain << Get Column Names( string );
	
	conCatColList = dtConCat << Get Column Names( string );
	
	// Delete the columns in the dtConCat table that do not exist in the dtMain table
	delList = {};
	For( k = 1, k <= N Items( conCatColList ), k++,
		If( N Rows( Loc( mainColList, conCatColList[k] ) ) == 0,
			Insert Into( delList, conCatColList[k] )
		)
	);
	If( N Items( delList ) > 0,
		dtConCat << delete columns( delList )
	);

	// Delete the columns in the dtMain table that were not found in the dtConCat table
	delList = {};
	For( k = 1, k <= N Items( mainColList ),
		k++,
		If( N Rows( Loc( conCatColList, mainColList[k] ) ) == 0,
			Insert Into( delList, mainColList[k] )
		)
	);
	If( N Items( delList ) > 0,
		dtMain << delete columns( delList )
	);
	
	// If any columns still remain in the two tables, then concatenate them together
	If( N Cols( dtMain ) > 0 & N Cols( dtConCat ) > 0,
		dtMain = dtMain << Concatenate( dtConCat, Append to First Table )
	);
	
	// Close the no longer needed file
	Close( dtConCat, nosave );
	
);

// Make the main file visible
dtMain << show window( 1 );

 

Jim
11 REPLIES 11
uday_guntupalli
Community Trekker

Re: Concatenate Multiple tables with different column order

@Rajat,
          Looking at the problem that you have described, let us break it down into multiple steps. 

 

Step 1: Selecting the desired files - using Pick File () 

            

// Define Your Directory and a list to hold File Paths 
MyDirectory = "C:\" ;
nFiles = 5; 
MyFilePaths = {}; 
OpenDts = {}; 

// Test if Directory exists and get all file paths
Try( If(Directory Exists(MyDirectory),
           for(i = 1 , i <= nFiles, i ++,
                FilePath = Pick File( "Select File", MyDirectory);
                Insert Into(MyFilePaths,FilePath); 
               ); 
,
// else
Throw()
), "Unable to Open File"; );

 

 

Step 2: Looping through the files, opening and concatenating common columns 
Here, I have a question for you. Is there a list of columns that you expect in the final table (or) is it just the common columns across all 5 csv files ? 

Best
Uday
0 Kudos
Rajat
Contributor

Re: Concatenate Multiple tables with different column order

Thanks Uday. I want to concatenate the common columns.
0 Kudos
uday_guntupalli
Community Trekker

Re: Concatenate Multiple tables with different column order

@Rajat
      I am making an assumption (I don't know how valid it is ) that you have the same number of rows. 

 

// Define Your Directory and a list to hold File Paths 
MyDirectory = "C:\" ;
nFiles = 5; 
MyFilePaths = {}; 
ColNamesListOfList = {};
OpenDts = {}; 

// Test if Directory exists and get all file paths
Try( If(Directory Exists(MyDirectory),
           for(i = 1 , i <= nFiles, i ++,
                FilePath = Pick File( "Select File", MyDirectory);
                Insert Into(MyFilePaths,FilePath); 
               );           
            ,          
            // else          
            Throw()         
        ),
       "Unable to Open File";
   ); 
   
// Get List of All Column Names 
for(i = 1, i <= nFiles, i++,
		OpenDts[i] = Open(MyDirectory || "\" || FilePath[i],"Invisible");
		ColNames = OpenDts[i] << Get Column Names("String"); 
		Insert Into(ColNamesListOfList,ColNames);
   );
   
// Find Common Column Names 
for(i = 1, i <= nFiles, i++,
		ThisList = ColNamesListOfList[i];
		ThisListAsAA = Associative Array(ThisList); 
		ThisListAsAAKeys =ThisListAsAA << Get Keys; 
		If(i == 1,
			CommonItems = ThisListAsAAKeys;
			,
			// else 
			CommonItems << Intersect(ThisListAsAAKeys);
		  );
   );

// Concatenate 
for(i = 1, i <= nFiles, i++,
	If(i == 1,
		dt_Results = OpenDts[i]; 
		, 
		dt_Results << Join(With( OpenDts[i]),
							Merge Same Name Columns,
							Select( :name, :age, :sex, :height, :weight ),
							By Row Number  // assuming that the tables have the same number of rows
						  );
	  );
   );	
Best
Uday
0 Kudos
Rajat
Contributor

Re: Concatenate Multiple tables with different column order

@uday_guntupalli
Thanks for your great help.

I have few comments

  • All tables have different number of rows.
  • I want to concatenate (number of rows in final table will be sum of number of rows of individual table) the tables not to join the tables. Will the concatenate loop works same?

Also, could you please help me understand the following commands.

  1. In loop for finding Common Column Names. Can you please explaing following lines.
    ThisListAsAA = Associative Array(ThisList); 
    ThisListAsAAKeys =ThisListAsAA << Get Keys; 
    	If(i == 1,
    		CommonItems = ThisListAsAAKeys;
    		,
    		// else 
    		CommonItems << Intersect(ThisListAsAAKeys);
    	  );
    As much as I understood in "ThisListAsAA" , "ThisList" will get stored as associative array. Please help to understand next two lines.
  2. In loop to join the tables, can you please explain following lines.
    Select( :name, :age, :sex, :height, :weight ),
    By Row Number
    There are no columns with columnnames :name, :age etc and manually we can't write to select columns as number of are huge. Also We can't join by row number as number of rows are different and we want to concatenate the table not join.

Thanks

0 Kudos
uday_guntupalli
Community Trekker

Re: Concatenate Multiple tables with different column order

@Rajat
          

ThisListAsAA = Associative Array(ThisList); 
ThisListAsAAKeys =ThisListAsAA << Get Keys; 
	If(i == 1,
		CommonItems = ThisListAsAAKeys;
		,
		// else 
		CommonItems << Intersect(ThisListAsAAKeys);
	  );

An associative array has a one on one relationship or mapping. So once we construct an associative array, the second line is attempting to extract the unique keys in the associative array map. As for the If block, think of set theory, when i = 1, we are assigning the unique keys of Associative Array 1 to the result. From there on, we are repeatedly interecting it with other Associative Arrays to finally arrive at a list that will only have elements that are common across all the associative arrays. 

 

Select( :name, :age, :sex, :height, :weight ),
By Row Number

This was a typo from my example. Select(CommonItems) would be appropriate. 

 

I don't know if Concatenate will work, you are welcome to try

Best
Uday
0 Kudos
Rajat
Contributor

Re: Concatenate Multiple tables with different column order

Thanks @uday_guntupalli
If we can reorder columns and make them have same order. After that we will concatenate all the tables.
Can you help me to write syntax to reorder the columns of all tables based on first table then we will use concatenate command in a loop. If its not possible can we save the original column order of first table, then reorder all tables with their name and then concatenate in a loop. Finally we will reorder the final table based on earlier saved order of columns.
@Craige_Hales @txnelson do you have any solution for this problem.

Thanks. Appreciate you help.
0 Kudos
uday_guntupalli
Community Trekker

Re: Concatenate Multiple tables with different column order

@Rajat
     

dt << Reorder By Name();
Best
Uday
0 Kudos
Rajat
Contributor

Re: Concatenate Multiple tables with different column order

@uday_guntupalli
Can help me to write the syntax to save the original order and reorder the final table with the saved order?
Thanks
0 Kudos
uday_guntupalli
Community Trekker

Re: Concatenate Multiple tables with different column order

@Rajat
      I don't understand why you would need to create a new table. I would do something like this. 

dt = Open( "$SAMPLE_DATA/Big Class.jmp" );
OriginalColOrder = dt << Get Column Names("string");
dt << Reorder By Name();
SortedColOrder = dt << Get Column Names("string");
Show(OriginalColOrder);
Show(SortedColOrder); 

     The lists provide you both the sorted order and original order. Also, please share what you have tried thus far, so it is easier to see where you are headed. I am unable to exactly understand what you have tried thus far and where you are failing. If you are unable to share your data, try to create some sample data, so it is easy for everyone to recreate the issues and help guide you. 

Best
Uday
0 Kudos