cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
CharlesIII
Level II

Splitting a data table by column id

Hello, 

I have a large data table that the columns have an ID number followed by words like below, 

1-Date1-parameter1-something2-something2-Date2-parameter3-parameter3-Date3-something3-else
          
          

 

My ultimate goal is to automate control charts with Date as the sample label. There are not uniform amounts of columns for each ID and they are not sorted by ascending date. 

 

My idea is to split the large data table into smaller ones by ID and then do control charts from those. However, I do not know how to do this split in an automated way. 

 

Thanks 

 

3 REPLIES 3
txnelson
Super User

Re: Splitting a data table by column id

Without more specifics, I can only give  an approximate direction on how to solve this.  I see 2 methods to go after what you need.  The first is to find out how many different IDs there are and place the column into the different lists for the IDs, doing some thing like this:

Names Default To Here( 1 );
dt = Current Data Table();

colNamesList = dt << get column names( string );

// Create an ID list
listOfIDs = {};

For( i = 1, i <= N Items( colNamesList ), i++,
	If( Contains( listOfIDs, Word( 1, Column( dt, colNamesList[i] ) << get name, "-" ) ) == 0, 
		// Insert into the listOfIDs, the new ID name
		Insert Into( listOfIDs, Word( 1, Column( dt, colNamesList[i] ) << get name, "-" ) );
		// Create a list to hold the column names for the new ID
		Eval( Parse( "listOfColumns_" || Char( N Items( listOfIDs ) ) || " = {};" ) );
		Eval(
			Parse(
				"insert into(listOfColumns_" || Char( N Items( listOfIDs ) ) || "," || Char(
					Column( dt, colNamesList[i] ) << get name
				) || ");"
			)
		);
	, 
		// If the ID has already been see, then just add the column name to the
		// ID list of column names
		Eval(
			Parse(
				"insert into(listOfColumns_" || Char(
					Contains( listOfIDs, Word( 1, Column( dt, colNamesList[i] ) << get name, "-" ) )
				) || "," || Char( Column( dt, colNamesList[i] ) ) << get name || ");"
			)
		)
	)
);

You can then loop through the listOfIDs and do your processing.

 

The other method would be to stack all of the columns, and then create a new column for ID, strip off the ID values from the labels, and then split the columns back into common columns and do your processing using a By statement to separate the different IDs.  Something like

Names Default To Here( 1 );
dt = Current Data Table();

colNamesList = dt << get column names( string );

// Stack the columns
dtStack = dt << Stack(
	columns( colNamesList ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Drop All Other Columns( 1 )
);

// Create the ID Column
dtStack << New Column("ID", character,formula( word(1,:label,"-")));

// Now delete the ID from the :Label column
For Each Row(
	:Label = trim(substr( :Label, contains(:Label,"-")+1));
);

// Split back to common columns
dtFinal = dtStack << Split(
	Split By( :Label ),
	Split( :Data ),
	Sort by Column Property
);

Now for my clarification.......as I stated, with your sparse details, I am making a lot of guesses on this.  Also, without a sample data table, the code I provided has not been tested, and I assume it would not work on first execution.  It is intended to show a posible direction you can take to solve the problem

Jim
gzmorgan0
Super User (Alumni)

Re: Splitting a data table by column id

Charles,

If your column has a regular series of 3 ( Date, Parameter Name, Data Values ), this can be done via the UI with Main Menu > Tables > Stack and select a multiple series stack, 3, and uncheck stack by row.  It takes a little clean up of the column names, but the resulting table can be used for control charts using Parameter Name, or Data Group as the By variable.

 

Or if you want separate tables, you can use Table Subset on this new table, check Subset By, then specify, Data Group.

 

If your table does not have a repeated series order then a script or a lot of manual work is needed to subset each group. Respond to this post if your data table has an irregular order of columns.

 

I attached a JSL script that should work for your table if it has the repeated series columns and all Dates and all Somethings are the same modeling type (likely numeric, continuous).

 

Names Default To Here(1);

dt = current data table();
cnames = dt << get column names("String");

stk = dt << Stack(
	columns(
	  evalList(cnames)
	),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Stack By Row( 0 ),
	Number of Series( 3 )
);

stk:Data   << Set Name("Date");
stk:Data 2 << Set Name("Parameter Name");
stk:Data 3 << Set Name("Data");
stk:Label  << set each value( trim(word(1,:Label,"-")));
stk:Label  << Set Name( "Data Group");
stk << delete columns(:Label 2, Label 3);

 

gzmorgan0
Super User (Alumni)

Re: Splitting a data table by column id

Jim's solution is the more general solution I mentioned,  if your table does not have a regular pattern of columns. Jum's response only appeared after I hit Post.