Splitting a data table by column id

Report Inappropriate Content · Jul 17, 2018 04:23 PM

Hello,

I have a large data table that the columns have an ID number followed by words like below,

1-Date	1-parameter	1-something	2-something	2-Date	2-parameter	3-parameter	3-Date	3-something	3-else

My ultimate goal is to automate control charts with Date as the sample label. There are not uniform amounts of columns for each ID and they are not sorted by ascending date.

My idea is to split the large data table into smaller ones by ID and then do control charts from those. However, I do not know how to do this split in an automated way.

Thanks

txnelson · Jul 17, 2018 06:46 PM

Without more specifics, I can only give an approximate direction on how to solve this. I see 2 methods to go after what you need. The first is to find out how many different IDs there are and place the column into the different lists for the IDs, doing some thing like this:

Names Default To Here( 1 );
dt = Current Data Table();

colNamesList = dt << get column names( string );

// Create an ID list
listOfIDs = {};

For( i = 1, i <= N Items( colNamesList ), i++,
	If( Contains( listOfIDs, Word( 1, Column( dt, colNamesList[i] ) << get name, "-" ) ) == 0, 
		// Insert into the listOfIDs, the new ID name
		Insert Into( listOfIDs, Word( 1, Column( dt, colNamesList[i] ) << get name, "-" ) );
		// Create a list to hold the column names for the new ID
		Eval( Parse( "listOfColumns_" || Char( N Items( listOfIDs ) ) || " = {};" ) );
		Eval(
			Parse(
				"insert into(listOfColumns_" || Char( N Items( listOfIDs ) ) || "," || Char(
					Column( dt, colNamesList[i] ) << get name
				) || ");"
			)
		);
	, 
		// If the ID has already been see, then just add the column name to the
		// ID list of column names
		Eval(
			Parse(
				"insert into(listOfColumns_" || Char(
					Contains( listOfIDs, Word( 1, Column( dt, colNamesList[i] ) << get name, "-" ) )
				) || "," || Char( Column( dt, colNamesList[i] ) ) << get name || ");"
			)
		)
	)
);

You can then loop through the listOfIDs and do your processing.

The other method would be to stack all of the columns, and then create a new column for ID, strip off the ID values from the labels, and then split the columns back into common columns and do your processing using a By statement to separate the different IDs. Something like

Names Default To Here( 1 );
dt = Current Data Table();

colNamesList = dt << get column names( string );

// Stack the columns
dtStack = dt << Stack(
	columns( colNamesList ),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Drop All Other Columns( 1 )
);

// Create the ID Column
dtStack << New Column("ID", character,formula( word(1,:label,"-")));

// Now delete the ID from the :Label column
For Each Row(
	:Label = trim(substr( :Label, contains(:Label,"-")+1));
);

// Split back to common columns
dtFinal = dtStack << Split(
	Split By( :Label ),
	Split( :Data ),
	Sort by Column Property
);

Now for my clarification.......as I stated, with your sparse details, I am making a lot of guesses on this. Also, without a sample data table, the code I provided has not been tested, and I assume it would not work on first execution. It is intended to show a posible direction you can take to solve the problem

Jim

gzmorgan0 · Jul 17, 2018 06:49 PM

Charles,

If your column has a regular series of 3 ( Date, Parameter Name, Data Values ), this can be done via the UI with Main Menu > Tables > Stack and select a multiple series stack, 3, and uncheck stack by row. It takes a little clean up of the column names, but the resulting table can be used for control charts using Parameter Name, or Data Group as the By variable.

Or if you want separate tables, you can use Table Subset on this new table, check Subset By, then specify, Data Group.

If your table does not have a repeated series order then a script or a lot of manual work is needed to subset each group. Respond to this post if your data table has an irregular order of columns.

I attached a JSL script that should work for your table if it has the repeated series columns and all Dates and all Somethings are the same modeling type (likely numeric, continuous).

Names Default To Here(1);

dt = current data table();
cnames = dt << get column names("String");

stk = dt << Stack(
	columns(
	  evalList(cnames)
	),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" ),
	Stack By Row( 0 ),
	Number of Series( 3 )
);

stk:Data   << Set Name("Date");
stk:Data 2 << Set Name("Parameter Name");
stk:Data 3 << Set Name("Data");
stk:Label  << set each value( trim(word(1,:Label,"-")));
stk:Label  << Set Name( "Data Group");
stk << delete columns(:Label 2, Label 3);

gzmorgan0 · Jul 17, 2018 06:54 PM

Jim's solution is the more general solution I mentioned, if your table does not have a regular pattern of columns. Jum's response only appeared after I hit Post.

Splitting a data table by column id

Re: Splitting a data table by column id

Re: Splitting a data table by column id

Re: Splitting a data table by column id

Recommended Articles

Get Going with JMP: Essentials for Using JMP

Multiple-Group Analysis in Structural Equation Modeling