Subscribe Bookmark RSS Feed

Help with binning transcript lengths with specific ranges in JMP

johnny-weaver

New Contributor

Joined:

Sep 17, 2017

I have around ~60,000 transcript lengths ranging in length from 200 to 10,000 that I want to graph their length distributions using binning (<250, 250 - 500, 500 - 1000, 1000 - 1500, 1500 - 2000, 2000 - 2500, 2500 - 3000, >3000) with the last bin containing all lengths >3000. So far I have not been able to figure this out (new to JMP). I have all the transcripts in one column, one value per row.

 

Thank you,

 

Johnny

1 ACCEPTED SOLUTION

Accepted Solutions
txnelson

Super User

Joined:

Jun 22, 2012

Solution

The way to do this is to simply create a new column and use a fairly simple formula to create the bins.  Here is a script that does what you want, but you could easily create this interactively by creating the new column and then using the formula editor to create the code for the formula

Names Default To Here( 1 );
// Create an example data table
dt = New Table( "Bin",
	add rows( 60000 ),
	New Column( "value", numeric, formula( Random Normal( 1500, 600 ) ) )
);
// Make sure all 60,000 get processed before proceeding
wait(2);
// Remove the formula to make the values "real" not virtual values
dt:value << delete property( "formula" );

// Create a new column with a binning formula you want
dt << New Column( "Bin",
	character,
	formula(
		If(
			:value < 250, "<250",
			:value < 500, "250-499",
			:value > 3000, ">3000",
			Char( 500 * Floor( :value / 500 ) ) || "-" || Char(
				500 * Ceiling( :value / 500 ) - 1
			)
		)
	)
);

// Since the bin values will not display alphabetically, set the Value Ordering
// column property to make sure the ordering will display properly
dt:bin << set property(
	"value ordering",
	{"<250", "250-499", "500-999", "1000-1499", "1500-1999", "2000-2499", "2500-2999", ">3000"}
);

P.S.

I believe that you will find that your comment "Any idea how to do this if even possible?" will become a non question, or at least it will be modified to just the first part of the question, "Any idea how to do this?".  Using JMP and JSL most anything you need can be found or created.

Jim
2 REPLIES
txnelson

Super User

Joined:

Jun 22, 2012

Solution

The way to do this is to simply create a new column and use a fairly simple formula to create the bins.  Here is a script that does what you want, but you could easily create this interactively by creating the new column and then using the formula editor to create the code for the formula

Names Default To Here( 1 );
// Create an example data table
dt = New Table( "Bin",
	add rows( 60000 ),
	New Column( "value", numeric, formula( Random Normal( 1500, 600 ) ) )
);
// Make sure all 60,000 get processed before proceeding
wait(2);
// Remove the formula to make the values "real" not virtual values
dt:value << delete property( "formula" );

// Create a new column with a binning formula you want
dt << New Column( "Bin",
	character,
	formula(
		If(
			:value < 250, "<250",
			:value < 500, "250-499",
			:value > 3000, ">3000",
			Char( 500 * Floor( :value / 500 ) ) || "-" || Char(
				500 * Ceiling( :value / 500 ) - 1
			)
		)
	)
);

// Since the bin values will not display alphabetically, set the Value Ordering
// column property to make sure the ordering will display properly
dt:bin << set property(
	"value ordering",
	{"<250", "250-499", "500-999", "1000-1499", "1500-1999", "2000-2499", "2500-2999", ">3000"}
);

P.S.

I believe that you will find that your comment "Any idea how to do this if even possible?" will become a non question, or at least it will be modified to just the first part of the question, "Any idea how to do this?".  Using JMP and JSL most anything you need can be found or created.

Jim
johnny-weaver

New Contributor

Joined:

Sep 17, 2017

Thank you for this script, it helps a lot. I had no idea you could script in JMP.

 

Johnny