cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Browse apps to extend the software in the new JMP Marketplace
Choose Language Hide Translation Bar
Ese_Omatsone
Level I

How to combine data measured in different units

I have a JMP Table that contains data measured hourly at about 50 different stations.  At each station the data is sometimes reported in ppm, other times in ppb, occasionally in both units and occasionally, there are no measurements at all.  It's a pretty large dataset and I'm wondering - how do I automate combining the columns that pertain to each station into a single column in one system of units.  Or do I have to manually create a new column for each station and enter a formula to do the conversion? I'm just trying to speed up this data prep step without using Excel.

2 ACCEPTED SOLUTIONS

Accepted Solutions
ih
Super User (Alumni) ih
Super User (Alumni)

Re: How to combine data measured in different units

No doubt there are many ways to do this, here is the first that came to mind assuming the column format in your example is consistent. You can either run the script at the bottom of the post to see a full example or follow along with the green comments in the script and the screenshots below:

 

Bring all station data into a single column using Stack:

ih_0-1620684493672.png

 

Formula for Station (shown both in the 'pretty' format and the code you get when double clicking on the formula:

ih_2-1620684601350.png

Formula for Unit is similar:

ih_3-1620684619132.png

 

Formula for Data accounts for each potential unit:

ih_5-1620684651969.png

Tabulate relies on the ordinal or nominal data type for Intervals and Station:

ih_6-1620684698459.png

Then use Make Into Data table from the tabulate menu.

 

Script to recreate an example:

//To see a full example you can copy this this into a new script window and press Run, or just following along
//with the green comments

Names default to here(1);

//Recreate part of your sample data table, you could skip this part and open your original data instead
//Note that the times are ordinal or nominal data types, this will be imporant in the tabulate platform later
dt = New Table( "Raw Data",
	Add Rows( 2 ),
	New Column( "IntervalStart", Numeric, "Ordinal", Format( "m/d/y h:m", 19 ), Input Format( "m/d/y" ),
		Set Values( [3503606400, 3503610000] ),
	),
	New Column( "IntervalEnd", Numeric, "Ordinal", Format( "m/d/y h:m", 19 ), Input Format( "m/d/y h:m" ),
		Set Values( [3503609940, 3503613540] )
	),
	New Column( "StationName: Anzac-Unit:ppb", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2] ) ),
	New Column( "StationName: Anzac-Unit:ppm", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [., .] ) ),
	New Column( "StationName: Ardrossan-Unit: ppb", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [., .] ) ),
	New Column( "StationName: Ardrossan-Unit: ppm", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [11, 12] ) )
);

//Get a reference to your current data table
dt = Current data table();

//Stack all of the columns - use Tables > Stack and include all of the columns except the first two which are times
dtStacked = dt << Stack(
	columns(
		(dt << Get Column Names())[3::n col(dt)]
	),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" )
);

//Now add three new columns to find the station and units, and calculate the concentration based on the units 
//you should be able to copy and paste the part inside the 'Formula' function below into the formulas of your new columns
dtStacked << New Column("Station", Character, "Nominal", Formula(Words(:Label, "-")[1]));
dtStacked << New Column("Unit", Character, "Nominal", Formula(Words(:Label, "-")[2]), Set Display Width(65));
dtStacked << New Column("Data (ppm)", Numeric, "Continuous", Format("Best", 12), Formula(If(Contains(:Unit, "ppm"), :Data, Contains(:Unit, "ppb"), :Data / 1000)));

//Put back in standard format using the tabulate platform.  Drag times and the station into 
//columns on the left and add the mean and N as data fields.  Note that in the resulting table anything 
//with a N(Data) greater than 1 might have a problem!
(dtStacked << Tabulate(
	Show Control Panel( 0 ),
	Add Table(
		Column Table( Analysis Columns( :Data ), Statistics( Mean, N ) ),
		Row Table( Grouping Columns( :IntervalStart, :IntervalEnd, :Station ) )
	)
)) << Make Into Data Table;

 

View solution in original post

txnelson
Super User

Re: How to combine data measured in different units

Here is a script that given the structure you showed, will give convert all of the data to ppm and delete the ppb columns

It is just the starting point, and if more specifics change the calculations, they can be included in the logic

Names Default To Here( 1 );
dt = Current Data Table();
For( i = 1, i <= N Cols( dt ), i++,
	If( Trim( Word( -1, Column( dt, i ) << get name, ":" ) ) == "ppm",
		ppbCol = Trim( Substr( Column( dt, i ) << get name, 1, Contains( Column( dt, i ) << get name, "ppm" ) - 1 ) || "ppb" );
		For( k = 1, k <= N Rows( dt ), k++,
			Column( dt, i )[k] = Sum( Column( dt, i )[k], Column( dt, ppbCol )[k]/1000 )
		);
	)
);

// Get rid of ppb columns
For( i = N Cols( dt ), i >= 1, i--,
	If( Trim( Word( -1, Column( dt, i ) << get name, ":" ) ) == "ppb",
		dt << delete columns( i )
	)
);
Jim

View solution in original post

9 REPLIES 9
txnelson
Super User

Re: How to combine data measured in different units

A very simple script can be used to change the values directly in the column it came in.  Or you can use the Recode tool to do the same, in an interactive format.  How do you determine what the units of measurement are?  Is there a separate column that indicates that?

Jim
Ese_Omatsone
Level I

Re: How to combine data measured in different units

Hi Jim,

I posted a picture of that the data looks like, in answer to your question.  Please tell me the steps to change the data in place.

Thank you.

dale_lehman
Level VII

Re: How to combine data measured in different units

Does the data appear as numeric (without the units - or units in a separate field) or as character data where the units appear with the number (e.g., 13ppm)?  It would be easier if you can provide an example of how the data appears.  A formula should make the conversion you want, but exactly what formula depends on what the data looks like.

Ese_Omatsone
Level I

Re: How to combine data measured in different units

Capture_2.PNG

Ese_Omatsone
Level I

Re: How to combine data measured in different units

Hi,

 

This is what the data looks like.  The column name says the station and the units for the measurement.

Thank you kindly for the help.

ih
Super User (Alumni) ih
Super User (Alumni)

Re: How to combine data measured in different units

No doubt there are many ways to do this, here is the first that came to mind assuming the column format in your example is consistent. You can either run the script at the bottom of the post to see a full example or follow along with the green comments in the script and the screenshots below:

 

Bring all station data into a single column using Stack:

ih_0-1620684493672.png

 

Formula for Station (shown both in the 'pretty' format and the code you get when double clicking on the formula:

ih_2-1620684601350.png

Formula for Unit is similar:

ih_3-1620684619132.png

 

Formula for Data accounts for each potential unit:

ih_5-1620684651969.png

Tabulate relies on the ordinal or nominal data type for Intervals and Station:

ih_6-1620684698459.png

Then use Make Into Data table from the tabulate menu.

 

Script to recreate an example:

//To see a full example you can copy this this into a new script window and press Run, or just following along
//with the green comments

Names default to here(1);

//Recreate part of your sample data table, you could skip this part and open your original data instead
//Note that the times are ordinal or nominal data types, this will be imporant in the tabulate platform later
dt = New Table( "Raw Data",
	Add Rows( 2 ),
	New Column( "IntervalStart", Numeric, "Ordinal", Format( "m/d/y h:m", 19 ), Input Format( "m/d/y" ),
		Set Values( [3503606400, 3503610000] ),
	),
	New Column( "IntervalEnd", Numeric, "Ordinal", Format( "m/d/y h:m", 19 ), Input Format( "m/d/y h:m" ),
		Set Values( [3503609940, 3503613540] )
	),
	New Column( "StationName: Anzac-Unit:ppb", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [1, 2] ) ),
	New Column( "StationName: Anzac-Unit:ppm", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [., .] ) ),
	New Column( "StationName: Ardrossan-Unit: ppb", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [., .] ) ),
	New Column( "StationName: Ardrossan-Unit: ppm", Numeric, "Continuous", Format( "Best", 12 ), Set Values( [11, 12] ) )
);

//Get a reference to your current data table
dt = Current data table();

//Stack all of the columns - use Tables > Stack and include all of the columns except the first two which are times
dtStacked = dt << Stack(
	columns(
		(dt << Get Column Names())[3::n col(dt)]
	),
	Source Label Column( "Label" ),
	Stacked Data Column( "Data" )
);

//Now add three new columns to find the station and units, and calculate the concentration based on the units 
//you should be able to copy and paste the part inside the 'Formula' function below into the formulas of your new columns
dtStacked << New Column("Station", Character, "Nominal", Formula(Words(:Label, "-")[1]));
dtStacked << New Column("Unit", Character, "Nominal", Formula(Words(:Label, "-")[2]), Set Display Width(65));
dtStacked << New Column("Data (ppm)", Numeric, "Continuous", Format("Best", 12), Formula(If(Contains(:Unit, "ppm"), :Data, Contains(:Unit, "ppb"), :Data / 1000)));

//Put back in standard format using the tabulate platform.  Drag times and the station into 
//columns on the left and add the mean and N as data fields.  Note that in the resulting table anything 
//with a N(Data) greater than 1 might have a problem!
(dtStacked << Tabulate(
	Show Control Panel( 0 ),
	Add Table(
		Column Table( Analysis Columns( :Data ), Statistics( Mean, N ) ),
		Row Table( Grouping Columns( :IntervalStart, :IntervalEnd, :Station ) )
	)
)) << Make Into Data Table;

 

Ese_Omatsone
Level I

Re: How to combine data measured in different units

THANK YOU.  IT WORKED!  Everything is now in ppm, and I can continue with my analysis.

txnelson
Super User

Re: How to combine data measured in different units

Here is a script that given the structure you showed, will give convert all of the data to ppm and delete the ppb columns

It is just the starting point, and if more specifics change the calculations, they can be included in the logic

Names Default To Here( 1 );
dt = Current Data Table();
For( i = 1, i <= N Cols( dt ), i++,
	If( Trim( Word( -1, Column( dt, i ) << get name, ":" ) ) == "ppm",
		ppbCol = Trim( Substr( Column( dt, i ) << get name, 1, Contains( Column( dt, i ) << get name, "ppm" ) - 1 ) || "ppb" );
		For( k = 1, k <= N Rows( dt ), k++,
			Column( dt, i )[k] = Sum( Column( dt, i )[k], Column( dt, ppbCol )[k]/1000 )
		);
	)
);

// Get rid of ppb columns
For( i = N Cols( dt ), i >= 1, i--,
	If( Trim( Word( -1, Column( dt, i ) << get name, ":" ) ) == "ppb",
		dt << delete columns( i )
	)
);
Jim
Ese_Omatsone
Level I

Re: How to combine data measured in different units

Thank you kindly!