cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
WernerL
Level III

Recoding column with both char and num

I am working on a dataset with > 1M rows, with a column 'Reported Results'.

In this column, approx. 3000 rows contain a '<' or '>' (always followed by a value, e.g. <100), all other rows are numeric. When importing the datafile, JMP marked the Data Type for this column as Character.

I tried converting the 5000 rows, containing a '<' or '>', by using Column Recode. I saved the script to File (see attached).

The new column, after recoding, maintains Data Type Character. When adjusting to Numeric, I obtain meaningless data.

I also tried: (a) first resetting my column 'Reported Results' to Numeric, and also (b) 'parse as numbers' in recode, but both alternatives didn't work either.

Using JMP 17.2.0

9 REPLIES 9
jthi
Super User

Re: Recoding column with both char and num

Does the data look correct in Character format after recoding?

 

Edit:

Recode seems to do what it is supposed to?

View more...
Names Default To Here(1);

vals = {",000000", "0", ",0000000000", "0", ",050000", "0.05", ",100000", "0.1", ",150000", "0.15", ",300000", "0.3", ",350000", "0.35", ",400000",
"0.4", ",500000", "0.5", ",600000", "0.6", ",650000", "0.65", ",800000", "0.8", "-", "Exclude", "-,350000", "Exclude", "-0,1", "Exclude", "-0,3",
"Exclude", "-0,7", "Exclude", "-0,8", "Exclude", "-0,9", "Exclude", "-1,0", "Exclude", "-1,1", "Exclude", "-1,3", "Exclude", "-1,4", "Exclude",
"-2,0", "Exclude", "-2,1", "Exclude", "-2,2", "Exclude", "< 0", "0", "< 0,06", "0.025", "< 0,17", "0.08", "< 0,25", "0.12", "< 0,50", "0.245",
"< 0,64", "0.315", "< 1", "0.5", "< 10", "4.5", "< 100", "45", "< 1000", "450", "< 110", "50", "< 120", "55", "< 1300", "585", "< 15", "7", "< 150",
"70", "< 160", "75", "< 170", "80", "< 180", "85", "< 190", "90", "< 2", "0.95", "< 20", "9.5", "< 220", "105", "< 25", "12", "< 290", "135", "< 3",
"1.45", "< 30", "14.5", "< 300", "135", "< 3000", "1350", "< 310", "140", "< 320", "145", "< 35", "17", "< 4", "1.95", "< 40", "19.5", "< 400", "180",
"< 45", "22", "< 5", "2.45", "< 50", "24.5", "< 55", "27", "< 590", "290", "< 60", "29.5", "< 6100", "2550", "< 620", "290", "< 65", "32", "< 650",
"295", "< 6500", "2750", "< 660", "300", "< 6600", "2800", "< 670", "305", "< 6700", "2850", "< 680", "310", "< 690", "315", "< 70", "34.5", "< 700",
"315", "< 710", "320", "<,166666666666666000", "0.075", "<0", "0", "<1", "0.5", "<1,0", "0.5", "<1,000000000000", "0.5", "<1,3", "0.6", "<10", "4.5",
"<10,000000000000", "4.5", "<100", "45", "<1000", "450", "<10000", "4500", "<110", "50", "<120", "55", "<15", "7", "<160", "75", "<2", "0.95", "<20",
"9.5", "<200", "95", "<25", "12", "<3,23", "1.61", "<35", "17", "<4", "1.95", "<4,05", "2.02", "<45", "22", "<5", "2.45", "<55", "27", "<60", "29.5",
"<640", "290", "<65", "32", "<650", "295", "<75", "37", "<80", "39.5", "> 10", "45", "> 100", "450", "> 10000", "45000", "> 1100", "5550", "> 11000",
"55500", "> 120", "560", "> 1200", "5600", "> 12000", "56000", "> 120000", "460000", "> 12000000", "56000000", "> 13000", "56500", "> 150", "575",
"> 1500", "5750", "> 15000", "57500", "> 150000", "575000", "> 1500000", "5750000", "> 15000000", "57500000", "> 1600", "5800", "> 16000", "58000",
"> 170", "585", "> 1800", "5900", "> 18000", "59000", "> 190", "595", "> 200", "600", "> 2000", "6000", "> 20000", "60000", "> 2100000", "6050000",
"> 2200", "6100", "> 2300", "6150", "> 230000", "615000", "> 260", "630", "> 2600", "6300", "> 26000", "63000", "> 270", "635", "> 2700", "6350",
"> 27000", "63500", "> 270000", "635000", "> 2700000", "635000", "> 300", "650", "> 3000", "6500", "> 30000", "65000", "> 360", "680", "> 37000",
"68500", "> 370000", "685000", "> 3700000", "6850000", "> 37000000", "68500000", "> 400", "700", "> 4000", "7000", "> 420", "710", "> 430", "715",
"> 450", "725", "> 4500", "7250", "> 4500000", "7250000", "> 460", "730", "> 540", "770", "> 60", "80", "> 600", "800", "> 6000", "8000", "> 80",
"90", "> 900", "850", "> 9000", "9500", "> 90000", "95000", "> 9300", "9650", ">10000", "45000", ">100000", "450000", ">1000000", "4500000", ">11000",
"55500", ">150", "575", ">1500", "5750", ">15000", "57500", ">150000", "575000", ">1500000", "5750000", ">15000000", "57500000", ">150000000",
"575000000", ">1600", "5800", ">16000", "58000", ">160000", "580000", ">1600000", "5800000", ">1800", "5900", ">180000", "590000", ">1900", "5950",
">19000", "59500", ">20000", "60000", ">2200", "6100", ">24000", "62000", ">26000", "63000", ">260000", "630000", ">27000", "63500", ">300", "650",
">3000", "6500", ">30000", "65000", ">300000", "650000", ">3000000", "6500000", ">30000000", "65000000", ">300000000", "650000000", ">3000000000",
"6500000000", ">30000000000", "65000000000", ">310000", "655000", ">3100000", "6550000", ">360", "680", ">37000", "68500", ">370000", "685000",
">3700000", "6850000", ">37000000", "68500000", ">3700000000", "6850000000", ">3900", "6950", ">440", "720", ">4400000", "7200000", ">450", "725",
">4500", "7250", ">45000", "72500", ">500000000", "750000000", ">540", "770", ">54000", "77000", ">550000", "775000", ">580000", "790000", ">7600000",
"8800000", ">80", "90", ">8400", "8200", ">900", "850", ">9000", "9500", ">93000000", "96500000"};

char_vals = vals[1 :: N Items(vals) :: 2];

dt = New Table("Test", New Column("Reported result", Character, Nominal, Values(char_vals)));

Local({dt, col1},
	dt = Data Table("Test");
	dt << Begin Data Update;
	col1 = dt << New Column(dt:Reported result);
	col1 << Set Name("Result (after corr.Censor.)");
	dt << Move Selected Columns({col1}, after(dt:Reported result));
	dt << Recode Column(
		dt:Reported result,
		{Map Value(
			_rcOrig,
			{",000000", "0", ",0000000000", "0", ",050000", "0.05", ",100000", "0.1", ",150000", "0.15", ",300000", "0.3", ",350000", "0.35",
			",400000", "0.4", ",500000", "0.5", ",600000", "0.6", ",650000", "0.65", ",800000", "0.8", "-", "Exclude", "-,350000", "Exclude", "-0,1",
			"Exclude", "-0,3", "Exclude", "-0,7", "Exclude", "-0,8", "Exclude", "-0,9", "Exclude", "-1,0", "Exclude", "-1,1", "Exclude", "-1,3",
			"Exclude", "-1,4", "Exclude", "-2,0", "Exclude", "-2,1", "Exclude", "-2,2", "Exclude", "< 0", "0", "< 0,06", "0.025", "< 0,17", "0.08",
			"< 0,25", "0.12", "< 0,50", "0.245", "< 0,64", "0.315", "< 1", "0.5", "< 10", "4.5", "< 100", "45", "< 1000", "450", "< 110", "50",
			"< 120", "55", "< 1300", "585", "< 15", "7", "< 150", "70", "< 160", "75", "< 170", "80", "< 180", "85", "< 190", "90", "< 2", "0.95",
			"< 20", "9.5", "< 220", "105", "< 25", "12", "< 290", "135", "< 3", "1.45", "< 30", "14.5", "< 300", "135", "< 3000", "1350", "< 310",
			"140", "< 320", "145", "< 35", "17", "< 4", "1.95", "< 40", "19.5", "< 400", "180", "< 45", "22", "< 5", "2.45", "< 50", "24.5", "< 55",
			"27", "< 590", "290", "< 60", "29.5", "< 6100", "2550", "< 620", "290", "< 65", "32", "< 650", "295", "< 6500", "2750", "< 660", "300",
			"< 6600", "2800", "< 670", "305", "< 6700", "2850", "< 680", "310", "< 690", "315", "< 70", "34.5", "< 700", "315", "< 710", "320",
			"<,166666666666666000", "0.075", "<0", "0", "<1", "0.5", "<1,0", "0.5", "<1,000000000000", "0.5", "<1,3", "0.6", "<10", "4.5",
			"<10,000000000000", "4.5", "<100", "45", "<1000", "450", "<10000", "4500", "<110", "50", "<120", "55", "<15", "7", "<160", "75", "<2",
			"0.95", "<20", "9.5", "<200", "95", "<25", "12", "<3,23", "1.61", "<35", "17", "<4", "1.95", "<4,05", "2.02", "<45", "22", "<5", "2.45",
			"<55", "27", "<60", "29.5", "<640", "290", "<65", "32", "<650", "295", "<75", "37", "<80", "39.5", "> 10", "45", "> 100", "450",
			"> 10000", "45000", "> 1100", "5550", "> 11000", "55500", "> 120", "560", "> 1200", "5600", "> 12000", "56000", "> 120000", "460000",
			"> 12000000", "56000000", "> 13000", "56500", "> 150", "575", "> 1500", "5750", "> 15000", "57500", "> 150000", "575000", "> 1500000",
			"5750000", "> 15000000", "57500000", "> 1600", "5800", "> 16000", "58000", "> 170", "585", "> 1800", "5900", "> 18000", "59000", "> 190",
			"595", "> 200", "600", "> 2000", "6000", "> 20000", "60000", "> 2100000", "6050000", "> 2200", "6100", "> 2300", "6150", "> 230000",
			"615000", "> 260", "630", "> 2600", "6300", "> 26000", "63000", "> 270", "635", "> 2700", "6350", "> 27000", "63500", "> 270000",
			"635000", "> 2700000", "635000", "> 300", "650", "> 3000", "6500", "> 30000", "65000", "> 360", "680", "> 37000", "68500", "> 370000",
			"685000", "> 3700000", "6850000", "> 37000000", "68500000", "> 400", "700", "> 4000", "7000", "> 420", "710", "> 430", "715", "> 450",
			"725", "> 4500", "7250", "> 4500000", "7250000", "> 460", "730", "> 540", "770", "> 60", "80", "> 600", "800", "> 6000", "8000", "> 80",
			"90", "> 900", "850", "> 9000", "9500", "> 90000", "95000", "> 9300", "9650", ">10000", "45000", ">100000", "450000", ">1000000",
			"4500000", ">11000", "55500", ">150", "575", ">1500", "5750", ">15000", "57500", ">150000", "575000", ">1500000", "5750000", ">15000000",
			"57500000", ">150000000", "575000000", ">1600", "5800", ">16000", "58000", ">160000", "580000", ">1600000", "5800000", ">1800", "5900",
			">180000", "590000", ">1900", "5950", ">19000", "59500", ">20000", "60000", ">2200", "6100", ">24000", "62000", ">26000", "63000",
			">260000", "630000", ">27000", "63500", ">300", "650", ">3000", "6500", ">30000", "65000", ">300000", "650000", ">3000000", "6500000",
			">30000000", "65000000", ">300000000", "650000000", ">3000000000", "6500000000", ">30000000000", "65000000000", ">310000", "655000",
			">3100000", "6550000", ">360", "680", ">37000", "68500", ">370000", "685000", ">3700000", "6850000", ">37000000", "68500000",
			">3700000000", "6850000000", ">3900", "6950", ">440", "720", ">4400000", "7200000", ">450", "725", ">4500", "7250", ">45000", "72500",
			">500000000", "750000000", ">540", "770", ">54000", "77000", ">550000", "775000", ">580000", "790000", ">7600000", "8800000", ">80", "90",
			">8400", "8200", ">900", "850", ">9000", "9500", ">93000000", "96500000"},
			Unmatched(_rcNow)
		)},
		Update Properties(1),
		Target Column(col1)
	);
	dt << End Data Update;
);

Wait(2); // for demo purposes

Column(dt, "Result (after corr.Censor.)") << Set Data Type("Numeric") << Set Modeling Type("Continuous");

jthi_0-1695823763672.png

jthi_1-1695823771533.png

 

-Jarmo
WernerL
Level III

Re: Recoding column with both char and num

Correct, after recoding, the data looks correct in Character format. (When changing this to Numeric the censored data (< or >) jumps into billions and the previously Numeric data is gone/ empty cell.)

jthi
Super User

Re: Recoding column with both char and num

Based on the code you provided ">7600000" is supposed to change to "8800000" and when that is converted to number it seems to be ok (row 229).

jthi_0-1695824252921.png

How are you trying to recode those values which are below or higher than something else?

-Jarmo
WernerL
Level III

Re: Recoding column with both char and num

Thanks for your swift responses Jarmo!

I am not entirely sure I understand your question " How are you trying to recode those values which are below or higher than something else? ".

In Recode, I specified manually for each 'Old value' (censored), in the parallel column 'New Value', a new Numeric value, as per your test above (unfortunately, it appears to be interpreted as Char)

 

 

 

jthi
Super User

Re: Recoding column with both char and num

You can change it to numeric / continuous after recoding. Or you can first create the correct recode and then parse as numbers

jthi_0-1695825807454.png

Values which could be converted to numbers were (modeling type is still Nominal and you might have to change it to Continuous)

jthi_1-1695825822453.png

 

-Jarmo
WernerL
Level III

Re: Recoding column with both char and num

I understand you are proposing two different solution:

 

(a) change to numeric / continuous after recoding.

Unfortunately, this is exactly my problem. See my comment higher <When I do this after recoding, the data looks correct in Character format. (When changing this to Numeric the censored data (< or >) jumps into billions and the previously Numeric data is gone/ empty cell.)>

 

(b) Or you can first create the correct recode and then parse as numbers:

When I try to 'Edit with Recode' (or even when I try to rerun the Script, I get the following error:

WernerL_0-1695827156994.png

 

 

WernerL
Level III

Re: Recoding column with both char and num

Morning Jarmo,

 

Thank you for confirming the recode seems to work properly.

After recoding, the data looks correct in Character format. However, when changing this to Numeric/ Continuous the results is meaningless data: censored data (< or >) jumps into billions and the previously Numeric data is gone/ empty cell.

Any suggestion what could trigger this and how I could change to Numeric/ Continuous properly?

Thank you

 

jthi
Super User

Re: Recoding column with both char and num

Decimal separator or thousand separator could maybe cause some issues. Without having working examples where this occurs, it is fairly difficult to debug as my example seems to work fine.

-Jarmo
WernerL
Level III

Re: Recoding column with both char and num

Thank you for your efforts. I will log a request with JMP Support exploring if they know more.