- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Recoding column with both char and num
I am working on a dataset with > 1M rows, with a column 'Reported Results'.
In this column, approx. 3000 rows contain a '<' or '>' (always followed by a value, e.g. <100), all other rows are numeric. When importing the datafile, JMP marked the Data Type for this column as Character.
I tried converting the 5000 rows, containing a '<' or '>', by using Column Recode. I saved the script to File (see attached).
The new column, after recoding, maintains Data Type Character. When adjusting to Numeric, I obtain meaningless data.
I also tried: (a) first resetting my column 'Reported Results' to Numeric, and also (b) 'parse as numbers' in recode, but both alternatives didn't work either.
Using JMP 17.2.0
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
Does the data look correct in Character format after recoding?
Edit:
Recode seems to do what it is supposed to?
Names Default To Here(1);
vals = {",000000", "0", ",0000000000", "0", ",050000", "0.05", ",100000", "0.1", ",150000", "0.15", ",300000", "0.3", ",350000", "0.35", ",400000",
"0.4", ",500000", "0.5", ",600000", "0.6", ",650000", "0.65", ",800000", "0.8", "-", "Exclude", "-,350000", "Exclude", "-0,1", "Exclude", "-0,3",
"Exclude", "-0,7", "Exclude", "-0,8", "Exclude", "-0,9", "Exclude", "-1,0", "Exclude", "-1,1", "Exclude", "-1,3", "Exclude", "-1,4", "Exclude",
"-2,0", "Exclude", "-2,1", "Exclude", "-2,2", "Exclude", "< 0", "0", "< 0,06", "0.025", "< 0,17", "0.08", "< 0,25", "0.12", "< 0,50", "0.245",
"< 0,64", "0.315", "< 1", "0.5", "< 10", "4.5", "< 100", "45", "< 1000", "450", "< 110", "50", "< 120", "55", "< 1300", "585", "< 15", "7", "< 150",
"70", "< 160", "75", "< 170", "80", "< 180", "85", "< 190", "90", "< 2", "0.95", "< 20", "9.5", "< 220", "105", "< 25", "12", "< 290", "135", "< 3",
"1.45", "< 30", "14.5", "< 300", "135", "< 3000", "1350", "< 310", "140", "< 320", "145", "< 35", "17", "< 4", "1.95", "< 40", "19.5", "< 400", "180",
"< 45", "22", "< 5", "2.45", "< 50", "24.5", "< 55", "27", "< 590", "290", "< 60", "29.5", "< 6100", "2550", "< 620", "290", "< 65", "32", "< 650",
"295", "< 6500", "2750", "< 660", "300", "< 6600", "2800", "< 670", "305", "< 6700", "2850", "< 680", "310", "< 690", "315", "< 70", "34.5", "< 700",
"315", "< 710", "320", "<,166666666666666000", "0.075", "<0", "0", "<1", "0.5", "<1,0", "0.5", "<1,000000000000", "0.5", "<1,3", "0.6", "<10", "4.5",
"<10,000000000000", "4.5", "<100", "45", "<1000", "450", "<10000", "4500", "<110", "50", "<120", "55", "<15", "7", "<160", "75", "<2", "0.95", "<20",
"9.5", "<200", "95", "<25", "12", "<3,23", "1.61", "<35", "17", "<4", "1.95", "<4,05", "2.02", "<45", "22", "<5", "2.45", "<55", "27", "<60", "29.5",
"<640", "290", "<65", "32", "<650", "295", "<75", "37", "<80", "39.5", "> 10", "45", "> 100", "450", "> 10000", "45000", "> 1100", "5550", "> 11000",
"55500", "> 120", "560", "> 1200", "5600", "> 12000", "56000", "> 120000", "460000", "> 12000000", "56000000", "> 13000", "56500", "> 150", "575",
"> 1500", "5750", "> 15000", "57500", "> 150000", "575000", "> 1500000", "5750000", "> 15000000", "57500000", "> 1600", "5800", "> 16000", "58000",
"> 170", "585", "> 1800", "5900", "> 18000", "59000", "> 190", "595", "> 200", "600", "> 2000", "6000", "> 20000", "60000", "> 2100000", "6050000",
"> 2200", "6100", "> 2300", "6150", "> 230000", "615000", "> 260", "630", "> 2600", "6300", "> 26000", "63000", "> 270", "635", "> 2700", "6350",
"> 27000", "63500", "> 270000", "635000", "> 2700000", "635000", "> 300", "650", "> 3000", "6500", "> 30000", "65000", "> 360", "680", "> 37000",
"68500", "> 370000", "685000", "> 3700000", "6850000", "> 37000000", "68500000", "> 400", "700", "> 4000", "7000", "> 420", "710", "> 430", "715",
"> 450", "725", "> 4500", "7250", "> 4500000", "7250000", "> 460", "730", "> 540", "770", "> 60", "80", "> 600", "800", "> 6000", "8000", "> 80",
"90", "> 900", "850", "> 9000", "9500", "> 90000", "95000", "> 9300", "9650", ">10000", "45000", ">100000", "450000", ">1000000", "4500000", ">11000",
"55500", ">150", "575", ">1500", "5750", ">15000", "57500", ">150000", "575000", ">1500000", "5750000", ">15000000", "57500000", ">150000000",
"575000000", ">1600", "5800", ">16000", "58000", ">160000", "580000", ">1600000", "5800000", ">1800", "5900", ">180000", "590000", ">1900", "5950",
">19000", "59500", ">20000", "60000", ">2200", "6100", ">24000", "62000", ">26000", "63000", ">260000", "630000", ">27000", "63500", ">300", "650",
">3000", "6500", ">30000", "65000", ">300000", "650000", ">3000000", "6500000", ">30000000", "65000000", ">300000000", "650000000", ">3000000000",
"6500000000", ">30000000000", "65000000000", ">310000", "655000", ">3100000", "6550000", ">360", "680", ">37000", "68500", ">370000", "685000",
">3700000", "6850000", ">37000000", "68500000", ">3700000000", "6850000000", ">3900", "6950", ">440", "720", ">4400000", "7200000", ">450", "725",
">4500", "7250", ">45000", "72500", ">500000000", "750000000", ">540", "770", ">54000", "77000", ">550000", "775000", ">580000", "790000", ">7600000",
"8800000", ">80", "90", ">8400", "8200", ">900", "850", ">9000", "9500", ">93000000", "96500000"};
char_vals = vals[1 :: N Items(vals) :: 2];
dt = New Table("Test", New Column("Reported result", Character, Nominal, Values(char_vals)));
Local({dt, col1},
dt = Data Table("Test");
dt << Begin Data Update;
col1 = dt << New Column(dt:Reported result);
col1 << Set Name("Result (after corr.Censor.)");
dt << Move Selected Columns({col1}, after(dt:Reported result));
dt << Recode Column(
dt:Reported result,
{Map Value(
_rcOrig,
{",000000", "0", ",0000000000", "0", ",050000", "0.05", ",100000", "0.1", ",150000", "0.15", ",300000", "0.3", ",350000", "0.35",
",400000", "0.4", ",500000", "0.5", ",600000", "0.6", ",650000", "0.65", ",800000", "0.8", "-", "Exclude", "-,350000", "Exclude", "-0,1",
"Exclude", "-0,3", "Exclude", "-0,7", "Exclude", "-0,8", "Exclude", "-0,9", "Exclude", "-1,0", "Exclude", "-1,1", "Exclude", "-1,3",
"Exclude", "-1,4", "Exclude", "-2,0", "Exclude", "-2,1", "Exclude", "-2,2", "Exclude", "< 0", "0", "< 0,06", "0.025", "< 0,17", "0.08",
"< 0,25", "0.12", "< 0,50", "0.245", "< 0,64", "0.315", "< 1", "0.5", "< 10", "4.5", "< 100", "45", "< 1000", "450", "< 110", "50",
"< 120", "55", "< 1300", "585", "< 15", "7", "< 150", "70", "< 160", "75", "< 170", "80", "< 180", "85", "< 190", "90", "< 2", "0.95",
"< 20", "9.5", "< 220", "105", "< 25", "12", "< 290", "135", "< 3", "1.45", "< 30", "14.5", "< 300", "135", "< 3000", "1350", "< 310",
"140", "< 320", "145", "< 35", "17", "< 4", "1.95", "< 40", "19.5", "< 400", "180", "< 45", "22", "< 5", "2.45", "< 50", "24.5", "< 55",
"27", "< 590", "290", "< 60", "29.5", "< 6100", "2550", "< 620", "290", "< 65", "32", "< 650", "295", "< 6500", "2750", "< 660", "300",
"< 6600", "2800", "< 670", "305", "< 6700", "2850", "< 680", "310", "< 690", "315", "< 70", "34.5", "< 700", "315", "< 710", "320",
"<,166666666666666000", "0.075", "<0", "0", "<1", "0.5", "<1,0", "0.5", "<1,000000000000", "0.5", "<1,3", "0.6", "<10", "4.5",
"<10,000000000000", "4.5", "<100", "45", "<1000", "450", "<10000", "4500", "<110", "50", "<120", "55", "<15", "7", "<160", "75", "<2",
"0.95", "<20", "9.5", "<200", "95", "<25", "12", "<3,23", "1.61", "<35", "17", "<4", "1.95", "<4,05", "2.02", "<45", "22", "<5", "2.45",
"<55", "27", "<60", "29.5", "<640", "290", "<65", "32", "<650", "295", "<75", "37", "<80", "39.5", "> 10", "45", "> 100", "450",
"> 10000", "45000", "> 1100", "5550", "> 11000", "55500", "> 120", "560", "> 1200", "5600", "> 12000", "56000", "> 120000", "460000",
"> 12000000", "56000000", "> 13000", "56500", "> 150", "575", "> 1500", "5750", "> 15000", "57500", "> 150000", "575000", "> 1500000",
"5750000", "> 15000000", "57500000", "> 1600", "5800", "> 16000", "58000", "> 170", "585", "> 1800", "5900", "> 18000", "59000", "> 190",
"595", "> 200", "600", "> 2000", "6000", "> 20000", "60000", "> 2100000", "6050000", "> 2200", "6100", "> 2300", "6150", "> 230000",
"615000", "> 260", "630", "> 2600", "6300", "> 26000", "63000", "> 270", "635", "> 2700", "6350", "> 27000", "63500", "> 270000",
"635000", "> 2700000", "635000", "> 300", "650", "> 3000", "6500", "> 30000", "65000", "> 360", "680", "> 37000", "68500", "> 370000",
"685000", "> 3700000", "6850000", "> 37000000", "68500000", "> 400", "700", "> 4000", "7000", "> 420", "710", "> 430", "715", "> 450",
"725", "> 4500", "7250", "> 4500000", "7250000", "> 460", "730", "> 540", "770", "> 60", "80", "> 600", "800", "> 6000", "8000", "> 80",
"90", "> 900", "850", "> 9000", "9500", "> 90000", "95000", "> 9300", "9650", ">10000", "45000", ">100000", "450000", ">1000000",
"4500000", ">11000", "55500", ">150", "575", ">1500", "5750", ">15000", "57500", ">150000", "575000", ">1500000", "5750000", ">15000000",
"57500000", ">150000000", "575000000", ">1600", "5800", ">16000", "58000", ">160000", "580000", ">1600000", "5800000", ">1800", "5900",
">180000", "590000", ">1900", "5950", ">19000", "59500", ">20000", "60000", ">2200", "6100", ">24000", "62000", ">26000", "63000",
">260000", "630000", ">27000", "63500", ">300", "650", ">3000", "6500", ">30000", "65000", ">300000", "650000", ">3000000", "6500000",
">30000000", "65000000", ">300000000", "650000000", ">3000000000", "6500000000", ">30000000000", "65000000000", ">310000", "655000",
">3100000", "6550000", ">360", "680", ">37000", "68500", ">370000", "685000", ">3700000", "6850000", ">37000000", "68500000",
">3700000000", "6850000000", ">3900", "6950", ">440", "720", ">4400000", "7200000", ">450", "725", ">4500", "7250", ">45000", "72500",
">500000000", "750000000", ">540", "770", ">54000", "77000", ">550000", "775000", ">580000", "790000", ">7600000", "8800000", ">80", "90",
">8400", "8200", ">900", "850", ">9000", "9500", ">93000000", "96500000"},
Unmatched(_rcNow)
)},
Update Properties(1),
Target Column(col1)
);
dt << End Data Update;
);
Wait(2); // for demo purposes
Column(dt, "Result (after corr.Censor.)") << Set Data Type("Numeric") << Set Modeling Type("Continuous");
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
Correct, after recoding, the data looks correct in Character format. (When changing this to Numeric the censored data (< or >) jumps into billions and the previously Numeric data is gone/ empty cell.)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
Based on the code you provided ">7600000" is supposed to change to "8800000" and when that is converted to number it seems to be ok (row 229).
How are you trying to recode those values which are below or higher than something else?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
Thanks for your swift responses Jarmo!
I am not entirely sure I understand your question " How are you trying to recode those values which are below or higher than something else? ".
In Recode, I specified manually for each 'Old value' (censored), in the parallel column 'New Value', a new Numeric value, as per your test above (unfortunately, it appears to be interpreted as Char)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
You can change it to numeric / continuous after recoding. Or you can first create the correct recode and then parse as numbers
Values which could be converted to numbers were (modeling type is still Nominal and you might have to change it to Continuous)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
I understand you are proposing two different solution:
(a) change to numeric / continuous after recoding.
Unfortunately, this is exactly my problem. See my comment higher <When I do this after recoding, the data looks correct in Character format. (When changing this to Numeric the censored data (< or >) jumps into billions and the previously Numeric data is gone/ empty cell.)>
(b) Or you can first create the correct recode and then parse as numbers:
When I try to 'Edit with Recode' (or even when I try to rerun the Script, I get the following error:
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
Morning Jarmo,
Thank you for confirming the recode seems to work properly.
After recoding, the data looks correct in Character format. However, when changing this to Numeric/ Continuous the results is meaningless data: censored data (< or >) jumps into billions and the previously Numeric data is gone/ empty cell.
Any suggestion what could trigger this and how I could change to Numeric/ Continuous properly?
Thank you
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
Decimal separator or thousand separator could maybe cause some issues. Without having working examples where this occurs, it is fairly difficult to debug as my example seems to work fine.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Get Direct Link
- Report Inappropriate Content
Re: Recoding column with both char and num
Thank you for your efforts. I will log a request with JMP Support exploring if they know more.