Re: Inconsistent acceptable syntax - Page 2

pcarroll1 · Nov 14, 2024 11:08 AM

AllCols = dt << Get Column Names();
AllColNames = dt << Get Column Names(string);
for(i=3, i<=n items(AllCols), i++, 
	colcol = AllCols[i];
	cname = AllColNames[i];
	nnm1 = Col Number(colcol);
	Fstr = "num(:\!""||cname||"\!"n)";
	Fexpr = parse(Fstr);
	newcol = dt << new column("Temp", numeric, formula(eval(evalexpr(Fexpr))));
	nnm2 = Col Number(newcol);
	if(nnm2 == nnm1, 
		newcol << delete formula();
		dt << Delete Columns(colcol);
		newcol << set name(cname),  //else
		dt << Delete Columns(newcol);
	);
);

I am using JMP18 on Windows. When I write code that I want to use often, I like to make a toolbar button for it. But sometimes I find that code that is acceptable to run from an open script window is not acceptable when run from a toolbar icon. Here is an example, where I am looping through columns and changing them from character to numeric if all entries are numeric. When running from a toolbar the objection is for the line:

newcol = dt << new column("Temp", numeric, formula(eval(evalexpr(Fexpr))));

This syntax is acceptable when I run from a script window!

Why would the jsl interpreter be different in each situation?

What syntax for this would be universally acceptable?

For very large datasets and columns this can be slow. Is there a better way to do this?

pcarroll1 · Nov 15, 2024 11:22 AM

Although, the column is referred to in 2 different ways (one as the text string name, the other as a column type) this finally worked. I've been writing code in jsl for 18 years and still find referencing columns as variables in jsl to be inconsistent and confusing.

AllCols = dtTOSA << Get Column Names();
AllColNames = dtTOSA << Get Column Names(string);
currentDatatable(dtTOSA);
for(i=3, i<=n items(AllCols), i++, 
	colcol = AllCols[i];
	cname = AllColNames[i];
	nnm1 = dtTOSA << Col Number(colcol);
	Fstr = "num(:\!""||cname||"\!"n)";
	Fexpr = parse(Fstr);
	newcol = dtTOSA << new column("Temp", numeric, set formula(eval(evalexpr(Fexpr))));
	nnm2 = dtTOSA << Col Number(newcol);
	if(nnm2 == nnm1, 
		currentDatatable(dtTOSA);
		newcol << delete formula();
		dtTOSA << Delete Columns(eval(cname));
		currentDatatable(dtTOSA);
		newcol << set name(cname),  //else
		dtTOSA << Delete Columns({newcol});
	);
);

jthi · Nov 15, 2024 12:00 PM

I always use strings as my column base and then build my references from them how ever I need. I have found this to be the most flexible way of handling different types of referencing.

For example, if I take your script and make some assumptions of its purpose (starting from third column go through all of them, create a new formula column in which you change values to numeric, compare missing values and perform actions based on the result) I would write it like this in JMP18:

Names Default To Here(1);

dtTOSA = New Table("Untitled",
	Add Rows(4),
	Compress File When Saved(1),
	New Column("Column 1",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values([1, 2, 3, 4])
	),
	New Column("Column 2", Character, "Nominal", Set Values({"A", "1", "2", "3"})),
	New Column("Column 3", Character, "Nominal", Set Values({"1", "2", "3", "4"})),
	New Column("Column 4", Character, "Nominal", Set Values({"A", "B", "C", "D"}))
);
wait(1); // demo purposes

cols = dtTOSA << Get Column Names("String");
// Remove From(cols, 1, 2); // skipping this for demo purposes

For Each({colname}, cols,
	// colname = cols[3]; // for testing
	Eval(EvalExpr(
		newcol = dtTOSA << New Column("Temp", numeric, Formula(
			Num(Expr(Name Expr(As Column(dtTOSA, colname))))
		));
	));
	newcol << Run Formulas;
	newcol << delete formula();
	
	orig_vals = Col Number(Column(dtTOSA, colname));
	new_vals = Col Number(Column(dtTOSA, (newcol << get name)));
	
	If(orig_vals == new_vals,
		dtTOSA << Delete Columns(colname);
		newcol << set name(colname);
	, 
		dtTOSA << Delete Columns(newcol)
	);
);

And if I were to do this from the beginning but still using a bit similar idea as you have, I would (at least at this second) do something like this

Names Default To Here(1);

dt = New Table("Untitled",
	Add Rows(4),
	Compress File When Saved(1),
	New Column("Column 1",
		Numeric,
		"Continuous",
		Format("Best", 12),
		Set Values([1, 2, 3, 4])
	),
	New Column("Column 2", Character, "Nominal", Set Values({"A", "1", "2", "3"})),
	New Column("Column 3", Character, "Nominal", Set Values({"1", "2", "3", "4"})),
	New Column("Column 4", Character, "Nominal", Set Values({"A", "B", "C", "D"}))
);
wait(1); // demo purposes

cols = dt << Get Column Names("Character", "String"); // We only need to check character columns

For Each({colname}, cols,
	dt_summary = dt << Summary(
		Group(Eval(colname)),
		Freq("None"),
		Weight("None"),
		Link to original data table(0),
		output table name("Summar of " || colname),
		Private
	);
	
	orig_vals = Col Number(Column(dt_summary, 1));
	Column(dt_summary, 1) << Set Data Type("Numeric");
	new_vals = Col Number(Column(dt_summary, 1));
	Close(dt_summary, no save);
	
	If(orig_vals == new_vals, // we can change
		Column(dt, colname) << Set Data Type("Numeric");
		Column(dt, colname) << Set Modeling Type("Continuous");
	);
);

I tend to do these utilizing Transform Each and Num, but sometimes it might be enough if you let JMP do the conversions (there are case where you might not want to do this though..., see Num() from scripting index and how it behaves with Num("5%") for example. << Restrict avoids this behaviour)

-Jarmo

pcarroll1 · Nov 15, 2024 02:11 PM

Jarmo,

Thanks for the thoughts. I sometimes have thousands of columns, and tens of thousands of rows, and speed can be an issue. The way I do it now actually reduced the time to ~1/3 of the way I did this previously, but that still can be many minutes. I could experiment with your method sometime, but I wonder if you think making a summary table each time would be slower or faster.

Regarding dealing with columns references, I've tried using Column(dt, colname) but it doesn't always work. But in your script you also use As Column(). As a rule, is As Column() used for formulas?

hogi · Nov 15, 2024 02:31 PM

@pcarroll1 wrote:
As a rule, is As Column() used for formulas?

yes, either as column(columnname) or :column(dt, columnname)[]

jthi · Nov 15, 2024 03:18 PM

For first "question": if you try to optimize for speed for specific use cases, you just have to try out which works best as there are many different methods and the fastest might depend a lot on your data. In this case I used summary table (or Summarize) to avoid checking for replicate values but it like I mentioned, depends on your data, on your JMP versions, and possibly on other things...

For column references; my rule is: always use column name and table reference as the starting point, and build the final reference from them. What I consider a reference depends where and how I'm using it. For example if I'm building a formula I want to have my formula to reference columns as :mycol (and not AsColumn(mycol) or Column(mycol)[], which are usually also fine). And I can get the :mycol format using the fairly long style of NameExpr(AsColumn(dt,colname)) with expression evaluation. Generally AsColumn(dt, colname) (OR dt:colname) is the same as Column(dt, colname)[Row()] (or Column(dt, colname)[]), which will get the value of current row (in my mind this isn't always like this. For example in platforms I consider :mycol being a reference instead of a specific value on a row...).

As an example this will return missing value as mycol will be evaluated immediately and the initial Row() value is 0 in JMP (not an issue inside a formula as the Row() will be updated all the time)

Names Default To Here(1); 

dt = open("$SAMPLE_DATA/Big Class.jmp");
colname = "age";

mycol = NameExpr(AsColumn(dt,colname));

show(mycol)

This is the most difficult difference to understand with column references, As Column() (value of the current row) vs Column() (reference to a column) in JMP. It is very difficult to get your head around how these work... I have learned it basically by failing many many times. And now I have a good idea which to use initially when building an expression (I might still be incorrect from time to time). This topic is something I wish to write a blog post at some point as my opinions (currently) might differ from others and I consider myself pretty good JMP scripter without being a JMP (JSL) developer.

One thing to note here: I'm not a JMP developer and I don't really even care how developers think JSL works (I do disagree from time to time with developers). All my answers are always based on my experiences with the JMP versions I have used.

I'm very sorry if this post is bit confusing (I might have had few beers this evening ). If you wish to have more explanation on my opinions on this topics, you can post to this topic (but tag me with @ so I will notice it more easily). You can also send me a private message if necessary.

-Jarmo