Subscribe Bookmark
Richard_Zink

Staff

Joined:

May 27, 2014

Using JSL to select the first or last record in a set

When considering my professional life, I'm a relative newcomer to JMP. In fact, my JMP and JSL (JMP Scripting Language) experience began the day I became a developer here in the JMP Life Sciences group, about four years ago. In contrast, I've been using SAS since I was an undergraduate student in mathematics, giving me more than 20 years of experience. Though there are a lot of differences between the two languages, one of the things I enjoy most about my job is using this very large and rich toolset to develop cool new features for JMP Clinical.

Even though I have access to both JMP and SAS, I sometimes find myself wondering how to replicate a feature in SAS using JSL. The subject of this blog post is how to replicate the SAS data step behavior to select first and last records within by groups. In SAS, this is performed by first sorting the data and using special keywords within the data step. For example, if I had the Big Class data table formatted in SAS and wanted to choose the first boy and girl alphabetically, the program below would accomplish this:

proc sort data = bigclass;

by sex name;

run;

data first;

set bigclass;

by sex name;

if first.sex;

run;

To choose the last boy and girl alphabetically, I would use if last.sex;.   Similar logic applies if I wanted to select first and last within additional by groups. Say, I wanted to pick the shortest person for each name within gender, the following would do that:

proc sort data = bigclass;

by sex name height;

run;

data first;

set bigclass;

by sex name height;

if first.name;

run;

My colleague Brady Brady was kind enough to help me solve this within JSL using Col Min() and Col Max(). Figure 1 contains a sample data set. The original data only included Letter (A through J) and Letter 2 (A through D). The data table is sorted by these two variables. You can see below that First.Letter is equal to 1 for the first instance of each letter in Letter (0 otherwise); Last.Letter is a 1 for the last instance of each letter in Letter (0 otherwise). The formulas for these variables is as follows:

First.Letter = If( Row() == Col Minimum( Row(), :Letter ),

1,

0

)

Last.Letter = If( Row() == Col Maximum( Row(), :Letter ),

1,

0

)

To interpret these formulas, the Col M**imum() functions are using :Letter as a by group and getting the minimum or maximum row numbers within each value of Letter. The If() function sets First.Letter = 1 if the current row number equals the :Letter-specific minimum row number. Note that if a particular value for :Letter only had a single row, that record would be considered both the first and the last, with First.Letter = 1 and Last.Letter = 1 

Figure 1. Data Table FirstLast

Figure 1. Data Table FirstLast

Similar to the above code for SAS, this can be extended for multiple by variables. Say I wanted to select the first instance of each letter of Letter2 within Letter, I simply add an additional byvar to the function.

First.Letter2 = If( Row() == Col Minimum( Row(), :Letter, :Letter2 ),

1,

0

)

Last.Letter2 = If( Row() == Col Maximum( Row(), :Letter, :Letter2 ),

1,

0

)

These functions have a useful feature in that it allows you to easily subset to by groups where duplicates occur.  Selecting rows where First.Letter = 0 or Last.Letter = 0 will select all rows where :Letter is repeated. In the context of Big Class, using these functions on :Height can allow you to create a subtable where specific heights may be repeated.

Note that in SAS, a sort is required to create the special first.XX last.XX variables.  Though the data table above is sorted, this is not a requirement for the JSL code above to work.

Many thanks to Brady Brady for his help!