Subscribe Bookmark RSS Feed

Select where based on multiple columns

adiezmann

Community Trekker

Joined:

Mar 1, 2012

I'm not sure if there's a good way to do what I'm hoping to do, but I have the following situation:

I have a data file which is the result of collecting measurements every second. There's a timestamp column, a status column, and then all my data columns.

There's three different possible entries in the status column, "RAMP", "WAIT", and "READ". The general ordering of data by time results in, say, about 20 rows of "RAMP" followed by 85 rows of "WAIT" followed by 9 rows of "READ", which then repeats a dozen times. My regions of interest, which I would like to select, are the last 30 rows within each section labeled as "WAIT".

Obviously, it's extremely simple to select all "WAIT" rows, or the last 30 rows of the entire file, but I can't figure out how to select only the last 30 of each section.

Is there a good way of solving this?

7 REPLIES
astounding

Community Trekker

Joined:

Jan 26, 2012

You posted this in the JMP forum, so I'm not sure if this helps or not.  Using base SAS, it's a relatively short program if you use the right tools:

data last_30_waits;

   group_num + 1;

   n_waits=0;

   do until (last.status and status='WAIT');

      set measurements (keep=status);

      by status notsorted;

      if status='WAIT' then n_waits + 1;

   end;

   ** At this point, group_num identifies a set of WAITs, and n_waits shows how many records are in the group;

   n_found=0;

   do until (last.status and status='WAIT');

      set measurements;

      by status notsorted;

      if status='WAIT' then do;

         n_found + 1;

         if n_found >= n_waits - 29 then output;

      end;

   end;

   drop n_found;

run;

Good luck.

adiezmann

Community Trekker

Joined:

Mar 1, 2012

Unfortunately, I'm currently JMP only, no basic SAS - but thanks for the suggestion.

pmroz

Super User

Joined:

Jun 23, 2011

Here's one way to do it.  You search for all the WAIT rows and then loop through the row numbers.  When there's a gap larger than 1 you've found the end of a section.  The code below builds two lists that identify the starting and ending row numbers for each section.  I created a small table for this example; change the ntail parameter to 30 for your data.

dt = New Table( "temp",

    Add Rows( 39 ),

    New Column( "Status",

        Character,

        Nominal,

        Set Values(

            {"RAMP", "RAMP", "RAMP", "WAIT", "WAIT", "WAIT", "WAIT", "WAIT", "READ",

            "READ", "READ", "READ", "READ", "RAMP", "RAMP", "RAMP", "WAIT", "WAIT",

            "WAIT", "WAIT", "WAIT", "READ", "READ", "READ", "READ", "READ", "RAMP",

            "RAMP", "RAMP", "WAIT", "WAIT", "WAIT", "WAIT", "WAIT", "READ", "READ",

            "READ", "READ", "READ"}

        )

    ),

);

wait_rows = dt << get rows where(:Status == "WAIT");

nr = nrows(wait_rows);

wait_start_list = {};

wait_end_list   = {};

old_row         = wait_rows[1] - 1;

// Number of WAIT rows to get

ntail           = 4;

// This will get the last 4 WAIT rows.  Replace with 30 to get the last 30 WAIT rows

for (i = 1, i <= nr, i++,

    new_row = wait_rows[i];

// Look for a gap larger than one in the list of rows

    if ((new_row - old_row) > 1,

        insertinto(wait_start_list, (old_row - ntail + 1));

        insertinto(wait_end_list, old_row);

    );

    old_row = new_row;

);

// Take care of the last set of WAITs

old_row = wait_rows[nr];

insertinto(wait_start_list, (old_row - ntail + 1));

insertinto(wait_end_list, old_row);

adiezmann

Community Trekker

Joined:

Mar 1, 2012

Thank you, that gets me really close, but I have one more clueless question - what's the best way to select everything *between* each pair of wait_start_list/wait_end_list numbers? I haven't worked with lists before. (I'm really new to scripting in general, but I'm trying to learn as I go the best I can.)

adiezmann

Community Trekker

Joined:

Mar 1, 2012

I found a way to make the list I wanted - I added:

last_30_waits = {};

for (j = 1, j<=3, j++,

for (i = 0, i<=3, i++,

insertinto(last_30_waits,(wait_end_list - i))));

dt << Select Rows(last_30_waits);

But I have a feeling that's not optimal. Also, is there a function which returns the number of entries in a list? I'd like to replace the "3" in my first for loop with that function, but I'm having trouble finding it.

adiezmann

Community Trekker

Joined:

Mar 1, 2012

Another function I *know* must exist but I just can't find - current data table() returns a reference I can assign to dt, but I'm looking for a way to extract the name of the table - I have a script I'd like to run on many different tables, and I want to start it on the current table, and then create a new subset table named "<descriptive string> <name of current name>". I was expecting output table name("string" dt) or ("string" $dt) to work, something like that, but it doesn't. I'm hunting through documentation looking, but thought I'd throw this comment out there in case.

ms

Super User

Joined:

Jun 23, 2011

How to get table name and number of list items:

dt=current datatable();

dt_name = dt<<get name;

lst={"a", "b", "c"};

nitems(lst);

Use || to concatenate a string with another:

"string" || (dt<<get name)