Subscribe Bookmark RSS Feed

Re: How to deal with missing data

hrkg

Community Trekker

Joined:

Jun 3, 2012

Hello,

I need to run a descriptive analysis on a huge number of columns. To do so I made the following script :

dt = Current Data Table();

column_names = dt << Get Column Names(numeric,String);

for(i=3,i<=NItems(column_names),i++,

  colName = column_names;

  cmd = Expr(

  ow = dt << Distribution(

  Continuous Distribution( Column( cName ) ),

  Where( :SEX == "F" )

   ); 

owr = ow<<report;

);

Substitute Into(cmd,Expr(cName),colName);

Eval(cmd);

);

Current Journal() << Save HTML(".....\test.htm");

Now, when I run it I get an error message in log window "One or more columns have nothing but missing values.".

I have some columns that all values or only those for specific "SEX" are missing data. How could I check it in my loop and skip them in analysis ?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
ms

Super User

Joined:

Jun 23, 2011

Solution

The stack is just an intermediary step to arrange data in a single column. The error may be that either the label or data columns don't get the names "Position" and "Data" that the summary command assumes here. Take away the "invisible" option and look at the stacked table to see the column names, and change Summary accordingly if something is not named as intended (as noted in the jsl comment above the Stacked Data Column("Data") argument seems to be ignored in JMP 10 if columns are grouped as in your original table, hence the ungrouping command i my example above).

Instead of stack, if that's misbehavin, you can instead use summary and transpose in tandem to get a similar result. Try the example below (works with the posted table "Example") .

dt = Data Table( "Example" );

column_names = dt << Get Column names( numeric );

(dt << Summary(

  Group( :SEX ),

  Mean( Eval( column_names ) ),

  statistics column name format( "column" )

)) << Transpose( Label( :SEX ), Output Table( "Result" ) );

8 REPLIES
ms

Super User

Joined:

Jun 23, 2011

There are several ways to do it (as usual). Below is one approach that should work.

dt = Current Data Table();

column_names = dt << Get Column Names( numeric, String );

For( i = 3, i <= N Items( column_names ), i++,

  // if non missing in column i where SEX="F" > 0, do stuff, otherwise next i

  If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names[i] )[] ) ) ) ),

  colName = column_names[i];

  cmd = Expr(

  ow = dt << Distribution( Continuous Distribution( Column( cName ) ), Where( :SEX == "F" ) );

  owr = ow << report;

  );

  Substitute Into( cmd, Expr( cName ), colName );

  Eval( cmd );

  )

);

Current Journal() << Save HTML( ".....\test.htm" );

hrkg

Community Trekker

Joined:

Jun 3, 2012

Dear MS,

Thanks for your guidance. I added your suggested codes and I modified my script to export results to a table as followed :

results = New Table("Results", visible,

      New Column("Position", character),

      New Column("Female", numeric),

      New Column("Male", numeric),

      New Column("Ratio", Formula(Female/Male)),

      Add Rows(NItems(column_names))

     );

dt = Current Data Table();

column_names = dt << Get Column Names(numeric,String);

for(i=1,i<=NItems(column_names),i++,

    If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names )[] ) ) ) ),

  colName = column_names;

  cmd = Expr(

    ow = dt << Distribution( Continuous Distribution( Column( cName ) ),

          Where( :SEX == "F" ) );

    owr = ow<<report;

  

    allmean= owr[NumberColBox(2)];

    Fmean = allmean[1];

  

    allmean1= owr[NumberColBox(4)];

    Mmean = allmean1[1];

  

    Column(results,1) = colName;

    Column(results,2) = Fmean;

    Column(results,3) = Mmean;

    ow << Close Window;

    );

  Substitute Into( cmd, Expr( cName ), colName );

  Eval( cmd );

)

);

results << journal;

Now, when I run it it gives me an error for SEX. I attached an example of my data file for test.

Thanks in advance for your help.

ms

Super User

Joined:

Jun 23, 2011

I think that the error is caused by misplaced

dt = current datatable()

It assigns the new table "Results" rather than your original table to dt . And Results has no column named :SEX

Try dt = data table("Example") instead.

hrkg

Community Trekker

Joined:

Jun 3, 2012

I changed scripts as followed :

dt = data table("Example") ;

column_names = dt << Get Column Names(numeric,String);

results = New Table("Results", visible,

      New Column("Position", character),

      New Column("Female", numeric),

      New Column("Male", numeric),

      New Column("Ratio", Formula(Female/Male)),

      Add Rows(NItems(column_names))

     );

for(i=1,i<=NItems(column_names),i++,

    If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names )[] ) ) ) ),

  colName = column_names;

  cmd = Expr(

    ow = dt << Distribution( Continuous Distribution( Column( cName ) ),

          Where( :SEX == "F" ) );

    owr = ow<<report;

   

    allmean= owr[NumberColBox(2)];

    Fmean = allmean[1];

   

    allmean1= owr[NumberColBox(4)];

    Mmean = allmean1[1];

   

    Column(results,1) = colName;

    Column(results,2) = Fmean;

    Column(results,3) = Mmean;

    ow << Close Window;

    );

  Substitute Into( cmd, Expr( cName ), colName );

  Eval( cmd );

)

);

results << journal;

Now I get this message in log window

"couldn't find column{1}

in access or evaluation of 'Column' , Column( column_names )"

and it refers to this line :

":SEX[] == "F" & !Is Missing( Column/*###*/(::column_names[::i])[] )"

Thanks for advices


ms

Super User

Joined:

Jun 23, 2011

I don't get that message if applying the script to the example table posted above. However, owr[NumberColBox(4)]; gives an error because it does not exist. The code so far only gives results for "F" and not for "M" which I assume you also want to list in the results table.

If your only goal is to summarize means I would use the Summary() function instead of looping through a lot of distribution reports.

An example:

dt = Data Table( "Example" );

column_names = dt << Get Column Names( numeric );

dt << Ungroup Columns( column_names );// Appears to a bug in JMP 10 with stacked data column name and grouped columns

dt_stacked = dt << Stack(

  columns( column_names ),

  Source Label Column( "Position" ),

  Stacked Data Column( "Data" ),

  invisible

);

dt_sum = dt_stacked << Summary(

  Group( :Position ),

  Mean( :Data ) ,

  Subgroup( :SEX )

);

Column( 3 ) << set name( "Female" );

Column( 4 ) << set name( "male" );

dt_sum << New Column( "Ratio", Formula( Female / Male ) );

dt_stacked << close window;

hrkg

Community Trekker

Joined:

Jun 3, 2012

Dear MS,

Thanks a lot for suggesting this new method. I think it is faster and more efficient. However, when I run the script I get an error as followed :

Column not found in access or evaluation of 'Bad Argument'

Send Expects Scriptable Object in access or evaluation of 'Send' , dt_stacked << Summary( Group( :Position ), Mean( :Data ), Subgroup( :SEX ) )

In the following script, error marked by /*###*/

::dt = Data Table( "Example" );

::column_names = ::dt << Get Column Names( numeric );

::dt_stacked = ::dt << Stack(

    columns( ::column_names ),

    Source Label Column( "Position" ),

    Stacked Data Column( "Data" ),

    invisible

);

::dt_sum = ::dt_stacked <<  /*###*/Summary(

    Group( :Position ),

    Mean( :Data ),

    Subgroup( :SEX )

) /*###*/;

Column( 3 ) << set name( "Female" );

Column( 4 ) << set name( "male" );

dt_sum << New Column( "Ratio", Formula( Female / Male ) );

dt_stacked << close window;

It seems something is not right in creating stack. Do you have any idea why such an error raised ?

Thanks again for all your helps.

Highlighted
ms

Super User

Joined:

Jun 23, 2011

Solution

The stack is just an intermediary step to arrange data in a single column. The error may be that either the label or data columns don't get the names "Position" and "Data" that the summary command assumes here. Take away the "invisible" option and look at the stacked table to see the column names, and change Summary accordingly if something is not named as intended (as noted in the jsl comment above the Stacked Data Column("Data") argument seems to be ignored in JMP 10 if columns are grouped as in your original table, hence the ungrouping command i my example above).

Instead of stack, if that's misbehavin, you can instead use summary and transpose in tandem to get a similar result. Try the example below (works with the posted table "Example") .

dt = Data Table( "Example" );

column_names = dt << Get Column names( numeric );

(dt << Summary(

  Group( :SEX ),

  Mean( Eval( column_names ) ),

  statistics column name format( "column" )

)) << Transpose( Label( :SEX ), Output Table( "Result" ) );

hrkg

Community Trekker

Joined:

Jun 3, 2012

Dear MS,

This last script worked perfectly. I learned a lot in this discussion.

Thanks again for all your guidance.