cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 
Try the Materials Informatics Toolkit, which is designed to easily handle SMILES data. This and other helpful add-ins are available in the JMP® Marketplace
Choose Language Hide Translation Bar
hrkg
Level I

How to deal with missing data

Hello,

I need to run a descriptive analysis on a huge number of columns. To do so I made the following script :

dt = Current Data Table();

column_names = dt << Get Column Names(numeric,String);

for(i=3,i<=NItems(column_names),i++,

  colName = column_names;

  cmd = Expr(

  ow = dt << Distribution(

  Continuous Distribution( Column( cName ) ),

  Where( :SEX == "F" )

   ); 

owr = ow<<report;

);

Substitute Into(cmd,Expr(cName),colName);

Eval(cmd);

);

Current Journal() << Save HTML(".....\test.htm");

Now, when I run it I get an error message in log window "One or more columns have nothing but missing values.".

I have some columns that all values or only those for specific "SEX" are missing data. How could I check it in my loop and skip them in analysis ?

Thanks

1 ACCEPTED SOLUTION

Accepted Solutions
ms
Super User (Alumni) ms
Super User (Alumni)

Re: How to deal with missing data

The stack is just an intermediary step to arrange data in a single column. The error may be that either the label or data columns don't get the names "Position" and "Data" that the summary command assumes here. Take away the "invisible" option and look at the stacked table to see the column names, and change Summary accordingly if something is not named as intended (as noted in the jsl comment above the Stacked Data Column("Data") argument seems to be ignored in JMP 10 if columns are grouped as in your original table, hence the ungrouping command i my example above).

Instead of stack, if that's misbehavin, you can instead use summary and transpose in tandem to get a similar result. Try the example below (works with the posted table "Example") .

dt = Data Table( "Example" );

column_names = dt << Get Column names( numeric );

(dt << Summary(

  Group( :SEX ),

  Mean( Eval( column_names ) ),

  statistics column name format( "column" )

)) << Transpose( Label( :SEX ), Output Table( "Result" ) );

View solution in original post

8 REPLIES 8
ms
Super User (Alumni) ms
Super User (Alumni)

Re: How to deal with missing data

There are several ways to do it (as usual). Below is one approach that should work.

dt = Current Data Table();

column_names = dt << Get Column Names( numeric, String );

For( i = 3, i <= N Items( column_names ), i++,

  // if non missing in column i where SEX="F" > 0, do stuff, otherwise next i

  If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names[i] )[] ) ) ) ),

  colName = column_names[i];

  cmd = Expr(

  ow = dt << Distribution( Continuous Distribution( Column( cName ) ), Where( :SEX == "F" ) );

  owr = ow << report;

  );

  Substitute Into( cmd, Expr( cName ), colName );

  Eval( cmd );

  )

);

Current Journal() << Save HTML( ".....\test.htm" );

hrkg
Level I

Re: How to deal with missing data

Dear MS,

Thanks for your guidance. I added your suggested codes and I modified my script to export results to a table as followed :

results = New Table("Results", visible,

      New Column("Position", character),

      New Column("Female", numeric),

      New Column("Male", numeric),

      New Column("Ratio", Formula(Female/Male)),

      Add Rows(NItems(column_names))

     );

dt = Current Data Table();

column_names = dt << Get Column Names(numeric,String);

for(i=1,i<=NItems(column_names),i++,

    If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names )[] ) ) ) ),

  colName = column_names;

  cmd = Expr(

    ow = dt << Distribution( Continuous Distribution( Column( cName ) ),

          Where( :SEX == "F" ) );

    owr = ow<<report;

  

    allmean= owr[NumberColBox(2)];

    Fmean = allmean[1];

  

    allmean1= owr[NumberColBox(4)];

    Mmean = allmean1[1];

  

    Column(results,1) = colName;

    Column(results,2) = Fmean;

    Column(results,3) = Mmean;

    ow << Close Window;

    );

  Substitute Into( cmd, Expr( cName ), colName );

  Eval( cmd );

)

);

results << journal;

Now, when I run it it gives me an error for SEX. I attached an example of my data file for test.

Thanks in advance for your help.

ms
Super User (Alumni) ms
Super User (Alumni)

Re: How to deal with missing data

I think that the error is caused by misplaced

dt = current datatable()

It assigns the new table "Results" rather than your original table to dt . And Results has no column named :SEX

Try dt = data table("Example") instead.

hrkg
Level I

Re: How to deal with missing data

I changed scripts as followed :

dt = data table("Example") ;

column_names = dt << Get Column Names(numeric,String);

results = New Table("Results", visible,

      New Column("Position", character),

      New Column("Female", numeric),

      New Column("Male", numeric),

      New Column("Ratio", Formula(Female/Male)),

      Add Rows(NItems(column_names))

     );

for(i=1,i<=NItems(column_names),i++,

    If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names )[] ) ) ) ),

  colName = column_names;

  cmd = Expr(

    ow = dt << Distribution( Continuous Distribution( Column( cName ) ),

          Where( :SEX == "F" ) );

    owr = ow<<report;

   

    allmean= owr[NumberColBox(2)];

    Fmean = allmean[1];

   

    allmean1= owr[NumberColBox(4)];

    Mmean = allmean1[1];

   

    Column(results,1) = colName;

    Column(results,2) = Fmean;

    Column(results,3) = Mmean;

    ow << Close Window;

    );

  Substitute Into( cmd, Expr( cName ), colName );

  Eval( cmd );

)

);

results << journal;

Now I get this message in log window

"couldn't find column{1}

in access or evaluation of 'Column' , Column( column_names )"

and it refers to this line :

":SEX[] == "F" & !Is Missing( Column/*###*/(::column_names[::i])[] )"

Thanks for advices


ms
Super User (Alumni) ms
Super User (Alumni)

Re: How to deal with missing data

I don't get that message if applying the script to the example table posted above. However, owr[NumberColBox(4)]; gives an error because it does not exist. The code so far only gives results for "F" and not for "M" which I assume you also want to list in the results table.

If your only goal is to summarize means I would use the Summary() function instead of looping through a lot of distribution reports.

An example:

dt = Data Table( "Example" );

column_names = dt << Get Column Names( numeric );

dt << Ungroup Columns( column_names );// Appears to a bug in JMP 10 with stacked data column name and grouped columns

dt_stacked = dt << Stack(

  columns( column_names ),

  Source Label Column( "Position" ),

  Stacked Data Column( "Data" ),

  invisible

);

dt_sum = dt_stacked << Summary(

  Group( :Position ),

  Mean( :Data ) ,

  Subgroup( :SEX )

);

Column( 3 ) << set name( "Female" );

Column( 4 ) << set name( "male" );

dt_sum << New Column( "Ratio", Formula( Female / Male ) );

dt_stacked << close window;

hrkg
Level I

Re: How to deal with missing data

Dear MS,

Thanks a lot for suggesting this new method. I think it is faster and more efficient. However, when I run the script I get an error as followed :

Column not found in access or evaluation of 'Bad Argument'

Send Expects Scriptable Object in access or evaluation of 'Send' , dt_stacked << Summary( Group( :Position ), Mean( :Data ), Subgroup( :SEX ) )

In the following script, error marked by /*###*/

::dt = Data Table( "Example" );

::column_names = ::dt << Get Column Names( numeric );

::dt_stacked = ::dt << Stack(

    columns( ::column_names ),

    Source Label Column( "Position" ),

    Stacked Data Column( "Data" ),

    invisible

);

::dt_sum = ::dt_stacked <<  /*###*/Summary(

    Group( :Position ),

    Mean( :Data ),

    Subgroup( :SEX )

) /*###*/;

Column( 3 ) << set name( "Female" );

Column( 4 ) << set name( "male" );

dt_sum << New Column( "Ratio", Formula( Female / Male ) );

dt_stacked << close window;

It seems something is not right in creating stack. Do you have any idea why such an error raised ?

Thanks again for all your helps.

ms
Super User (Alumni) ms
Super User (Alumni)

Re: How to deal with missing data

The stack is just an intermediary step to arrange data in a single column. The error may be that either the label or data columns don't get the names "Position" and "Data" that the summary command assumes here. Take away the "invisible" option and look at the stacked table to see the column names, and change Summary accordingly if something is not named as intended (as noted in the jsl comment above the Stacked Data Column("Data") argument seems to be ignored in JMP 10 if columns are grouped as in your original table, hence the ungrouping command i my example above).

Instead of stack, if that's misbehavin, you can instead use summary and transpose in tandem to get a similar result. Try the example below (works with the posted table "Example") .

dt = Data Table( "Example" );

column_names = dt << Get Column names( numeric );

(dt << Summary(

  Group( :SEX ),

  Mean( Eval( column_names ) ),

  statistics column name format( "column" )

)) << Transpose( Label( :SEX ), Output Table( "Result" ) );

hrkg
Level I

Re: How to deal with missing data

Dear MS,

This last script worked perfectly. I learned a lot in this discussion.

Thanks again for all your guidance.