Solved: Re: How to deal with missing data

Report Inappropriate Content · Jun 12, 2012 08:22 AM

Hello,

I need to run a descriptive analysis on a huge number of columns. To do so I made the following script :

dt = Current Data Table();

column_names = dt << Get Column Names(numeric,String);

for(i=3,i<=NItems(column_names),i++,

colName = column_names;

cmd = Expr(

ow = dt << Distribution(

Continuous Distribution( Column( cName ) ),

Where( :SEX == "F" )

);

owr = ow<<report;

);

Substitute Into(cmd,Expr(cName),colName);

Eval(cmd);

);

Current Journal() << Save HTML(".....\test.htm");

Now, when I run it I get an error message in log window "One or more columns have nothing but missing values.".

I have some columns that all values or only those for specific "SEX" are missing data. How could I check it in my loop and skip them in analysis ?

Thanks

ms · Jun 16, 2012 06:22 PM

The stack is just an intermediary step to arrange data in a single column. The error may be that either the label or data columns don't get the names "Position" and "Data" that the summary command assumes here. Take away the "invisible" option and look at the stacked table to see the column names, and change Summary accordingly if something is not named as intended (as noted in the jsl comment above the Stacked Data Column("Data") argument seems to be ignored in JMP 10 if columns are grouped as in your original table, hence the ungrouping command i my example above).

Instead of stack, if that's misbehavin, you can instead use summary and transpose in tandem to get a similar result. Try the example below (works with the posted table "Example") .

dt = Data Table( "Example" );

column_names = dt << Get Column names( numeric );

(dt << Summary(

Group( :SEX ),

Mean( Eval( column_names ) ),

statistics column name format( "column" )

)) << Transpose( Label( :SEX ), Output Table( "Result" ) );

View solution in original post

ms · Jun 12, 2012 08:54 AM

There are several ways to do it (as usual). Below is one approach that should work.

dt = Current Data Table();

column_names = dt << Get Column Names( numeric, String );

For( i = 3, i <= N Items( column_names ), i++,

// if non missing in column i where SEX="F" > 0, do stuff, otherwise next i

If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names[i] )[] ) ) ) ),

colName = column_names[i];

cmd = Expr(

ow = dt << Distribution( Continuous Distribution( Column( cName ) ), Where( :SEX == "F" ) );

owr = ow << report;

);

Substitute Into( cmd, Expr( cName ), colName );

Eval( cmd );

)

);

Current Journal() << Save HTML( ".....\test.htm" );

hrkg · Jun 12, 2012 10:03 AM

Dear MS,

Thanks for your guidance. I added your suggested codes and I modified my script to export results to a table as followed :

results = New Table("Results", visible,

New Column("Position", character),

New Column("Female", numeric),

New Column("Male", numeric),

New Column("Ratio", Formula(Female/Male)),

Add Rows(NItems(column_names))

);

dt = Current Data Table();

column_names = dt << Get Column Names(numeric,String);

for(i=1,i<=NItems(column_names),i++,

If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names )[] ) ) ) ),

colName = column_names;

cmd = Expr(

ow = dt << Distribution( Continuous Distribution( Column( cName ) ),

Where( :SEX == "F" ) );

owr = ow<<report;

allmean= owr[NumberColBox(2)];

Fmean = allmean[1];

allmean1= owr[NumberColBox(4)];

Mmean = allmean1[1];

Column(results,1) = colName;

Column(results,2) = Fmean;

Column(results,3) = Mmean;

ow << Close Window;

);

Substitute Into( cmd, Expr( cName ), colName );

Eval( cmd );

)

);

results << journal;

Now, when I run it it gives me an error for SEX. I attached an example of my data file for test.

Thanks in advance for your help.

ms · Jun 12, 2012 10:15 AM

I think that the error is caused by misplaced

dt = current datatable()

It assigns the new table "Results" rather than your original table to dt . And Results has no column named :SEX

Try dt = data table("Example") instead.

hrkg · Jun 12, 2012 10:32 AM

I changed scripts as followed :

dt = data table("Example") ;

column_names = dt << Get Column Names(numeric,String);

results = New Table("Results", visible,

New Column("Position", character),

New Column("Female", numeric),

New Column("Male", numeric),

New Column("Ratio", Formula(Female/Male)),

Add Rows(NItems(column_names))

);

for(i=1,i<=NItems(column_names),i++,

If( N Row( dt << get rows where( And( :SEX[] == "F", !Is Missing( Column( column_names )[] ) ) ) ),

colName = column_names;

cmd = Expr(

ow = dt << Distribution( Continuous Distribution( Column( cName ) ),

Where( :SEX == "F" ) );

owr = ow<<report;

allmean= owr[NumberColBox(2)];

Fmean = allmean[1];

allmean1= owr[NumberColBox(4)];

Mmean = allmean1[1];

Column(results,1) = colName;

Column(results,2) = Fmean;

Column(results,3) = Mmean;

ow << Close Window;

);

Substitute Into( cmd, Expr( cName ), colName );

Eval( cmd );

)

);

results << journal;

Now I get this message in log window

"couldn't find column{1}

in access or evaluation of 'Column' , Column( column_names )"

and it refers to this line :

":SEX[] == "F" & !Is Missing( Column/*###*/(::column_names[::i])[] )"

Thanks for advices

ms · Jun 12, 2012 11:51 AM

I don't get that message if applying the script to the example table posted above. However, owr[NumberColBox(4)]; gives an error because it does not exist. The code so far only gives results for "F" and not for "M" which I assume you also want to list in the results table.

If your only goal is to summarize means I would use the Summary() function instead of looping through a lot of distribution reports.

An example:

dt = Data Table( "Example" );

column_names = dt << Get Column Names( numeric );

dt << Ungroup Columns( column_names );// Appears to a bug in JMP 10 with stacked data column name and grouped columns

dt_stacked = dt << Stack(

columns( column_names ),

Source Label Column( "Position" ),

Stacked Data Column( "Data" ),

invisible

);

dt_sum = dt_stacked << Summary(

Group( :Position ),

Mean( :Data ) ,

Subgroup( :SEX )

);

Column( 3 ) << set name( "Female" );

Column( 4 ) << set name( "male" );

dt_sum << New Column( "Ratio", Formula( Female / Male ) );

dt_stacked << close window;

hrkg · Jun 16, 2012 02:40 PM

Dear MS,

Thanks a lot for suggesting this new method. I think it is faster and more efficient. However, when I run the script I get an error as followed :

Column not found in access or evaluation of 'Bad Argument'

Send Expects Scriptable Object in access or evaluation of 'Send' , dt_stacked << Summary( Group( :Position ), Mean( :Data ), Subgroup( :SEX ) )

In the following script, error marked by /*###*/

::dt = Data Table( "Example" );

::column_names = ::dt << Get Column Names( numeric );

::dt_stacked = ::dt << Stack(

columns( ::column_names ),

Source Label Column( "Position" ),

Stacked Data Column( "Data" ),

invisible

);

::dt_sum = ::dt_stacked << /*###*/Summary(

Group( :Position ),

Mean( :Data ),

Subgroup( :SEX )

) /*###*/;

Column( 3 ) << set name( "Female" );

Column( 4 ) << set name( "male" );

dt_sum << New Column( "Ratio", Formula( Female / Male ) );

dt_stacked << close window;

It seems something is not right in creating stack. Do you have any idea why such an error raised ?

Thanks again for all your helps.

ms · Jun 16, 2012 06:22 PM

The stack is just an intermediary step to arrange data in a single column. The error may be that either the label or data columns don't get the names "Position" and "Data" that the summary command assumes here. Take away the "invisible" option and look at the stacked table to see the column names, and change Summary accordingly if something is not named as intended (as noted in the jsl comment above the Stacked Data Column("Data") argument seems to be ignored in JMP 10 if columns are grouped as in your original table, hence the ungrouping command i my example above).

Instead of stack, if that's misbehavin, you can instead use summary and transpose in tandem to get a similar result. Try the example below (works with the posted table "Example") .

dt = Data Table( "Example" );

column_names = dt << Get Column names( numeric );

(dt << Summary(

Group( :SEX ),

Mean( Eval( column_names ) ),

statistics column name format( "column" )

)) << Transpose( Label( :SEX ), Output Table( "Result" ) );

hrkg · Jun 17, 2012 01:21 AM

Dear MS,

This last script worked perfectly. I learned a lot in this discussion.

Thanks again for all your guidance.

How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Re: How to deal with missing data

Recommended Articles

Creating Heat Maps

Get Going with JMP: Essentials for Using JMP