Please see the enclosed script.
It generates a table with continuous x columns and a two-level (binary) response.
The table also contains some saved scripts for visualization and modeling.
You can try Response Screening and Fit Model to find the relevant variables.
As there are as many as 20K columns, PCA on the continuous x would be an option (a variable reduction technique).
In this example PCA does not give any useful reduction, as all x are independent (random normal).
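Names Default To Here( 1 );
// Example table: one character response column and 100 rows
cdt = New Table( "data", New Column( "response", Character ) );
cdt << add rows( 100 );
// Ten independent continuous columns; column i is Normal( mean = i, sd = 1 )
For( i = 1, i <= 10, i++,
    cdt << New Column( "col" || Char( i ), Continuous, set each value( Random Normal( i, 1 ) ) )
);
// The response depends only on col5 and col6
cdt:response << set formula( If( :col5 < 4 & :col6 < 5, "B", "A" ) );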
cdt <<
Add Properties to Table(
    {New Script(
        "col6 vs. col5",
        Graph Builder(
            Variables( X( :col5 ), Y( :col6 ), Overlay( :response ) ),
            Elements( Points( X, Y, Legend( 13 ) ), Smoother( X, Y, Legend( 14 ) ) )
        )
    ), New Script(
        "Scatterplot Matrix",
        Scatterplot Matrix(
            Y(
                :response,
                :col1,
                :col2,
                :col3,
                :col4,
                :col5,
                :col6,
                :col7,
                :col8,
                :col9,
                :col10
            ),
            Matrix Format( "Lower Triangular" )
        )
    ), New Script(
        "Response Screening of response",
        Response Screening(
            Y( :response ),
            X(
                :col1,
                :col2,
                :col3,
                :col4,
                :col5,
                :col6,
                :col7,
                :col8,
                :col9,
                :col10
            )
        )
    ), New Script(
        "Fit Nominal Logistic",
        Fit Model(
            Y( :response ),
            Effects(
                :col1,
                :col2,
                :col3,
                :col4,
                :col5,
                :col6,
                :col7,
                :col8,
                :col9,
                :col10
            ),
            Personality( "Nominal Logistic" ),
            Run( Likelihood Ratio Tests( 1 ), Wald Tests( 0 ) )
        )
    ), New Script(
        "Principal Components",
        Principal Components(
            Y(
                :col1,
                :col2,
                :col3,
                :col4,
                :col5,
                :col6,
                :col7,
                :col8,
                :col9,
                :col10
            ),
            Estimation Method( "Default" ),
            "on Correlations"
        )
    )}
);
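With 20K columns you would not want to type the column names by hand as in the saved scripts. A minimal sketch of how the column list could be built from the table instead is given here. It assumes the table is still open under the name "data" and that every continuous column is a predictor; the variable names dt and xCols are only for illustration.

```jsl
Names Default To Here( 1 );
dt = Data Table( "data" );

// Assumption: all continuous columns in the table are x variables
xCols = dt << Get Column Names( Continuous );

// Same platforms as in the saved scripts, but with the generated column list
dt << Response Screening( Y( :response ), X( Eval( xCols ) ) );
dt << Principal Components( Y( Eval( xCols ) ), Estimation Method( "Default" ), "on Correlations" );
```

If your real table also has continuous columns that are not predictors, you would need to remove them from xCols before launching the platforms.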
Georg