
Find Character Patterns in Data Set

robot
Community Trekker
Joined: Feb 27, 2012

Hi,

I have a data set that lists widgets in one column and a long string of defect codes in another column. Is there a platform in JMP well suited to finding character patterns? For example, assuming defect codes A through Z, any number of which may appear in a row, do the codes {A,E,F} often appear together? Or do the codes {H,L,P,X} often appear together? I am using JMP 10. Thanks!

Example Table


nt = New Table( "Defect Codes",
  Add Rows( 3 ),
  New Column( "Widget ID",
    Numeric,
    Continuous,
    Format( "Best", 12 ),
    Set Values( [1, 2, 3] )
  ),
  New Column( "Defect Codes",
    Character,
    Nominal,
    Set Values( {"K,V,Q,D", "A,P", "N,D,H"} )
  )
);
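The question "do {A,E,F} often appear together?" is what market-basket analysis calls the support of an itemset: the fraction of rows whose code list contains every code in the set. A minimal Python sketch outside JMP, assuming comma-separated codes as in the example table (the helper names `parse_codes` and `support` are hypothetical):

```python
def parse_codes(cell):
    """Split a cell like "K,V,Q,D" (or "A, B, C") into a set of codes."""
    return {c.strip() for c in cell.split(",") if c.strip()}


def support(rows, itemset):
    """Fraction of rows whose codes include every code in itemset."""
    itemset = set(itemset)
    hits = sum(1 for cell in rows if itemset <= parse_codes(cell))
    return hits / len(rows)


# Values from the example "Defect Codes" column
rows = ["K,V,Q,D", "A,P", "N,D,H"]
print(support(rows, {"D"}))       # D appears in 2 of 3 rows
print(support(rows, {"N", "D"}))  # N and D together in 1 of 3 rows
```

A code set whose support is well above what the individual code frequencies would suggest is a candidate "often together" combination.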


michaelhaslam_p
Community Trekker
Joined: Sep 15, 2013


Robot,

This is an interesting and perhaps complex problem. I do not know of an out-of-the-box platform for this; it depends on what you mean by a pattern. I believe you will need to construct a scripting (JSL) solution.

The operative question is: what is a pattern? Is it any run of codes, of arbitrary length, taken from the long string? So, for:

A,B,C,D,E,F,G,H

the patterns might be:

A
B
A,B
A,B,C
B,C
A,B,C,D
... etc.

In that case, it could become a very large computing problem, because the number of candidate patterns grows quickly with the length of the string.
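To put a number on "very large": if a pattern is any run of consecutive codes, a string of n codes has n(n+1)/2 such runs, and if any combination counts regardless of position, there are 2^n - 1 non-empty subsets. A small Python sketch (the helper name `contiguous_patterns` is hypothetical) that enumerates both for the eight-code example:

```python
from itertools import combinations


def contiguous_patterns(codes):
    """All runs of consecutive codes: A; A,B; A,B,C; ...; B; B,C; ..."""
    n = len(codes)
    return [",".join(codes[i:j]) for i in range(n) for j in range(i + 1, n + 1)]


codes = "A,B,C,D,E,F,G,H".split(",")
runs = contiguous_patterns(codes)
print(len(runs))  # 8 * 9 / 2 = 36 runs
print(runs[:4])   # ['A', 'A,B', 'A,B,C', 'A,B,C,D']

# If order and adjacency do not matter, every non-empty subset is a candidate
subsets = sum(1 for k in range(1, len(codes) + 1) for _ in combinations(codes, k))
print(subsets)    # 2**8 - 1 = 255
```

With all 26 codes present, the subset count is already over 67 million, which is why exhaustive enumeration gets expensive.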

Or are the patterns discrete, well-defined blocks that can be separated out? In that case, separate them out into a single stacked column and use the Distribution platform.
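Stacking means one row per (widget, code) pair; a frequency count of the stacked code column is then what the Distribution platform would tabulate. A minimal Python sketch of that reshaping on the example table's values, as an illustration outside JMP:

```python
from collections import Counter

# (Widget ID, Defect Codes) pairs from the example table
table = [(1, "K,V,Q,D"), (2, "A,P"), (3, "N,D,H")]

# Stack: one row per widget/code pair
stacked = [(wid, code.strip()) for wid, cell in table for code in cell.split(",")]

# Frequency of each code in the stacked column
freq = Counter(code for _, code in stacked)
print(len(stacked))         # 9 stacked rows
print(freq.most_common(1))  # [('D', 2)]
```

The same split works for multi-code blocks if each block is written as a single token (for example "A+E+F") before splitting on commas; that token convention is an assumption, not something in the data set.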

Michael Haslam

Solution (accepted)

I agree with Michael that this is an interesting problem. One idea is to expand the defect codes over multiple columns (let's say columns A to Z) and run Missing Data Pattern (Tables menu). The pattern table may be directly useful for illustrating associations between defect codes, and an additional step could be to apply Multivariate() to the missing data pattern table, or even some flavour of the Categorical platform.
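Why the missing-data-pattern trick works: once each code has its own column, a row's pattern of filled versus empty columns is exactly its set of codes, so counting rows per pattern counts how many widgets share each code combination. A Python sketch of that tally (hypothetical helper names; it mirrors the idea, not JMP's exact output layout):

```python
from collections import Counter
from string import ascii_uppercase

CODES = list(ascii_uppercase)  # defect codes A-Z


def indicator_row(cell):
    """1 where a code is present in the cell, 0 where it is absent."""
    present = {c.strip() for c in cell.split(",") if c.strip()}
    return tuple(1 if code in present else 0 for code in CODES)


def pattern_counts(cells):
    """Count rows sharing the same presence/absence pattern."""
    return Counter(indicator_row(cell) for cell in cells)


cells = ["A, B, C", "M, N, O", "A, B, C", "K, V"]
counts = pattern_counts(cells)
for pattern, count in counts.most_common():
    codes = ",".join(c for c, flag in zip(CODES, pattern) if flag)
    print(count, codes)
```

Note that this only counts exact matches: a row with A, B, C, D is a different pattern from A, B, C, which is one reason a follow-up association step is useful.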

A JSL example:

// List of all defect codes
def_code_list = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
"P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"};
n = N Items( def_code_list );

//–––––––––––––––––––––––
// Example table
nt = New Table( "Defect Codes",
  Add Rows( 200 ),
  New Column( "Widget ID", Numeric, Continuous ),
  New Column( "Defect Codes", Character, Nominal )
);
Column( nt, 1 ) << Set Values( Index( 1, N Rows( nt ) ) );

// Add random defect codes (at least two per row)
For Each Row(
  // Char() of a list gives {"A", "B"}; strip the braces and quotes to leave A, B
  :Defect Codes = Substitute(
    Char( def_code_list[Random Index( n, Round( Random Exp() + 2, 0 ) )] ),
    "{", "",
    "}", "",
    "\!"", ""
  )
);

// Contaminate with some common combinations
:Defect Codes[Random Index( N Rows( nt ), 10 )] = "A, B, C";
:Defect Codes[Random Index( N Rows( nt ), 20 )] = "M, N, O";

// Expand to multiple columns (and make a column list)
collist = {};
For( i = 1, i <= n, i++,
  col = nt << New Column( def_code_list[i], Character, Nominal );
  Insert Into( collist, col );
);
For Each Row(
  ID_codelist = Words( :Defect Codes[], ", " );
  For( i = 1, i <= N Items( ID_codelist ), i++,
    // Code columns start at column 3, after Widget ID and Defect Codes
    Column( nt, Loc( def_code_list, ID_codelist[i] )[1] + 2 )[] = ID_codelist[i]
  );
);

//––––––––––––––––––––––––
// Analyze data
// Missing data pattern
mdp = nt << missing data pattern( columns( Eval( collist ) ) );
// Multivariate on the missing data pattern table, weighted by pattern count
mdp << multivariate( columns( Eval( collist ) ), freq( :Count ) );
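The Multivariate step looks for association between the code columns through correlations. A cruder but more direct check is to count how often each pair of codes occurs in the same row; in the simulated data, pairs from the injected "A, B, C" and "M, N, O" rows would be expected to stand out. A minimal Python sketch of that pairwise count (a swapped-in technique, not what Multivariate computes):

```python
from collections import Counter
from itertools import combinations


def pair_counts(cells):
    """Count how many rows contain each unordered pair of codes."""
    counts = Counter()
    for cell in cells:
        # sorted() so that ("A", "B") and ("B", "A") map to the same key
        codes = sorted({c.strip() for c in cell.split(",") if c.strip()})
        counts.update(combinations(codes, 2))
    return counts


cells = ["A, B, C", "M, N, O", "A, B, C", "A, K"]
pairs = pair_counts(cells)
print(pairs.most_common(3))
```

Raw pair counts favour frequent codes, so comparing them with the individual code frequencies (or with support, as in the earlier sketch) gives a fairer picture.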

robot
Community Trekker
Joined: Feb 27, 2012

Wow! Very slick. Thanks for the input.