Find Character Patterns in Data Set


Sep 20, 2013 10:52 PM

Hi,

I have a data set that lists widgets in one column and a long string of defect codes in another column. Is there a platform in JMP well suited to finding character patterns? For example, assuming defect codes A-Z, which may appear in any quantity per row, do the codes {A,E,F} often appear together? Or do the codes {H,L,P,X} often appear together? I am using JMP 10. Thanks!

**Example Table**

    nt = New Table( "Defect Codes",
        Add Rows( 3 ),
        New Column( "Widget ID",
            Numeric,
            Continuous,
            Format( "Best", 12 ),
            Set Values( [1, 2, 3] )
        ),
        New Column( "Defect Codes",
            Character,
            Nominal,
            Set Values( {"K,V,Q,D", "A,P", "N,D,H"} )
        )
    );


3 Replies


Re: Find Character Patterns in Data Set

Robot,

This is an interesting and perhaps complex problem. I do not know of an out-of-the-box platform for this; it depends on how you define a pattern. I believe you will need to construct a scripting (JSL) solution.

The operative question is: what is a pattern? Is it any string of arbitrary length taken from the long string? So, for:

A,B,C,D,E,F,G,H

the patterns might be:

A,

B,

A,B

A,B,C

B,C

A,B,C,D

... etc.

Under that definition, it could be a very large computing problem.
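To give a rough sense of scale, here are the counts under two possible readings of "pattern". Both readings are assumptions, since the definition is left open above: either a pattern is a contiguous run of codes within one row, or it is any non-empty set of codes regardless of position.

```latex
% m = number of codes considered
\[
  \text{contiguous runs: } \frac{m(m+1)}{2}
  \qquad
  \text{non-empty subsets: } 2^{m} - 1
\]
% For the 8-code example row A,...,H:
\[
  \frac{8 \cdot 9}{2} = 36
  \qquad
  2^{8} - 1 = 255
\]
% For the full A-Z alphabet of 26 codes:
\[
  2^{26} - 1 = 67{,}108{,}863
\]
```

Only combinations that actually occur in the data need to be counted, so the practical workload is bounded by the rows in the table rather than by these upper limits.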

Or are the patterns discrete, well-defined blocks that can be separated out? In that case, separate them into a single stacked column and use the Distribution platform.

Michael Haslam
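If the codes can be treated as discrete blocks, the stacking step suggested above might look like the following JSL sketch. It assumes a data table named "Defect Codes" with a character column "Defect Codes" holding comma-separated codes, as in the example table in the question. The names `codes`, `stacked`, and the "Code" column are made up for illustration.

```jsl
// Assumption: the table and column names match the example in the question.
dt = Data Table( "Defect Codes" );

// Collect every individual code from every row into one list.
codes = {};
For( r = 1, r <= N Rows( dt ), r++,
	// Words() splits on commas and spaces, so "A,P" and "A, P" both work.
	parts = Words( Column( dt, "Defect Codes" )[r], ", " );
	For( k = 1, k <= N Items( parts ), k++,
		Insert Into( codes, parts[k] )
	);
);

// Build the stacked table: one row per individual code.
stacked = New Table( "Stacked Defect Codes",
	Add Rows( N Items( codes ) ),
	New Column( "Code", Character, Nominal, Set Values( codes ) )
);

// Frequency of each code.
stacked << Distribution( Column( :Code ) );
```

The Distribution report then shows how often each individual code occurs. On its own it does not show which codes occur together.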

Accepted Solution

I agree with Michael that this is an interesting problem. One idea is to expand the defect codes over multiple columns (let's say columns A to Z) and run Missing Data Pattern (Tables menu) on them. The resulting pattern table may be directly useful for illustrating associations between defect codes, but an additional step could be to apply Multivariate() to the missing pattern, or even some flavour of the Categorical platform.

A JSL example:

    // List of all defect codes
    def_code_list = {"A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O",
        "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"};
    n = N Items( def_code_list );

    //–––––––––––––––––––––––
    // Example table
    nt = New Table( "Defect Codes",
        Add Rows( 200 ),
        New Column( "Widget ID", Numeric, Continuous ),
        New Column( "Defect Codes", Character, Nominal )
    );
    Column( 1 ) << Set Values( Index( 1, N Row( nt ) ) );

    // Add random defect codes: Char() turns the sampled sublist into text,
    // and Substitute() strips the braces and quotes
    For Each Row(
        :Defect Codes = Substitute(
            Char( def_code_list[Random Index( n, Round( Random Exp() + 2, 0 ) )] ),
            "{", "",
            "}", "",
            "\!"", ""
        )
    );

    // Contaminate with some common combinations
    :Defect codes[Random Index( 200, 10 )] = "A, B, C";
    :Defect codes[Random Index( 200, 20 )] = "M, N, O";

    // Expand to multiple columns (and make column list)
    collist = {};
    For( i = 1, i <= n, i++,
        col = nt << New Column( def_code_list[i], Character, Nominal );
        Insert Into( collist, col );
    );
    For Each Row(
        ID_codelist = Words( :Defect Codes[], ", " );
        For( i = 1, i <= N Items( ID_codelist ), i++,
            // + 2 skips the Widget ID and Defect Codes columns
            Column( nt, Loc( def_code_list, ID_codelist[i] )[1] + 2 )[] = ID_codelist[i]
        );
    );

    //––––––––––––––––––––––––
    // Analyze data
    // Missing data pattern
    mdp = nt << Missing Data Pattern( Columns( Eval( collist ) ) );

    // Multivariate on missing pattern
    mdp << Multivariate( Columns( Eval( collist ) ), Freq( :Count ) );
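As a possible complement to the pattern table, pairs of codes that share a row can also be counted directly. The sketch below is not part of the script above. It assumes the same "Defect Codes" table and column, and uses an associative array keyed on each pair of codes. The names `pair_counts`, `pairs`, and the output columns "Pair" and "Count" are illustrative.

```jsl
// Assumption: the table and column names match the example script above.
dt = Data Table( "Defect Codes" );

// Missing keys read as 0, so counts can be incremented without a check.
pair_counts = Associative Array();
pair_counts << Set Default Value( 0 );

For( r = 1, r <= N Rows( dt ), r++,
	// Loading the codes as keys removes duplicates within a row, and
	// Get Keys returns them sorted, so "B,A" and "A,B" give the same key.
	parts = Associative Array( Words( Column( dt, "Defect Codes" )[r], ", " ) ) << Get Keys;
	For( i = 1, i < N Items( parts ), i++,
		For( j = i + 1, j <= N Items( parts ), j++,
			key = parts[i] || "," || parts[j];
			pair_counts[key] = pair_counts[key] + 1;
		)
	);
);

// Write the counts to a table, one row per pair seen in the data.
keys = pair_counts << Get Keys;
pairs = New Table( "Defect Code Pairs",
	Add Rows( N Items( keys ) ),
	New Column( "Pair", Character, Nominal ),
	New Column( "Count", Numeric, Continuous )
);
For( k = 1, k <= N Items( keys ), k++,
	Column( pairs, "Pair" )[k] = keys[k];
	Column( pairs, "Count" )[k] = pair_counts[keys[k]];
);
```

Sorting the resulting table by Count in descending order puts the most frequent pairs at the top. The same idea extends to triples with a third nested loop, at the cost of more keys.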


Re: Find Character Patterns in Data Set

Wow! Very slick. Thanks for the input.