
Identifying Unusual Patterns that Might Indicate Data Integrity Issues

Published on 11-07-2024 03:28 PM by Community Manager | Updated on 11-08-2024 08:07 AM

    See how to:

    • Explore Patterns
    • Identify duplicate values
      • Find Most Duplicated Values - values that appear most frequently within a column
      • Find Longest Runs - values that repeat in consecutive rows within a column
      • Find Longest Duplicated Sequences - sequences of values that repeat within a column
      • Find Duplicates Across Columns - sequences of values that appear in the same rows across multiple columns
      • Use Rarity Score to interpret duplications
        • Conceptually, a pattern is about as likely as getting [rarity value] heads in a row when flipping a fair coin
        • Statistically, rarity = -Log2(p), where p is the probability of the pattern assuming random ordering of values (for example, p = 1/1024 gives a rarity of 10)
    • Identify unusual values                                                          
      • Locate Formatted Width within cells - both overall and decimals
      • Locate suspicious Fraction Lengths        
      • Locate suspicious Leading Digits that are too uniform
        • Check the distribution of leading digits against Benford's Law, which says that in many naturally occurring sets of numbers, the distribution of the leading digit is not uniform
        • The expected proportion of leading digit d is Log10((d+1) / d), where d is 1 through 9
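
The Benford's Law check above can be sketched outside of JMP in a few lines of Python. This is only an illustration of the Log10((d+1)/d) formula, not how the Explore Patterns platform is implemented; the function names `benford_expected`, `leading_digit`, and `leading_digit_table` are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected proportion of each leading digit d = 1..9 under Benford's Law."""
    return {d: math.log10((d + 1) / d) for d in range(1, 10)}


def leading_digit(x):
    """Return the first nonzero digit of x, or None for zero and non-finite values."""
    x = abs(x)
    if x == 0 or math.isnan(x) or math.isinf(x):
        return None
    # In scientific notation the first character is the leading digit.
    return int(f"{x:.16e}"[0])


def leading_digit_table(values):
    """Return (digit, observed proportion, expected proportion) for digits 1..9."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    counts = Counter(digits)
    n = len(digits)
    expected = benford_expected()
    return [
        (d, counts.get(d, 0) / n if n else 0.0, expected[d])
        for d in range(1, 10)
    ]


if __name__ == "__main__":
    # Hypothetical data: powers of 2, used only to exercise the functions.
    sample = [2 ** k for k in range(1, 101)]
    for digit, observed, expected in leading_digit_table(sample):
        print(f"{digit}: observed {observed:.3f}, expected {expected:.3f}")
```

Observed proportions that are nearly equal across all nine digits, rather than decreasing from 1 to 9, are the "too uniform" pattern the bullet above describes. Whether a gap between observed and expected is large enough to matter is a judgment call that this sketch does not make.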

    • Identify unexpected linear relationships where, within some group of consecutive rows (default is 10), one column has an exact linear relationship with another column
    • Identify specification limit anomalies for columns with spec limit properties
      • Locate Spec Limit Matches where values in cells exactly match the LSL or USL
      • Use Compare Spec Limits Distribution to compare the observed out-of-spec values to the expected out-of-spec values
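
The linear relationship check described above (one column exactly linear in another within a group of consecutive rows, default 10) can be sketched in Python as follows. This is a rough illustration under stated assumptions, not JMP's algorithm: the function names are hypothetical, and "exact" is approximated with a small floating-point tolerance (`rel_tol`, `abs_tol`), which is an assumption of this sketch.

```python
import math


def is_exactly_linear(xs, ys, rel_tol=1e-12, abs_tol=1e-12):
    """True if every (x, y) pair lies on one line, within a tiny tolerance."""
    x0, y0 = xs[0], ys[0]
    # Find a second point with a different x value to define the slope.
    j = next((i for i, v in enumerate(xs) if v != x0), None)
    if j is None:
        return False  # x is constant in this window, so no slope is defined
    slope = (ys[j] - y0) / (xs[j] - x0)
    return all(
        math.isclose(y, y0 + slope * (x - x0), rel_tol=rel_tol, abs_tol=abs_tol)
        for x, y in zip(xs, ys)
    )


def exact_linear_windows(x, y, window=10):
    """Return start indices of windows of consecutive rows where y is linear in x."""
    hits = []
    for start in range(len(x) - window + 1):
        if is_exactly_linear(x[start:start + window], y[start:start + window]):
            hits.append(start)
    return hits


if __name__ == "__main__":
    # Hypothetical data: y is quadratic in x except rows 5..14, where y = 2x + 1.
    x = list(range(20))
    y = [xi ** 2 for xi in x]
    for i in range(5, 15):
        y[i] = 2 * x[i] + 1
    print(exact_linear_windows(x, y))
```

Note that a window where y is constant while x varies also counts as linear (slope 0) in this sketch; in practice that case overlaps with the longest-runs check.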

    [Image: Explore Patterns.JPG]

    [Image: Benford's Law.JPG]
