Identifying Unusual Patterns that Might Indicate Data Integrity Issues

Published on 11-07-2024 03:28 PM by Community Manager | Updated on 11-08-2024 08:07 AM

      See how to:

      • Explore Patterns
      • Identify duplicate values
        • Find Most Duplicated Values - values that appear most frequently within a column
        • Find Longest Runs - values that repeat in consecutive rows within a column
        • Find Longest Duplicated Sequences - sequences of values that repeat within a column
        • Find Duplicates Across Columns - sequences of values that appear in the same rows across multiple columns
        • Use the Rarity Score to interpret duplications
          • Conceptually, a pattern with a rarity score of r is about as likely as getting r heads in a row when flipping a fair coin
          • Statistically, the rarity score is -log2(p), where p is the probability of the pattern, assuming a random ordering of the values
      • Identify unusual values
        • Locate cells with an unusual Formatted Width, both overall and in the number of decimals
        • Locate suspicious Fraction Lengths
        • Locate suspicious Leading Digits whose distribution is too uniform
          • Check the distribution of leading digits against Benford's Law, which says that in many naturally occurring collections of numbers, the leading digit is not uniformly distributed: smaller digits occur more often
          • The expected proportion of leading digit d is log10((d + 1) / d), for d = 1 through 9

      • Identify unexpected linear relationships, where within a group of consecutive rows (10 by default) one column has an exact linear relationship with another column
      • Identify specification limit anomalies for columns that have spec limit properties
        • Locate Spec Limit Matches, where values in cells exactly match the LSL or USL
        • Compare Spec Limits Distribution, to compare the observed out-of-spec values with the expected number of out-of-spec values
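
To make the duplicate-value checks concrete, here is a minimal Python sketch of the idea behind Find Most Duplicated Values. It assumes a column is a plain Python list and simply counts how often each value occurs. It is an illustration only, not JMP's implementation, and the function name `most_duplicated` and its `top` parameter are hypothetical.

```python
from collections import Counter


def most_duplicated(values, top=3):
    """Return up to `top` (value, count) pairs for values that occur more than
    once, most frequent first. Hypothetical helper, not part of JMP."""
    counts = Counter(values)
    # most_common() sorts by count, descending; keep only true duplicates
    duplicated = [(value, n) for value, n in counts.most_common() if n > 1]
    return duplicated[:top]


column = [1.2, 3.4, 1.2, 5.0, 1.2, 3.4]
print(most_duplicated(column))  # [(1.2, 3), (3.4, 2)]
```

In a continuous measurement column, one value repeated far more often than the others can suggest a default or fill-in value worth checking.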

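The following sketch pairs Find Longest Runs with the Rarity Score. It finds the longest run of identical consecutive values and converts a probability p into a rarity score with -log2(p). How JMP computes p is not described in this post, so the sketch assumes a Monte Carlo permutation estimate: shuffle the column many times and count how often a run at least as long appears by chance. The function names, the number of trials, and the seed are all hypothetical.

```python
import math
import random
from itertools import groupby


def longest_run(values):
    """Return (value, start_row, length) of the longest run of identical
    consecutive values; (None, 0, 0) for an empty column."""
    best = (None, 0, 0)
    row = 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        if length > best[2]:
            best = (value, row, length)
        row += length
    return best


def rarity(p):
    """Rarity score: a pattern with probability p is about as likely as
    rarity(p) heads in a row from a fair coin."""
    return -math.log2(p)


def run_p_value(values, trials=2000, seed=1):
    """Monte Carlo estimate of P(longest run >= observed) under random row
    order. An assumption for illustration, not JMP's calculation."""
    observed = longest_run(values)[2]
    rng = random.Random(seed)
    shuffled = list(values)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if longest_run(shuffled)[2] >= observed:
            hits += 1
    # add-one correction keeps the estimate above zero, so rarity() stays finite
    return (hits + 1) / (trials + 1)


column = [5, 7, 7, 7, 7, 7, 2, 9, 4, 1, 3, 8, 6, 2, 5, 9]
value, start, length = longest_run(column)
print(value, start, length)          # 7 1 5
print(round(rarity(1 / 1024), 6))    # 10.0, like 10 heads in a row
print(rarity(run_p_value(column)))
```

A shuffle-based estimate matches the "random ordering of values" assumption directly, at the cost of being approximate and slow for very long columns.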
      [Image: Explore Patterns]
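
Find Longest Duplicated Sequences looks for a block of consecutive values that occurs again elsewhere in the same column, as can happen when rows are copied and pasted. Below is a minimal sketch that assumes the repeat must not overlap the first occurrence (JMP's exact matching rules may differ). It tries block lengths from `min_len` upward and stops at the first length with no repeat, which is valid because a repeated block of length k always contains a repeated block of length k - 1. The function name and `min_len` parameter are hypothetical.

```python
def longest_duplicated_sequence(values, min_len=2):
    """Return (block, first_start, repeat_start) for the longest block of
    consecutive values that appears again later without overlapping,
    or None if no block of at least min_len values repeats."""
    best = None
    k = min_len
    while 2 * k <= len(values):
        first_seen = {}
        found = None
        for start in range(len(values) - k + 1):
            key = tuple(values[start:start + k])
            if key in first_seen:
                # comparing with the earliest occurrence maximizes the gap
                if start - first_seen[key] >= k:
                    found = (list(key), first_seen[key], start)
                    break
            else:
                first_seen[key] = start
        if found is None:
            break  # no repeat of length k, so no longer repeat exists either
        best = found
        k += 1
    return best


column = [3, 1, 4, 1, 5, 9, 2, 6, 1, 4, 1, 5, 8]
print(longest_duplicated_sequence(column))  # ([1, 4, 1, 5], 1, 8)
```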

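Find Duplicates Across Columns targets the same rows holding the same values in more than one column. A minimal sketch, assuming columns are equal-length Python lists in a dict and that a "duplicate" means exactly equal values in consecutive rows for a pair of columns. The name `longest_shared_block` and the `min_len` cutoff are hypothetical, and JMP may also consider groups of more than two columns.

```python
from itertools import combinations


def longest_shared_block(columns, min_len=2):
    """For every pair of columns, find the longest run of consecutive rows in
    which both columns hold identical values. `columns` maps names to
    equal-length lists. Returns (col_a, col_b, start_row, length), longest first."""
    results = []
    for a, b in combinations(columns, 2):
        best_start, best_len = 0, 0
        start, length = 0, 0
        for row, (x, y) in enumerate(zip(columns[a], columns[b])):
            if x == y:
                if length == 0:
                    start = row  # a new matching run begins here
                length += 1
                if length > best_len:
                    best_start, best_len = start, length
            else:
                length = 0
        if best_len >= min_len:
            results.append((a, b, best_start, best_len))
    return sorted(results, key=lambda r: r[3], reverse=True)


cols = {"A": [1, 2, 3, 4, 5, 6], "B": [9, 2, 3, 4, 0, 6], "C": [7, 8, 7, 8, 7, 8]}
print(longest_shared_block(cols))  # [('A', 'B', 1, 3)]
```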
       
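For the formatting checks among the unusual values, here is a rough sketch. `width_profile` tallies the overall width and the number of decimals of cells as displayed (strings), and `fraction_length` counts digits after the decimal point in the shortest representation of a stored float. Everything here is an assumption for illustration: the helper names are hypothetical, and the 5% `max_share` cutoff for calling a fraction length "rare" is arbitrary, not a JMP default.

```python
import math
from collections import Counter
from decimal import Decimal


def width_profile(cells):
    """Tally overall width and decimal places of cells as displayed (strings)."""
    widths = Counter(len(c.strip()) for c in cells)
    decimals = Counter(len(c.strip().partition(".")[2]) for c in cells)
    return widths, decimals


def fraction_length(x):
    """Digits after the decimal point in the shortest repr of float x
    (e.g. 1.25 -> 2, 3.0 -> 0)."""
    exponent = Decimal(repr(float(x))).normalize().as_tuple().exponent
    return max(0, -exponent)


def rare_fraction_lengths(values, max_share=0.05):
    """Return row indices whose fraction length occurs in at most
    max_share of the finite values (hypothetical cutoff)."""
    finite = [(i, fraction_length(v)) for i, v in enumerate(values) if math.isfinite(v)]
    counts = Counter(length for _, length in finite)
    n = len(finite)
    return [i for i, length in finite if counts[length] / n <= max_share]


print(width_profile(["12.50", "13.25", "9.1", "14.00"]))
# (Counter({5: 3, 3: 1}), Counter({2: 3, 1: 1}))
print(rare_fraction_lengths([1.234] * 20 + [1.2]))  # [20]
```

In the example, the single value with one decimal stands out in a column that otherwise carries three, which might indicate a value entered by hand.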

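The Benford's Law check can be sketched as follows. The expected share of leading digit d is log10((d + 1) / d), so about 30.1% of values start with 1 and about 4.6% start with 9. This sketch assumes nonzero, finite numbers and uses a chi-square distance as the comparison, which is one common choice; JMP's own statistic may differ. A column whose leading digits are close to uniform scores high. The function names are hypothetical.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit d = 1..9 under Benford's Law."""
    return {d: math.log10((d + 1) / d) for d in range(1, 10)}


def leading_digit(x):
    """First significant digit of a nonzero number, read from scientific notation."""
    return int(f"{abs(x):e}"[0])


def benford_chi_square(values):
    """Chi-square distance between observed leading digits and Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0 and math.isfinite(v)]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one nonzero, finite value")
    observed = Counter(digits)
    return sum(
        (observed[d] - n * p) ** 2 / (n * p) for d, p in benford_expected().items()
    )


uniform = list(range(1, 10)) * 50                   # every leading digit equally often
powers_of_two = [2.0 ** k for k in range(1, 201)]   # known to follow Benford's Law
print(round(benford_expected()[1], 4))              # 0.301
print(benford_chi_square(uniform) > benford_chi_square(powers_of_two))  # True
```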
      [Image: Benford's Law]
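
To illustrate the check for unexpected linear relationships, this sketch slides a window of consecutive rows (10 by default, as in the list above) down two columns. In each window it fits a least-squares line and reports the window if every point lies on that line within a small tolerance. The least-squares approach, the tolerance, and the function name are assumptions for illustration. JMP may detect exact relationships differently.

```python
import math


def exact_linear_windows(x, y, window=10, tol=1e-9):
    """Return (start_row, slope, intercept) for every window of `window`
    consecutive rows in which y = slope * x + intercept holds to within tol."""
    hits = []
    for start in range(len(x) - window + 1):
        xs = x[start:start + window]
        ys = y[start:start + window]
        mean_x = sum(xs) / window
        mean_y = sum(ys) / window
        sxx = sum((xi - mean_x) ** 2 for xi in xs)
        if sxx == 0:
            continue  # x is constant in this window, so no slope can be fitted
        # least-squares slope and intercept for this window
        slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys)) / sxx
        intercept = mean_y - slope * mean_x
        if all(math.isclose(yi, slope * xi + intercept, rel_tol=tol, abs_tol=tol)
               for xi, yi in zip(xs, ys)):
            hits.append((start, slope, intercept))
    return hits


x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7]
y = [2 * v + 1 for v in x[:10]] + [0, 4, 12, 6]   # first 10 rows: y = 2x + 1
for start, a, b in exact_linear_windows(x, y):
    print(start, round(a, 6), round(b, 6))          # 0 2.0 1.0
```

Note that a window where y is constant while x varies also counts as exactly linear (slope 0), which may or may not be worth flagging.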

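The two spec-limit checks can be sketched together. `spec_limit_report` (a hypothetical name) lists the rows whose value equals the LSL or USL exactly, counts the observed out-of-spec values, and compares that count with an expected count. This post does not say how JMP computes the expected count, so the sketch assumes a normal distribution fitted with the sample mean and standard deviation.

```python
from statistics import NormalDist, mean, stdev


def spec_limit_report(values, lsl, usl):
    """Return (rows exactly at a limit, observed out-of-spec count,
    expected out-of-spec count under a fitted normal distribution)."""
    at_limit = [i for i, v in enumerate(values) if v == lsl or v == usl]
    observed = sum(1 for v in values if v < lsl or v > usl)
    fit = NormalDist(mean(values), stdev(values))  # assumption: normal model
    expected_share = fit.cdf(lsl) + (1 - fit.cdf(usl))
    return at_limit, observed, expected_share * len(values)


values = [9.8, 10.0, 10.1, 10.5, 9.5, 10.0, 10.2, 9.9, 11.0]
at_limit, observed, expected = spec_limit_report(values, lsl=9.5, usl=11.0)
print(at_limit, observed)   # [4, 8] 0
print(round(expected, 2))
```

Several values sitting exactly on a limit, or noticeably fewer out-of-spec values than the model predicts, might suggest that values were clipped or edited to the limit.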
      Resources:


