Identifying Unusual Patterns that Might Indicate Data Integrity Issues

See how to:

• Explore Patterns
• Identify duplicate values
• Find Most Duplicated Values - values that appear most frequently within a column
• Find Longest Runs - values that repeat in consecutive rows within a column
• Find Longest Duplicated Sequences - sequences of values that repeat within a column
• Find Duplicates Across Columns - sequences of values that appear in the same rows across multiple columns
• Use Rarity Score to interpret duplications
• Conceptually, a pattern is about as likely as getting [rarity value] heads in a row when flipping a fair coin
• Statistically, the rarity score is -Log2(p), where p is the probability of the pattern assuming a random ordering of values
• Identify unusual values
• Locate Formatted Width within cells - both the overall width and the number of decimal places
• Locate suspicious Fraction Lengths
• Locate suspicious Leading Digits that are too uniform
• Check the distribution of leading digits against Benford's Law, which says that in many naturally occurring groups of numbers, the distribution of the leading digit is not uniform
• Under Benford's Law, the expected proportion of leading digit d is Log10((d + 1) / d)
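
The run search and the rarity score described above can be sketched in a few lines of Python. This is only an illustration of the definitions, not the tool's own calculation: the function names rarity_score and longest_run are hypothetical, and the probability p is supplied by hand because the exact probability model for each pattern type is not given here.

```python
import math
from itertools import groupby


def rarity_score(p):
    """Return -log2(p): the number of fair-coin heads in a row that is as unlikely as p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)


def longest_run(values):
    """Return (value, length) for the longest run of identical consecutive values."""
    best_value, best_length = None, 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        if length > best_length:
            best_value, best_length = value, length
    return best_value, best_length


# Hypothetical column: the value 2.7 repeats in four consecutive rows.
column = [3.1, 2.7, 2.7, 2.7, 2.7, 4.0, 3.1, 3.1]
print(longest_run(column))      # (2.7, 4)

# A pattern with probability 1/1024 is as unlikely as 10 heads in a row.
print(rarity_score(1 / 1024))   # 10.0
```

A higher score means a rarer pattern: each additional point halves the probability that the pattern arose from a random ordering.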

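The Benford's Law check on leading digits can be illustrated as follows. This is a minimal sketch, assuming finite numeric values; the helper names and the sample values are hypothetical, and the comparison shown is a plain side-by-side listing rather than any particular statistical test.

```python
import math
from collections import Counter
from decimal import Decimal


def benford_expected():
    """Expected proportion of each leading digit d = 1..9: log10((d + 1) / d)."""
    return {d: math.log10((d + 1) / d) for d in range(1, 10)}


def leading_digit(x):
    """First nonzero digit of a nonzero finite number, ignoring sign and scale."""
    # Decimal(str(x)) keeps the digits as displayed, so 0.3 yields 3, not 2.
    return Decimal(str(abs(x))).as_tuple().digits[0]


def observed_proportions(values):
    """Proportion of each leading digit among the nonzero values."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    return {d: counts.get(d, 0) / len(digits) for d in range(1, 10)}


# Hypothetical sample values, for illustration only.
values = [123, 0.0145, 19.2, 1800, 2.5, 0.31, 47, 1.07]
expected = benford_expected()
observed = observed_proportions(values)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

Under Benford's Law the digit 1 is expected to lead about 30.1% of the time and the digit 9 about 4.6% of the time, so leading digits that are close to uniform are a reason to look more closely at the data.
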
• Identify unexpected linear relationships where, within a group of consecutive rows (10 by default), one column has an exact linear relationship with another column
• Identify specification limit anomalies for columns with spec limit properties
• Locate Spec Limit Matches where values in cells exactly match the lower spec limit (LSL) or upper spec limit (USL)
• Use Compare Spec Limits Distribution to compare observed out-of-spec values to expected out-of-spec values
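
The search for exact linear relationships within groups of consecutive rows can be sketched as a sliding-window check. The function names, the tolerances, and the decision to skip windows in which either column is constant are assumptions made for this illustration, not a description of how the tool performs the search.

```python
import math


def is_exactly_linear(xs, ys, rel_tol=1e-9):
    """True if every (x, y) pair lies on one line y = a + b*x, within tolerance."""
    # Assumption: a constant column gives only a trivial relationship, so skip it.
    if len(set(xs)) < 2 or len(set(ys)) < 2:
        return False
    # Define the line from the rows with the smallest and largest x.
    i0, i1 = xs.index(min(xs)), xs.index(max(xs))
    slope = (ys[i1] - ys[i0]) / (xs[i1] - xs[i0])
    intercept = ys[i0] - slope * xs[i0]
    return all(
        math.isclose(y, intercept + slope * x, rel_tol=rel_tol, abs_tol=1e-12)
        for x, y in zip(xs, ys)
    )


def linear_windows(xs, ys, window=10):
    """Start indices of windows of consecutive rows that are exactly linear."""
    return [
        start
        for start in range(len(xs) - window + 1)
        if is_exactly_linear(xs[start:start + window], ys[start:start + window])
    ]


# Hypothetical data: the first 10 rows satisfy y = 2x + 1 exactly, the last two do not.
xs = [1.0, 4.0, 2.0, 7.0, 3.0, 9.0, 5.0, 8.0, 6.0, 10.0, 2.5, 3.7]
ys = [2 * x + 1 for x in xs[:10]] + [5.9, 8.8]
print(linear_windows(xs, ys))   # [0]
```

Measured data rarely falls exactly on a line, so a window that does may indicate that one column was computed or copied from the other.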

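The two spec limit checks can be illustrated in the same way. This sketch assumes that a normal distribution fitted to the column is the reference for the expected out-of-spec count; that assumption, the function names, and the sample values are all hypothetical, and the tool may use a different reference distribution.

```python
from statistics import NormalDist, mean, stdev


def spec_limit_matches(values, lsl, usl):
    """Count cells whose value exactly equals the lower or upper spec limit."""
    return {
        "LSL": sum(v == lsl for v in values),
        "USL": sum(v == usl for v in values),
    }


def out_of_spec_summary(values, lsl, usl):
    """Return (observed, expected) counts of out-of-spec values."""
    # Assumption: a normal distribution fitted to the data is the reference.
    fitted = NormalDist(mean(values), stdev(values))
    expected_fraction = fitted.cdf(lsl) + (1.0 - fitted.cdf(usl))
    observed = sum(v < lsl or v > usl for v in values)
    return observed, expected_fraction * len(values)


# Hypothetical measurements with spec limits LSL = 9.5 and USL = 10.5.
values = [9.8, 10.0, 10.2, 10.5, 9.5, 10.5, 10.1, 9.9, 10.5, 9.7]
print(spec_limit_matches(values, 9.5, 10.5))   # {'LSL': 1, 'USL': 3}

observed, expected = out_of_spec_summary(values, 9.5, 10.5)
print(observed, round(expected, 2))
```

Several values sitting exactly on a limit, combined with fewer out-of-spec values than the fitted distribution predicts, is the kind of pattern these checks are meant to surface.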
