Identifying Unusual Patterns that Might Indicate Data Integrity Issues
Published on 11-07-2024 03:28 PM | Updated on 11-08-2024 08:07 AM
See how to:
- Explore Patterns
- Identify duplicate values
  - Find Most Duplicated Values - values that appear most frequently within a column
  - Find Longest Runs - values that repeat in consecutive rows within a column
  - Find Longest Duplicated Sequences - sequences of values that repeat within a column
  - Find Duplicates Across Columns - sequences of values that appear in the same rows across multiple columns
- Use the Rarity Score to interpret duplications
  - Conceptually, a pattern is about as likely as getting [rarity value] heads in a row when flipping a fair coin
  - Statistically, the Rarity Score is -log2(p), where p is the probability of the pattern assuming a random ordering of values
- Identify unusual values
  - Locate unusual Formatted Widths within cells - both overall width and decimal width
  - Locate suspicious Fraction Lengths
  - Locate suspicious Leading Digits that are too uniform
- Identify unexpected linear relationships, where within a group of consecutive rows (10 by default) one column has an exact linear relationship with another column
- Identify specification limit anomalies for columns with spec limit properties
  - Locate Spec Limit Matches, where values in cells exactly match the LSL or USL
  - Compare Spec Limits Distribution to compare observed out-of-spec values with the expected out-of-spec values
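To make the Rarity Score concrete, here is a minimal Python sketch (not JMP's implementation) applied to the Longest Runs check. It estimates p by shuffling the column many times and counting how often a random ordering produces a run at least as long as the observed one. This Monte Carlo permutation estimate is our assumption, standing in for whatever exact calculation JMP performs. The function names `longest_run`, `rarity`, and `run_rarity` are hypothetical.

```python
import math
import random
from itertools import groupby


def longest_run(values):
    """Return (value, length) for the longest run of identical consecutive values."""
    best_value, best_len = None, 0
    for value, group in groupby(values):
        length = sum(1 for _ in group)
        if length > best_len:
            best_value, best_len = value, length
    return best_value, best_len


def rarity(p):
    """Rarity Score: -log2(p)."""
    return -math.log2(p)


def run_rarity(values, n_shuffles=2000, seed=0):
    """Estimate how rare the observed longest run is if rows were randomly ordered.

    p is the share of random shuffles whose longest run is at least as long as
    the observed one (a Monte Carlo estimate; an assumption, not JMP's method).
    """
    rng = random.Random(seed)
    _, observed = longest_run(values)
    shuffled = list(values)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if longest_run(shuffled)[1] >= observed:
            hits += 1
    # Add-one correction keeps p above zero when no shuffle matches.
    p = (hits + 1) / (n_shuffles + 1)
    return observed, p, rarity(p)


# Hypothetical column: the value 5 repeats in the first eight rows.
column = [5] * 8 + list(range(20))
print(run_rarity(column))
```

As a check on the coin analogy, `rarity(1 / 1024)` is 10, the same probability as ten heads in a row.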
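The three remaining duplicate checks can be sketched in plain Python as follows. This is an illustration under our own definitions, not JMP's: we assume a duplicated sequence must occur twice without the two copies overlapping, and we compare columns two at a time. The names `most_duplicated`, `longest_duplicated_sequence`, and `longest_shared_rows` are hypothetical.

```python
from collections import Counter


def most_duplicated(values, top=3):
    """Values that appear most often in a column, with their counts."""
    return Counter(values).most_common(top)


def longest_duplicated_sequence(values):
    """Longest contiguous sequence that occurs at least twice without overlapping."""
    n = len(values)
    # dp[i][j] = length of the common sequence ending at rows i-1 and j-1 (i < j)
    dp = [[0] * (n + 1) for _ in range(n + 1)]
    best_len, best_end = 0, 0
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            # Extend only while the two occurrences cannot overlap.
            if values[i - 1] == values[j - 1] and dp[i - 1][j - 1] < j - i:
                dp[i][j] = dp[i - 1][j - 1] + 1
                if dp[i][j] > best_len:
                    best_len, best_end = dp[i][j], i
    return list(values[best_end - best_len:best_end])


def longest_shared_rows(col_a, col_b):
    """(start_row, length) of the longest block of rows where two columns match."""
    best_start, best_len, start = 0, 0, None
    for row, (a, b) in enumerate(zip(col_a, col_b)):
        if a == b:
            if start is None:
                start = row
            if row - start + 1 > best_len:
                best_start, best_len = start, row - start + 1
        else:
            start = None
    return best_start, best_len


print(longest_duplicated_sequence([1, 2, 3, 9, 1, 2, 3, 7]))  # [1, 2, 3]
```

The quadratic table in `longest_duplicated_sequence` keeps the sketch short; it is fine for small columns but would need a suffix-array approach for large ones.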
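The unusual-value checks depend on how numbers were written, so a sketch has to work on the formatted text rather than on floats, which drop trailing zeros. Below, width and fraction length are simple character counts. For "too uniform" leading digits we assume a comparison against Benford's law, a common screening heuristic for naturally occurring data that spans several orders of magnitude; JMP's actual test may differ. All function names are hypothetical.

```python
import math
from collections import Counter


def formatted_width(text):
    """Overall formatted width: characters in the cell as displayed."""
    return len(text.strip())


def fraction_length(text):
    """Number of digits shown after the decimal point."""
    text = text.strip()
    return len(text.split(".", 1)[1]) if "." in text else 0


def leading_digit(text):
    """First nonzero digit of a formatted number, or None if there is none."""
    for ch in text:
        if ch in "123456789":
            return int(ch)
    return None


def leading_digit_chi_square(texts):
    """Chi-square distance of leading digits from Benford's law and from uniform."""
    digits = [d for d in map(leading_digit, texts) if d is not None]
    n = len(digits)
    if n == 0:
        raise ValueError("no nonzero leading digits to test")
    counts = Counter(digits)
    benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}
    chi_benford = sum((counts[d] - n * benford[d]) ** 2 / (n * benford[d]) for d in range(1, 10))
    chi_uniform = sum((counts[d] - n / 9) ** 2 / (n / 9) for d in range(1, 10))
    return chi_benford, chi_uniform


# Hypothetical cells kept as text so trailing zeros survive.
cells = ["12.50", "3.1", "104.25", "0.0345", "7.00"]
print(Counter(fraction_length(c) for c in cells))
print(Counter(formatted_width(c) for c in cells))
```

A column whose leading digits sit far from Benford but close to uniform (small `chi_uniform`, large `chi_benford`) is the kind of pattern the "too uniform" check is meant to surface.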
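The linear-relationship check can be sketched as a sliding window of consecutive rows (10 by default, as stated above): inside each window, fit the line through two points with different x values and test whether every other point lies on it. The function name `exact_linear_windows` and the relative tolerance `tol` are our own assumptions, not JMP settings.

```python
def exact_linear_windows(x, y, window=10, tol=1e-9):
    """Start rows of windows in which y is an exact linear function of x."""
    hits = []
    for start in range(len(x) - window + 1):
        xs = x[start:start + window]
        ys = y[start:start + window]
        # Use the first point and the next point with a different x to define a line.
        pivot = next((i for i in range(1, window) if xs[i] != xs[0]), None)
        if pivot is None:
            continue  # x is constant in this window, so no line can be fitted
        slope = (ys[pivot] - ys[0]) / (xs[pivot] - xs[0])
        intercept = ys[0] - slope * xs[0]
        # Every point in the window must lie on that line (within a small tolerance).
        if all(abs(yi - (intercept + slope * xi)) <= tol * max(1.0, abs(yi))
               for xi, yi in zip(xs, ys)):
            hits.append(start)
    return hits


# Hypothetical data: rows 5-14 of y were computed as 3*x - 2.
x = [float(i) for i in range(25)]
y = [((i * 7) % 11) * 1.3 for i in range(25)]
for i in range(5, 15):
    y[i] = 3 * x[i] - 2
print(exact_linear_windows(x, y))  # [5]
```

Note that this sketch also flags a window where y is constant while x varies (a slope of zero); a real screen might want to treat that case separately.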
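The two spec limit checks can be approximated as below. Exact matches are a direct equality test. For the expected number of out-of-spec values we assume a normal distribution fitted to the column with `statistics.NormalDist`; that distributional choice is ours, and JMP's Compare Spec Limits Distribution may use a different model. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def spec_limit_matches(values, lsl, usl):
    """Row indices whose value equals the LSL or USL exactly."""
    return [row for row, v in enumerate(values) if v == lsl or v == usl]


def out_of_spec_comparison(values, lsl, usl):
    """Observed out-of-spec count vs. the count expected under a normal fit."""
    dist = NormalDist(mean(values), stdev(values))
    observed = sum(1 for v in values if v < lsl or v > usl)
    expected_fraction = dist.cdf(lsl) + (1 - dist.cdf(usl))
    return observed, expected_fraction * len(values)


# Hypothetical measurements with several values sitting exactly on the USL.
values = [9.8, 10.0, 10.1, 10.5, 10.5, 10.5, 9.9, 10.2]
print(spec_limit_matches(values, lsl=9.5, usl=10.5))  # [3, 4, 5]
print(out_of_spec_comparison(values, lsl=9.5, usl=10.5))
```

Values piled up exactly on a limit, combined with fewer out-of-spec values than the fitted distribution predicts, can suggest that readings were clipped or rounded to the specification.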
Start: Wed, Jun 17, 2020 02:00 PM EDT
End: Wed, Jun 17, 2020 03:00 PM EDT