Lu

Community Member

Joined:

Nov 7, 2018

Probabilistic Symbolic Pattern Recognition (PSPR)

PSPR is a feature extraction method that considers changes in symbolic series in terms of both time and frequency. When applied to numeric-valued series such as physiologic data, each number is represented by a symbol, usually a letter, based on a given set of thresholds. For example, a series of blood pressure (BP) recordings over time, 80-90-100-80-110, is represented as abcac under the rules: BP≤85 is a, 85<BP≤95 is b, and BP>95 is c. Once all series are represented by symbols, a probabilistic model of how each series evolves over time is estimated. These models consist of the probabilities of the next symbol taking a specific value given the observed pattern. For instance, P(a|abc) is the probability that the next symbol is a given that the last three observed symbols are abc. The final features are calculated as a distance (such as Euclidean or Mahalanobis) between the probabilistic model of a given series and those of pre-identified reference series. The reference series are usually a subset of cases or controls. When cases are used as reference series, PSPR features represent how similar (or dissimilar) a given series is to actual cases.
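To make the steps above concrete, here is a minimal Python sketch of how I understand the idea. It is not a JMP function and not a reference implementation of PSPR. The function names (`symbolize`, `transition_model`, `euclidean_distance`) are made up for illustration, the context length `order` is a free parameter, and treating an unseen (context, next symbol) pair as probability 0 is my own simplifying assumption. Only the Euclidean distance is shown; Mahalanobis would additionally need a covariance estimate.

```python
import math
from collections import Counter, defaultdict


def symbolize(values, thresholds, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Map each number to a letter using ascending thresholds.

    With thresholds [85, 95]: v <= 85 -> 'a', 85 < v <= 95 -> 'b', v > 95 -> 'c'.
    """
    # The symbol index is the number of thresholds the value strictly exceeds.
    return "".join(alphabet[sum(v > t for t in thresholds)] for v in values)


def transition_model(series, order=1):
    """Estimate P(next symbol | last `order` symbols) from observed counts.

    Returns a dict keyed by (context, next_symbol) -> probability.
    """
    counts = defaultdict(Counter)
    for i in range(len(series) - order):
        context = series[i:i + order]
        counts[context][series[i + order]] += 1

    model = {}
    for context, next_counts in counts.items():
        total = sum(next_counts.values())
        for symbol, n in next_counts.items():
            model[(context, symbol)] = n / total
    return model


def euclidean_distance(model_a, model_b):
    """Euclidean distance between two models.

    Assumption: a (context, symbol) pair missing from a model counts as 0.
    """
    keys = set(model_a) | set(model_b)
    return math.sqrt(
        sum((model_a.get(k, 0.0) - model_b.get(k, 0.0)) ** 2 for k in keys)
    )


# The blood pressure example from the description above.
bp_symbols = symbolize([80, 90, 100, 80, 110], thresholds=[85, 95])
bp_model = transition_model(bp_symbols, order=1)

# A hypothetical reference series, just to have something to compare against.
reference_model = transition_model("aaaa", order=1)
feature = euclidean_distance(bp_model, reference_model)
```

In this sketch, `feature` would be one PSPR-style feature for the series; with several reference series (for example, a set of cases), one such distance per reference, or a distance to an averaged reference model, could be computed the same way.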

Does anyone know whether such an analytic function is available in JMP?