Solved: Re: Is there a possibilty to distinguish between two types of missing data?

matthias_bruchh · Jun 10, 2023 4:35 PM

Hi,

I have an excel data table that looks like this (file attached):

Test	Signal 1	Signal 2	Signal 3
1	3	3	4
2	4	not reached
3	5		6
4	6		not reached

It includes a number of tests. Each test has a number of signal channels (Signal 1-3), but not all channels were used for all tests. In cases where the signal channel was not used for a given test the corresponding field is empty.

The numerical values for each channel mark the time when the signal reaches a certain threshold value. If the signal does not reach the threshold, this in indicated by the string "not reached".

For the evaluation it would be important to be able to distinguish between the two cases of missing data:

1) there is no data because the channel was not used.

2) there is no data because the threshold was not reached.

Currently the data are all of type "character" but of course I need to convert them into "numeric" for the evaluation. However, as soon as I convert the data type of a column to "numeric" the entries "not reached" are deleted and there is no distinction anymore between the two types of missing data.

I've tried to use the recode option to replace "not reached" by NaN or nan and then convert the columns to numeric but the result was the same (the NaN or nan were deleted).

Is there a way that would allow distinguishing between the two cases of missing data? I thought of having no value in one case and something like NaN in the other case.

(Either an interactive or a scripting solution would be fine.)

jthi · Aug 11, 2021 03:26 AM

Couple of suggestions come quickly to my mind (these solutions depend on the analysis you want to do and might not work correctly in your case)

You could leave empty values as empty (will be replaced with . when changed to numeric) and replace not reached with 9999 or something similar. And then use one of options found from Column Properties (Value Labels, Missing Value Codes might work).

Or then you could stack the data. All columns which are to be stacked must have same data type, so change the not reached to 9999 before stacking. Then after stacking you could create a formula, which will tell what the issue was with the specific signal:

And then again use Column Properties if needed depending on what you are doing.

-Jarmo

View solution in original post

jthi · Aug 11, 2021 03:26 AM

Couple of suggestions come quickly to my mind (these solutions depend on the analysis you want to do and might not work correctly in your case)

You could leave empty values as empty (will be replaced with . when changed to numeric) and replace not reached with 9999 or something similar. And then use one of options found from Column Properties (Value Labels, Missing Value Codes might work).

Or then you could stack the data. All columns which are to be stacked must have same data type, so change the not reached to 9999 before stacking. Then after stacking you could create a formula, which will tell what the issue was with the specific signal:

And then again use Column Properties if needed depending on what you are doing.

-Jarmo

matthias_bruchh · Aug 11, 2021 06:41 AM

Thanks. That seems to be doing quite nicely what I was looking for.