Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
Choose Language Hide Translation Bar
Highlighted
Level III

## Extracting words from a line- as column name and its corresponding numbers as values into the column

I have a column named "Fault"- where I have fault codes extracted from various devices. One of the fault code from a particular day is given below.

"inverter Inverter 20 below limit: 0.96 Kilowatts/inverter (58.9 % of 1.63 Kilowatts) at 11/30/2019 7:30:00 AM (sun az: 123.4 el: 3.7) Duration: 4.00 hours inverter Inverter 23 below limit: 0.95 Kilowatts/inverter (57.9 % of 1.63 Kilowatts) at 11/30/2019 7:30:00 AM (sun az: 123.4 el: 3.7) Duration: 3.50 hours"

The goal is to perform an analysis to tabulate the Faulty inverters  by Date and its cumulative under performance and duration; later use the other values to diagnose the fault.

I need to extract the details as follows in the format  ----Column name : Value

Inverter: 20

Below limit (kW/Inverter) : 0.96

% of Max: 58.9%

Date 11/30/2019

Time : 7:30:00 AM

Sun Az : 123.4

Elevation:3.7

Duration (hrs) : 4

If there is a second inverter which also has a fault code then ( see second line in the fault code above)

Inverter: 23

Below limit (kW/Inverter) : 0.95

% of Max: 57.9%

Date 11/30/2019

Time : 7:30:00 AM

Sun Az : 123.4

Elevation:3.7

Duration (hrs) : 3.50

Sometimes there might be 5 or 10 inverters ( out of 70) which could give similar codes in a single line.

Can you let me know how to get this done. Is it only through JSL? or can I use formulas? Either way- I need some help.

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Super User

## Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I had some spare time......not much else to do with Covid-19 still around, so I wrote a bit of JSL that should give you a good start with reading in your data

``````Names Default To Here( 1 );
dt = Open(

"Path to the data file",
columns( New Column( "c000001", Character, "Nominal" ) ),
Import Settings(
End Of Line( CRLF, CR, LF ),
End Of Field( CSV( 0 ) ),
Strip Quotes( 1 ),
Use Apostrophe as Quotation Mark( 0 ),
Use Regional Settings( 0 ),
Scan Whole File( 1 ),
Treat empty columns as numeric( 0 ),
CompressNumericColumns( 0 ),
CompressCharacterColumns( 0 ),
CompressAllowListCheck( 0 ),
Labels( 0 ),
Column Names Start( 1 ),
Data Starts( 1 ),
Year Rule( "20xx" )
)
);

// Create the new output data table
dtLong = New Table( "Long Data",
New Column( "Label", character ),
New Column( "Value", character ),
New Column( "Line" ),
New Column( "Instance" )
);

// A function that returns the next word in the line and
// then removes the word from the front of the target line
getWord = Function( {},
theWord = Trim( Word( 1, target, " " ) );
target = Trim( Substr( target, Length( theWord ) + 1 ) );
theWord;
);

// Loop through all rows in the input data table
For( i = 1, i <= N Rows( dt ), i++,
instance = 0;
target = Column( dt, 1 )[i];
While( Word( 1, target, " " ) != "",
dtLong << add rows( 1 );
dtLong:line[N Rows( dtLong )] = i;
theWord = getWord();
Match( theWord,
"inverter",
instance++;
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:Label[N Rows( dtLong )] = theWord;
theWord = getWord();
theWord = getword();
dtLong:value[N Rows( dtLong )] = theWord;,
"below",
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:Label[N Rows( dtLong )] = theWord || " limit (KW/Inverter)";
theWord = getWord();
theWord = getWord();
dtLong:value[N Rows( dtLong )] = theWord;
// Get the % of Max
dtLong << Add Rows( 1 );
dtLong:line[N Rows( dtLong )] = i;
dtLong:Instance[N Rows( dtLong )] = instance;
theWord = getWord();
theWord = getWord();
dtLong:Label[N Rows( dtLong )] = "% of Max";
dtLong:value[N Rows( dtLong )] = Substr( theWord, 2 );
// Skip forward to the Date
For( k = 1, k <= 5, k++,
theWord = getWord()
);
// Get the Date
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = "Date";
dtLong:value[N Rows( dtLong )] = theWord;
// Get the Time
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = "Time";
dtLong:value[N Rows( dtLong )] = theWord || getWord();
// Get the Sun Az
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = Substr( theWord, 2 ) || " " || getWord();
dtLong:value[N Rows( dtLong )] = getWord();
// Get the Elevation
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = "Elevation";
dtLong:value[N Rows( dtLong )] = Word( -1, getWord(), ")" );,
"Duration:",
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:Label[N Rows( dtLong )] = theWord || " (hrs)";
dtLong:value[N Rows( dtLong )] = getWord();
theWord = getWord();
)

;
);
);``````
Jim
6 REPLIES 6
Highlighted
Super User

## Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

You should be able to make a lot of progress on this using the formula editor - take a look at the WORD function

-Dave
Highlighted
Level III

## Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Dave,

Yes, I could if I have just 1 or 2 faulty inverters, But then if for example I have 23 fault codes in a single line for a single date then I need to create so many columns individually and then write up formulas for each of the 23 inverters. Can you clarify if I am understanding your solution.  I also added another line which summarizes the goal of the analysis after extraction.

Highlighted
Super User

## Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I have had to handle data like this many times.  The approach that I think has to be taken is to read all of the data from your input file into a JMP data table, where each complete line is read into a single column.  You will end up with a data table with one column, and the number of rows equal to the number of lines in the input data table.

Next, you create a new table with 3 columns, Label, Value and Line.  Then, using JSL that scans the contents of each row in the one column, one creates a new row in the new table, and populates the Label(i.e. Inverter, Below limit(KW/Inverter) etc.), the Value(i.e. 20, 0.96, etc.) and the Line, which would just be the current row number.....Row().  There may also have to be an additional output column that would be based upon this being the 1st Inverter on the line, or the 2nd inverter.....etc.

This can be done using a While() function in combination of the Word() function scanning for ":" as the delimiter.  The odd numbered words would be the Labels, the even numbered will be the Values.

Once this new table is created, you can easily move into a one row per inverter, using Tables=>Split.

Jim
Highlighted
Level III

## Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Jim

Thank you for your suggestion. I will try. As I am new to scripting-I will go through examples to figure it out.

Highlighted
Super User

## Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I had some spare time......not much else to do with Covid-19 still around, so I wrote a bit of JSL that should give you a good start with reading in your data

``````Names Default To Here( 1 );
dt = Open(

"Path to the data file",
columns( New Column( "c000001", Character, "Nominal" ) ),
Import Settings(
End Of Line( CRLF, CR, LF ),
End Of Field( CSV( 0 ) ),
Strip Quotes( 1 ),
Use Apostrophe as Quotation Mark( 0 ),
Use Regional Settings( 0 ),
Scan Whole File( 1 ),
Treat empty columns as numeric( 0 ),
CompressNumericColumns( 0 ),
CompressCharacterColumns( 0 ),
CompressAllowListCheck( 0 ),
Labels( 0 ),
Column Names Start( 1 ),
Data Starts( 1 ),
Year Rule( "20xx" )
)
);

// Create the new output data table
dtLong = New Table( "Long Data",
New Column( "Label", character ),
New Column( "Value", character ),
New Column( "Line" ),
New Column( "Instance" )
);

// A function that returns the next word in the line and
// then removes the word from the front of the target line
getWord = Function( {},
theWord = Trim( Word( 1, target, " " ) );
target = Trim( Substr( target, Length( theWord ) + 1 ) );
theWord;
);

// Loop through all rows in the input data table
For( i = 1, i <= N Rows( dt ), i++,
instance = 0;
target = Column( dt, 1 )[i];
While( Word( 1, target, " " ) != "",
dtLong << add rows( 1 );
dtLong:line[N Rows( dtLong )] = i;
theWord = getWord();
Match( theWord,
"inverter",
instance++;
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:Label[N Rows( dtLong )] = theWord;
theWord = getWord();
theWord = getword();
dtLong:value[N Rows( dtLong )] = theWord;,
"below",
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:Label[N Rows( dtLong )] = theWord || " limit (KW/Inverter)";
theWord = getWord();
theWord = getWord();
dtLong:value[N Rows( dtLong )] = theWord;
// Get the % of Max
dtLong << Add Rows( 1 );
dtLong:line[N Rows( dtLong )] = i;
dtLong:Instance[N Rows( dtLong )] = instance;
theWord = getWord();
theWord = getWord();
dtLong:Label[N Rows( dtLong )] = "% of Max";
dtLong:value[N Rows( dtLong )] = Substr( theWord, 2 );
// Skip forward to the Date
For( k = 1, k <= 5, k++,
theWord = getWord()
);
// Get the Date
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = "Date";
dtLong:value[N Rows( dtLong )] = theWord;
// Get the Time
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = "Time";
dtLong:value[N Rows( dtLong )] = theWord || getWord();
// Get the Sun Az
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = Substr( theWord, 2 ) || " " || getWord();
dtLong:value[N Rows( dtLong )] = getWord();
// Get the Elevation
dtLong << add rows( 1 );
theWord = getWord();
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:line[N Rows( dtLong )] = i;
dtLong:Label[N Rows( dtLong )] = "Elevation";
dtLong:value[N Rows( dtLong )] = Word( -1, getWord(), ")" );,
"Duration:",
dtLong:Instance[N Rows( dtLong )] = instance;
dtLong:Label[N Rows( dtLong )] = theWord || " (hrs)";
dtLong:value[N Rows( dtLong )] = getWord();
theWord = getWord();
)

;
);
);``````
Jim
Highlighted
Level III

## Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Amazing! It works.Thank you for your patience in writing this long code to accomplish my request. I am trying to implement this to other Alert codes I have.Thank you so much.It saved me eons.

Article Labels