Our World Statistics Day conversations have been a great reminder of how much statistics can inform our lives. Do you have an example of how statistics has made a difference in your life? Share your story with the Community!
Choose Language Hide Translation Bar
Highlighted
SunnyR
Level III

Extracting words from a line- as column name and its corresponding numbers as values into the column

I have a column named "Fault"- where I have fault codes extracted from various devices. One of the fault code from a particular day is given below.

 

"inverter Inverter 20 below limit: 0.96 Kilowatts/inverter (58.9 % of 1.63 Kilowatts) at 11/30/2019 7:30:00 AM (sun az: 123.4 el: 3.7) Duration: 4.00 hours inverter Inverter 23 below limit: 0.95 Kilowatts/inverter (57.9 % of 1.63 Kilowatts) at 11/30/2019 7:30:00 AM (sun az: 123.4 el: 3.7) Duration: 3.50 hours"

 

The goal is to perform an analysis to tabulate the Faulty inverters  by Date and its cumulative under performance and duration; later use the other values to diagnose the fault.

 

I need to extract the details as follows in the format  ----Column name : Value

Inverter: 20

Below limit (kW/Inverter) : 0.96

% of Max: 58.9%

Date 11/30/2019

Time : 7:30:00 AM

Sun Az : 123.4

Elevation:3.7

Duration (hrs) : 4

 

If there is a second inverter which also has a fault code then ( see second line in the fault code above)

Inverter: 23

Below limit (kW/Inverter) : 0.95

% of Max: 57.9%

Date 11/30/2019

Time : 7:30:00 AM

Sun Az : 123.4

Elevation:3.7

Duration (hrs) : 3.50

 

Sometimes there might be 5 or 10 inverters ( out of 70) which could give similar codes in a single line.

Can you let me know how to get this done. Is it only through JSL? or can I use formulas? Either way- I need some help.

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
txnelson
Super User

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I had some spare time......not much else to do with Covid-19 still around, so I wrote a bit of JSL that should give you a good start with reading in your data

Names Default To Here( 1 );
dt = Open(
	
	"Path to the data file",
	columns( New Column( "c000001", Character, "Nominal" ) ),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Create the new output data table
dtLong = New Table( "Long Data", 
New Column( "Label", character ), 
New Column( "Value", character ), 
New Column( "Line" ),
New Column( "Instance" )	
);

// A function that returns the next word in the line and
// then removes the word from the front of the target line
getWord = Function( {},
	theWord = Trim( Word( 1, target, " " ) );
	target = Trim( Substr( target, Length( theWord ) + 1 ) );
	theWord;
);


// Loop through all rows in the input data table
For( i = 1, i <= N Rows( dt ), i++,
	instance = 0;
	target = Column( dt, 1 )[i];
	While( Word( 1, target, " " ) != "",
		dtLong << add rows( 1 );
		dtLong:line[N Rows( dtLong )] = i;
		theWord = getWord();
		Match( theWord,
			"inverter",
				instance++;
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord;
				theWord = getWord();
				theWord = getword();
				dtLong:value[N Rows( dtLong )] = theWord;,
			"below",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " limit (KW/Inverter)";
				theWord = getWord();
				theWord = getWord();
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the % of Max
				dtLong << Add Rows( 1 );
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Instance[N Rows( dtLong )] = instance;
				theWord = getWord();
				theWord = getWord();
				dtLong:Label[N Rows( dtLong )] = "% of Max";
				dtLong:value[N Rows( dtLong )] = Substr( theWord, 2 );
				// Skip forward to the Date
				For( k = 1, k <= 5, k++,
					theWord = getWord()
				);
				// Get the Date
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Date";
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the Time
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Time";
				dtLong:value[N Rows( dtLong )] = theWord || getWord();
				// Get the Sun Az
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = Substr( theWord, 2 ) || " " || getWord();
				dtLong:value[N Rows( dtLong )] = getWord();
				// Get the Elevation
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Elevation";
				dtLong:value[N Rows( dtLong )] = Word( -1, getWord(), ")" );,
			"Duration:",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " (hrs)";
				dtLong:value[N Rows( dtLong )] = getWord();
				theWord = getWord();
		)
		
		;
	);
);
Jim

View solution in original post

6 REPLIES 6
Highlighted
David_Burnham
Super User

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

You should be able to make a lot of progress on this using the formula editor - take a look at the WORD function

-Dave
Highlighted
SunnyR
Level III

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Dave,

  Yes, I could if I have just 1 or 2 faulty inverters, But then if for example I have 23 fault codes in a single line for a single date then I need to create so many columns individually and then write up formulas for each of the 23 inverters. Can you clarify if I am understanding your solution.  I also added another line which summarizes the goal of the analysis after extraction.

Highlighted
txnelson
Super User

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I have had to handle data like this many times.  The approach that I think has to be taken is to read all of the data from your input file into a JMP data table, where each complete line is read into a single column.  You will end up with a data table with one column, and the number of rows equal to the number of lines in the input data table.

Next, you create a new table with 3 columns, Label, Value and Line.  Then, using JSL that scans the contents of each row in the one column, one creates a new row in the new table, and populates the Label(i.e. Inverter, Below limit(KW/Inverter) etc.), the Value(i.e. 20, 0.96, etc.) and the Line, which would just be the current row number.....Row().  There may also have to be an additional output column that would be based upon this being the 1st Inverter on the line, or the 2nd inverter.....etc.

This can be done using a While() function in combination of the Word() function scanning for ":" as the delimiter.  The odd numbered words would be the Labels, the even numbered will be the Values.

Once this new table is created, you can easily move into a one row per inverter, using Tables=>Split.

Jim
Highlighted
SunnyR
Level III

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Jim

 Thank you for your suggestion. I will try. As I am new to scripting-I will go through examples to figure it out.

Highlighted
txnelson
Super User

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

I had some spare time......not much else to do with Covid-19 still around, so I wrote a bit of JSL that should give you a good start with reading in your data

Names Default To Here( 1 );
dt = Open(
	
	"Path to the data file",
	columns( New Column( "c000001", Character, "Nominal" ) ),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// Create the new output data table
dtLong = New Table( "Long Data", 
New Column( "Label", character ), 
New Column( "Value", character ), 
New Column( "Line" ),
New Column( "Instance" )	
);

// A function that returns the next word in the line and
// then removes the word from the front of the target line
getWord = Function( {},
	theWord = Trim( Word( 1, target, " " ) );
	target = Trim( Substr( target, Length( theWord ) + 1 ) );
	theWord;
);


// Loop through all rows in the input data table
For( i = 1, i <= N Rows( dt ), i++,
	instance = 0;
	target = Column( dt, 1 )[i];
	While( Word( 1, target, " " ) != "",
		dtLong << add rows( 1 );
		dtLong:line[N Rows( dtLong )] = i;
		theWord = getWord();
		Match( theWord,
			"inverter",
				instance++;
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord;
				theWord = getWord();
				theWord = getword();
				dtLong:value[N Rows( dtLong )] = theWord;,
			"below",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " limit (KW/Inverter)";
				theWord = getWord();
				theWord = getWord();
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the % of Max
				dtLong << Add Rows( 1 );
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Instance[N Rows( dtLong )] = instance;
				theWord = getWord();
				theWord = getWord();
				dtLong:Label[N Rows( dtLong )] = "% of Max";
				dtLong:value[N Rows( dtLong )] = Substr( theWord, 2 );
				// Skip forward to the Date
				For( k = 1, k <= 5, k++,
					theWord = getWord()
				);
				// Get the Date
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Date";
				dtLong:value[N Rows( dtLong )] = theWord;
				// Get the Time
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Time";
				dtLong:value[N Rows( dtLong )] = theWord || getWord();
				// Get the Sun Az
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = Substr( theWord, 2 ) || " " || getWord();
				dtLong:value[N Rows( dtLong )] = getWord();
				// Get the Elevation
				dtLong << add rows( 1 );
				theWord = getWord();
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:line[N Rows( dtLong )] = i;
				dtLong:Label[N Rows( dtLong )] = "Elevation";
				dtLong:value[N Rows( dtLong )] = Word( -1, getWord(), ")" );,
			"Duration:",
				dtLong:Instance[N Rows( dtLong )] = instance;
				dtLong:Label[N Rows( dtLong )] = theWord || " (hrs)";
				dtLong:value[N Rows( dtLong )] = getWord();
				theWord = getWord();
		)
		
		;
	);
);
Jim

View solution in original post

Highlighted
SunnyR
Level III

Re: Extracting words from a line- as column name and its corresponding numbers as values into the column

Amazing! It works.Thank you for your patience in writing this long code to accomplish my request. I am trying to implement this to other Alert codes I have.Thank you so much.It saved me eons.