LaserGuy
Level II

Read Text File, Find Lines Containing a Certain String

Hello Everyone,

 

Here is a script I have that works. I want to know if there is a better way to do the same thing.

 

I first use "load text file" to read a text file into the string variable "file_text1".

 

Next, I use "words" with "\!n" to separate the lines and load them into the string array "file_text2".

 

Then I load "file_text2" into a data table and use "get rows where" combined with "contains" to find the line numbers in the original text file that contain a certain string.

 

Is there a function equivalent to "get rows where" that works on string arrays? That way, I could skip loading the lines into a data table.

 

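// Read the whole file into a single string
file_text1 = load text file( "example.txt" );
// Split the string into a list of lines
file_text2 = words(file_text1 , "\!n");
// Load the lines into a one-column data table
dt0 = new table("file_dt", new column("file_content", character) );
dt0:file_content << set values(file_text2);

// Row numbers of the lines that contain "Pass"
pass_qty_row_array = dt0 << get rows where(contains( :file_content, "Pass") );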

Accepted Solution

jthi
Super User

Re: Read Text File, Find Lines Containing a Certain String

There are also For Each, Filter Each (and Transform Each), which can help in cases like this and should be fairly fast.

 

Names Default To Here(1);

// Read the file and split it into a list of lines
file_txt = Load Text File("$SAMPLE_IMPORT_DATA/Animals.txt");
file_list = words(file_txt, "\!n");
search_word = "fall";

// Collect the index of every line that contains the search word
pass_qty_row_array = [];
For Each({line, idx}, file_list,
	If(Contains(line, search_word),
		Insert Into(pass_qty_row_array, idx);
	);
);
show(pass_qty_row_array);

 

 

-Jarmo
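
As a complement to the For Each loop above, here is a minimal sketch of Filter Each, which returns the matching lines themselves rather than their indices. It assumes JMP 16 or later (where For Each, Filter Each, and Transform Each are available). The sample list and the name matching_lines are made up for illustration and are not part of the original post.

```jsl
Names Default To Here( 1 );

// Hypothetical in-memory lines standing in for the contents of a text file
file_list = {"spring rain", "fall leaves", "winter snow", "waterfall"};
search_word = "fall";

// Filter Each keeps the items for which the expression is true;
// Contains() returns the match position, or 0 when there is no match
matching_lines = Filter Each( {line}, file_list, Contains( line, search_word ) > 0 );
Show( matching_lines );
```

Use For Each when you need the line numbers, and Filter Each when only the matching text matters.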


4 Replies
txnelson
Super User

Re: Read Text File, Find Lines Containing a Certain String

Here is another way to handle this. It may be faster or it may be slower; I don't know.

names default to here(1);
dt0 = Open(
	"example.txt",
	columns( New Column( "Line", Character, "Nominal" ) ),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

// The imported column is named "Line", so search that column
pass_qty_row_array = dt0 << get rows where(contains( :Line, "Pass") );
Jim
ErraticAttack
Level VI

Re: Read Text File, Find Lines Containing a Certain String

Personally, I've created Map(), Filter(), and Reduce() functions for problems such as this. Here is an example using the Map() function; on my system it is roughly 50% - 60% faster than creating a table to do the search.

 

Here is the map function (explicitly globalized):

::map = Function( {inputs /* list, function */ },
	/* uses a single underscore _ as the wild-card */
	{__i__, __result__, __list__, _, __, __keys__},
	__list__ = Eval( Arg( inputs, 1 ) );
	If( Is List( __list__ ),
		__result__ = {};
		Eval(
			Substitute(
				Expr(
					For( __i__ = 1, __i__ <= __N__, __i__++,
						_ = __list__[__i__];
						__result__[__i__] = __function__
					)
				)
			,
				Expr( __N__ ), N Items( __list__ ),
				Expr( __function__ ), Arg( inputs, 2 )
			);
		);
	,
		Is Associative Array( __list__ ),
		__result__ = [=>];
		__keys__ = __list__ << Get Keys;
		Eval(
			Substitute(
				Expr(
					For( __i__ = 1, __i__ <= __N__, __i__++,
						__ = __keys__[__i__];
						_ = __list__[__];
						__result__[__] = __function__
					)
				)
			,
				Expr( __N__ ), N Items( __keys__ ),
				Expr( __function__ ), Arg( inputs, 2 )
			)
		)
	);
	__result__
);

Here is a comparison of using a table versus the map function:

Names Default To Here( 1 );
filename = Convert File Path( "$SAMPLE_IMPORT_DATA/UN Malaria 2009.csv", absolute, windows );
result = Load Text File( filename );
Show( result );

file_text2 = Words( result, "\!N" );
N = 10000;
word = "malaria";
s = HP Time();
Summation( i = 1, N,
	dt0 = New Table( "file_dt", New Column( "file_content", character ), Private );
	dt0:file_content << set values( file_text2 );

	pass_qty_row_array 1 = dt0 << get rows where( Contains( :file_content, word ) );
	close( dt0, No Save );
	0
);
Show( time 1 = (HP Time() - s) / 1000000 );

s = HP Time();
Summation( i = 1, N,
	pass_qty_row_array 2 = loc( Matrix( ::map({ file_text2, Contains( _, word ) }) ) );
	0
);
Show( time 2 = (HP Time() - s) / 1000000 );

Show( All( pass_qty_row_array 1 == pass_qty_row_array 2 ) );

// Ratio of the map time to the table time (equal to time 2 / time 1)
1 - (time 1 - time 2) / time 1

Aside from the definition of the map function itself, code that uses it usually looks much cleaner than any other solution.

Jordan
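
For readers on JMP 16 or later, the same "map, then Loc" pattern can probably be written with the built-in Transform Each instead of a custom function. This is only a sketch under that assumption, not the author's method. The sample lines and the name flags are hypothetical.

```jsl
Names Default To Here( 1 );

// Hypothetical sample lines; in practice this would come from Words( Load Text File( ... ), "\!N" )
file_text2 = {"malaria cases", "population", "malaria deaths"};
word = "malaria";

// Build a 0/1 flag per line, then let Loc() return the positions of the nonzero flags
flags = Transform Each( {line}, file_text2, Contains( line, word ) > 0 );
pass_qty_row_array = Loc( Matrix( flags ) );
Show( pass_qty_row_array );
```

The result has the same shape as the Loc( Matrix( ::map( ... ) ) ) call in the comparison above, so the two approaches can be swapped in the timing loop.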
LaserGuy
Level II

Re: Read Text File, Find Lines Containing a Certain String

Thank you everyone and especially jthi.

 

I found that a for-loop over N Items(file_text2) was faster than using a data table. Then I saw jthi's method using "For Each", which my timing measurements show is even faster.
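
The for-loop itself was not posted, so the following is only a guess at what such a loop might look like. The sample lines are hypothetical, and the result is collected in a list rather than a matrix.

```jsl
Names Default To Here( 1 );

// Hypothetical sample lines standing in for the file contents
file_text2 = {"Pass 12", "Fail 3", "Pass 7"};

// Walk the list by index and record every index whose line contains "Pass"
pass_qty_row_array = {};
For( i = 1, i <= N Items( file_text2 ), i++,
	If( Contains( file_text2[i], "Pass" ),
		Insert Into( pass_qty_row_array, i )
	)
);
Show( pass_qty_row_array );
```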

 
