LaserGuy
Level II

Read Text File, Find Lines Containing a Certain String

Hello Everyone,

 

Here is a script I have that works. I want to know if there is a better way to do the same thing.

 

I first use "load text file" to read a text file into the string variable "file_text1".

 

Next, I use "words" with "\!n" to separate the lines and load them into the string array "file_text2".

 

I then load "file_text2" into a data table and use "get rows where" combined with "contains" to find the line numbers in the original text file that contain a certain string.

 

Is there a function equivalent to "get rows where" that works on string arrays? That way, I could skip loading the lines into a data table.

 

file_text1 = load text file( "example.txt" );
file_text2 = words(file_text1 , "\!n");
dt0 = new table("file_dt", new column("file_content", character) );
dt0:file_content << set values(file_text2);

pass_qty_row_array = dt0 << get rows where(contains( :file_content, "Pass") );
1 ACCEPTED SOLUTION

Accepted Solutions
jthi
Super User

Re: Read Text File, Find Lines Containing a Certain String

There is also For Each, Filter Each (and Transform Each) which can help with cases like this and it should be fairly fast.

 

Names Default To Here(1);

file_txt = Load Text File("$SAMPLE_IMPORT_DATA/Animals.txt");
file_list = words(file_txt, "\!n");
search_word = "fall";

pass_qty_row_array = [];
For Each({line, idx}, file_list,
	If(Contains(line, search_word),
		Insert Into(pass_qty_row_array, idx);
	);
);
show(pass_qty_row_array);
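
The Filter Each variant mentioned above is not shown in the reply, so here is a minimal sketch of how it might look. It assumes a JMP version that provides Filter Each (I believe these functions arrived in JMP 16) and reuses the same sample file and search word. The name matching_lines is just illustrative. Note that this returns the matching lines themselves rather than their line numbers, so For Each remains the better fit when the indices are needed.

```jsl
Names Default To Here( 1 );

file_txt = Load Text File( "$SAMPLE_IMPORT_DATA/Animals.txt" );
file_list = Words( file_txt, "\!n" );
search_word = "fall";

// Keep only the lines that contain search_word.
// Contains() returns the match position, or 0 when there is no match.
matching_lines = Filter Each( {line, idx}, file_list,
	Contains( line, search_word ) > 0
);
Show( matching_lines );
```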

 

 

-Jarmo


4 REPLIES 4
txnelson
Super User

Re: Read Text File, Find Lines Containing a Certain String

Here is another way to handle this. It may be faster or it may be slower; I don't know.

names default to here(1);
dt0 = Open(
	"example.txt",
	columns( New Column( "Line", Character, "Nominal" ) ),
	Import Settings(
		End Of Line( CRLF, CR, LF ),
		End Of Field( CSV( 0 ) ),
		Strip Quotes( 1 ),
		Use Apostrophe as Quotation Mark( 0 ),
		Use Regional Settings( 0 ),
		Scan Whole File( 1 ),
		Treat empty columns as numeric( 0 ),
		CompressNumericColumns( 0 ),
		CompressCharacterColumns( 0 ),
		CompressAllowListCheck( 0 ),
		Labels( 0 ),
		Column Names Start( 1 ),
		Data Starts( 1 ),
		Lines To Read( "All" ),
		Year Rule( "20xx" )
	)
);

pass_qty_row_array = dt0 << get rows where(contains( :Line, "Pass") );
Jim
ErraticAttack
Level VI

Re: Read Text File, Find Lines Containing a Certain String

Personally, I've created Map(), Filter(), and Reduce() functions for problems such as this. Here is an example using the Map() function; on my system it is roughly 50% - 60% faster than creating a table to do the search.

 

Here is the map function (explicitly globalized):

::map = Function( {inputs /* list, function */ },
	/* uses a single underscore _ as the wild-card */
	{__i__, __result__, __list__, _, __, __keys__},
	__list__ = Eval( Arg( inputs, 1 ) );
	If( Is List( __list__ ),
		__result__ = {};
		Eval(
			Substitute(
				Expr(
					For( __i__ = 1, __i__ <= __N__, __i__++,
						_ = __list__[__i__];
						__result__[__i__] = __function__
					)
				)
			,
				Expr( __N__ ), N Items( __list__ ),
				Expr( __function__ ), Arg( inputs, 2 )
			);
		);
	,
		Is Associative Array( __list__ ),
		__result__ = [=>];
		__keys__ = __list__ << Get Keys;
		Eval(
			Substitute(
				Expr(
					For( __i__ = 1, __i__ <= __N__, __i__++,
						__ = __keys__[__i__];
						_ = __list__[__];
						__result__[__] = __function__
					)
				)
			,
				Expr( __N__ ), N Items( __keys__ ),
				Expr( __function__ ), Arg( inputs, 2 )
			)
		)
	);
	__result__
);

and here is a comparison of using a table vs. the map function:

Names Default To Here( 1 );
filename = Convert File Path( "$SAMPLE_IMPORT_DATA/UN Malaria 2009.csv", absolute, windows );
result = Load Text File( filename );
Show( result );

file_text2 = Words( result, "\!N" );
N = 10000;
word = "malaria";
s = HP Time();
Summation( i = 1, N,
	dt0 = New Table( "file_dt", New Column( "file_content", character ), Private );
	dt0:file_content << set values( file_text2 );

	pass_qty_row_array 1 = dt0 << get rows where( Contains( :file_content, word ) );
	close( dt0, No Save );
	0
);
Show( time 1 = (HP Time() - s) / 1000000 );

s = HP Time();
Summation( i = 1, N,
	pass_qty_row_array 2 = loc( Matrix( ::map({ file_text2, Contains( _, word ) }) ) );
	0
);
Show( time 2 = (HP Time() - s) / 1000000 );

Show( All( pass_qty_row_array 1 == pass_qty_row_array 2 ) );

1 - (time 1 - time 2) / time 1

Once the map function is defined, calling it usually reads much more cleanly in code than any other solution.

Jordan
LaserGuy
Level II

Re: Read Text File, Find Lines Containing a Certain String

Thank you everyone and especially jthi.

 

I have determined that using a for-loop on N Items(file_text2) was faster than using a data table. And then I saw jthi's method using "For Each", which from time measurements is even faster.
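
For anyone who finds this later, here is a rough sketch of the plain for-loop version described above. The exact script was not posted, so treat this as a reconstruction under assumptions: it reuses "example.txt" and "Pass" from the original question, and pass_qty_row_list is a hypothetical name.

```jsl
Names Default To Here( 1 );

file_text1 = Load Text File( "example.txt" );
file_text2 = Words( file_text1, "\!n" );

// Collect the index of every line that contains "Pass".
pass_qty_row_list = {};
For( i = 1, i <= N Items( file_text2 ), i++,
	If( Contains( file_text2[i], "Pass" ),
		Insert Into( pass_qty_row_list, i )
	)
);
Show( pass_qty_row_list );
```

One caveat, as far as I know: Words() treats consecutive delimiters as one, so blank lines are dropped and the indices only match the original line numbers when the file has no empty lines.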

 
