Discussions

lala · Nov 1, 2024 1:03 AM

There are now 2400 such JMP files in the same directory C:\8: JMP tables with file names with detailed time stamps and 5000 rows of data each.For example, C:\8\20241031112503.JMP.

S5 E5 S4 E4 S3 E3 S2 E2 S1 E1 B1 T1 B2 T2 B3 T3 B4 T4 B5 T5 enco
6024 2 6023 10 6021 6 6020 96 6019 18 6018 400 6017 16 6016 78 6015 41 6014 8 603198
1263 88 1262 290 1261 254 1260 347 1259 301 1258 544 1257 364 1256 630 1255 1123 1254 362 600377

Now need to reorganize the table to save the file according to the code number of the "enco" column, of course, the same "enco" so that the timestamp data in the original file name should be added to distinguish.

Instructions:

"enco" is not repeated on each line of the same file.And enco isn't exactly the same in 2400 files

The key is to find a way to do this quickly.Either JSL or python methods can be used with JMP software (indicating python in the JMP environment).
Of course, if the original file is in CSV format or some database format that is more convenient for this kind of splitting and sorting, you can provide better suggestions.Thank you!

lala · Nov 1, 2024 04:00 AM

I tried to combine them all into a JMP table and then extract subsets.The speed is slow.

Whether to create different memory arrays in memory to concatenate is faster, but this will make the code more complex.

Thanks Experts!

jthi · Nov 1, 2024 04:01 AM

I would first try with Multiple File Import if it can handle that (build single table with source (filename) and go from there). Then I would maybe try some JSL solution, then possible some other JSL solution and lastly Python Integration with DuckDB.

-Jarmo

lala · Nov 1, 2024 04:08 AM

Thanks Experts!

Yes, I also want to try new methods.Ask experts how to implement Python and DuckDB specifically?Thank you very much!

jthi · Nov 1, 2024 04:49 AM

I can provide suggestions for JMP (and JSL) but for that I do need more examples and proper understanding of what you are trying to achieve. Are you trying to have one unique file for each enco? Do all tables contain exactly the same columns? Do they always contain similar data (integers)? Do they always have just two rows? Are the file names always in the same format? Do the tables have headers properly set or is the first row character data with the headers?

I consider that (pure) Python questions belong (mostly) to other forums. If you have the data as .csv files using DuckDB should be simple task. Also when using Python it is definitely worth it to learn to read the documentation of each of the libraries you are using instead of just relying on LLM (Reading Multiple Files – DuckDB).

-Jarmo

lala · Nov 1, 2024 2:12 AM

Yes, I also know that GPT-4o is not familiar with JSL of JMP software.

So later I let it handle CSV directly.

Then add the JSL syntax to make it modify again.

import os
import duckdb
import pandas as pd
from pathlib import Path
output_dir = Path(r"C:\3")
output_dir.mkdir(exist_ok=True)
con = duckdb.connect(":memory:")
def process_csv_files():
csv_files = list(Path(r"C:\8").glob("*.CSV"))
first_file = csv_files[0]
timestamp = first_file.stem 
con.execute("""
CREATE TABLE all_data AS 
SELECT *, ? as file_timestamp 
FROM read_csv_auto(?)
""", [timestamp, str(first_file)])
for csv_file in csv_files[1:]:
timestamp = csv_file.stem 
con.execute("""
INSERT INTO all_data 
SELECT *, ? as file_timestamp 
FROM read_csv_auto(?)
""", [timestamp, str(csv_file)])
enco_values = con.execute("SELECT DISTINCT enco FROM all_data").fetchall()
for (enco,) in enco_values:
query = """
SELECT * 
FROM all_data 
WHERE enco = ?
ORDER BY file_timestamp
"""
df = con.execute(query, [enco]).df()
output_file = output_dir / f"{enco}.csv"
df.to_csv(output_file, index=False)
if __name__ == "__main__":
try:
process_csv_files()
except Exception as e:
print(f"E: {str(e)}")

lala · Nov 1, 2024 3:12 AM

By contrast

Processing CSV is much faster.

lala · Nov 4, 2024 02:43 AM

Compared, or directly use CSV libraries and dictionaries, containers are the fastest.

Discussions

How can I quickly reassemble and generate different tables from many JMP tables?

Re: How can I quickly reassemble and generate different tables from many JMP tables?

Re: How can I quickly reassemble and generate different tables from many JMP tables?

Re: How can I quickly reassemble and generate different tables from many JMP tables?

Re: How can I quickly reassemble and generate different tables from many JMP tables?

Re: How can I quickly reassemble and generate different tables from many JMP tables?

Re: How can I quickly reassemble and generate different tables from many JMP tables?

Re: How can I quickly reassemble and generate different tables from many JMP tables?

Recommended Articles