EdgeFrog719
Level I

JMP Multiple Series Stack memory Issue

Hi, I'm trying to do a Multiple Series Stack on a data set (3.5M rows × 225 columns). It fails with an out-of-memory error on a machine with 32 GB of RAM. Is there a way to do this serially to avoid the memory issue and still get the expected data table?

1 ACCEPTED SOLUTION

Accepted Solutions
peng_liu
Staff

Re: JMP Multiple Series Stack memory Issue

A single data table with 3.5M rows × 225 columns = 787.5M cells costs me 12+ GB. The stacked table will have 3.5M × 225 = 787.5M rows and two columns (the second being the label), which will easily land somewhere around 20 GB, so the two tables won't co-exist in 32 GB of RAM.
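The arithmetic behind those estimates can be sketched as follows. This assumes 8 bytes per numeric cell, which is how JMP stores numeric data; the raw figures come out below the observed usage (12+ GB wide, ~20 GB stacked) because real tables carry per-table and per-column overhead on top.

```python
# Back-of-envelope memory estimate, assuming 8 bytes per numeric cell.
rows, cols = 3_500_000, 225
cells = rows * cols                       # 787,500,000 cells in the wide table
wide_gb = cells * 8 / 1e9                 # ~6.3 GB raw for the wide table

stacked_rows = cells                      # stacking yields one row per original cell
stacked_gb = stacked_rows * 2 * 8 / 1e9   # ~12.6 GB raw: data column + label column

print(cells, round(wide_gb, 1), round(stacked_gb, 1))  # 787500000 6.3 12.6
```

Even the raw stacked table alone is close to half of the 32 GB budget before overhead, which is why the wide and stacked tables cannot be held simultaneously.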

I suggest the following sequence:

  1. Take the original 3.5M-row × 225-column table and save each column to a separate file, one at a time, closing each new file after saving it. Every saved file has two columns: one is the series, the other is the indicator (label). Use a numeric type for the label; in your case, just use 1 through 225. (Optionally, try compressing the label column; a compressed column uses a 2-byte field.)
  2. Close the big original file.
  3. Open the first saved file. Then concatenate the other individual files, one at a time, with "Append to first table" checked, and close each individual file immediately after it has been appended.
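The sequence above can be sketched in Python with a toy table (in JMP itself you would do this interactively or in JSL; the file names and sizes here are purely illustrative). The point is the shape of the workflow: only one column slice or one part file is held alongside the growing result at any moment, never the full wide table and the full stacked table together.

```python
import csv
import os
import tempfile

workdir = tempfile.mkdtemp()

# Toy wide table standing in for the 3.5M-row x 225-column original.
wide = {"s1": [1, 2], "s2": [3, 4], "s3": [5, 6]}

# Step 1: save each column as its own two-column file (value + numeric label),
# closing each file as we go so only one slice is in flight at a time.
for i, (name, values) in enumerate(wide.items(), start=1):
    with open(os.path.join(workdir, f"part_{i}.csv"), "w", newline="") as f:
        csv.writer(f).writerows((v, i) for v in values)  # label is just 1..N

# Step 2: release the big wide table before building the stacked one.
del wide

# Step 3: stream the parts back, appending one file at a time and
# discarding each part as soon as it has been appended.
stacked = []
for i in range(1, 4):
    path = os.path.join(workdir, f"part_{i}.csv")
    with open(path, newline="") as f:
        stacked.extend([int(a), int(b)] for a, b in csv.reader(f))
    os.remove(path)

print(len(stacked))  # 6 rows = 2 original rows x 3 columns
```

In JMP, step 3 corresponds to Tables > Concatenate with "Append to first table" checked, which grows the first table in place instead of materializing a second full copy.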

I gave it a try, appending a 3.5M-row × 2-column small table to the 787.5M-row × 2-column big table. JMP's peak memory consumption sometimes jumped over 20 GB, but it survived, and dropped back to ~18 GB once the concatenation finished. I kept appending for another couple of rounds and observed similar behavior, and the memory saving is more substantial with compressed columns. Anyway, it seems worth a try. Good luck!
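The compressed-column saving is easy to bound. Assuming the uncompressed label column uses 8 bytes per cell and the compressed one uses the 2-byte field mentioned above, compressing the label column alone recovers several gigabytes:

```python
# Rough saving from compressing the label column of the stacked table,
# assuming 8-byte numeric cells uncompressed vs. a 2-byte compressed field.
rows = 3_500_000 * 225                # 787.5M rows in the stacked table
uncompressed_gb = rows * 8 / 1e9      # ~6.3 GB for the label column alone
compressed_gb = rows * 2 / 1e9        # ~1.6 GB with a 2-byte field
saved_gb = uncompressed_gb - compressed_gb
print(round(saved_gb, 1))             # ~4.7 GB saved
```

With labels of only 1 through 225, the values fit comfortably in a small field, so this saving comes essentially for free.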

BTW, how do you intend to analyze the data? What platform do you want to use?

