An Analysis of ETFs in China's Stock Market Using Dynamic Time Warping (2021-US-30MP-905)

Level: Beginner

Lim Yong Kai, Student, Singapore Management University
Chen Li, Student, Singapore Management University

Economic theories are supposed to interpret why and how the economy behaves and then determine the best solutions to influence or solve the economic phenomena. However, these theories are full of assumptions, hypotheses and contexts, in terms of moral values and politics. Price movements of stock markets have never been explained well by any economic theories. Hence, our question as investors: Is there a holistic way to understand the price movements in the market without application of any economic theory? An option would be to use unsupervised learning to detect objective patterns of the subject without the requirement of any domain knowledge. We believe one approach to understand real-world complexity is to get the pattern first, followed by forming and studying the theories.

In this paper, we explore more than 100 ETFs in China's stock market without prior domain knowledge of each ETF by using dynamic time warping clustering (DTW) and the agglomerative hierarchical clustering method to detect the similarities of their price movements. Our results show that clusters from the DTW method largely coincide with the type of industries that ETFs involve. Analysis of the clusters’ price movement also revealed that certain industries performed better after 2019, when compared to 2018 in light of China’s new self-reliance economic direction.

Auto-generated transcript...

Speaker	Transcript
LIM Yong Kai	Okay. Hi everybody. So I'll be presenting our paper.
	The paper title is Analysis of Stock Market with Unsupervised Learning, specifically an analysis of ETFs in China's stock market using dynamic time warping.
	So, my name is Yong Kai, the author of the paper, and Chen Li is my coauthor. So the paper was something we did for Associate Professor Kam in a master's program at Singapore Management University.
	So the agenda of my presentation today is as follows. First, I'll start off with an introduction and the objective of the paper.
	Next will be the data preparation, followed by the clustering. Next will be the analysis of our clusters' performance and, finally, will be the conclusion and the future works of our paper.
	So I start off with the introduction and objective.
	So exchange traded funds, or ETFs in short, allow investors to get access to many stocks across various industries, and they tend to have better risk management through the diversification in their portfolio.
	So China's stock exchange is the world's second largest stock exchange operator, and due to the popularity of ETFs, many new ETFs have been created over the years, which makes it hard for investors to select and choose the ETF to invest in.
	So our paper aims to narrow down the vast number of ETFs and help investors with little domain knowledge to build their investment portfolio by using unsupervised learning techniques.
	So two clustering techniques will be used here, namely dynamic time warping and hierarchical clustering.
	So these two clustering techniques are performed to aggregate the ETFs into clusters with similar price movement patterns. So in this paper, we will look at the ETFs listed in the Shanghai and Shenzhen stock markets in China.
	So the software tool that we will be using is definitely JMP Pro. We're using version 15.2. JMP Pro is used in our data preparation and also in the analysis of our results using Graph Builder.
	And we will also use SAS Enterprise Miner for our clustering, which will perform dynamic time warping and also hierarchical clustering.
	Moving on will be the data preparation.
	So six different data sets were used, namely the daily closing prices of Shanghai's ETFs from 2018 to 2020, and the daily closing prices of Shenzhen's ETFs, similarly from 2018 to 2020.
	So the workflow to prepare the data for clustering is as follows. So we will first import our six data sets into JMP Pro.
	And then we would join the multiple tables for both Shanghai and Shenzhen ETFs. And next we will use the missing data pattern function in JMP Pro to clean up the data set. Thereafter, we also tabulate the monthly return for each ETF using the formula as given, which is
	the price at the end of the month, minus price at the start of the month, divided by the price at the start of the month.
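The monthly return formula just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the JMP formula column used in the talk; the function name `monthly_return` and the prices are made up for the example.

```python
def monthly_return(start_price: float, end_price: float) -> float:
    """Return (end - start) / start, expressed as a proportion."""
    if start_price == 0:
        raise ValueError("start price must be non-zero")
    return (end_price - start_price) / start_price


# Hypothetical closing prices for one ETF in one month (not real data).
first_close = 2.50   # closing price on the first trading day of the month
last_close = 2.60    # closing price on the last trading day of the month

r = monthly_return(first_close, last_close)
print(f"{r:.2%}")  # formatted as a percentage with two decimal places: 4.00%
```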
	And finally, we combine both tables together, and the data is then ready for clustering.
	So firstly we will use the import multiple files function in JMP to batch import our six files from our raw data set for the ???.
	So here, you can see, we are using the import multiple files function, and this is the folder for our data sets.
	So, as you can see, with this function it is very easy for us to import all the different data sets into JMP Pro at once.
	So, as you can see our 6 data sets have been loaded.
	Next, our second step will be to join all the different tables together.
	Here I'm using the Shanghai 2018 data and I'm performing an outer join with 2019 data and the matching column will be the serial number of each ETF.
	So after combining this, it will comprise all the ETF data for 2018 to 2019.
	Next, I'll perform another outer join with the 2020 data. And now
	the resulting data set will have all the data, ranging from 2018 to 2020 for Shanghai's ETFs. And of course we would repeat the process for Shenzhen ETF data set as well.
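As a rough illustration of the two outer joins described above, here is a minimal Python sketch that joins yearly tables on the ETF serial number. The talk did this with JMP Pro's join dialog; the function `outer_join`, the serial numbers, and the prices below are all hypothetical.

```python
def outer_join(left: dict, right: dict) -> dict:
    """Full outer join of two {serial: {column: value}} tables on serial.

    Columns that a serial lacks on either side are filled with None.
    """
    left_cols = sorted({col for row in left.values() for col in row})
    right_cols = sorted({col for row in right.values() for col in row})
    joined = {}
    for serial in sorted(set(left) | set(right)):
        row = {col: left.get(serial, {}).get(col) for col in left_cols}
        row.update({col: right.get(serial, {}).get(col) for col in right_cols})
        joined[serial] = row
    return joined


# Hypothetical serial numbers and closing prices, one column per year for brevity.
y2018 = {"ETF001": {"2018-12-28": 2.29}, "ETF002": {"2018-12-28": 3.01}}
y2019 = {"ETF001": {"2019-12-31": 3.00}, "ETF003": {"2019-12-31": 1.05}}
y2020 = {"ETF001": {"2020-12-31": 3.60}}

# First join 2018 with 2019, then join the result with 2020.
combined = outer_join(outer_join(y2018, y2019), y2020)
for serial, row in combined.items():
    print(serial, row)
```

Because the join is an outer join, ETFs that were not listed in every year still appear in the result, with missing values for the years they lack. That is why a missing-data check is needed afterwards.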
	So, moving on, we'll check for missing data, and we will be using the missing data pattern function under the tables tool bar in JMP Pro.
	So under tables tool bar, we can choose the missing data pattern and we can select all our columns to check for missing data. As you can see,
	there are 98 rows that have no missing data, and we can select it, which will ??? back to our main table. We can select those 98 rows and this will be our resulting data set for
	clustering.
	So this will also be performed for the other data sets.
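The missing-data step can be approximated as follows. This is a sketch of the idea only (count rows by missing-value pattern, then keep the complete rows); it is not how JMP's missing data pattern function is implemented, and the helper names and data are invented for the example.

```python
from collections import Counter


def missing_pattern(row: dict, columns: list) -> str:
    """Encode a row as a string of 0/1 flags, where 1 marks a missing value."""
    return "".join("1" if row.get(col) is None else "0" for col in columns)


def complete_rows(table: dict, columns: list) -> dict:
    """Keep only the rows that have no missing value in any column."""
    all_present = "0" * len(columns)
    return {serial: row for serial, row in table.items()
            if missing_pattern(row, columns) == all_present}


# Hypothetical table: one row per ETF, one column per period.
columns = ["2018-01", "2018-02", "2018-03"]
table = {
    "ETF001": {"2018-01": 0.04, "2018-02": -0.01, "2018-03": 0.02},
    "ETF002": {"2018-01": None, "2018-02": 0.03, "2018-03": 0.01},
    "ETF003": {"2018-01": 0.01, "2018-02": 0.02, "2018-03": -0.03},
}

# How many rows share each missing-value pattern.
pattern_counts = Counter(missing_pattern(row, columns) for row in table.values())
print(pattern_counts)

# The rows carried forward to clustering.
kept = complete_rows(table, columns)
print(sorted(kept))
```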
	Next we'll also tabulate our monthly return, using the formula I mentioned earlier, and what we do is insert a column into the data.
	Of course, we will insert a formula.
	As mentioned, the formula will be the closing price at the end of the month, which is on the 31st of January 2018, minus the closing price on the first day, which is the 2nd.
	And, of course, divided by the price on the first day. It should give us the monthly return as a proportion. So of course we can fine tune this: we can change the column name to January of 2018, and of course, we can change the format to percentage with two decimal places.
	So here is the monthly return for January, and we can repeat this for all the months throughout the entire data set.
	And lastly, after both data sets for Shenzhen and Shanghai ETFs are prepared, we will need to combine them together.
	So we will use the concatenate function in the tables tool bar to combine both data sets into one, which will be used for clustering afterwards. So as you can see, both data for
	Shenzhen and Shanghai ETFs have been combined into one with 143 rows. We will be using this for our clustering thereafter.
	So now moving on to our clustering.
	Hierarchical clustering uses a proximity matrix to determine pairwise similarity and dissimilarity between each monthly return
	using Euclidean distance. On the right, dynamic time warping is used to compare similarity between time series, which accounts for the time factor. It is able to factor in propagation delays and even detect time series that share similar patterns that are slightly out of phase.
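To make the contrast between the two distance measures concrete, here is a minimal sketch of the textbook dynamic time warping recurrence next to a plain Euclidean distance. It is not the SAS Enterprise Miner implementation used in the paper. The absolute difference as the local cost, the absence of a warping window, and the two toy series are assumptions made for illustration.

```python
import math


def euclidean(a: list, b: list) -> float:
    """Point-by-point distance: compares values at the same time index only."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a: list, b: list) -> float:
    """Classic DTW: cost of the cheapest monotone alignment of a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(cost[i - 1][j],      # stretch b
                                     cost[i][j - 1],      # stretch a
                                     cost[i - 1][j - 1])  # step both
    return cost[n][m]


# Two toy series with the same shape, one lagging the other by one step.
lead = [0, 1, 2, 1, 0, 0]
lag = [0, 0, 1, 2, 1, 0]

print(euclidean(lead, lag))     # 2.0: penalised for being out of phase
print(dtw_distance(lead, lag))  # 0.0: the warping absorbs the one-step lag
```

The Euclidean distance treats the lagged copy as a different series, while DTW aligns the peaks and reports no difference, which is the "slightly out of phase" behaviour described above.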
	So although JMP Pro is able to perform hierarchical clustering, dynamic time warping is only available in SAS Enterprise Miner, hence both clustering models were run in SAS Enterprise Miner.
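For readers who want to see the proximity-matrix idea behind agglomerative hierarchical clustering, here is a small self-contained sketch. The talk does not state which linkage rule was used, so average linkage is an assumption here, and the function name, ETF labels, and monthly returns are hypothetical.

```python
import math


def euclidean(a: list, b: list) -> float:
    """Euclidean distance between two equal-length return series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative(series: dict, n_clusters: int, dist=euclidean) -> list:
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    names = list(series)
    # Proximity matrix: pairwise distance between every two series.
    proximity = {(p, q): dist(series[p], series[q]) for p in names for q in names}
    clusters = [[name] for name in names]

    def linkage(c1: list, c2: list) -> float:
        # Average linkage (an assumed choice): mean pairwise distance.
        return sum(proximity[p, q] for p in c1 for q in c2) / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = linkage(clusters[i], clusters[j])
                if best is None or score < best[0]:
                    best = (score, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical monthly returns for four ETFs over three months.
returns = {
    "A": [0.020, 0.030, -0.010],
    "B": [0.021, 0.028, -0.012],
    "C": [-0.050, 0.100, 0.040],
    "D": [-0.048, 0.110, 0.035],
}
clusters = agglomerative(returns, n_clusters=2)
print(clusters)
```

Passing a different `dist` function, such as a DTW distance, changes only how the proximity matrix is filled; the merging procedure stays the same.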
	Now perhaps JMP Pro might want to consider including dynamic time warping in the clustering function in the future, so for us users, we can have an all in one software to perform our entire analysis from data preparation to modeling and, finally, the visualization of our results.
	So after the clustering is completed, the data was loaded back into JMP Pro from SAS Enterprise Miner for visualization and analysis.
	So clustering is an unsupervised modeling method and JMP Pro Graph Builder is very useful in aiding to visualize these results.
	An appropriate visualization to be used is the heat map, which allows users to quickly get insights from the clustering result.
	So along the Y axis are the different ETFs, and along the X axis are the monthly returns of each ETF. So from this heat map, we can quickly observe that within this cluster, the monthly returns show signs of similarity, indicated by the intensity of the color, which is the monthly return.
	So next we will work to compare two clusters from both clustering techniques. On the left will be the dynamic time warping and on the right will be hierarchical clustering.
	So on the left, the dynamic time warping algorithm clustered seven different ETFs, which all belong to the growth enterprise market.
	And on the right is the hierarchical clustering, which is a cluster combining some growth enterprise ETFs and other ETFs together.
	So dynamic time warping also clustered two additional ETFs belonging to the enterprise sector, which were missing in the hierarchical clustering, and they are boxed in red on the left.
	This exhibits the merits of the dynamic time warping algorithm in comparison to the hierarchical clustering algorithm, where it can pick up ??? pattern when the clustering is performed. So I will show...
	So, as you can see, this is what we have done in JMP Pro. On the left will be the dynamic time warping and on the right will be the hierarchical clustering. So using Graph Builder, we can interactively see
	the same ETF within the same period, as they are both linked to the same data set. So for other comparisons, we can have a look at cluster five and cluster nine.
	So the local data filter here allows us to toggle between the different clusters easily to make our comparison. As you can see, these few are not present in the dynamic time warping cluster. As you can see here,
	it's linked.
	And of course, we can maybe have a look at the last one. So these two are comparable clusters that are similar: the dynamic time warping grouped four ETFs together, whereas the hierarchical clustering grouped about eight of them together.
	So, moving back.
	So there are several similar results when we compare the clusters from both techniques using the JMP Pro Graph Builder heat map. These are also documented in our paper, and we will put our paper on JMP Community with all our findings.
	So comparing the clustering results with the ETF portfolio composition, dynamic time warping algorithm managed to cluster ETFs of similar industry or portfolio more accurately than hierarchical clustering.
	This may be because, instead of only calculating the Euclidean distance between the same time periods, as per the hierarchical clustering algorithm,
	dynamic time warping can slide along the time axis to calculate the shortest distance between two time series and also detect patterns that are slightly out of phase.
	So now we move into the analysis of our clusters in terms of their performance in monthly return.
	So JMP Pro Graph Builder was used to plot the monthly return trends. Graph Builder allows us to build visualizations using multiple ETFs' monthly returns.
	There is a box plot for each month, and those show a smooth trend line throughout the three-year period. So the interactive function allows us as users to ??? insights for visualization for analysis.
	And from the graph, we can also observe that the range of the fluctuation from 2019 to 2020, in blue here, is higher or larger as compared to 2018.
	The interquartile range of the months that follow is also way larger, which signifies that for those months the variance of the monthly return amongst the ETFs is also larger.
	So from the graph you can see that the best performing months are, in fact, February of 2019, you can see here, July of 2020 and also
	February of 2020. On the contrary, the worst performing months are March of 2020,
	followed by December of 2018 and then June of 2018.
	So drilling down to display only specific clusters, we can use JMP Pro Graph Builder, the local data filter.
	This was similar to the demo that I showed where we can toggle between the different clusters to look specifically and zoom in on different clusters.
	So we can analyze those clusters individually. So we picked two clusters here. On the left, they are mainly ETFs belonging to military manufacturers, and on the right
	are technology companies in the mature enterprise market. So both clusters have seen record highs from 2019 to 2020,
	which are also the best performing months, as shown in the previous slide.
	However, taking a closer look at the box plot for each month, the cluster on the left has a generally smaller interquartile range compared to the cluster on the right.
	This indicates that picking any of the ETFs from the cluster on the left will produce a more consistent monthly return as compared to the ETFs on the right, definitely if you are buying between 2019 and 2020.
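The interquartile-range comparison between two clusters can be sketched with Python's standard library. The returns below are invented for illustration and are not the ETF data from the paper. The quartile convention is also an assumption: `statistics.quantiles` defaults to the "exclusive" method, which may differ slightly from the one Graph Builder uses for its box plots.

```python
import statistics


def interquartile_range(values: list) -> float:
    """Q3 minus Q1, using the default (exclusive) quantile method."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


# Hypothetical monthly returns of the ETFs in each cluster for one month.
cluster_left = [0.10, 0.11, 0.12, 0.13, 0.14]    # tightly grouped returns
cluster_right = [0.02, 0.08, 0.12, 0.18, 0.25]   # widely spread returns

iqr_left = interquartile_range(cluster_left)
iqr_right = interquartile_range(cluster_right)

# A smaller IQR means the ETFs in that cluster returned more alike that month,
# so the choice of ETF within the cluster matters less.
print(f"left IQR:  {iqr_left:.3f}")
print(f"right IQR: {iqr_right:.3f}")
print("more consistent cluster:", "left" if iqr_left < iqr_right else "right")
```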
	So if we drill down on the four clusters' heat maps using JMP Pro Graph Builder, we can observe that in 2019 to 2020, the monthly return performs generally better as compared to 2018.
	So we can see there are more high-intensity positive returns, which are indicated by the brighter green color, from 2019 to 2020 as compared to 2018.
	So we did some investigation, and our preliminary assumption is that it has to do with the rising trade war tension between China and the United States, which started in 2017.
	And this has disrupted China's economy and trade markets infrequently. In 2019, the Chinese president also called for China's self reliance.
	According to statistics, China's high tech manufacturing made up a larger portion of the country's industrial growth in the first half of 2019 as the country shifted away from dependence on foreign technology and other products.
	So, furthermore, China has also invested heavily in industries such as artificial intelligence and also integrated circuits to achieve their goal toward self reliance.
	So this economy-driven growth in China might have led these few industries to grow positively in 2019 to 2020 after the announcement and the shift in the country's direction.
	So finally I'll be closing with our conclusion and some of the future work that can be applied with our paper.
	So although both hierarchical clustering and dynamic time warping produce very similar clustering results, there are merits of dynamic time warping
	algorithm as it managed to cluster the ETFs more accurately. Furthermore, ETFs' monthly returns are a time series, and the dynamic time warping algorithm was able to produce better clustering results.
	We also observed a correlation between the clusters' monthly return performance and China's economic goals, and of course this resulted in the better monthly performance in those sectors.
	So some of the future work that can be done is to explore other correlation of industries with the country's macro economic factors to draw different insights.
	And also, we can apply dynamic time warping to different financial instruments such as stock prices, commodity prices,
	currencies, or even derivatives. And lastly, analysis can be performed in different time periods, such as pre-COVID and post-COVID, which, of course, we have yet to reach.
	Or even in different time intervals. For now we are using monthly returns; you can also look at weekly, quarterly, or yearly
	returns.
	So I'd like to thank JMP Pro for giving me the opportunity to showcase my work through this platform.
	I'd like to also thank my co author and mentor for supporting me throughout this journey. Lastly I'll do some reflection on my personal experience as a user when using JMP Pro.
	JMP Pro provides an excellent user experience for us as users, and it's a very interactive and dynamic data analytics software.
	It allows us users to do our data preparation so much quicker and with so much more accurate results. The Graph Builder function is also a very handy and useful function for the analysis of our results.
	We can draw beautiful insights and interesting information from this Graph Builder and, lastly, statistical analyses, such as hypothesis testing and even other modeling techniques such as clustering or PCA are also very user friendly when we are using JMP Pro to do it.
	So thank you for your time. That sums up my paper.