| Included with this assignment is an Excel spreadsheet that contains data with two dimension values. |
| The purpose of this assignment is to demonstrate steps performed in a K-Means Cluster analysis. |
| Review the “k-MEANS CLUSTERING ALGORITHM” section in Chapter 4 of the Sharda et al. textbook for additional background. |
| Use Excel to perform the following data analysis. |
| 1. Plot the data on a scatter plot. |
| 2. Determine the ideal number of clusters. |
| 3. Choose random center points (centroids) for each cluster. (Note: Each student will select a different random set of centroids.) |
| 4. Using a standard distance formula, measure the distance from each data point to each center point. |
| 5. Assign each data point to an initial cluster region based on closeness. |
| 6. For each cluster, calculate a new center point. |
| 7. Repeat steps 4 through 6. |
| You will use Excel to help with the calculations, but only standard functions should be used (i.e., don’t use a plug-in to perform the analysis for you). You need to show your work by doing this analysis the long way. If you were to keep repeating steps 4 through 6, what would likely happen to the cluster centroids? The rubric for this assignment can be viewed by clicking on the assignment link. |
| Here is a link to an example spreadsheet using a smaller data set. It contains two tabs. The first tab is the raw data. The second tab contains the analysis that was performed. Make sure that you use different starting center points from those in the example. |
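
The assignment itself must be completed in Excel, but steps 4 through 7 can be cross-checked with a short script. The following is a minimal Python sketch, not part of the assignment materials. It assumes two-dimensional points, Euclidean distance as the "standard distance formula", and illustrative function names (`assign_clusters`, `update_centroids`, `k_means`) that are hypothetical.

```python
import math


def assign_clusters(points, centroids):
    """Steps 4-5: label each point with the index of its nearest centroid."""
    labels = []
    for p in points:
        # Euclidean distance from this point to every centroid (assumed metric).
        distances = [math.dist(p, c) for c in centroids]
        labels.append(distances.index(min(distances)))
    return labels


def update_centroids(points, labels, centroids):
    """Step 6: move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            # Assumption: a cluster with no points keeps its previous centroid.
            new_centroids.append(old)
            continue
        mean_x = sum(m[0] for m in members) / len(members)
        mean_y = sum(m[1] for m in members) / len(members)
        new_centroids.append((mean_x, mean_y))
    return new_centroids


def k_means(points, centroids, max_iterations=100):
    """Step 7: repeat steps 4-6 until the centroids no longer change."""
    centroids = [tuple(c) for c in centroids]
    labels = assign_clusters(points, centroids)
    for _ in range(max_iterations):
        labels = assign_clusters(points, centroids)
        updated = update_centroids(points, labels, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels
```

Calling `k_means(points, starting_centroids)` with the same data and starting centroids used in the spreadsheet returns the final centroids and the cluster index assigned to each point, which can be compared against the values worked out by hand in Excel.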
