This week was full of coding and visualization.
I finally finished collecting all the SNS data. To see the overlap between followers of different SNS accounts, I decided to use names as the identifier, because email addresses are not available. I cleaned the data collected from each SNS account, split the identifiers, integrated all the data, and generated the files that Gephi needs.
Cleaning and integration took up most of the time. Since our files store data in different formats, I had to write separate code for each format. After integration, each row represents one subject and each column represents one SNS account. Each value is 0 or 1, indicating whether the subject follows the corresponding SNS account. For visualization, all values are separated by semicolons and saved as a .csv file.
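The integration step can be sketched roughly as follows. This is only a minimal illustration, not the code I actually ran: the account names, the sample followers, and the helper names (`normalize`, `build_membership`, `to_semicolon_csv`) are all made up, and the name normalization (lowercasing and collapsing whitespace) is an assumption about what "cleaning" involves.

```python
import csv
import io


def normalize(name):
    # Assumed cleaning rule: collapse whitespace and lowercase,
    # so "Jane  Doe" and "jane doe" count as the same subject.
    return " ".join(name.split()).lower()


def build_membership(followers_by_account):
    # followers_by_account maps an account name to a list of raw follower names.
    accounts = sorted(followers_by_account)
    subjects = {}
    for account, names in followers_by_account.items():
        for raw in names:
            key = normalize(raw)
            if key:  # skip empty names
                subjects.setdefault(key, set()).add(account)
    # One row per subject, one 0/1 column per account.
    rows = [
        [name] + [1 if account in subjects[name] else 0 for account in accounts]
        for name in sorted(subjects)
    ]
    return ["name"] + accounts, rows


def to_semicolon_csv(header, rows):
    # Serialize the table as semicolon-separated text.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical sample data, for illustration only.
followers = {"twitter": ["Jane Doe", "Bob Lee"], "facebook": ["jane  doe"]}
header, rows = build_membership(followers)
csv_text = to_semicolon_csv(header, rows)
```

Matching on names alone will merge different people who share a name, so the overlap counts should be read as approximate.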
After integrating all the data into one file, I generated Nodes and Edges files for an initial visualization. To find the center of the network, I colored and weighted nodes and edges by their degree. Since we have more than 4000 nodes and edges, it is hard to see any pattern in the visualization. Therefore, I tried several ways of clustering. Some of them gave a much clearer view of the links in the data.
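A rough sketch of how Nodes and Edges tables could be derived from the 0/1 table is given here. It assumes a graph in which every subject is linked to each SNS account it follows; the exact graph structure I used is not spelled out above, so treat this as one plausible reading. The function names and sample rows are hypothetical. The `Id`, `Label`, `Source`, `Target`, and `Type` column names follow what Gephi's spreadsheet import expects, and `Degree` is just an extra attribute column added for illustration.

```python
import csv
import io
from collections import Counter


def build_graph_tables(header, rows):
    # header: ["name", account1, account2, ...]; rows: [name, 0/1, 0/1, ...]
    accounts = header[1:]
    edges = []
    for row in rows:
        subject = row[0]
        for account, flag in zip(accounts, row[1:]):
            if int(flag) == 1:
                edges.append((subject, account))
    # Degree = number of edges touching each node.
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    nodes = [(node, node, degree[node]) for node in sorted(degree)]
    return nodes, edges


def table_to_text(header, rows):
    # Semicolon-separated text; Gephi lets you pick the separator on import.
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=";", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical integrated table, for illustration only.
header = ["name", "facebook", "twitter"]
rows = [["bob lee", 0, 1], ["jane doe", 1, 1]]

nodes, edges = build_graph_tables(header, rows)
nodes_text = table_to_text(["Id", "Label", "Degree"], nodes)
edges_text = table_to_text(
    ["Source", "Target", "Type"],
    [(s, t, "Undirected") for s, t in edges],
)
```

This sketch assumes that no subject name is identical to an account name; otherwise the two would collapse into one node and the IDs would need a prefix. Gephi can also compute degree itself, so the precomputed column is optional.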
The problem now is to improve and evaluate the visualization. Although I used clustering algorithms to organize the data points, there are still so many of them that only a few patterns are visible. More detailed analysis is needed. I need to improve the visualization so that it gives a clearer view of the data and reveals strong patterns in it.