This week I explored different software for conducting network analyses. When I started looking around, I found that many, many tools are available online. So the question was not just about picking one useful tool, but about finding an efficient one that suits our data set.
It is easy to get lost when you have plenty of choices, especially when many of them do not suit our input or cannot export the output we want. So the first step for me was to work out the structure of our data and think about the output. Because most of our data sources contain fewer than 3 features, we know the data is not high-dimensional. We have fewer than 30,000 nodes, so we don’t need a tool optimized for parallel computing (though we can consider using some of those functions later). Also, the primary goal of our analysis is to cluster the data, so visualization is our priority. Many applications satisfy those requirements. Gephi and UCINET are often used for social network analysis: Gephi is a very powerful tool for visualization, and UCINET is a nice tool for medium-sized SNS analysis. R, a traditional statistics tool, also has packages available that support SNS analysis and visualization.
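Before loading a sample into any of these tools, a quick sanity check on its size and rough grouping can help. The sketch below is only an illustration: the edge list and account names are made up, and connected components are a much coarser grouping than the clustering we actually want, so this is a first look at the data rather than the planned method.

```python
from collections import deque

# Hypothetical undirected edge list; the account names are made up.
edges = [("a", "b"), ("b", "c"), ("d", "e"), ("f", "f")]


def build_adjacency(edge_list):
    """Map each node to the set of nodes it is connected to."""
    adjacency = {}
    for u, v in edge_list:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


def connected_components(adjacency):
    """Group nodes that can reach each other, using breadth-first search."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in sorted(adjacency[node]):
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(sorted(component))
    return components


adjacency = build_adjacency(edges)
groups = connected_components(adjacency)
print(len(adjacency), "nodes in", len(groups), "groups")
```

If the node count stays well under 30,000, as expected, a single-machine tool should be enough for this step.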
Now I am trying the tools on some small samples to see the results. My plan for now: for data collection, we need to use APIs (each SNS has its own API) to extract data from social network accounts. After collecting the data, we will need to clean it and integrate it into a common format. Then, for each SNS, we will use Gephi/UCINET/R (or other tools, if we find anything with better functionality) to do the analysis and visualize the data. For cross-data analysis, I intend to integrate data from the different sources and use an application to generate results and visualizations. I will keep my plan updated and revise it as I proceed.
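The clean-and-integrate step could look roughly like the sketch below. Everything in it is an assumption for illustration: the two record layouts, the field names `from_user` and `to_user`, and the source labels `network_a` and `network_b` are hypothetical and do not come from any real API. The output uses `Source`, `Target`, and `Weight` column headers, which Gephi's spreadsheet import recognizes for edge tables; `Origin` is an extra column I added to keep track of where each edge came from.

```python
import csv
import io

# Hypothetical raw records, shaped differently per source (not real API output).
source_a = [
    {"from_user": "Alice ", "to_user": "bob"},
    {"from_user": "bob", "to_user": "Carol"},
    {"from_user": "alice", "to_user": "BOB"},
]
source_b = [
    ("dave", "alice"),
    ("dave", ""),  # incomplete record, dropped during cleaning
]


def clean(name):
    """Normalize an account name: trim whitespace and lowercase it."""
    return name.strip().lower()


def normalize(pairs, source_name):
    """Turn (source, target) pairs into rows in one common format."""
    rows = []
    for src, dst in pairs:
        src, dst = clean(src), clean(dst)
        if src and dst:  # skip records with a missing endpoint
            rows.append({"Source": src, "Target": dst, "Origin": source_name})
    return rows


def integrate(*row_lists):
    """Merge rows from all sources and count repeated edges as Weight."""
    weights = {}
    for rows in row_lists:
        for row in rows:
            key = (row["Source"], row["Target"], row["Origin"])
            weights[key] = weights.get(key, 0) + 1
    return [
        {"Source": s, "Target": t, "Origin": o, "Weight": w}
        for (s, t, o), w in sorted(weights.items())
    ]


def to_csv(edges):
    """Serialize edges as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["Source", "Target", "Origin", "Weight"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(edges)
    return buffer.getvalue()


rows_a = normalize([(r["from_user"], r["to_user"]) for r in source_a], "network_a")
rows_b = normalize(source_b, "network_b")
edges = integrate(rows_a, rows_b)
edge_csv = to_csv(edges)
print(edge_csv)
```

Keeping the origin of each edge in its own column means the same file can serve both the per-SNS analysis (filter on `Origin`) and the cross-data analysis (use all rows).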