Week Five – DataONE Notebooks

Last week I produced the initial visualization, but the integrated data seemed to contain mistakes. One large cluster, made up of followers of four webinars, sat apart from the big central cluster. That would mean none of the followers in the webinar cluster followed any other SNS account. However, it was confirmed that some users in the webinar cluster do follow other SNS accounts, yet they did not appear among those accounts' followers in my data. So my priority this week was to find out what had happened, generate new data, re-integrate the input data, and produce a new visualization.

After going through the data several times, I found that the names of the subjects in the webinar data were reversed: for those subjects I had used “Last Name-First Name” as input. That is why they were not connected to the other SNS networks. I also had not used emails as identifiers, because some of the input data does not include them. I should have included them where possible, since that makes the matching more accurate. So I wrote code to put the input names in the right order and to include email as an identifier when it is available. After that, I re-integrated the data, fed it into Gephi, and found that all the clusters now connect with each other. I adjusted the label size and checked some metrics of the new graph. The visualization now looks more reasonable.
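To make the name fix concrete, here is a minimal sketch of the idea in Python. It is not my actual script: the record layout (a dict with "name" and an optional "email"), the function names, and the assumption that a reversed name is simply two words separated by a space are all made up for illustration.

```python
def flip_name(name):
    """Turn 'Last First' into 'First Last'.

    Assumes a reversed name is exactly two whitespace-separated words;
    anything else is returned unchanged rather than guessed at.
    """
    parts = name.split()
    if len(parts) == 2:
        return f"{parts[1]} {parts[0]}"
    return name


def make_identifier(record, name_is_reversed=False):
    """Build the key used to match one person across data sources.

    Prefers the email when the (hypothetical) 'email' field is present,
    and falls back to the lower-cased name otherwise.
    """
    email = (record.get("email") or "").strip().lower()
    if email:
        return email
    name = record["name"].strip()
    if name_is_reversed:
        name = flip_name(name)
    return name.lower()


# Made-up records: the webinar source stores names in reversed order.
webinar_record = {"name": "Smith Jane"}
twitter_record = {"name": "Jane Smith"}

same_person = (
    make_identifier(webinar_record, name_is_reversed=True)
    == make_identifier(twitter_record)
)
```

One caveat with this kind of fallback: a subject who has an email in one source and only a name in another still will not match, so the email helps only where both sources carry it.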

It would also be very nice to have a live version of the visualization. Unfortunately, the only ways I found to show an animated Gephi graph are to open the file in Gephi itself or to record a screencast. So I plan to use a screencast to show the network connections once I have the final version of the visualization.

Another task was to identify the most active subjects. I calculated the number of SNS accounts each user follows, displayed descriptive statistics, and printed the results. Many subjects follow only one SNS account, but some follow more than one. However, extracting all of their information will take some effort, since much of it has to be typed in by hand. So I plan to discuss with my mentor what kind of information we want to extract and how many subjects we want to include.
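The counting step can be sketched roughly as follows. This is an illustration rather than my real code: it assumes the integrated data is a list of (user, account) pairs, and the function names, the sample pairs, and the "at least two accounts" cutoff are all placeholders.

```python
import statistics
from collections import defaultdict


def accounts_per_user(edges):
    """Count the distinct SNS accounts each user follows."""
    followed = defaultdict(set)
    for user, account in edges:
        followed[user].add(account)  # a set ignores duplicate pairs
    return {user: len(accounts) for user, accounts in followed.items()}


def describe(counts):
    """Basic descriptive statistics over the per-user counts."""
    values = sorted(counts.values())
    return {
        "n": len(values),
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def most_active(counts, minimum=2):
    """Users following at least `minimum` accounts, most active first."""
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(user, n) for user, n in ranked if n >= minimum]


# Made-up follower pairs, only to show the expected input shape.
sample_edges = [
    ("ann", "twitter"),
    ("ann", "facebook"),
    ("ann", "twitter"),
    ("bob", "twitter"),
    ("cy", "webinar"),
]

counts = accounts_per_user(sample_edges)
summary = describe(counts)
active = most_active(counts)
```

The cutoff in most_active is the part to settle with my mentor, since it decides how many subjects need their information collected by hand.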
