Week Seven – DataONE Notebooks

After gathering all the information into one file, I cross-referenced it by hand to match people who appear under similar names. This took a lot of time because I needed to go back to the original data and search online to confirm that similar names belonged to the same person. For example, some people used an abbreviation or nickname to register for the mailing list or an SNS account; some people sometimes included their middle name (or its initial) and sometimes did not; some letters from non-English alphabets appear in different forms in the data; and some people sometimes used symbols in their names and sometimes did not. I cross-referenced the list first by last name and then first name, and then in the inverse order. When I was sure that two or three names represented the same person, I removed the duplicates and combined their information. I confirmed each match with the available information, such as photos, job positions, geographic location, and resumes. For special cases, for example when the same person follows us from different email or SNS accounts, I explained what I found in the comments.
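The matching rules above could be partly automated as a first pass that only suggests candidate pairs for the manual checking. This is a minimal sketch, assuming names arrive as plain strings; the nickname table and the `normalize`, `match_key`, and `candidate_duplicates` helpers are hypothetical and not part of the original workflow. It strips accents with Unicode NFKD decomposition, drops symbols and single-letter initials, maps a few nicknames, and uses an order-independent key so "First Last" and "Last, First" land in the same group, which plays the role of the two sort passes.

```python
import re
import unicodedata
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Optional, Tuple

# Hypothetical nickname table for illustration; a real one would be much larger.
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth", "mike": "michael"}


def normalize(name: str) -> List[str]:
    """Casefold, strip accents and symbols, and split a name into tokens."""
    # NFKD splits accented letters into a base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name.casefold())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Drop everything except a-z and whitespace, so "O'Neil" and "ONeil" both become "oneil".
    return re.sub(r"[^a-z\s]", "", no_accents).split()


def match_key(name: str) -> Optional[Tuple[str, str]]:
    """Reduce a name to an order-independent (first, last) key, ignoring middle parts."""
    tokens = normalize(name)
    # Ignore single-letter initials, but only if at least two full tokens remain.
    full = [t for t in tokens if len(t) > 1]
    if len(full) >= 2:
        tokens = full
    if len(tokens) < 2:
        return None  # a single token is too little to match on
    first, last = (NICKNAMES.get(t, t) for t in (tokens[0], tokens[-1]))
    # Sorting makes "Wei Zhang" and "Zhang, Wei" share the same key.
    return tuple(sorted((first, last)))


def candidate_duplicates(names: List[str]) -> List[Tuple[str, str]]:
    """Group names by key and return every pair inside a group for manual review."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name in names:
        key = match_key(name)
        if key is not None:
            groups[key].append(name)
    return [pair for group in groups.values() for pair in combinations(group, 2)]


names = ["José García", "Jose Garcia", "García, José M.", "Bob O'Neil", "Robert ONeil", "Alice Wu"]
for a, b in candidate_duplicates(names):
    print(f"check by hand: {a!r} <-> {b!r}")
```

Every pair this returns would still need the manual confirmation described above (photos, job positions, location). The key also deliberately misses cases such as a first name reduced to an initial ("J. Smith"), which are left for the by-hand pass.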

Once I had a clean version of the data, I combined similar information from different columns. As my mentor suggested, I set a flag to mark conflicts and saved the conflicting information in the comments. Next week I expect more work-related and geographic information to come in, and I will re-integrate the data with the new information.
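The column-combining step with a conflict flag might look like the following minimal sketch, assuming each person is stored as a Python dict. The column names (`affiliation_survey`, `affiliation_list`, `comments`) and the `combine_columns` helper are hypothetical. It keeps the first non-empty value, sets a `<target>_conflict` flag when the sources disagree (ignoring case and surrounding whitespace), and appends every source value to the comments so nothing is lost.

```python
from typing import Dict, List


def combine_columns(record: Dict[str, str], sources: List[str], target: str) -> Dict[str, object]:
    """Merge several source columns into one target column and flag disagreements."""
    # Non-empty values, kept in the order the source columns are listed.
    found = [(col, str(record.get(col) or "").strip()) for col in sources]
    found = [(col, val) for col, val in found if val]

    # Distinct values, treating differences in case as irrelevant.
    distinct: List[str] = []
    for _, val in found:
        if val.casefold() not in {d.casefold() for d in distinct}:
            distinct.append(val)

    merged: Dict[str, object] = dict(record)
    merged[target] = distinct[0] if distinct else ""
    merged[f"{target}_conflict"] = len(distinct) > 1

    if len(distinct) > 1:
        # Keep every source value in the comments so the conflict can be resolved later.
        note = f"{target} conflict: " + "; ".join(f"{col}={val}" for col, val in found)
        merged["comments"] = "; ".join(filter(None, [record.get("comments"), note]))
    return merged


person = {
    "name": "Jane Doe",
    "affiliation_survey": "Example University",
    "affiliation_list": "Example Lab",
    "comments": "",
}
print(combine_columns(person, ["affiliation_survey", "affiliation_list"], "affiliation"))
```

Keeping the first value and moving the rest into the comments means the merged column is always filled, while the flag makes it easy to filter the records that still need a decision.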

Because the demographic information does not affect the SNS/mailing list visualization, I rebuilt the visualization as soon as I had the cleaned data. My mentor also helped me find useful information about exporting a live version of the visualization, so I exported one that can be opened locally. It looks really nice. Next week (or later) I will try to put it on a web server if needed.
