{"id":3577,"date":"2019-07-09T20:49:05","date_gmt":"2019-07-09T20:49:05","guid":{"rendered":"https:\/\/notebooks.dataone.org\/?p=3577"},"modified":"2019-07-09T20:49:33","modified_gmt":"2019-07-09T20:49:33","slug":"week-7-sql-database-creation","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/prov-self\/week-7-sql-database-creation\/","title":{"rendered":"Week 7 SQL Database creation"},"content":{"rendered":"\n

Happy 4th of July, everyone <\/p>\n\n\n\n

My main tasks this week are to build an SQLite database for future analysis, to dig deeper into how the Galaxy Group collects information and adds tags, and to continue reading the papers related to \u201cprovenance\u201d, \u201creproducibility\u201d and \u201cworkflow\u201d in the corpus. <\/p>\n\n\n\n

Where do these tags come from?<\/strong> <\/h3>\n\n\n\n

To ensure reproducibility, and to confirm that we may legally analyze these papers, we emailed a member of the Galaxy Project group. We hope to get an answer soon.
As for the tags automatically generated by Zotero, we found some interesting points. Comparing the record of one paper in the Galaxy collection with the record of the same paper in a personal Zotero group shows that, apart from who added the paper, the main differences are in the library catalog and the tags. <\/p>\n\n\n\n

Furthermore, by visiting the website where this paper was published, we confirmed that the automatically generated tags come from the keywords provided by the publisher. One thing that caught our attention is that these tags have lost their provenance, which could become a topic for further research. For example, the tags that come from IEEE Xplore should actually be divided into four categories: IEEE Keywords, INSPEC: Controlled Indexing, INSPEC: Non-Controlled Indexing, and Author Keywords. <\/p>\n\n\n\n

SQLite for Galaxy Zotero Corpus<\/strong><\/h3>\n\n\n\n

SQL is a convenient language for analyzing data, and an ER diagram makes the structure of the data much clearer. After several transformations of the data, we arrived at the final database.
Database V1.0 contains three tables: paper_table, link_table, and tag_table. paper_table has six attributes: paper_id, title, author, year, abstract, and publication, with paper_id as the primary key. tag_table has three attributes: tag_id, tag_name, and tag_type. To connect these two tables, link_table was created; it contains paper_id and tag_id.<\/p>\n\n\n\n
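
The three tables described above could be created roughly as follows. This is only a minimal sketch using Python's built-in sqlite3 module: the table and column names come from this post, but the column types, the primary key on tag_id, the composite key on link_table, and the helper name create_database are assumptions made for illustration, not the actual definitions used for Database V1.0.

```python
import sqlite3

# Table and column names follow the post. The column types and all key
# constraints other than paper_id's primary key are assumptions.
SCHEMA = '''
CREATE TABLE paper_table (
    paper_id    INTEGER PRIMARY KEY,
    title       TEXT,
    author      TEXT,
    year        INTEGER,
    abstract    TEXT,
    publication TEXT
);
CREATE TABLE tag_table (
    tag_id   INTEGER PRIMARY KEY,
    tag_name TEXT,
    tag_type TEXT
);
CREATE TABLE link_table (
    paper_id INTEGER REFERENCES paper_table(paper_id),
    tag_id   INTEGER REFERENCES tag_table(tag_id),
    PRIMARY KEY (paper_id, tag_id)
);
'''


def create_database(path=':memory:'):
    # Hypothetical helper: open (or create) the database and apply the schema.
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


conn = create_database()

# List the tables that now exist in the database.
table_names = sorted(
    row[0]
    for row in conn.execute(
        'SELECT name FROM sqlite_master WHERE type = ?', ('table',)
    )
)
print(table_names)
```

Keeping tags in their own table and joining through link_table lets one paper carry many tags and one tag cover many papers, and the tag_type column leaves room to record where a tag came from.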

Paper reading<\/strong><\/h3>\n\n\n\n

The number of papers under the chosen tags (‘+Methods’, ‘Reproducibility’) is 5. <\/p>\n\n\n\n

The number of papers under the chosen tags (‘Reproducibility’, ‘Workflow’) is 7.\n<\/p>\n\n\n\n
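
A count like the ones above could be computed with a query along these lines. This is a sketch under assumptions: it treats "under the chosen tags" as meaning papers that carry every one of the chosen tags, the function name count_papers_with_tags is hypothetical, and the rows inserted below are toy data, not the Galaxy corpus, so the printed number is not one of the counts reported here.

```python
import sqlite3


def count_papers_with_tags(conn, tag_names):
    # Hypothetical helper: count the papers linked to ALL of the given tags.
    wanted = sorted(set(tag_names))
    placeholders = ', '.join('?' for _ in wanted)
    query = f'''
        SELECT COUNT(*) FROM (
            SELECT l.paper_id
            FROM link_table AS l
            JOIN tag_table AS t ON t.tag_id = l.tag_id
            WHERE t.tag_name IN ({placeholders})
            GROUP BY l.paper_id
            HAVING COUNT(DISTINCT t.tag_name) = ?
        )
    '''
    return conn.execute(query, (*wanted, len(wanted))).fetchone()[0]


# Toy database with only the two tables the query needs (made-up rows).
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE tag_table (tag_id INTEGER PRIMARY KEY, tag_name TEXT, tag_type TEXT)'
)
conn.execute('CREATE TABLE link_table (paper_id INTEGER, tag_id INTEGER)')
conn.executemany(
    'INSERT INTO tag_table (tag_id, tag_name) VALUES (?, ?)',
    [(1, 'Reproducibility'), (2, 'Workflow'), (3, '+Methods')],
)
conn.executemany(
    'INSERT INTO link_table (paper_id, tag_id) VALUES (?, ?)',
    [(1, 1), (1, 2), (2, 1), (3, 2), (3, 3)],
)

demo_count = count_papers_with_tags(conn, ['Reproducibility', 'Workflow'])
print(demo_count)
```

If the intended meaning is instead "papers with at least one of the chosen tags", dropping the HAVING clause gives that count.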

I will finish reading these 12 papers next week. \n<\/p>\n","protected":false},"excerpt":{"rendered":"

Happy 4th of July, everyone Main tasks for me in this week is to build an SQLite for future analysis, dive deeper and try to figure out how Galaxy Group collect information and added tags, additionally, continue reading the papers related to \u201cprovenance\u201d, \u201creproducibility\u201d and \u201cworkflow\u201d in the corpus. Where Continue reading Week 7 SQL Database creation<\/span>→<\/span><\/a><\/p>\n","protected":false},"author":124,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[391],"tags":[],"_links":{"self":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/3577"}],"collection":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/users\/124"}],"replies":[{"embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/comments?post=3577"}],"version-history":[{"count":2,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/3577\/revisions"}],"predecessor-version":[{"id":3584,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/posts\/3577\/revisions\/3584"}],"wp:attachment":[{"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/media?parent=3577"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/categories?post=3577"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/notebooks.dataone.org\/wp-json\/wp\/v2\/tags?post=3577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}