{"id":3607,"date":"2019-07-21T02:23:02","date_gmt":"2019-07-21T02:23:02","guid":{"rendered":"https:\/\/notebooks.dataone.org\/?p=3607"},"modified":"2019-07-21T14:35:03","modified_gmt":"2019-07-21T14:35:03","slug":"the-school-of-hard-knocks","status":"publish","type":"post","link":"https:\/\/notebooks.dataone.org\/networked-lod\/the-school-of-hard-knocks\/","title":{"rendered":"Week 7: The School of Hard Knocks"},"content":{"rendered":"\n

Sometimes you just have to learn things the hard way.<\/p>\n\n\n\n

On Monday at 2:37pm I started to run my spiffy Make-a-Network code on the big table of all the datasets currently stored in the DataONE archives. In general, this code takes as input a table of unique dataset-person pairs, where the “person” could be anybody – a creator, a contributor, or a user who downloads a dataset. The Big Table of all the DataONE datasets contains 1,295,315 rows of unique dataset-person combinations. …and already in the ether I hear the gnashing of teeth, as those familiar with R and big data worry tremendously about how this will turn out. I was young. I was naive.<\/p>\n\n\n\n

My spiffy Make-a-Network code turns a table of unique dataset-person pairs into an edge list by identifying all the unique persons in the table, then connecting pairwise all the datasets associated with that unique person. This means that for any given person, if n<\/em> datasets are associated with that person, then my code creates n<\/em> choose 2 dataset pairs, also known as edges.<\/p>\n\n\n\n
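The pairing step described above can be sketched in a few lines. This is a minimal illustration in Python, not the actual Make-a-Network code (which was written in R and is not shown in this post). The function names, the toy table, and the choice to keep one edge per shared person are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def group_by_person(pairs):
    # Collect the datasets attached to each person; sets drop duplicate rows.
    datasets_by_person = defaultdict(set)
    for dataset, person in pairs:
        datasets_by_person[person].add(dataset)
    return datasets_by_person


def make_edge_list(pairs):
    # For each person, connect every pair of that person's datasets.
    # Assumption: one edge is kept per shared person, so two datasets
    # that share two people appear twice in the result.
    edges = []
    for person, datasets in group_by_person(pairs).items():
        for a, b in combinations(sorted(datasets), 2):
            edges.append((a, b, person))
    return edges


def count_edges(pairs):
    # Sum of (n choose 2) over all persons, without building any edges.
    return sum(comb(len(d), 2) for d in group_by_person(pairs).values())


# Hypothetical toy table of (dataset, person) rows.
toy_pairs = [
    ('d1', 'alice'), ('d2', 'alice'), ('d3', 'alice'),
    ('d2', 'bob'), ('d3', 'bob'),
    ('d4', 'carol'),
]

edges = make_edge_list(toy_pairs)
print(len(edges), count_edges(toy_pairs))
```

Counting before building is the cheap part: count_edges only needs the number of datasets per person, so it can report how large the edge list will be before any memory is spent on the edges themselves.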

n<\/em> choose k<\/em> is the number of unordered selections of k<\/em> objects that can be made without replacement from a given set of n<\/em> objects. 52 choose 5 is the total number of 5-card poker hands that can be made from a standard deck of cards. If k<\/em> = 2, then n<\/em> choose k<\/em> is the number of pairs that can be made from a set of n<\/em> objects, which works out to n times (n − 1), divided by 2. And for those of you not into combinatorics, that means the number of pairs grows roughly with the square of the number of objects, so it gets big really fast. I’ll spare you the mathematical details (you can find them in the Wikipedia entry<\/a>), but here are some examples:<\/p>\n\n\n\n