Public Information for Job 228

Created By: demo
Created At: Tue, 31 Mar 2020 12:29:12 -0500

Input Dataset: 2020 Mar/COVID-19

Last Submitted At: Tue, 31 Mar 2020 12:29:12 -0500
Last Finished At: Tue, 31 Mar 2020 12:30:59 -0500 (1m 47s)

Source Code

# What 100 words appear in the most number of paper bodies?
p: Paper = input;
o: output top(100) of string weight int;
stopwords: set of string = stop_words();
# store all words used in this paper's body
bodyWords: set of string;
foreach (i: int; def(p.body_text[i])) {
    foreach (m: int; def(p.body_text[i].body[m])) {
        paragraphWords: array of string = splitall(lowercase(p.body_text[i].body[m].text), " ");
        foreach (j: int; !contains(stopwords, paragraphWords[j]))
            add(bodyWords, paragraphWords[j]);
    }
}
words := values(bodyWords);
foreach (k: int; def(words[k]))
    o << words[k] weight 1;

Output

Job Output Size: 2.13k

o[] = 1, 25916.0
o[] = 2, 25284.0
o[] = however,, 24724.0
o[] = used, 24644.0
o[] = one, 24403.0
o[] = may, 24099.0
o[] = two, 23473.0
o[] = 3, 23170.0
o[] = results, 22348.0
o[] = study, 22137.0
o[] = different, 22094.0
o[] = including, 21792.0
o[] = high, 21749.0
o[] = could, 21669.0
o[] = first, 21567.0
o[] = found, 21424.0
o[] = number, 21290.0
o[] = studies, 21132.0
o[] = virus, 21101.0
o[] = well, 21053.0
o[] = 5, 21051.0
o[] = data, 20978.0
o[] = reported, 20902.0
o[] = similar, 20873.0
o[] = 4, 20832.0
o[] = although, 20728.0
o[] = infection, 20427.0
o[] = important, 20377.0
o[] = 10, 20335.0
o[] = three, 20216.0
o[] = based, 19837.0
o[] = several, 19812.0
o[] = within, 19766.0
o[] = disease, 19755.0
o[] = )., 19749.0
o[] = human, 19699.0
o[] = control, 19616.0
o[] = compared, 19387.0
o[] = due, 19290.0
o[] = time, 19191.0
o[] = analysis, 19184.0
o[] = use, 19160.0
o[] = specific, 19102.0
o[] = many, 18849.0
o[] = described, 18844.0
o[] = viral, 18831.0
o[] = associated, 18735.0
o[] = significant, 18474.0
o[] = observed, 18328.0
o[] = following, 18296.0
o[] = present, 18284.0
o[] = respiratory, 18280.0
o[] = potential, 18188.0
o[] = higher, 17976.0
o[] = 6, 17927.0
o[] = showed, 17891.0
o[] = among, 17872.0
o[] = new, 17768.0
o[] = (, 17652.0
o[] = performed, 17647.0
o[] = increased, 17515.0
o[] = without, 17328.0
o[] = less, 17284.0
o[] = low, 17117.0
o[] = known, 16952.0
o[] = identified, 16913.0
o[] = system, 16902.0
o[] = table, 16901.0
o[] = possible, 16896.0
o[] = severe, 16879.0
o[] = since, 16759.0
o[] = total, 16682.0
o[] = role, 16606.0
o[] = would, 16581.0
o[] = cell, 16563.0
o[] = clinical, 16470.0
o[] = acute, 16468.0
o[] = obtained, 16421.0
o[] = presence, 16223.0
o[] = major, 16220.0
o[] = lower, 16134.0
o[] = considered, 16069.0
o[] = infected, 15974.0
o[] = increase, 15898.0
o[] = either, 15888.0
o[] = even, 15881.0
o[] = recent, 15852.0
o[] = cells, 15846.0
o[] = positive, 15768.0
o[] = available, 15660.0
o[] = small, 15653.0
o[] = likely, 15621.0
o[] = infectious, 15587.0
o[] = another, 15582.0
o[] = highly, 15456.0
o[] = ), 15448.0
o[] = common, 15362.0
o[] = level, 15350.0
o[] = previous, 15307.0
o[] = viruses, 15289.0

Compilation

Status: Finished
Started: Tue, 31 Mar 2020 12:29:13 -0500
Finished: Tue, 31 Mar 2020 12:29:23 -0500 (10s)

Execution

Status: Finished
Started: Tue, 31 Mar 2020 12:29:28 -0500
Finished: Tue, 31 Mar 2020 12:30:59 -0500 (1m 31s)