Russo, Pedro; Prada, Cesar (2016), Abstracts, Dataset, https://doi.org/10.15146/R3TP4D


The large volume of academic literature poses a considerable challenge for researchers aiming to establish a thorough understanding of a given subject. In this project, we collected abstracts of studies pertaining to Juvenile Idiopathic Arthritis from the PubMed database and aimed to establish a method for finding clusters of closely related studies.


This dataset was collected using the esearch utility of the NCBI E-utilities API. It was processed by selecting the relevant fields (title, authors and abstract) and writing them to a single large text file (.txt) using a custom Bash script.
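
As a rough illustration of the collection step, the sketch below builds E-utilities request URLs and formats one record. It is written in Python rather than Bash and is not the authors' script. The search term, the JSON/XML return modes, the use of efetch to retrieve the records, and the helper names `esearch_url`, `efetch_url` and `format_record` are assumptions made for this example, as is the "Title/Authors/Abstract" layout of a record.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(term, retmax=100, retstart=0):
    """Build an esearch URL listing PubMed IDs that match `term`."""
    params = {
        "db": "pubmed",
        "term": term,
        "retmax": retmax,      # number of IDs per page
        "retstart": retstart,  # offset of the first ID returned
        "retmode": "json",
    }
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def efetch_url(pmids):
    """Build an efetch URL retrieving the full records for a list of PubMed IDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EUTILS_BASE}/efetch.fcgi?{urlencode(params)}"


def format_record(title, authors, abstract):
    """Render one study as a text entry (hypothetical layout, not the dataset's)."""
    return f"Title: {title}\nAuthors: {', '.join(authors)}\nAbstract: {abstract}\n"


if __name__ == "__main__":
    # Assumed search term; the exact query used for the dataset is not stated.
    print(esearch_url("Juvenile Idiopathic Arthritis", retmax=20))
```

Fetching these URLs (for example with `urllib.request.urlopen`) and extracting the three fields from the returned XML would complete the pipeline; those steps are left out here because they depend on details the description does not give.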

Usage Notes

The scripts used to obtain the dataset, split the large file into separate files, and process and analyze the results are available at: https://github.com/pedrostr... .