NPM Datasets for Open Science

These datasets contain complete dependency and code data collected from the NPM package registry. They are designed to support easy data analysis of the NPM ecosystem.

Quick Start

  1. Download the most recent PostgreSQL database snapshot from the downloads page.

  2. Import the snapshot into your local PostgreSQL database.

  3. Have fun writing queries!

See the setup and documentation pages for full details of how to import and use the data.
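
As a rough illustration of step 2, the sketch below builds the command line for loading a snapshot with the standard PostgreSQL client tools. The file names, the database name `npm_data`, and the choice between `psql` and `pg_restore` based on the file extension are all assumptions made for this example; the setup page remains the authoritative source for the actual snapshot format and import procedure.

```python
import subprocess
from pathlib import Path


def build_import_command(snapshot: str, dbname: str) -> list[str]:
    """Return the command line that would load `snapshot` into `dbname`.

    Assumption: a `.sql` file is a plain-text dump, anything else is a
    pg_dump archive. Check the setup page for the real snapshot format.
    """
    path = Path(snapshot)
    if path.suffix == ".sql":
        # Plain-text SQL dumps are replayed with psql.
        return ["psql", "-d", dbname, "-f", str(path)]
    # Custom or tar format archives are loaded with pg_restore.
    return ["pg_restore", "--no-owner", "-d", dbname, str(path)]


def import_snapshot(snapshot: str, dbname: str) -> None:
    """Create an empty database, then load the snapshot into it."""
    subprocess.run(["createdb", dbname], check=True)
    subprocess.run(build_import_command(snapshot, dbname), check=True)
```

Calling `import_snapshot("npm_snapshot.dump", "npm_data")` (both names hypothetical) would run `createdb` followed by `pg_restore`, provided the PostgreSQL client tools are on your `PATH` and a local server is running.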

Citing Us

If you make use of this work, please cite us as follows:

@inproceedings{npmdata,
author = {Pinckney, Donald and Cassano, Federico and Guha, Arjun and Bell, Jonathan},
title = {A Large Scale Analysis of Semantic Versioning in NPM},
series = {MSR},
booktitle = {Proceedings of the 20th International Conference on Mining Software Repositories},
year = {2023},
note = {Acceptance rate: 36\%.},
url = {https://www.jonbell.net/preprint/msr23-npm.pdf},
artifact = {https://doi.org/10.5281/zenodo.7552551}
}

Contacting Us

If you run into problems using the datasets, have any other questions, or generally want to connect, we'd be happy to hear from you! Please feel free to email us at donald_pinckney@icloud.com or submit an issue on GitHub.