Methodology

The visualisations on this website draw on 25 million medical research papers published in more than 7,500 medical journals over a period of two decades. We used an OpenAlex snapshot from August 2022 and kept only the journals that SCImago identifies as medical.
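As an illustration of the journal filter, the sketch below keeps only papers whose journal ISSN appears in a list of medical journals. It is a minimal sketch and not the pipeline used for this website: the record layout (a dictionary with `id` and `issn` keys) and the function names are assumptions made for the example, and real OpenAlex and SCImago exports have their own field names.

```python
def normalise_issn(issn: str) -> str:
    """Strip hyphens and whitespace so '0140-6736' and '01406736' compare equal."""
    return issn.replace("-", "").strip().upper()


def keep_medical(works, medical_issns):
    """Return only the works whose journal ISSN appears in the medical list.

    `works` is a list of dicts with a hypothetical 'issn' key;
    `medical_issns` is an iterable of ISSN strings in either format.
    """
    wanted = {normalise_issn(issn) for issn in medical_issns}
    return [
        work for work in works
        if work.get("issn") and normalise_issn(work["issn"]) in wanted
    ]


# Toy records: one journal on the medical list, one not on it, one without an ISSN.
works = [
    {"id": "W1", "issn": "0140-6736"},
    {"id": "W2", "issn": "1234-5678"},
    {"id": "W3", "issn": None},
]
medical = keep_medical(works, ["01406736"])
print([work["id"] for work in medical])  # prints ['W1']
```

Normalising both sides before comparing avoids losing journals simply because one source writes ISSNs with a hyphen and the other without.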

We inferred author gender using several name-to-gender APIs, including Gender-API and Genderize, that have shown state-of-the-art performance on non-English names in validation studies. Matching used each author's first name and affiliation country, and 84% of entries were matched.
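To make the matching step concrete, the sketch below builds a Genderize query from a first name and a country code, then decides whether to accept the returned prediction. It is a hedged example rather than the method used here: the function names and the 0.9 probability cut-off are assumptions for illustration, and the sample response only mirrors the general shape of a Genderize reply (`gender`, `probability`, `count`). No network request is made.

```python
from urllib.parse import urlencode

GENDERIZE_ENDPOINT = "https://api.genderize.io"


def build_query(first_name, country_code=None):
    """Build a Genderize request URL from a first name and optional ISO country code."""
    params = {"name": first_name}
    if country_code:
        params["country_id"] = country_code
    return f"{GENDERIZE_ENDPOINT}?{urlencode(params)}"


def accept_match(response, min_probability=0.9):
    """Return the predicted gender, or None if the entry should stay unmatched.

    The 0.9 threshold is an assumed value for this example only.
    """
    gender = response.get("gender")
    probability = response.get("probability") or 0.0
    if gender is None or probability < min_probability:
        return None
    return gender


print(build_query("Andrea", "IT"))
# prints https://api.genderize.io?name=Andrea&country_id=IT

# Hand-written sample replies, not real API output.
confident = {"name": "Andrea", "gender": "male", "probability": 0.99, "count": 1000}
unknown = {"name": "Xq", "gender": None, "probability": 0.0, "count": 0}
print(accept_match(confident), accept_match(unknown))  # prints male None
```

Passing the affiliation country matters for names whose typical gender differs between countries, which is why the query carries both fields.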

Author affiliations were identified and geocoded using OpenAlex's identification of institutions among the more than 100,000 research producers in the Research Organization Registry (ROR). For affiliations with no match, we geocoded the raw affiliation strings using a custom Nominatim API. Geographic locations were matched for 88% of author instances.
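The two-stage lookup described above can be sketched as follows: use the coordinates of the matched registry entry when there is one, and otherwise fall back to geocoding the raw affiliation string. This is an illustrative sketch under stated assumptions, not the actual pipeline: the record keys (`ror`, `raw_affiliation`), the function names, the placeholder identifier and the coordinates are all invented for the example, and the geocoder is a stub standing in for a Nominatim search call, whose JSON results carry `lat` and `lon` as strings.

```python
def parse_nominatim(results):
    """Turn a Nominatim-style JSON result list into (lat, lon) floats, or None."""
    if not results:
        return None
    first = results[0]
    return float(first["lat"]), float(first["lon"])


def locate(author_instance, ror_coordinates, geocode_raw):
    """Resolve one author instance to coordinates.

    1. Use the registry coordinates if the institution was matched.
    2. Otherwise geocode the raw affiliation string.
    3. Return None if neither route succeeds.
    """
    ror_id = author_instance.get("ror")
    if ror_id in ror_coordinates:
        return ror_coordinates[ror_id]
    raw = author_instance.get("raw_affiliation")
    if raw:
        return geocode_raw(raw)
    return None


# Placeholder data for the example only.
ror_coordinates = {"ror-example-1": (51.5, -0.1)}


def stub_geocoder(raw):
    """Stand-in for a Nominatim request; knows a single made-up string."""
    canned = {"Dept of Medicine, Example City": [{"lat": "48.85", "lon": "2.35"}]}
    return parse_nominatim(canned.get(raw, []))


instances = [
    {"ror": "ror-example-1", "raw_affiliation": "Example University"},
    {"ror": None, "raw_affiliation": "Dept of Medicine, Example City"},
    {"ror": None, "raw_affiliation": "Unknown Clinic"},
    {"ror": None, "raw_affiliation": None},
]
located = [locate(item, ror_coordinates, stub_geocoder) for item in instances]
match_rate = sum(point is not None for point in located) / len(located)
print(located)
print(f"{match_rate:.0%}")  # prints 50%
```

The match rate is counted per author instance rather than per unique institution, which is the unit the 88% figure refers to.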

Enriched metadata was produced using fine-tuned natural language processing models (BERT-Pubmed) for research classification and entity extraction, as described here.

We recognise that these tools are not perfect, but we hope that they provide a ‘big picture’ view of the global research landscape. In the future, better and openly accessible reporting of these key metrics by journals and publishers would greatly promote the drive for equity in science.