The table below summarizes the 33 countries known to have longitudinal linked employer-employee (LEE) datasets. Our data collection was based on input from co-authors and experts familiar with these datasets. Where direct contacts were unavailable, we relied on publicly available comparative studies, research papers, and professional networks.
We include every dataset we identified, even those with highly restricted access. For about half of the 33 countries with large-scale LEE data, the information has been verified by researchers with direct access to the data or by the data owners themselves. The tables summarize the main characteristics of each dataset.
If you find any errors in the tables, please contact István Boza.
The scope of LEE datasets is shaped by two key factors:
Population coverage – Some datasets cover only specific sectors, such as the private sector, while others include all formal employment in a country or region. A common limitation is the exclusion of public sector employees. In addition, administrative datasets capture mainly formal employment, which may represent only part of the total labor force; this is an important consideration in some Latin American countries, where informal employment is widespread. Some datasets go beyond employment records, incorporating information on unemployed individuals and even linking to administrative data on health expenditures and social policies. In the tables, such datasets are classified as covering "all individuals".
Data privacy and sampling – To protect confidentiality and keep data volumes manageable, data providers often release only a sample of an LEE dataset to researchers. Sampling can be applied at different levels, such as individuals, firms, or employment spells, with common sampling rates of 1%, 5%, 20%, and 50%. While sampling makes secure and manageable access possible, it can limit research in some fields, such as network analysis, which depends on observing connected workers: in a 5% random sample of individuals, for example, both members of a given pair of coworkers are retained with a probability of only about 0.25% (0.05 × 0.05).
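To illustrate why the sampling level matters for network analysis, here is a minimal sketch that builds a hypothetical synthetic population (2,000 firms with 10 workers each) and compares how many coworker links survive a 5% individual-level sample versus a 5% firm-level sample. The (worker_id, firm_id) spell format, the function names, and the population are illustrative assumptions, not the sampling scheme of any particular dataset.

```python
import random
from itertools import combinations


def coworker_links(spells):
    """Unordered pairs of workers observed at the same firm.

    `spells` is a list of (worker_id, firm_id) tuples (hypothetical format).
    """
    workers_by_firm = {}
    for worker, firm in spells:
        workers_by_firm.setdefault(firm, set()).add(worker)
    links = set()
    for workers in workers_by_firm.values():
        links.update(combinations(sorted(workers), 2))
    return links


def sample_individuals(spells, rate, rng):
    """Keep every spell of a random subset of workers (individual-level sampling)."""
    kept = {w for w in sorted({w for w, _ in spells}) if rng.random() < rate}
    return [(w, f) for w, f in spells if w in kept]


def sample_firms(spells, rate, rng):
    """Keep every spell at a random subset of firms (firm-level sampling)."""
    kept = {f for f in sorted({f for _, f in spells}) if rng.random() < rate}
    return [(w, f) for w, f in spells if f in kept]


if __name__ == "__main__":
    # Hypothetical population: 2,000 firms x 10 workers, one spell per worker.
    spells = [(firm * 10 + i, firm) for firm in range(2000) for i in range(10)]
    full = coworker_links(spells)
    rng = random.Random(0)
    for name, sampler in [("individual", sample_individuals), ("firm", sample_firms)]:
        sample = sampler(spells, 0.05, rng)
        share = len(coworker_links(sample)) / len(full)
        print(f"{name}-level 5% sample keeps {share:.2%} of coworker links")
```

Firm-level sampling keeps every link inside a sampled firm, so link retention tracks the sampling rate; the trade-off is that workers' moves to unsampled firms are no longer observed.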
The frequency at which employer-employee relationships are recorded varies across datasets. Some datasets provide annual snapshots (employment on a specific reference date), while others record employment monthly or even continuously. This level of detail determines which short-term dynamics and labor market fluctuations researchers can study: an annual snapshot, for instance, misses employment spells that begin and end between two reference dates. In addition, wage data may be recorded for a different period than employment status (for example, annual earnings alongside employment on a single reference date), which should be taken into account when analyzing earnings trends.
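A minimal sketch of how recording frequency changes what is observed: three hypothetical employment spells are converted into an annual snapshot (employment on one reference date per year, assumed here to be 31 December) and into a monthly panel. The spell layout, the reference date, and the function names are illustrative assumptions, not the structure of any particular dataset.

```python
from datetime import date

# Hypothetical spells: (worker_id, firm_id, start_date, end_date), end inclusive.
SPELLS = [
    (1, "A", date(2020, 1, 1), date(2021, 12, 31)),   # long spell, covers both reference dates
    (2, "B", date(2020, 3, 1), date(2020, 6, 30)),    # short spell between reference dates
    (3, "C", date(2020, 11, 15), date(2021, 2, 28)),  # short spell covering one reference date
]


def annual_snapshot(spells, ref_month=12, ref_day=31):
    """Employer-employee links active on the (assumed) reference date of each year."""
    links = set()
    for worker, firm, start, end in spells:
        for year in range(start.year, end.year + 1):
            ref = date(year, ref_month, ref_day)
            if start <= ref <= end:
                links.add((worker, firm, year))
    return links


def monthly_panel(spells):
    """Employer-employee links active at any point in each calendar month."""
    links = set()
    for worker, firm, start, end in spells:
        y, m = start.year, start.month
        while (y, m) <= (end.year, end.month):
            links.add((worker, firm, y, m))
            y, m = (y + 1, 1) if m == 12 else (y, m + 1)
    return links


if __name__ == "__main__":
    snapshot = annual_snapshot(SPELLS)
    monthly = monthly_panel(SPELLS)
    print("Workers in annual snapshot:", sorted({w for w, *_ in snapshot}))
    print("Workers in monthly panel:  ", sorted({w for w, *_ in monthly}))
```

In this example, worker 2's four-month spell falls entirely between two reference dates, so it appears in the monthly panel but not in the annual snapshot.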
The table above does not include details on data access, as policies vary significantly across countries. Even within a single country, access conditions may differ depending on the institution managing the data; the appendix provides an overview of access restrictions. In many cases, access is limited to researchers affiliated with domestic institutions or meeting nationality requirements. Some countries require researchers to work on-site in secure environments, with strict rules on data use and disclosure. Others allow remote access protected by security measures such as multi-factor authentication.
The table below shows which specific topics are available in each LEE dataset. Not every dataset covers every topic, but where a topic is available, it opens up valuable research opportunities.
Currently, the scope of this table is limited to the countries where we were able to directly contact researchers. More detailed, country-specific responses can be found in the last table. We are actively working to expand this information and welcome contributions from experts in countries we have not yet covered in detail.
At present, detailed information is available for 15 countries, while information on the remaining 18 is still being collected. The most up-to-date version of these tables is available on our website, and we welcome feedback and additional input to extend the coverage.