What is LEE(P) data?
In recent decades, one of the most important developments in economic research has been the rise of administrative linked employer-employee data (LEED or LEE data), that is, datasets that record information on individual workers and also link each worker to his or her employer. A look at the programs of major labor economics conferences shows that most new research is not only empirical but increasingly based on employer-employee panels or other administrative datasets. The use of such data has grown considerably over time and continues to expand.
A key advantage of these datasets is that they provide a comprehensive view of both individual labor market outcomes and employer characteristics. They allow researchers to analyze firm-level dynamics and differences across firms in greater detail. While earlier sources, such as plant-level surveys and firm workforce data, have been available for some time, they were often limited in scope. Similarly, longitudinal household surveys have tracked individuals over time, but they did not offer the ability to follow both individuals and firms simultaneously.
When a dataset follows both workers and firms over multiple years—forming a linked employer-employee panel (LEEP)—it opens up new research opportunities. This structure enables the study of worker mobility, the effects of firm-level changes (such as ownership transitions), and the impact of broad policy shifts, providing deeper insights into labor market dynamics.
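To make the panel structure concrete, here is a minimal Python sketch under clearly hypothetical assumptions: each record is one worker-year carrying a persistent worker identifier and an employer identifier (the names `worker_id`, `firm_id`, `year`, the helper functions, and the toy records are all invented for illustration). Because the same worker identifier recurs across years, employer-to-employer moves, and hence hires and separations per firm, can be read off directly. Without persistent identifiers, as in a cross-sectional dataset, this would not be possible.

```python
from collections import defaultdict

# Hypothetical toy LEEP records: (worker_id, firm_id, year).
# Assumes at most one main employer per worker and year.
RECORDS = [
    ("w1", "f1", 2018), ("w1", "f1", 2019), ("w1", "f2", 2020),
    ("w2", "f2", 2018), ("w2", "f2", 2019), ("w2", "f2", 2020),
    ("w3", "f1", 2019), ("w3", "f3", 2020),
]


def job_moves(records):
    """Return employer-to-employer moves as (worker_id, from_firm, to_firm, year).

    Compares each worker's employer across consecutive *observed* years;
    gaps (e.g. spells of unemployment) are not treated specially here.
    """
    spells = defaultdict(list)
    for worker, firm, year in records:
        spells[worker].append((year, firm))

    moves = []
    for worker in sorted(spells):
        history = sorted(spells[worker])  # order each worker's records by year
        for (_, prev_firm), (year, firm) in zip(history, history[1:]):
            if firm != prev_firm:
                moves.append((worker, prev_firm, firm, year))
    return moves


def moves_per_firm(moves):
    """Count hires from other firms (inflows) and separations to other firms (outflows)."""
    inflows, outflows = defaultdict(int), defaultdict(int)
    for _, from_firm, to_firm, _ in moves:
        outflows[from_firm] += 1
        inflows[to_firm] += 1
    return dict(inflows), dict(outflows)


if __name__ == "__main__":
    moves = job_moves(RECORDS)
    print(moves)                  # [('w1', 'f1', 'f2', 2020), ('w3', 'f1', 'f3', 2020)]
    print(moves_per_firm(moves))  # ({'f2': 1, 'f3': 1}, {'f1': 2})
```

Real LEEP files usually contain several employment spells per worker and year, so an actual analysis would first need a rule for choosing a main employer; the one-employer-per-year assumption here is a simplification.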
Sources and types of LEE data
There are two main types of data sources used to create linked employer-employee (LEE) datasets: employer surveys and administrative records. Historically, the earliest LEE datasets were based on wage surveys conducted regularly among employers. These surveys were designed primarily for statistical purposes, providing valuable insights for researchers and policymakers. One of the first examples was France’s 1966 Enquête sur la Structure des Salaires (ESS), and similar datasets emerged across Europe in the late 1970s.
Survey-based data collection offers highly detailed information about employees' working conditions. It can include job-related details such as hours worked, overtime, and bonuses, as well as employees' qualifications, skills, and workplace conditions. Surveys also allow for a broader range of questions, helping to capture economic trends more comprehensively.
However, surveys have limitations. They typically cover only a sample of firms or employees, and they are costly to conduct. They also capture only people employed at the time of data collection, excluding those who are unemployed, self-employed, or inactive. Additionally, survey participation has been declining in recent years, making it harder to collect reliable data.
To overcome these challenges, researchers have increasingly turned to administrative data sources. These datasets are constructed by linking records from tax authorities, pension systems, and other public institutions. While primarily collected for government operations, they have also become a valuable resource for economic research. Early examples of this approach appeared in the United States, where researchers used social security records to study labor market trends. Since then, many countries have developed similar databases.
Administrative data offer several advantages. They are more cost-efficient than surveys since they do not require repeated data collection. They also provide a more complete picture of the labor market, covering both employed and unemployed individuals over time. Additionally, because they come from official records, they are generally considered more reliable than self-reported survey responses.
However, administrative datasets also have limitations. They only include information recorded by public institutions, which means some details—such as certain types of bonuses or overtime pay—may be missing or difficult to separate from other income components. They often lack insights into working conditions, family background, and education history unless such data are also collected by public agencies. Furthermore, they do not cover informal employment, which plays a significant role in some economies.
Two important points should be considered. First, with the widespread adoption of digital technologies, firms increasingly use the same data for both wage surveys and administrative reporting, improving the consistency of survey-based datasets. Second, some countries conduct economy-wide surveys where all firms report workforce data. While technically classified as surveys, these datasets (such as Portugal’s Quadros de Pessoal) share key characteristics with administrative records.
Given these factors, a more useful way to classify LEE datasets is not by their data source but by whether they allow tracking individuals over time. Datasets that enable such tracking are known as longitudinal LEE data or LEE panel data (LEEP), while those that do not link individual records over time are considered cross-sectional LEE data. The latter primarily includes survey-based datasets, whereas longitudinal LEE datasets can be based on both surveys and administrative records.
On this website, we focus on longitudinal linked employer-employee (LEE) data, meaning datasets that track individuals over time. This category includes both administrative datasets and, in some countries, large-scale wage surveys that produce comparable longitudinal data, allowing for similar types of analysis.
We do not cover cross-sectional LEE datasets, which do not track individuals over time. Such datasets are often available alongside administrative LEE data, but in some countries they are the only form of LEE data, for example the Czech Republic and Belgium in Europe and Japan and South Korea in Asia.
Additionally, other data collected over time, including administrative data not directly related to the labor market and repeated survey waves, are often merged with LEEP datasets. This integration significantly expands research opportunities. For example, linking repeated survey waves to LEEP data allows a deeper analysis of wage structures and working conditions.
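As a rough illustration of such a merge, the following Python sketch left-joins one survey wave onto a panel extract. It assumes, hypothetically, that both sources share a persistent worker identifier and a year field; the field names (`worker_id`, `year`, `firm_id`, `weekly_hours`, `shift_work`), the `link_survey` helper, and the toy rows are invented for illustration and do not describe any particular dataset.

```python
# Hypothetical LEEP extract: one row per worker-year (field names are illustrative).
PANEL = [
    {"worker_id": "w1", "year": 2019, "firm_id": "f1"},
    {"worker_id": "w1", "year": 2020, "firm_id": "f2"},
    {"worker_id": "w2", "year": 2020, "firm_id": "f2"},
]

# Hypothetical survey wave covering only a subset of worker-years.
SURVEY_2020 = [
    {"worker_id": "w1", "year": 2020, "weekly_hours": 38, "shift_work": False},
]


def link_survey(panel, survey, keys=("worker_id", "year")):
    """Left-join survey answers onto panel rows by the given key fields.

    Assumes survey keys are unique (a later duplicate would overwrite an earlier one).
    Panel rows without a survey match get every survey field set to None,
    so the linked file stays rectangular.
    """
    survey_fields = sorted({f for row in survey for f in row if f not in keys})
    index = {tuple(row[k] for k in keys): row for row in survey}

    linked = []
    for row in panel:
        match = index.get(tuple(row[k] for k in keys), {})
        merged = dict(row)  # copy so the original panel row is not modified
        for field in survey_fields:
            merged[field] = match.get(field)
        linked.append(merged)
    return linked


if __name__ == "__main__":
    for row in link_survey(PANEL, SURVEY_2020):
        print(row)
```

A left join keeps every panel observation, which preserves the longitudinal structure even though the survey covers only a subset of worker-years; an inner join would instead restrict the analysis to surveyed observations.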