Privacy-Preserving Record Linkage: Past, Present and Yet-to-Come
=============



**Tutorial at [29th International Conference on Extending Database Technology (EDBT 2026)](https://edbticdt2026.github.io/)**


**Location**: Tampere, Finland

**Date**: Wednesday, 25 March 2026, 15:15–17:45, Auditorium A2a

View and/or download the tutorial paper and slides:
1. [Tutorial Paper](https://openproceedings.org/2026/conf/edbt/paper-T2.pdf)
2. [Part 1](../_static/presentations/PPRL_1.pdf)
3. [Part 2](../_static/presentations/PPRL_2.pdf)
4. [Part 3](../_static/presentations/PPRL_3.pdf)
  

View and/or download the tool tutorials: [PPRL-Tutorial](https://github.com/AI-team-UoA/PPRL-tutorial)
<!-- [{bdg-success}`Download slides`] -->



# Presenters


::::{grid}
:gutter: 4

:::{grid-item-card} [Lefteris Stetsikas]()
Research Associate at [University of Athens](https://en.uoa.gr)
:::

:::{grid-item-card} [Dimitrios Karapiperis]()
Senior Researcher at [International Hellenic University](https://www.ihu.gr/)
:::

:::{grid-item-card} [George Papadakis](https://gpapadis.wordpress.com)
Senior Researcher at [University of Athens](https://en.uoa.gr)
{bdg-primary}``Entity Resolution expert``
:::

:::{grid-item-card} [Manolis Koubarakis](https://cgi.di.uoa.gr/~koubarak/)
Professor at [University of Athens](https://en.uoa.gr)
:::

::::


# Abstract
Privacy-preserving record linkage (PPRL) constitutes a critical technique for integrating sensitive data across organizational boundaries without compromising the privacy and confidentiality of personal information. Over the past two decades, PPRL has evolved from simple hash-based exact matching methods to sophisticated approximate matching techniques that address the complex challenges of scalability and linkage quality.

This tutorial provides a comprehensive overview of PPRL, organizing the relevant works in chronological order. We begin with the fundamental challenges that motivated PPRL, i.e., the legal restrictions on data sharing (e.g., GDPR), the need for approximate matching in the presence of data errors, and the scalability requirements of large databases. Next, we turn to the past, discussing the evolution of PPRL from early secure hash encoding techniques to more advanced privacy-preserving methods (e.g., secure multi-party computation and k-anonymity). We then turn to the present, covering the state-of-the-art approaches that address the three main challenges of PPRL: scalability, variety and end-to-end privacy preservation. Looking to the future, we identify critical open challenges and promising research directions. We also discuss adversary models, privacy vulnerabilities, and evaluation frameworks for assessing scalability, linkage quality, and privacy protection. A hands-on section demonstrates the open-source software we have developed in Python for integrating the main PPRL tools. Overall, the tutorial synthesizes theoretical foundations, current methodologies, and future research trajectories, equipping attendees with the knowledge to advance PPRL research and deploy privacy-preserving solutions in practice.

# Programme

- **Introduction and motivation**, including the foundational challenges of PPRL, preliminaries, fundamental assumptions, principles and definitions, and practical applications
- **The Past of PPRL**, focusing on encoding, blocking and matching methods
- **The Present of PPRL**, covering Bloom-filter-based techniques, scalable blocking techniques, filtering and acceleration methods, parallel and distributed PPRL, and matching
- **The Future of PPRL**, covering the integration of meta-blocking techniques, approximate nearest-neighbor search (ANNS) and clustering techniques, along with experimental results
- **Evaluation Methods**, covering adversary models, privacy attacks, privacy measures, benchmark datasets (real and synthetic), and the assessment of linkage quality:
  - Pairs completeness, pairs quality
  - Precision, recall, F-measure
  - Fault-tolerance to data errors
- **Hands-on Session: PPRL tools**, presenting the state-of-the-art open-source PPRL tools:
  - Anonlink
  - PRIMAT
  - Linkja
  - PPRL Toolkit
  - AMPPERE
  - **privJedAI** (our PPRL Python library)
- **Challenges and Final Remarks**, including hardening techniques, best practices for secure deployment, dynamic data and real-time linking, multi-party PPRL scalability, integration with privacy-by-design, deep learning for PPRL, and conclusions
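
As a flavour of the Bloom-filter-based techniques listed in the programme, the following is a minimal Python sketch of bigram-based Bloom filter encoding compared with the Dice coefficient, in the spirit of Schnell et al. (2009). It is an illustration under stated assumptions, not the implementation of any of the tools above: the function names (`bigrams`, `encode`, `dice`), the key `b"shared-secret"`, the filter size and the number of hash functions are all hypothetical choices, and keyed HMAC-SHA256 is used here as a simple stand-in for the hashing scheme of the original paper.

```python
import hashlib
import hmac


def bigrams(value: str) -> set[str]:
    """Split a lower-cased, padded string into its character bigrams."""
    padded = f"_{value.strip().lower()}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def encode(value: str, key: bytes = b"shared-secret",
           size: int = 1000, num_hashes: int = 10) -> set[int]:
    """Return the positions of the 1-bits in the Bloom filter of `value`.

    `key`, `size` and `num_hashes` are illustrative defaults (assumptions).
    """
    positions = set()
    for gram in bigrams(value):
        for k in range(num_hashes):
            # Keyed hash: a party without the key cannot directly
            # recompute the bit positions of a given bigram.
            message = f"{k}:{gram}".encode("utf-8")
            digest = hmac.new(key, message, hashlib.sha256).digest()
            positions.add(int.from_bytes(digest, "big") % size)
    return positions


def dice(a: set[int], b: set[int]) -> float:
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


smith, smyth, jones = encode("Smith"), encode("Smyth"), encode("Jones")
print(f"Smith vs Smyth: {dice(smith, smyth):.2f}")
print(f"Smith vs Jones: {dice(smith, jones):.2f}")
```

Because "Smith" and "Smyth" share most of their bigrams, their encodings overlap far more than those of "Smith" and "Jones", which is what makes approximate matching on encoded values possible. Plain Bloom filter encodings like this one are known to be vulnerable to privacy attacks, which is the motivation for the hardening techniques covered in the final part of the tutorial.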

# References

1. Peter Christen and Vassilios Verykios. 2012. A Tutorial on Privacy-Preserving Record Linkage. In PAKDD.
2. CSIRO’s Data61. 2017. Anonlink Private Record Linkage System. https://github.com/data61/clkhash.
3. Alexandros Karakasidis, Georgia Koloniari, and Vassilios S Verykios. 2015. PRIVATEER: A Private Record Linkage Toolkit. In CAiSE Forum. 197–204.
4. Dimitrios Karapiperis, Aris Gkoulalas-Divanis, and Vassilios S Verykios. 2016. LSHDB: a parallel and distributed engine for record linkage and similarity search. In International Conference on Data Mining Workshops (ICDMW). IEEE, 1–4.
5. Dimitrios Karapiperis, Aris Gkoulalas-Divanis, and Vassilios S Verykios. 2017. Distance-aware encoding of numerical values for privacy-preserving record linkage. In 2017 IEEE 33rd International Conference on Data Engineering (ICDE). IEEE, 135–138.
6. D. Karapiperis and V.S. Verykios. 2015. An LSH-based Blocking Approach with a Homomorphic Matching Technique for Privacy-Preserving Record Linkage. TKDE 27, 4 (2015), 909–921.
7. Ibrahim Lazrig, Toan C Ong, Indrajit Ray, Indrakshi Ray, Xiaoqian Jiang, and Jaideep Vaidya. 2018. Privacy preserving probabilistic record linkage without trusted third party. In Annual Conference on Privacy, Security and Trust (PST). IEEE, 1–10.
8. Y. Malkov and D. Yashunin. 2018. Efficient and robust approximate nearest neighbor search using Hierarchical Navigable Small World graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 42, 4 (2018), 824–836.
9. Thilina Ranbaduge, Peter Christen, and Dinusha Vatsalan. 2014. Tree Based Scalable Indexing for Multi-Party Privacy-Preserving Record Linkage. In Australasian Data Mining. Brisbane.
10. Thilina Ranbaduge, Dinusha Vatsalan, and Peter Christen. 2020. Secure Multiparty Summation Protocols: Are They Secure Enough Under Collusion? Transactions on Data Privacy 13, 1 (2020), 25–60.
11. R. Schnell, T. Bachteler, and J. Reiher. 2009. Privacy-preserving record linkage using Bloom filters. BMC Medical Informatics and Decision Making 9, 1 (2009).
12. Dinusha Vatsalan, Peter Christen, and Erhard Rahm. 2020. Incremental clustering techniques for multi-party Privacy-Preserving Record Linkage. Data & Knowledge Engineering (2020), 101809.
13. Anushka Vidanage, Peter Christen, Thilina Ranbaduge, and Rainer Schnell. 2023. A Vulnerability Assessment Framework for Privacy-preserving Record Linkage. ACM Transactions on Privacy and Security 26, 3 (2023).
14. Wanli Xue, Dinusha Vatsalan, Wen Hu, and Aruna Seneviratne. 2020. Sequence Data Matching and Beyond: New Privacy-Preserving Primitives Based on Bloom Filters. IEEE Transactions on Information Forensics and Security 15 (2020), 2973–2987.
15. Yixiang Yao, Tanmay Ghai, Srivatsan Ravi, and Pedro Szekely. 2021. AMPPERE: A Universal Abstract Machine for Privacy-Preserving Entity Resolution Evaluation. In ACM International Conference on Information and Knowledge Management. 2394–2403.


# Cite Us
```bibtex
@inproceedings{Stetsikas:EDBT26,
  author    = {Stetsikas, Lefteris and Papadakis, George and Karapiperis, Dimitrios and Koubarakis, Manolis},
  title     = {Privacy-Preserving Record Linkage: Past, Present and Yet-to-Come},
  booktitle = {Proceedings of the 29th International Conference on Extending Database Technology (EDBT)},
  year      = {2026},
  pages     = {781--784},
  doi       = {10.48786/edbt.2026.79},
  url       = {https://openproceedings.org/2026/conf/edbt/paper-T2.pdf},
  isbn      = {978-3-98318-104-9},
  issn      = {2367-2005},
  publisher = {OpenProceedings.org},
  address   = {Tampere, Finland}
}
```


# Acknowledgements



::::{grid} 2
:gutter: 4
:align: center

:::{grid-item}
:align: center
```{image} ../imgs/Full_logo_white_vertical.png
:width: 180px
:target: https://recitals-project.eu
:alt: Recitals Logo
```

:::

:::{grid-item}
:align: center
```{image} https://upload.wikimedia.org/wikipedia/commons/thumb/b/b7/Flag_of_Europe.svg/1200px-Flag_of_Europe.svg.png
:width: 180px
:target: https://ec.europa.eu/info/index_en
:alt: EU Flag
```
:::
::::

<div align="center">
  <br>
 <br><br>
 This work was supported by the <a href="https://research-and-innovation.ec.europa.eu/funding/funding-opportunities/funding-programmes-and-open-calls/horizon-europe_en">Horizon Europe</a> project <a href="https://recitals-project.eu">RECITALS</a> (Grant No. 101168490).<br>
</div>
<br>
<br>