New: the PII-Masking-3M Asia-Pacific release is here

Our Datasets

Open-source and enterprise PII masking datasets for training privacy-preserving NLP models. From foundational research data to production-grade European multilingual collections.

New release · Asia-Pacific

The world's largest open PII-masking dataset just got larger

PII-Masking-3M now spans Europe, the Americas, and Asia-Pacific — 3M+ synthetic examples across 30 languages, built in partnership with VNCyberS.

Image: PII-Masking-3M Asia-Pacific coverage map showing South Korea, Japan, China, Taiwan, Thailand, Vietnam, Malaysia, Brunei, the Philippines, and Singapore, with a global coverage inset.

Need custom data?

We can generate bespoke PII datasets tailored to your domain, locale requirements, and entity types. Reach out to discuss your needs.