
Synthetic Personas: A Data Wall Breaker for Japan’s AI

Tags: Synthetic Data, AI Training, Japanese AI, NeMo Data Designer, Nemotron-Personas-Japan, Data Privacy, Sovereign AI
February 19, 2026
Viqus Verdict: 8
Strategic Breakthrough
Media Hype 6/10
Real Impact 8/10

Article Summary

NTT DATA’s recent research targets the ‘data wall’ that hinders the development of culturally grounded language models in Japan. The core challenge is the scarcity of task-specific, Japanese-language training data, compounded by privacy regulation such as Japan’s Act on the Protection of Personal Information (APPI) and the country’s evolving AI governance guidelines. NTT DATA’s approach uses synthetic data built on the Nemotron-Personas-Japan dataset (6 million culturally aligned Japanese personas generated via NeMo Data Designer). The reported result is an accuracy improvement from 15.3% to 79.3%, a gain of 64 percentage points, achieved without exposing sensitive data. The methodology also brings efficiencies: Continued Pre-training (CPT) becomes optional, which reduces compute costs and shortens iteration cycles. The research highlights the potential for ‘sovereign AI’, meaning models grounded in local norms and constraints, in line with Japan’s data governance priorities. NTT DATA is also pioneering ‘data spaces’, collaborative environments for sharing AI-ready synthetic data under shared governance, using federated learning and end-to-end encryption. The work is positioned as more than a technical optimization: it is a foundation for interoperable, privacy-preserving AI systems that address regulatory compliance while upholding data sovereignty.

Key Points

  • Synthetic data built on the Nemotron-Personas-Japan dataset delivered a 64-percentage-point accuracy improvement (15.3% to 79.3%) in Japanese language models.
  • The methodology allows for optional Continued Pre-training (CPT), reducing compute costs and accelerating model development.
  • NTT DATA is pioneering ‘data spaces,’ enabling collaborative sharing of synthetic data under shared governance frameworks, supporting Japan’s sovereign AI vision.
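
As a quick check on the headline figure, the two accuracy values quoted in this article (15.3% and 79.3%) can be turned into an absolute and a relative gain. The sketch below is illustrative arithmetic only; the function name `accuracy_gain` is hypothetical and is not part of NTT DATA's or NVIDIA's tooling.

```python
def accuracy_gain(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute gain in percentage points, relative multiple of baseline)."""
    absolute_points = after_pct - before_pct    # difference in percentage points
    relative_multiple = after_pct / before_pct  # how many times the baseline accuracy
    return absolute_points, relative_multiple


# Figures as reported in the article; nothing else is assumed.
points, multiple = accuracy_gain(15.3, 79.3)
print(f"Absolute gain: {points:.1f} percentage points")
print(f"Relative gain: {multiple:.2f}x the baseline accuracy")
```

Stating the gain in percentage points keeps it distinct from the relative improvement, which is a little over five times the baseline accuracy.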

Why It Matters

This research marks a potential inflection point for Japan’s AI strategy. Data scarcity has long been a bottleneck for building effective, culturally relevant models. NTT DATA’s findings suggest that synthetic data offers a viable way past this barrier, in line with Japan’s broader ambition to use AI for economic growth and innovation while upholding stringent data governance requirements. Rather than an incremental improvement, the work addresses a fundamental constraint, with significant potential implications for Japanese competitiveness in the global AI landscape. The long-term effects could reach AI applications across diverse sectors, from customer support and marketing to more complex domains such as legal and financial services.
