PKI, Cryptography, Research, Audit, Post-Quantum

0.5% in 2012, 0.003% in 2026: what 31 million RSA moduli told us

ZetaCert Research

ZetaCert Research team. Large-scale cryptographic audits of the public PKI: reproducible methodology, published negative results, coordinated disclosure.

May 11, 2026

12 min

In 2012, Heninger et al. published Mining Your Ps and Qs: 0.5% of visible RSA moduli in the Web PKI were factorable by a plain batch GCD. Fourteen years later, we replayed the exercise (broader, deeper, with four detectors) against 44.6 million certificates. The result on scanned RSA moduli: 836 weak keys (moduli of 1024 bits or fewer), or 0.003%. A ~170× improvement on the core crypto.

And yet, 891 vulnerable keys are still sitting in the active TLS scan today. The modern commercial PKI is solid. The long tail is not. Here's how we measured both.

CryptoAudit public dashboard — 58.9M certificates indexed in total, of which 31.1M are RSA-keyed and 27.8M ECC-keyed; 891 weak keys detected live

Between February and April 2026, our Research team ran a cryptanalytic audit of the public PKI. The objective: determine whether the classical cryptanalytic attacks — broken RNG, ECDSA nonce bias, CA mis-issuance, modulus generation defects — still apply to the certificates that modern commercial authorities issue.

Four months later, the verdict comes in two parts. On the portion visible via Certificate Transparency — all those commercial CAs we know: Let's Encrypt, Sectigo, DigiCert, Google Trust Services, Cloudflare, Microsoft Azure, Amazon Trust… — the four detectors return zero positive findings. The crypto is clean. Heninger 2012 → fixed.

On the long tail — embedded devices, appliance management planes, residential CPE, industrial controllers — the story is different. But that's the topic of a full white paper under coordinated embargo until July 25, 2026. This post is about the methodology: what we looked at, how, and why.

The corpus: 44.6 million X.509 certificates

The corpus was built from two structurally distinct sources.

Certificate Transparency: harvesting from ten public logs, among them Google Xenon and Argon, Cloudflare Nimbus, DigiCert Wyvern and Sphinx, Sectigo Elephant and Tiger, and TrustAsia 2026a and 2026b. 33.6 million certificates, exhaustive on the modern commercial estate. CT is the cleanest source for what we're looking for: major browsers reject publicly trusted TLS certificates that lack CT proofs, so in practice almost every certificate issued for the Web PKI is logged, and we get a near-complete, signed view.

Polite TLS probing: 24 IoT/SCADA-relevant ports across the public IPv4 address space, at a rate of 10 to 20 connections per second per port. 11 million additional certificates, targeting the long tail: embedded devices, appliance management planes, industrial controllers, residential CPE. These certificates are typically self-signed or issued by a private CA rather than a public one, so they are invisible to CT. Without the active scan, we'd miss half the story.
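To make the "polite" part concrete, here is a minimal Python sketch of a paced certificate fetch. It is not our scanner: the names Pacer and fetch_leaf_der are hypothetical, the pacing is a simple fixed interval, and real embedded devices often need legacy protocol versions or cipher suites that a default Python TLS context refuses.

```python
import socket
import ssl
import time
from typing import Callable, Optional


class Pacer:
    """Enforces a minimum interval between successive connection attempts."""

    def __init__(self, rate_per_s: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate_per_s
        self.clock = clock
        self.sleep = sleep
        self.next_slot = clock()

    def wait(self) -> None:
        """Block until the next connection slot is available."""
        now = self.clock()
        if now < self.next_slot:
            self.sleep(self.next_slot - now)
            now = self.next_slot
        self.next_slot = now + self.interval


def fetch_leaf_der(host: str, port: int, timeout: float = 5.0) -> Optional[bytes]:
    """Return the DER certificate the server presents, or None on failure.

    Only the handshake is performed: no request is sent, no login is tried.
    """
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    ctx.check_hostname = False        # must be disabled before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE   # self-signed device certificates are expected
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                return tls.getpeercert(binary_form=True)
    except OSError:                   # also covers ssl.SSLError and timeouts
        return None
```

A caller would create one Pacer per port (for example Pacer(10.0) for 10 connections per second) and call wait() before each fetch_leaf_der. The clock and sleep parameters exist only so the pacing can be tested without real delays.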

Deduplicated total for the cryptanalytic analysis: 24.9 million unique RSA moduli, approximately 19.8 million distinct ECC public points. Scanning is continuous: the public dashboard at audit.zetacert.com indexes in real time, and the corpus keeps growing between campaigns.

Deployed key-size distribution vs CNSA 2.0 post-quantum guidance

At the analysis freeze, the RSA distribution is dominated by RSA-2048 (86%), followed by RSA-4096 (13%) and a marginal share of RSA-3072 (0.7%), in line with the CA/B Forum Baseline Requirements, with a tentative migration toward CNSA 2.0 sizes. On the ECC side: 89% P-256, 11% P-384, and a handful of P-521 (44 certificates). And then the outliers: 836 RSA moduli of 1024 bits or fewer, 0.003% of the RSA corpus. That is Heninger 2012's 0.5% divided by ~170.

Ethics note on active TLS probing

The scan limits itself to reading the certificate publicly presented on any TLS connection — the same information any web client retrieves when visiting a site. No authentication attempts, no exploitation, no scraping behind auth. The 10–20 conn/s/port rate is deliberately low to avoid burdening scanned devices.

Methodologically, this matches the approach of Censys, Shodan and Rapid7 Project Sonar — academic and industrial tools publicly used for the past decade.

Four cryptanalytic detectors

Each detector targets a specific class of cryptographic failure. None is an in-house invention: all are canonical constructions from the academic literature. The goal is reproducibility — anyone should be able to replay the pipeline against the same corpus and obtain the same result.

Detector 1 — Batch GCD (Bernstein 2004 / Heninger 2012)

The principle is simple: if two RSA moduli n₁ = p × q₁ and n₂ = p × q₂ share a prime factor p, then gcd(n₁, n₂) = p and both keys are factored.

Batch GCD is the algorithm that computes this operation in O(N log² N) across the entire corpus — not in O(N²) as one might naively fear. Bernstein 2004 published the construction; Heninger et al. deployed it at scale in Mining Your Ps and Qs in 2012 and found that 0.5% of RSA moduli visible in the Web PKI were factorable this way. The root pattern: SOHO routers and embedded devices that seeded their RNG from low-entropy sources at first boot, producing statistically non-independent keys.
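For readers who want to see the mechanics, here is a minimal Python sketch of the product-tree / remainder-tree form of batch GCD. It is a teaching version and not our production pipeline: it keeps everything in memory, uses Python's built-in integers, and assumes the input list has already been deduplicated (a repeated modulus would report itself as its own GCD).

```python
from math import gcd, prod


def product_tree(moduli):
    """Levels of pairwise products; the last level holds the full product."""
    tree = [list(moduli)]
    while len(tree[-1]) > 1:
        level = tree[-1]
        tree.append([prod(level[i:i + 2]) for i in range(0, len(level), 2)])
    return tree


def batch_gcd(moduli):
    """For each n_i, return gcd(n_i, product of all the other moduli)."""
    tree = product_tree(moduli)
    remainders = tree.pop()            # [P], the product of every modulus
    while tree:
        level = tree.pop()
        # Push P down the tree, reducing modulo the square of each node.
        remainders = [remainders[i // 2] % (node * node)
                      for i, node in enumerate(level)]
    # remainders[i] == P mod n_i^2, so remainders[i] // n_i == (P / n_i) mod n_i.
    return [gcd(r // n, n) for r, n in zip(remainders, moduli)]


# Two moduli share the prime 101; the third is unrelated.
n1, n2, n3 = 101 * 103, 101 * 107, 109 * 113
print(batch_gcd([n1, n2, n3]))         # [101, 101, 1]
```

Any result other than 1 is a shared factor: dividing the modulus by it yields the second prime, and the private key follows.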

2026 run: 6.6 million deduplicated RSA moduli, ~7 hours on commodity hardware, 0 shared factors on the CT-visible portion.

Detector 2 — Fermat close-prime sweep

The Fermat algorithm efficiently factors RSA moduli n = p × q when |p − q| is small enough. It targets a different class of defect than batch GCD: not an RNG that shares bits between independent keys, but a biased RNG that draws p and q from the same restricted pool — producing factors close in absolute value but not shared.
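As an illustration, a minimal Python sketch of Fermat's method with a bounded number of steps. We assume here that the "depth k" of a sweep corresponds to the max_steps parameter below, that is, the number of candidate values of a that are tried before giving up; the function name is hypothetical.

```python
from math import isqrt


def fermat_factor(n, max_steps):
    """Try to write odd n as a^2 - b^2; return (p, q) or None after max_steps tries."""
    a = isqrt(n)
    if a * a < n:
        a += 1                          # start at ceil(sqrt(n))
    for _ in range(max_steps):
        b2 = a * a - n
        b = isqrt(b2)
        if b * b == b2:                 # a^2 - n is a perfect square
            return a - b, a + b
        a += 1
    return None


print(fermat_factor(10007 * 10009, 100))   # (10007, 10009): the primes are 2 apart
print(fermat_factor(3 * 10007, 100))       # None: the factors are far apart
```

The cost per modulus is bounded by max_steps big-integer squarings and square roots, which is why a shallow sweep over tens of millions of moduli stays cheap and a deep sweep is reserved for a small, high-risk sub-corpus.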

The most salient instance of this failure mode in the literature: the Pearl-2019 NCC Group report, which factored several hundred Yubico keys sold between 2015 and 2018 by exactly this mechanism — a firmware RNG that drew p and q too close together.

2026 run: at depth k=100 against 25.8 million moduli, ~10 minutes (40,500 moduli/second on 12 workers). Then an additional sweep at depth k=2,000,000 (twenty thousand times deeper) on the actively-scanned IoT sub-corpus — the highest-risk population because embedded firmware has historically had the worst RNGs. 0 close-prime pairs in both passes.

Detector 3 — Hidden Number Problem / lattice reduction (ECDSA)

This detector is more technical. In short: if an ECDSA authority signs with a biased nonce — typically, high-order bits always at zero due to a limited RNG — then the signing private key is recoverable by lattice reduction from enough signatures.

This is the Boneh-Venkatesan / Nguyen-Shparlinski construction, formalised in the 1990s and deployed several times since:

Breitner & Heninger 2019 against the Bitcoin chain: recovery of hundreds of ECDSA secp256k1 keys that had signed transactions;
Feng et al. 2026 against Alipay signatures: recovery on nonce bias;
Several historical Web PKI deployments with ECDSA implementations in non-standard libraries.
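The algebra behind the attack fits in a few lines. The Python sketch below builds the standard hidden-number lattice from signatures with biased nonces and checks that a short vector encoding the private key lies in it. It is a sketch under stated assumptions: the signatures are toy triples with random r values (no elliptic-curve arithmetic), the basis is scaled by n to keep it integral, the function names are hypothetical, and the lattice reduction step itself (LLL or BKZ, for example with a library such as fpylll) is deliberately omitted. In a real attack d is unknown and is read off a short vector of the reduced basis.

```python
import random

# Order of the P-256 group (public constant from the curve specification).
P256_N = 0xFFFFFFFF00000000FFFFFFFFFFFFFFFFBCE6FAADA7179E84F3B9CAC2FC632551


def toy_signatures(d, n, bound, count, rng):
    """ECDSA-shaped triples (r, s, h) with nonces below `bound`; no curve math."""
    sigs, nonces = [], []
    for _ in range(count):
        k = rng.randrange(1, bound)     # biased nonce: high-order bits are zero
        r = rng.randrange(1, n)         # stand-in for the x-coordinate of k*G
        h = rng.randrange(1, n)         # stand-in for the message hash
        s = pow(k, -1, n) * (h + r * d) % n
        sigs.append((r, s, h))
        nonces.append(k)
    return sigs, nonces


def hnp_basis(sigs, n, bound):
    """Rows of the (m+2)-dimensional HNP lattice, scaled by n to stay integral."""
    m = len(sigs)
    # From s = k^-1 (h + r*d):  k_i = t_i*d + u_i (mod n)
    t = [pow(s, -1, n) * r % n for r, s, h in sigs]
    u = [pow(s, -1, n) * h % n for r, s, h in sigs]
    rows = []
    for i in range(m):
        row = [0] * (m + 2)
        row[i] = n * n                  # allows reduction of coordinate i modulo n
        rows.append(row)
    rows.append([n * ti for ti in t] + [bound, 0])
    rows.append([n * ui for ui in u] + [0, n * bound])
    return rows, t, u


def secret_vector(rows, t, u, d, n):
    """Integer combination of basis rows equal to (n*k_1, ..., n*k_m, d*B, n*B)."""
    m = len(t)
    vec = [d * a + b for a, b in zip(rows[m], rows[m + 1])]
    for i in range(m):
        c = (t[i] * d + u[i]) // n      # how many multiples of n to subtract
        vec[i] -= c * n * n
    return vec


rng = random.Random(2026)
d = rng.randrange(1, P256_N)
bound = 1 << (256 - 8)                  # nonces with 8 leading zero bits
sigs, nonces = toy_signatures(d, P256_N, bound, 6, rng)
rows, t, u = hnp_basis(sigs, P256_N, bound)
vec = secret_vector(rows, t, u, d, P256_N)
print(all(abs(x) <= P256_N * bound for x in vec))
```

Every coordinate of that vector is at most n times the nonce bound, far shorter than a typical vector of this lattice once the bias and the number of signatures are large enough. That gap is what lattice reduction exploits, and it is also why an unbiased nonce generator leaves nothing to find.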

2026 run in two passes:

Against CT-extracted signatures: 23 groups (issuer, curve), 61 lattice attempts at biases ∈ {2, 4, 8} → 0 recoveries.
Against live signatures extracted from TLS handshakes against populations selected for their volume: 1,436 signatures, same biases → 0 recoveries.

The CAs tested in the CT pass included intermediates from Cloudflare TLS Issuing, Google Trust Services, Microsoft Azure ECC TLS, Amazon Trust, ZeroSSL ECC, Sectigo Public Server Auth, TrustAsia LiteSSL, Certainly and several others. All clean. Fully consistent with CAs operating behind FIPS-validated HSMs, with deterministic (RFC 6979) or hardware nonce generation.

Detector 4 — CA mis-issuance

Strict regex search for publicly-issued certificates whose subject Common Name points to a private namespace: RFC 1918 (10.x.x.x, 172.16-31.x.x, 192.168.x.x), .local, .internal, .lan, .home, .corp, .intranet, localhost. Designed to surface violations of the CA/B Forum Baseline Requirements §7.1.4.2.1 — a public CA must never issue a certificate for a non-publicly-routable namespace.
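A minimal Python sketch of such a filter, assuming the Common Name has already been extracted as a string. The suffix list and the RFC 1918 ranges come from the description above; the function name is hypothetical, IP literals are parsed with the standard ipaddress module instead of a regex, and a production check would apply the same test to subjectAltName entries.

```python
import ipaddress
import re

PRIVATE_SUFFIXES = (".local", ".internal", ".lan", ".home", ".corp", ".intranet")
_SUFFIX_RE = re.compile(
    r"(?:" + "|".join(re.escape(s) for s in PRIVATE_SUFFIXES) + r")\.?$",
    re.IGNORECASE,
)
RFC1918 = [ipaddress.ip_network(net)
           for net in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]


def is_private_name(common_name: str) -> bool:
    """True if the name is an RFC 1918 address, localhost, or a private suffix."""
    name = common_name.strip().lower()
    if name.rstrip(".") == "localhost":
        return True
    try:
        addr = ipaddress.ip_address(name)
    except ValueError:
        # Not an IP literal: fall back to the suffix match.
        return bool(_SUFFIX_RE.search(name))
    return any(addr in net for net in RFC1918)


for cn in ("192.168.1.1", "172.32.0.1", "printer.local",
           "ip-10-0-0-1.compute.internal", "example.com"):
    print(cn, is_private_name(cn))
```

A strict match like this flags names such as ip-10-0-0-1.compute.internal purely on their suffix, so every hit still needs a manual look at the issuer and the logging context before it is called mis-issuance.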

2026 run: 44.6 million certificates, 2 false positives (AWS-internal artifacts: ip-10-x-y-z.compute.internal, logged in CT for monitoring purposes), 0 genuine mis-issuance.

The negative result counts as much as the positive

Detector	Population scanned	Outcome
Batch GCD (Bernstein 2004)	6.6M unique RSA moduli	0 shared factors
Fermat (k=100)	25.8M unique moduli	0 close-prime pairs
Fermat (k=2M, IoT sub-corpus)	22,000 of 30,050 moduli	0 close-prime pairs
HNP / lattice (CT sigs)	23 groups, 61 attempts	0 recoveries
HNP / lattice (TLS handshake sigs)	1,436 signatures	0 recoveries
CA mis-issuance	44.6M certificates	2 false positives, 0 genuine

None of these results are boring. They mean something precise and strong: the CT-visible portion of the modern commercial estate correctly implements key generation, nonce generation, and template discipline.

Aggregated corpus findings: 1,158 vulnerable keys broken down across Key Size, Public Exponent, ec_key_reuse and y2038_doomed_validity

But on the long tail, the picture changes. The live dashboard currently aggregates 1,158 vulnerable keys across the full corpus, split into four families: weak key sizes (837), non-standard public exponents (230), ECDSA key reuse (90), and validity hard-coded to the 32-bit time limit (1). None of these families involve modern commercial CAs — they all live in firmware, in factory templates, in undocumented operator architectures.

This is consistent with what we know about modern commercial CAs: generation in FIPS-validated HSMs, deterministic ECDSA nonces (RFC 6979) or hardware-sourced, template reviews via CAB Forum Working Groups, annual WebTrust audits. The 0.5% defect that Heninger measured in 2012 has been fixed.

This is the visible tier — the one that gets audited, and re-audited. Reassuring.

Findings distribution by vendor category — consumer router SOHO, enterprise networking, NAS, firewall, residential gateway, industrial RF / OT, surveillance CCTV...

But the cryptographic risk hasn't vanished. It has migrated to layers that classical cryptanalytic detectors don't see — and that TLS cipher suite scans don't see either. The public dashboard exposes the distribution by vendor category: 287 live findings on consumer SOHO routers, 30 on residential gateways, 16 on enterprise firewalls, and several smaller categories down to industrial CPE and network CCTV. No vendor name is published yet — that's the subject of the full white paper on July 25.

The methodology lesson — an HNP false positive that taught us humility

This section is defensive. It reveals an internal bug in our pipeline. Transparency here is more useful than silence.

Our first run of the HNP detector produced absurd results: claimed recovery of signing keys for CAs known to operate behind FIPS-validated HSMs. Cloudflare TLS Issuing ECC CA 3, several Google Trust Services intermediaries (WE1, WE3, WE5), two Microsoft Azure ECC TLS CAs, ZeroSSL ECC, cPanel ECC, TrustAsia LiteSSL, Certainly Intermediate E1, nazwaSSL DV TLS G2 E29.

Obviously wrong. No chance that all those CAs were simultaneously broken in the same way.

Investigation traced back to a sanity-check expression at line 132 of research/ecdsa_nonce.py:

if k_candidate.bit_length() > n.bit_length() - bias + 2:
    break

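For curve P-256 (n.bit_length() == 256), at bias 2, the condition evaluates to > 256, which is unreachable for any valid k < n. At biases 4 and 8 the threshold (254 and 250 bits) is reachable, but it still sits 2 bits above the nonce length the lattice assumes. At bias 2 the check therefore never broke the loop, and the lattice routine reported every "short" candidate as a recovered key.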

Fix: strict bound n.bit_length() - bias and rejection of x_candidate ∈ {0, n−1}. Re-run → clean negative result.
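The arithmetic of the bug is easy to reproduce. The Python sketch below compares the two thresholds for a 256-bit group order and shows one way the corrected checks could look. The helper names are hypothetical and this is our reading of the fix described above, not the literal patched code.

```python
def buggy_bound(order_bits: int, bias: int) -> int:
    """Threshold of the original check: reject only above this bit length."""
    return order_bits - bias + 2


def fixed_bound(order_bits: int, bias: int) -> int:
    """A nonce with `bias` leading zero bits has at most this many bits."""
    return order_bits - bias


def plausible_nonce(k: int, order_bits: int, bias: int) -> bool:
    """Accept a nonce candidate only if it fits the assumed bias."""
    return k > 0 and k.bit_length() <= fixed_bound(order_bits, bias)


def plausible_key(x: int, n: int) -> bool:
    """Reject the degenerate private-key candidates 0 and n - 1."""
    return x % n not in (0, n - 1)


for bias in (2, 4, 8):
    print(bias, buggy_bound(256, bias), fixed_bound(256, bias))

# A full-length 256-bit candidate slips past the original check at bias 2 ...
k = (1 << 255) | 1
print(k.bit_length() > buggy_bound(256, 2))   # False: the loop does not break
# ... and is rejected once the bound matches the assumed bias.
print(plausible_nonce(k, 256, 2))             # False
```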

20 false-positive rows cleaned from the database.

To anyone running HNP-style detectors at scale: verify your sanity-check bounds against a known-clean CA population before drawing conclusions from a positive hit. At least one previously-published "weak-CA HNP" finding may have used template lattice code with a similar latent bug. This bug is easier to introduce than to detect.

This discipline — verifying the detector against a known-negative population before believing its positives — costs half a day. It saves you from publishing imaginary findings. Universal recommendation.
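In code, that discipline can be as small as the Python sketch below. The function name and the toy detector are hypothetical, and the positive control (a population known to be weak) is an addition of ours for illustration: the text above only requires the known-negative side.

```python
def validate_detector(detector, known_clean, known_weak=()):
    """Run a detector on control populations before trusting it on real data.

    Returns (false_positives, missed): both lists should be empty.
    """
    false_positives = [item for item in known_clean if detector(item)]
    missed = [item for item in known_weak if not detector(item)]
    return false_positives, missed


def undersized(bits):
    """Toy detector: flags RSA key sizes of 1024 bits or fewer."""
    return bits <= 1024


def broken(bits):
    """Toy detector with a bug: flags everything."""
    return True


print(validate_detector(undersized, [2048, 3072, 4096], [512, 1024]))
print(validate_detector(broken, [2048, 3072, 4096], [512, 1024]))
```

A non-empty first list on a population that should be clean is the signal to debug the detector, not to draft a disclosure.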

Coordinated disclosure — the live picture

Documenting a flaw without contacting the vendor isn't research, it's theater. The entire chain of findings is in coordinated disclosure with the affected actors, with a 90-day grace period per batch. The public dashboard shows the state in real time:

Coordinated disclosure pipeline — 1,092 pending, 63 under grace, 0 grace elapsed, 0 remediated. Lifecycle of vulnerable certificates: 894 crypto-valid, 261 expired, 0 reused, 0 revoked

At the time of writing: 63 disclosures under grace, 1,092 disclosures drafted but not yet sent (orphans of additional findings on already-notified vendors), 0 grace periods elapsed, 0 fixes confirmed so far. The lifecycle of affected certificates is measured passively: no active probing during the grace window, to avoid signalling to an attacker that the finding exists.

It's an ethical discipline: a finding published without prior disclosure is a gift to attackers. And that discipline defines the white paper's timeline — publication happens once all grace periods have expired.

What's coming on July 25

The full white paper documents:

The corpus in detail (individual CT logs, geographic distribution of IPv4 scans, deduplication methodology)
The methodology above, more formally (correctness proofs of the detectors, complexity, performance)
The negative results above, in full table form
The migrated failure modes — where the cryptographic risk has actually moved in the 2026 ecosystem
The vendor response spectrum observed during coordinated disclosure with the affected manufacturers and operators
A concrete audit checklist for CISOs and procurement teams

The document is under coordinated embargo until July 25, 2026 — the date on which disclosure grace periods expire across every notification batch we've sent.

In the meantime — two useful things

1. Audit your own certificates, for free

audit.zetacert.com runs the exact same detector chain against an individual certificate or a hostname. No account, no data retention. If you want to check your own exposure to the failure modes we'll document, here's the tool:

→ audit.zetacert.com

2. Get notified at publication

Our white paper page accepts subscriptions for publication notifications. No other communication until then — a single email with the download link on July 25.

→ zetacert.com/en/whitepapers/state-of-public-pki-2026

ZetaCert Research conducts cryptographic audits of the public PKI at corpus scale. Reproducible methodology, published negative results, coordinated disclosure. More on zetacert.com/research or by email at research@zetacert.com.

