Research disclosure artifact

Bicharacter Grapheme-Cluster Rendering Exhaustion

This bug was discovered by Kai Parsons while experimenting with the limits of UTF-8 text handling. It is shared for research purposes to help safeguard applications that process complex character clusters and to support practical strategies for detecting, constraining, and safely rendering inputs that can otherwise make software unresponsive.

Payload Access

The page does not render the payloads. Each button fetches Base64 chunks, decodes them in memory, and writes the decoded bytes to the clipboard as text/plain.

Payload Light

Smaller sequence intended for controlled testing where payload size limits may be present.

Fetches chunked Base64; decoded on click.

Payload Heavy

Larger sequence intended for isolated lab environments and deeper stress testing.

Fetches chunked Base64; decoded on click.

Payload chunks are loaded only when a copy button is clicked.

Research use only. Test only in environments you own or where you have explicit authorization. Avoid production systems and shared third-party services.

How To Use

  1. Copy the character sequence using one of the buttons above: Payload Light or Payload Heavy.
  2. Paste into an application or website inside an authorized test environment.

Testing Done

Testing was performed on macOS, iOS, and Android; browser coverage included Chrome and Safari.

  • Pasting the sequence into many websites and web apps resulted in an unresponsive page.
  • Specific tests included Google Docs, ChatGPT, WhatsApp, and Pages.
  • The light payload is within Gmail's size limit, yet attempts to send it ended in failed delivery or a sending error.

Why This Is Occurring

The current working hypothesis is that the sequence creates an unusually expensive grapheme-cluster and text-shaping workload. Modern text systems do not render UTF-8 bytes directly. They first decode Unicode scalar values, segment them into grapheme clusters, apply normalization and bidirectional-text rules where relevant, choose fonts and fallback fonts, and then run shaping engines that determine how the visible glyphs should be composed.
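
The first stages of that pipeline can be illustrated with a minimal Python sketch. It assumes a deliberately simplified segmentation rule (a base character followed by any combining marks forms one cluster); real engines follow the Unicode text segmentation rules, which also cover joiner sequences, Hangul syllables, and other cases. The function names are hypothetical and the input is a harmless two-mark example, not the payload.

```python
import unicodedata


def is_mark(ch: str) -> bool:
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")


def approximate_clusters(text: str) -> list[str]:
    """Group each base character with the combining marks that follow it.

    Simplification (assumption): this is not full grapheme segmentation,
    only the base-plus-marks case discussed in the text.
    """
    clusters: list[str] = []
    for ch in text:
        if clusters and is_mark(ch):
            clusters[-1] += ch      # attach the mark to the current cluster
        else:
            clusters.append(ch)     # start a new cluster
    return clusters


# 'e' + U+0301 (combining acute) + U+0323 (combining dot below) + 'x'
raw = b"e\xcc\x81\xcc\xa3x"
text = raw.decode("utf-8")          # stage 1: bytes -> Unicode scalar values
clusters = approximate_clusters(text)  # stage 2: scalar values -> clusters

print(len(raw), len(text), len(clusters))  # 6 4 2
```

Six bytes become four code points and only two user-perceived characters, which is why byte length alone says little about how much shaping work an input implies.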

When a sequence that occupies very little visible space is encoded as a dense cluster of combining marks or related code points, the rendering engine may repeatedly evaluate cluster boundaries, glyph substitutions, fallback fonts, hit-testing positions, caret movement, line breaking, and layout invalidation. If these operations are not defensively bounded, a short input can produce disproportionate CPU, memory, or layout work. The result is not necessarily memory corruption; in many applications it appears as algorithmic-complexity exhaustion, where the UI thread is occupied by shaping or layout work and the application becomes unresponsive.
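
One way to see this disproportion before any rendering happens is to measure it in a single linear pass. The sketch below is an assumption-laden illustration, not a description of how any tested application works: `ClusterStats` and `cluster_stats` are hypothetical names, and "cluster" again means a base character plus its trailing combining marks.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class ClusterStats:
    code_points: int       # total Unicode scalar values in the input
    clusters: int          # approximate user-perceived characters
    longest_mark_run: int  # most combining marks attached to one base


def cluster_stats(text: str) -> ClusterStats:
    """Single O(n) pass; never invokes shaping or layout."""
    clusters = 0
    run = 0
    longest = 0
    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            if clusters == 0:
                clusters = 1   # an orphan mark at the start still forms a cluster
            run += 1
            longest = max(longest, run)
        else:
            clusters += 1
            run = 0
    return ClusterStats(len(text), clusters, longest)


# Benign example: five combining acute accents on one letter.
stats = cluster_stats("a" + "\u0301" * 5 + "b")
print(stats)
```

A large `longest_mark_run`, or a high ratio of `code_points` to `clusters`, is a cheap signal that an input deserves extra care before it reaches a shaping engine.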

A robust mitigation is likely to involve layered defenses: limiting pathological cluster length, enforcing maximum combining-mark depth, isolating rendering work from the main UI thread where possible, adding time or complexity budgets to shaping and layout paths, and preserving the original text bytes for storage while using a safe placeholder representation in high-risk rendering contexts.
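
Two of these layers, a cap on combining-mark depth and a placeholder for display, can be sketched as follows. This is a minimal illustration under stated assumptions: the threshold of 8 marks is an arbitrary example value, `safe_display_text` is a hypothetical helper, and the cluster rule is the simplified base-plus-marks model rather than full Unicode segmentation. The original string is never modified, so it can still be stored unchanged.

```python
import unicodedata

MAX_MARKS_PER_CLUSTER = 8   # assumed threshold; tune per application
PLACEHOLDER = "\ufffd"      # U+FFFD REPLACEMENT CHARACTER


def safe_display_text(text: str, max_marks: int = MAX_MARKS_PER_CLUSTER) -> str:
    """Return a copy for rendering in which any cluster carrying more than
    max_marks combining marks is replaced by a single placeholder."""
    out: list[str] = []
    cluster: list[str] = []
    marks = 0

    def flush() -> None:
        nonlocal marks
        if cluster:
            out.append(PLACEHOLDER if marks > max_marks else "".join(cluster))
        cluster.clear()
        marks = 0

    for ch in text:
        if unicodedata.category(ch).startswith("M"):
            marks += 1
            if marks <= max_marks:
                cluster.append(ch)  # stop buffering once the cap is exceeded
        else:
            flush()                 # a new base character closes the previous cluster
            cluster.append(ch)
    flush()
    return "".join(out)


original = "a" + "\u0301" * 3 + "b"          # stored as-is
display = safe_display_text(original, max_marks=2)
print(display == PLACEHOLDER + "b")          # True
```

The pass is linear in the input and runs before shaping, so it bounds the work handed to the renderer. It does not replace time or complexity budgets inside the shaping and layout paths themselves, which would still be needed for inputs this simple rule does not catch.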