regexMisleadingUnicodeCharacters

Reports characters in regex character classes that appear as single visual characters but are made of multiple code points.

✅ This rule is included in the ts logical and logicalStrict presets.

Some characters that appear as a single visual unit are actually composed of multiple Unicode code points. When these appear in regex character classes, each code point (or, without the u or v flag, each UTF-16 code unit) is matched separately, which is typically not the intended behavior.
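The effect is easiest to see at runtime. The following is a minimal sketch, not taken from the rule's implementation, that compares a character class containing 👍 with and without the u flag; the variable names are illustrative only.

```typescript
// 👍 is U+1F44D, stored in UTF-16 as the pair \uD83D \uDC4D.
// 👎 is U+1F44E, stored as \uD83D \uDC4E (same high surrogate).

// Without the u flag the class holds two separate code units,
// so it can only ever match one half of the emoji.
const fullMatchWithoutFlag = /^[👍]$/.test("👍"); // false

// With the u flag the class holds a single code point.
const fullMatchWithFlag = /^[👍]$/u.test("👍"); // true

// Without the u flag the class also matches the shared high
// surrogate of an unrelated emoji.
const matchesOtherEmojiWithoutFlag = /[👍]/.test("👎"); // true
const matchesOtherEmojiWithFlag = /[👍]/u.test("👎"); // false

console.log({
  fullMatchWithoutFlag,
  fullMatchWithFlag,
  matchesOtherEmojiWithoutFlag,
  matchesOtherEmojiWithFlag,
});
```

The third result is the surprising one: a class that looks like it contains only 👍 also reports a match on 👎.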

This rule detects several types of multi-code-point characters in character classes:

Surrogate pairs: Characters like 👍 that require two UTF-16 code units
Combined characters: Base characters with combining marks like Á (A + combining accent)
Emoji with modifiers: Emoji with skin tone modifiers like 👶🏻
Regional indicator symbols: Flag emoji like 🇯🇵 (two regional indicators)
ZWJ sequences: Characters joined with a zero-width joiner like 👨‍👩‍👦
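To make the five categories above concrete, here is a small sketch that lists the code units and code points behind one sample of each. The helper name inspectSequence is hypothetical and is not part of the rule or of any library.

```typescript
// Hypothetical helper: report UTF-16 code units and code points of a string.
function inspectSequence(text: string): { codeUnits: number; codePoints: string[] } {
  return {
    codeUnits: text.length,
    // Array.from iterates a string by code point, not by code unit.
    codePoints: Array.from(text, (ch) =>
      "U+" + (ch.codePointAt(0) ?? 0).toString(16).toUpperCase().padStart(4, "0"),
    ),
  };
}

// 👍: one code point, two code units.
const surrogatePairInfo = inspectSequence("\u{1F44D}");
// A + combining acute accent: two code points.
const combinedInfo = inspectSequence("A\u0301");
// 👶 + light skin tone modifier: two code points, four code units.
const emojiModifierInfo = inspectSequence("\u{1F476}\u{1F3FB}");
// Regional indicators J + P: two code points, four code units.
const regionalIndicatorInfo = inspectSequence("\u{1F1EF}\u{1F1F5}");
// Man, ZWJ, woman, ZWJ, boy: five code points, eight code units.
const zwjSequenceInfo = inspectSequence("\u{1F468}\u200D\u{1F469}\u200D\u{1F466}");

console.log({
  surrogatePairInfo,
  combinedInfo,
  emojiModifierInfo,
  regionalIndicatorInfo,
  zwjSequenceInfo,
});
```

Every sample renders as one visual character, yet only the first is a single code point, and even that one is two code units when the u flag is absent.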

Examples

Incorrect

// Surrogate pair without unicode flag
const pattern = /[👍]/;

// Combined character (A + combining accent U+0301)
const pattern = /[Á]/;

// Emoji with skin tone modifier
const pattern = /[👶🏻]/u;

// Regional indicator symbols (flag)
const pattern = /[🇯🇵]/u;

// ZWJ sequence (family emoji)
const pattern = /[👨‍👩‍👦]/u;

Correct

// Unicode flag handles surrogate pairs correctly
const pattern = /[👍]/u;

// Match outside character class
const pattern = /👍/;

// Use precomposed character (U+00C1)
const pattern = /[Á]/;

// Match emoji sequence outside character class
const pattern = /👶🏻/;

// Use \q{} syntax with v flag for grapheme clusters
const pattern = /[\q{👶🏻}]/v;

// Solo regional indicator is fine
const pattern = /[🇯]/u;
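As a rough runtime check of why the sequence belongs outside the class, the sketch below compares a class containing both regional indicators of 🇯🇵 with a pattern that spells the sequence out. Code points are written as escapes so the structure stays visible; all names are illustrative.

```typescript
const flagJapan = "\u{1F1EF}\u{1F1F5}"; // 🇯🇵 (regional indicators J + P)
const flagJamaica = "\u{1F1EF}\u{1F1F2}"; // 🇯🇲 (regional indicators J + M)
const babyLightSkin = "\u{1F476}\u{1F3FB}"; // 👶🏻

// The class matches exactly one code point, so it never matches the whole flag...
const classMatchesWholeFlag = /^[\u{1F1EF}\u{1F1F5}]$/u.test(flagJapan); // false
// ...and it matches any text containing either indicator, such as another flag.
const classMatchesOtherFlag = /[\u{1F1EF}\u{1F1F5}]/u.test(flagJamaica); // true

// Spelling the sequence out matches the intended flag and nothing else.
const sequenceMatchesJapan = /^(?:\u{1F1EF}\u{1F1F5})$/u.test(flagJapan); // true
const sequenceMatchesJamaica = /^(?:\u{1F1EF}\u{1F1F5})$/u.test(flagJamaica); // false

// A base emoji followed by a class of single-code-point modifiers is also safe.
const babyWithAnySkinTone = /^\u{1F476}[\u{1F3FB}-\u{1F3FF}]$/u.test(babyLightSkin); // true

console.log({
  classMatchesWholeFlag,
  classMatchesOtherFlag,
  sequenceMatchesJapan,
  sequenceMatchesJamaica,
  babyWithAnySkinTone,
});
```

The last pattern works because each skin tone modifier is a single code point, so the class holds only independent members.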

Options

This rule is not configurable.

When Not To Use It

If you intentionally want to match individual code points rather than visual characters, or if your regex pattern specifically needs to match partial Unicode sequences, you might prefer to disable this rule. Some specialized text processing may require matching individual surrogate halves or combining marks.
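One illustration of deliberate code-unit matching is detecting unpaired surrogates in untrusted input. This is a sketch under the assumption that such a check is wanted; the function name hasUnpairedSurrogate is hypothetical, and whether the rule reports this exact pattern is not stated here.

```typescript
// Deliberately written without the u flag so that the pattern
// operates on individual UTF-16 code units.
// Matches a high surrogate not followed by a low surrogate,
// or a low surrogate not preceded by a high surrogate.
const unpairedSurrogate =
  /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/;

// Hypothetical helper: true when the text contains a broken surrogate pair.
function hasUnpairedSurrogate(text: string): boolean {
  return unpairedSurrogate.test(text);
}

console.log(hasUnpairedSurrogate("\u{1F44D}")); // false: well-formed pair
console.log(hasUnpairedSurrogate("\uD83D")); // true: lone high surrogate
console.log(hasUnpairedSurrogate("a\uDC4D")); // true: lone low surrogate
```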

Equivalents in Other Linters

Made with ❤️‍🔥 around the world by the Flint team and contributors.
