
Is it Safe to Compare JavaScript Strings?

Let's compare 2 strings str1 and str2:

javascript
const str1 = 'Hello!';
const str2 = 'Hello!';
str1 === str2; // => true

Because str1 and str2 have the same characters, these strings are equal.

Is it always the case that two strings that look the same are equal? Let's try another example:

javascript
const str1 = 'café';       // é as a single character (U+00E9)
const str2 = 'cafe\u0301'; // e followed by a combining acute accent (U+0301)
str1 === str2; // => false

While str1 and str2 look the same, the comparison str1 === str2 evaluates to false. How's that possible?
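To see what's going on, you can list the code points behind each string. The sketch below writes both strings with escape sequences so the difference survives copy-pasting. It assumes str2 spells é as a plain e followed by the combining acute accent U+0301, and toCodePoints is a hypothetical helper used only for illustration.

```javascript
// Both strings render as "café", but are built from different code points
const str1 = 'caf\u00E9';  // é as a single code point (U+00E9)
const str2 = 'cafe\u0301'; // e (U+0065) followed by combining acute accent (U+0301)

// Hypothetical helper: list a string's code points in "U+XXXX" notation
const toCodePoints = (str) =>
  Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );

str1.length;        // => 4
str2.length;        // => 5
toCodePoints(str1); // => ['U+0063', 'U+0061', 'U+0066', 'U+00E9']
toCodePoints(str2); // => ['U+0063', 'U+0061', 'U+0066', 'U+0065', 'U+0301']
str1 === str2;      // => false
```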

Let's dig into how to compare strings correctly in JavaScript. First, I'm going to introduce two terms: grapheme (a unit of writing) and combining character (a special character that modifies the appearance of a base character).


1. What's a grapheme

Looking at the following string, what can you say about its content?

javascript
const str1 = 'café';

You can easily see that it has 4 letters: lowercase c, lowercase a, lowercase f, and lowercase e with acute.

A character as the user perceives it, a single unit of writing, is called a grapheme. The example string café contains 4 graphemes.

Here's a formal definition of a grapheme:

A grapheme is a minimally distinctive unit of writing in the context of a particular writing system.
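Note that a string's length property doesn't count graphemes: it counts UTF-16 code units. If you need to count graphemes in code, one option is Intl.Segmenter with grapheme granularity, which splits text into grapheme clusters as defined by Unicode (UAX #29). This is a sketch assuming a runtime that supports Intl.Segmenter (modern browsers and recent Node.js versions do); countGraphemes is a hypothetical helper.

```javascript
// Assumes Intl.Segmenter is available in the runtime
const segmenter = new Intl.Segmenter('en', { granularity: 'grapheme' });

// Hypothetical helper: count grapheme clusters instead of UTF-16 code units
function countGraphemes(str) {
  return Array.from(segmenter.segment(str)).length;
}

const precomposed = 'caf\u00E9'; // é as one code point
const decomposed = 'cafe\u0301'; // é as e + combining acute accent

precomposed.length;          // => 4
decomposed.length;           // => 5
countGraphemes(precomposed); // => 4
countGraphemes(decomposed);  // => 4
```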

OK, that's interesting, but how does it relate to comparing strings safely? The catch is that some graphemes can be encoded by different sequences of characters.

In particular, there is a special set of characters, called combining characters, that modify the preceding character to create new graphemes. Let's look at combining characters in detail.

2. What's a combining character

A combining character is a character that applies to the preceding base character to create a grapheme.

Combining characters include accents, diacritics, Hebrew points, Arabic vowel signs, and Indic matras.

A combining character always requires a base character to apply to, so avoid displaying one in isolation.
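In Unicode terms, combining characters belong to the general category Mark, so a regular expression with the Unicode property escape \p{M} (plus the u flag) can tell you whether a string contains any. The hasCombiningMark helper below is hypothetical, and it's only a quick check: a string without combining marks can still have a canonically equivalent spelling that uses them, as the precomposed é does.

```javascript
// Combining characters have the Unicode general category Mark (\p{M})
const combiningMark = /\p{M}/u;

// Hypothetical helper: does the string contain at least one combining mark?
function hasCombiningMark(str) {
  return combiningMark.test(str);
}

hasCombiningMark('cafe\u0301');   // => true  (U+0301 COMBINING ACUTE ACCENT)
hasCombiningMark('\u05D1\u05BC'); // => true  (Hebrew bet + dagesh point)
hasCombiningMark('caf\u00E9');    // => false (precomposed é is a letter, not a mark)
hasCombiningMark('Hello!');       // => false
```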

For example, é is a single grapheme. You can render it by taking a lowercase e (the base character) and combining it with the combining acute accent ◌́ (the combining character): e + ◌́ = é.

javascript
const e1 = 'e\u0301';
e1; // renders as "é"

Here, \u0301 is the Unicode escape sequence of the combining character ◌́ (U+0301 COMBINING ACUTE ACCENT).
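The escape sequence is just one way to write these two code points. As a small sketch, String.fromCodePoint() and plain concatenation produce exactly the same string:

```javascript
// U+0065 LATIN SMALL LETTER E followed by U+0301 COMBINING ACUTE ACCENT
const viaEscape = 'e\u0301';
const viaCodePoints = String.fromCodePoint(0x65, 0x301);
const viaConcat = 'e' + '\u0301';

viaEscape === viaCodePoints; // => true
viaEscape === viaConcat;     // => true
viaEscape.length;            // => 2 (two code units, one grapheme)
```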

Note, however, that the same é can also be represented by a single character, the lowercase e with acute (U+00E9 LATIN SMALL LETTER E WITH ACUTE):

javascript
const e2 = 'é';
e2; // renders as "é"

Even though e1 and e2 render the same grapheme, they are different string values:

javascript
const e1 = 'e\u0301';
const e2 = 'é';
e1 === e2; // => false
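The mismatch isn't limited to ===. In this sketch e2 is written with the escape \u00E9 so the precomposed form is unambiguous; loose equality, Object.is(), and even substring search all treat the two spellings as different:

```javascript
const e1 = 'e\u0301'; // e + combining acute accent
const e2 = '\u00E9';  // precomposed é

e1 == e2;          // => false
Object.is(e1, e2); // => false

// Searching compares code units too, so the precomposed é isn't found in decomposed text
'cafe\u0301'.includes(e2); // => false
'cafe\u0301'.includes(e1); // => true
```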

3. Safe comparison of strings

With a better understanding of graphemes and combining characters, here are a couple of rules for safer string comparison in JavaScript.

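Firstly, you are safe to compare strings that contain only ASCII characters using the regular comparison operators === and ==, or the utility function Object.is(). Being in the Basic Multilingual Plane is not enough on its own: both the precomposed é (U+00E9) and the combining acute accent (U+0301) belong to it.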

javascript
const str1 = 'Hello!';
const str2 = 'Hello!';
str1 === str2; // => true

Both str1 and str2 contain only ASCII characters, so you can safely compare them using the comparison operators.
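If you want to decide at runtime whether direct comparison is safe, one possible approach is to check that a string is pure ASCII. The isAscii helper below is a hypothetical sketch, not a standard API; it relies on the fact that ASCII characters have no decompositions, so normalization leaves ASCII text unchanged.

```javascript
// Hypothetical helper: true if every code unit is in the ASCII range U+0000..U+007F
function isAscii(str) {
  return /^[\x00-\x7F]*$/.test(str);
}

isAscii('Hello!');    // => true
isAscii('caf\u00E9'); // => false

// Normalization leaves ASCII text untouched, so comparing it directly is safe
'Hello!'.normalize('NFC') === 'Hello!'; // => true
'Hello!'.normalize('NFD') === 'Hello!'; // => true
```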

Secondly, if the strings can contain non-ASCII characters, particularly combining characters, then comparing them with ===, == or Object.is() isn't safe. You additionally need to normalize the compared strings.

javascript
const str1 = 'café';
const str2 = 'cafe\u0301'; // same as 'café'
str1 === str2; // => false
str1.normalize() === str2.normalize(); // => true

In simple terms, string normalization gives canonically equivalent strings a single, unique representation. 'café' and 'cafe\u0301' are canonically equivalent because they represent the same graphemes, so both normalize to the same 'café'.
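When called without an argument, normalize() uses the 'NFC' form, which composes characters where possible. What matters for comparison is that both sides use the same form: comparing in 'NFD' (fully decomposed) gives the same verdict. A short sketch:

```javascript
const str1 = 'caf\u00E9';  // precomposed é
const str2 = 'cafe\u0301'; // e + combining acute accent

// Without an argument, normalize() uses 'NFC' (composed form)
str2.normalize() === str2.normalize('NFC'); // => true
str2.normalize() === str1;                  // => true
str2.normalize().length;                    // => 4

// 'NFD' (decomposed form) works too, as long as both sides use it
str1.normalize('NFD') === str2.normalize('NFD'); // => true
str1.normalize('NFD').length;                    // => 5
```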

4. Summary

You're safe to compare strings directly when they contain only ASCII characters.

However, if the strings can contain combining characters, it's safer to first normalize both strings to the same form with the String.prototype.normalize() method, and then compare the normalized strings.
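The rule above fits in a small helper. equalsNormalized is a hypothetical name, sketched under the assumption that canonical equivalence is the notion of "equal" you want; it defaults to 'NFC', matching normalize() called without arguments.

```javascript
// Hypothetical helper: compare two strings after normalizing both to the same form.
// 'NFC' is also the default form of String.prototype.normalize().
function equalsNormalized(a, b, form = 'NFC') {
  return a.normalize(form) === b.normalize(form);
}

equalsNormalized('caf\u00E9', 'cafe\u0301');        // => true
equalsNormalized('caf\u00E9', 'cafe\u0301', 'NFD'); // => true
equalsNormalized('caf\u00E9', 'cafe');              // => false
```

If you compare many strings against the same value, consider normalizing once (for example, when the data enters your application) instead of on every comparison. Also keep in mind that normalize() throws a RangeError for any form other than 'NFC', 'NFD', 'NFKC', or 'NFKD'.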


Quality posts into your inbox

I regularly publish posts containing:

  • Important JavaScript concepts explained in simple words
  • Overview of new JavaScript features
  • How to use TypeScript and typing
  • Software design and good coding practices

Subscribe to my newsletter to get them right into your inbox.

Join 7094 other subscribers.
Dmitri Pavlutin

About Dmitri Pavlutin

Tech writer and coach. My daily routine consists of (but not limited to) drinking coffee, coding, writing, coaching, overcoming boredom 😉.
