The string type is an important component of any programming language. Most of the useful information a user reads from the window of an iOS application is plain text.
To reach more users, an iOS application must be internationalised and support many modern languages. The Unicode standard solves this problem, but it adds complexity when working with strings.
On one hand, the language should strike a good balance between Unicode complexity and performance when processing strings. On the other hand, it should provide developers with comfortable structures to handle strings.
In my opinion, Swift does a great job on both counts.
Fortunately, Swift's string is not a simple sequence of UTF-16 code units, as in JavaScript or Java.
With a plain sequence of UTF-16 code units, Unicode-aware string manipulations are a pain: you might break a surrogate pair or a combining character sequence.
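The following sketch makes the surrogate-pair problem concrete. It assumes Swift 5, where `String` is itself a collection of `Character` (so `.count` replaces `.characters.count`). The sample is an emoji outside the Basic Multilingual Plane, chosen arbitrarily:

```swift
// U+1F600 lies outside the Basic Multilingual Plane, so UTF-16
// stores it as a surrogate pair: two code units for one character.
let smile = "\u{1F600}"  // 😀

let utf16Units = smile.utf16.count  // 2 code units
let characterCount = smile.count    // 1 grapheme

// The first code unit alone is a lone high surrogate,
// which is not a valid Unicode scalar by itself.
let firstUnit = smile.utf16.first!
print(String(firstUnit, radix: 16))      // d83d
print(Unicode.Scalar(firstUnit) == nil)  // true
```

Code that cuts a string at an arbitrary UTF-16 offset can therefore leave half a character behind.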
Swift takes a better approach. The string itself is not a collection. Instead, it provides views over the string content that you apply according to the situation. One particular view, `String.CharacterView`, is fully Unicode-aware.
For `let myStr = "Hello, world"` you can access the following string views:
- `myStr.characters` is `String.CharacterView`. Useful for accessing graphemes, which are visually rendered as a single symbol. This is the most used view.
- `myStr.unicodeScalars` is `String.UnicodeScalarView`. Useful for accessing the Unicode code point numbers as 21-bit integers.
- `myStr.utf16` is `String.UTF16View`. Useful for accessing the code unit values encoded in UTF-16.
- `myStr.utf8` is `String.UTF8View`. Useful for accessing the code unit values encoded in UTF-8.
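As a rough illustration, the sketch below counts the same string through each view. It assumes Swift 5, where `text.count` plays the role of `text.characters.count`. The sample string contains a combining accent only because that makes the counts differ:

```swift
// Swift 5: the string itself acts as the character view.
let text = "cafe\u{301}!"  // "café!" written with a combining acute accent

print(text.count)                // 5 graphemes: c, a, f, é, !
print(text.unicodeScalars.count) // 6 scalars: the accent is its own scalar
print(text.utf16.count)          // 6 UTF-16 code units (all scalars are in the BMP)
print(text.utf8.count)           // 7 UTF-8 code units (U+0301 needs 2 bytes)

// Scalar values in hexadecimal
let hexScalars = text.unicodeScalars.map { String($0.value, radix: 16) }
print(hexScalars) // ["63", "61", "66", "65", "301", "21"]
```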
Most of the time, developers deal with simple string characters without diving into details like encodings or code units.
`CharacterView` works well for most string-related tasks. These include iterating over the characters, counting them, checking whether a substring exists, accessing characters by index, and various manipulations.
Let's see in more detail how these tasks are accomplished in Swift.
1. Character and CharacterView structures
The `String.CharacterView` structure is a view over the string content that is a collection of `Character`.
To access the view from a string, use the `characters` string property:
```swift
let message = "Hello, world"
let characters = message.characters
print(type(of: characters)) // => "CharacterView"
```
`message.characters` returns the `CharacterView` structure.
The character view is a collection of `Character` structures. For example, let's access the first character of the view:
```swift
let message = "Hello, world"
let firstCharacter = message.characters.first!
print(firstCharacter)           // => "H"
print(type(of: firstCharacter)) // => "Character"
let capitalHCharacter: Character = "H"
print(capitalHCharacter == firstCharacter) // => true
```
`message.characters.first` returns an optional that holds the first character, `"H"`. The example force-unwraps it with `!`.
The character instance represents a single symbol, `H`.
In Unicode terms, `H` is LATIN CAPITAL LETTER H, the `U+0048` code point.
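You can check the code point yourself by reading the Unicode scalars a character is made of. This sketch assumes Swift 5, where `Character` has a `unicodeScalars` property. `codePointLabel(_:)` is a hypothetical formatting helper, not a standard library function:

```swift
// Hypothetical helper: formats a scalar as "U+XXXX" (at least four hex digits).
func codePointLabel(_ scalar: Unicode.Scalar) -> String {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    let padding = String(repeating: "0", count: max(0, 4 - hex.count))
    return "U+" + padding + hex
}

let capitalH: Character = "H"
// Character.unicodeScalars lists the scalars that form the grapheme.
let scalars = Array(capitalH.unicodeScalars)

print(scalars.count)              // 1
print(codePointLabel(scalars[0])) // U+0048
```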
Let's go beyond ASCII and see how Swift handles composite symbols. Such characters are rendered as a single visual symbol, but are composed of a sequence of two or more Unicode scalars. Strictly speaking, such characters are called grapheme clusters.
Important: `CharacterView` is a collection of the string's grapheme clusters.
Let's take a closer look at the `ç` grapheme. It may be represented in two ways:
- Using `U+00E7` LATIN SMALL LETTER C WITH CEDILLA: rendered as `ç`
- Or using a combining character sequence: `U+0063` LATIN SMALL LETTER C plus the combining mark `U+0327` COMBINING CEDILLA. The grapheme is composite: `c` + `◌̧` = `ç`
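Swift treats both representations as the same character, because string and character comparison follows Unicode canonical equivalence. The check below assumes Swift 5, where `String` is itself a collection of `Character`:

```swift
let precomposed = "\u{E7}"  // ç as a single scalar, U+00E7
let decomposed = "c\u{327}" // c followed by U+0327 COMBINING CEDILLA

// Both strings hold one grapheme...
print(precomposed.count, decomposed.count) // 1 1
// ...but they are stored as different scalar sequences
print(precomposed.unicodeScalars.count, decomposed.unicodeScalars.count) // 1 2
// Comparison uses canonical equivalence, so they are equal
print(precomposed == decomposed) // true
```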
Let's pick the second option and see how Swift handles it:
```swift
let message = "c\u{0327}a va bien" // => "ça va bien"
let firstCharacter = message.characters.first!
print(firstCharacter) // => "ç"
let combiningCharacter: Character = "c\u{0327}"
print(combiningCharacter == firstCharacter) // => true
```
`firstCharacter` contains a single grapheme, `ç`, that is rendered using two Unicode scalars: `U+0063` and `U+0327`.
The `Character` structure accepts multiple Unicode scalars as long as they form a single grapheme. If you try to put more than one grapheme into a single `Character`, the compiler reports an error:
```swift
let singleGrapheme: Character = "c\u{0327}\u{0301}" // Works
print(singleGrapheme) // => "ḉ"
let multipleGraphemes: Character = "ab" // Error!
```
Even though `singleGrapheme` is composed of 3 Unicode scalars, it forms a single grapheme, `ḉ`.
`multipleGraphemes` tries to create a `Character` from 2 Unicode scalars. These form 2 separate graphemes, `a` and `b`, in a single `Character` structure, which is not allowed.
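The compiler can only reject multi-grapheme `Character` literals. For a string built at runtime, `Character(_:)` stops the program if the string does not hold exactly one grapheme. The sketch below assumes Swift 5 and uses a hypothetical helper, `makeCharacter(_:)`, that checks `count` first and returns an optional instead:

```swift
// Hypothetical helper: nil instead of a crash when the string
// does not contain exactly one grapheme.
func makeCharacter(_ string: String) -> Character? {
    return string.count == 1 ? Character(string) : nil
}

let accented = makeCharacter("c\u{327}\u{301}") // 3 scalars, 1 grapheme
let pair = makeCharacter("ab")                   // 2 graphemes

print(accented?.unicodeScalars.count ?? 0) // 3
print(pair == nil)                         // true
```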
2. Iterating over characters in a string
The `CharacterView` collection conforms to the `Sequence` protocol. This allows you to iterate over the view's characters in a `for-in` loop:
```swift
let weather = "rain"
for char in weather.characters {
    print(char)
}
// => "r"
// => "a"
// => "i"
// => "n"
```
Each character from `weather.characters` is accessed in the `for-in` loop. On every iteration, the `char` variable is assigned a character from the `weather` string: `"r"`, `"a"`, `"i"` and `"n"`.
As an alternative, you can iterate over the characters using the `forEach(_:)` method, passing a closure as the argument:
```swift
let weather = "rain"
weather.characters.forEach { char in
    print(char)
}
// => "r"
// => "a"
// => "i"
// => "n"
```
Iterating with the `forEach(_:)` method is almost the same as with `for-in`, except that you cannot use the `continue` or `break` statements.
To access the position of the current character in the loop, `CharacterView` provides the `enumerated()` method, inherited from `Sequence`. The method returns a sequence of `(index, character)` tuples:
```swift
let weather = "rain"
for (index, char) in weather.characters.enumerated() {
    print("index: \(index), char: \(char)")
}
// => "index: 0, char: r"
// => "index: 1, char: a"
// => "index: 2, char: i"
// => "index: 3, char: n"
```
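On each iteration, `enumerated()` produces an `(index, char)` tuple.
The `index` variable contains the character's position at the current loop step, and the `char` variable contains the character itself. Note that `index` is a plain `Int` counter starting at zero, not a `String.Index`, so it cannot be used to subscript the string.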
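Sometimes you need a real `String.Index` during iteration, for example to slice the string later. One option is to pair the string's `indices` with its characters using `zip`. The sketch below assumes Swift 5, where `String` is iterated directly instead of through `characters`:

```swift
let weather = "rain"
var offsets: [Int] = []
var visited: [Character] = []

// zip pairs each String.Index with the character stored at that index
for (index, char) in zip(weather.indices, weather) {
    // Unlike the Int from enumerated(), `index` can subscript the string
    precondition(weather[index] == char)
    offsets.append(weather.distance(from: weather.startIndex, to: index))
    visited.append(char)
}
print(offsets) // [0, 1, 2, 3]
print(visited) // ["r", "a", "i", "n"]
```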
3. Counting characters
Simply use the `count` property of the `CharacterView` to get the number of characters:
```swift
let weather = "sunny"
print(weather.characters.count) // => 5
```
`weather.characters.count` contains the number of characters in the string.
Each character in the view holds a grapheme. When a combining character, such as a combining mark, is appended to the string, you may find that the `count` property does not increase.
This happens because a combining character does not create a new grapheme in the string. Instead, it modifies the existing base character. Let's see an example:
```swift
var drink = "cafe"
print(drink.characters.count) // => 4
drink += "\u{0301}"
print(drink) // => "café"
print(drink.characters.count) // => 4
```
Initially, `drink` has 4 characters.
When the combining mark `U+0301` COMBINING ACUTE ACCENT is appended to the string, it modifies the previous base character `e` and creates a new grapheme, `é`. The `count` property does not increase, because the number of graphemes stays the same.
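To see where the appended scalar went, compare the character count with the Unicode scalar count before and after the append. The sketch below assumes Swift 5 and uses `drink.count` instead of `drink.characters.count`:

```swift
var drink = "cafe"
let before = (characters: drink.count, scalars: drink.unicodeScalars.count)

drink += "\u{301}" // append U+0301 COMBINING ACUTE ACCENT
let after = (characters: drink.count, scalars: drink.unicodeScalars.count)

print(before) // (characters: 4, scalars: 4)
print(after)  // (characters: 4, scalars: 5)
```

The scalar view grows by one while the character view stays the same, because the new scalar joins the existing `e` grapheme.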
4. Accessing character by index
Swift doesn't know the number of characters in the string view until it actually evaluates the graphemes in it. As a result, there is no subscript that accesses a character directly by an integer index.
Instead, you access characters through a special type, `String.Index`.
If you need the first or last character of the string, the character view has the `first` and `last` properties:
```swift
let season = "summer"
print(season.characters.first!) // => "s"
print(season.characters.last!)  // => "r"
let empty = ""
print(empty.characters.first == nil) // => true
print(empty.characters.last == nil)  // => true
```
Notice that the `first` and `last` properties are of the optional type `Character?`.
For the empty string `empty`, these properties are `nil`.
To get a character at a specific position, you have to use the `String.Index` type, which is actually an alias of `String.CharacterView.Index`. String offers a subscript that accepts a `String.Index` to access the character. It also offers the predefined indexes `myString.startIndex` and `myString.endIndex`.
Using the string index type, let's access the first and last characters:
```swift
let color = "green"
let startIndex = color.startIndex
let beforeEndIndex = color.index(before: color.endIndex)
print(color[startIndex])     // => "g"
print(color[beforeEndIndex]) // => "n"
```
`color.startIndex` is the index of the first character, so `color[startIndex]` evaluates to `g`.
`color.endIndex` indicates the past-the-end position, that is, the position one greater than the last valid subscript argument. To access the last character, you must compute the index right before the string's end index: `color.index(before: color.endIndex)`.
To access a character at a given offset, use the `offsetBy` argument of the `index(theIndex, offsetBy: theOffset)` method:
```swift
let color = "green"
let secondCharIndex = color.index(color.startIndex, offsetBy: 1)
let thirdCharIndex = color.index(color.startIndex, offsetBy: 2)
print(color[secondCharIndex]) // => "r"
print(color[thirdCharIndex])  // => "e"
```
With the `offsetBy` argument, you can access the character at a specific offset.
The offset counts graphemes. In other words, it applies over the `Character` instances of the string's `CharacterView`, not over Unicode scalars or code units.
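To see that offsets count graphemes, the sketch below uses a string whose fourth character is built from two scalars. It assumes Swift 5, where `index(_:offsetBy:)` is called on the string itself:

```swift
let word = "cafe\u{301}s" // "cafés": é is e + U+0301 COMBINING ACUTE ACCENT

let fourth = word.index(word.startIndex, offsetBy: 3)
let fifth = word.index(word.startIndex, offsetBy: 4)

print(word[fourth])                      // é (one Character...)
print(word[fourth].unicodeScalars.count) // 2 (...made of two scalars)
print(word[fifth])                       // s
```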
If the resulting index is out of range, the program stops with a runtime error:
```swift
let color = "green"
let oops = color.index(color.startIndex, offsetBy: 100) // Error!
```
To prevent such situations, pass an additional `limitedBy` argument to limit the offset: `index(theIndex, offsetBy: theOffset, limitedBy: theLimit)`. This method returns an optional, which is `nil` when the offset would move the index beyond the limit:
```swift
let color = "green"
let oops = color.index(color.startIndex, offsetBy: 100, limitedBy: color.endIndex)
if oops != nil {
    print("Correct index")
} else {
    print("Incorrect index")
}
// => "Incorrect index"
```
`oops` is an optional `String.Index?`. Checking the optional verifies whether the index stayed within the string.
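There is one subtlety with `limitedBy: color.endIndex`. An offset that lands exactly on `endIndex` (offset 5 for `"green"`) still returns a non-nil index, yet `endIndex` is not a valid subscript argument. The sketch below assumes Swift 5 and wraps both checks in a hypothetical helper, `character(in:at:)`. It is an illustration, not a standard library API:

```swift
// Hypothetical helper: the character at an integer offset, or nil if out of range.
func character(in string: String, at offset: Int) -> Character? {
    // Negative offsets would walk backwards past startIndex, so reject them early.
    guard offset >= 0,
          let index = string.index(string.startIndex,
                                   offsetBy: offset,
                                   limitedBy: string.endIndex),
          index != string.endIndex // endIndex is past the end, not subscriptable
    else {
        return nil
    }
    return string[index]
}

let color = "green"
print(character(in: color, at: 1) ?? "?")   // r
print(character(in: color, at: 5) ?? "?")   // ? (lands exactly on endIndex)
print(character(in: color, at: 100) ?? "?") // ?
```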
5. Checking substring existence
The simplest way to check whether a substring exists is to call the `contains(_ other: String)` string method. This method comes from Foundation, hence the import:
```swift
import Foundation

let animal = "white rabbit"
print(animal.contains("rabbit")) // => true
print(animal.contains("cat"))    // => false
```
`animal.contains("rabbit")` returns `true` because `animal` contains the substring `"rabbit"`.
Correspondingly, `animal.contains("cat")` evaluates to `false` for a substring that does not exist.
To check whether the string has a specific prefix or suffix, use the `hasPrefix(_:)` and `hasSuffix(_:)` methods. Let's use them in an example:
```swift
import Foundation

let animal = "white rabbit"
print(animal.hasPrefix("white"))  // => true
print(animal.hasSuffix("rabbit")) // => true
```
`"white"` is a prefix and `"rabbit"` is a suffix of `"white rabbit"`, so the corresponding method calls `animal.hasPrefix("white")` and `animal.hasSuffix("rabbit")` return `true`.
When you need to search for a particular character, it makes sense to query the character view directly. For example:
```swift
let animal = "white rabbit"
let aChar: Character = "a"
let bChar: Character = "b"
print(animal.characters.contains(aChar)) // => true
print(animal.characters.contains { $0 == aChar || $0 == bChar }) // => true
```
`contains(_:)` checks whether the character view contains a particular character.
The second form, `contains(where predicate: (Character) -> Bool)`, accepts a closure and returns `true` if any character satisfies it. Here, it checks whether the string contains either `a` or `b`.
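Besides answering yes or no, the same predicate style can locate or count matches. The sketch below assumes Swift 5, where `firstIndex(where:)` is available and `filter` on a `String` returns a `String`. The `Set` of wanted characters is an arbitrary choice:

```swift
let animal = "white rabbit"
let wanted: Set<Character> = ["a", "b"]

// contains(where:) stops at the first character found in the set
let hasAny = animal.contains(where: { wanted.contains($0) })

// firstIndex(where:) returns the String.Index of that first match
let firstMatchIndex = animal.firstIndex(where: { wanted.contains($0) })
let firstMatch = firstMatchIndex.map { animal[$0] } // the Character at that index

// filter keeps every matching character: "a", "b", "b"
let matchCount = animal.filter { wanted.contains($0) }.count

print(hasAny, matchCount) // true 3
```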
6. String manipulation
Strings in Swift are value types. Whenever you pass a string as an argument in a function call, or assign it to a variable or constant, you get a copy of the original string. Swift uses copy-on-write, so the underlying storage is actually duplicated only when one of the copies is modified.
A mutating method call changes the string in place.
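The sketch below shows these value semantics, assuming Swift 5. `shout(_:)` is a hypothetical function used only to show that a callee's changes stay local:

```swift
let original = "pigeon"
var copy = original // `copy` gets its own value
copy += "s"         // mutating the copy...

print(original) // pigeon (...leaves the original untouched)
print(copy)     // pigeons

// Parameters are constants; shadowing one gives a local, mutable copy.
func shout(_ text: String) -> String {
    var text = text
    text += "!"
    return text
}
print(shout(original)) // pigeon!
print(original)        // pigeon
```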
This section covers common string manipulations.
Append a character or another string to a string
The simplest way to append to a string is the `+=` operator. You can append an entire string to the original one:
```swift
var bird = "pigeon"
bird += " sparrow"
print(bird) // => "pigeon sparrow"
```
The String structure provides the mutating method `append()`. It accepts a string or a character, and its `append(contentsOf:)` form accepts a sequence of characters. In each case, the new content is appended to the original string. For instance:
```swift
var bird = "pigeon"
let sChar: Character = "s"
bird.append(sChar)
print(bird) // => "pigeons"
bird.append(" and sparrows")
print(bird) // => "pigeons and sparrows"
bird.append(contentsOf: " fly".characters)
print(bird) // => "pigeons and sparrows fly"
```
Extract a substring from string
The Foundation methods `substring(from:)`, `substring(to:)` and `substring(with:)` allow you to extract substrings:
- from a specific index up to the end of the string,
- from the start up to a specific index,
- or based on a range of indexes.
Let's see how it works:
```swift
import Foundation

let plant = "red flower"
let strIndex = plant.index(plant.startIndex, offsetBy: 4)
print(plant.substring(from: strIndex)) // => "flower"
print(plant.substring(to: strIndex))   // => "red "
if let index = plant.characters.index(of: "f") {
    let flowerRange = index..<plant.endIndex
    print(plant.substring(with: flowerRange)) // => "flower"
}
```
The string subscript accepts a range or a closed range of string indexes. This makes it easy to extract substrings based on ranges of indexes:
```swift
let plant = "green tree"
let excludeFirstRange = plant.index(plant.startIndex, offsetBy: 1)..<plant.endIndex
print(plant[excludeFirstRange]) // => "reen tree"
let lastTwoRange = plant.index(plant.endIndex, offsetBy: -2)..<plant.endIndex
print(plant[lastTwoRange]) // => "ee"
```
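In Swift 4 and later, the `substring(from:)`, `substring(to:)` and `substring(with:)` methods are deprecated in favour of range subscripts, which return a `Substring`. The sketch below assumes Swift 5 and uses one-sided ranges and `suffix(_:)`:

```swift
let plant = "red flower"
let splitIndex = plant.index(plant.startIndex, offsetBy: 4)

// One-sided ranges replace substring(from:) and substring(to:)
let tail: Substring = plant[splitIndex...] // "flower"
let head: Substring = plant[..<splitIndex] // "red "

// suffix(_:) takes a character count instead of an index
let lastTwo = plant.suffix(2) // "er"

// A Substring shares storage with `plant`; convert it for long-term storage
let flower = String(tail)
print(flower, head.count, lastTwo) // flower 4 er
```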
Insert into string
The string type provides the mutating methods `insert(_:at:)` and `insert(contentsOf:at:)`. They insert a character or a sequence of characters, respectively, at a specific index.
The new character or sequence is inserted before the element currently at the specified index.
See the following sample:
```swift
var plant = "green tree"
plant.insert("s", at: plant.endIndex)
print(plant) // => "green trees"
plant.insert(contentsOf: "nice ".characters, at: plant.startIndex)
print(plant) // => "nice green trees"
```
Remove from string
The mutating method `remove(at:)` removes the character at an index:
```swift
var weather = "sunny day"
if let index = weather.characters.index(of: " ") {
    weather.remove(at: index)
    print(weather) // => "sunnyday"
}
```
You can remove the characters within a range of indexes using `removeSubrange(_:)`:
```swift
var weather = "sunny day"
// Offset 5 points at the space, so " day" is removed
let index = weather.index(weather.startIndex, offsetBy: 5)
let range = index..<weather.endIndex
weather.removeSubrange(range)
print(weather) // => "sunny"
```
Replace in string
The `replaceSubrange(_:with:)` method accepts a range of indexes to be replaced with a particular string. The method mutates the string in place.
Let's see a sample:
```swift
var weather = "sunny day"
if let index = weather.characters.index(of: " ") {
    let range = weather.startIndex..<index
    weather.replaceSubrange(range, with: "rainy")
    print(weather) // => "rainy day"
}
```
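In Swift 4.2 and later, the standard library also offers predicate-based removal, and `firstIndex(of:)` replaces `index(of:)`. The sketch below assumes Swift 5 and redoes the removal and replacement from the previous examples with these APIs:

```swift
// Remove every character matching a predicate
var weather = "sunny day"
weather.removeAll(where: { $0 == " " })
print(weather) // sunnyday

// Replace the first word, located with firstIndex(of:)
var forecast = "sunny day"
if let space = forecast.firstIndex(of: " ") {
    forecast.replaceSubrange(forecast.startIndex..<space, with: "rainy")
}
print(forecast) // rainy day
```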
The character view mutation alternative
Many of the string manipulations described above can be applied directly to the string's character view.
This is a good alternative if you find it more comfortable to work directly with a collection of characters.
For example, you can remove the character at a specific index, or directly the first or last character:
```swift
var fruit = "apple"
fruit.characters.remove(at: fruit.startIndex)
print(fruit) // => "pple"
fruit.characters.removeFirst()
print(fruit) // => "ple"
fruit.characters.removeLast()
print(fruit) // => "pl"
```
To reverse a word, use the `reversed()` method of the character view:
```swift
let fruit = "peach"
let reversed = String(fruit.characters.reversed())
print(reversed) // => "hcaep"
```
You can easily filter the string:
```swift
let fruit = "or*an*ge"
let filtered = fruit.characters.filter { char in
    return char != "*"
}
print(String(filtered)) // => "orange"
```
Map the string content by applying a transforming closure:
```swift
let fruit = "or*an*ge"
let mapped = fruit.characters.map { char -> Character in
    if char == "*" {
        return "+"
    }
    return char
}
print(String(mapped)) // => "or+an+ge"
```
Or reduce the string content to an accumulator value:
```swift
let fruit = "or*an*ge"
let numberOfStars = fruit.characters.reduce(0) { countStars, char in
    if char == "*" {
        return countStars + 1
    }
    return countStars
}
print(numberOfStars) // => 2
```
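In Swift 4 and later, the same three operations can be written directly on the string, with no character view in between. The sketch below assumes Swift 5. `reduce(into:)` updates the accumulator in place instead of returning a new one on each step:

```swift
let fruit = "or*an*ge"

// filter on a String returns a String directly
let filtered = fruit.filter { $0 != "*" }

// map returns [Character]; wrap it back into a String
let mapped = String(fruit.map { $0 == "*" ? "+" : $0 })

// reduce(into:) mutates the accumulator instead of returning a new value
let numberOfStars = fruit.reduce(into: 0) { count, char in
    if char == "*" { count += 1 }
}

print(filtered, mapped, numberOfStars) // orange or+an+ge 2
```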
7. Final words
At first sight, the idea of different views over a string's content may seem overcomplicated.
In my opinion, it is a great implementation. Strings can be viewed from different angles: as a collection of graphemes, as UTF-8 or UTF-16 code units, or as plain Unicode scalars.
Just pick the view that fits your task. In most cases, it is `CharacterView`.
The character view deals with graphemes that may be composed of one or more Unicode scalars. As a result, the string cannot be indexed by integers like an array. Instead, a special index type applies: `String.Index`.
The special index type adds a bit of complexity when accessing individual characters or manipulating strings. I'm happy to pay this price, because truly Unicode-aware string operations are awesome!
Do you find string views comfortable to use? Write a comment below and let's discuss!