GSM-7 basics: 160 chars/segment, 7-bit encoding, Latin alphabet
GSM-7 is the default SMS encoding standard, defined in the GSM technical specification (3GPP TS 23.038). It packs characters into 7 bits each, allowing up to 160 characters per segment. The alphabet includes:
- Latin A–Z, a–z, 0–9
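- Space and punctuation: . , ? ! ' " - ( ) & + * / = : ; % # $ _ < > @ £ ¥ § ¿ ¡
- A limited set of accented letters: è é ù ì ò à ä ö ü ñ å ø æ ß, plus the capitals Ç É Ä Ö Ü Ñ Å Ø Æ
- Greek capitals: Δ Φ Γ Λ Ω Π Ψ Σ Θ Ξ
- Extension-table characters, which count as two characters each: [ ] { } ~ | \ ^ €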
This set was designed with Western European languages in mind: English, German, French, Italian, Spanish, Portuguese, Dutch, Swedish, Danish, Norwegian, and Finnish. Coverage of accented letters is partial, though: é, ñ, and ü are included, while á, ã, ç, ê, and many others are not. If your message uses only characters from the basic table (no ã, no curly quotes, no emoji), you get 160 usable characters per segment. smsroute's pricing starts at $0.004 per SMS across 149 countries, and a 160-character GSM-7 message is billed as exactly one segment.
The 7-bit encoding is not arbitrary. Early SMS was bandwidth-constrained; 7-bit packing saved radio spectrum. The limit of 160 comes from the 140-octet (byte) maximum payload: 140 × 8 = 1,120 bits, and 1,120 ÷ 7 = 160 characters. This constraint persists across all modern carriers, even though bandwidth is no longer scarce. It is a protocol fossil.
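The capacity arithmetic can be checked in a few lines. This is a minimal sketch: the constant and function names are illustrative, and the only inputs are the 140-octet payload and the number of bits each encoding spends per character.

```javascript
// Maximum SMS user-data payload in octets (bytes).
const PAYLOAD_OCTETS = 140;

// Number of whole characters that fit in the payload at a given width.
function charsPerSegment(bitsPerChar) {
  return Math.floor((PAYLOAD_OCTETS * 8) / bitsPerChar);
}

console.log(charsPerSegment(7));  // 160 (GSM-7)
console.log(charsPerSegment(16)); // 70 (UCS-2)
```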
UCS-2: triggered by any non-GSM-7 character, 70 chars/segment
UCS-2 (Universal Coded Character Set with 2 octets) is a 16-bit encoding that covers the Unicode Basic Multilingual Plane; in practice, characters outside it, including most emoji, are sent as two 16-bit units. The moment your message contains any character not in the GSM-7 alphabet (a Portuguese ã, a Cyrillic letter, a Chinese character, an emoji, or even a curly apostrophe), the entire message is re-encoded to UCS-2. There is no mixed encoding within a single SMS. The encoding decision is all-or-nothing.
UCS-2 uses 2 bytes (16 bits) per character, so the 140-octet SMS payload holds only 70 characters. For those accustomed to GSM-7's generous 160-character budget, this is a shock: UCS-2 cuts your message capacity by more than half. A 160-character UCS-2 message automatically splits into 3 segments of up to 67 characters each. The cost impact is direct: what you expected to send as one SMS now costs three times as much.
UCS-2 is non-negotiable. You cannot opt out. If you are sending to a Russian number with a Cyrillic name in the body, or a Portuguese message with accented vowels, or an OTP verification code to a user in Japan with locale-specific emoji, you are paying the UCS-2 penalty. This applies even if only 1% of your message uses non-GSM-7 characters. The entire message switches encoding.
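The all-or-nothing rule can be sketched as a membership check. The example below assumes the character tables of 3GPP TS 23.038 (the basic table plus the extension table); `detectEncoding` and the constant names are illustrative and not part of any gateway API.

```javascript
// GSM-7 basic table (one septet each) and extension table (two septets each),
// per 3GPP TS 23.038.
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
const GSM7_EXTENDED = "\f^{}\\[~]|€";
const GSM7_ALL = new Set(GSM7_BASIC + GSM7_EXTENDED);

// Returns "GSM-7" only if every character is in the alphabet; a single
// character outside it switches the whole message to "UCS-2".
function detectEncoding(message) {
  for (const ch of message) {
    if (!GSM7_ALL.has(ch)) return "UCS-2";
  }
  return "GSM-7";
}

console.log(detectEncoding("Your code is 123456"));   // GSM-7
console.log(detectEncoding("Olá, seu código é 123")); // UCS-2 (á and ó)
console.log(detectEncoding("Привет"));                // UCS-2
```

Note that the check works on precomposed characters. Text that stores accents as separate combining marks would need Unicode normalization first.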
Concatenation overhead: 153/67 chars when sending >1 segment
When an SMS exceeds the single-segment limit, the gateway splits the message and adds a concatenation header, carried in the User Data Header (UDH), to every segment. This header is typically 6 octets long, which is equivalent to 7 septets in GSM-7, and includes a reference number and sequence information so the recipient's handset can reassemble the segments in the correct order. The header is invisible to the user but occupies payload space, reducing the usable character count in every segment, including the last.
For GSM-7 concatenated messages, each segment holds only 153 characters instead of 160: (140 − 6) × 8 ÷ 7 = 153, rounded down. For UCS-2, each holds 67 instead of 70. The final segment gets no exemption, because the handset needs the header on every part to reassemble the message. A 2-segment GSM-7 message can therefore hold 2 × 153 = 306 characters, not 320. A 3-segment message yields 3 × 153 = 459, not 480. The overhead per segment is constant, but it adds up across long messages.
This overhead is why counting segments accurately before sending is critical. A naive character counter that ignores concatenation will miscalculate costs. If you send a 161-character GSM-7 message, the gateway must split it into 2 segments. The first segment holds only 153 characters (due to UDH overhead), so the remaining 8 characters spill into the second segment. You are charged for 2 SMS. Your 161-character message costs twice as much as a 160-character message.
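The per-segment capacities follow from subtracting the header from the payload. The sketch below assumes a 6-octet concatenation header (the common form with an 8-bit reference number) present in every segment; the function names are illustrative.

```javascript
const PAYLOAD_OCTETS = 140; // SMS user-data payload
const UDH_OCTETS = 6;       // assumed: concatenation header with 8-bit reference

// Characters available per segment once the header is subtracted.
function concatCharsPerSegment(bitsPerChar) {
  return Math.floor(((PAYLOAD_OCTETS - UDH_OCTETS) * 8) / bitsPerChar);
}

// Total capacity of an n-segment message; every segment carries the header.
function maxChars(bitsPerChar, segments) {
  if (segments === 1) return Math.floor((PAYLOAD_OCTETS * 8) / bitsPerChar);
  return segments * concatCharsPerSegment(bitsPerChar);
}

console.log(concatCharsPerSegment(7));  // 153
console.log(concatCharsPerSegment(16)); // 67
console.log(maxChars(7, 2));            // 306
console.log(maxChars(7, 3));            // 459
console.log(maxChars(16, 2));           // 134
```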
Common surprises: the Euro sign €, the tilde ~, curly quotes, emoji
Certain characters behave unexpectedly. The Euro sign (€) is in GSM-7, but only through the extension table: it is sent as an escape code plus a character, so it counts as two characters. The same is true of the tilde (~), square brackets, curly braces, the backslash, the caret, and the pipe. Curly/smart quotes (“ and ”) are not in GSM-7 at all. Many word processors and mobile keyboards default to smart quotes, which trigger UCS-2. This is the most common source of surprise billing.
A developer implements a notification template: "Your balance is €50. It’s available now." The Euro sign costs two characters but keeps the message in GSM-7; the curly apostrophe in "It’s" forces UCS-2 encoding. If the message is 70 characters or fewer, it still fits in one segment and costs the same as a normal SMS. But if it is 71 characters or more, it splits into 2 segments, costing double. This happens silently unless you test the encoding beforehand.
Emoji are obvious UCS-2 triggers and are understood to consume capacity. Less obvious are:
- Curly quotes (“hello” instead of "hello"): common in CMS and rich-text editors
- Em-dash (—) instead of hyphen (-): common in markdown converters
- Non-breaking space (\u00A0) instead of regular space: common in HTML-to-text conversion
- Accented Latin letters outside the GSM-7 set (á, ç, ã, ê, and others): necessary for Portuguese, French, and many other languages. A few accented letters, such as é, ñ, and ü, are in GSM-7
- Degree symbol (°) and plus sign within a circle (⊕): often used in weather or technical contexts
If you are building OTP and verification SMS systems, you have less control over message content, but you can still audit templates. If you are sending marketing SMS or platform notifications, audit your message builder to ensure it does not inject smart quotes, non-breaking spaces, or other invisible UCS-2 triggers.
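One way to audit a message builder is to normalize the usual offenders before the text reaches the gateway. The sketch below is illustrative only: `normalizeForGsm7` and its replacement map are hypothetical, the map is not exhaustive, and whether a substitution is acceptable depends on your content.

```javascript
// Common invisible or look-alike characters mapped to GSM-7 equivalents.
// The selection is an assumption for illustration, not a complete list.
const GSM7_REPLACEMENTS = {
  "\u2018": "'",   // left single curly quote
  "\u2019": "'",   // right single curly quote
  "\u201C": '"',   // left double curly quote
  "\u201D": '"',   // right double curly quote
  "\u2013": "-",   // en dash
  "\u2014": "-",   // em dash
  "\u2026": "...", // horizontal ellipsis
  "\u00A0": " ",   // non-breaking space
};

// Replace each listed character with its GSM-7 equivalent.
function normalizeForGsm7(text) {
  return text.replace(
    /[\u2018\u2019\u201C\u201D\u2013\u2014\u2026\u00A0]/g,
    (ch) => GSM7_REPLACEMENTS[ch]
  );
}

console.log(normalizeForGsm7("\u201CHello\u201D \u2014 it\u2019s ready\u2026"));
// "Hello" - it's ready...
```

Replacing an ellipsis with three dots lengthens the message by two characters, so run the segment count after normalization rather than before.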
How to count before you send — a JavaScript helper
The only way to control encoding costs is to predict segment count before submission. Here is a JavaScript function that counts segments accurately:
// language: javascript
// GSM-7 basic table (3GPP TS 23.038): one septet per character.
const GSM7_BASIC =
  "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ !\"#¤%&'()*+,-./0123456789:;<=>?" +
  "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§¿abcdefghijklmnopqrstuvwxyzäöñüà";
// GSM-7 extension table: two septets per character (escape + character).
const GSM7_EXTENDED = "\f^{}\\[~]|€";
function countSmsSegments(message) {
  // Count septets; any character outside both tables forces UCS-2
  let septets = 0;
  let isGSM7 = true;
  for (const ch of message) {
    if (GSM7_BASIC.includes(ch)) septets += 1;
    else if (GSM7_EXTENDED.includes(ch)) septets += 2;
    else { isGSM7 = false; break; }
  }
  // UCS-2 length is measured in UTF-16 code units, which is what .length returns
  const length = isGSM7 ? septets : message.length;
  const singleSegmentLimit = isGSM7 ? 160 : 70;
  const multiSegmentLimit = isGSM7 ? 153 : 67;
  if (length <= singleSegmentLimit) {
    return 1;
  }
  // Every segment of a concatenated message carries the UDH
  return Math.ceil(length / multiSegmentLimit);
}
// Usage
console.log(countSmsSegments("Hello world")); // 1 segment (GSM-7)
console.log(countSmsSegments("€50 balance available")); // 1 segment (GSM-7, € counts as 2)
console.log(countSmsSegments("It’s ready")); // 1 segment (UCS-2, curly apostrophe)
console.log(countSmsSegments("A".repeat(160))); // 1 segment (GSM-7)
console.log(countSmsSegments("A".repeat(161))); // 2 segments (GSM-7, UDH overhead)
console.log(countSmsSegments("你好")); // 1 segment (UCS-2, 2 chars < 70)
Integrate this function into your message preview UI. When a user drafts a message, display the segment count in real time. If the count jumps unexpectedly, highlight the non-GSM-7 characters and suggest alternatives. If you integrate with smsroute from Python, port this logic and run it server-side before calling the API. For API consumers at scale, pre-flight validation saves costs and prevents user confusion.
Many SMS gateways (including smsroute) return the actual segment count in the API response after you submit a message, but relying on that feedback loop is reactive. Pre-flight calculation lets you warn users or reject messages before they are charged. This is especially important in cost-sensitive applications like OTP delivery, support notifications, or marketing campaigns where per-message costs aggregate quickly.
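A pre-flight gate can be as small as the sketch below. It takes a segment count from a counter such as the helper above and decides what to do with the message. The function name, the `maxSegments` threshold, and the action labels are hypothetical; the default price is the example rate quoted in this article, not a lookup against any API.

```javascript
// Hypothetical pre-flight check: estimate cost and choose an action.
function preflight(segments, { pricePerSegment = 0.004, maxSegments = 3 } = {}) {
  // Round to 4 decimals to avoid floating-point noise in the estimate.
  const estimatedCost = Number((segments * pricePerSegment).toFixed(4));
  let action = "send";
  if (segments > maxSegments) action = "reject";
  else if (segments > 1) action = "warn";
  return { segments, estimatedCost, action };
}

console.log(preflight(1)); // { segments: 1, estimatedCost: 0.004, action: 'send' }
console.log(preflight(2)); // { segments: 2, estimatedCost: 0.008, action: 'warn' }
console.log(preflight(5).action); // reject
```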
Frequently asked questions
Why did my message send as 2 segments instead of 1?
A single SMS can hold either 160 characters (GSM-7 encoding) or 70 characters (UCS-2 encoding). If your message exceeds these limits, the SMS gateway automatically splits it into multiple segments. Each additional segment costs the same as a single SMS, so a 161-character GSM-7 message becomes 2 billable segments. UCS-2 triggers whenever you use any character outside the GSM-7 alphabet: emoji, most accented letters, Chinese characters, or even punctuation such as smart quotes. Characters from the GSM-7 extension table, such as the Euro sign (€), stay in GSM-7 but count as two characters, which can also push a message over the limit.
What's the difference between GSM-7 and UCS-2?
GSM-7 is a 7-bit character set designed for SMS and covers Latin characters, digits, common punctuation, and a small set of accented letters. It allows 160 characters per segment. UCS-2 is a 16-bit Unicode encoding that supports nearly all languages and symbols, but only fits 70 characters per segment. The moment your message includes even one non-GSM-7 character, such as ã, a curly quote, 中文, or 😀, the entire message switches to UCS-2 encoding. There is no partial encoding; the gateway commits to UCS-2 for the whole message.
How many characters do I get in a multi-segment SMS?
Concatenated SMS (messages split across multiple segments) typically use 6 octets of header information in each segment. This reduces usable space to 153 characters per GSM-7 segment and 67 characters per UCS-2 segment. A 2-segment GSM-7 message therefore holds 2 × 153 = 306 characters total, because both segments carry the header. For UCS-2, two segments yield 2 × 67 = 134 characters. This overhead is a fixed cost of using the concatenation protocol.
Will my bill increase if I use special characters?
Yes, if the character is outside GSM-7. Any such character forces UCS-2 encoding, which cuts your capacity from 160 to 70 characters per segment. If you normally send 160-character messages at, for example, $0.004 per SMS, adding a single emoji or unsupported accented letter will switch that message to UCS-2 and cost 3× as much (3 segments of up to 67 characters instead of 1). This is why monitoring character encoding during templating and QA is critical for cost control, especially at scale.
How can I predict the segment count before sending?
Implement a JavaScript helper function that checks each character against the GSM-7 alphabet and calculates segment boundaries. The function should iterate through your message, flag any non-GSM-7 characters, and determine the encoding. If the message fits a single segment (160 characters for GSM-7, 70 for UCS-2), the count is 1; otherwise, divide the length by the concatenated capacity (153 or 67 respectively) and round up. Many SMS gateways, including smsroute, return segment count in the API response after submission, but pre-flight calculation allows you to warn users or reject messages before they are charged.