UTF-8 Byte Counter

Count characters, UTF-8 bytes, and UTF-16 units in any text — including emoji and non-Latin scripts — to check database column and payload limits. Runs in your browser.

Counts

  • UTF-8 bytes: storage size; VARCHAR byte limits, HTTP bodies
  • Characters (code points): roughly what a human counts; a single emoji counts as 1
  • UTF-16 units (.length): JavaScript string length
  • Lines
  • Words

The UTF-8 byte count is what database column limits (e.g. a VARCHAR byte cap) and network payloads actually measure. It exceeds the character count whenever the text contains any non-ASCII character, emoji included.

About this tool

Counting text is deceptively subtle once you leave plain ASCII, and the right number depends on what you are measuring. This tool reports the three counts that matter:

  • UTF-8 bytes is the encoded size: what a database byte limit, an HTTP body, or a storage quota actually measures. An accented letter typically takes two bytes and an emoji four.
  • Characters (Unicode code points) is close to what a human counts, treating 世 or 🌍 as one. Combined sequences such as flags count as more than one.
  • UTF-16 units is JavaScript's string length, where characters outside the Basic Multilingual Plane (most emoji) count as two.

It also shows line and word counts.

The classic gotcha this solves: a value that looks like 10 characters can be 20 or more bytes, silently overflowing a column declared with a byte limit. The byte figure comes from the browser's own UTF-8 encoder, so it matches the size of the same text once it is encoded as UTF-8 for a server or database. Everything runs locally.
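The three counts can be reproduced with standard JavaScript/TypeScript APIs. The sketch below is not this tool's source; it assumes `TextEncoder` (available in browsers and Node.js) for the byte count, and the line and word rules are hypothetical choices, since the page does not define them.

```typescript
// Minimal sketch of the five counts described above (not the tool's actual code).
// Assumption: "lines" = "\n"-separated segments, "words" = runs of non-whitespace.

interface TextCounts {
  utf8Bytes: number;
  codePoints: number;
  utf16Units: number;
  lines: number;
  words: number;
}

function countText(text: string): TextCounts {
  return {
    // TextEncoder always produces UTF-8, so the array length is the byte count.
    utf8Bytes: new TextEncoder().encode(text).length,
    // Array.from iterates a string by code point, so a surrogate pair counts once.
    codePoints: Array.from(text).length,
    // .length counts UTF-16 code units.
    utf16Units: text.length,
    // Hypothetical rule: empty text has 0 lines, otherwise line breaks + 1.
    lines: text === "" ? 0 : text.split("\n").length,
    // Hypothetical rule: split on whitespace and drop empty pieces.
    words: text.split(/\s+/).filter((w) => w.length > 0).length,
  };
}

// "café 世界 🌍" with a precomposed é (U+00E9).
const counts = countText("caf\u00E9 世界 🌍");
console.log(counts);
// { utf8Bytes: 17, codePoints: 9, utf16Units: 10, lines: 1, words: 3 }
```

For this sample, é adds one extra byte, each CJK character adds two, and the emoji adds three bytes plus one extra UTF-16 unit, which is why the three figures diverge.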

How to use it

  • Type or paste your text.
  • Read the UTF-8 byte count for storage and payload limits.
  • Compare it to the character count to spot multi-byte content.
  • Use the UTF-16 figure when reasoning about JavaScript string length.

Frequently asked questions

Why is the byte count higher than the character count?
UTF-8 uses one byte for ASCII but two to four bytes for other characters: most accented and Greek/Cyrillic letters take two, most CJK characters three, and most emoji four. So any non-ASCII text has more bytes than characters.
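The one-to-four-byte rule follows directly from the code point ranges in the UTF-8 definition (RFC 3629). A minimal sketch that applies those ranges by hand; the function names are illustrative, and the total can be cross-checked against `TextEncoder`:

```typescript
// UTF-8 length of one code point, from the standard range table:
// U+0000..U+007F -> 1 byte, U+0080..U+07FF -> 2, U+0800..U+FFFF -> 3, U+10000..U+10FFFF -> 4.
function utf8Length(codePoint: number): number {
  if (codePoint <= 0x7f) return 1;
  if (codePoint <= 0x7ff) return 2;
  if (codePoint <= 0xffff) return 3;
  return 4;
}

// Sum the per-code-point lengths; Array.from iterates a string by code point.
function utf8ByteCount(text: string): number {
  return Array.from(text).reduce((sum, ch) => sum + utf8Length(ch.codePointAt(0)!), 0);
}

// ASCII, Latin-1 accent, Cyrillic, CJK, emoji.
for (const ch of ["A", "\u00E9", "\u0416", "\u4E16", "\u{1F30D}"]) {
  console.log(ch, utf8Length(ch.codePointAt(0)!));
}
// prints one per line: A 1, é 2, Ж 2, 世 3, 🌍 4
```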
Which number does a database VARCHAR limit use?
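It depends on the database and column definition. Some count characters: PostgreSQL's varchar(n) and MySQL's VARCHAR(n) allow n characters. Others count bytes: Oracle's VARCHAR2(n) under the default BYTE length semantics and SQL Server's varchar(n) measure n in bytes of the column's encoding, which means UTF-8 bytes when the database character set or collation is UTF-8. In such a byte-limited column, a VARCHAR(20) can reject a 12-character string that encodes to 24 bytes. When in doubt, size against the UTF-8 byte count shown here.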
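A client-side pre-check for the VARCHAR(20) case above might look like the sketch below. It assumes the column counts UTF-8 bytes; `fitsByteLimit` and `fitsCharLimit` are hypothetical helper names, and the database's own validation remains the final word.

```typescript
// Hypothetical helper: true if `value` fits a limit counted in UTF-8 bytes.
function fitsByteLimit(value: string, maxBytes: number): boolean {
  return new TextEncoder().encode(value).length <= maxBytes;
}

// Hypothetical helper for contrast: true if `value` fits a limit counted in code points.
function fitsCharLimit(value: string, maxChars: number): boolean {
  return Array.from(value).length <= maxChars;
}

const value = "\u00E9".repeat(12); // "é" x 12: 12 characters, 24 UTF-8 bytes

console.log(fitsCharLimit(value, 20)); // true: 12 characters <= 20
console.log(fitsByteLimit(value, 20)); // false: 24 bytes > 20
```

The same string passes a character-counted limit and fails a byte-counted one, which is exactly the mismatch the FAQ answer describes.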
Why does JavaScript report a different length?
JavaScript strings are UTF-16, so .length counts 16-bit units. Characters outside the Basic Multilingual Plane — including most emoji — are stored as a surrogate pair and count as 2. That is the "UTF-16 units" figure.
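To see the surrogate pair directly, the sketch below compares `.length` with a code-point count for 🌍 (U+1F30D) and derives its two UTF-16 units with the standard surrogate formula. `toSurrogatePair` is an illustrative name, not a built-in.

```typescript
// U+1F30D EARTH GLOBE EUROPE-AFRICA lies above U+FFFF, outside the BMP.
const globe = "\u{1F30D}";

console.log(globe.length);             // 2 (UTF-16 code units)
console.log(Array.from(globe).length); // 1 (code point)

// How UTF-16 splits a code point above U+FFFF into a high and a low surrogate.
function toSurrogatePair(codePoint: number): [number, number] {
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xd800 + (offset >> 10);  // top 10 bits
  const low = 0xdc00 + (offset & 0x3ff); // bottom 10 bits
  return [high, low];
}

const [high, low] = toSurrogatePair(0x1f30d);
console.log(high.toString(16), low.toString(16));                         // d83c df0d
console.log(globe.charCodeAt(0) === high && globe.charCodeAt(1) === low); // true
```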
How are emoji counted?
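A single-code-point emoji such as 🌍 counts as one character, typically four bytes in UTF-8, and two units in UTF-16. Combined sequences count as several code points: a flag is two regional-indicator code points, a skin-tone variant is a base emoji plus a modifier, and a family emoji joins several emoji with zero-width joiners. Their byte and UTF-16 counts add up across every code point in the sequence. The tool shows all three so you can see the difference.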
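The sketch below computes the three counts for a single emoji and for a few combined sequences, written with escapes so the exact code points are visible. `emojiCounts` is an illustrative name; the counting calls are the same standard APIs discussed above.

```typescript
interface EmojiCounts {
  codePoints: number;
  utf8Bytes: number;
  utf16Units: number;
}

// The three counts this page reports, for one string.
function emojiCounts(text: string): EmojiCounts {
  return {
    codePoints: Array.from(text).length,
    utf8Bytes: new TextEncoder().encode(text).length,
    utf16Units: text.length,
  };
}

// Expected (code points, bytes, UTF-16 units) noted after each sample.
const samples: Array<[string, string]> = [
  ["globe", "\u{1F30D}"],                                                  // 1, 4, 2
  ["Japan flag (J + P regional indicators)", "\u{1F1EF}\u{1F1F5}"],        // 2, 8, 4
  ["thumbs up + medium skin tone", "\u{1F44D}\u{1F3FD}"],                  // 2, 8, 4
  ["family (man ZWJ woman ZWJ girl)", "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"], // 5, 18, 8
];

for (const [label, text] of samples) {
  const c = emojiCounts(text);
  console.log(`${label}: ${c.codePoints} code points, ${c.utf8Bytes} bytes, ${c.utf16Units} UTF-16 units`);
}
```

The zero-width joiner (U+200D) is a BMP character, so it adds three bytes but only one UTF-16 unit, which is why the family's byte and unit totals grow at different rates.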
Does this match `wc -c` and `wc -m`?
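Yes, on a UTF-8 locale: the UTF-8 byte count corresponds to `wc -c` (bytes) and the character count to `wc -m` (characters), since both use the same UTF-8 encoding. Make sure you feed `wc` exactly the same text: `echo` appends a trailing newline, which `wc` counts, so compare with `printf '%s' "$text" | wc -c` instead.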
Is my text uploaded?
No. All counting uses the browser's built-in text encoder with no network request.
