ASCII / Unicode Codepoint Lookup

Look up the ASCII/Unicode codepoint for any character, or go the other way and turn a list of codepoints back into text. Shows decimal, hex (U+XXXX), octal, binary, HTML entity, JS escape, and URL-encoded forms for every code point.

Inputs

Pick whichever direction matches what you have.

Used when direction is Text → codepoints. Emoji and CJK characters work: they are decoded by full code point (not UTF-16 surrogate halves).

Used when direction is Codepoints → text. Accepts U+XXXX, 0x48, 0o110, 0b1001000, &#72;, &#x48;, \u0048, or plain decimal, separated by spaces, commas, or newlines.
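
As a rough illustration of how input in mixed notations could be normalised, here is a minimal sketch. The helper names `parseCodepoint` and `parseCodepoints` are hypothetical and are not taken from this tool; the accepted patterns simply mirror the list above, with `\u{...}` added as an assumption.

```javascript
// Hypothetical helper: parse one token written in any of the notations above.
// Returns an integer code point, or null when the token is not recognised.
function parseCodepoint(token) {
  const t = token.trim();
  let m;
  if ((m = /^U\+([0-9A-F]{1,6})$/i.exec(t))) return parseInt(m[1], 16); // U+0048
  if ((m = /^0x([0-9A-F]+)$/i.exec(t))) return parseInt(m[1], 16);      // 0x48
  if ((m = /^0o([0-7]+)$/i.exec(t))) return parseInt(m[1], 8);          // 0o110
  if ((m = /^0b([01]+)$/i.exec(t))) return parseInt(m[1], 2);           // 0b1001000
  if ((m = /^&#x([0-9A-F]+);$/i.exec(t))) return parseInt(m[1], 16);    // &#x48;
  if ((m = /^&#([0-9]+);$/.exec(t))) return parseInt(m[1], 10);         // &#72;
  if ((m = /^\\u\{([0-9A-F]{1,6})\}$/i.exec(t))) return parseInt(m[1], 16); // \u{1F44B} (assumed extra)
  if ((m = /^\\u([0-9A-F]{4})$/i.exec(t))) return parseInt(m[1], 16);   // \u0048
  if (/^[0-9]+$/.test(t)) return parseInt(t, 10);                       // plain decimal
  return null;
}

// Split on spaces, commas, or newlines, then parse each token.
function parseCodepoints(input) {
  return input.split(/[\s,]+/).filter(Boolean).map((tok) => parseCodepoint(tok));
}

console.log(String.fromCodePoint(...parseCodepoints("U+0048 0x69 33"))); // Hi!
```

Unrecognised tokens come back as `null` in this sketch, so a caller would need to filter or report them before calling `String.fromCodePoint`.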

Result

5 codepoints from 5 characters
U+0048 U+0069 U+0021 U+0020 U+1F44B
  • Codepoint H: U+0048, dec 72, 0x48, 0o110, 0b1001000, HTML &#72; / &#x48;, JS \u0048, URL %48
  • Codepoint i: U+0069, dec 105, 0x69, 0o151, 0b1101001, HTML &#105; / &#x69;, JS \u0069, URL %69
  • Codepoint !: U+0021, dec 33, 0x21, 0o41, 0b100001, HTML &#33; / &#x21;, JS \u0021, URL %21
  • Codepoint (space): U+0020, dec 32, 0x20, 0o40, 0b100000, HTML &#32; / &#x20;, JS \u0020, URL %20
  • Codepoint 👋: U+1F44B, dec 128075, 0x1F44B, 0o372113, 0b11111010001001011, HTML &#128075; / &#x1F44B;, JS \u{1F44B}, URL %1F44B
Note: only the code point is shown, not the visual grapheme cluster. Combining marks or ZWJ-joined emoji are reported as multiple code points (this matches String.prototype[@@iterator]).
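
To make the note concrete, the following small sketch shows how the standard string iterator splits a combining sequence and a ZWJ emoji sequence. The `toHex` helper and the sample strings are illustrative choices, not part of the tool.

```javascript
// Spread uses the string iterator, which walks code points, not grapheme clusters.
const accented = "e\u0301"; // "e" + COMBINING ACUTE ACCENT, usually drawn as one glyph
const family = "\u{1F468}\u200D\u{1F469}\u200D\u{1F467}"; // man, ZWJ, woman, ZWJ, girl

// Illustrative helper: list the U+HEX form of each code point in a string.
const toHex = (s) =>
  [...s].map((ch) => "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0"));

console.log(toHex(accented)); // [ 'U+0065', 'U+0301' ]
console.log(toHex(family));   // [ 'U+1F468', 'U+200D', 'U+1F469', 'U+200D', 'U+1F467' ]
```

If one entry per visible glyph is wanted instead, `Intl.Segmenter` with `granularity: "grapheme"` is the usual route in runtimes that provide it.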

Step-by-step

  1. Iterate the input by Unicode code point (not UTF-16 code unit) so surrogate pairs collapse into one entry.
  2. For each code point, emit U+HEX, decimal, octal, binary, HTML numeric entity, JS \u escape, and the URL-percent byte.
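
The two steps above can be sketched roughly as follows. The function and field names (`describeCodepoint`, `describeText`, `uPlus`, and so on) are hypothetical. Two choices here are assumptions rather than a description of this tool: code points above U+FFFF use the braced `\u{...}` escape, and the URL form percent-encodes the UTF-8 bytes instead of emitting a single `%XX`.

```javascript
// Hypothetical formatter for one code point; field names are illustrative.
function describeCodepoint(cp) {
  const hex = cp.toString(16).toUpperCase();
  const utf8 = new TextEncoder().encode(String.fromCodePoint(cp));
  return {
    char: String.fromCodePoint(cp),
    uPlus: "U+" + hex.padStart(4, "0"),
    dec: String(cp),
    hex: "0x" + hex,
    oct: "0o" + cp.toString(8),
    bin: "0b" + cp.toString(2),
    htmlDec: "&#" + cp + ";",
    htmlHex: "&#x" + hex + ";",
    // \uXXXX holds exactly four hex digits, so larger values use \u{...}.
    js: cp <= 0xffff ? "\\u" + hex.padStart(4, "0") : "\\u{" + hex + "}",
    // One %XX per UTF-8 byte (assumption: differs from a single-byte form).
    url: Array.from(utf8, (b) => "%" + b.toString(16).toUpperCase().padStart(2, "0")).join(""),
  };
}

// Step 1: iterate by code point. Step 2: format each one.
function describeText(text) {
  return [...text].map((ch) => describeCodepoint(ch.codePointAt(0)));
}

const rows = describeText("Hi! \u{1F44B}");
console.log(rows.length);   // 5
console.log(rows[0].uPlus); // U+0048
console.log(rows[4].url);   // %F0%9F%91%8B
```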

How to use this calculator

  • Switch the direction depending on whether you have text or codepoints.
  • Paste any text: emoji and supplementary-plane characters (codepoint > U+FFFF) are handled by code point, not by UTF-16 code unit.
  • When entering codepoints, mix and match notations: U+0041, 0x41, 65, &#65;, \u0041 are all accepted.
  • Use the per-codepoint rows to grab the exact HTML entity or JS escape you need, with no further hand-conversion.

About this calculator

Every character on the modern web (Latin letters, CJK ideographs, emoji, math symbols) has a Unicode code point. This tool maps in both directions: type some text and read out the U+HEX code point for each character, or paste a list of code points (in any common notation) and reconstruct the text. It's the lookup table you reach for when you're writing a regex that needs to allow only certain script ranges, sanity-checking a copy-paste that arrived as mojibake, or building a font fallback test page. For each code point you get the decimal, hex, octal, binary, HTML numeric entity (both forms), JavaScript \u escape, and the single-byte URL-percent representation: the full set you'd otherwise need several tools to assemble.

How it works: the formula

codepoint(ch) = ch.codePointAt(0)
text(cps) = cps.map(cp => String.fromCodePoint(cp)).join("")

The Unicode standard maps every abstract character to a unique 21-bit integer (the code point) in the range U+0000 to U+10FFFF. JavaScript exposes the code point via String.prototype.codePointAt and the reverse via String.fromCodePoint; these correctly handle UTF-16 surrogate pairs so supplementary-plane characters (most emoji) are not split.
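
A short round-trip sketch of the two built-ins named above. The wrapper names `toCodepoints` and `fromCodepoints` are illustrative, not part of the tool.

```javascript
// Text -> code points: Array.from uses the string iterator, so each
// callback receives a whole code point, even for surrogate pairs.
function toCodepoints(text) {
  return Array.from(text, (ch) => ch.codePointAt(0));
}

// Code points -> text. The arrow function passes only the code point;
// map also supplies an index and the array, which String.fromCodePoint
// would otherwise treat as additional code points.
function fromCodepoints(cps) {
  return cps.map((cp) => String.fromCodePoint(cp)).join("");
}

const wave = "\u{1F44B}";
console.log(wave.length);                     // 2 (UTF-16 code units)
console.log(wave.charCodeAt(0).toString(16)); // d83d (high surrogate only)
console.log(toCodepoints(wave));              // [ 128075 ]
console.log(fromCodepoints([72, 105, 33, 32, 0x1f44b]) === "Hi! " + wave); // true
```

The contrast between `charCodeAt` and `codePointAt` is the practical reason to prefer the latter for anything outside the Basic Multilingual Plane.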

Worked examples

Example 1
ASCII letter
Inputs:
text = "A"
Output:
U+0041 dec 65 0x41
Example 2
CJK ideograph
Inputs:
text = "中"
Output:
U+4E2D dec 20013 0x4E2D
Example 3
Supplementary plane (emoji)
Inputs:
text = "👋"
Output:
U+1F44B dec 128075 0x1F44B

Limitations

  • Grapheme clusters (composed emoji, combining marks) are reported by their constituent code points, not as a single glyph. This matches the iterator behaviour of every modern JS engine.
  • Codepoints in the surrogate range (U+D800–U+DFFF) are not scalar values and cannot be encoded; entering them returns the Unicode replacement character.
  • URL %XX bytes shown are correct for code points ≤ U+007F only. For higher code points use proper UTF-8 percent-encoding (encodeURIComponent).
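
The last two limitations can be illustrated with a brief sketch. The helpers `isScalarValue` and `safeFromCodepoint` are hypothetical; substituting U+FFFF-range input with U+FFFD here only mirrors the behaviour described above and is not taken from the tool's source.

```javascript
// Hypothetical guards: a Unicode scalar value is any code point in
// U+0000..U+10FFFF that is not a surrogate (U+D800..U+DFFF).
const isSurrogate = (cp) => cp >= 0xd800 && cp <= 0xdfff;
const isScalarValue = (cp) =>
  Number.isInteger(cp) && cp >= 0 && cp <= 0x10ffff && !isSurrogate(cp);

// Fall back to U+FFFD REPLACEMENT CHARACTER for anything that is not a scalar value.
function safeFromCodepoint(cp) {
  return isScalarValue(cp) ? String.fromCodePoint(cp) : "\uFFFD";
}

console.log(safeFromCodepoint(0xd800) === "\uFFFD"); // true

// encodeURIComponent percent-encodes the UTF-8 bytes of each character.
console.log(encodeURIComponent(" "));         // %20
console.log(encodeURIComponent("\u4E2D"));    // %E4%B8%AD
console.log(encodeURIComponent("\u{1F44B}")); // %F0%9F%91%8B
```

Note that `encodeURIComponent` leaves unreserved ASCII characters such as letters and digits untouched, so `"A"` stays `"A"` rather than becoming `%41`.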

Every numeric value here is derived directly from the Unicode code-point integer; there is no rounding or transcoding loss.

Frequently asked

A code point is a 21-bit integer (U+0000 through U+10FFFF) that names a single abstract character in the Unicode standard. Code points sit a layer above any specific encoding: UTF-8, UTF-16, and UTF-32 are all different byte-level serializations of the same code points.
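
To see the difference between a code point and its serializations, this sketch prints one code point (U+1F44B, chosen for illustration) as UTF-8 bytes, UTF-16 code units, and a single UTF-32 unit. The variable names are illustrative only.

```javascript
const cp = 0x1f44b;
const s = String.fromCodePoint(cp);
const hexUnit = (n, width) => n.toString(16).toUpperCase().padStart(width, "0");

// UTF-8: TextEncoder always produces UTF-8 bytes.
const utf8 = Array.from(new TextEncoder().encode(s), (b) => hexUnit(b, 2));

// UTF-16: JavaScript strings are sequences of 16-bit code units.
const utf16 = Array.from({ length: s.length }, (_, i) => hexUnit(s.charCodeAt(i), 4));

// UTF-32: one 32-bit unit whose value is the code point itself.
const utf32 = [hexUnit(cp, 8)];

console.log(utf8);  // [ 'F0', '9F', '91', '8B' ]
console.log(utf16); // [ 'D83D', 'DC4B' ]
console.log(utf32); // [ '0001F44B' ]
```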
