unicode_codepoints_from_string
Use the unicode_codepoints_from_string function in APL to convert a UTF-8 string into an array of Unicode code points. This function is useful when you want to analyze or transform strings at the character encoding level, especially in multilingual datasets, log inspection, or byte-level debugging.
You can use this function to detect non-printable or non-ASCII characters, analyze internationalized content, or perform detailed comparisons between strings that look visually similar but differ in underlying code points.
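To make the "visually similar" case concrete, here is a minimal Python sketch (not APL) of the same idea. It assumes the function yields one integer per Unicode character; the helper name codepoints is hypothetical and exists only for this illustration.

```python
def codepoints(text: str) -> list[int]:
    """Return the Unicode code point of each character in text."""
    return [ord(ch) for ch in text]


latin = "paypal"            # all Latin letters
mixed = "p\u0430yp\u0430l"  # Cyrillic U+0430 in place of Latin 'a' (U+0061)

print(codepoints(latin))    # [112, 97, 121, 112, 97, 108]
print(codepoints(mixed))    # [112, 1072, 121, 112, 1072, 108]

# The two strings render almost identically but differ at the code point level.
print(codepoints(latin) == codepoints(mixed))  # False
```

Comparing the integer arrays rather than the rendered text is what exposes the substituted characters.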
For users of other query languages
If you come from other query languages, this section explains how to adjust your existing queries to achieve the same results in APL.
Splunk SPL users
In Splunk SPL, working with Unicode code points requires using eval expressions with ord or custom logic, which can be verbose. APL offers a built-in function for this, making it concise and efficient.
ANSI SQL users
ANSI SQL does not have a native function to extract Unicode code points. You typically need to use platform-specific functions or procedural logic. In APL, this is a single function call.
Usage
Syntax
unicode_codepoints_from_string(source)
Parameters
Name | Type | Description |
---|---|---|
source | string | The input UTF-8 string to convert. |
Returns
An array of integers, where each integer is the Unicode code point of the corresponding character in the input string.
Use case examples
Use this function to identify unusual characters in request URLs that might indicate obfuscated attacks or encoding issues.
This query flags URIs with non-standard characters, helping you identify suspicious or malformed requests.
_time | uri | codepoints |
---|---|---|
2025-07-27T12:00:00Z | /api/v1/textdata/background/change£ | 163 |
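As a rough illustration of this kind of check outside APL, the following Python sketch keeps only the code points that fall outside the ASCII range for a given URI. The helper name non_ascii_codepoints and the choice of 127 as the cutoff are assumptions made for this example, not part of APL.

```python
def non_ascii_codepoints(uri: str) -> list[int]:
    """Return the code points in uri that fall outside the ASCII range (0-127)."""
    return [ord(ch) for ch in uri if ord(ch) > 127]


uri = "/api/v1/textdata/background/change\u00a3"  # ends with the pound sign, U+00A3
print(non_ascii_codepoints(uri))            # [163]
print(non_ascii_codepoints("/index.html"))  # []
```

An empty result means every character in the URI is plain ASCII; any other result points at the characters worth inspecting.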
List of related functions
- array_concat: Combines multiple arrays. Useful when merging code point arrays from different strings.
- array_length: Returns the number of elements in an array. Use it to check how many code points a string contains.
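- parse_path: Parses a path into components. Use it with unicode_codepoints_from_string when decoding or inspecting URL paths.
- unicode_codepoints_to_string: Converts an array of Unicode code points back into a string. Use it to reverse the output of unicode_codepoints_from_string.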