On this page

Overview

VEX includes a string datatype. This is useful in several places:

  • Manipulating text

  • Referencing filenames and op node names

  • Manipulating binary data

String Literals

String literals can be enclosed in either single quotes (') or double quotes ("). Strings may also be specified using the Python or C++ raw-string format.

string s = 'foo';
string t = "bar";
string py = r"Hello world\n";        // Python style, equivalent to "Hello world\\n"
string cpp = R"(Hello world\n)";   // C++ style, equivalent to "Hello world\\n"

Escaped strings (non-raw strings) automatically convert known escape sequences to their representative byte sequences. For example "\n" will convert to the ASCII byte to emit a newline.

Raw strings ignore escape sequences. For a raw string, the "\n" will be interpreted literally as a backslash and the lower case n.

The syntax for strings can be summarized

  • Escaped strings "text" or 'text'

  • Python raw-strings r"raw text"

  • C++ raw-string R"delimiter(raw text)delimiter"

    Where the delimiter is an optional string of 0 to 16 characters. Unlike Python raw-strings, C++ style raw strings can contain multi-line text and even binary data.

string escaped = 'Line 1\nLine 2';
string raw = r"Line 1\nLine 1 continues";        // "Line 1\\nLine 1 continues"
string cppraw = R"(Line 1\nLine 1 continues)";        // "Line 1\\nLine 1 continues"
string cppmultiline = R"multi(This is a long
    string which has multiple lines.  The string
    also contains an embedded raw string R"(raw string)"
    But since the delimiter doesn't match, the string isn't
    actually ended until here.)multi";

Declaring string types

To declare a string variable, the general form is string var_name:

// My string is a normal string
string   mystring;

To declare a function that returns a string:

// A function which returns a string
string rgb_name()
{
...
};    

To specify a literal array, use double quotes or single quotes.

string a_string = "hello world!";
string another_string = 'good-bye!'

Accessing and setting string values

Use string[index] to look up a character by its position in the array.

The index is a byte offset into the string, not a character offset. This is an important distinction when you deal with Unicode strings. VEX assumes a UTF-8 encoding for all strings. If the given offset isn’t a valid UTF-8 character, an empty string is returned. Otherwise, the full UTF-8 character is returned - this may be a string of length greater than one!

String bounds are checked at run time. Reading out of bounds will result in an empty string. This may generate a warning or optional run-time error in the future.

Python-style indexing is used. This means negative indices refer to positions from the end of the array.

The slice notation can be used with the square brackets to extract ranges of a string. This will operate on byte sequences, side stepping the UTF-8 requirements of the normal square bracket operation. Thus if you want the third byte, regardless of whether it is a valid UTF-8 or not, use:

string a_string = "hello world!";
string thirdbyte = a_string[2:3];

You cannot assign values to an array using square brackets.

(The getcomp function is the equivalent for using the square brackets notation.)

Looping over a string

See foreach.

Note you will get empty strings for the offsets which do not correspond to valid unicode characters.

Working with strings

The following functions let you query and manipulate arrays.

len

Returns the length of a string.

append

Adds another array to the end of this one.

ord

Converts a UTF-8 string to a codepoint.

chr

Converts a codepoint to a UTF-8 string.

VEX

Language

Next steps

Reference