Slices & Strings in Odin
This is a look at how strings compare to slices in the Odin programming language. I'm writing this because it was one of the stumbling blocks in my first steps of learning Odin. It seems simple to me now, but maybe this will help a newcomer.
Slices
A slice is a mutable data type that holds a pointer and a length. It can be thought of as a window or a view into a span of data that you can change. It may or may not own the data it points to, and unlike a dynamic array, it is not resizable. As with an array, the length counts elements rather than bytes, so if we have a slice of bytes, the length will represent the number of bytes in the slice.
buf := make([]u8, 4) // Allocate a slice of 4 bytes.
l := len(buf) // 4
sz := len(buf) * size_of(u8) // 4
If we have a larger data type, such as a series of runes, then the length will reflect the number of runes and not the number of bytes. A rune in Odin is represented by a signed 32-bit integer.
buf := make([]rune, 4) // Allocate a slice of 4 runes.
l := len(buf) // 4
sz := len(buf) * size_of(rune) // 16
buf in the above code block has a length of 4, but it takes up 16 bytes of memory.
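To make the "window" idea concrete, here is a minimal sketch of a slice that views part of a fixed array it does not own. The names arr and window are only for illustration, and the byte size in the last comment assumes a 64-bit target, where int is 8 bytes.

```odin
package main

import "core:fmt"

main :: proc() {
	// A fixed array owns its storage; the slice below only views part of it.
	arr := [5]int{1, 2, 3, 4, 5}
	window := arr[1:4] // Views the elements at indices 1, 2 and 3.

	// Writing through the slice changes the underlying array.
	window[0] = 20

	fmt.println(len(window))                // 3
	fmt.println(arr[1])                     // 20
	fmt.println(len(window) * size_of(int)) // 24, assuming a 64-bit target
}
```

The slice never copies the data: it only records where the view starts and how many elements it covers.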
Strings
Strings in Odin are fundamentally slices, with one important difference: they are immutable on the surface, meaning you cannot assign a new value to an index of a string. Normally, changing anything about a string requires creating a new string, unless you have access to the underlying buffer.
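As a rough sketch of what "access to the underlying buffer" can look like, the snippet below views a byte array we own as a string and then edits the bytes. It assumes that converting a []u8 to a string with string(...) reuses the same memory rather than copying, which is my understanding of how that conversion behaves; buf and s are illustrative names.

```odin
package main

import "core:fmt"

main :: proc() {
	// A mutable byte buffer that we own.
	buf := [5]u8{'H', 'e', 'l', 'l', 'o'}

	// View the buffer as a string (assumed to share memory, not copy).
	s := string(buf[:])
	fmt.println(s) // Hello

	// s[0] = 'J' would be rejected by the compiler: string indices
	// cannot be assigned to. Writing through the buffer is allowed,
	// and the string view reflects the change.
	buf[0] = 'J'
	fmt.println(s) // Jello
}
```

This only works because the bytes live in memory we are allowed to write to. A string literal offers no such buffer, so the usual route there is to build a new string.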
Where len can be confusing is that an Odin string can be thought of as a slice of bytes that has special iteration properties when using the for x in y construct.
str := "Hellope!"
l := len(str) // 8
uni := "こにちは"
luni := len(uni) // 12
This shows us that Odin’s len reports the byte count of the string, as opposed to the number of codepoints.
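If the number of codepoints is what you are after, one option is the rune-counting procedure in the core:unicode/utf8 package. This is a small sketch placing the two counts side by side for the same string as above.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	uni := "こにちは"

	fmt.println(len(uni))                       // 12 (bytes)
	fmt.println(utf8.rune_count_in_string(uni)) // 4 (codepoints)
}
```

Counting runes has to walk the whole string, whereas len just reads the stored length.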
However, if we try to iterate the string, we’ll receive the UTF-8 decoded Unicode codepoints as runes with each stage of iteration, along with the byte index:
for r, i in uni {
	fmt.println(r, i)
}
// こ 0
// に 3
// ち 6
// は 9
This won’t be the case if you iterate over a string by index. Indexing returns the raw bytes instead:
for i := 0; i < len(uni); i += 1 {
	fmt.println(uni[i])
}
// 227
// 129
// 147
// 227
// 129
// 171
// 227
// 129
// 161
// 227
// 129
// 175
fmt.println(typeid_of(type_of(uni[0])))
// u8
As you can see, the string data type in Odin is effectively a slice of u8s, or unsigned bytes.
It’s important to keep this in mind when working with strings, as this is the only case in Odin where iteration over a slice-like data type behaves differently from the usual slices or arrays.
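When you do want to index by codepoint rather than by byte, one approach is to decode the string into a slice of runes first. The sketch below uses string_to_runes from core:unicode/utf8, which allocates a new slice, so it is freed with delete; the variable names are illustrative.

```odin
package main

import "core:fmt"
import "core:unicode/utf8"

main :: proc() {
	uni := "こにちは"

	// Decode the whole string into a freshly allocated slice of runes.
	runes := utf8.string_to_runes(uni)
	defer delete(runes)

	fmt.println(len(runes)) // 4
	fmt.println(runes[1])   // に
}
```

Because runes is an ordinary []rune, both its length and its indexing work in codepoints, at the cost of an allocation and four bytes per character.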