C++ persistent containers - array

Posted November 02, 2018

Introduction

Until now, our C++ bindings were missing one important component: persistent containers. In the 1.5 release we introduced the first one, pmem::obj::array. This container currently lives in the experimental namespace and directory, which means that both its API and its layout may still change. It offers almost the same functionality as std::array from C++11, but it also takes care of adding elements to a transaction. Once the experimental status is dropped, it will also guarantee a stable in-memory layout (the same for all compilers). The API of pmem::obj::array is the same as that of std::array, except for the following:

If you want to store a sequence of objects whose length is known at compile time, you should use pmem::obj::array. In contrast to plain C-style arrays, pmem::obj::array automatically adds modified elements to the enclosing transaction.

Usage

Let’s start with a simple example:

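#include <cstddef>
#include <libpmemobj++/experimental/array.hpp>
#include <libpmemobj++/make_persistent.hpp>
#include <libpmemobj++/transaction.hpp>

struct data {
	data() {
		array = {6, 5, 4, 3, 2, 1};
	}

	pmem::obj::experimental::array<int, 6> array;
};

// pop: an open pmem::obj::pool, ptr: a pmem::obj::persistent_ptr<data>
pmem::obj::transaction::run(pop, [&] {
	ptr = pmem::obj::make_persistent<data>();

	// range-based for loop
	for (auto &e : ptr->array)
		e++;

	// iterators
	for (auto it = ptr->array.begin(); it != ptr->array.end(); it++)
		*it += 2;

	// indexing operator
	for (std::size_t i = 0; i < ptr->array.size(); i++)
		ptr->array[i]--;
});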

As seen above, pmem::obj::array can be used just like an ordinary std::array. To iterate over it, you can use the indexing operator, range-based for loops or iterators. The array can also be processed with standard algorithms such as std::sort:

#include <algorithm>

pmem::obj::transaction::run(pop, [&] {
	std::sort(ptr->array.begin(), ptr->array.end());
});
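Because the interface mirrors std::array, the same loops can first be tried on ordinary volatile memory. The sketch below runs the three iteration styles and std::sort from the examples above on a plain std::array; there is no pool and no transaction, so nothing is snapshotted. The function name run_volatile_example is only an illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Same loops and std::sort as above, but on a volatile std::array:
// no pool, no transaction and therefore no snapshots.
std::array<int, 6> run_volatile_example()
{
	std::array<int, 6> array = {6, 5, 4, 3, 2, 1};

	// range-based for loop: every element + 1
	for (auto &e : array)
		e++;

	// iterators: every element + 2
	for (auto it = array.begin(); it != array.end(); it++)
		*it += 2;

	// indexing operator: every element - 1
	for (std::size_t i = 0; i < array.size(); i++)
		array[i]--;

	// net change is +2 per element: {8, 7, 6, 5, 4, 3}
	std::sort(array.begin(), array.end());
	return array; // {3, 4, 5, 6, 7, 8}
}
```

Swapping std::array for pmem::obj::experimental::array (and wrapping the code in a transaction) is what makes the same loops crash-consistent.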

If there is an active transaction, elements accessed through any of these methods are snapshotted. For the iterators returned by begin() and end(), snapshotting happens when the iterator is dereferenced. Snapshotting is done only for mutable access: const iterators and the const version of the indexing operator add nothing to the transaction. That is why it is extremely important to use the const functions (cbegin(), cend(), etc.) whenever possible. Doing so reduces the number of snapshots and can significantly lower the performance cost of transactions.
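To make the const versus non-const distinction concrete, here is a hypothetical toy model (toy_tx, toy_array, sum_const and sum_mutable are invented for illustration and are not libpmemobj++ code). It only counts "add to transaction" calls: the non-const indexing operator records one, the const overload records none, so the same read-only loop costs zero or N snapshots depending on which overload the compiler picks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical toy model, NOT libpmemobj++ code. It only mimics the rule
// described above: mutable access records a snapshot, const access does not.
struct toy_tx {
	std::size_t snapshots = 0; // how many elements were "added to the tx"
};

template <typename T, std::size_t N>
struct toy_array {
	T elems[N];
	toy_tx *tx; // active transaction, or nullptr if there is none

	// Non-const indexing: snapshot the element first, like a write access.
	T &operator[](std::size_t i)
	{
		if (tx != nullptr)
			tx->snapshots++;
		return elems[i];
	}

	// Const indexing: read-only, nothing is added to the transaction.
	const T &operator[](std::size_t i) const
	{
		return elems[i];
	}

	std::size_t size() const
	{
		return N;
	}
};

// Reads through a const reference, so only the const overload is used.
template <typename T, std::size_t N>
T sum_const(const toy_array<T, N> &a)
{
	T sum{};
	for (std::size_t i = 0; i < a.size(); i++)
		sum += a[i];
	return sum;
}

// Same loop through a non-const reference: every read counts as a snapshot.
template <typename T, std::size_t N>
T sum_mutable(toy_array<T, N> &a)
{
	T sum{};
	for (std::size_t i = 0; i < a.size(); i++)
		sum += a[i];
	return sum;
}
```

For example, `toy_array<int, 4> a{{1, 2, 3, 4}, &tx};` gives the same sum from both functions, but only sum_mutable increases tx.snapshots (by 4). With the real container, cbegin()/cend() and the const indexing operator play the role of the const path.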

pmem::obj::slice

When a loop is known to modify several consecutive elements of the array, a bulk-snapshot optimization can be applied with the special range() method, which returns an instance of the pmem::obj::slice struct. This structure provides an interface for accessing a sequence of objects: it implements the indexing operator as well as begin() and end() methods (plus their const and reverse variants).

Here’s sample usage:

pmem::obj::transaction::run(pop, [&] {
	// elements will be added to the transaction in chunks of 2
	auto slice = ptr->array.range(0, ptr->array.size(), 2);

	for (auto it = slice.begin(); it != slice.end(); it++)
		(*it)++;

	for (auto &e : slice)
		e++;

	std::sort(slice.begin(), slice.end());

	for (std::size_t i = 0; i < slice.size(); i++)
		slice[i]--;
});

This example shows that pmem::obj::slice can be iterated the same way as pmem::obj::array. The difference is that elements are not snapshotted one by one; instead, they are added to the transaction in bulk. Let's analyze what happens in the first for loop of the above example. First, notice that the third argument of range() is 2, which means that elements will be snapshotted in pairs. Before the loop starts, the first two elements of the array are already in the transaction (range() adds them), so nothing has to be snapshotted in the first and second iterations. In the third iteration, the elements at indexes 2 and 3 are snapshotted, and so on. For an array of size 6, the number of snapshots is therefore 3. This mechanism is also described here.
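The counting above can be reproduced with a small simulation. The sketch below is a hypothetical model of the chunking rule as described in this post (snapshot_chunks is an invented name, not part of libpmemobj++); it lists which chunks end up in the transaction for a loop over a range. How a trailing partial chunk is handled (here it is simply clipped to the end of the range) is an assumption of the model.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical simulation of chunked snapshotting (not libpmemobj++ code).
// For a loop over elements [start, start + n) with the given snapshot_size,
// returns every snapshotted chunk as (first index, number of elements).
std::vector<std::pair<std::size_t, std::size_t>>
snapshot_chunks(std::size_t start, std::size_t n, std::size_t snapshot_size)
{
	assert(snapshot_size > 0);
	const std::size_t end = start + n;
	std::vector<std::pair<std::size_t, std::size_t>> chunks;

	for (std::size_t i = start; i < end; i++) {
		// The chunk starting at `start` is taken by range() itself; every
		// later chunk is taken when the iterator first reaches it.
		if ((i - start) % snapshot_size == 0)
			chunks.emplace_back(i, std::min(snapshot_size, end - i));
	}
	return chunks;
}
```

For the example above, snapshot_chunks(0, 6, 2) yields the chunks starting at 0, 2 and 4, i.e. 3 snapshots. In this model a snapshot_size equal to the whole range gives a single chunk, which matches the one-snapshot behaviour of range(0, size()) described next.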

If all elements (or most of them) are expected to be modified, range() can be called like this:

auto slice = ptr->array.range(0, ptr->array.size());

This will add the entire array to a transaction once.

There is no universal rule for when to use range(). The performance gain depends on the snapshot size, the element type and the workload, so its use should be carefully thought out or benchmarked.

pmem::obj::array and pmem::obj::persistent_ptr

The examples above used pmem::obj::array as a struct member, but it is also possible to point to an array directly with a pmem::obj::persistent_ptr. There is, however, one thing to be aware of with this approach. Consider the following code:

using array_type = pmem::obj::experimental::array<int, 5>;

// ptr, ptr2: pmem::obj::persistent_ptr<array_type>
pmem::obj::transaction::run(pop, [&] {
	// not possible before C++17
	ptr = pmem::obj::make_persistent<array_type>(1, 2, 3, 4, 5);

	// always works
	ptr2 = pmem::obj::make_persistent<array_type>();
});

As stated in the comment, initializing a pmem::obj::array in pmem::obj::make_persistent with a list of values is only possible starting with C++17. This is because pmem::obj::array, just like std::array, is an aggregate type and needs a special initialization syntax (brace initialization must be used). The problem is that, to support aggregate initialization, pmem::obj::make_persistent must check whether a type is an aggregate, and the standard trait for this check, std::is_aggregate, was only added in C++17.
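The C++17 dependency can be illustrated outside the library. The sketch below is a hypothetical generic factory named construct; it is not the library's make_persistent, whose implementation may differ. It chooses brace initialization for aggregates and parentheses for everything else, and that choice hinges on std::is_aggregate_v together with if constexpr, both of which require a C++17 compiler.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical helper, NOT pmem::obj::make_persistent. Requires C++17 for
// std::is_aggregate_v and if constexpr.
template <typename T, typename... Args>
T construct(Args &&... args)
{
	if constexpr (std::is_aggregate_v<T>) {
		// Aggregates such as std::array have no constructor that accepts
		// (1, 2, 3, ...), so brace (aggregate) initialization is required.
		return T{std::forward<Args>(args)...};
	} else {
		// Other types use parentheses, so std::string(3, 'x') means "xxx"
		// instead of the initializer_list {'\x03', 'x'}.
		return T(std::forward<Args>(args)...);
	}
}
```

Always using braces would not be a safe shortcut: for std::string, `{3, 'x'}` selects the initializer_list constructor and produces a two-character string, which is why a generic factory has to know whether T is an aggregate.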

Summary

To summarize, if you need to store a fixed-length array in persistent memory, you should always use pmem::obj::array. It is currently the only persistent container in our library. However, we are working on pmem::obj::vector and pmem::obj::string, so you can expect our collection of containers to grow in the near future.



Posted by @igchor