Coroutine-based data aggregation in C++

25 Sep 2023

In modern C++, coroutines have become an increasingly popular feature that allows developers to write asynchronous code in a more intuitive and readable way. One area where coroutines can be particularly beneficial is data aggregation, where we need to collect and aggregate data from multiple sources concurrently. In this blog post, we will explore how coroutines can be used for efficient and scalable data aggregation in C++.

Understanding Coroutines

Coroutines in C++ are implemented using the co_await keyword and the std::coroutine library. They allow for suspension and resumption of execution, making it easier to handle complex asynchronous tasks. By using coroutines, we can write code in a sequential manner, even though the underlying tasks might execute concurrently.

Scenario: Aggregating Data from Multiple Sources

Let’s consider a scenario where we need to retrieve data from multiple sources, such as APIs or databases, and aggregate it into a single result. Traditionally, this could involve writing callback-based code or manually handling threads and synchronization. However, with coroutines, we can simplify this process significantly.

Implementing Coroutine-based Data Aggregation

To start, we define a coroutine function that retrieves data from a single source:

#include <iostream>
#include <experimental/coroutine>
#include <vector>

std::vector<int> fetchDataFromSource(int sourceId) {
    // Simulate fetching data from source
    std::vector<int> data;
    for (int i = 0; i < 10; ++i) {
        data.push_back(sourceId * 10 + i);
    }
    return data;
}

Next, we implement a coroutine-based data aggregator that uses the co_await keyword to suspend execution until the data from each source is retrieved:

struct Aggregator {
    struct promise_type {
        Aggregator get_return_object() noexcept { return {}; }
        std::experimental::suspend_never initial_suspend() noexcept { return {}; }
        std::experimental::suspend_never final_suspend() noexcept { return {}; }
        void unhandled_exception() noexcept {}
        void return_void() noexcept {}

        std::vector<int> result;
    };

    Aggregator() {}

    std::experimental::suspend_always yield_value(std::vector<int> data) {
        co_await std::experimental::suspend_always{};
        co_return;
    }

    void resume() {
        std::experimental::coroutine_handle<promise_type>::from_promise(handle_).resume();
    }

    std::experimental::coroutine_handle<promise_type> handle_;
};

Finally, we create a data aggregation function that uses the coroutine-based aggregator to retrieve data from multiple sources concurrently:

Aggregator dataAggregator() {
    Aggregator::promise_type::result_type result;

    Aggregator aggregator;
    aggregator.handle_ = std::experimental::coroutine_handle<Aggregator::promise_type>::from_promise(aggregator.promise());

    for (int i = 0; i < 5; ++i) {
        std::vector<int> data = fetchDataFromSource(i);
        co_await aggregator.yield_value(std::move(data));
    }

    aggregator.resume();
    result = aggregator.handle_.promise().result;

    co_return result;
}

Conclusion

By utilizing coroutines, we can simplify data aggregation in C++ and make our code more readable and maintainable. Coroutine-based data aggregation enables us to retrieve data from multiple sources concurrently, improving performance and scalability.

With the power of coroutines, developers can take advantage of modern C++ features to build efficient and scalable data aggregation solutions.