Implementing parallel matrix operations with `std::jthread`

In modern computing, leveraging parallelism is crucial for performance, especially in computationally intensive tasks like matrix operations. The C++ Standard Library provides the `std::jthread` class, introduced in C++20, which makes it straightforward to launch worker threads that join automatically. In this blog post, we will explore how to use `std::jthread` to perform matrix operations in parallel.

What are Matrix Operations?

Matrix operations involve performing various mathematical operations on matrices, such as addition, subtraction, multiplication, and more. These operations can be computationally expensive, especially when dealing with large matrices.

The Problem

Let’s focus on matrix multiplication. In a traditional sequential implementation, we iterate over each element of the resulting matrix, computing the dot product of the corresponding row in the first matrix and the corresponding column in the second matrix. This approach runs on a single core, leaving the remaining cores idle and the overall computation slower than it needs to be.
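
For reference, here is a minimal sketch of that sequential version. The function name and the `std::vector<std::vector<int>>` representation are illustrative choices, matching the parallel example later in the post; `result` is assumed to be pre-sized to rows x cols.

#include <cstddef>
#include <vector>

// Sequential baseline: computes result = matrix1 * matrix2.
void multiplyMatricesSequential(const std::vector<std::vector<int>>& matrix1,
                                const std::vector<std::vector<int>>& matrix2,
                                std::vector<std::vector<int>>& result) {
    const size_t rows = matrix1.size();
    const size_t cols = matrix2[0].size();
    const size_t commonDim = matrix1[0].size();

    for (size_t row = 0; row < rows; ++row) {
        for (size_t col = 0; col < cols; ++col) {
            int sum = 0;
            // Dot product of row `row` of matrix1 with column `col` of matrix2.
            for (size_t k = 0; k < commonDim; ++k) {
                sum += matrix1[row][k] * matrix2[k][col];
            }
            result[row][col] = sum;
        }
    }
}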

Parallel Matrix Multiplication

To implement parallel matrix multiplication using `std::jthread`, we can divide the work into smaller sub-tasks and assign them to different threads. Here’s an example code snippet demonstrating the basic idea:

#include <cstddef>
#include <thread>
#include <vector>

void multiplyMatrices(const std::vector<std::vector<int>>& matrix1,
                      const std::vector<std::vector<int>>& matrix2,
                      std::vector<std::vector<int>>& result) {
    const size_t rows = matrix1.size();
    const size_t cols = matrix2[0].size();
    const size_t commonDim = matrix1[0].size();

    // hardware_concurrency() is not constexpr and may return 0,
    // so pick the thread count at run time and fall back to 1.
    size_t numThreads = std::thread::hardware_concurrency();
    if (numThreads == 0) numThreads = 1;

    std::vector<std::jthread> threads;
    threads.reserve(numThreads);

    for (size_t i = 0; i < numThreads; ++i) {
        threads.emplace_back([&, i]() {
            // Each thread handles every numThreads-th row of the result.
            for (size_t row = i; row < rows; row += numThreads) {
                for (size_t col = 0; col < cols; ++col) {
                    int sum = 0;
                    for (size_t k = 0; k < commonDim; ++k) {
                        sum += matrix1[row][k] * matrix2[k][col];
                    }
                    result[row][col] = sum;
                }
            }
        });
    }
}   // the std::jthread destructors join here, so all rows are done on return

In this example, we divide the matrix multiplication across multiple threads using `std::jthread`. Each thread computes a strided subset of the rows of the resulting matrix (thread i handles rows i, i + numThreads, i + 2*numThreads, and so on), so no two threads ever write to the same element. The number of threads is based on `std::thread::hardware_concurrency()`, which returns a hint for how many concurrent threads the hardware supports and may return 0, hence the fallback to 1. Because `std::jthread` joins automatically in its destructor, the function only returns once every worker thread has finished; there is no explicit join() call to forget.
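
To call the function, the result matrix must be allocated up front, since the worker threads only write into existing elements. Here is a minimal sketch of a caller (the dimensions and values are arbitrary example data):

#include <iostream>
#include <vector>

int main() {
    // 2x3 * 3x2 -> 2x2; the values are arbitrary example data.
    std::vector<std::vector<int>> a = {{1, 2, 3}, {4, 5, 6}};
    std::vector<std::vector<int>> b = {{7, 8}, {9, 10}, {11, 12}};
    std::vector<std::vector<int>> result(a.size(), std::vector<int>(b[0].size(), 0));

    multiplyMatrices(a, b, result);  // returns only after all jthreads have joined

    for (const auto& row : result) {
        for (int value : row) {
            std::cout << value << ' ';
        }
        std::cout << '\n';
    }
}

For this input the program prints 58 64 on the first line and 139 154 on the second.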

Conclusion

Using `std::jthread`, we can easily implement parallel computations for matrix operations in C++. By dividing the workload among multiple threads, we can effectively utilize the available CPU cores and achieve better performance. The same approach extends to other operations like matrix addition, subtraction, and more, as the sketch below illustrates.
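
For instance, element-wise addition can reuse the exact same row-striping pattern. This is a rough sketch under the same assumptions as before (equal-sized matrices, result pre-allocated), not a tuned implementation:

#include <cstddef>
#include <thread>
#include <vector>

void addMatrices(const std::vector<std::vector<int>>& matrix1,
                 const std::vector<std::vector<int>>& matrix2,
                 std::vector<std::vector<int>>& result) {
    const size_t rows = matrix1.size();
    const size_t cols = matrix1[0].size();

    size_t numThreads = std::thread::hardware_concurrency();
    if (numThreads == 0) numThreads = 1;  // hardware_concurrency() may return 0

    std::vector<std::jthread> threads;
    threads.reserve(numThreads);
    for (size_t i = 0; i < numThreads; ++i) {
        threads.emplace_back([&, i]() {
            // Thread i adds every numThreads-th row.
            for (size_t row = i; row < rows; row += numThreads) {
                for (size_t col = 0; col < cols; ++col) {
                    result[row][col] = matrix1[row][col] + matrix2[row][col];
                }
            }
        });
    }
}   // jthreads join on destruction before the function returns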

Implementing parallelism in matrix operations is just one example of how `std::jthread` can be used to exploit parallelism in C++. It’s important to understand the nature of the problem and to experiment with different ways of partitioning the work across threads to achieve the best performance.

#techblog #parallelcomputing