Multithread not working: analyze the output root file with multi-thread

Dear experts,

I would like to process the root file generated from allpix using multithreading. But it turns out that I couldn’t make it properly. Could you help me take a look at my code?

I have two scripts (but it told me new users are not allowed to upload files). One is to create a regular root file with a regular TTree file (You can find the script here by clicking to expand

Click to expand
#include <TROOT.h>
#include <TFile.h>
#include <TTree.h>
#include <TH2D.h>
#include <TH1D.h>
#include <TStopwatch.h>
#include <iostream>
#include <thread>

// Get thread count (cores - 1)
int get_optimal_threads() {
    unsigned int cores = std::thread::hardware_concurrency();
    return (cores > 1) ? cores - 1 : 1;
}

// Create test data
void create_test_data() {
    TFile testFile("test_data.root", "RECREATE");
    TTree testTree("testTree", "Test Tree");
    std::vector<double> x, y, z;
    testTree.Branch("x", &x);
    testTree.Branch("y", &y);
    testTree.Branch("z", &z);

    for (int i = 0; i < 2000000; i++) {  // 20,000 events
        int nPoints = 100;  // Fixed points per event
        x.resize(nPoints);
        y.resize(nPoints);
        z.resize(nPoints);

        for (int j = 0; j < nPoints; j++) {
            x[j] = gRandom->Gaus(10, 2);
            y[j] = gRandom->Gaus(10, 2);
            z[j] = gRandom->Uniform(0, 100);
        }
        testTree.Fill();
    }
    testFile.Write();
    testFile.Close();
}

// Multi-threaded processing with core-1 threads
void multithread_processing() {
    TStopwatch timer;
    timer.Start();

    int threads = get_optimal_threads();
    std::cout << "Using " << threads << " threads (cores-1)\n";

    ROOT::EnableImplicitMT(threads);
    TFile inputFile("test_data.root");
    TTree* tree = (TTree*)inputFile.Get("testTree");

    ROOT::TThreadedObject<TH2D> hitmap("mt_hitmap", "MT Hitmap;X;Y", 40, 0, 20, 40, 0, 20);
    ROOT::TThreadedObject<TH1D> zhist("mt_zhist", "MT Z Distribution;Z;Counts", 100, 0, 100);

    ROOT::TTreeProcessorMT processor(*tree);
    processor.Process([&](TTreeReader& reader) {
        TTreeReaderArray<double> xReader(reader, "x");
        TTreeReaderArray<double> yReader(reader, "y");
        TTreeReaderArray<double> zReader(reader, "z");

        auto localHitmap = hitmap.Get();
        auto localZhist = zhist.Get();

        while (reader.Next()) {
            for (size_t i = 0; i < xReader.GetSize(); i++) {
                localHitmap->Fill(xReader[i], yReader[i]);
                localZhist->Fill(zReader[i]);

                // Alternative: Actual computation (e.g., uncomment below)
                double sum = 0.0;
                for (int i = 0; i < 100; ++i) {
                    sum += std::sqrt(i) * std::sin(i);
                }
                volatile double sink = sum; // Prevent optimization

            }
        }
    });

    TFile output("mt_results.root", "RECREATE");
    hitmap.Merge()->Write();
    zhist.Merge()->Write();
    output.Close();
    inputFile.Close();

    timer.Stop();
    std::cout << "Multi-threaded processing time (" << threads << " threads): " 
              << timer.RealTime() << " seconds (real), "
              << timer.CpuTime() << " seconds (CPU)\n";
}

// Single-threaded processing
void singlethread_processing() {
    TStopwatch timer;
    timer.Start();

    TFile inputFile("test_data.root");
    TTree* tree = (TTree*)inputFile.Get("testTree");

    TH2D hitmap("st_hitmap", "ST Hitmap;X;Y", 40, 0, 20, 40, 0, 20);
    TH1D zhist("st_zhist", "ST Z Distribution;Z;Counts", 100, 0, 100);

    TTreeReader reader(tree);
    TTreeReaderArray<double> xReader(reader, "x");
    TTreeReaderArray<double> yReader(reader, "y");
    TTreeReaderArray<double> zReader(reader, "z");

    while (reader.Next()) {
        for (size_t i = 0; i < xReader.GetSize(); i++) {
            hitmap.Fill(xReader[i], yReader[i]);
            zhist.Fill(zReader[i]);
        }
    }

    TFile output("st_results.root", "RECREATE");
    hitmap.Write();
    zhist.Write();
    output.Close();
    inputFile.Close();

    timer.Stop();
    std::cout << "Single-threaded processing time: " 
              << timer.RealTime() << " seconds (real), "
              << timer.CpuTime() << " seconds (CPU)\n";
}

// Comparison function
void compare_multithread_and_singlethread_notAllpixInputroot() {
    // Create test data first
    std::cout << "Creating test data...\n";
    create_test_data();
    unsigned int cores = std::thread::hardware_concurrency();
    std::cout << "Created test data with 20,000 events (2M points)\n";
    std::cout << "System has " << cores << " CPU cores\n";

    // Run both versions
    std::cout << "\nTesting multi-threaded processing...\n";
    multithread_processing();

    std::cout << "\nTesting single-threaded processing...\n";
    singlethread_processing();
}

// To run: compare_threading()

// Output
// Testing multi-threaded processing...
// Using 15 threads (cores-1)
// Multi-threaded processing time (15 threads): 15.16 seconds (real), 70.68 seconds (CPU)

// Testing single-threaded processing...
// Single-threaded processing time: 85.9569 seconds (real), 62.18 seconds (CPU)

, but this is not important, just to create the input if you need).

Another file is using multithread to process the allpix input OR the regular TTree file I created.

#include <TROOT.h>
#include <TSystem.h>
#include <TFile.h>
#include <TTree.h>
#include <vector>
#include <iostream>
#include <TH2D.h>
#include <TH1D.h>
#include <ROOT/TThreadedObject.hxx>
#include <TStopwatch.h>

using namespace ROOT;

int get_optimal_threads() {
    unsigned int cores = std::thread::hardware_concurrency();
    return (cores > 1) ? cores - 1 : 1;
}

void multithread_withAllpixInput_notworking() {
    // Enable batch mode
    gROOT->SetBatch(kTRUE);
    
    std::string dut = "detector1";
    const char* input_path = "data.root";
    const char* output_path = "simplified_analysis.root";
    
    TStopwatch timer;
    timer.Start();
    
    // Create output file
    TFile* output_file = TFile::Open(output_path, "RECREATE");
    if (!output_file || output_file->IsZombie()) {
        std::cerr << "Error creating output file!" << std::endl;
        return;
    }
    
    // This is NOT working for multithread
    TFile input_file(input_path);
    TTree* pixel_hit_tree = (TTree*)input_file.Get("PixelHit");

    // This is working for multithread
    // TFile inputFile("test_data.root");
    // TTree* pixel_hit_tree = (TTree*)inputFile.Get("testTree");

    // Setup multi-threading
    int threads = get_optimal_threads();
    ROOT::EnableImplicitMT(threads);
    std::cout << "Processing with " << threads << " threads\n";
    
    // Create thread-safe histograms
    TThreadedObject<TH2D> threadedHitmap(
        "hitmap", "Hitmap;x [mm];y [mm];Counts", 200, 0, 20, 200, 0, 20
    );
    TThreadedObject<TH1D> threadedSpectrum(
        "spectrum", "PixelHit Signal;Signal;Counts", 200, 0, 100000
    );

    // Process tree
    TTreeProcessorMT processor(*pixel_hit_tree);
    processor.Process([&](TTreeReader &reader) {
        while (reader.Next()) {
               // dummy test
                double sum = 0.0;
                for (int i = 0; i < 100000; ++i) {
                    sum += std::sqrt(i) * std::sin(i);
                }
                volatile double sink = sum; // Prevent optimization
        }
    });
    
    // Merge and save results
    output_file->cd();
    
    // Cleanup
    output_file->Close();
    input_file.Close();
    
    timer.Stop();
    std::cout << "\nSimplified analysis completed!" << std::endl;
    std::cout << "Output saved to: " << output_path << std::endl;
    std::cout << "Processing time: "
              << timer.RealTime() << "s real, "
              << timer.CpuTime() << "s CPU" << std::endl;
}

When I run the Allpix input root the CPU is always below 100%, but if I use the regular root file (you can uncomment the lines (//TFile inputFile(“test_data.root”); // TTree* pixel_hit_tree = (TTree*)inputFile.Get(“testTree”):wink: in the code above), my CPU usage can be 300%. Could you help me with that? That would be very appreciated!!

Best,
Yingjie

Hi @weiy

I’m not 100% sure what the reason might be, and maybe it’s worthwhile to ask in the ROOT forum.

However, one reason might be our use of TRef objects to cross-reference simulation objects. These references are stored by ROOT internally using a global table with a lock - which might be the locking issue you are experiencing. For us that’s the reason we are using the PointerWrapper class which allows us to resolve them once and then continue with raw pointers in a multithreaded context.

Try calling loadHistory() on the objects after reading them, as we do in the ROOTObjectReader module.

Best,
Simon