Event branch in DepositionReader

Hi,

I’m trying to read data from a ROOT file created with Geant4 with the DepositionReader module, but it just loops over the TTree without doing anything (I only get the start and finish outputs). I tried to look at the code and understand what’s happening, but I’m a bit confused, particularly about this if statement in line 419f:

// Separate individual events
if(static_cast<unsigned int>(*event_->Get()) > event_num - 1) {
    return false;
}

In the first loop, event_num is 1, but my first entry in the event branch is not (it’s 2250). This means that the statement will always be true. Also due to multi-threading, the event branch is sorted numerically (the lowest entry is 13, but it’s not in the beginning), and not every event has an entry. This doesn’t look like it’s fits to that code, but tbh I’m not sure if I understood it right.

The documentation is a bit sparse on what that entry should be. I’m filling the event column in Geant4’s EndOfEventAction with event->GetEventID(). Can you (@simonspa) maybe provide a very simple example for your module? (Some existing code would also be fine, just something that works.)

Hi @stephanlachnit

If I understand you correctly, the order of your events in the input trees are completely random because they were produced with multithreading, right? For reading them in a defined order this is problematic because we would need to search the whole tree for every event, starting at the beginning. As far as I know no random access is possible.

(In our own multithreading code we therefore explicitly buffer output and write events sequentially in the correct order. This will be released as Allpix Squared 2.0)

The lines you quote read the event number of the current leaf from the root tree. If it is larger than the currently processed event, the function returns false, indicating to the caller to finish up this event and to move to the next (via src/modules/DepositionReader/DepositionReaderModule.cpp · master · Allpix Squared / Allpix Squared · GitLab). This ends the reading loop, dispatches all cached information and ends the run() function.

This function is called again by the module manager with the incremented event_num for the subsequent event.

As for an example, have a look at etc/scripts/create_deposition_file.py · master · Allpix Squared / Allpix Squared · GitLab which is a short Python script which generates an example input for the module.

I hope I could help. The module is relatively “bare” so if you have ideas or suggestions how to improve it, please feel free to reach out to us - every code change is welcome. :slight_smile:
/Simon

Thanks for the quick reply!

Kinda, the order of the events itself is random, but the entries of one event are still sequential. If I don’t miss something, this is always the case in Geant4: each thread has its own Ntuple, and they get merged in EndOfRunAction. For example, the entries might be something like 10,11,11,11,13,15,15,2,3,3,3,4,7,9.

I think a valid approach would be to save the first new eventID when going through the tree, and move to the next event if it changes, maybe something like:

if (curr_event_id == -1) {
    curr_event_id = static_cast<int>(*event_->Get());
} else {
    if(static_cast<int>(*event_->Get()) != curr_event_id) {
        curr_event_id = -1;
        return false;
    }
}

I will test if I get somewhere with this.

Ahh thanks, that’s a very handy script to test some stuff.

I do see the technical reason behind this - it’s a lot simpler because each thread simply keeps piling up things and throws it on the haystack at the end. The problem however - in my opinion - arises when you require strong reproducibility. In my reading this also includes the order in which the (individually seeded) events are stored - because any subsequent software can otherwise either drop strong reproducibility (because input comes in random order, and therefore seeding will change every time) or use the event number itself as the last resort to have stable seeding (which is something we’d rather avoid). That’s the situation we have here I guess. :slight_smile:

Having said that, in many cases the strong reproducibility probably isn’t an issue and your suggested solution should work fine and would be a viable option. How about adding a new option

[DepositionReader]
require_sequential_events = false

to resort to the more lax interpretation of event number you suggested?

All the best,
Simon

Good point, didn’t think of that.

That sounds great! If you want I can try coding it up myself, just let me know.

I’d love that! We accept merge requests via CERN GitLab (preferred, but only viable if you have a CERN computing account) or GitHub. Let me know if you need guidance or have follow-up questions - I’m looking forward to receiving your contribution! :smiley:

Simon

Alright I think there is another problem, I’ve been trying to debug the module and it seems to me that there is a serious bug that prevents it from working at all, at least for me.

In line 243f (allpix-squared/DepositionReaderModule.cpp at 737eba52fe13f56134c4e88a7928d1a5103dd40a · allpix-squared/allpix-squared · GitHub), the detector name will never be found.

I added one lines to simplify debugging:

    auto detectors = geo_manager_->getDetectors();
    auto pos = std::find_if(detectors.begin(), detectors.end(), [volume](const std::shared_ptr<Detector>& d) {
        LOG(TRACE) << d->getName() << " " << volume;  // This line is new
        return d->getName() == volume;
    });
    if(pos == detectors.end()) {
        LOG(TRACE) << "Ignored detector \"" << volume << "\", not found in current simulation";
        continue;
    }

And I get the ouput:

(D) [R:DepositionReader] Start reading event 3173
(T) [R:DepositionReader] detector detector
(T) [R:DepositionReader] Ignored detector "detector", not found in current simulation
(I) [R:DepositionReader] Finished reading event 3173

the strings are obviously identical, but the conditions still fails.

I don’t see any coding mistake though, so I’ll have to dig a bit to find out whats wrong, maybe the string comparison fails for some odd reason.

Ok this is one is interesting. I figured out why the check fails, but I’m not sure what the correct solution is.

With:

auto dname = d->getName();

I compared the length of the arrays:

(gdb) p dname.length()
$5 = 8
(gdb) p volume.length()
$6 = 9

I’m not sure why the length of volume is longer. It includes a 0 at end, so nothing that will be printed. Maybe this has something to with ROOT? In Geant4 I write

const G4String detectorName = "detector";

in the NtupleSColumn, so this must be something internal.

As a temporary hack I use

return !(d->getName().compare(volume.c_str()))

which solves this problem. I’m not sure if this is a proper solution though. What’s your opinion?

Ouch, that’s a good catch… This is a C string null termination (you should have \0, not just 0 at the end) versus C++ std::string. Using the string comparison should be fine, not only a hack.

Ok good, then I will include it in the PR on Github. During my first testing everything seems to work, but I want to give it a last test just to be sure.