Bad_alloc error

psmolyan · September 15, 2021, 7:51am

Dear colleagues,

I am simulating the response of the Si-Timepix3 detector to 219 MeV protons, but sometimes I get the following error:

|22:02:52.756|   (FATAL) Fatal internal error
                         std::bad_alloc
                         Cannot continue.

I do not know exactly where to look for a more detailed log to find the problem. Perhaps you can advise what the reason for this might be.

My config is below:

[Allpix]
log_level = "INFO" # INFO or WARNING or DEBUG see chapter 4.6 of manual
log_format = "DEFAULT" # DEFAULT or SHORT
detectors_file = "Si_geometry.conf"
model_path = "models"
output_directory = "/mnt/35251/data/Allpix2_simulation/protons/protons_219MeV_60deg/data"
root_file = "modules_Si500um_protons_219MeV_60deg_200_generated_events_0.root"
number_of_events = 200

[GeometryBuilderGeant4]

[DepositionGeant4]
physics_list = FTFP_BERT_LIV
particle_type = "ion/1/1/0/0eV"
source_type = "beam"
source_position = 0um 0um -5mm
source_energy = 219MeV
beam_size = 2mm
beam_direction = 0 0 1
beam_divergence = 0.5mrad 0.5mrad
number_of_particles = 1
max_step_length = 5um
range_cut = 200um

[ElectricFieldReader]
model = "linear"
bias_voltage = 200V
depletion_voltage = 80V
output_plots = true

[WeightingPotentialReader]
name = "detector1"
model = "pad"
output_plots = true

[TransientPropagation]
name = "detector1"
temperature = 315K
charge_per_step = 200
timestep = 1ns
integration_time = 40ns # 40ns for 500um
induction_matrix = 5 5 # 7x7 for 500um

[PulseTransfer]

[ROOTObjectWriter]
exclude = PropagatedCharge
file_name = "data_Si500um_protons_219MeV_60deg_200_generated_events_0.root"

Thank you in advance!
Best regards,
Petr Smolyanskiy

simonspa · September 15, 2021, 10:29am

Hi @psmolyan

this message means the program could not allocate the memory it requested and had to abort. Are you particularly low on RAM when running these simulations? Because in your config I cannot spot anything that would require an unusually large amount of RAM.

One note though, your particle_type should probably be

particle_type = "proton"

instead, even though of course from a physics perspective the two should be equivalent.

If you manage to reproduce this, we could use the debugger gdb to find out where it comes from:

gdb --args allpix -c your_config.conf
(gdb) catch throw bad_alloc
(gdb) run
# ... runs, crashes and catches the exception
(gdb) backtrace
# shows the backtrace where the exception comes from

Please then post that output here.
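
For a non-interactive capture, the same gdb commands can be passed on the command line. The following is a minimal sketch in Python, assuming gdb and allpix are on the PATH; the helper name `build_gdb_command` and the config file name are placeholders and not part of Allpix Squared. It relies only on gdb's standard `-batch`, `-ex` and `--args` options.

```python
import shlex


def build_gdb_command(config_file, executable="allpix"):
    """Build a gdb command line that replays the interactive session above.

    Hypothetical helper for illustration only.
    """
    return [
        "gdb",
        "-batch",                        # exit once all commands have run
        "-ex", "catch throw bad_alloc",  # stop where the exception is thrown
        "-ex", "run",
        "-ex", "backtrace",              # executed after the catchpoint is hit
        "--args", executable, "-c", config_file,
    ]


command = build_gdb_command("your_config.conf")
print(shlex.join(command))
# To run it for real and keep the output for posting:
#   import subprocess
#   subprocess.run(command, check=False)
```

Redirecting the output of the printed command to a file gives a log that can be attached to a post.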

All the best,
Simon

psmolyan · September 15, 2021, 1:40pm

Hello Simon,

Thank you very much for the suggestion!

I finally managed to reproduce the error.

|15:33:20.512|   (FATAL) Fatal internal error
                         std::bad_alloc
                         Cannot continue.
[Inferior 1 (process 1569364) exited with code 0177]
Missing separate debuginfos, use: debuginfo-install bzip2-libs-1.0.6-13.el7.x86_64 elfutils-libelf-0.176-2.el7.x86_64 elfutils-libs-0.176-2.el7.x86_64 expat-2.1.0-12.el7.x86_64 fontconfig-2.13.0-4.3.el7.x86_64 freetype-2.8-14.el7_9.1.x86_64 glib2-2.56.1-9.el7_9.x86_64 glibc-2.17-322.el7_9.x86_64 graphite2-1.3.10-1.el7_3.x86_64 harfbuzz-1.7.5-2.el7.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-37.el7_6.x86_64 libICE-1.0.9-9.el7.x86_64 libSM-1.2.2-2.el7.x86_64 libX11-1.6.7-4.el7_9.x86_64 libXau-1.0.8-2.1.el7.x86_64 libXext-1.3.3-3.el7.x86_64 libXft-2.3.2-2.el7.x86_64 libXmu-1.1.2-2.el7.x86_64 libXp-1.0.2-2.1.el7.x86_64 libXrender-0.9.10-1.el7.x86_64 libXt-1.1.5-3.el7.x86_64 libattr-2.4.46-12.el7.x86_64 libcap-2.22-8.el7.x86_64 libcom_err-1.42.9-19.el7.x86_64 libgcrypt-1.5.3-13.el7_3.1.x86_64 libglvnd-1.0.1-0.8.git5baa1e5.el7.x86_64 libglvnd-glx-1.0.1-0.8.git5baa1e5.el7.x86_64 libgpg-error-1.12-3.el7.x86_64 libicu-50.2-4.el7_7.x86_64 libjpeg-turbo-1.2.90-8.el7.x86_64 libpng-1.5.13-8.el7.x86_64 libselinux-2.5-14.1.el7.x86_64 libuuid-2.23.2-33.el7_3.2.x86_64 libxcb-1.13-1.el7.x86_64 lz4-1.7.5-2.el7.x86_64 motif-2.3.4-8.1.el7_3.x86_64 ncurses-libs-5.9-13.20130511.el7.x86_64 openssl-libs-1.0.2k-21.el7_9.x86_64 pcre-8.32-15.el7_2.1.x86_64 pcre2-utf16-10.23-2.el7.x86_64 qt5-qtbase-5.9.7-5.el7_9.x86_64 qt5-qtbase-gui-5.9.7-5.el7_9.x86_64 systemd-libs-219-78.el7.x86_64 xerces-c-3.1.1-10.el7_7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) backtrace
No stack.

Concerning RAM: about 60 GB was free at the time of the simulation.
I guess particle_type is not the problem, because I’ve got the same error for “ion/2/4/0/0eV”.

Regards,
Petr

simonspa · September 15, 2021, 5:10pm

Hi @psmolyan

wow okay, with 60GB of RAM available this is very odd behavior. May I ask which Allpix Squared, ROOT and Geant4 versions you are using? If you also provide the geometry file (and models if necessary) I will see if I can reproduce this!

The debugging unfortunately didn’t work out and I don’t exactly understand why.

Best,
Simon

psmolyan · September 16, 2021, 7:26am

Hello Simon,

I’m using root-6.22.02, geant-4.10.07p1 and allpix-squared-1.6.0. As far as I remember, though, Allpix2 was recompiled several times with different ROOT and Geant4 versions, and the problem stayed the same.
The geometry and model files are rather standard; I have attached them.

timepix.conf (359 Bytes)
Si_geometry.conf (94 Bytes)

Best regards,
Petr

simonspa · September 20, 2021, 6:56am

Hi @psmolyan

with the help of your configuration files I was able to reproduce the issue and to obtain the stack trace shown below. Reading it carefully reveals that the program throws a bad_alloc exception when it is expected to allocate a vector with about 4x10^11 entries, so it is quite understandable that even your 60GB of memory cannot provide that.
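
As a rough cross-check of that statement, the memory needed for such a vector can be estimated directly. This sketch takes the requested size from the backtrace in this post (`__new_size=399172664994`) and assumes 8 bytes per `double`, the usual size on x86_64; the 60 GB figure is the free RAM quoted earlier in the thread.

```python
# Requested vector size, copied from frame #6 of the backtrace.
requested_entries = 399172664994
bytes_per_double = 8           # assumption: 64-bit IEEE double

bytes_needed = requested_entries * bytes_per_double
free_ram_bytes = 60 * 10**9    # about 60 GB free, as reported in the thread

print(f"needed: {bytes_needed / 1e12:.2f} TB")
print(f"factor over free RAM: {bytes_needed / free_ram_bytes:.0f}x")
```

The request comes to roughly 3.2 TB, about fifty times the memory that was free.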

It comes from a very late entry in a pulse, and I am still puzzled and investigating where this originates. The time corresponds to something like 4s after the initial interaction in the corresponding event (about 3.99x10^9 in the backtrace, taking the framework's internal time unit to be ns), and your integration_time is properly set to 40ns in the configuration, so we should not even see this.
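
To illustrate how a single late time stamp turns into such a huge allocation, the sketch below redoes the bin arithmetic with the values from the backtrace in this post (`time=3991726649.9340253`, `bin_ = 0.01`, `bin = 399172664993`). It assumes that the bin index is the time divided by the bin width, rounded down, and that the pulse vector is resized to reach that bin. This mirrors the numbers in the backtrace but is not a copy of the Pulse.cpp code. It also assumes that the time and the bin width are in the same unit as the 40ns integration_time.

```python
import math

time_of_charge = 3991726649.9340253  # 'time' argument in frame #7
bin_width = 0.01                     # 'bin_' member of the pulse in frame #8
integration_time = 40.0              # from the config, assumed same unit

# Assumed binning rule: index = floor(time / width)
bin_index = math.floor(time_of_charge / bin_width)
entries_needed = bin_index + 1       # the vector must reach that bin

# Number of bins one would expect if nothing arrived after the integration time
expected_bins = round(integration_time / bin_width)

print(bin_index, entries_needed, expected_bins)
print(f"oversize factor: {entries_needed / expected_bins:.1e}")
```

Under these assumptions the result reproduces the 399172664994 entries requested in frame #6, against about 4000 bins for a 40ns window.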

I will continue to investigate and let you know once I have a better lead - or even: a fix for you.

In the meantime it might be worth considering a move to Allpix Squared 2.0; only minor changes to your configuration files should be necessary, if any at all.

All the best,
Simon

Backtrace:

Catchpoint 1 (exception thrown), 0x00007ffff0da57d2 in __cxa_throw ()
   from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) thread apply all bt full

Thread 1 (Thread 0x7fffed041180 (LWP 1249413) "allpix"):
#0  0x00007ffff0da57d2 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#1  0x00007ffff0d99641 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
No symbol table info available.
#2  0x00007ffff7f92c14 in __gnu_cxx::new_allocator<double>::allocate (__n=<optimized out>, this=<optimized out>) at /usr/include/c++/10/ext/new_allocator.h:115
        __al = <optimized out>
#3  std::allocator_traits<std::allocator<double> >::allocate (__a=..., __n=<optimized out>) at /usr/include/c++/10/bits/alloc_traits.h:460
No locals.
#4  std::_Vector_base<double, std::allocator<double> >::_M_allocate (__n=<optimized out>, this=<optimized out>) at /usr/include/c++/10/bits/stl_vector.h:346
No locals.
#5  std::vector<double, std::allocator<double> >::_M_default_append (this=0x7fffffffc248, __n=399172664994) at /usr/include/c++/10/bits/vector.tcc:635
        __len = <optimized out>
        __new_start = <optimized out>
        __size = 0
        __navail = <optimized out>
#6  0x00007ffff230d722 in std::vector<double, std::allocator<double> >::resize (__new_size=399172664994, this=0x7fffffffc248) at /usr/include/c++/10/bits/stl_vector.h:940
No locals.
#7  allpix::Pulse::addCharge (this=this@entry=0x7fffffffc240, charge=<optimized out>, time=time@entry=3991726649.9340253) at /home/simonspa/software/allpix-squared/src/objects/Pulse.cpp:26
        bin = 399172664993
#8  0x00007ffff7d50907 in allpix::PulseTransferModule::run (this=0x55555826a1d0, event_num=741) at /home/simonspa/software/allpix-squared/src/modules/PulseTransfer/PulseTransferModule.cpp:139
        model = std::shared_ptr<allpix::DetectorModel> (use count 4, weak count 0) = {get() = 0x5555582695d0}   
        xpixel = 141
        pixel_index = {fCoordinates = {fX = 141, fY = 137}}
        pulse = {_vptr.Pulse = 0x7ffff2332710 <vtable for allpix::Pulse+16>, static fgIsA = {_M_b = {_M_p = 0x55555f898db0}, static is_always_lock_free = <optimized out>}, pulse_ = std::vector of length 0, capacity 0, bin_ = 0.01, initialized_ = true}
        px = std::vector of length 0, capacity -577447673940515225
        position = {fCoordinates = {fX = 7.7417448739270434, fY = 7.5115881659079635, fZ = 0.14959240173177868}}
        ypixel = <optimized out>
        pulses = std::map with 0 elements
[...]

simonspa · September 20, 2021, 7:42am

Hi @psmolyan

this issue was actually fixed in v1.6.1, the patch version directly following the one you are currently using (see: Generic/Tr). If you’re hesitant to move to v2.0.1 yet, I would strongly recommend at least updating to v1.6.2, which contains several bug fixes over v1.6.0 and is 100% backwards-compatible with it.

Please let me know if this fixes the problem for you. If there are still problems appearing with the updated version, don’t hesitate to come back to us!

Cheers,
Simon

psmolyan · September 20, 2021, 8:57am

Hello Simon,

Thank you very much for your help! I will arrange the move to v2.0.1 on the server.

Best regards,
Petr

Bail · October 10, 2022, 3:33am

Hi @psmolyan
I had the same problem, did you finally solve it?

Best regards.
Bai

simonspa · October 17, 2022, 12:39pm

Hi @Bail

see my post above: this was fixed in v1.6.1, so a long time ago. Please use a recent release version. If it still occurs with a newer version, it might be a different issue; in that case please provide us with some details (allpix --version and the backtrace) so we can go hunting!

Simon