Many crashes on later Linux builds

Server version: FXServer-master v1.0.0.3309 linux
Dump files: crashes.zip (3.0 MB)

Logs of some crashes:
This one occurred around 4 times in one day:

* Assertion at mini-exceptions.c:2889, condition `gaddr == tls->stack_ovf_guard_base' not met

network thread hitch warning: timer interval of 224 milliseconds
sync thread hitch warning: timer interval of 223 milliseconds



=================================================================
FXServer crashed.
A dump can be found at /opt/cfx-server/crashes/b818c488-02d7-47e8-29ed9da1-c30948f8.dmp.
Crash report ID: 21cf4799-dd5c-4faa-9c6d-0f5373baaffa


=================================================================

This one also occurred around 4 times per day:

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
sync thread hitch warning: timer interval of 216 milliseconds
server thread hitch warning: timer interval of 223 milliseconds


=================================================================
FXServer crashed.
A dump can be found at /opt/cfx-server/crashes/0423939c-09be-47f2-78dfa696-9c28d5be.dmp.
Crash report ID: 188d0657-c064-4c0d-8569-0a463522aa62

=================================================================
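For anyone sorting through many logs like the two above, here is a minimal Python sketch that pairs each dump path with its crash report ID. The patterns are modelled only on the lines quoted in this post, so the wording in other builds is an assumption, and `extract_crashes` is a hypothetical helper name, not part of FXServer.

```python
import re

# Patterns modelled on the log lines quoted above; other builds may word
# these lines differently (assumption).
DUMP_RE = re.compile(r"A dump can be found at (\S+\.dmp)\.")
REPORT_RE = re.compile(r"Crash report ID: ([0-9a-fA-F-]+)")
# Matches ANSI colour codes such as ESC[31m and ESC[0m.
ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")


def extract_crashes(log_text):
    """Return a list of (dump_path, report_id) pairs found in log_text.

    dump_path is None when a report ID appears without a preceding dump line.
    """
    crashes = []
    pending_dump = None
    for line in ANSI_RE.sub("", log_text).splitlines():
        dump = DUMP_RE.search(line)
        if dump:
            pending_dump = dump.group(1)
            continue
        report = REPORT_RE.search(line)
        if report:
            crashes.append((pending_dump, report.group(1)))
            pending_dump = None
    return crashes


if __name__ == "__main__":
    sample = (
        "\x1b[31mFXServer crashed.\x1b[0m\n"
        "A dump can be found at /opt/cfx-server/crashes/"
        "b818c488-02d7-47e8-29ed9da1-c30948f8.dmp.\n"
        "Crash report ID: 21cf4799-dd5c-4faa-9c6d-0f5373baaffa\n"
    )
    for dump_path, report_id in extract_crashes(sample):
        print(report_id, dump_path)
```

Matching line by line rather than zipping two separate lists keeps a dump and its report ID together even if one of the two lines is missing from a crash block.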

These are two issues that seem completely unrelated. I have absolutely no idea what could cause them, as they seem completely random.

The only clear factor I have is that it requires a lot of players. The crashes never happen on any of my testing and development servers (neither Windows nor Linux), only on the production server, which has around 300 players online.

The development server runs completely fine; crashes appear to start happening above 200 players.

I’ve been running the same build for around 4 months now, as updating isn’t feasible at this rate of crashing. Since I finally fixed minidumps (it seems AppArmor and kernel hardening were blocking them), I can finally share some info about the crashes.

The later builds (from around the last month) do appear to be more stable than the other builds, so there’s less crashing than there used to be.

Then please use them and report if crashes still happen. The latest build is 3322.

Some C# script doing bullshit.

And running out of memory?

How does this even make any sense? :confused:

Anyway, Linux dump files are nearly entirely useless, and since you’re claiming this isn’t reproducible it might be better to switch to Windows or so since we probably won’t be able to fix this without magically having ‘300 players’ on a Linux machine on-demand for apparently ‘an entire day’ (as 4x/day sounds like you need 300+ players for 6 hours?).

Also, mind you, ‘sticking to old builds’ is not a solution at all, especially since builds will expire at some point in the near future.

Yes, that’s the thing: I do realise that debugging this is kinda hard.

The server is not out of memory: it’s using around 10 GB out of 64 GB. The memory is ECC, so memory errors also wouldn’t make sense.
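For reference, `std::bad_alloc` only means that an allocation request failed, and that can happen while plenty of physical RAM is still free, for example when an address-space limit or the kernel's per-process mapping limit is hit. Whether either applies here is purely a guess. A rough Python sketch for checking those limits on the Linux host (the helper names are hypothetical):

```python
import resource
from pathlib import Path


def read_sysctl(name):
    """Read an integer sysctl from /proc/sys, or return None if unavailable."""
    path = Path("/proc/sys") / name.replace(".", "/")
    try:
        return int(path.read_text().split()[0])
    except (OSError, ValueError, IndexError):
        return None


def allocation_limits():
    """Collect limits that can make allocations fail despite free RAM.

    RLIMIT_AS is read for the current process only; the server inherits its
    own limits from whatever starts it, so /proc/<pid>/limits of the running
    server is the authoritative place to look.
    """
    soft_as, _hard_as = resource.getrlimit(resource.RLIMIT_AS)
    return {
        "rlimit_as_soft": None if soft_as == resource.RLIM_INFINITY else soft_as,
        "vm.max_map_count": read_sysctl("vm.max_map_count"),
        "vm.overcommit_memory": read_sysctl("vm.overcommit_memory"),
    }


if __name__ == "__main__":
    for key, value in allocation_limits().items():
        shown = value if value is not None else "unlimited or unavailable"
        print(f"{key}: {shown}")
```

If `vm.max_map_count` turns out to be low relative to the number of mappings the server holds (the line count of `/proc/<pid>/maps`), that would be one candidate explanation worth ruling out.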

Would Linux core dumps be of any help at all? I also have those for every single crash.

Build 3269 is the one I’ve switched to for now; it’s the latest build that is stable (it does use a lot more memory, around 22 GB, but that’s still manageable).
Switching to this build netted a reduction of about 20% in CPU and 50% in network usage, so it’s worth the extra RAM usage.

That is curious, I don’t think there were even any notable server changes since then at all, so that makes the issue even weirder.

If this can help: we recently switched our server from Windows to Linux and had this crash after 1 hour of uptime with approximately 100+ players on the latest artifacts (3340). After that we switched to 3221 and had no crashes with 250+ players over 5 hours of uptime.

No, that can’t “help”, and why did you “switch to Linux”? And stop fucking downgrading for no fucking reason. ._.

Use Windows. Use latest. Don’t do counterproductive shit and think it ‘can help’.

Also, stop using a dice roll to determine which server version to use, seriously…?

That was a shitty decision, I agree. We switched because of the sync thread hitch warning spam problem; when I say “we”, I mean my mate Azk’, who told you about this problem.
The fact is it’s not even fixed on Linux either (since it started appearing when we switched to Windows, we thought it was coming from Windows), so we just downgraded like you said.

Not a dice roll: our friend Cookay’ told us it’s stable, and we didn’t want to test every single version because the community had been waiting for a long time :wink:

This has been worked around for a while now, pending an allocator change so that the performance fix can be put back in place.

I can say from experience that the latest builds are really stable (running 3371 at the moment).

Had like one or two crashes in the last week (which were unrelated to each other and unrelated to bad_alloc), but besides that it has been completely stable.