Random heap corruption (regression?) on servers

nta · January 7, 2023, 7:39am

Try using something like NSSM (https://nssm.cc/) to run it decoupled from the actual interactive session.

Plouffe · January 18, 2023, 7:51pm

Just throwing this out there, but could this be caused by ‘SaveResourceFile’?

nta · January 18, 2023, 9:42pm

What makes you think this to be the case?

Plouffe · January 18, 2023, 10:05pm

I have a resource that, once the server has started, creates 3 pretty big JSON files as a ‘Backup’, and it backs up once every 30 minutes.

As the server sometimes crashes right on startup, I figured this could be linked to the issue.
I also remember seeing an error that mentioned something about write access.
It might not be anywhere near my issue, but I’m still searching.

I just rewrote the whole thing to save this data with MySQL to test it out.

Also, I’m not sure if you saw it, or maybe it wasn’t conclusive, but I uploaded a crash dump that was logged by FXServer in my last post.
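One way to test the ‘write access’ suspicion is to look at what each save call returns. Below is a minimal sketch, assuming SaveResourceFile returns a falsy value when a write fails. The helper name saveBackups and the file names are hypothetical, and the save function is passed in so the snippet also runs outside FXServer.

```lua
-- Hypothetical helper: writes each backup and collects the names of files
-- whose save call reported failure.
-- saveFile is injected; on a server it would be the SaveResourceFile native
-- (resourceName, fileName, data, dataLength), where -1 means "use full length".
function saveBackups(saveFile, resourceName, backups)
    local failed = {}
    for fileName, contents in pairs(backups) do
        local ok = saveFile(resourceName, fileName, contents, -1)
        if not ok then
            failed[#failed + 1] = fileName
        end
    end
    table.sort(failed) -- pairs() order is unspecified, so sort for stable output
    return failed
end

-- Stand-in for the native so the example runs in plain Lua:
-- it pretends that writing 'vehicles.json' fails.
local function fakeSave(_, fileName)
    return fileName ~= 'vehicles.json'
end

local failed = saveBackups(fakeSave, 'my_backup', {
    ['players.json'] = '{}',
    ['vehicles.json'] = '{}',
    ['houses.json'] = '{}',
})
print(#failed, failed[1])
```

On a server, passing SaveResourceFile as the first argument and printing the returned list after each 30-minute run would show whether a crash lines up with a failed write.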

squizer · January 19, 2023, 10:23pm

We don’t use SaveResourceFile much. We handle everything through a MySQL database, but it still crashes sometimes.

BennoBaba · February 9, 2023, 7:32pm

@Plouffe Did you have any luck with this one? Did you fix the crashing?

I did pretty much everything: removed most of the StateBags, changed hosting, used older artifacts.
The only difference is that I now get:

[ citizen-server-impl] Server list query returned an error: System.Threading.Tasks.TaskCanceledException: A task was canceled. <- System.TimeoutException: A task was canceled. <- System.Threading.Tasks.TaskCanceledException: The request was canceled due to the configured HttpClient.Timeout of 30 seconds elapsing.

Unhandled Exception:
System.NullReferenceException: Object reference not set to an instance of an object
Unhandled exception in Mono script environment: System.NullReferenceException: Object reference not set to an instance of an object
(null)> txaEvent "serverShuttingDown" "{＂delay＂:5000,＂author＂:＂txAdmin＂,＂message＂:＂Server is shutting down: (Server stopped).＂}"

Also, @nta, does this line have any meaning to you?

Feb 10 18:44:13 5600x kernel: [97146.307251] traps: luv_tcp5[212245] general protection fault ip:7fd02ecf09aa sp:7fd008e7d360 error:0 in ld-musl-x86_64.so.1[7fd02ecdf000+4b000]

That is the message that appears in the kernel log right when the server crashes.
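For reference, the faulting address in that line can be turned into an offset inside the module, which is what a symbolizer needs. This is a small sketch, assuming the usual Linux trap format in which the bracket after the module name holds its load base and size in hex. parseTrap is a hypothetical helper name.

```lua
-- Hypothetical helper: extracts the module name and the offset of the
-- faulting instruction pointer inside that module from a kernel "traps:" line.
function parseTrap(line)
    local ip = line:match('ip:(%x+)')
    local modName, base, size = line:match(' in ([^%s%[]+)%[(%x+)%+(%x+)%]')
    if not (ip and modName) then
        return nil
    end
    local offset = tonumber(ip, 16) - tonumber(base, 16)
    -- Sanity check: the fault address should fall inside the mapping.
    if offset < 0 or offset >= tonumber(size, 16) then
        return nil
    end
    return modName, offset
end

local line = 'traps: luv_tcp5[212245] general protection fault '
    .. 'ip:7fd02ecf09aa sp:7fd008e7d360 error:0 '
    .. 'in ld-musl-x86_64.so.1[7fd02ecdf000+4b000]'

local modName, offset = parseTrap(line)
print(string.format('%s+0x%x', modName, offset)) -- ld-musl-x86_64.so.1+0x119aa
```

The subtraction is 0x7fd02ecf09aa - 0x7fd02ecdf000 = 0x119aa, so the fault is at offset 0x119aa inside the musl libc mapping.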

EDIT: I also noticed that the crash usually occurs RIGHT AFTER a heartbeat:

[ citizen-server-impl] Sending heartbeat to https://servers-ingress-live.fivem.net/ingress
[ citizen-server-impl] sync thread hitch warning: timer interval of 102 milliseconds


=================================================================
FXServer crashed.
A dump can be found at /root/FIVEM/MAIN/alpine/opt/cfx-server/crashes/7a3e30b6-6309-4a1e-12abd89a-da288941.dmp.
Crash report ID: bc54b898-55ee-417b-81d0-bfc57a5c0d20
=================================================================
> txaEvent "serverShuttingDown" "{＂delay＂:5000,＂author＂:＂txAdmin＂,＂message＂:＂Server se restartuje: (Server se zaustavio).＂}"

squizer · February 12, 2023, 10:28am

On our server, it happens when there are around 190 players, and people say the server crashes when a blimp goes down or someone blows up the gas station.

nta · February 12, 2023, 11:04am

Entirely unrelated to the crashes that have been discussed in this topic so far. If that is a thing, you should provide dumps for that scenario separately.

gottfriedleibniz · February 24, 2023, 2:07am

Semi-related (but not really… was looking through the uses of HttpResponse).

Should ResourceHttpComponent be ending or closing the HttpResponse if there is no handler associated with a resource?

nta · February 24, 2023, 2:50pm

In theory, this would be fine once all references vanish… but this is indeed a little bit weird-looking.

nta · March 11, 2023, 11:05am

There was another thread about this running in parallel; it has been closed and now redirects here:

(yes, the other thread is technically older but this one had more recent activity and is the one that usually ends up found instead)

HypeRP.pl · March 11, 2023, 11:43am

We found out that we get rate-limited every time before a crash. I also think it’s related to the network: why else would RDP hang before the crash on one server, why do we have no logs before the crash, why is there always a network hitch, etc.? Here are some screenshots of the console before crashes:

nta · March 11, 2023, 11:44am

Since an earlier investigation as recent as 2023-03-08T17:39:00Z resulted in claims that the heap corruption exists ‘as low as 5914’, and theoretically ‘5848’ was still fine, there’s a new theory about what might be going on here:

in tweak(net/server): various resilience and limit tweaks · citizenfx/fivem@1c52f55 · GitHub (2020-08, build 2801), some HTTP server stuff was moved to use EASTL.
in tweak(net): fixed_{multimap->vector} for Http2Server · citizenfx/fivem@114608b · GitHub (2022-04, build 5513), Http2Server’s header list was changed to use EASTL vector instead of EASTL multimap.
Of note is the commit message there saying it ‘mitigates a corruption crash’, which matches what’s going on now as well.
in tweak(vendor): bump eastl to electronicarts/EASTL@5eb9b1ec09faaf59651… · citizenfx/fivem@168f92e · GitHub (2022-09, build 5903), EASTL was upgraded from a revision from 2020 to a revision from 2022.
This might have exposed a latent case of the corruption issue from before.

As another experiment, I’ve just pushed tweak(net/http-server): flag to remove EASTL usage · citizenfx/fivem@6fa9f9f · GitHub (build 6314), which should at least behave differently here (and might also finally show the original corruption in cases with a memory debugger attached, so if the issue still occurs there I’ll probably throw an ASan build out there again).

(tl;dr: try again with 6314, if it still fails do upload a full dump and if it’s the same failure and it needs more info still there’ll be an ASan build to try with again too that’ll hopefully catch it unlike last time)
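The build-range reasoning in that post can be checked mechanically. This is a small sketch using only the build numbers quoted there (5848 as the presumed-good build, 5914 as the lowest build reported bad). The function name suspectsBetween is hypothetical.

```lua
-- Commits named in the post, with the build each one landed in.
local commits = {
    { hash = '1c52f55', build = 2801, note = 'HTTP server code moved to EASTL' },
    { hash = '114608b', build = 5513, note = 'Http2Server header list: multimap -> vector' },
    { hash = '168f92e', build = 5903, note = 'EASTL bumped from a 2020 to a 2022 revision' },
}

-- Hypothetical helper: returns the commits that landed after the last build
-- believed good and no later than the first build known bad.
function suspectsBetween(list, goodBuild, badBuild)
    local out = {}
    for _, c in ipairs(list) do
        if c.build > goodBuild and c.build <= badBuild then
            out[#out + 1] = c
        end
    end
    return out
end

for _, c in ipairs(suspectsBetween(commits, 5848, 5914)) do
    print(c.hash, c.build, c.note)
end
```

Of the three commits, only the EASTL bump (build 5903) falls inside the window, which is why it is the one suspected of exposing the older latent corruption.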

nta · March 11, 2023, 11:46am

This seems to be a print from this ‘SekulBanSyste’ resource. What does this resource do that makes it send requests that ‘get rate limited’, and when do these get tripped initially?

That behavior also seems to add up with some sort of deliberate attack again, by the way.

HypeRP.pl · March 11, 2023, 11:55am

We are now on 5855 and have crashes all the time.

HypeRP.pl · March 11, 2023, 11:56am

Those are Discord logs sent when someone is joining.

HypeRP.pl · March 11, 2023, 12:17pm

After having something like 30 crashes in a row right after startup, we have disabled every single HTTP request, making an exception only for txAdmin. We are still running.

I will keep it disabled for 2 days or more to confirm that the problem is gone (we used to have at least 5 crashes daily).

-- Modified scheduler function: only the 'monitor' resource (txAdmin) may send HTTP requests.
function PerformHttpRequest(url, cb, method, data, headers, options)
    if GetCurrentResourceName() == 'monitor' then
        -- Follow redirects unless the caller explicitly opts out
        local followLocation = true

        if options and options.followLocation ~= nil then
            followLocation = options.followLocation
        end

        local t = {
            url = url,
            method = method or 'GET',
            data = data or '',
            headers = headers or {},
            followLocation = followLocation
        }

        local id = PerformHttpRequestInternalEx(t)

        if id ~= -1 then
            -- Remember the callback until the response arrives
            httpDispatch[id] = cb
        else
            cb(0, nil, {}, 'Failure handling HTTP request')
        end
    else
        -- Every other resource gets an immediate failure instead of a request
        cb(0, nil, {}, 'Failure handling HTTP request')
    end
end

Paste this into the scheduler to disable every HTTP request except those made by the ‘monitor’ resource (txAdmin).
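A variation on the patch above could help when re-enabling resources one at a time once the crashes stop. This is a minimal sketch that takes an allowlist instead of hard-coding 'monitor' and counts rejected calls per resource. makeGatedRequest is a hypothetical helper, and the name lookup and request function are passed in so it runs outside FXServer; inside the scheduler they would presumably be GetCurrentResourceName and the original request function.

```lua
-- Hypothetical helper: builds a gated replacement for an HTTP request function.
--   allowed         set of resource names permitted to send requests
--   getResourceName function returning the calling resource's name
--   performRequest  the original request function
-- Returns the gated function and a table of rejected-call counts per resource.
function makeGatedRequest(allowed, getResourceName, performRequest)
    local blocked = {}

    local function gated(url, cb, method, data, headers, options)
        local name = getResourceName()
        if allowed[name] then
            return performRequest(url, cb, method, data, headers, options)
        end
        blocked[name] = (blocked[name] or 0) + 1
        if cb then -- callers may omit the callback
            cb(0, nil, {}, 'HTTP requests are disabled for this resource')
        end
    end

    return gated, blocked
end

-- Plain-Lua demonstration with stand-ins for the server functions.
local current = 'monitor'
local sent = {}
local gated, blocked = makeGatedRequest(
    { monitor = true },
    function() return current end,
    function(url) sent[#sent + 1] = url end
)

gated('http://127.0.0.1/ok', function() end)
current = 'some_logger'
gated('http://127.0.0.1/nope', function(status, _, _, err) print(status, err) end)
print(#sent, blocked.some_logger)
```

Adding one resource name to the allowlist per restart would narrow down which caller, if any, brings the crashes back.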

Linden · March 11, 2023, 12:22pm

Third time’s the charm, christ.

Since apparently the D word is a very bad word (even though it was already used in this thread) I am going to censor this url.

https://verybadword.com/developers/docs/topics/rate-limits#exceeding-a-rate-limit-example-exceeded-resource-rate-limit-response

HypeRP.pl · March 11, 2023, 12:26pm

What else can we use for logs?

tabarra · March 11, 2023, 12:48pm

Although true, that is not relevant for this thread.
So let’s keep the subject centered on the crashes and solutions, shall we?!