Server Long Ticks & Performance Issues

Hi all,
I’ve been trying to get to the bottom of some major server hitching issues and have found a couple of things through performance captures. I’m not really sure what to make of them, so I’m hoping someone can point me in the right direction:

Finding #1: Windows Performance Analyzer shows a really sus-looking resource using a heap of time, named “ÈGÁÐþ”. I have no idea what this resource is; searching across the server files comes up with nothing.

Finding #2: Running a server profile capture in the console shows that some ticks randomly take a massive amount of time. In particular I keep seeing citizen/scripting/lua/scheduler.lua lines 219-221, generally as a child of the es_extended event esx:triggerServerCallback.

Lines 219-221 of scheduler.lua are part of the function Citizen.SetEventRoutine():

Citizen.CreateThreadNow(function()
    handler(table_unpack(data))
end, ('event %s [%s[%d..%d]]'):format(eventName, di.short_src, di.linedefined, di.lastlinedefined))
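For reference, since Citizen.CreateThreadNow runs the handler body synchronously up to its first yield, a long tick attributed to this frame usually means a handler itself blocked. A rough sketch of a timing wrapper that can help confirm which handler is slow (the wrapper name and the 50 ms threshold are arbitrary, not part of FiveM or ESX):

```lua
-- Rough sketch: wrap an event handler so any call that blocks for more than
-- 50 ms gets logged to the server console. Plain Lua; only the usage line
-- at the bottom depends on the FiveM runtime.
local function timed(name, fn)
    return function(...)
        local started = os.clock()
        fn(...)
        local ms = (os.clock() - started) * 1000.0
        if ms > 50.0 then
            print(('[%s] handler took %.1f ms'):format(name, ms))
        end
    end
end

-- usage (FiveM): AddEventHandler('someEvent', timed('someEvent', originalHandler))
```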

The client-side lag caused by the hitching seems to get worse as more players join, but running client-side performance captures shows nothing out of the ordinary compared to the server side.
We’re on txAdmin v4.10.0 / artifacts 5104
If anyone could offer thoughts on the above, or advice on what else I can do to troubleshoot this issue, that’d be greatly appreciated.

Just a side note: I tried posting the above message in the gated-support channel on Discord, but for some reason snaily keeps deleting it due to a “banned word” - no idea which word that might be.

Thank you.
-snags

Can you send the .etl by chance?

The profiler apparently has a few attribution issues lately, so unless you run it in ‘resource’ mode it’s not really anything to go by. :confused: It appears it’s no longer attributing the profiler event to the actual event handler; a bit of a whoops, maybe (it ignores the name argument to CreateThreadNow).

Hey d-bubble, sorry for the late reply. Sure thing - I’ve uploaded the .etl to OneDrive as it’s 200 MB. Hope it helps in some way:

Darn, that’s a shame - is there anything else I can do to nail down what’s causing this hitching?

I’m about ready to try downgrading artifacts (the issue appeared around the same time we upgraded), and if that doesn’t work, copy/migrate the server onto a Linux host to see if there’s any difference in performance.
Trying to optimize all I can without knowing which resource is responsible :slightly_frowning_face:

Thanks!
-snags

The latest server versions fixed the profiler’s attribution again, so ‘downgrading artifacts’ probably won’t help (and if it does, please don’t leave it at that, but try to deduce a repro), and ‘copying onto a Linux host’ just seems like a recipe for more issues. :confused:

Again, though, the profiler’s resource mode may be needed to pin down these ‘ESX’ server callback things, as this weird system of theirs does some recursive-call oddity.
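For context, the ESX server-callback pattern looks roughly like this; any slow work inside the registered function runs during the esx:triggerServerCallback event dispatch, which is why it shows up under that event rather than under the resource you’d expect. A sketch (the callback name and return value are made up):

```lua
-- Rough sketch of the ESX server-callback pattern (names/values made up).

-- Server side: this body runs inside the esx:triggerServerCallback event
-- handler, so anything blocking here stalls the whole server tick.
ESX.RegisterServerCallback('myresource:getBalance', function(source, cb)
    local balance = 1000 -- e.g. a synchronous DB query would block right here
    cb(balance)
end)

-- Client side: triggers the server event and receives the reply asynchronously.
ESX.TriggerServerCallback('myresource:getBalance', function(balance)
    print(('balance: %d'):format(balance))
end)
```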


In your test case, it seems to be some C# resource being slow. Which C# resources (as indicated by a .net.dll being present) do you have?

Measuring tick start/end events and finding the start of the first slow tick shows it’s your discord-screenshot resource doing something slow. I’d guess it’s this one, and it’s probably performing some slow synchronous operations. :confused:

… no, the resource in your case is a C# version with the same name (where from? or worse, did something inject a .net.dll into your discord-screenshot resource?), and it’s ‘slow’ because it’s doing blocking I/O (page faulting due to low memory?!), which takes ~400 ms on the VPS/VM/whatever you’re running.
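For anyone hitting something similar: synchronous I/O inside a handler stalls the entire server tick until it completes, page faults included. A hedged sketch of the difference (the file path and URL below are placeholders):

```lua
-- Blocking: the whole server tick waits on the disk here (including any
-- page faults if the machine is low on memory).
local f = io.open('screenshot.jpg', 'rb') -- placeholder path
local data = f and f:read('*a') or ''
if f then f:close() end

-- Non-blocking alternative on the FiveM server: PerformHttpRequest runs
-- asynchronously and invokes the callback on a later tick.
PerformHttpRequest('https://example.com/upload', function(status, body, headers)
    print(('upload finished with status %d'):format(status))
end, 'POST', data, { ['Content-Type'] = 'application/octet-stream' })
```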

It’d be really helpful to have access to this machine, as this is some weird stuff going on: mainly, a resource that should be mostly JS/TS running C# code.