[FXServer] Server desync, instancing, hitching

I also did not observe “out of order” or “dropped packets” in the client console.

That’s probably the main concern - most probably these end up being caused by scripts but no more detailed instrumentation is implemented to drill down into what script is causing this.

Given how the hitch times increase over time, starting at ~100msec and continuing up to ~1-2sec, I’m guessing there’s some script (or worse, internal server function) that doesn’t clear some kind of linear-scan list?

That’s part of legacy networking code and isn’t used anymore.

2 Likes

We’re investigating on our end as well given this information.

Any idea why this would be different between CFX Server and FXServer? Perhaps CFX simply masked the problems a little better?

@promethium When you say some scripts that doesn’t clear some kind of linear-scan list could you clarify the Linear-scan list

O(n) complexity.

For each element added to the list, it takes an additional amount of time to go through the list, and each element adds the same amount of time, so the effect is cumulative.

If it takes 1ms for each item in a list to be processed, a list of 2 items will take 2ms. A list of 3 items will take 3 ms, and so on.

Perhaps mysql-async’s rewrite, perhaps CitizenTaskScheduler, something at least changed. :\

1 Like

I found a MySQL.Sync call in a growing list. I think that’s our problem.

I am assuming that it wasn’t a problem for us before because we were able to use MySQL pooling with the old driver. Lag and hitching is a problem now because that growing number of queries, in synchronous mode, with (TCP + MySQL authentication + query time) was adding significant blocking delays.

fivem-mysql-async does not support pooling as of the time of this writing (even though the underlying MySqlConnector appears to), but I did create a hack to support pools which I need to re-implement and attempt to contribute to that project.

I think that between hacking in pooling and switching this particular loop to be a MySql.Async call we’ll be fine… in addition to a rewrite of our own code after we’re all on FXServer in production.

I wasn’t able to recover my pool hack for fivem-mysql-async, but I did use a combination of Citizen.CreateThread on the server, SetTimeout, and MySQL.Async queries to prevent scheduling too much on a single tick. In tests, this allows us to continue to use our terrible list without any hitching. We’ll try another live test using this optimization and report back.

Hitching is gone with the optimizations mentioned above, but we’re still getting huge numbers of instances, disconnects, and timeouts of players. We reverted again.

Vouches look great now for artifact 301:

Received a vouch for KatieRouu, they have 2 vouches and need 14.
Received a vouch for Rekan Martinez, they have 2 vouches and need 15.
Received a vouch for Phalanx™, they have 2 vouches and need 15.
Received a vouch for Mindless, they have 2 vouches and need 16.
Received a vouch for KennyH, they have 2 vouches and need 17.
Received a vouch for Marth, they have 2 vouches and need 18.
Received a vouch for Atomic_Wolf_, they have 2 vouches and need 17.
Received a vouch for Da True MVP, they have 2 vouches and need 17.
Received a vouch for Mindless, they have 2 vouches and need 20.
Received a vouch for Talbot, they have 2 vouches and need 19.
Received a vouch for Marth, they have 2 vouches and need 19.
Received a vouch for iamboogsy, they have 2 vouches and need 20.
Received a vouch for Krom_, they have 2 vouches and need 20.
Received a vouch for slacker_1980, they have 2 vouches and need 20.
Received a vouch for Atomic_Wolf_, they have 2 vouches and need 20.
Received a vouch for Mindless, they have 2 vouches and need 20.
Received a vouch for Talbot, they have 2 vouches and need 18.

We’re leaking MySQL connections with fivem-mysql-async beta2. I think it has to do with the MySqlConnector.dll update to 0.23.0 and will try to revert the blob to beta1’s version which we were running yesterday.

When we can confirm that we’re not leaking connections and are still getting high rates of instancing I’ll report back once again.

We need to let the game run so players can have time to play. Our next scheduled restart is in a few hours.

I am experiencing the same exact issue with the new FX server!

In fact, here is the post I literally just made before reading this one:

But the thing is,

I am using Couch DB, no SQL at all.

@promethium @aurum does that information help at all?

Older server builds will still hang on PerformHttpRequest due to an oversight. In addition, this is still a known issue (and is being discussed in server testing channels on Discord), however literally no common factor to the clients that are getting disconnected has been identified so far, even with the use of helper scripts to log extra information.

Weird thing is that there have been pretty much no design changes that could cause this to occur between CitizenMP.Server and FXServer, so this either fails due to a design oversight, or some other fluke is causing certain players to stop receiving packets at times.

A pcap dump of server port traffic with correlated real-world times to a log of player connections/disconnections could be a potential next step to figuring out what the hell happens that could cause these players to end up timing out.

I’ll tag a few people on the Discord too so they can see this idea.

2 Likes

I’ll discuss this with my team. As you can imagine, capturing IPs and passwords and whatnot could be a problem to share, especially trying to remove just those packets from a pcap, but I’ll see what I can get authorized since we’re extremely anxious to get off of CFX.

Is the Discord / dev area joinable? I’d like to pop in if it’s public, though I assume it’s private.

Edit: I joined the Discord as CFX.

Hmm, okay. Well let me know if I can help in some way!

Somewhat related cross-post:

3 Likes

Excuse me , it’s maybe a stupid question but what is the vouch?

Thank’s

Vouching is a way of keeping track of which client is the authoritative state of the game (i.e., the host). Clients vote (or “vouch”) for who should be the leader/host. If voting is messed up, clients ultimately get confused.

Instancing has been much improved in Artifact 306, but doesn’t quite solve the problem, particularly in the “higher numbered” ID slots; the last 2-3 slots before the server becomes full seem to have the most problems, but some people crash/instance constantly no matter what their ID.

1 Like

@justcfx2u

Thank you very much…

I don’t know if i can help the comunoty , but i tried a system on my server and it work!

I’m runing with VRP framework … on cfx server , i had a ton of time out , on FX no time out but ton on instance.

On Vrp , i deleted the update position ( \resources\vrp\modules\player_state.lua)

function tvRP.updatePos(x,y,z)
  local user_id = vRP.getUserId(source)
  if user_id ~= nil then
    local data = vRP.getUserDataTable(user_id)
    if data ~= nil then
      data.position = {x = tonumber(x), y = tonumber(y), z = tonumber(z)}
    end
  end
end

And finish the time out on cfx and a ton less instance on FX !

I’m sorry , english it’s not my first language , but in french i can explain to you how to solve the instance problem.

For anyone know arma 3 server and sql system for arma 3 , try to make the same like extdb2 sql with you server and you’ll undertsand :wink:

1 Like

I have already done it and I think no crash so far: P, but the update of the location is essential???

1 Like

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.