For me, it’s pretty confusing in quite a few cases. Just to give an example: if the striker shoots and the keeper saves it, but the striker taps in the rebound, how is xG counted? Is it counted twice or just once? And what if neither shot goes in?

  • BlazasAndQuasars@alien.topB
    10 months ago

    How can they even find the average chance of scoring when there are so many factors playing in?

    Where is the player on the pitch? Which direction did the pass come from, and at what speed? Was it a high pass or a low one, and did the ball bounce or change direction on the way? Where are the defenders positioned when the player takes the shot? Is the pitch dry, bumpy, or wet? Those are just a few random examples of the million factors in play.

    Which factors matter and which do they ignore? I don’t think they have good enough data when considering how “unique” most goal attempts actually are, all factors considered.

    • Emergency_Isopod_324@alien.topB
      10 months ago

      Player position and defender position are factored in. The other intricacies you mention are why xG is not a suitable metric for accurately describing a single game, or even several games.

      Rather, it’s far more suited for analysing performance over tens of games or entire seasons and can help to highlight problems in chance creation, finishing, or with specific players etc.
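To put rough numbers on why xG says more over a season than over one game, here is a minimal sketch. It assumes every shot is an independent event with a scoring probability equal to its xG, which is a simplification (a rebound right after a save is clearly not independent of the first shot). The shot counts and xG values are made up for illustration, and `goals_spread` is a hypothetical helper, not part of any xG provider's tooling.

```python
import math

def goals_spread(shot_xgs):
    """Return (total xG, standard deviation of goals scored).

    Assumes each shot is an independent yes/no event whose
    scoring probability equals its xG value.
    """
    total = sum(shot_xgs)
    variance = sum(p * (1 - p) for p in shot_xgs)
    return total, math.sqrt(variance)

# Hypothetical numbers: 10 shots per game, each worth 0.15 xG.
one_game = [0.15] * 10
season = one_game * 38  # 38 identical games, purely for illustration

game_xg, game_sd = goals_spread(one_game)
season_xg, season_sd = goals_spread(season)

# Spread of actual goals relative to the xG total.
game_rel = game_sd / game_xg
season_rel = season_sd / season_xg

print(f"one game: xG {game_xg:.1f}, spread +/- {game_sd:.2f} goals ({game_rel:.0%} of xG)")
print(f"season:   xG {season_xg:.1f}, spread +/- {season_sd:.2f} goals ({season_rel:.0%} of xG)")
```

Under these assumptions the typical gap between goals and xG is about 75% of the xG total for a single game, but only about 12% over 38 games, which is why the long-run comparison is the more informative one.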

    • antikas1989@alien.topB
      10 months ago

      You are correct that if they include a lot of factors, they have to check that they have enough data to estimate each factor properly. However, they do have a lot of data. Opta says they use a database of millions of shots. That still might not be enough if they include too many variables in the model, simply because of the sheer number of possible combinations of variables. From what I’ve read about this, they seem to be pretty disciplined in what they include.
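A quick back-of-the-envelope calculation shows how fast the combinations pile up. All numbers here are hypothetical: the variable count, the number of levels per variable, and the 2 million shots (standing in for "millions of shots") are chosen only for illustration and are not taken from any real xG model.

```python
# Hypothetical setup: 10 variables, each bucketed into 5 levels.
n_variables = 10
levels_per_variable = 5

# Hypothetical database size standing in for "millions of shots".
n_shots = 2_000_000

# Every distinct combination of levels is its own "type" of shot.
combinations = levels_per_variable ** n_variables
shots_per_combination = n_shots / combinations

print(f"{combinations:,} possible combinations")
print(f"{shots_per_combination:.2f} shots per combination on average")
```

With these made-up numbers there are 9,765,625 combinations, so on average only about 0.2 shots per combination. Real models do not treat every combination as a separate bucket, but this is the reason adding variables carelessly can outrun even a very large database.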

      In any case, the things the model won’t predict well are more likely to be rare events that don’t show up many times in the training data. But since those shots are rare anyway, it doesn’t matter so much. You have a trade-off. You can include enough variables in the model to predict most events really well, learning from the huge amounts of data that you do have on common shooting chances in football. Or you could, in theory, limit the ability of the model to predict things so that it is “less surprised” by outlier events, but nobody would really want that.

      You have an intuition that most goal attempts are “unique”, but xG seems to do pretty well on average. That “uniqueness” is still captured by the model in terms of variation around the expected number of goals but unfortunately they never present that.

      I personally would like to see some information about xG uncertainty, because that WOULD include information about the rarity in the training data. A super rare shot should in theory (assuming a good model) have more uncertainty about the xG.

      So suppose we saw a range of plausible xG estimates instead. Say the xG-range was 0.7-0.9 because all we had in the game was 1 penalty: that is a pretty common shot, so the uncertainty can be low (a narrow range of plausible values). But say that instead of a single penalty there were 5 half chances with rare events included. Then we might see something like an xG-range of 0.2-1.4. Both cases could have the same total xG overall, but by presenting the uncertainty we get a much better understanding of what the model actually predicts and what type of game we are dealing with.
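The two scenarios above can be sketched in a few lines. This is only an illustration: the per-shot ranges are invented so that the totals match the 0.7-0.9 and 0.2-1.4 figures in the comment, `total_xg_range` is a hypothetical helper, and simply adding up the lower and upper bounds is a crude, conservative way to combine ranges rather than how any provider is known to do it.

```python
def total_xg_range(shots):
    """Sum per-shot (low, point, high) xG estimates into a match total.

    Adding the bounds directly is a deliberately simple, conservative
    choice; a real model would combine uncertainties more carefully.
    """
    low = sum(s[0] for s in shots)
    point = sum(s[1] for s in shots)
    high = sum(s[2] for s in shots)
    return low, point, high

# Hypothetical game A: one penalty, a common shot with a narrow range.
penalty_game = [(0.7, 0.8, 0.9)]

# Hypothetical game B: five unusual half chances, each with a wide range.
half_chance_game = [(0.04, 0.16, 0.28)] * 5

for name, shots in [("penalty", penalty_game), ("half chances", half_chance_game)]:
    low, point, high = total_xg_range(shots)
    print(f"{name}: xG {point:.1f} (range {low:.1f}-{high:.1f})")
```

Both games come out at the same total of 0.8 xG, but the ranges are 0.7-0.9 and 0.2-1.4, which is exactly the extra information a single xG number hides.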