How to make money watching UFC

Vasilisa
Aug 11, 2020
Photo: businessinsider.com

Since ancient times, when gladiators fought for the crowd’s amusement, bread and circuses have hardly ever lost their popularity. It turns out I am no exception. I started watching UFC events three years ago, and from the very first fight (apart from feeling great empathy for every single punch received) it has been so much fun.

As many of you may know, UFC is an MMA promotion, probably the most popular and successful one. About 40 events are held every year, with 10–12 fights in different weight classes per event.

After watching a hundred fights or so, a reasonable question arises: what if I could not only enjoy my favorite show but also make some money from it? As with most sporting events, it’s possible to bet. Are there any easy-to-see patterns in the data I can use for my bets?

Data

The dataset under review contains data about fights held at UFC events over the last 10 years. Each row represents a separate bout, including fight-specific information such as the winner, duration and finish details, as well as information about the fighters: their rank, age, average takedown accuracy and so on.

To be exact, there are 4307 fights described by 113 so-called ‘features’.

A very high-level look at the data structure shows that more than 90% of matches are between male fighters. 5% of fights are title bouts: those determine who will be the next champion (aka the best fighter) in a given weight class.

Speaking of weight classes, are they equally represented?

Not at all. The most popular are Lightweight and Welterweight: neighbours in the weight table, differing by 15 lbs.
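To make this breakdown concrete, here is a minimal pandas sketch. It assumes hypothetical column names (`gender`, `title_bout`, `weight_class`) and uses a tiny toy frame instead of the real 4307 rows, so the numbers it prints are illustrative only.

```python
import pandas as pd

# Toy stand-in for the real dataset; the column names are assumptions.
fights = pd.DataFrame({
    "gender": ["MALE", "MALE", "FEMALE", "MALE", "MALE"],
    "title_bout": [False, True, False, False, False],
    "weight_class": ["Lightweight", "Welterweight", "Strawweight",
                     "Lightweight", "Welterweight"],
})

# Share of bouts between male fighters and share of title bouts
# (the mean of a boolean column is the fraction of True values).
male_share = (fights["gender"] == "MALE").mean()
title_share = fights["title_bout"].mean()

# Number of bouts per weight class, most frequent first.
class_counts = fights["weight_class"].value_counts()

print(f"male: {male_share:.0%}, title: {title_share:.0%}")
print(class_counts)
```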

Data preparation included removing some columns, filling missing values, some conversions and encoding categorical variables. Most of the features used for analysis and modelling were numeric (75 against 4 categorical).
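The preparation steps can be sketched roughly as follows. The column names, the median fill and the one-hot encoding are assumptions for illustration, not necessarily the exact choices made for this analysis.

```python
import pandas as pd

# Toy raw data; all column names here are hypothetical.
raw = pd.DataFrame({
    "R_fighter": ["A", "B", "C"],        # identifiers: dropped below
    "B_fighter": ["D", "E", "F"],
    "B_age": [29, None, 33],             # numeric column with a gap
    "title_bout": [True, False, False],  # boolean, converted to 0/1
    "weight_class": ["Lightweight", "Welterweight", "Lightweight"],
})

def prepare(df: pd.DataFrame) -> pd.DataFrame:
    # 1. Remove columns that carry no predictive signal.
    df = df.drop(columns=["R_fighter", "B_fighter"])
    # 2. Fill missing numeric values with the column median.
    num_cols = df.select_dtypes(include="number").columns
    df[num_cols] = df[num_cols].fillna(df[num_cols].median())
    # 3. Convert booleans to integers.
    df["title_bout"] = df["title_bout"].astype(int)
    # 4. One-hot encode the remaining categorical column.
    return pd.get_dummies(df, columns=["weight_class"], dtype=int)

prepared = prepare(raw)
print(prepared)
```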

And IT’S TIME.

Photo: mmaindia.com

UFC fans will surely recognize Bruce Buffer, the legendary ring announcer, and his famous catchphrase.

Here comes the first question I want to answer:

What would be a simple strategy for betting?

What if I am terribly lazy and don’t want to waste my time building complicated models and doing tedious research? Can I find something extremely straightforward but successful?

My first thought would be to always bet on the higher-ranked fighter. But only a small share of fights have at least one ranked fighter (one in the top 15 of their weight class), and those are mostly main card fights.

The data says the better-ranked fighter wins 61.08% of these fights. Not bad, but still, this only covers main card fights (28%).
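One way to compute the better-ranked fighter’s win rate, as a sketch under assumptions: hypothetical columns `R_rank` and `B_rank` (NaN for unranked, 1 as the best rank) and `Winner` holding "Red" or "Blue". Fights with no ranked fighter are excluded, which is why the function also returns the share of fights it covers.

```python
import numpy as np
import pandas as pd

# Toy data; the column names and encodings are assumptions.
fights = pd.DataFrame({
    "R_rank": [3, np.nan, 10, np.nan, 1],
    "B_rank": [7, 5, np.nan, np.nan, 2],
    "Winner": ["Red", "Blue", "Blue", "Red", "Blue"],
})

def better_ranked_win_rate(df: pd.DataFrame) -> tuple[float, float]:
    """Return (win rate of the better-ranked fighter, share of fights covered)."""
    # Unranked fighters get +inf, so any ranked fighter counts as better.
    r = df["R_rank"].fillna(np.inf)
    b = df["B_rank"].fillna(np.inf)
    covered = df["R_rank"].notna() | df["B_rank"].notna()
    # A lower rank number is better; exact ties (unlikely) default to Blue.
    better = pd.Series(np.where(r < b, "Red", "Blue"), index=df.index)
    wins = (better == df["Winner"])[covered]
    return float(wins.mean()), float(covered.mean())

rate, coverage = better_ranked_win_rate(fights)
print(f"better-ranked wins {rate:.1%} of {coverage:.0%} of fights")
```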

What would my earnings be if I always bet on, say, the taller fighter? Or the older one?

Having more wins: -4260.27 USD

Older: -2130.51 USD

Making more significant strikes: 3615.10 USD

Having greater reach: -2174.17 USD

Taller: 1480.89 USD

Fighting more rounds: -1597.05 USD

Making more takedowns: -11.06 USD

The only ones leaving me in the plus are the fighter making more significant strikes and the taller one. Still, it’s obviously just random.
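The "always bet on X" strategies can be evaluated with a small helper. This is a minimal sketch assuming a flat 100 USD stake per fight, hypothetical paired columns such as `R_height`/`B_height`, the `_ev` columns as profit on a winning 100 USD bet, and that fights where both fighters are equal on the chosen stat are skipped.

```python
import pandas as pd

STAKE = 100.0  # assumed flat stake, matching the 100 USD basis of _ev

def strategy_profit(df: pd.DataFrame, stat: str) -> float:
    """Profit from always betting STAKE on the fighter with the larger `stat`.

    Assumes columns R_<stat>, B_<stat>, R_ev, B_ev and Winner ("Red"/"Blue").
    Fights where the stat is tied are skipped.
    """
    bets = df[df[f"R_{stat}"] != df[f"B_{stat}"]]
    pick_red = bets[f"R_{stat}"] > bets[f"B_{stat}"]
    picked = pick_red.map({True: "Red", False: "Blue"})
    won = picked == bets["Winner"]
    ev = bets["R_ev"].where(pick_red, bets["B_ev"])
    # A win pays the picked fighter's _ev (scaled to the stake); a loss costs the stake.
    return float((ev * STAKE / 100).where(won, -STAKE).sum())

# Made-up example rows.
fights = pd.DataFrame({
    "R_height": [180, 175, 183, 178],
    "B_height": [175, 180, 183, 170],
    "R_ev": [50.0, 120.0, 80.0, 40.0],
    "B_ev": [160.0, 70.0, 110.0, 200.0],
    "Winner": ["Red", "Red", "Blue", "Blue"],
})
print(strategy_profit(fights, "height"))
```

On the toy rows this gives one win (+50), two losses (-100 each) and one skipped tie. The same call with other stats would fill in a list like the one above.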

Alright, but what actually affects the match result in the end? Instead of guessing, let me build a model to answer the question:

What influences fight result the most?

The task can be treated as binary classification: the model’s job is to answer whether the fighter in the blue corner wins or loses.

With all the data prepared, I am ready to build the model. Not intending to overcomplicate things, I start with a simple logistic regression, which is straightforward and fast to train. Unfortunately, it gives me only 65% accuracy, and neither trying a couple of more sophisticated models nor tuning parameters was able to improve it.

There are certainly more complicated and effective techniques, but those are out of the scope of this analysis, so 65% it is.
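A minimal scikit-learn sketch of this setup. Synthetic data from `make_classification` stands in for the prepared feature matrix (label 1 meaning the blue corner wins, by assumption), and standardizing before the regression is my own choice, not necessarily what was done here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the prepared features; y = 1 if the blue corner wins.
X, y = make_classification(n_samples=1000, n_features=20,
                           n_informative=5, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

# Scale features, then fit a plain logistic regression.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_train, y_train)

accuracy = accuracy_score(y_test, clf.predict(X_test))
print(f"accuracy: {accuracy:.1%}")
```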

Even so, I want to check which columns contribute the most. The model assigns a numeric coefficient to every input feature: the greater the absolute value of the coefficient, the more important the feature is for the model. Here is a summary of the top 10 columns with the highest ones.

What a surprise (not really)! The most meaningful variables turn out to be the ones related to odds. The _ev variables show how much I would earn on a 100 USD bet if a fighter wins.
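A sketch of how such a ranking can be produced. The feature names and the synthetic target are hypothetical. The important detail is standardizing the features first, since raw coefficient magnitudes are not comparable across features measured in different units.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 1000
# Hypothetical features; the target depends mostly on the two _ev columns.
X = pd.DataFrame({
    "R_ev": rng.normal(size=n),
    "B_ev": rng.normal(size=n),
    "age_dif": rng.normal(size=n),
    "reach_dif": rng.normal(size=n),
})
y = (1.5 * X["B_ev"] - 1.5 * X["R_ev"] + 0.3 * X["age_dif"]
     + rng.normal(size=n) > 0).astype(int)  # 1 = blue corner wins

# Standardize so coefficient magnitudes are comparable across features.
X_scaled = StandardScaler().fit_transform(X)
model = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Rank features by absolute coefficient, largest first.
importance = (
    pd.Series(model.coef_[0], index=X.columns)
    .sort_values(key=abs, ascending=False)
)
print(importance.head(10))
```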

Interestingly enough, if we set logistic regression to use an L1 penalty (without going into detail, such a model tends to eliminate some features by setting their coefficients to zero), it removes the “_odd” features.

This makes sense. There is a strong connection between the _ev and _odd features: the lower a fighter’s odds are, the more I will be paid if he wins. If we calculate the correlation, we can clearly see its absolute value is very close to 1.
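One reason for that strong link: assuming the _odd columns hold American (moneyline) odds, which the original does not state, _ev is a deterministic, monotonic function of them. A +150 underdog pays 150 USD on a 100 USD bet, while a -200 favorite pays 50 USD, so the weaker a fighter’s chances, the bigger both numbers get. The sketch converts a hand-picked set of odds and measures the correlation. Because the conversion is non-linear for favorites, the value is high but below 1, and on real data it will differ.

```python
import numpy as np

def american_to_ev(odds: float) -> float:
    """Profit on a winning 100 USD bet for American (moneyline) odds."""
    # +150: a 100 USD bet wins 150. -200: you stake 200 to win 100.
    return float(odds) if odds > 0 else 100 * 100 / abs(odds)

# Hand-picked example odds, from heavy favorites to big underdogs.
odds = np.array([-500, -300, -200, -150, -110, 110, 150, 200, 300, 500])
ev = np.array([american_to_ev(o) for o in odds])

r = np.corrcoef(odds, ev)[0, 1]
print(f"correlation between odds and ev: {r:.3f}")
```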

Keeping only one of the 4 odds-related variables drops the accuracy only slightly, to 64%. Eliminating this data completely gives us 60%.
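The L1 effect can be reproduced on synthetic data. In this sketch (all names hypothetical), `B_odds` is a near-copy of `B_ev` and `noise` is unrelated to the outcome. With a strong L1 penalty the noise coefficient is driven to exactly zero; for strongly correlated features, L1 tends to keep one and shrink the other, though which one survives is not guaranteed. `liblinear` is one of the scikit-learn solvers that supports the L1 penalty.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n = 1000
b_ev = rng.normal(size=n)
X = pd.DataFrame({
    "B_ev": b_ev,
    "B_odds": b_ev + 0.01 * rng.normal(size=n),  # near-duplicate of B_ev
    "noise": rng.normal(size=n),                  # unrelated to the outcome
})
y = (b_ev + 0.5 * rng.normal(size=n) > 0).astype(int)

# Small C means a strong penalty, which pushes weak coefficients to exactly 0.
l1_model = LogisticRegression(penalty="l1", solver="liblinear", C=0.01)
l1_model.fit(StandardScaler().fit_transform(X), y)

coefs = pd.Series(l1_model.coef_[0], index=X.columns)
print(coefs)
print("dropped:", list(coefs[coefs == 0].index))
```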

The conclusion here is simple: odds do not come from nowhere, and you probably want to take them into consideration. Still, how much can I earn by always betting on the fighter with better odds? It turns out I lose almost 4000 USD.

In real life, we usually have some past data and want to use it to train a model and predict the future. Reorganizing the data this way (training on earlier fights and testing on later ones), I built a new model and checked how much money I would earn following its recommendations: -1036 USD.
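A chronological split is the key difference from a random train/test split: the model only ever sees fights that happened before the ones it predicts. A minimal sketch, with a hypothetical `date` column and an arbitrary cutoff date:

```python
import pandas as pd

def time_split(df: pd.DataFrame, date_col: str, cutoff: str):
    """Split fights into training (before cutoff) and test (on/after cutoff) sets."""
    dates = pd.to_datetime(df[date_col])
    cutoff_ts = pd.Timestamp(cutoff)
    return df[dates < cutoff_ts], df[dates >= cutoff_ts]

# Toy rows; a model would be fit on `train` and its bets evaluated on `test`.
fights = pd.DataFrame({
    "date": ["2018-03-03", "2019-07-06", "2019-12-14", "2020-03-07", "2020-08-08"],
    "Winner": ["Red", "Blue", "Red", "Red", "Blue"],
})
train, test = time_split(fights, "date", "2020-01-01")
print(len(train), len(test))
```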

Surprised? A bad result is still a result. Again, a good model CAN be built, but there is no simple one-minute solution for this. What you can always do is enjoy a spectacular fight.

Speaking of which: I am not a fan of bouts ending in a decision. I love knockouts and submissions. Moreover, you can bet not only on the winner but also on many other bout-related things, including how exactly the fight finishes. Which leads me to the next question.

Is there anything special about fights not ending in a decision?

There are quite a few ways a fight can finish.

My goal is to distinguish fights that end before a decision. So again I will build a model that tells me whether something interesting happens or the fight ends with a judges’ decision.

Can I find features with different distributions for these 2 types of finish? KO/TKO/SUB finishes happen in 55% of title bouts against 50% of other bouts, and in 51% of male fights against 37% of female fights.
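Both the target and these group comparisons take only a few lines of pandas. This sketch assumes a hypothetical `finish` column with labels such as "KO/TKO", "SUB" and "U-DEC", treats anything other than KO/TKO or SUB (decisions, but also rarer outcomes like disqualifications) as not finished early, and uses toy rows, so the printed rates are illustrative only.

```python
import pandas as pd

# Toy rows; the column names and finish labels are assumptions.
fights = pd.DataFrame({
    "finish": ["KO/TKO", "U-DEC", "SUB", "S-DEC", "KO/TKO", "U-DEC"],
    "title_bout": [True, False, False, False, True, False],
    "gender": ["MALE", "MALE", "FEMALE", "FEMALE", "MALE", "MALE"],
    "no_of_rounds": [5, 3, 3, 3, 5, 3],
})

# Binary target: 1 if the fight ended before the judges' decision.
EARLY = {"KO/TKO", "SUB"}
fights["early_finish"] = fights["finish"].isin(EARLY).astype(int)

# Share of early finishes within each group of each variable.
for col in ["title_bout", "gender", "no_of_rounds"]:
    rates = fights.groupby(col)["early_finish"].mean()
    print(rates.map("{:.1%}".format), end="\n\n")
```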

The weight class distribution reflects the overall picture (Lightweight looks more popular), and there is no significant difference between the 2 plots.

Attempts to find a pattern in plots of the other variables ended in disappointment.

Classes seem to be absolutely indistinguishable visually.

Just like with the previous question, I want to build a model and check the importance it gives to features. Unfortunately, once again the model did not perform very well (only 58.5% accuracy).

And again, odds (the _ev variables, to be exact) are at the top, as well as the total rounds fought by both opponents. Is it because they have a lot of experience? The number of rounds in the fight also seems to have an effect.

Percentage of KO/TKO/SUB fights among 3-round bouts: 49.17%

Percentage of KO/TKO/SUB fights among 5-round bouts: 64.48%

Makes sense: 5 rounds are more tiring, so fighters tend to finish the fight earlier.

Photo: essentiallysports.com

To be honest, I am not quite happy with the results; this is far from what I was expecting when I started working on this dataset. But again, a bad result is still a result. Guess I am gonna have to watch every single match hoping for a knockout!

And you, have you ever placed a sports bet?
