Why We Built an AI Model for Horse Racing
I'll be honest — when we first started building Racin, a lot of people told us we were wasting our time. "You can't beat the horses with a computer," they said. And look, they're partly right. Horse racing is messy, unpredictable, and full of variables no model can fully capture. A horse might have an off day. A jockey might misjudge the pace. The rail might be out 12 metres and nobody adjusted their assessment.
But here's the thing we kept coming back to: the market gets it wrong more often than most punters realise. Not by huge margins — this isn't about finding $51 winners hiding in plain sight. It's about finding horses at $6 that should really be $4.50. Do that consistently over hundreds of races, and the maths starts working in your favour.
That's what our model is designed to do. Not pick every winner, but identify where the odds don't match the probability.
What Goes Into Each Prediction?
Our model is built on XGBoost — a gradient-boosted decision tree algorithm that's become the workhorse of tabular data prediction across finance, healthcare, and sports analytics. We chose it over neural networks because it handles structured racing data well, trains fast, and most importantly, we can explain why it makes each prediction.
Every runner in every race gets scored across 79 engineered features. Here's what that actually looks like in practice:
Recent Form (The Obvious Stuff)
This is where any decent form student starts. We look at the last 10 starts — finishing positions, margins beaten, sectional times where available, and whether the horse is improving or regressing. But unlike a human reading the form guide, the model weights these mathematically. A horse that's run 3rd, 2nd, 1st in its last three starts gets a very different score than one that's gone 1st, 2nd, 3rd — even though both have identical averages.
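That recency weighting can be sketched in a few lines. The decay rate and points scale below are invented for illustration; they are not the model's actual parameters:

```python
def form_score(finishes, decay=0.7):
    """Recency-weighted form score over a horse's recent starts.

    `finishes` is ordered oldest-first (1 = won). The decay and the
    simple points scale are illustrative stand-ins for the model's
    learned weightings.
    """
    # Weight each start by decay**age, where age 0 is the most recent start.
    weights = [decay ** (len(finishes) - 1 - i) for i in range(len(finishes))]
    # Convert finishing position to a simple points scale (1st=3, 2nd=2, 3rd=1).
    points = [max(0, 4 - pos) for pos in finishes]
    return sum(w * p for w, p in zip(weights, points)) / sum(weights)

improving = form_score([3, 2, 1])   # 3rd, then 2nd, then won last start
regressing = form_score([1, 2, 3])  # won three back, gone backwards since
# improving scores higher than regressing despite identical average finishes
```

Swap the order of identical results and the score moves, which is exactly the behaviour a flat average can't give you.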
We also factor in the quality of the races those results came from. Running third in a Group 1 is vastly different from running third in a Benchmark 58 at Dubbo. The model normalises finishing positions against field quality so it's comparing apples with apples.
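One way to picture that normalisation, using a made-up benchmark-style rating scale (`class_rating` and the Benchmark 58 baseline here are illustrative assumptions, not the model's real class measure):

```python
def class_adjusted_finish(position, field_size, class_rating, baseline=58):
    """Normalise a finishing position against field size and race class.

    Returns a score where higher is better; the rating scale is a
    hypothetical stand-in for the model's field-quality measure.
    """
    # Share of the field beaten: winning a 12-horse race beats 11 of 11 rivals.
    beaten_share = (field_size - position) / (field_size - 1)
    # Scale by relative race quality so a placing in better company counts more.
    class_factor = class_rating / baseline
    return beaten_share * class_factor

group_run = class_adjusted_finish(3, 12, class_rating=100)  # 3rd in strong company
country_run = class_adjusted_finish(3, 12, class_rating=58)  # 3rd in a Benchmark 58
```

Same bare finishing position, very different scores once the company it was earned in is priced in.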
Weight and Barrier Draw
Every punter has an opinion on barriers. "Barrier 1 at Randwick is a death trap." "Wide draws at Flemington don't matter over 2500m." The model doesn't rely on opinions — it runs the numbers.
We track each horse's historical performance from different barrier positions and at different weights. Some horses genuinely handle weight better than others. Some are gun gate horses that struggle from wide. The model picks up these individual patterns that a human form analyst might miss across 50+ meetings a week.
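Mechanically, the barrier feature is a conditional strike rate. A minimal sketch (the record layout is invented; the real features also condition on track, distance and weight carried):

```python
from collections import defaultdict

def barrier_strike_rates(starts):
    """Win strike rate by barrier from (barrier, won) records."""
    runs = defaultdict(int)
    wins = defaultdict(int)
    for barrier, won in starts:
        runs[barrier] += 1
        wins[barrier] += int(won)
    return {b: wins[b] / runs[b] for b in runs}

history = [(1, True), (1, False), (1, True), (8, False), (8, False)]
rates = barrier_strike_rates(history)
# rates[1] is 2/3; rates[8] is 0.0 -- a horse that jumps well from inside gates
```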
Track Conditions and Surface Preferences
This is one of the model's strongest edges. Australian racing happens across a wild range of track conditions — from Good 1 (basically concrete) through to Heavy 10 (a mud bath). Some horses are transformers on wet ground. Others can't handle a blade of grass out of place.
We split each horse's career record by track condition and measure actual-versus-expected performance on each surface type. When the track gets downgraded from Good to Soft on race morning, the model immediately adjusts every runner's predicted probability. Most punters react too slowly to track changes. The model doesn't.
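The actual-versus-expected split can be sketched like this, with market-implied probability standing in for "expected" (the record layout and numbers are made up for illustration):

```python
def condition_ae(starts):
    """Actual-vs-expected wins split by track condition.

    `starts` is a list of (condition, won, market_prob) records, where
    market_prob is the pre-race win probability implied by the odds.
    An A/E above 1.0 means the horse beats market expectation on that going.
    """
    actual, expected = {}, {}
    for cond, won, prob in starts:
        actual[cond] = actual.get(cond, 0) + int(won)
        expected[cond] = expected.get(cond, 0.0) + prob
    return {c: actual[c] / expected[c] for c in actual if expected[c] > 0}

sample = [
    ("Good", True, 0.25), ("Good", False, 0.30),
    ("Heavy", True, 0.10), ("Heavy", True, 0.15),
]
ae = condition_ae(sample)
# Two wins on Heavy against 0.25 expected: a pronounced wet-track bias
```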
Jockey-Trainer Combinations
Here's something interesting we found in the data: certain jockey-trainer combinations perform significantly better than you'd expect from either the jockey or the trainer individually. It's a synergy effect: the jockey knows how the trainer's horses are prepared, race instructions translate cleanly into rides, and there's trust built up over past results.
The model tracks strike rates and A/E (actual vs expected) ratios for every active jockey-trainer combination. When a winning combination reunites, it gets a measurable boost in the prediction. When a new pairing tries something different, the model is appropriately cautious.
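The A/E bookkeeping for pairings follows the same pattern, keyed on the combination (the names and probabilities below are invented for illustration):

```python
def combo_ae(rides):
    """A/E ratio per jockey-trainer pairing.

    Each ride is (jockey, trainer, won, market_prob); an A/E above 1.0
    means the pairing wins more often than the market expects.
    """
    stats = {}
    for jockey, trainer, won, prob in rides:
        a, e = stats.get((jockey, trainer), (0, 0.0))
        stats[(jockey, trainer)] = (a + int(won), e + prob)
    return {combo: a / e for combo, (a, e) in stats.items() if e > 0}

rides = [
    ("J. Smith", "T. Jones", True, 0.20),
    ("J. Smith", "T. Jones", False, 0.25),
    ("J. Smith", "T. Jones", True, 0.15),
]
pair_ae = combo_ae(rides)
# Two wins against 0.60 expected: this pairing is outrunning the market
```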
Market Data (Used Carefully)
Here's where it gets nuanced. We do use market odds as one input feature — but it's not the primary signal. Think of it as a sanity check. If the model thinks a horse should be $3 but it's drifting out to $15, that's worth investigating. Maybe the market knows something (horse not eating, warm in the yard). Maybe the market is wrong.
Using market data as one of 79 features — rather than the dominant signal — lets the model identify value where the market has mispriced a runner without just blindly following the money.
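Before odds can be used as a feature, the bookmaker margin (the overround) has to be stripped out, because raw 1/odds probabilities sum to more than 100%. A minimal sketch of that conversion, with invented odds:

```python
def implied_probabilities(odds):
    """Convert a field's decimal odds to market-implied win probabilities.

    Raw 1/odds values sum to more than 1 because of the bookmaker
    margin; normalising makes them a proper probability distribution.
    """
    raw = [1.0 / o for o in odds]
    overround = sum(raw)  # e.g. 1.19 means a ~19% bookmaker margin
    return [p / overround for p in raw]

market = implied_probabilities([2.5, 4.0, 5.0, 7.0, 9.0, 12.0])
# market now sums to 1.0 and is comparable with the model's probabilities
```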
How We Validate the Model (No Cheating Allowed)
This is crucial, and it's where a lot of "AI prediction" services fall down. If you train a model on data from January to December and then test it on data from March, you're cheating. You've already shown it the answers.
We use walk-forward validation. This means every single prediction in our backtest uses only data that was available before that race happened. The model never sees future results during training. It's the same approach used in quantitative finance — because in trading, just like in racing, you can't trade on tomorrow's prices.
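The walk-forward loop itself is simple. This stripped-down sketch retrains before every race, which is a simplification; a real pipeline would retrain on a schedule (weekly, say). `train_fn` and `predict_fn` stand in for the actual model code:

```python
def walk_forward(races, train_fn, predict_fn, min_history=100):
    """Walk-forward backtest: every race is predicted using only races
    that finished before it. `races` must be sorted by start time.
    """
    predictions = []
    for i in range(min_history, len(races)):
        model = train_fn(races[:i])            # trained on past races only
        predictions.append(predict_fn(model, races[i]))  # never sees the future
    return predictions
```

The point of the structure is that no prediction can ever peek at a result from its own future, which is exactly the leakage that naive train/test splits allow.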
The Actual Numbers
Here's our backtest performance across 2,798 races from July 2025 to March 2026. We're publishing this because transparency matters — and because we think it speaks for itself.
That last point is important. We had a genuinely bad month. The model struggled with the spring carnival conditions, some of the Group 1 fields were harder to separate, and we took a hit. We publish the losing months too, because if someone only shows you winning months, they're hiding something.
"The goal isn't to be right every race. It's to be right often enough, at good enough odds, that the maths compounds in your favour over hundreds of bets."
Where the Model Still Struggles
Let's be honest about the limitations.
Group 1 races are tough. When you've got 10 quality horses in a field, the margins between them are razor-thin. The model's edge is smaller in elite races because there's less mispricing — these are the most-analysed races in the country and the market is more efficient.
First starters are essentially a coin flip. We have no form data on them, so we're relying on breeding, trainer stats with debutants, trial times (where available), and market signals. The model is honest about its uncertainty here — predictions for first starters carry wider confidence intervals.
Extreme track conditions create volatility. When the track deteriorates from Good to Heavy 10 mid-meeting, everything changes. We factor in surface preferences but there's inherent randomness when horses are slogging through mud.
Non-racing factors are invisible to the model. If a horse was agitated in the float, if the jockey had a fight with the trainer, if the horse hasn't eaten for two days — we can't see any of that. The market sometimes prices these factors in through late odds movements, which is why we monitor market movers separately.
How This Translates to What You See on Racin
When you open a race card on Racin, every runner carries a predicted win probability generated by this model.
These probabilities are compared against live bookmaker odds in real time. When the model's probability is significantly higher than the odds imply, we flag it as a value bet. That's the 20%+ overlay threshold that generated +24.1% ROI in our backtest.
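The overlay check reduces to a one-line comparison. This sketch assumes decimal odds and, for simplicity, ignores the bookmaker margin in the implied probability:

```python
def is_value_bet(model_prob, decimal_odds, overlay_threshold=0.20):
    """Flag a runner when the model's probability beats the
    market-implied probability by at least the overlay threshold
    (20% here, matching the threshold described above).
    """
    implied = 1.0 / decimal_odds
    return model_prob >= implied * (1 + overlay_threshold)

# Model says 25% but $6.00 implies ~16.7% -- roughly a 50% overlay, flagged.
flagged = is_value_bet(0.25, 6.00)
```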
You don't have to follow every flag. You don't have to agree with every prediction. But you now have a data point that no other Australian racing platform publishes with this level of transparency.
What's Next
We're continuously improving the model, and there's plenty still on the roadmap.
If you want to see the model's predictions in action, the free tier gives you access to Saturday metropolitan meetings with top 3 picks per race. No credit card, no catch — just data.