Poker Hand Predictor

Nov 12, 2020

Franco Garcia Pedregal

This story presents two approaches I took to predict the class of a five-card poker hand using two machine learning algorithms: Random Forest and Decision Tree.

Introduction

Most of the card games we play in casinos or at home use a standard 52-card deck. Machine learning algorithms are increasingly used for real-time object recognition and classification, and also to play games or to assist players. That is what we try to achieve here: use an algorithm to reduce human error when classifying a five-card poker hand.

A standard 52-card deck consists of four suits (Hearts, Spades, Diamonds and Clubs), each with 13 ranks: Ace, 2 through 10, Jack, Queen and King.
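As a small illustration, the deck can be represented with the same numeric encoding the dataset uses later in this post (suits 1 to 4, ranks 1 for Ace up to 13 for King). This is a minimal sketch; `deck` and the `draw_hand` helper are hypothetical names, not part of the original project.

```python
import random

# Encoding used by the UCI Poker Hand data: suits 1-4, ranks 1 (Ace) to 13 (King)
SUITS = range(1, 5)
RANKS = range(1, 14)

# A standard deck is every (suit, rank) combination: 4 x 13 = 52 cards
deck = [(suit, rank) for suit in SUITS for rank in RANKS]


def draw_hand(rng: random.Random, size: int = 5):
    """Draw `size` distinct cards from the deck without replacement."""
    return rng.sample(deck, size)


# Passing a seeded Random instance makes the draw reproducible
hand = draw_hand(random.Random(42))
```

Sampling without replacement guarantees that a hand never contains the same card twice, just as in a real deal.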

Talking about probability

The probability of drawing a given hand is the number of ways of drawing that hand (its frequency) divided by the total number of 5-card hands, and there are 2,598,960 possible combinations of 5 cards from a 52-card deck.
For example, there are 4 different ways to draw a royal flush (one for each suit), so the probability is 4/2,598,960, or one in 649,740. One would then expect to draw this hand about once in every 649,740 draws, or nearly 0.000154% of the time.
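The arithmetic above can be checked in a few lines of Python: the number of 5-card hands is the binomial coefficient C(52, 5), and an exact fraction keeps the royal flush probability free of rounding.

```python
from fractions import Fraction
from math import comb

total_hands = comb(52, 5)  # ways to choose 5 cards out of 52, order ignored
royal_flush_ways = 4       # one royal flush per suit
p_royal = Fraction(royal_flush_ways, total_hands)  # reduced automatically

print(total_hands)                      # 2598960
print(p_royal)                          # 1/649740
print(f"{float(p_royal) * 100:.6f}%")   # 0.000154%
```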

Hand Ranking

With five cards there are 10 different hand classes, which can be ordered by strength starting with the Royal Flush.

Data Set

The dataset we’ll be exploring in this post is the Poker Hand data from the UCI Machine Learning Repository.
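A minimal sketch of loading this data with pandas follows. It assumes the UCI files are headerless, comma-separated, and ordered as in the attribute list below (S1, C1, ..., S5, C5, CLASS). `load_poker` is a hypothetical helper, and the sample rows are written by hand in that format rather than copied from the files. The class column is renamed to 'Hand', the name the code later in this post uses.

```python
import io

import pandas as pd

# Assumed column order of the UCI files: suit and rank of each card, then the label
COLUMNS = ['S1', 'C1', 'S2', 'C2', 'S3', 'C3', 'S4', 'C4', 'S5', 'C5', 'CLASS']


def load_poker(source) -> pd.DataFrame:
    """Read a headerless poker-hand CSV (path, URL or file-like object)."""
    df = pd.read_csv(source, header=None, names=COLUMNS)
    # The rest of the post refers to the target column as 'Hand'
    return df.rename(columns={'CLASS': 'Hand'})


# Two hand-written royal-flush rows in the same format, standing in for a real file
sample = io.StringIO("1,10,1,11,1,13,1,12,1,1,9\n2,11,2,13,2,10,2,12,2,1,9\n")
train = load_poker(sample)
```

With the real data, `source` would be the path to the downloaded training or testing file, for example `poker-hand-training-true.data` (file name assumed from the UCI repository listing).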

Each record in the dataset is an example of a hand consisting of five playing cards drawn from a standard deck of 52. Each card is described using two attributes (suit and rank), for a total of 10 predictive attributes. The target column describes the hand, with the possibilities being:

Attribute Information:

1) S1 “Suit of card #1”
Ordinal (1–4) representing {Hearts, Spades, Diamonds, Clubs}

2) C1 “Rank of card #1”
Numerical (1–13) representing (Ace, 2, 3, … , Queen, King)

3) S2 “Suit of card #2”
Ordinal (1–4) representing {Hearts, Spades, Diamonds, Clubs}

4) C2 “Rank of card #2”
Numerical (1–13) representing (Ace, 2, 3, … , Queen, King)

5) S3 “Suit of card #3”
Ordinal (1–4) representing {Hearts, Spades, Diamonds, Clubs}

6) C3 “Rank of card #3”
Numerical (1–13) representing (Ace, 2, 3, … , Queen, King)

7) S4 “Suit of card #4”
Ordinal (1–4) representing {Hearts, Spades, Diamonds, Clubs}

8) C4 “Rank of card #4”
Numerical (1–13) representing (Ace, 2, 3, … , Queen, King)

9) S5 “Suit of card #5”
Ordinal (1–4) representing {Hearts, Spades, Diamonds, Clubs}

10) C5 “Rank of card #5”
Numerical (1–13) representing (Ace, 2, 3, … , Queen, King)

11) CLASS “Poker Hand”
Ordinal (0–9)

0: Nothing in hand; not a recognized poker hand
1: One pair; one pair of equal ranks within five cards
2: Two pairs; two pairs of equal ranks within five cards
3: Three of a kind; three equal ranks within five cards
4: Straight; five cards, sequentially ranked with no gaps
5: Flush; five cards with the same suit
6: Full house; pair + different rank three of a kind
7: Four of a kind; four equal ranks within five cards
8: Straight flush; straight + flush
9: Royal flush; {Ace, King, Queen, Jack, Ten} + flush
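The class definitions above can also be written as explicit rules. This is a sketch, not part of the original post: `classify_hand` is a hypothetical helper that takes five (suit, rank) pairs in the dataset's encoding, and it treats the Ace as either low (A-2-3-4-5) or high (10-J-Q-K-A) in a straight, an assumption I have not verified against every row of the UCI data. A rule-based labeler like this is handy for spot-checking labels, while the models in this post learn the rules from data instead.

```python
from collections import Counter


def classify_hand(cards):
    """Return the CLASS code (0-9) for five (suit, rank) pairs; rank 1 = Ace, 13 = King."""
    suits = [s for s, _ in cards]
    ranks = sorted(r for _, r in cards)
    # Multiplicities of each rank, largest first, e.g. [3, 2] for a full house
    counts = sorted(Counter(ranks).values(), reverse=True)
    flush = len(set(suits)) == 1
    distinct = len(set(ranks)) == 5
    # Assumption: the Ace (1) may also sit above the King in 10-J-Q-K-A
    ace_high = ranks == [1, 10, 11, 12, 13]
    straight = distinct and (ranks[-1] - ranks[0] == 4 or ace_high)

    if flush and ace_high:
        return 9  # royal flush
    if flush and straight:
        return 8  # straight flush
    if counts[0] == 4:
        return 7  # four of a kind
    if counts == [3, 2]:
        return 6  # full house
    if flush:
        return 5  # flush
    if straight:
        return 4  # straight
    if counts[0] == 3:
        return 3  # three of a kind
    if counts == [2, 2, 1]:
        return 2  # two pairs
    if counts[0] == 2:
        return 1  # one pair
    return 0      # nothing in hand
```

The checks run from the strongest class down, so a straight flush is never reported as a plain straight or flush.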

Class Balance
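The class balance of a label column can be inspected with a short pandas sketch. This matters here because, as the probabilities above suggest, the strongest hands are extremely rare compared with "nothing" or "one pair". `class_balance` is a hypothetical helper; with the real data it would be called on the 'Hand' column.

```python
import pandas as pd


def class_balance(labels) -> pd.DataFrame:
    """Count and share of each class label, ordered by class code."""
    counts = pd.Series(labels).value_counts().sort_index()
    return pd.DataFrame({'count': counts, 'share': counts / counts.sum()})


# Toy labels standing in for train['Hand']
balance = class_balance([0, 0, 1, 0, 9])
```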

Model Proposal

For this problem we use two approaches, with two models in each:

First approach: predict the hand directly with a Random Forest and a Decision Tree.

Second approach: combine the three strongest hands into one class called “Poker or Better”, then train the same two models, Random Forest and Decision Tree.
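Both approaches train the same pair of scikit-learn models. The sketch below fits them side by side and reports test accuracy. `fit_both` is a hypothetical helper, and the default hyperparameters (plus a fixed `random_state`) are an assumption, since the post does not state which settings were used. `X_train`, `y_train`, `X_test` and `y_test` are the feature and target splits built in the Helper Functions section.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier


def fit_both(X_train, y_train, X_test, y_test, random_state=0):
    """Fit a Decision Tree and a Random Forest; return their test accuracies."""
    models = {
        'Decision Tree': DecisionTreeClassifier(random_state=random_state),
        'Random Forest': RandomForestClassifier(n_estimators=100,
                                                random_state=random_state),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        # score() returns mean accuracy for classifiers
        scores[name] = model.score(X_test, y_test)
    return scores


# Tiny separable toy data, only to show the call; real data has 10+ features
X_toy = np.array([[0], [1]] * 20)
y_toy = X_toy.ravel()
toy_scores = fit_both(X_toy, y_toy, X_toy, y_toy)
```

For the second approach the same function can be reused unchanged; only the labels differ after the strongest classes are merged.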

Helper Functions

We start by training the Decision Tree. One advantage of this dataset is that it already comes divided into training and testing sets, so we only need to specify which columns are the features (X) and which is the target (y).

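# train and test are the UCI training and testing DataFrames; the class column is 'Hand'
X_train_pre = preprocess_data(train)
X_test_pre = preprocess_data(test)
X_train = X_train_pre.loc[:, X_train_pre.columns != 'Hand']  # features: every column except 'Hand'
X_test = X_test_pre.loc[:, X_test_pre.columns != 'Hand']
y_train = X_train_pre['Hand']  # target
y_test = X_test_pre['Hand']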

If we train on the data as it is, the accuracy is very low, below 0.50. But why?

This is too much information for our tree: there are more than 25,000 combinations in the training data, and the same five cards can appear in any order. To make things easier for the tree, we sort each hand, because it is easier for a tree to find patterns in sorted information than in randomly ordered information. We sort only by card rank; we don't care about the suit of the card.

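import numpy as np
import pandas as pd

def preprocess_data(data: pd.DataFrame) -> pd.DataFrame:
    """Sort the five card ranks of every hand in ascending order; suits are left as they are."""
    df = data.copy()
    ranks = ['C1', 'C2', 'C3', 'C4', 'C5']
    # np.sort(axis=1) sorts each row independently, instead of sorting df.values in place,
    # which depends on pandas handing back a writable view of the data
    df[ranks] = np.sort(df[ranks].to_numpy(), axis=1)
    # ranks first, then suits, then the target column
    return df[ranks + ['S1', 'S2', 'S3', 'S4', 'S5', 'Hand']]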
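To see why sorting helps, take a single set of five distinct ranks. Before sorting, the tree can receive those ranks in any of their 5! = 120 orders and has to learn each one separately; after sorting, all of them collapse into one row pattern. This is a small standalone illustration, not code from the project.

```python
from itertools import permutations

ranks = (1, 5, 7, 10, 13)  # five distinct ranks: Ace, 5, 7, 10, King

orderings = set(permutations(ranks))                  # every order the cards can arrive in
sorted_forms = {tuple(sorted(p)) for p in orderings}  # what remains after sorting each order

print(len(orderings), len(sorted_forms))  # 120 1
```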

With this small preprocessing trick, accuracy rises to almost 0.97. Next, we look at where our model still makes errors.

The ROCAUC (Receiver Operating Characteristic / Area Under the Curve) plot displays the true positive rate on the Y axis and the false positive rate on the X axis, both as a global average and per class. The ideal point is therefore the top-left corner of the plot, where the false positive rate is zero and the true positive rate is one.

As we can see, all the hands ranked above a flush give our tree trouble.
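The two axes of the ROC plot can be made concrete with a small NumPy sketch. For one class treated one-vs-rest and a chosen score threshold, it computes a single (false positive rate, true positive rate) point; sweeping the threshold from high to low traces the curve. `roc_point` is a hypothetical helper; Yellowbrick's ROCAUC visualizer (and scikit-learn's `roc_curve`) compute the full curves for you.

```python
import numpy as np


def roc_point(y_true, scores, positive_class, threshold):
    """Return (FPR, TPR) for one class treated one-vs-rest at a given threshold.

    scores holds the model's score (e.g. predicted probability) for positive_class.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    actual = y_true == positive_class   # samples that truly belong to the class
    predicted = scores >= threshold     # samples the model assigns to the class
    tp = np.sum(predicted & actual)
    fp = np.sum(predicted & ~actual)
    tpr = tp / np.sum(actual)           # true positive rate (Y axis)
    fpr = fp / np.sum(~actual)          # false positive rate (X axis)
    return float(fpr), float(tpr)


# Class 1 vs the rest: two of the three true 1s and one of the two others pass 0.5
fpr, tpr = roc_point([1, 1, 0, 2, 1], [0.9, 0.4, 0.6, 0.1, 0.8], 1, 0.5)
```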

The classification report shows the main classification metrics on a per-class basis. This gives deeper insight into the classifier's behavior than global accuracy, which can mask weaknesses in one class of a multiclass problem.
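The masking effect is easy to reproduce. In the sketch below a model that always predicts class 0 reaches 0.8 accuracy on toy labels, yet never finds class 7 at all. `per_class_metrics` is a hypothetical helper computing one row of a classification report; scikit-learn's `classification_report` produces these numbers for every class at once.

```python
import numpy as np


def per_class_metrics(y_true, y_pred, label):
    """Precision, recall and F1 for one class, treated one-vs-rest."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.sum((y_pred == label) & (y_true == label))
    fp = np.sum((y_pred == label) & (y_true != label))
    fn = np.sum((y_pred != label) & (y_true == label))
    # Define a metric as 0.0 when its denominator is zero
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return float(precision), float(recall), float(f1)


y_true = [0] * 8 + [7, 7]  # two rare "four of a kind" hands
y_pred = [0] * 10          # a model that always answers "nothing"
accuracy = float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))  # 0.8
rare = per_class_metrics(y_true, y_pred, 7)                          # (0.0, 0.0, 0.0)
```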

Unique Count Function

Since there is little correlation between the features, we add another function to help our tree with these problems: it tells the tree how many different suits appear in the hand. For example:

The count ranges from 1, when all five cards share a suit (a flush), to at most 4, since a standard deck has only four suits.

def add_unique_count(df: pd.DataFrame) -> pd.DataFrame:
    tmp = df[['S1', 'S2', 'S3', 'S4', 'S5']]
    df['UniqueS'] = tmp.nunique(axis=1)  # number of distinct suits per hand (1 means a flush)
    return df

Difference Function

If we train again, the problem with Royal Flush and Straight Flush is solved, but we still have trouble with Four of a Kind and Full House. To solve this, we help the tree identify when cards have the same rank but different suits. We use the difference between each pair of adjacent sorted ranks: if both cards are equal, this difference is 0.

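def add_diffs(df: pd.DataFrame) -> pd.DataFrame:
    # Differences between adjacent ranks; assumes C1..C5 were already sorted by preprocess_data,
    # so a difference of 0 means two cards share the same rank
    df['Diff1'] = df['C5'] - df['C4']
    df['Diff2'] = df['C4'] - df['C3']
    df['Diff3'] = df['C3'] - df['C2']
    df['Diff4'] = df['C2'] - df['C1']
    return df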
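The difference features make Four of a Kind and Full House separable. Both hands produce exactly three zero gaps, but four of a kind leaves its single non-zero gap at one end, while a full house leaves it in the middle. The sketch below states that rule explicitly; `rank_diffs` and `four_or_full` are hypothetical helpers, and a decision tree would have to learn a comparable split from the Diff columns rather than use this rule directly.

```python
import numpy as np


def rank_diffs(ranks):
    """Gaps between adjacent sorted ranks: [C2-C1, C3-C2, C4-C3, C5-C4], i.e. Diff4..Diff1."""
    return np.diff(np.sort(np.asarray(ranks)))


def four_or_full(ranks):
    """Tell four of a kind from a full house using only the rank gaps."""
    zeros = rank_diffs(ranks) == 0
    if zeros.sum() != 3:
        return None  # neither hand: both types have exactly three zero gaps
    # Four of a kind: the non-zero gap is first or last; full house: it is in the middle
    return 'four of a kind' if not (zeros[0] and zeros[-1]) else 'full house'


full_house = four_or_full([9, 4, 9, 4, 9])  # sorted 4,4,9,9,9 -> gaps 0,5,0,0
four_kind = four_or_full([7, 7, 2, 7, 7])   # sorted 2,7,7,7,7 -> gaps 5,0,0,0
```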

For the second approach, we merge the three strongest hands into one class called “Poker or Better”, so we need the following lines to relabel the data:

# Merge classes 7 (four of a kind), 8 (straight flush) and 9 (royal flush) into class 7
poker_df.loc[poker_df['Hand'] >= 7, 'Hand'] = 7
test.loc[test['Hand'] >= 7, 'Hand'] = 7
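The same relabeling can be written with pandas' `Series.clip`, which caps every label above 7 at 7. Unlike the `.loc` assignment, it returns a new Series and leaves the original untouched. A toy Series stands in for the 'Hand' column here.

```python
import pandas as pd

hands = pd.Series([0, 7, 8, 9, 3, 5])  # toy stand-in for a 'Hand' column
merged = hands.clip(upper=7)           # 8 and 9 become 7; everything else is unchanged
print(merged.tolist())                 # [0, 7, 7, 7, 3, 5]
```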

To see the implementation of both approaches, check my repository.

To see the Results

References

Choosing the right estimator — scikit-learn 0.23.2 documentation. (2020). Retrieved 12 November 2020, from https://scikit-learn.org/stable/tutorial/machine_learning_map/

Classification Report — Yellowbrick v1.2 documentation. (2020). Retrieved 12 November 2020, from https://www.scikit-yb.org/en/latest/api/classifier/classification_report.html

How the good old sorting algorithm helps a great machine learning technique. (2020). Retrieved 12 November 2020, from https://towardsdatascience.com/how-the-good-old-sorting-algorithm-helps-a-great-machine-learning-technique-9e744020254b

ROCAUC — Yellowbrick v1.2 documentation. (2020). Retrieved 12 November 2020, from https://www.scikit-yb.org/en/latest/api/classifier/rocauc.html

UCI Poker dataset classification. (2020). Retrieved 12 November 2020, from https://www.kaggle.com/rasvob/uci-poker-dataset-classification/comments

Visual Machine Learning with Yellowbrick. (2020). Retrieved 12 November 2020, from https://www.coursera.org/projects/machine-learning-visualization