In this post you will learn how to build a simple machine learning model in Python. We will cover the basics of the model and show how to build it in just a few lines of code.
Below is the full code of a machine learning model that classifies flowers based on their properties.
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# load the iris dataset as a pandas DataFrame
iris = sns.load_dataset("iris")
# split into training (70%) and testing (30%) sets
X_train, X_test, y_train, y_test = train_test_split(iris.drop('species', axis=1), iris['species'], test_size=0.3, random_state=42)
# train a logistic regression classifier and predict on the test set
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
Below you can find an explanation of the code and the details of each step.
Setup
We are using the following modules:
- sklearn - for the ML model
- seaborn - for the training and testing dataset
We import the required modules and methods. Next we load the iris dataset, which Seaborn returns as a pandas DataFrame:
| | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
The original dataset has 150 records.
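The preview above can be reproduced directly from the DataFrame; a minimal sketch, assuming the iris DataFrame loaded earlier:
# show the first 5 rows and the overall shape of the dataset
print(iris.head())
print(iris.shape)  # (150, 5) - 150 records, 4 features plus the species column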
Dataset preparation
An ML model needs training data which has:
- input
- sepal_length
- sepal_width
- petal_length
- petal_width
- output
- species
To build the training and testing datasets we split the original dataset into two parts:
input_data = iris.drop('species',axis=1)
output = iris['species']
X_train, X_test, y_train, y_test = train_test_split(input_data, output, test_size=0.3, random_state=42)
So we will have:
- X_train - 105 rows
- X_test - 45 rows
- y_train - 105 rows
- y_test - 45 rows
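These counts follow from the 70/30 split of 150 records; a quick sketch to confirm them, using the variables created above:
# 45 rows (30% of 150) go to the test set, the remaining 105 to training
print(X_train.shape, X_test.shape)  # (105, 4) (45, 4)
print(y_train.shape, y_test.shape)  # (105,) (45,)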
Example of X_train:
| | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 81 | 5.5 | 2.4 | 3.7 | 1.0 |
| 133 | 6.3 | 2.8 | 5.1 | 1.5 |
| 137 | 6.4 | 3.1 | 5.5 | 1.8 |
| 75 | 6.6 | 3.0 | 4.4 | 1.4 |
| 109 | 7.2 | 3.6 | 6.1 | 2.5 |
and example of y_train:
81 versicolor
133 virginica
137 virginica
75 versicolor
109 virginica
So we feed the model the properties of each flower together with its species name. The initial classification (labeling) of the dataset was done manually.
Training ML model
To train the model we will use LogisticRegression()
and provide the training data:
model = LogisticRegression()
model.fit(X_train, y_train)
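Note: depending on the scikit-learn version and data, fitting may print a ConvergenceWarning. If that happens on your setup, increasing the number of solver iterations usually helps; the value below is only an example:
# optional: raise max_iter only if you see a ConvergenceWarning (200 is an arbitrary example value)
model = LogisticRegression(max_iter=200)
model.fit(X_train, y_train)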
Testing model
Next we can test the machine learning model by:
y_pred = model.predict(X_test)
We can check the results:
array(['versicolor', 'setosa', 'virginica', 'versicolor'...
Now our machine learning model is able to classify flowers based on 4 inputs and return the species.
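The trained model can also classify a brand-new flower; a minimal sketch, where the measurement values are made up for illustration:
import pandas as pd
# hypothetical measurements of a single flower (sepal_length, sepal_width, petal_length, petal_width)
new_flower = pd.DataFrame([[5.0, 3.4, 1.5, 0.2]], columns=['sepal_length', 'sepal_width', 'petal_length', 'petal_width'])
print(model.predict(new_flower))  # e.g. ['setosa']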
Verifying test results
To verify the test results we will use Pandas to compare the predicted values with the true labels:
(y_test == y_pred).value_counts()
which returns:
True 45
Name: species, dtype: int64
So we have 100% accuracy in predicting the iris species on this test split.
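The same check can be done with scikit-learn's built-in metric; a short sketch, assuming y_test and y_pred from the steps above:
from sklearn.metrics import accuracy_score
# fraction of correctly classified test samples (1.0 here, i.e. 45 out of 45)
print(accuracy_score(y_test, y_pred))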
Visualizing data
We can use K-means to cluster the data into groups and visualize it. The steps are:
- imports and dataset load
- build the K-means model
- use 3 clusters
- visualize the results
# imports for clustering and plotting
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# load the iris data as a NumPy array (features only, no species labels)
iris = load_iris()
X = iris.data
# standardize features to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# cluster into 3 groups (one per species)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X_scaled)
# plot the first two (scaled) features colored by cluster, with cluster centers marked in red
plt.scatter(X_scaled[:, 0], X_scaled[:, 1], c=kmeans.labels_)
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], marker='x', s=200, linewidths=3, color='r')
plt.xlabel('Sepal length (scaled)')
plt.ylabel('Sepal width (scaled)')
plt.show()
The result is shown below:
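To see how the three clusters line up with the actual species, one option is a cross-tabulation; a sketch assuming the variables from the K-means code above and pandas installed:
import pandas as pd
# rows: true species names, columns: K-means cluster ids (cluster numbering is arbitrary)
print(pd.crosstab(pd.Series(iris.target_names[iris.target], name='species'), pd.Series(kmeans.labels_, name='cluster')))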