In this brief tutorial, we'll see how to **map numerical data into categories or bins in Pandas.** The process is known also as binning or grouping by data into Categorical.

Typical use case for this operations are:

- financial data
- salaries
- years
- ages
- percentage

We will cover several most interesting examples. To start, create a simple DataFrame with percentage, age, salary columns:

```
data = {'productivity': [80, 20, 60, 30, 50, 55, 95],
'salary': [3500, 1500, 2000, 1000, 2000, 1500, 4000],
'age': [25, 30, 40, 35, 20, 40, 22]}
data_ix = ['Tim', 'Jim', 'Kim', 'Bim', 'Dim', 'Sim', 'Lim']
df_example = pd.DataFrame(data, index=data_ix)
print(df_example.to_markdown())
```

data:

productivity | salary | age | |
---|---|---|---|

Tim | 80 | 3500 | 25 |

Jim | 20 | 1500 | 30 |

Kim | 60 | 2000 | 40 |

Bim | 30 | 1000 | 35 |

Dim | 50 | 2000 | 20 |

Sim | 55 | 1500 | 40 |

Lim | 95 | 4000 | 22 |

## Step 1: Map percentage into bins with Pandas cut

Let's start with simple example of **mapping numerical data/percentage into categories** for each person above. First we need to define the bins or the categories. In this example we will use:

`bins = [0, 20, 50, 75, 100]`

Next we will map the productivity column to each bin by:

```
bins = [0, 20, 50, 75, 100]
df_example['binned'] = pd.cut(df_example['productivity'], bins)
```

the result is:

binned | |
---|---|

Tim | (75, 100] |

Jim | (0, 20] |

Kim | (50, 75] |

Bim | (20, 50] |

Dim | (20, 50] |

Sim | (50, 75] |

Lim | (75, 100] |

Now we can do value_counts:

binned | |
---|---|

(75, 100] | 2 |

(50, 75] | 2 |

(20, 50] | 2 |

(0, 20] | 1 |

Note: If we are interested in the cumulative sum per group then this article is very useful: Python cumulative sum per group with Pandas

## Step 2: Map numeric column into categories with Pandas cut

Now let's group by and map each person **into different categories based on number and add new label** (their experience/age in the area). Again we need to define the limits of the categories before the mapping. But this we need to have also names for each category:

`bins = [15, 20, 25, 50]`

`category = ['junior', 'mid', 'senior']`

Let see how to map each employee in one of these categories:

```
bins = [15, 20, 25, 50]
category = ['junior', 'mid', 'senior']
df_example['experience'] = pd.cut(df_example['age'], bins, labels=category)
```

result:

experience | |
---|---|

Tim | mid |

Jim | senior |

Kim | senior |

Bim | senior |

Dim | junior |

Sim | senior |

Lim | mid |

Note 1: that bin labels must be one fewer than the number of bin edges otherwise error is raised:

`ValueError: Bin labels must be one fewer than the number of bin edges`

Note 2: method `cut`

will create Categorical column

Note 3: `np.inf`

can be used as upper limit

Note 4: `ValueError: bins must increase monotonically`

is error shown if the beans are inconsistent

## Step 3: Pandas map numeric column with dictionary

Finally let see how to **map existing column to a dictionary** and map each value:

```
salary_dict = {1000: 'L1', 1500: 'L2', 2000: 'L3', 2500: 'L4',
2500: 'L5', 3500: 'L6', 4000: 'L7' }
df_example['salary'].map(salary_dict)
```

The final result is:

salary | |
---|---|

Tim | L6 |

Jim | L2 |

Kim | L3 |

Bim | L1 |

Dim | L3 |

Sim | L2 |

Lim | L7 |

Note: **Non-Exhaustive Mapping can be handled by**:

```
salary_dict = {1000: 'L1', 1500: 'L2', 2000: 'L3', 2500: 'L4'}
df_example['salary'].map(di).fillna(df_example['salary'])
```

so instead of NaN values the output contains data from the original column(which doesn't have map):

salary | |
---|---|

Tim | 3500 |

Jim | L2 |

Kim | L3 |

Bim | L1 |

Dim | L3 |

Sim | L2 |

Lim | 4000 |

## Step 4: Pandas map numeric column with np.select

Finally let's see how to map the values based on selection. The selection is done by: `np.select`

. We are going to divide the salaries into two groups:

- low - 0, 2000
- high - 2000, 4000

This is the code:

```
import numpy as np
criteria = [df_example['salary'].between(0, 2000), df_example['salary'].between(2000, 4000)]
values = ['low', 'high']
np.select(criteria, values, 0)
```

and the output is:

```
['high' 'low' 'low' 'low' 'low' 'low' 'high']
```

## Step 5: Plotting Categorical data

If you need to plot the data for the productivity then you can do:

`df_example['productivity'].plot(kind='hist')`

if you like to plot the mapped bins than you can do:

```
bins = [15, 20, 25, 50]
category = ['junior', 'mid', 'senior']
df_example['experience'] = pd.cut(df_example['age'], bins, labels=category)
df_example['experience'].value_counts().plot(kind='barh')
```