Python Pandas: 7 examples of filters and lambda apply

With Python and pandas you will often need to filter your DataFrames by different criteria. You can apply a simple filter or build more advanced ones with lambda expressions. This post shows several examples of how to filter your DataFrames, ordered from simple to complex: testing single or multiple values, expressions with loc and isin, and lambda functions.

Let's start with this simple pandas DataFrame of programming languages:

import pandas as pd
foo = [{
    'Language': 'Python',
    'Percent grow': 56
}, {
    'Language': 'Java',
    'Percent grow': 34
}, {
    'Language': 'C',
    'Percent grow': 25
}, {
    'Language': 'C++',
    'Percent grow': 12
}, {
    'Language': 'go',
    'Percent grow': 5
}]
df = pd.DataFrame(foo, index=pd.RangeIndex(0, len(foo)))

If you want to see the whole data set, you can simply print it:

print(df)

result:

#  Language  Percent grow
0   Python            56
1     Java            34
2        C            25
3      C++            12
4       go             5

The most basic way to narrow this data down is to select a single column, here Language:

print(df['Language'])

result:

0    Python
1      Java
2         C
3       C++
4        go

You can also test your DataFrame row by row with a comparison, which returns a boolean value for each row:

print(df['Language'] == 'Java')

result:

0    False
1     True
2    False
3    False
4    False
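The boolean Series above can be used directly as a mask to keep only the matching rows. The following is a minimal sketch on the same data; the variable names `is_java`, `java_rows` and `fast_growing` are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
})

# Build the boolean mask, then index the DataFrame with it
# to keep only the rows where the mask is True.
is_java = df['Language'] == 'Java'
java_rows = df[is_java]
print(java_rows)

# A numeric comparison works the same way.
fast_growing = df[df['Percent grow'] > 30]
print(fast_growing)
```

Indexing with `df[mask]` and `df.loc[mask]` select the same rows here; `loc` is the form used in the rest of this post.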

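Counting in pandas can be done in several different ways depending on your needs. Note that len on a comparison returns the length of the boolean result, which is the total number of rows and not the number of matches:

# total number of rows (length of the boolean Series), not the number of matches
print(len(df['Language'] == 'Java'))
# number of non-null values in each row
print(df.count(axis='columns'))
# number of non-null values per column among the rows where Language is "go"
print(df.loc[df['Language'] == "go"].count())

result:

5
0    2
1    2
2    2
3    2
4    2
dtype: int64
Language        1
Percent grow    1
dtype: int64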
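To count only the rows that match a condition, one option is to sum the boolean mask, since each True counts as 1. This is a small sketch on the same data; `java_count` and `counts` are names introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
})

# Summing a boolean Series counts the True values,
# i.e. the number of rows where Language equals 'Java'.
java_count = int((df['Language'] == 'Java').sum())
print(java_count)

# value_counts() counts how often each distinct value occurs.
counts = df['Language'].value_counts()
print(counts)
```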

The next way to filter the output rows is by using loc and isin:

print(df.loc[df['Language'].isin(['Java','C'])])

result:

#  Language  Percent grow
1     Java            34
2        C            25
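The isin mask can also be negated or combined with other conditions. The sketch below assumes the same data; the thresholds and the names `not_java_or_c` and `combined` are picked only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
})

# ~ inverts the mask: keep the rows whose Language is NOT in the list.
not_java_or_c = df.loc[~df['Language'].isin(['Java', 'C'])]
print(not_java_or_c)

# & combines two masks; each condition needs its own parentheses.
combined = df.loc[
    df['Language'].isin(['Java', 'C', 'C++']) & (df['Percent grow'] > 20)
]
print(combined)
```

Use `|` instead of `&` when a row should match either condition.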

The code above may not work for more complex data types; in that case you can apply a lambda function instead. This is how to filter the rows using a simple lambda condition:

mylambda = lambda x: x in ['C', 'C++']
print(df.loc[df['Language'].apply(mylambda)])

result:

#    Language  Percent grow
2        C            25
3      C++            12

The same filter as a single line, testing whether the column value is one of several values:

print(df.loc[df['Language'].apply(lambda x: x in ['C', 'C++'] )])

result:

#    Language  Percent grow
2        C            25
3      C++            12
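A lambda becomes more useful when the test cannot be written as a plain comparison. The sketch below shows two such cases under stated assumptions: a hypothetical `Tags` column whose cells hold lists (not part of the original data), and a row-wise condition that reads two columns at once via `axis=1`.

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
    # Hypothetical column for illustration: each cell is a list of tags.
    'Tags': [['scripting', 'web'], ['jvm', 'web'], ['systems'],
             ['systems'], ['systems', 'web']],
})

# isin compares whole cell values, so "the list in this cell contains
# 'web'" is expressed with a lambda applied to each cell instead.
has_web = df['Tags'].apply(lambda tags: 'web' in tags)
web_rows = df.loc[has_web]
print(web_rows)

# With axis=1 the lambda receives one row at a time and can
# combine several columns in a single condition.
row_mask = df.apply(
    lambda row: row['Language'].startswith('C') and row['Percent grow'] > 20,
    axis=1,
)
c_family_growing = df.loc[row_mask]
print(c_family_growing)
```

For large DataFrames, apply with a lambda runs Python code per cell or per row, so a vectorized expression such as isin or a comparison is usually preferable when one exists.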

The whole example is here:

import pandas as pd
foo = [{
    'Language': 'Python',
    'Percent grow': 56
}, {
    'Language': 'Java',
    'Percent grow': 34
}, {
    'Language': 'C',
    'Percent grow': 25
}, {
    'Language': 'C++',
    'Percent grow': 12
}, {
    'Language': 'go',
    'Percent grow': 5
}]

df = pd.DataFrame(foo, index=pd.RangeIndex(0, len(foo)))
print(df)
print('--------------')
print(df['Language'])
print('--------------')
print(df['Language'] == 'Java')
print('--------------')
print(df.loc[df['Language'].isin(['Java','C'])])
print('--------------')
mylambda = lambda x: x in ['C', 'C++']
print(df.loc[df['Language'].apply(mylambda)])
print('--------------')

print(df.loc[df['Language'].apply(lambda x: x in ['C', 'C++'] )])

If you have any questions or problems, feel free to share them.