Python Pandas 7 examples of filters and lambda apply
With Python and pandas, you often need to filter DataFrames by different criteria. A filter can be a simple comparison or something more advanced built with lambda expressions. This post shows several ways to filter a DataFrame, ordered from simple to complex: testing single or multiple values, expressions with loc and isin, and lambda functions.
Let's start with this simple pandas DataFrame of programming languages:
import pandas as pd
foo = [{
    'Language': 'Python',
    'Percent grow': 56
}, {
    'Language': 'Java',
    'Percent grow': 34
}, {
    'Language': 'C',
    'Percent grow': 25
}, {
    'Language': 'C++',
    'Percent grow': 12
}, {
    'Language': 'go',
    'Percent grow': 5
}]
df = pd.DataFrame(foo, index=pd.RangeIndex(0, len(foo)))
If you want to see the whole data set, simply print it:
print(df)
result:
  Language  Percent grow
0   Python            56
1     Java            34
2        C            25
3      C++            12
4       go             5
The simplest way to select a single column, here Language, is:
print(df['Language'])
result:
0    Python
1      Java
2         C
3       C++
4        go
You can also test the column row by row with a comparison, which returns a boolean Series:
print(df['Language'] == 'Java')
result:
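0    False
1     True
2    False
3    False
4    False
Counting in pandas can be done in several ways, depending on what you need. Below, the first line returns the total number of rows (the length of the boolean Series, not the number of matches), the second counts the non-null values in each row, and the third counts the non-null values per column in the filtered rows: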
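The boolean Series from the comparison can also be used as a mask to keep only the matching rows. A minimal sketch, rebuilding the same data in a shorter form so it runs on its own (the names mask and java_rows are illustrative, not from the original code):

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
})

# Boolean Series: True where the language is Java
mask = df['Language'] == 'Java'

# Indexing with the mask keeps only the rows where it is True
java_rows = df[mask]
print(java_rows)
```

Writing df.loc[mask] should give the same rows; the later examples in this post use the loc form.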
print(len(df['Language'] == 'Java'))
print(df.count(axis='columns'))
print(df.loc[df['Language'] == "go"].count())
result:
5
0    2
1    2
2    2
3    2
4    2
dtype: int64
Language        1
Percent grow    1
dtype: int64
Another way to filter rows is with loc and isin:
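To count only the rows that match a condition, one option is to sum the boolean mask, since True counts as 1. A small sketch on the same data (the names total_rows and java_count are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
})

mask = df['Language'] == 'Java'

total_rows = len(mask)        # length of the mask: every row is counted
java_count = int(mask.sum())  # True counts as 1, so the sum is the number of matches

print(total_rows, java_count)
```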
print(df.loc[df['Language'].isin(['Java','C'])])
result:
  Language  Percent grow
1     Java            34
2        C            25
Sometimes the code above raises an error (for example, with more complex data types), and you need to apply a lambda function instead. This is how to filter rows with a simple lambda condition:
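An isin mask can be negated with ~ or combined with other conditions using & and |. The sketch below assumes the same data; the threshold of 30 and the variable names are chosen only for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
})

# ~ negates the mask: keep every language except Java and C
not_java_c = df.loc[~df['Language'].isin(['Java', 'C'])]

# & combines two conditions; a comparison needs its own parentheses
fast_growing = df.loc[
    df['Language'].isin(['Python', 'Java', 'C']) & (df['Percent grow'] > 30)
]

print(not_java_c)
print(fast_growing)
```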
mylambda = lambda x: x in ['C', 'C++']
print(df.loc[df['Language'].apply(mylambda)])
result:
  Language  Percent grow
2        C            25
3      C++            12
The same filter written as a single line, testing whether each value is one of several allowed values:
print(df.loc[df['Language'].apply(lambda x: x in ['C', 'C++'] )])
result:
  Language  Percent grow
2        C            25
3      C++            12
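A lambda can also look at several columns at once if apply is called on the whole DataFrame with axis=1, which passes each row to the function. This is a sketch on the same data; the condition (C-family language with growth above 20) is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Language': ['Python', 'Java', 'C', 'C++', 'go'],
    'Percent grow': [56, 34, 25, 12, 5],
})

# axis=1 hands each row to the lambda as a Series, so both columns can be tested
row_mask = df.apply(
    lambda row: row['Language'] in ['C', 'C++'] and row['Percent grow'] > 20,
    axis=1,
)

c_family_growing = df.loc[row_mask]
print(c_family_growing)
```

Row-wise apply runs Python code for every row, so on large frames a combination of isin and a plain comparison is usually the faster choice when it can express the same condition.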
The whole example is here:
import pandas as pd
foo = [{
    'Language': 'Python',
    'Percent grow': 56
}, {
    'Language': 'Java',
    'Percent grow': 34
}, {
    'Language': 'C',
    'Percent grow': 25
}, {
    'Language': 'C++',
    'Percent grow': 12
}, {
    'Language': 'go',
    'Percent grow': 5
}]
df = pd.DataFrame(foo, index=pd.RangeIndex(0, len(foo)))
print(df)
print('--------------')
print(df['Language'])
print('--------------')
print(df['Language'] == 'Java')
print('--------------')
print(df.loc[df['Language'].isin(['Java','C'])])
print('--------------')
mylambda = lambda x: x in ['C', 'C++']
print(df.loc[df['Language'].apply(mylambda)])
print('--------------')
print(df.loc[df['Language'].apply(lambda x: x in ['C', 'C++'] )])
If you have any questions or problems feel free to share them.