Pandas use a list of values to select rows from a column

Filtering a pandas DataFrame by a list of values is a common operation in data science. There are two main ways of selecting the data:

select pandas rows by exact match from a list
filter pandas rows by partial match from a list


Pandas offers a wide variety of options for solving these problems. I recommend using vectorized operations whenever possible, because they are usually much faster than looping over rows in Python:

Vectorization means applying an operation to an entire array (or Series) at once, instead of processing its elements one by one in a Python loop.
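A minimal sketch of the difference, assuming a small made-up Series: the list comprehension checks each value in a Python loop, while `Series.isin` performs the same membership test in a single vectorized call. Both produce the same boolean mask; the vectorized form is simply shorter and typically faster on large columns.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
majors = pd.Series([
    "Mathematics or statistics",
    "Web development or web design",
    "A natural science (ex. biology, chemistry, physics)",
])
wanted = ["Mathematics or statistics", "Web development or web design"]

# Element-by-element Python loop
loop_mask = pd.Series([value in wanted for value in majors], index=majors.index)

# Vectorized: one call over the whole Series
vector_mask = majors.isin(wanted)

print(vector_mask.tolist())           # [True, True, False]
print(loop_mask.equals(vector_mask))  # True
```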

Let's say we have this data (value counts for a given column):

Value	Count
Another engineering discipline (ex. civil, electrical, mechanical)	6945
Information systems, information technology, or system administration	6507
A natural science (ex. biology, chemistry, physics)	3050
Mathematics or statistics	2818
Web development or web design	2418

and our goal is to check, for each of the following values, whether it appears in the column, and to build a DataFrame of the results:

area_list = ['biology', 'physics', 'Computer', 'enginnering']

to get output like:

	biology	physics	Computer	enginnering
0	False	False	False	False
1	True	True	False	False
2	False	False	True	False
3	False	False	True	False
4	False	False	True	False

and the total counts per value:

	biology	physics
False	73294	85904
True	18804	6194

This can be done with the following code:

import pandas as pd

# one boolean column per search term: True where UndergradMajor contains the term
area_df = pd.DataFrame({area: df.UndergradMajor.str.contains(area)
                        for area in area_list})

where:

a dictionary is built with one entry per value in area_list
each entry uses the vectorized method str.contains to check whether the value appears in the UndergradMajor column
pd.DataFrame turns the dictionary into a new DataFrame holding the results for all values
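The steps above can be tried end to end with the sketch below. The original dataset isn't included, so it assumes a small hypothetical DataFrame that reuses category names from the table above and a hypothetical list of search terms. It also shows one way to get per-term totals (`apply(pd.Series.value_counts)`) and two ways to filter the rows that partially match at least one term: combining the boolean columns with `any(axis=1)`, or joining the terms into one regular expression. Note that `str.contains` treats its pattern as a regex by default, which is why the joined pattern escapes each term with `re.escape`; if the real column has missing values, passing `na=False` treats them as non-matches.

```python
import re

import pandas as pd

# Hypothetical sample rows reusing category names from the table above
df = pd.DataFrame({"UndergradMajor": [
    "Another engineering discipline (ex. civil, electrical, mechanical)",
    "A natural science (ex. biology, chemistry, physics)",
    "Mathematics or statistics",
    "Web development or web design",
]})

# Hypothetical search terms (partial matches)
area_list = ["biology", "physics", "statistics"]

# One boolean column per term: True where the term appears in UndergradMajor
area_df = pd.DataFrame({area: df["UndergradMajor"].str.contains(area)
                        for area in area_list})

# Total True/False counts for each term
totals = area_df.apply(pd.Series.value_counts)

# Keep the rows that match at least one term
any_match = df[area_df.any(axis=1)]

# Same filter with a single regex; re.escape guards against special characters
pattern = "|".join(re.escape(area) for area in area_list)
any_match_regex = df[df["UndergradMajor"].str.contains(pattern)]

print(area_df)
print(totals)
print(any_match)
```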

This example shows a partial match. For a full (exact) match you can use another vectorized pandas method, Series.isin. This is how to filter rows by exact match against the values of a list:

df[df['UndergradMajor'].isin(['Mathematics or statistics', 
                              'Web development or web design'])]

This keeps only the rows whose UndergradMajor value is exactly equal to one of the values in the list.
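A minimal sketch of the exact-match filter on hypothetical data. It shows that `isin` does not do partial matching ("Mathematics" is not kept, because it is not equal to any list item), and that the mask can be negated with `~` to keep the rows that are not in the list.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
df = pd.DataFrame({"UndergradMajor": [
    "Mathematics or statistics",
    "Web development or web design",
    "Mathematics",
    "A natural science (ex. biology, chemistry, physics)",
]})

wanted = ["Mathematics or statistics", "Web development or web design"]

# Exact match: a row is kept only if its value equals one of the list items
exact = df[df["UndergradMajor"].isin(wanted)]
print(exact.index.tolist())  # [0, 1]

# Negating the mask with ~ keeps the rows that are NOT in the list
rest = df[~df["UndergradMajor"].isin(wanted)]
print(rest.index.tolist())   # [2, 3]
```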

The bonus tip for today is how to apply value_counts to the whole DataFrame or to several columns. This can be done with:

df.apply(pd.Series.value_counts)

The result will be:

	Mobile	Data	QA
False	73294	70209	85904
True	18804	21889	6194

And to count values for several selected columns only:

df[['Mobile','QA']].apply(pd.Series.value_counts)

The result will be:

	Mobile	QA
False	73294	85904
True	18804	6194
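A small self-contained sketch of the same tip, assuming hypothetical boolean columns that stand in for Mobile, Data and QA. Each column is passed to `pd.Series.value_counts`, and the per-column results are combined into one table whose rows are the distinct values. Note that `DataFrame.value_counts()` (available in newer pandas versions) is a different operation: it counts unique combinations of whole rows.

```python
import pandas as pd

# Hypothetical boolean columns standing in for Mobile/Data/QA
df = pd.DataFrame({
    "Mobile": [True, False, False, True, False],
    "Data":   [False, False, True, True, True],
    "QA":     [False, False, False, True, False],
})

# value_counts for every column: rows are the distinct values, columns are the original columns
all_counts = df.apply(pd.Series.value_counts)

# value_counts for a subset of columns only
some_counts = df[["Mobile", "QA"]].apply(pd.Series.value_counts)

print(all_counts)
print(some_counts)
```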
