text/CSV file to dataframe with Python and pandas

In this post you can find information about several topics related to files - text and CSV and pandas dataframes. The post is appropriate for complete beginners and include full code examples and results. The covered topics are:

Convert text file to dataframe
Convert CSV file to dataframe
Convert dataframe to text/CSV file
Error 'python' engine because the 'c' engine does not support regex separators

DataFrame is a two-dimensional labeled data structure in commonly Python and Pandas. Many people refer it to dictionary(of series), excel spreadsheet or SQL table. Often is needed to convert text or CSV files to dataframes and the reverse.

Convert text file to dataframe

Converting simple text file without formatting to dataframe can be done by(which one to chose depends on your data):

pandas.read_fwf - Read a table of fixed-width formatted lines into DataFrame
- pandas.read_fwf(filepath_or_buffer, colspecs='infer', widths=None, **kwds)
pandas.read_csv - Read CSV (comma-separated) file into DataFrame. Full list with parameters can be found on the link or at the bottom of the post.
- pandas.read_csv(filepath_or_buffer, sep=', ', delimiter=None,..)

Let's assume that we have text file with content like:

1 Python 35
2 Java 28
3 Javascript 15

Next code examples shows how to convert this text file to pandas dataframe.

Code example for pandas.read_fwf:

import pandas as pd
df = pd.read_fwf('myfile.txt')

Code example for pandas.read_csv:

import pandas as pd
df = pd.read_csv('myfile.txt', sep=" ")

import pandas as pd
df = pd.read_csv('myfile.txt', ,delimiter="\t")

import pandas as pd
df = pd.read_csv('myfile.txt', sep=" ", header=None, names=["Pos", "Lang", "Perc"])

the resulted dataframe is:

   Pos        Lang  Perc
0    1      Python    35
1    2        Java    28
2    3  Javascript    15

Convert CSV file to dataframe

The same logic can be applied to convert CSV file to dataframe. The example below shows converting file with data:

1, Python, 35
2, Java, 28
3, Javascript, 15

This can be read and converted to dataframe with:

import pandas as pd
df = pd.read_csv('myfile.txt', sep=",", header=None, names=["Pos", "Lang", "Perc"])

result:

   Pos         Lang  Perc
0    1       Python    35
1    2         Java    28
2    3   Javascript    15

Convert dataframe to CSV file

The reverse operation is done again with method of panda:

pandas.DataFrame.to_csv - Write DataFrame to a comma-separated values (csv) file
- DataFrame.to_csv(path_or_buf=None, sep=', ', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w', encoding=None, compression=None, quoting=None, quotechar='"', line_terminator='\n', chunksize=None, tupleize_cols=None, date_format=None, doublequote=True, escapechar=None, decimal='.')

Example:

df.to_csv('/home/user/myfile.txt', sep=',', encoding='utf-8', header=None)

this would result in:

,Pos,Lang,Perc
0,1, Python,35
1,2, Java,28
2,3, Javascript,15

If you don't want the headers and the indexes you can run:

df.to_csv('/home/user/myfile.txt', sep=',', encoding='utf-8', header=None, index=False)

result:

1, Python,35
2, Java,28
3, Javascript,15

Note: You can use None or False.

Error 'python' engine because the 'c' engine does not support regex separators

You may get error like - ParserWarning:

    ParserWarning: Falling back to the 'python' engine 
    because the 'c' engine does not support regex separators 
    (separators > 1 char and different from '\s+' are interpreted as regex); 
    you can avoid this warning by specifying engine='python'.
      df = pd.read_csv('/home/user/myfile.txt', sep=", ", header=None, names=["Pos", "Lang", "Perc"])

this means that you are using more than one separator for method: pd.read_csv. In order to solve it leave only one of the separators.

pandas.read_csv signature

You can see all the parameters which can be used for method: pandas.read_csv and their default values:

pandas.read_csv(filepath_or_buffer, sep=', ', delimiter=None, header='infer', names=None, index_col=None, usecols=None,
                squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None,
                true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None,
                keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False,
                infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False,
                chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"',
                quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=None,
                error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True,
                delim_whitespace=False, as_recarray=None, compact_ints=None, use_unsigned=None, low_memory=True,
                buffer_lines=None, memory_map=False, float_precision=None)

> Python Basics

> Advanced Tutorials

> Python Errors

> Pandas Advanced

> Pandas Count

> Pandas Column

> Pandas Basics

> Pandas DataFrame

> Pandas Row

> User Interface

> Advanced Linux

> Troubleshoot

> Video & Sound

> Linux Commands

> MySQL

> SQL Basics

> Python

> DB apps

> JupyterLab

> Jupyter Tips

> Jupyter Display

> Regex in Text Editor

> Regex Basics

> Regex Match

> Regex Date

> PyCharm Advanced

> Git and PyCharm

> PyCharm Error

> PyCharm Tips

> Linux Mint Applications

> VIrtual Machine

> Miscellaneous

> Java

> Automation

> Windows

> Office

> Cheat Sheet

Convert text file to dataframe

Convert CSV file to dataframe

Convert dataframe to CSV file

Error 'python' engine because the 'c' engine does not support regex separators

pandas.read_csv signature