How to open and search huge files - Windows, Linux

People define a "huge" file differently, and the threshold grows every year. In this article (written in 2017), a huge file means one of about 10 GB. Several strategies and applications can help you deal with such files.

Windows

Split files on Windows

On Windows you need additional software or a small program of your own (see the code section below).
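
As an example of the "program code" route, here is a minimal Python sketch (Python is used for all added examples; the function name split_by_size and the prefix00, prefix01 naming are illustrative assumptions, not part of any tool) that splits a file into parts of a fixed size, similar to split -b on Linux. It copies through a small buffer, so memory use stays low even for a 10 GB file.

```python
def split_by_size(path, part_size, prefix, buffer_size=1024 * 1024):
    """Split the file at `path` into parts of at most `part_size` bytes.

    Parts are named prefix00, prefix01, ... (like `split -d` on Linux).
    Returns the list of part file names in order.
    """
    parts = []
    with open(path, 'rb') as src:
        while True:
            remaining = part_size
            chunk = src.read(min(buffer_size, remaining))
            if not chunk:
                break  # end of input reached: do not create an empty part
            name = f"{prefix}{len(parts):02d}"
            with open(name, 'wb') as dst:
                # copy through a small buffer so memory use stays constant
                while chunk:
                    dst.write(chunk)
                    remaining -= len(chunk)
                    if remaining == 0:
                        break  # this part is full
                    chunk = src.read(min(buffer_size, remaining))
            parts.append(name)
    return parts
```

For 1 GB parts you could call split_by_size(r'C:\Users\user\Desktop\hugeFile.txt', 1024**3, 'hugeFileSplit'); the parts are written to the current folder.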

GSplit

GSplit is a free tool with many file-splitting features:

  • split by a particular size
  • split into a given number of blocks
  • split into sizes that make the best use of storage space
  • create a small standalone executable that merges all output files (useful when sharing them)
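
Splitting into a given number of blocks and merging them again can be sketched in Python as follows. This is not GSplit's own algorithm; it assumes parts of near-equal size, and the names split_into_blocks, merge_parts and copy_bytes are hypothetical helpers for illustration.

```python
import os
import shutil


def copy_bytes(src, dst, length, buffer_size=1024 * 1024):
    """Copy exactly `length` bytes (or until EOF) from src to dst in chunks."""
    while length > 0:
        chunk = src.read(min(buffer_size, length))
        if not chunk:
            break
        dst.write(chunk)
        length -= len(chunk)


def split_into_blocks(path, n, prefix):
    """Split `path` into `n` parts whose sizes differ by at most one byte."""
    base, extra = divmod(os.path.getsize(path), n)
    parts = []
    with open(path, 'rb') as src:
        for index in range(n):
            # the first `extra` parts take one extra byte each
            length = base + (1 if index < extra else 0)
            name = f"{prefix}{index:02d}"
            with open(name, 'wb') as dst:
                copy_bytes(src, dst, length)
            parts.append(name)
    return parts


def merge_parts(parts, out_path):
    """Concatenate the parts, in order, into `out_path`."""
    with open(out_path, 'wb') as dst:
        for name in parts:
            with open(name, 'rb') as src:
                shutil.copyfileobj(src, dst)  # streams in chunks, never a whole part
```

Merging is plain concatenation in the original order, which is why the parts get sortable numeric suffixes.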

HJSplit

HJSplit is another free file splitter for Windows:

  • user-friendly
  • simple

After splitting, you can open the smaller files in the 64-bit version of Notepad++, which seems to work fine with files up to 2 GB.

Read and search huge logs

LogExpert

LogExpert works smoothly with log files larger than 8 GB:

  • a tail-like application for MS Windows
  • powerful log file analysis tool
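
The "tail-like" behaviour, following a log as new lines are appended, can be sketched in Python with a polling loop. This is an illustration of the idea, not how LogExpert is implemented; the follow function and its max_idle_polls stop condition are assumptions added so the loop can end.

```python
import time


def follow(f, poll_interval=0.5, max_idle_polls=None):
    """Yield lines appended to the open text file `f`, similar to `tail -f`.

    Reading starts at the current position of `f`; call f.seek(0, 2) first to
    skip the existing content. An unfinished last line is held back until its
    newline arrives. The generator stops after `max_idle_polls` consecutive
    empty polls (None means follow forever).
    """
    pending = ''
    idle = 0
    while True:
        piece = f.readline()
        if piece:
            idle = 0
            pending += piece
            if pending.endswith('\n'):
                yield pending  # only complete lines are yielded
                pending = ''
        elif max_idle_polls is not None and idle >= max_idle_polls:
            return
        else:
            idle += 1
            time.sleep(poll_interval)  # wait for the writer to append more
```

Typical use: open the log, call f.seek(0, 2), then loop over follow(f) and print each line.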

glogg

Another handy tool is glogg:

  • read-only: it reads the file directly from disk
  • handles huge files (larger than 20 GB)
  • supports regular expressions in searches
  • very good performance
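
A regular-expression search that streams the file from disk, in the spirit of glogg's search, can be sketched in Python. The function name search_file is hypothetical, and errors='replace' is an assumption so that stray invalid bytes in a log do not abort the search.

```python
import re


def search_file(path, pattern, encoding='utf-8'):
    """Yield (line_number, line) for every line matching the regex `pattern`.

    The file is streamed line by line, so memory use does not grow with its size.
    """
    regex = re.compile(pattern)
    # errors='replace' substitutes undecodable bytes instead of raising
    with open(path, encoding=encoding, errors='replace') as f:
        for line_no, line in enumerate(f, start=1):
            if regex.search(line):
                yield line_no, line.rstrip('\n')
```

For example, search_file('hugeLog.log', r'^ERROR') yields the number and text of every line starting with ERROR.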

Linux

Splitting a file into smaller files on Linux is easy and needs no special software. It works for logs and text files.

Linux shell read and split files

To divide a file into 1 GB parts, use split -b 1G (split treats 1G as 1024^3 bytes). The number of parts follows from the size of the original file, roughly one part per gigabyte, so a 10 GB file gives about 10 parts. Below are commands for splitting a file and combining the parts again.

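# split into 1 GB parts with numeric suffixes (hugeFileSplit00, hugeFileSplit01, ...)
split -b 1G -d hugeFile hugeFileSplit
# join the parts back together (the glob is hugeFileSplit*, with no space before *)
cat hugeFileSplit* > bigfile

# split into 2048 MB parts inside a folder
split --bytes=2048M /folder/hugeFile /folder/prefix
cat /folder/prefix* > hugeFile

# split by number of lines: 10,000 lines per file (output names xaa, xab, ...)
split -l 10000 hugeFile

# show the last 300 lines of a log and keep following new lines (live mode)
sudo tail -f -n300 /folder/hugeLog.log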
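
tail can show the last lines of a regular file quickly because it reads from the end instead of scanning the whole file. The same idea in Python, as a sketch (tail_lines is a hypothetical name; it returns raw bytes, so decode them as needed):

```python
import os


def tail_lines(path, n, block_size=64 * 1024):
    """Return the last `n` lines of `path` as bytes, reading backwards in blocks.

    Only the end of the file is read, so this stays fast even for a 10 GB log.
    """
    if n <= 0:
        return []
    with open(path, 'rb') as f:
        f.seek(0, os.SEEK_END)
        pos = f.tell()
        data = b''
        # read blocks from the end until there are more than n newlines
        # (one extra marks the start of the first wanted line) or we hit the start
        while pos > 0 and data.count(b'\n') <= n:
            step = min(block_size, pos)
            pos -= step
            f.seek(pos)
            data = f.read(step) + data
    # any partial line at the front of `data` is dropped by the slice
    return data.splitlines(keepends=True)[-n:]
```

tail_lines('/folder/hugeLog.log', 300) corresponds to tail -n300 without the live-follow part.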

Example with the options explained: split -l 10000 -d --additional-suffix=.txt $FileName file

  • -l 10000: split the file into files of 10,000 lines each.
  • -d: use numeric suffixes, so by default the suffixes run from 00 to 99 instead of aa to zz.
  • --additional-suffix=.txt: append an extra suffix to every output name, here the .txt extension.
  • $FileName: the name of the file to split.
  • file: the prefix of the resulting files (file00.txt, file01.txt, ...).
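
The same line-based split, including the numeric suffix and the extra .txt suffix, can be sketched in Python for systems without split. The function split_by_lines is a hypothetical helper; its naming matches split's only for the first 100 files and may differ beyond that.

```python
def split_by_lines(path, lines_per_file, prefix, suffix='', encoding='utf-8'):
    """Split a text file into files of `lines_per_file` lines each.

    Output names follow the pattern of
    `split -l N -d --additional-suffix=SUFFIX FILE PREFIX`:
    prefix00SUFFIX, prefix01SUFFIX, ...
    """
    outputs = []
    out = None
    try:
        # newline='' keeps the original line endings unchanged
        with open(path, encoding=encoding, newline='') as src:
            for i, line in enumerate(src):
                if i % lines_per_file == 0:
                    # every `lines_per_file` lines, switch to a new output file
                    if out is not None:
                        out.close()
                    name = f"{prefix}{len(outputs):02d}{suffix}"
                    out = open(name, 'w', encoding=encoding, newline='')
                    outputs.append(name)
                out.write(line)
    finally:
        if out is not None:
            out.close()
    return outputs
```

split_by_lines(FileName, 10000, 'file', suffix='.txt') mirrors the split command above.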

Read huge files line by line with code

Groovy

Read a file line by line and print all lines from 10,000 to 12,000. Download Groovy, start the Groovy console (groovyConsole in the bin folder), and run this code.

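// forward slashes work on Windows and avoid backslash escapes such as \u in Groovy strings
File file = new File('C:/Users/user/Desktop/hugeFile.txt')
def lineNo = 1
def line
file.withReader { reader ->
    while ((line = reader.readLine()) != null) {
        if (lineNo >= 10000 && lineNo <= 12000) {
            println "${lineNo}. ${line}"
        }
        if (lineNo > 12000) {
            break   // stop reading once the range has been printed
        }
        lineNo++
    }
}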

Python

Read a big file line by line with Python. This prints all lines from 10,000 to 12,000 to the console and also writes them to a file. If you want to use Python but are not an administrator of your computer, you can download the portable version: PortablePython

with open(r'C:\Users\user\Desktop\hugeFile.txt') as f, \
        open(r'C:\Users\user\Desktop\out.txt', 'w') as fout:
    for i, line in enumerate(f, start=1):  # i is the 1-based line number
        if i > 12000:
            break                          # stop once past the range
        if i >= 10000:
            print(i, ": ", line, end='')   # print the line to console
            fout.write(line)               # write the line to the output file
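
An alternative to the explicit counter loop above is itertools.islice from the Python standard library, which skips the earlier lines and stops right after the last wanted one. A minimal sketch, with read_line_range as a hypothetical helper name:

```python
from itertools import islice


def read_line_range(path, first, last, encoding='utf-8'):
    """Return lines `first`..`last` (1-based, inclusive) of a text file.

    islice discards the earlier lines without storing them and stops reading
    right after `last`, so the rest of a huge file is never read.
    """
    with open(path, encoding=encoding) as f:
        return list(islice(f, first - 1, last))
```

read_line_range(r'C:\Users\user\Desktop\hugeFile.txt', 10000, 12000) returns the same lines as the loop above; a range past the end of the file simply returns fewer lines.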

Java

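Read a huge file efficiently with Java. FileUtils.lineIterator from Apache Commons IO returns the lines one at a time instead of loading the whole file into memory. You can run this code in Eclipse, with the Commons IO library on the classpath.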

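import java.io.File;
import java.io.IOException;
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.LineIterator;

public class ReadHugeFile {
    public static void main(String[] args) throws IOException {
        File file = new File("C:\\Users\\user\\Desktop\\hugeFile.txt");
        // lineIterator reads the file lazily, one line at a time
        LineIterator it = FileUtils.lineIterator(file, "UTF-8");
        try {
            while (it.hasNext()) {
                String line = it.nextLine();
                System.out.println(line);
            }
        } finally {
            LineIterator.closeQuietly(it);  // always release the file handle
        }
    }
}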
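
To choose line ranges such as 10,000 to 12,000, it helps to know how many lines a huge file has. A Python sketch (count_lines is a hypothetical name) that reads fixed-size binary chunks and counts newline bytes, like wc -l, so a last line without a trailing newline is not counted:

```python
def count_lines(path, chunk_size=1024 * 1024):
    """Count newline characters in `path` by reading fixed-size binary chunks.

    Binary chunks avoid decoding and per-line overhead, and memory use stays
    at about `chunk_size` bytes regardless of the file size.
    """
    count = 0
    with open(path, 'rb') as f:
        # iter() calls f.read(chunk_size) until it returns b'' at end of file
        for chunk in iter(lambda: f.read(chunk_size), b''):
            count += chunk.count(b'\n')
    return count
```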