
Extract TXT Files from ZIP File Using Python
Python provides a built-in module called zipfile that allows us to create, read, write, and extract ZIP archives.
When we want to extract only specific files, such as all .txt files, we can filter the file names using string methods such as endswith().
A ZIP file is an archive format that bundles one or more files into a single file, usually compressed, so that they are easier to store and transfer. It keeps related files together and typically reduces their total size, which saves disk space and makes sharing over the internet faster.
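The filtering idea can be tried without touching the disk. The following is a minimal sketch that builds a small archive in memory and lists only its .txt entries; the entry names are made up for illustration and are not part of any real archive.

```python
import io
import zipfile

# Build a small ZIP archive in memory (hypothetical entry names, for illustration only).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("readme.txt", "hello")
    zf.writestr("docs/notes.TXT", "notes")
    zf.writestr("images/logo.png", b"\x89PNG")

# Reopen the archive for reading and keep only the names that end in .txt.
# lower() makes the comparison case-insensitive, so "notes.TXT" also matches.
with zipfile.ZipFile(buffer) as zf:
    txt_names = [name for name in zf.namelist() if name.lower().endswith(".txt")]

print(txt_names)  # ['readme.txt', 'docs/notes.TXT']
```

Nothing is extracted here; the sketch only shows which entries the filter would select.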
Steps Involved in ZIP File Extraction
Following are the steps we need to follow to extract all the text files from a ZIP file using Python:
- Import the zipfile and os modules.
- Open the ZIP file using the zipfile.ZipFile class.
- Get the list of all files using namelist() method.
- Filter the files that end with .txt.
- Extract the matched files to a specified directory using the extract() method.
In this article, we will see the different methods of extracting text files from a zip file using Python.
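The steps listed above can be sketched as one reusable helper. This is only an illustration under a few assumptions: the function name extract_txt_files and the demo file names are hypothetical, and the last step uses extractall() with its members argument instead of calling extract() once per file.

```python
import os
import tempfile
import zipfile


def extract_txt_files(zip_path, output_dir):
    """Extract every .txt entry from zip_path into output_dir and return the entry names."""
    os.makedirs(output_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        txt_members = [
            name for name in zip_ref.namelist()
            if name.lower().endswith(".txt")
        ]
        # extractall() accepts a subset of namelist() through its members argument.
        zip_ref.extractall(path=output_dir, members=txt_members)
    return txt_members


# Usage with a throwaway archive so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("readme.txt", "hello")
        zf.writestr("logs/today_log.txt", "log line")
        zf.writestr("data/table.csv", "a,b")

    out_dir = os.path.join(tmp, "text_files")
    extracted = extract_txt_files(demo_zip, out_dir)
    print(extracted)

    # The .txt entry keeps its folder; the .csv entry is never written.
    logs_exists = os.path.exists(os.path.join(out_dir, "logs", "today_log.txt"))
    csv_exists = os.path.exists(os.path.join(out_dir, "data", "table.csv"))
```

Returning the list of extracted names lets the caller log or check the result instead of relying on print statements inside the helper.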
Extracting .txt Files using extract() Method
To extract only the .txt files from a ZIP archive into a target directory, we can call the extract() method of the ZipFile object on each matching entry. Each file is written under the target directory with the same folder structure it has inside the archive.
Example
Here is an example of extracting all the text files from a ZIP archive using the extract() method:
import zipfile
import os

# Path to your ZIP archive
zip_path = r"D:\Tutorialspoint\Articles\archive.zip"

# Directory where .txt files will be extracted
output_dir = 'text_files'
os.makedirs(output_dir, exist_ok=True)

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    for member in zip_ref.namelist():
        # Filter for .txt files (case-insensitive)
        if member.lower().endswith('.txt'):
            zip_ref.extract(member, output_dir)
            print(f"Extracted: {member}")
Here is the output of the above program:
Extracted: docs/notes.txt
Extracted: readme.txt
Extracted: logs/today_log.txt
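The example above keeps the folders stored in the archive, so docs/notes.txt ends up in a docs subfolder of the output directory. If all text files should land directly in one folder instead, one possible approach is to read each entry with ZipFile.open() and write it under its base name. The sketch below assumes this is acceptable for the archive at hand; the helper name extract_txt_flat and the demo entries are hypothetical, and entries that share a file name would overwrite each other.

```python
import os
import shutil
import tempfile
import zipfile


def extract_txt_flat(zip_path, output_dir):
    """Copy every .txt entry into output_dir, dropping the folders stored in the archive."""
    os.makedirs(output_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path, "r") as zip_ref:
        for info in zip_ref.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".txt"):
                continue
            # Keep only the final path component, e.g. "docs/notes.txt" -> "notes.txt".
            base_name = os.path.basename(info.filename)
            target = os.path.join(output_dir, base_name)
            with zip_ref.open(info) as source, open(target, "wb") as dest:
                shutil.copyfileobj(source, dest)
            written.append(base_name)
    return written


# Usage with a throwaway archive so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_zip = os.path.join(tmp, "demo.zip")
    with zipfile.ZipFile(demo_zip, "w") as zf:
        zf.writestr("docs/notes.txt", "notes")
        zf.writestr("readme.txt", "hello")

    flat_dir = os.path.join(tmp, "flat")
    flat_names = extract_txt_flat(demo_zip, flat_dir)
    flat_listing = sorted(os.listdir(flat_dir))
    print(flat_names)
```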
Extracting .txt Files Using Path.suffix in Python
When we are working with ZIP files in Python, we may want to extract only specific file types, such as .txt files.
In such cases, we can use the suffix property of Path objects from Python's pathlib module. It is a property, not a function, and it returns the file extension of the last path component, which is often clearer than matching the end of the string by hand.
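To see what suffix returns for typical archive entries, here is a small sketch. The entry names are hypothetical, and PurePosixPath is used here instead of Path as an assumption that suits ZIP entries, because names inside an archive use forward slashes on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical entry names; ZIP archives store paths with forward slashes.
names = ["readme.txt", "docs/notes.TXT", "backup.tar.gz", "docs/"]

# suffix is a property: it yields the last extension, or "" when there is none.
suffixes = {name: PurePosixPath(name).suffix for name in names}
for name, suffix in suffixes.items():
    print(f"{name!r} -> {suffix!r}")

# Lower-casing the suffix makes the comparison case-insensitive.
txt_only = [n for n in names if PurePosixPath(n).suffix.lower() == ".txt"]
print(txt_only)  # ['readme.txt', 'docs/notes.TXT']
```

Only the final extension is reported, so backup.tar.gz yields .gz, and a directory entry such as docs/ yields an empty string and is skipped by the filter.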
Example
Below is an example that extracts the text files from a ZIP file using the Path.suffix property:
from pathlib import Path
import zipfile
import os

# Path to the ZIP file
zip_path = r"D:\Tutorialspoint\Articles\archive.zip"

# Folder to extract .txt files into
output_dir = 'only_txt_files'
os.makedirs(output_dir, exist_ok=True)

with zipfile.ZipFile(zip_path, 'r') as zip_ref:
    for file in zip_ref.namelist():
        # Compare the extension case-insensitively
        if Path(file).suffix.lower() == '.txt':
            zip_ref.extract(file, output_dir)
            print(f"Extracted: {file}")
Following is the output of the above example:
Extracted: notes/info.txt
Extracted: logs/report.txt