Python Programming Unit 3 & 4 Master Guide
In-depth notes, interactive code templates, completely solved assignments, and top 20 predicted exam questions.
Unit 3: File Handling & Organization
1. Reading and Writing Files
Python provides built-in mechanisms for performing Input/Output (I/O) operations on files. The foundation of file manipulation is the open() function.
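File Opening Modes
- 'r': Read (the default). Raises an error if the file does not exist.
- 'w': Write. Creates the file, or erases its contents if it already exists.
- 'a': Append. Adds to the end of the file, creating it if needed.
- 'rb' / 'wb': The same modes for binary data (bytes instead of text).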
Reading Methods
- read(): Reads the entire file content as a single string.
- readline(): Reads a single line from the file per call.
- readlines(): Returns a list of strings, where each string is a line.
# Using a context manager (the 'with' statement) automatically closes the file
with open('example.txt', 'w') as f:
    f.write("First Line\n")
    f.write("Second Line\n")
with open('example.txt', 'r') as f:
    lines = f.readlines()
    print(lines) # Output: ['First Line\n', 'Second Line\n']
2. The os.path Module
Operating systems use different path separators (Windows: \, Unix/Mac: /). The os.path module helps you write cross-platform programs by handling these differences for you.
- os.path.join(): Combines path components using the correct OS separator. os.path.join('folder', 'file.txt') → folder\file.txt (Windows) or folder/file.txt (Unix/Mac)
- os.path.exists(): Returns True if the file or folder exists on disk.
- os.path.abspath(): Converts a relative path (e.g., ./file.txt) into an absolute path (e.g., C:\Users\data\file.txt).
- os.path.basename(): Extracts the base file name from a long path. basename('/usr/bin/python') → 'python'
- os.path.dirname(): Extracts the directory path preceding the file name. dirname('/usr/bin/python') → '/usr/bin'
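The functions above can be tried together in a short sketch. The path '/usr/bin/python' is taken from the notes; the other names are only illustrative. The results of join() and abspath() depend on the operating system and the current working directory, so no fixed output is shown for them.

```python
import os

# Sample path from the notes, used only for illustration
sample = '/usr/bin/python'

base = os.path.basename(sample)     # final component of the path
folder = os.path.dirname(sample)    # everything before the final component

# join() inserts os.sep, so the result differs between Windows and Unix/Mac
joined = os.path.join('folder', 'file.txt')

# abspath() resolves a relative path against the current working directory
absolute = os.path.abspath('file.txt')

print(base, folder, joined, absolute, os.path.exists(absolute))
```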
3. Saving Variables (shelve & pprint)
Instead of converting lists and dictionaries to text formats to save them, Python provides modules to save native structures directly.
The shelve Module
Creates a binary persistent dictionary. Keys must be strings, but values can be any picklable Python object.
import shelve
# Write
with shelve.open('mydata') as db:
    db['config'] = [1920, 1080]
# Read
with shelve.open('mydata') as db:
    resolution = db['config']
pprint.pformat()
Returns a formatted, syntactically correct Python string representation of a complex variable to save as a .py file.
import pprint
data = [{'name': 'A', 'age': 10}]
text = pprint.pformat(data)
with open('my_vars.py', 'w') as f:
    f.write('saved_data = ' + text)
4. Organizing Files (shutil & os.walk)
The shutil Module
Short for shell utilities. It performs high-level operations on files and collections of files.
- shutil.copy(src, dst): Copies a single file.
- shutil.copytree(src, dst): Copies an entire folder and all nested files/folders.
- shutil.move(src, dst): Moves a file or folder (also used for renaming).
- shutil.rmtree(path): Irreversibly deletes an entire directory tree.
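A minimal sketch of the four calls, run inside a temporary directory so that nothing real is touched. The folder and file names ('project', 'notes.txt', and so on) are hypothetical.

```python
import os
import shutil
import tempfile

# Work inside a throwaway directory
work = tempfile.mkdtemp()

# Hypothetical layout: a 'project' folder holding one file
project = os.path.join(work, 'project')
os.makedirs(project)
with open(os.path.join(project, 'notes.txt'), 'w') as f:
    f.write('hello\n')

# copy(): one file; returns the path of the new copy
copied = shutil.copy(os.path.join(project, 'notes.txt'),
                     os.path.join(work, 'notes_copy.txt'))

# copytree(): whole folder; by default the destination must not exist yet
backup = shutil.copytree(project, os.path.join(work, 'project_backup'))

# move(): also serves as rename
renamed = shutil.move(copied, os.path.join(work, 'renamed.txt'))

after_copy = sorted(os.listdir(work))
print(after_copy)

# rmtree(): deletes the folder and everything in it, bypassing any recycle bin
shutil.rmtree(work)
```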
Walking a Directory Tree: os.walk()
os.walk() is used to traverse through directories. In a loop, it yields a 3-value tuple for every folder it visits:
for root, dirs, files in os.walk(path):
root: A string with the path of the current folder (absolute only if the path passed to os.walk() is absolute).
dirs: A list of strings of the sub-folders inside the current folder.
files: A list of strings of the files inside the current folder.
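As an illustration, the loop below walks a small temporary tree and collects every file it finds. The layout (a.txt plus sub/b.txt) is made up for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as top:
    # Hypothetical layout: top/a.txt and top/sub/b.txt
    os.makedirs(os.path.join(top, 'sub'))
    for rel in ('a.txt', os.path.join('sub', 'b.txt')):
        with open(os.path.join(top, rel), 'w') as f:
            f.write('x')

    found = []
    for root, dirs, files in os.walk(top):
        for name in files:
            # Join root and name to get a usable path to each file
            full = os.path.join(root, name)
            found.append(os.path.relpath(full, top))

found.sort()
print(found)
```

Joining root with each name matters because the entries in files are bare names, not paths.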
5. Compressing Files (zipfile module)
Python can create, read, and extract ZIP archives natively.
- Creating/Writing: Open with zipfile.ZipFile('name.zip', 'w'). Call write(filename, compress_type=zipfile.ZIP_DEFLATED) to compress.
- Extracting: Open in 'r' mode. Call extractall('target_folder') to unzip everything.
- Reading Metadata: Use namelist() to get a list of all files inside the zip. Use getinfo(filename) to get an object with attributes like file_size and compress_size.
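The three operations can be sketched end to end as follows. The file name 'report.txt' is hypothetical, and the arcname argument is an addition not mentioned in the notes: it stores the file under a short name rather than its full temporary path.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'report.txt')       # hypothetical file name
    archive = os.path.join(tmp, 'name.zip')
    target = os.path.join(tmp, 'target_folder')

    with open(src, 'w') as f:
        f.write('spam ' * 1000)                 # repetitive text compresses well

    # Write: add the file to a new archive with DEFLATE compression
    with zipfile.ZipFile(archive, 'w') as zf:
        zf.write(src, arcname='report.txt', compress_type=zipfile.ZIP_DEFLATED)

    # Read metadata, then extract everything
    with zipfile.ZipFile(archive, 'r') as zf:
        names = zf.namelist()
        zf.extractall(target)

    extracted = os.path.exists(os.path.join(target, 'report.txt'))

print(names, extracted)
```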
Unit 4: Web Scraping & GUI
1. Web Scraping & webbrowser
Web scraping is an automated method to obtain large amounts of unstructured data (HTML) from websites and convert it into structured data. A scraper program fetches each page and extracts the fields of interest.
Project: mapit.py
The simplest automation involves the webbrowser module, which opens a URL in your default browser. In the mapit.py project, we launch Google Maps using an address pulled from command line arguments or the clipboard.
# mapit.py
import webbrowser, sys, pyperclip
if len(sys.argv) > 1:
    # Read address from command line
    address = ' '.join(sys.argv[1:])
else:
    # Read address from clipboard
    address = pyperclip.paste()
webbrowser.open('https://www.google.com/maps/place/' + address)
2. Downloading Files (requests Module)
The requests module handles HTTP requests programmatically. It is a third-party package (installed with pip install requests) whose API is simpler than that of the standard library's urllib.
Downloading Content
Use requests.get(url) to fetch content. The response is stored in an object.
res = requests.get('http://site.com/file')
Error Handling
Always call raise_for_status(). It raises an exception if the download failed (e.g., 404 Not Found) instead of silently corrupting data.
res.raise_for_status()
Saving to Hard Drive
You MUST open files in Binary Write Mode ('wb') when saving web content so that the bytes are written exactly as received. This keeps the text's original encoding intact and prevents corruption of images/PDFs.
To avoid exhausting RAM on large files, we stream the download using iter_content(chunk_size).
import requests
# url is the address of the file to download
res = requests.get(url, stream=True)
res.raise_for_status()
with open('large_file.zip', 'wb') as file:
    # Download in 100KB chunks
    for chunk in res.iter_content(chunk_size=100000):
        file.write(chunk)
3. HTML (Hypertext Markup Language)
To scrape websites using tools like BeautifulSoup, you must understand HTML structure. Webpages are built using elements represented by tags.
- Tags: Surround content, e.g., <p> (paragraph), <a> (hyperlink), <img> (image).
- Attributes: Add properties to tags, e.g., href (URL for links), src (image source), class, and id.
- Structure: <html> contains <head> (metadata) and <body> (visible content).
4. GUI Programming with Tkinter
Tkinter is the standard GUI library in Python for building desktop applications.
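- tk.Tk(): Creates the main application window.
- tk.Button(), tk.Label(): Widgets that are placed inside the window.
- widget.pack(): Geometry manager that positions the widget in its parent.
- root.mainloop(): Starts the event loop and keeps the window open.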
import tkinter as tk
def close_app():
    root.destroy()
root = tk.Tk()
root.title("My App")
btn = tk.Button(root, text="Exit", command=close_app)
btn.pack(pady=20)
root.mainloop()
Official Assignments Solved
Expanded answers structured for 5-Mark Questions
Assignment - 3
Q1. What do you mean by the OS & Shutil Module?
Definition & Concept (2 Marks):
- OS Module: The os module in Python provides a portable way of using operating system-dependent functionality. It allows you to interface with the underlying OS, enabling tasks like reading or writing to the file system, managing paths, and fetching environment variables.
- Shutil Module: Short for 'shell utilities', the shutil module offers a higher-level interface for file operations. While os handles basic path manipulations, shutil is specifically designed for complex file and collection management (like copying or deleting entire directory trees).
Key Methods (1.5 Marks):
- os: os.getcwd(), os.chdir(path), os.makedirs(path), os.path.join()
- shutil: shutil.copy(src, dst), shutil.move(src, dst), shutil.copytree(src, dst), shutil.rmtree(path)
Code Example (1.5 Marks):
import os, shutil
# OS Module: Get current directory and create a new folder
current_dir = os.getcwd()
new_folder = os.path.join(current_dir, 'backup_folder')
if not os.path.exists(new_folder):
    os.makedirs(new_folder)
# Shutil Module: Copy a file into the new folder
shutil.copy('data.txt', new_folder)
print("File copied successfully!")
Q2. What is the difference between seek() and tell() methods?
Concept of File Pointers (1 Mark): Whenever a file is opened in Python, the system maintains a "file pointer" (or cursor) that tracks the current reading/writing position in bytes.
tell() Method (1.5 Marks)
- Purpose: Returns the current position of the file pointer.
- Syntax: file_object.tell()
- Returns: An integer representing the byte offset from the beginning of the file.
seek() Method (1.5 Marks)
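- Purpose: Moves the file pointer to a specific location.
- Syntax: file_object.seek(offset, whence)
- Whence Values: 0 (Start of file, the default), 1 (Current position), 2 (End of file). Files opened in text mode only allow seeks relative to the start, apart from seek(0, 2) to jump to the end; open the file in binary mode ('rb') to use other offsets with whence 1 or 2.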
Code Example (1 Mark):
with open('sample.txt', 'r') as file:
    print("Initial position:", file.tell()) # Output: 0
    file.read(5) # Read first 5 characters
    print("Position after reading:", file.tell()) # Output: 5 (for single-byte characters)
    file.seek(0) # Move pointer back to start
    print("Position after seek:", file.tell()) # Output: 0
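To see all three whence values in action, here is a small sketch that uses binary mode, where seeking relative to the current position or the end is permitted. The 10-byte temporary file is hypothetical.

```python
import os
import tempfile

# Hypothetical 10-byte file
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    f.seek(3)                   # whence defaults to 0: 3 bytes from the start
    from_start = f.read(2)
    f.seek(2, 1)                # whence=1: skip 2 bytes from the current position
    from_current = f.read(1)
    f.seek(-2, 2)               # whence=2: 2 bytes before the end
    from_end = f.read()
    end_pos = f.tell()

os.remove(path)
print(from_start, from_current, from_end, end_pos)
```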
Q3. How to compress a file using zipfile module?
Step-by-Step Explanation (3 Marks):
- Import Module: First, import Python's built-in zipfile module.
- Open ZipFile Object: Create a new ZipFile object using the 'w' (write) or 'a' (append) mode. It is highly recommended to use the with statement to ensure the archive is properly closed.
- Write and Compress: Call the write() method on the ZipFile object. Pass the name of the file you want to compress.
- Compression Algorithm: To ensure the file is actually compressed (and not just stored), you must pass the compress_type=zipfile.ZIP_DEFLATED argument.
Code Example (2 Marks):
import zipfile
# Step 1 & 2: Open a new zip file in write mode
with zipfile.ZipFile('my_backup.zip', 'w') as zip_archive:
    # Step 3 & 4: Add 'report.pdf' to the archive and compress it
    zip_archive.write('report.pdf', compress_type=zipfile.ZIP_DEFLATED)
print("File successfully compressed!")
Q4. Explain the shelve module with all its methods.
Concept of Shelve (2 Marks): The shelve module implements persistent storage for arbitrary Python objects. It acts like a persistent dictionary. Instead of writing parsing code to convert text files back into Python lists or dictionaries, you can save variables directly to a binary file and retrieve them later using a string-based key.
Important Methods & Operations (1.5 Marks):
- shelve.open('filename'): Opens a shelf file. If the file doesn't exist, it creates it.
- Dictionary Methods: It supports standard dictionary methods like keys(), values(), and items().
- Assignment: You store data using dictionary syntax: shelfFile['key'] = data.
- shelfFile.close(): Critical method to ensure data is flushed and saved to the hard drive.
Code Example (1.5 Marks):
import shelve
# 1. Saving Data
with shelve.open('my_data') as db:
    db['config'] = {'theme': 'dark', 'version': 2.0}
    db['users'] = ['Alice', 'Bob', 'Charlie']
# 2. Retrieving Data
with shelve.open('my_data') as db:
    print(list(db.keys())) # e.g. ['config', 'users'] (key order may vary)
    users_list = db['users']
    print(users_list[0]) # Output: Alice
Assignment - 4
Q1. What is Web Scraping and what is the use of the webbrowser module?
Web Scraping (2 Marks): Web scraping is the automated, programmatic extraction of large amounts of data from websites. Instead of a human manually copying and pasting information, an algorithm fetches the HTML code of a web page, parses the DOM tree, and extracts structured data (like product prices, news headlines, or weather data) for analysis or database storage.
webbrowser Module (1.5 Marks): Python's built-in webbrowser module provides a high-level interface to allow displaying Web-based documents to users. In scripts, it is primarily used to automatically open a specific URL in the system's default web browser. It is excellent for automation tasks (e.g., automatically opening a map based on an address).
Code Example (1.5 Marks):
import webbrowser
address = "Taj Mahal, Agra"
print("Opening map for:", address)
# Opens the default browser (Chrome/Edge/Firefox) to the specified URL
webbrowser.open('https://www.google.com/maps/place/' + address)
Q2. How to download a file from the Web using the requests module?
The Requests Module (2 Marks): requests is an elegant and simple HTTP library for Python. To download a file, you use the requests.get(url) function, which sends an HTTP GET request to the server and returns a Response object containing the file's data.
Error Handling / Validation (1.5 Marks): Web requests can fail (e.g., 404 Not Found, 500 Internal Server Error). It is a mandatory best practice to call the raise_for_status() method on the response object immediately after downloading. This method raises an HTTPError exception if the download was unsuccessful, preventing the program from writing a corrupted file.
Code Example (1.5 Marks):
import requests
url = 'http://www.gutenberg.org/cache/epub/1112/pg1112.txt'
try:
    # Send GET request
    response = requests.get(url)
    # Raise HTTPError if the server returned a 4xx or 5xx status
    response.raise_for_status()
    print("File downloaded successfully. Length:", len(response.text))
except requests.exceptions.HTTPError as err:
    print(f"HTTP Error occurred: {err}")
Q3. How to save the downloaded file from the web to the Hard Drive?
Binary Write Mode (2 Marks): After downloading data with requests, you must save it by opening a local file in binary write mode ('wb'). Binary mode writes the bytes exactly as they were received, with no text encoding or newline translation, so it is safe for plain text as well as for files like PDFs or images that such translation would corrupt.
Memory Management (1.5 Marks): For large files, loading the entire response into RAM at once can exhaust memory and crash your program. Instead, pass stream=True to requests.get() and use the iter_content(chunk_size) method in a loop to write the file in small, manageable chunks (e.g., 100,000 bytes at a time).
Code Example (1.5 Marks):
import requests
# stream=True defers the download so the body is not loaded into RAM all at once
res = requests.get('https://example.com/large_image.jpg', stream=True)
res.raise_for_status()
# Open file in 'wb' (Write Binary) mode
with open('saved_image.jpg', 'wb') as file:
    # Loop through the data in chunks of 100KB
    for chunk in res.iter_content(chunk_size=100000):
        file.write(chunk)
print("File securely saved to hard drive!")
Q4. Write a program to implement the GUI.
Tkinter Overview (2 Marks): tkinter is Python's standard, built-in Graphical User Interface (GUI) package. It allows developers to create desktop applications with native-looking windows, buttons, text inputs, and menus.
Key Components (1.5 Marks):
- tk.Tk(): Initializes the main application window.
- Widgets: Elements like tk.Button or tk.Label that are placed inside the window.
- pack(): A geometry manager that organizes and renders the widgets on the screen.
- mainloop(): An infinite loop that catches events (like button clicks) and keeps the window open.
Code Example (1.5 Marks):
import tkinter as tk
# Define a function for the button click event
def display_message():
    label.config(text="Hello! GUI is working.", fg="green")
# 1. Create main window
root = tk.Tk()
root.title("My Python GUI")
root.geometry("300x150")
# 2. Create Widgets
label = tk.Label(root, text="Welcome to Tkinter", font=("Helvetica", 12))
btn = tk.Button(root, text="Click Me", command=display_message)
# 3. Pack widgets into the window
label.pack(pady=20)
btn.pack()
# 4. Run the application
root.mainloop()
Exam Predictions: Top 20 Q&A
Carefully curated important questions from the syllabus, expanded for 5-Mark Formats.
Unit 3 Predictions
1. Difference between absolute and relative path?
- Absolute Path: Provides the complete, exact location of a file or directory starting from the root of the file system (e.g., C:\ on Windows or / on Linux). It does not depend on your current working directory.
- Example: C:\Users\John\Documents\file.txt
- Relative Path: Specifies the location of a file relative to the program's current working directory (CWD). It is shorter and more portable.
- Example: ./data/file.txt (where . means current directory, and .. means parent directory).
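A short sketch of converting between the two forms, assuming a hypothetical relative path data/file.txt. The absolute result depends on where the script is run, so it is not shown.

```python
import os

relative = os.path.join('data', 'file.txt')    # hypothetical relative path
absolute = os.path.abspath(relative)           # resolved against os.getcwd()

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True

# relpath() goes the other way: absolute path -> path relative to the CWD
round_trip = os.path.relpath(absolute)
print(round_trip)
```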
2. Why is os.path.join() important for file paths?
- Cross-Platform Compatibility: Different operating systems use different characters to separate folders in a path. Windows uses backslashes (\), while macOS and Linux use forward slashes (/).
- Error Prevention: Hardcoding a separator (like folder + '\\' + file) can break your script on an OS that expects a different one. A hardcoded backslash, for example, is not treated as a separator on macOS or Linux.
- Solution: os.path.join() takes string arguments and combines them using the correct separator for the host operating system.
import os
path = os.path.join('Users', 'John', 'docs.txt')
# Returns 'Users\John\docs.txt' on Win, 'Users/John/docs.txt' on Mac
3. Difference between read() and readlines()?
- read(): Reads the entire contents of a file and returns it as a single, large multi-line string. Useful for reading small files completely into memory at once.
- readlines(): Reads the entire file but splits it by newline characters, returning a List of Strings. Each element in the list represents a single line from the file. Useful when you need to iterate or index specific lines.
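The difference is easiest to see side by side. This sketch writes a two-line temporary file (hypothetical content) and reads it back with each method, including readline() for comparison.

```python
import os
import tempfile

fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('First Line\nSecond Line\n')

with open(path) as f:
    whole = f.read()          # one string containing both lines

with open(path) as f:
    as_list = f.readlines()   # one list element per line, newline kept

with open(path) as f:
    first = f.readline()      # only the first line

os.remove(path)
print(repr(whole), as_list, repr(first))
```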
4. How do you delete a folder and all its contents?
- os.rmdir limitation: The built-in os.rmdir(path) function can only delete a directory if it is completely empty. If there are files inside, it throws an OSError.
- The Solution: You must use the shutil module's rmtree() function. shutil.rmtree(path) recursively walks through the directory tree, deleting all internal files and subfolders, and finally deletes the parent folder itself.
import shutil
shutil.rmtree('C:\\old_project_backup')
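A safe way to try this is on a temporary folder. The sketch below, which uses a hypothetical file name, shows os.rmdir() failing on a non-empty folder and shutil.rmtree() succeeding.

```python
import os
import shutil
import tempfile

folder = tempfile.mkdtemp()                      # throwaway folder
with open(os.path.join(folder, 'leftover.txt'), 'w') as f:
    f.write('data')

try:
    os.rmdir(folder)                             # fails: folder is not empty
    rmdir_failed = False
except OSError:
    rmdir_failed = True

shutil.rmtree(folder)                            # removes the folder and its file
gone = not os.path.exists(folder)
print(rmdir_failed, gone)
```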
5. What does pprint.pformat() do?
- Purpose: The pprint (pretty print) module formats complex data structures (like heavily nested dictionaries) to make them human-readable.
- pformat(): While pprint.pprint() prints directly to the console, pprint.pformat() returns the formatted text as a syntactically correct Python string.
- Saving Data: You can write this string to a .py file. This effectively saves your variables as a custom Python module, which you can later import to restore the data.
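The save-and-restore round trip might look like the sketch below. It loads the file by path with importlib (an assumption made so the example works from a temporary folder); in a normal script, a plain import my_vars is enough when my_vars.py sits next to the script.

```python
import importlib.util
import os
import pprint
import tempfile

data = [{'name': 'A', 'age': 10}]

with tempfile.TemporaryDirectory() as tmp:
    module_path = os.path.join(tmp, 'my_vars.py')
    with open(module_path, 'w') as f:
        f.write('saved_data = ' + pprint.pformat(data) + '\n')

    # Load the file as a module by path, so the example does not depend on sys.path
    spec = importlib.util.spec_from_file_location('my_vars', module_path)
    my_vars = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(my_vars)

restored = my_vars.saved_data
print(restored)
```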
6. Explain the values yielded by os.walk().
- Traversal: os.walk(path) generates the file names in a directory tree by traversing top-down (the default) or bottom-up (topdown=False).
- The Yield: In a loop, it yields a tuple of three values for every directory it visits:
- foldername (String): The path of the current folder, built from the path passed to os.walk() (absolute only if that path is absolute).
- subfolders (List): A list of strings representing the folders directly inside the current folder.
- filenames (List): A list of strings representing the files directly inside the current folder.
7. How to check if a file path exists?
- Existence Check: You use os.path.exists(path). It returns a boolean True if the file or directory exists, and False otherwise.
- Specific Checks:
- To check if it exists AND is specifically a file: os.path.isfile(path)
- To check if it exists AND is specifically a directory: os.path.isdir(path)
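The three checks can be compared on a folder, a file, and a missing path. The names below are hypothetical; each tuple holds the results of (exists, isfile, isdir).

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, 'notes.txt')   # hypothetical file
    with open(file_path, 'w') as f:
        f.write('x')
    missing = os.path.join(folder, 'nope.txt')      # never created

    checks = {}
    for label, p in (('folder', folder), ('file', file_path), ('missing', missing)):
        checks[label] = (os.path.exists(p), os.path.isfile(p), os.path.isdir(p))

print(checks)
```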
8. Difference between shutil.copy() and copytree()?
- shutil.copy(src, dst): Copies a single file from the source path to the destination path. If the destination is a folder, it drops the file inside it with the original name.
- shutil.copytree(src, dst): Copies an entire directory tree. It takes the source folder and copies it, along with every file and subfolder nested inside it, creating a completely cloned directory at the destination.
9. How do you find the compressed size of a zip file?
- ZipInfo Object: You must first open the zip file in read mode using zipfile.ZipFile().
- getinfo(): Call the getinfo('filename') method on the ZipFile object. This returns a ZipInfo object containing metadata about that specific file.
- Attributes: You can then access file_size (original size in bytes) and compress_size (compressed size in bytes) attributes.
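As a sketch of these steps, the example below builds a small archive in memory (an assumption made so that no files are left behind) and then reads both sizes. The member name 'spam.txt' is hypothetical.

```python
import io
import zipfile

# Build a small archive in memory
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr('spam.txt', 'spam ' * 2000)     # hypothetical member name

# Reopen in read mode and inspect the member's metadata
with zipfile.ZipFile(buffer, 'r') as zf:
    info = zf.getinfo('spam.txt')
    original = info.file_size
    compressed = info.compress_size

print(f'{original} bytes -> {compressed} bytes')
```

With a real archive on disk, pass its file name to zipfile.ZipFile() instead of the in-memory buffer.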
10. Explain the 'a' file opening mode.
- Append Mode: The 'a' mode stands for 'append'. It is used when you want to add data to an existing file without deleting what is already there.
- Mechanism: Unlike the 'w' (write) mode which instantly erases/truncates an existing file, the 'a' mode opens the file and places the file pointer at the very end.
- Creation: If the file does not exist, append mode will automatically create a new blank file, just like write mode.
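A quick way to confirm the behavior is to append twice and then overwrite once. The file name 'log.txt' is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'log.txt')   # hypothetical log file

    with open(path, 'a') as f:      # file does not exist yet, so it is created
        f.write('first\n')
    with open(path, 'a') as f:      # existing content is kept
        f.write('second\n')
    with open(path) as f:
        after_append = f.read()

    with open(path, 'w') as f:      # 'w' truncates before writing
        f.write('third\n')
    with open(path) as f:
        after_write = f.read()

print(repr(after_append), repr(after_write))
```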
Unit 4 Predictions
1. Name three popular Python libraries for web scraping.
- Requests: Used for making HTTP requests to download the raw HTML content of a webpage easily.
- BeautifulSoup: A parsing library that navigates the DOM tree of HTML/XML files. It is used to extract specific data (like finding all <div> tags with a certain class).
- Selenium / Playwright: Browser automation frameworks used for scraping dynamic, JavaScript-heavy websites that require clicking, scrolling, or waiting for elements to load.
2. What does res.raise_for_status() do?
- Error Detection: When using the requests module, downloading a page doesn't crash the program if a 404 (Not Found) or 500 (Server Error) occurs. The response simply holds the error page text.
- Immediate Halt: raise_for_status() checks the HTTP status code. If it is not an error code (for example 200 OK), it does nothing. If the server returned a 4xx or 5xx error, it raises an HTTPError exception.
- Best Practice: Always call this immediately after requests.get() so your program doesn't proceed to process or save corrupted/error HTML data.
3. Why do we write downloaded files in binary mode ('wb')?
- Encoding Protection: Web content consists of various encodings and binary data types (Images, ZIPs, PDFs, EXEs).
- Translation Issues: If you open a file in standard write mode ('w'), Python attempts to interpret and encode the text (often as UTF-8) and handle newline character translations based on your OS.
- Corruption Prevention: Writing in 'wb' (Write Binary) mode dumps the exact bytes downloaded directly to the hard drive, ensuring no data is mutated or corrupted during the save process.
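The point can be checked without any network access. This sketch uses the 8-byte PNG file signature as a stand-in for downloaded bytes; it shows that 'wb' round-trips the bytes unchanged and that a text-mode file refuses bytes outright.

```python
import os
import tempfile

# The 8-byte signature that starts every PNG file; note the embedded \r\n and \n
payload = b'\x89PNG\r\n\x1a\n'

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'image.bin')

    with open(path, 'wb') as f:
        f.write(payload)                 # bytes go to disk unchanged
    with open(path, 'rb') as f:
        round_trip = f.read()

    try:
        with open(path, 'w') as f:
            f.write(payload)             # text mode accepts str, not bytes
        text_mode_rejected = False
    except TypeError:
        text_mode_rejected = True

print(round_trip == payload, text_mode_rejected)
```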
4. What is BeautifulSoup and why is it needed?
- The Problem: The requests module returns raw HTML as a giant, unstructured text string. Searching this text using basic string manipulation or regular expressions is incredibly error-prone due to HTML inconsistencies.
- The Solution: BeautifulSoup is a Python library that takes raw HTML and parses it into a structured, navigable DOM tree object.
- Functionality: It allows developers to search for elements semantically using methods like soup.find_all('div', class_='price') to extract exact data points effortlessly.
5. How does the mapit.py project automate Google Maps?
- Input Handling: The script uses the sys.argv list to read a street address typed in the command prompt. If no arguments are typed, it uses the pyperclip module to read the address currently copied to the user's clipboard.
- URL Construction: It appends the retrieved address string to the base Google Maps URL ('https://www.google.com/maps/place/' + address).
- Execution: Finally, it uses the webbrowser.open() function to launch the system's default browser directly to that specific map location, saving the user multiple manual clicks.
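The input handling and URL construction can be isolated in a small testable function. This is only a sketch: build_maps_url is a hypothetical helper, and percent-encoding the address with urllib.parse.quote is an addition that the original mapit.py does not perform.

```python
from urllib.parse import quote

BASE_URL = 'https://www.google.com/maps/place/'

def build_maps_url(argv, clipboard_text):
    """Return the Maps URL for an address from argv, else from the clipboard text."""
    if len(argv) > 1:
        address = ' '.join(argv[1:])
    else:
        address = clipboard_text
    # quote() percent-encodes spaces and other characters that are unsafe in a URL
    return BASE_URL + quote(address.strip())

url = build_maps_url(['mapit.py', 'Taj', 'Mahal,', 'Agra'], '')
print(url)
```

In the real script, the result would be passed to webbrowser.open(), with sys.argv and pyperclip.paste() supplying the two arguments.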
6. Explain the structure of an HTML element.
- Opening Tag: The name of the element enclosed in angle brackets (e.g., <a> for a link, <p> for a paragraph).
- Attributes: Placed inside the opening tag, providing extra configuration or metadata (e.g., href="https..." or class="highlight").
- Content/Inner Text: The text or nested elements displayed on the webpage located between the opening and closing tags.
- Closing Tag: Identical to the opening tag but preceded by a forward slash (e.g., </a>) to mark the end of the element.
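To make the four parts concrete, the sketch below pulls them out of a sample element. It uses the standard library's html.parser rather than BeautifulSoup, so it runs without extra packages; the LinkCollector class and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None          # set while we are inside an <a> element

    def handle_starttag(self, tag, attrs):
        # Opening tag: attrs arrives as a list of (name, value) pairs
        if tag == 'a':
            self._href = dict(attrs).get('href')

    def handle_data(self, data):
        # Content between the opening and closing tags
        if self._href is not None:
            self.links.append((self._href, data))

    def handle_endtag(self, tag):
        # Closing tag: we have left the <a> element
        if tag == 'a':
            self._href = None

parser = LinkCollector()
parser.feed('<p>Visit <a href="https://example.com" class="highlight">Example</a> today.</p>')
print(parser.links)
```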
7. What is Tkinter?
- Definition: Tkinter is the standard, built-in Graphical User Interface (GUI) library for Python. It is a wrapper around the Tcl/Tk GUI toolkit.
- Purpose: It allows developers to create native desktop applications with windows, dialogs, buttons, text boxes, and menus, moving away from simple command-line terminal scripts.
- Usage: You instantiate a root window with tk.Tk(), instantiate widgets (like tk.Button), place them using layout managers like pack(), and trigger an infinite event loop using mainloop().
8. What is the role of iter_content() in requests?
- Memory Bottleneck: If you download a 5GB video file using response.content, Python attempts to load the entire 5GB into RAM at once, which will likely crash the program (MemoryError).
- Chunking: The iter_content(chunk_size) method solves this by acting as a generator. When the request is made with stream=True, it streams the download, yielding small chunks of data (e.g., 100,000 bytes at a time).
- Implementation: You use a for loop to receive a chunk, write it to the hard drive, and then discard it from RAM, keeping the memory footprint consistently low regardless of file size.
9. What does the pandas library do in the context of scraping?
- Data Structuring: While BeautifulSoup is responsible for extracting the raw variables (titles, prices, dates) from HTML, those variables sit in simple Python lists.
- DataFrames: pandas takes those lists and structures them into a 2-Dimensional table called a DataFrame, aligning columns and rows perfectly.
- Data Export: pandas provides highly optimized, one-line methods to export this scraped data into production formats, such as df.to_csv('data.csv'), df.to_excel(), or SQL databases.
10. What does the pack() method do in Tkinter?
- Geometry Manager: In Tkinter, simply creating a widget (like a Button) does not display it on the screen. You must use a geometry manager to position it.
- pack() Execution: pack() is the simplest geometry manager. It automatically organizes widgets into blocks and stacks them sequentially (usually top-to-bottom) within the parent window.
- Flexibility: It accepts arguments like pady or padx for padding, and side (e.g., tk.LEFT) to control alignment without requiring absolute x/y coordinates.