For a program to retain data between the times it is run, you must save the data. Data is saved to a file, typically on a computer disk. Saved data can be retrieved and used at a later time.
‘Writing data to’: saving data on a file
Output file: a file that data is written to
‘Reading data from’: process of retrieving data from a file
Input file: a file from which data is read
There are three steps when a program uses a file. These are:
- Open the file
- Process the file
- Close the file
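The three steps can be sketched in a few lines of Python. The file name example.txt and the text written to it are hypothetical placeholders. The sketch opens the file for writing, writes one line, and then closes it.

```python
# Step 1: open the file ("w" creates it, or empties it if it already exists)
f = open("example.txt", "w", encoding="utf-8")

# Step 2: process the file (here, write one line of text to it)
f.write("Saved data can be retrieved later.\n")

# Step 3: close the file so the data is saved and system resources are freed
f.close()

print(f.closed)  # True once the file has been closed
```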
Strings and Their Methods
Python includes a set of built-in methods that can be called on strings to perform operations. Because strings are immutable, these methods never modify the original string. Instead, methods such as replace() return a new, altered string.
This topic covers a few examples. To see more examples and methods, click here.
Strings in Python
Strings in Python are sequences of characters (Unicode characters). See the example of Unicode characters here.
A character can be a letter or any other symbol, and a sequence is an ordered collection of items (for example, a list or an array) in which each item is indexed by an integer based on its position.
Strings can be enclosed in single quotes or double quotes. Triple quotes are generally used when a string spans multiple lines or stores a large block of text.
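The snippet below shows the three quoting styles side by side; the variable names are illustrative.

```python
single = 'YOOBEE'      # single quotes
double = "YOOBEE"      # double quotes produce exactly the same string
multi = """This string
spans two lines."""    # triple quotes let a string span several lines

print(single == double)  # True
print(multi)
```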
How to access strings in Python?
As mentioned above, a string is a sequence of characters and can be treated as an ordered collection of items, similar to a list or an array.
Does that mean we can access strings like a list? Yes!
Character | Y | O | O | B | E | E |
Index     | 0 | 1 | 2 | 3 | 4 | 5 |
The table above shows the string “YOOBEE” as a sequence of six characters that can be indexed and sliced. Each character in the sequence can be accessed using its index, and a range of characters can be accessed using slicing.
The following examples can be entered line by line into the Python console to see the output.
The example below demonstrates how to access the first letter of a string variable. The character at the index of 0 is stored in a new variable before being output.
Enter each line into the Python console
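A minimal version of this example is shown below; the variable names school and first_letter are illustrative choices.

```python
school = "YOOBEE"          # the string from the table above
first_letter = school[0]   # index 0 is the first character
print(first_letter)        # Y
```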
Python also allows negative indexing to reference values from the end of a string.
For example, the code below demonstrates how to access the last letter of a string variable holding "YOOBEE". The character at index -1 is stored in a new variable before being output.
Enter each line into the Python console
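A minimal version of this example follows; the variable names are illustrative. Index -1 refers to the same character as index len(school) - 1.

```python
school = "YOOBEE"
last_letter = school[-1]   # -1 counts back from the end of the string
print(last_letter)         # E
```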
Slicing
To get a range of characters, we can use the slicing operator ‘:’ (a colon). The syntax is string[start:end], where the character at start is included and the character at end is not.
The example below demonstrates how to access the 4th to 6th characters of a string variable. The characters from index 3 up to, but not including, index 6 (that is, indexes 3, 4 and 5) are stored in a new variable before being output.
Enter each line into the Python console
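A minimal version of the slicing example is shown below, with illustrative variable names.

```python
school = "YOOBEE"
part = school[3:6]   # indexes 3, 4 and 5; index 6 is not included
print(part)          # BEE
```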
Replace
We use the replace() method to replace a specified string (oldvalue) with another specified string (newvalue).
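Syntax:
string.replace(oldvalue, newvalue)
or
string.replace(oldvalue, newvalue, count)
Note that count is an optional parameter that sets the maximum number of occurrences of the old value to replace. If it is omitted, every occurrence is replaced.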
The example below demonstrates how to replace "four" in a string variable with "three". The old value "four" is given as the first parameter, the new value of "three" is given as the second parameter, and the return value is stored in a new variable before being output.
Enter each line into the Python console
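The sketch below uses a made-up sentence to show replace() with and without the optional count argument. The original string is left unchanged in both cases.

```python
text = "I have four apples and four oranges"   # hypothetical sample text

# Replace every occurrence of "four" with "three"
new_text = text.replace("four", "three")
print(new_text)     # I have three apples and three oranges

# The optional count argument limits how many occurrences are replaced
first_only = text.replace("four", "three", 1)
print(first_only)   # I have three apples and four oranges

print(text)         # the original string is unchanged
```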
Count
The count() method returns the number of times a specified value appears in a string.
Syntax:
string.count(value)
or
string.count(value, start, end)
Note that start and end are optional parameters that specify the positions in the string where the search starts and ends.
The example below demonstrates how to search for and count the number of times "one" occurs in a string. The value "one" is given as the parameter, and the return value is stored in a new variable before being output.
Enter each line into the Python console
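The following sketch counts "one" in a made-up sentence, first across the whole string and then from index 4 onwards using the optional start argument.

```python
text = "one for one and one for all"   # hypothetical sample text

occurrences = text.count("one")
print(occurrences)   # 3

# Optional start argument: only search from index 4 onwards
later = text.count("one", 4)
print(later)         # 2
```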
There are many other built-in string methods in Python. See more examples here.
String Formatting and Escape Sequence
String formatting (also known as string interpolation) involves inserting a custom string or variable into a predefined string, which lets you display results or text in a specific format. The oldest approach in Python, the ‘%’ operator, uses C-style formatting to create new, formatted strings.
There are several ways to format strings in Python:
- Using the modulo ‘%’ operator
- Using f-strings
- Using the str.format() method
- Using the Template class from the string module
Formatting with the ‘%’ operator
The string to format contains normal text together with "argument specifiers" (special symbols like "%s" and "%d").
The following are some basic argument specifiers used to format a text string.
- %s - String (or any object with a string representation, like numbers)
- %d - Integers
- %f - Floating-point numbers. Adding a decimal point followed by a number sets how many decimal places are shown, e.g. %.2f for two decimal places.
At the end of the text to be formatted, a “%” operator is used, followed by a tuple containing the variables in order of use.
The example below declares a series of variables of different types to define a data set about a student called “John Doe”.
The variable “output” is used to format a text string using the previously declared variables.
The first %s is replaced by name, %d by age, %.1f by height (with one decimal place), and the second and third %s by role and school, respectively.
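A sketch of this example is shown below. The variable names follow the description above; the height value and the wording of the output sentence are hypothetical.

```python
name = "John Doe"          # str
age = 18                   # int
height = 1.78              # float (hypothetical value)
role = "student"           # str
school = "Yoobee College"  # str

# The tuple after % supplies values in the order the specifiers appear
output = "%s is %d years old and %.1f m tall. He is a %s at %s." % (
    name, age, height, role, school)
print(output)
# John Doe is 18 years old and 1.8 m tall. He is a student at Yoobee College.
```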
Formatting with f-strings
In the example below, John Doe's information is printed using an f-string (a string literal prefixed with f) to format the text. The variable names are written inside curly braces at the positions in the text where their values should appear.
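A sketch of the f-string version follows, reusing the same illustrative variables (the height value is hypothetical). A format specifier such as .1f can follow a colon inside the braces.

```python
name = "John Doe"
age = 18
height = 1.78              # hypothetical value
role = "student"
school = "Yoobee College"

# Each {expression} is replaced by its value; :.1f rounds to one decimal place
output = f"{name} is {age} years old and {height:.1f} m tall. He is a {role} at {school}."
print(output)
```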
Formatting with .format() method
Call the .format() method on a string using the following syntax:
"Text with {} and {}".format(value1, value2)
Curly braces {} are used in the string to be formatted to define placeholders called “format fields” that will be replaced with values stored in variables. The variables are passed as parameters to the .format() method.
In the example below, the .format() method is called on a string and the result is passed to the print() function. The format fields {0} to {4} are replaced with the previously defined variables.
If the curly braces are empty, the fields are filled in the order in which the arguments are passed to the .format() method.
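The sketch below shows both numbered and empty format fields, using the same illustrative variables as the earlier examples (the height value is hypothetical).

```python
name, age, height, role, school = "John Doe", 18, 1.78, "student", "Yoobee College"

# Numbered fields: {0} is the first argument, {4} is the fifth
indexed = "{0} is {1} years old and {2} m tall. He is a {3} at {4}.".format(
    name, age, height, role, school)
print(indexed)

# Empty fields are filled in the order the arguments are passed
output = "{} studies at {}.".format(name, school)
print(output)   # John Doe studies at Yoobee College.
```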
See more examples of how to use the .format() method here.
Formatting with the String Template class
The built-in Template class can be imported from Python's string module. It provides a simpler, more limited syntax for specifying the output string.
A template is created using the Template constructor and passing it a string to format.
The string to format uses placeholder names prefixed with “$” that are replaced when the .substitute() method is called.
In the example below, the template is constructed using $v1 to $v5 as placeholders.
The print() function is used to output the template using the .substitute() method. Each of the placeholders is given a previously defined variable to get its value for substitution.
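A sketch of the Template example is shown below, using $v1 to $v5 as described. The sentence wording and the height value are hypothetical. substitute() converts each value to a string.

```python
from string import Template

name, age, height, role, school = "John Doe", 18, 1.78, "student", "Yoobee College"

template = Template("$v1 is $v2 years old and $v3 m tall. He is a $v4 at $v5.")

# Each keyword argument supplies the value for the placeholder of the same name
output = template.substitute(v1=name, v2=age, v3=height, v4=role, v5=school)
print(output)
```

If a placeholder has no value, substitute() raises a KeyError, whereas safe_substitute() leaves the placeholder in the text unchanged.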
Placeholders must be valid Python identifiers (letters, digits and underscores, not starting with a digit). Note that template strings do not support format specifiers such as %.1f, which are available with the % operator.
Concatenation
Concatenation is appending one string to the end of another string. Use the + operator to produce a string that is a combination of its operands. The augmented assignment operator += can also be used to concatenate strings. The operand on the left side of the += operator must be an existing variable; otherwise, an exception is raised.
Strings are immutable: once created, they cannot be changed. Concatenation does not change the existing string; instead, it creates a new string. With +=, that new string is then assigned back to the variable on the left.
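The sketch below shows + and +=, confirms that the original string is unchanged, and shows the exception raised when += is used on a variable that does not exist. All names are illustrative.

```python
first = "Yoo"
second = "bee"
combined = first + second   # + creates a new string from both operands
print(combined)             # Yoobee

greeting = "Hello, "
alias = greeting            # a second name for the original string
greeting += combined        # same as greeting = greeting + combined
print(greeting)             # Hello, Yoobee
print(repr(alias))          # 'Hello, ' (the original string was not changed)

# += needs an existing variable on the left-hand side
try:
    undefined_text += "x"
except NameError as err:
    print("NameError:", err)
```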
Slicing and Casting
Slice: a span of items taken from a sequence; a slice of a string is known as a substring
Slicing format: string[start : end]
The expression returns a string containing a copy of the characters from start up to, but not including, end
- If start not specified, 0 is used for start index
- If end not specified, len(string) is used for end index
- Slicing expressions can include a step value and negative indexes relative to end of string
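The rules above can be checked with a few slices of "YOOBEE"; the variable name word is illustrative.

```python
word = "YOOBEE"
print(word[:3])    # YOO     (start defaults to 0)
print(word[3:])    # BEE     (end defaults to len(word))
print(word[::2])   # YOE     (step of 2: every second character)
print(word[-3:])   # BEE     (negative start counts from the end)
print(word[::-1])  # EEBOOY  (negative step walks backwards)
```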
Working with Files in Python
Files are named locations on the system disk to store related information. Read more about files here.
File handling generally involves creating, modifying (writing to or updating), reading (including extracting content from), and deleting different types of files (txt, csv, json, etc.). The skills you learn in this component will help you build applications that require file-handling functionality.
Python provides several built-in functions and modules for working with files. One of the commonly used functions is the open() function.
open() function:
open() is a built-in Python function for opening a file so that it can be read or modified. It returns a file object and is generally called with two positional arguments (the file name and the mode) and one keyword argument (encoding).
Syntax:
open(file, mode='r', encoding=None)
Argument descriptions:
file: the name (or path) of the file to open.
mode: 'r' when the file will only be read, 'w' for writing only (an existing file with the same name is emptied first), and 'a' for appending, where any data written to the file is automatically added to the end. 'r' is assumed if mode is omitted.
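You can use open() to open and interact with a file using different mode flags:
"r" - Read - Default value. Opens a file for reading; raises an error if the file does not exist
"a" - Append - Opens a file for appending; creates the file if it does not exist
"w" - Write - Opens a file for writing; creates the file if it does not exist and empties it if it does
"x" - Create - Creates the specified file; raises an error (FileExistsError) if the file already exists
"r+" - Read/Write - Opens an existing file for both reading and writing
"t" - Text - Default value. Text mode
"b" - Binary - Binary mode (e.g., images)
"t" and "b" are combined with one of the other flags, for example "rb" to read a binary file.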
encoding: the name of the standard used to encode text written to a file and decode text read from it (text mode only). For example, the Unicode encoding UTF-8 can encode all 1,112,064 valid Unicode code points using one to four one-byte code units. Read more about the standard here.
Note that both the mode and encoding arguments are optional, so you can open a file using only its name.
The default mode is 'rt' (read, text), so you do not need to specify it when reading a text file. If encoding is omitted, a platform-dependent default is used, so passing it explicitly is safer.
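The sketch below opens the same file twice, once with mode and encoding spelled out and once relying on the default mode, and shows that both read the same content. The file name notes.txt is hypothetical, and the setup step only exists so the example runs on its own.

```python
# Setup: create the file so the example runs on its own
with open("notes.txt", "w", encoding="utf-8") as f:
    f.write("hello world\n")

# Fully specified: read ("r") in text ("t") mode with an explicit encoding
with open("notes.txt", "rt", encoding="utf-8") as f:
    explicit = f.read()

# Shorter form: "rt" is the default mode, so it can be left out
with open("notes.txt", encoding="utf-8") as f:
    default_mode = f.read()

print(explicit == default_mode)  # True
```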
When and how to use the open() function?
Common file operations are performed in the order of open -> read or write into the file -> close the file.
To perform any of these operations in Python, start by calling open(). The mode you pass determines which operations are allowed on the returned file object.
How to Open a file
Example: Creating, opening, writing to, and reading a file in Python:
The example below creates a new file (newfile.txt), opens it in write mode using the flag ‘w’, and writes the string “hello world” into the file, which is open as the object f.
Now that we have a file, we can read the file and close the file:
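A sketch of both steps follows: writing “hello world” to newfile.txt, then reading it back and closing the file. The string is stored in a variable named text rather than str, because a variable named str would hide the built-in str() function.

```python
text = "hello world"

# "w" creates newfile.txt (or empties it if it exists) and opens it for writing
f = open("newfile.txt", "w", encoding="utf-8")
f.write(text)
f.close()

# Reopen the file in read mode, read its whole content, then close it
f = open("newfile.txt", "r", encoding="utf-8")
content = f.read()
f.close()

print(content)   # hello world
```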
Why and when do we use the ‘with’ keyword?
Opening a file with the ‘with’ keyword guarantees that the file is closed when the with block ends, even if an error occurs inside it. Closing the file also flushes any data passed to f.write() to the disk, so you do not need to call f.close() yourself.
If you’re not using the ‘with’ keyword, you should call f.close() to close the file and immediately free up any system resources used by it. See more examples here.
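The two approaches can be compared side by side. In this sketch, the version without ‘with’ uses try/finally so that f.close() still runs if reading fails.

```python
# With the with statement, the file is closed automatically when the block ends
with open("newfile.txt", "w", encoding="utf-8") as f:
    f.write("hello world")
print(f.closed)  # True

# Without with, close the file yourself; finally runs even if an error occurs
f = open("newfile.txt", "r", encoding="utf-8")
try:
    content = f.read()
finally:
    f.close()
print(f.closed)  # True
```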
Extracting Content from a File
Reading Multiple Lines and Iterating over each Line in a File
We may want to read a file of multiple lines and process specific lines by index.
For example, say we have a file (john_doe.txt) that contains the following information about John Doe in separate lines:
Line 1: John Doe is a Student of Yoobee College.
Line 2: He is currently studying Software Engineering
Line 3: He is 18 years old.
We can read each line in this file by using the readline() method in Python:
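A sketch of the reading loop is shown below. It first writes the three example lines to john_doe.txt so that it runs on its own; after that setup step, the five lines of the reading code correspond to the first through fifth lines described in the walkthrough. Printing with end="" is a choice made here to avoid extra blank lines, because each line read from the file already ends with a newline.

```python
# Setup (not part of the walkthrough): create john_doe.txt with the example lines
with open("john_doe.txt", "w", encoding="utf-8") as f:
    f.write("John Doe is a Student of Yoobee College.\n"
            "He is currently studying Software Engineering\n"
            "He is 18 years old.\n")

with open("john_doe.txt", "r", encoding="utf-8") as f:  # open the file for reading
    eachline = f.readline()        # read the first line
    while eachline != "":          # readline() returns "" only at the end of the file
        print(eachline, end="")    # each line already ends with "\n"
        eachline = f.readline()    # read the next line
```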
In the above example, the first line opens the file john_doe.txt as object f in read mode.
In the second line, we use the readline() method to read the first line of the file into a variable “eachline”.
To know when to stop, we check for the end of the file (EOF): at EOF, readline() returns an empty string.
In the third line of the example, a while loop runs as long as the line we read is not empty. In the fourth line, we print the line we read. The loop terminates when readline() returns an empty string “” (note that this is different from a space “ ” and from a blank line, which readline() returns as “\n”).
In the fifth line, we read the next line. Lines 3-5 are repeated for every line in the file.
We can simplify the above example by using a for loop instead of readline(). See below:
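In the sketch below, the loop iterates directly over the file object. Because the earlier text also mentions processing specific lines by index, the sketch additionally uses readlines(), which returns all lines as a list. The setup step recreates john_doe.txt so the example runs on its own.

```python
# Setup: create john_doe.txt so this example runs on its own
lines = ["John Doe is a Student of Yoobee College.",
         "He is currently studying Software Engineering",
         "He is 18 years old."]
with open("john_doe.txt", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

# Iterating over the file object yields one line per pass of the loop
with open("john_doe.txt", "r", encoding="utf-8") as f:
    for line in f:
        print(line, end="")

# To process specific lines by index, readlines() returns them as a list
with open("john_doe.txt", "r", encoding="utf-8") as f:
    all_lines = f.readlines()
print(all_lines[2].strip())   # He is 18 years old.
```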
In the example above, we iterate directly over the file object f and print each line; each pass of the loop yields the next line of the file.
Extra: Working with files and Directories: See examples here
Deleting Files in Python
To delete a file, we can import Python's built-in os module and use a function such as os.remove() or os.unlink() to delete a file from a directory.
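A sketch of deleting a file is shown below. The file name temp_file.txt is hypothetical; the sketch creates the file first so there is something to delete, and checks that it exists before removing it to avoid a FileNotFoundError.

```python
import os

file_to_delete = "temp_file.txt"   # hypothetical name; a full path also works

# Create the file so there is something to delete
with open(file_to_delete, "w", encoding="utf-8") as f:
    f.write("temporary data")

if os.path.exists(file_to_delete):   # only delete the file if it is there
    os.remove(file_to_delete)        # os.unlink(file_to_delete) does the same

print(os.path.exists(file_to_delete))  # False
```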
In the above example, we import the module os, and define a variable file_to_delete, which stores the path with the file name that we want to delete.
See more examples here.
Using Regular Expressions when Working with Files
A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. Python's built-in re module supports regular expressions (see here).
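As an illustration of combining the re module with file reading, the sketch below searches each line of john_doe.txt for runs of digits. The pattern and the setup step are assumptions made for this example.

```python
import re

# Setup: a small file to search (contents follow the John Doe example)
with open("john_doe.txt", "w", encoding="utf-8") as f:
    f.write("John Doe is a Student of Yoobee College.\n"
            "He is currently studying Software Engineering\n"
            "He is 18 years old.\n")

pattern = re.compile(r"\d+")   # one or more digits

matches = []
with open("john_doe.txt", "r", encoding="utf-8") as f:
    for line in f:
        matches.extend(pattern.findall(line))   # every run of digits in this line

print(matches)   # ['18']
```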
In Python, errors (including syntax errors) are the complaints you get when your code does not run successfully. Unless an error is handled, Python stops the program as soon as the error is encountered.
There are two kinds of errors in Python:
The Syntax Error (a.k.a Parser Error)
This is the most common error that you will encounter as a Python beginner. It occurs when the parser detects an incorrect statement or expression. When this error occurs, Python displays a message describing what went wrong before the program terminates.
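A common example is a while statement with the colon after its condition missing. Because a program containing a syntax error cannot run at all, the sketch below keeps the faulty statement in a string and passes it to the built-in compile() to trigger the error safely. The helper name find_syntax_error is hypothetical, and the exact message and arrow style differ between Python versions.

```python
def find_syntax_error(source):
    """Return the SyntaxError raised when compiling source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as err:
        return err
    return None

bad_statement = "while True print('Hello world')"   # the ':' after True is missing
err = find_syntax_error(bad_statement)

print("SyntaxError:", err.msg)
print(err.text.rstrip() if err.text else bad_statement)
if err.offset:
    print(" " * (err.offset - 1) + "^")   # mark where the parser detected the error
```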
Can you locate the error in the statement above using the Syntax error message?
In the example, the parser repeats the offending line and displays a little ‘arrow’ pointing at the earliest point in the line where the error was detected.
The error is caused by (or at least detected at) the token preceding the arrow.
Exceptions:
This type of error occurs when syntactically correct Python code fails while it is running. Consider the example below:
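Running print(2 / 0) on its own stops the program with a traceback that ends in ZeroDivisionError. The sketch below wraps it in try/except (covered later in this topic) so the script keeps running and reports the exception's name instead.

```python
try:
    print(2 / 0)                       # syntactically valid, but fails when it runs
except ZeroDivisionError as err:
    error_name = type(err).__name__
    print(error_name + ":", err)
```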
The example attempts to print the result of 2 divided by 0. The print statement is syntactically correct; however, Python raises an exception called ZeroDivisionError. Python has many built-in exceptions like this (read more about built-in exceptions here). We can also define, handle, or raise (force an exception to occur) exceptions in our code when certain conditions occur.
See various examples of how to handle and raise exceptions here.
We can raise and handle exceptions when we are working with files. For example, before reading or modifying a file, you may want to check that it exists and raise (or handle) an exception if the file is not found. See some examples here.
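The sketch below shows both sides: handling the FileNotFoundError that open() raises for a missing file, and raising the same exception ourselves with a custom message. The file name and the helper require_file are hypothetical.

```python
import os

file_name = "missing_file.txt"   # hypothetical name of a file that does not exist

# Handling: open() raises FileNotFoundError, which we catch
try:
    with open(file_name, "r", encoding="utf-8") as f:
        content = f.read()
except FileNotFoundError:
    content = None
    print(f"Sorry, {file_name} was not found.")

# Raising: check the file ourselves and raise an exception with a custom message
def require_file(path):
    """Hypothetical helper: raise FileNotFoundError if path does not exist."""
    if not os.path.exists(path):
        raise FileNotFoundError(f"Required file not found: {path}")
    return path

try:
    require_file(file_name)
except FileNotFoundError as err:
    message = str(err)
    print(message)   # Required file not found: missing_file.txt
```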
The following keywords are used for handling and raising exceptions in Python:
- try: a try block lets you test a block of code for errors.
- except: an except block lets you handle an error raised in the try block. See an example here.
- finally: a finally block is defined after the try and except blocks and runs whether or not the try block raised an exception. See an example here.
- raise: the raise statement raises an exception when a condition in your code occurs. See an example here.
- assert: the assert statement checks that a condition is met. If the condition is true, the program continues. If it is false, Python raises an AssertionError exception. See examples here.
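The sketch below uses all five keywords in one place. The function parse_age and its rule that an age cannot be negative are hypothetical, chosen only for illustration.

```python
def parse_age(text):
    """Convert text to an age; the validation rule is illustrative."""
    try:
        age = int(text)                             # may raise ValueError
    except ValueError:
        print(f"{text!r} is not a whole number")    # handle the error from try
        return None
    finally:
        print(f"finished checking {text!r}")        # runs whether or not try failed
    if age < 0:
        raise ValueError("age cannot be negative")  # raise our own exception
    return age

print(parse_age("18"))
print(parse_age("abc"))

try:
    parse_age("-5")
except ValueError as err:
    caught_message = str(err)
    print("Caught:", caught_message)

age = parse_age("18")
assert age >= 0, "age must not be negative"   # passes, so the program continues
```

Assertions are skipped entirely when Python runs with the -O option, so they suit checks for programming mistakes, while validating input (like the negative-age check) is better done with raise.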