Data structures

Submitted by natalia.machdo… on Fri, 10/28/2022 - 17:40
Sub Topics

Data structures are the fundamental building blocks on which you build your programs. You can view them as containers for storing data. They help you to organize your data efficiently. Each data structure offers a particular way of organizing data so that it can be accessed or retrieved efficiently. So, as a software engineer, you will adopt a data structure based on the need of your application.

For this course, we will be using different data structures shipped with Python in its standard library. You can also create your own custom data structure to suit how you want your data to be organised and accessed (more on this later in this section).

Here, we will consider two broad categories of data structures:

  • The ones that store data elements sequentially. Examples here include List, Tuples, Set, Linked List, Queue, Stack and Matrix.
  • The ones with no sequential linking of data elements. Examples include Hash table and Binary Tree.

We will start with the sequential data structures.

First, what do we mean by sequence here? A sequence is an object that contains multiple items of data. The items are stored in sequence, one after another.

Sequence can be mutable or immutable. Mutable sequences can be modified or changed after their creation, while immutable sequences cannot be modified or updated (however, there are some techniques you can use to update them).

Another way to put this is to say, mutable sequences are read-write compatible while immutable sequences are read-only.

  • Examples of mutable sequences are List, Set and Dictionary.
  • Examples of immutable sequences are strings and tuples.

 

By running the Repl below, you encounter the following error. caused by attempting to assign a new value to the first character of a string.

TypeError: 'str' object does not support item assignment.

Consider forking the Repl, and removing or commenting out line 10 to allow the process below it to demonstrate how to replace the first character of the string.

Tuple

A Tuple is a finite ordered list of elements (this means that we can iterate over the items in a tuple). It is used to store a collection of heterogenous data type, meaning data of different data types. The elements are variables and/or values of different data types. We can also give a data structure as an element in a tuple.

How to Create Tuple

To create a Tuple, we use a parenthesis () and separate each element inside the parentheses with a comma.

 

In the example below, we store John Stone’s data with different data types, including name(string), gender(string), year of birth(integer) and height (float), using a tuple. The example shows how to create a tuple called mytuple1 and print both the type and elements of the tuple.

As mentioned in the description of a tuple, we can also give other data structures or the range object as elements in a tuple.

 

In the above example, we gave a list as an element of a tuple. Our result after printing shows the conversion of the list of object type to a tuple. We can do the same for other data structures like set, dictionary and even tuple.

Accessing the elements of a tuple

We can access the elements of a tuple by using index or by slicing.

If we want to access the age of John Stone in mytuple1 (fourth element), we can simply print the item using its index:

 

if we want to select the first two elements in the tuple, we can use slicing:

 

Try and play around with a different slice of the tuple element.

Performing Operations on a Tuple

You can get the list of methods and attributes associated with a tuple object by invoking dir() function on the object. This also applies to other data structures.

Run print(dir(tuple)) in the python console to see different methods and attributes associated with the tuple objects.

 

Try other methods in the list of methods and attributes.

When to Use Tuple in Your Code

This depends on your program requirements. For example, if you need to store a sequence of heterogeneous data types that you do not want to be modified, a Tuple could be your best option.

In cases where both Lists and Tuples can be used, tuples might be preferable because they are more memory efficient. This is because tuples are immutable, they work with a fixed volume of memory (python allocates the minimum memory block needed for data storage), while lists are mutable and require extra memory block allocation in case we need to extend the size of the list object after its creation.

Try this example to see the memory difference between mytuple2 and listoffruit.

 

List

Unlike other programming languages, Python does not have native support for arrays. However, it has a more generic data structure called LIST.

A list is an object that contains multiple data items.

As mentioned earlier, unlike strings and tuples, lists are mutable.

  • List: an object that contains multiple data items
  • Element: An item in a list
  • Format: list = [item1, item2, etc.]

Lists can hold items of different types.

The print function can be used to display an entire list. The list() function can convert certain types of objects to lists.

How to Create a List

You can create a list object using either [] square bracket or using the list() function.

1. Create a list with square bracket

 

2. Create a list using the list() function. This is commonly used for type conversion. For example, we can convert a tuple to a list:

 

The following code will create a list from a sequence of numbers ranging 0 to 8, increment by 2. Note: the syntax for the range function is range (start, stop, step).

 
Nested List

A list can store any object type. We can use a list to store another list, which could also contain another sub-list or list of tuples. This process is called nesting, and the resulting object is called a nested list.

For example, if we have a list of courses, a list of days that the courses are delivered, and a list of tutors teaching the courses. We can also group students taking each course as tuples and have a list of tuples. We can store all these lists in a nested list object called course_data.

 

We can give this nested list to a function that helps us to process the data and print out well-formatted student-course information:

 
Accessing List

Generally, we can access lists the same way we did for tuples using indexing and slicing.

Indexing (including negative indexing)

An index is a number specifying the position of an element in a list. The index enables access to individual elements in a list.

The index of the first element in a list is 0, the second element is 1, and n’th element is n-1.

By now, you should be familiar with indexing but let us see some examples. Remember our course data nested example?

 

So how does negative indexing work? As you can see in the previous examples, we can also use a negative index to manipulate the list. Negative indexes identify positions relative to the end of the list.

index -7 -6 -5 -4 -3 -2 -1
list elements 10 20 30 40 50 60 70

Consider the illustration above, index -1 identifies the last element, -2 identifies the 2nd to last element, index – N identifies the first element of the list, where N is the length of the list.

Slicing

A slice is a span of items that are taken from a sequence.

List slicing syntax: mylist[start : end: indexJump]

If mylist is a list containing copies of elements from start, up to but not including, end, at a step size indexjump.

  • If start is not specified, 0 is used for start index
  • If end is not specified, len(list) is used for end index

Slicing expressions can include a step value and negative indexes relative to end of a list.

We did not use all the content in our print_course_info() function. For example, we did not use course days data. You may want to slice off the data you need for your program, especially when you are dealing with a large dataset. You want to avoid copying into memory data that you do not use in your program.

 

In the example above, we slice from the second item in the list course data.

Here is another example with comments:

 
Modifying and Adding a List

Because lists are mutable, we can change or modify the elements of a list.

Using the indexing operator (square brackets []) on the left side of an assignment, we can update one of the list items.

For example, if we want to change the last element in our course data list, to update the students_for_each_course data, we can do the following:

 

In the above example, we created a new student list( list of tuples) and assigned this to the last element in the course_data list using index -1.

Built-in functions for lists

Just like a tuple, lists come with built-in methods that we can use to perform specific operations. You can use dir(list) to get all methods and attributes associated with a list.

 

Some of the commonly used ones include: append, extend, insert, remove, pop, index, count, copy, sort.

We will cover a few of these in our examples (Check section 5.1 of python doc here for more information on the available built-in methods for lists).

Using append() to create or update a list

Append is one of the commonly used list methods. We can use it to add a new element to an existing list.

For example, if we have another tutor joining the course_tutor list, we can append the new tutor to the course_tutor list:

 

In the example above, we select the course_tutor data from course_data and append the new tutor “John”.

We can create an entirely new list using append in a loop:

 

In the example above, a new list of tutors is created by looping through all the elements in course_data[1] and appending each element to the new empty list – new_tutor_list.

Using the count() method

We can use the count() method to count the number of occurrences of an element in the list. For example:

 

Using the pop() method

The pop() method remove an item at the given position in a list, and returns it.

 

This will return item at index 2 (Jian)

If no index is specified a pop() removes and returns the last item in the list.

 

This will return the last item in the list (John)

List Comprehensions and Lambda Functions

List comprehensions are a third way of making lists. With this elegant approach, you could rewrite the for loop from the count example in just a single line of code.

 

See more examples in section 5.1.3 of the python doc here, and here.

Lambda functions can also be used to modify or create a list using fewer lines of code. Lambda functions are anonymous functions, they are functions without names (not a ‘def’ defined functions).

We can use lambda function with our updated course_tutor data.

Alan, Brain and Jodie have joined the tutoring team, and we want to sort the tutor names alphabetically. We can use the python list built-in function sort() with Lambda function instead of sorting using for loop.

 

In the example above, we first updated the course_tutors list and then sort the list alphabetically using a lambda function without having to loop over the list while we are sorting.

Using Lists as Stacks and Queue

A stack is a collection of objects that supports efficient Last-In/First-Out semantics for item insertions and deletion.

A real-world analogy for a stack data structure is a stack of plates. New plates are added to the top of the stack, and because the plates are precious and heavy, only the topmost plate can be moved. In order to get the plates that are lower down the stack, we have to remove the topmost plates one after the other.

The list built-in functions make it easy to use a list as a simple stack, where the last element added is the first element retrieved (Last-In/First-Out semantics).

You are now familiar with the append() and pop() functions. We can use append() to add an item to the top of the stack and use pop() to retrieve item from the top of the stack.

For example, we can treat our course_tutor list as a stack.

 

A queue is a collection of objects that supports first-in, first-out (FIFO) semantics for item insertions and deletions. The first element added is the first element retrieved, hence it is also feasible to use a list as a queue; The insert and delete operations are sometimes called enqueue and dequeue.

Even though we can use a list for this, it is worth nothing that lists are inefficient for this use case. Inserting or popping from the beginning of a list takes longer than appending or popping from the end of one. This is because all of the other elements have to be shifted by one.

So, to implement a queue, we can use another python module called Collections, which allows us to use a python implemented queue designed for more efficient pop and append operations with our list.

 

In the example above, we imported the deque library and converted our course_tutor object to queue object. Then we can remove items on a first-in-first-out basis.

Repetition operators with lists

Repetition operators are used to make multiple copies of a list and join them together. The * symbol is a repetition operator when applied to a sequence and an integer.

The sequence is the left operand, and the number is the right.

Syntax: list * n

You can iterate over a list using a for loop.

Syntax: for x in list

The following Repl includes list examples

 

A programmer working on code with colleagues in the background

Dictionary

Dictionaries are a central data structure and one of python’s most powerful data collections. Dictionaries are different from Lists: Unlike Lists which use integer indices to indicate the position of each element, Dictionaries use more generic indices (of almost any type) to form key-value pairs. A Dictionary does not have to maintain an order (although there are ordered dictionaries).

Dictionaries allow for efficient database-like operations such as lookups, insertions, and deletion in python.

Another real-world analogy of a dictionary is a traditional phonebook, where you have the name (key) of a person associated with their phone number (value). Dictionaries are also known by other names such as HashMap, Maps, Lookup tables, associative array, properties, property Bags and so on. In PHP: associative arrays, C++/Java: Maps or HashMaps/Properties, C#/.Net: Property Bag.

Creating and Accessing Dictionary: Keys and values

Each element in a Dictionary consists of a key and a value. This is often referred to as mapping key to value. The key must be an immutable object.

To retrieve a specific value, use the key associated with it.

Syntax for creating a dictionary:

 

How to create a dictionary?

Two common ways are:

  1. Use curly braces {} e.g., dict1 = {'James:02101001', 'Adam:021927625'}
  2. Use the dict() function

How do we use the second approach - dict()?

Let’s go back to our course_data example.

Say we want to map the data fields in our course_data nested list with corresponding data so that we can have field-data in form of key-value pair. We can do the following:

 

We created a list for fields in our data.

 
 

The following Repl includes both methods for creating dictionaries.

Dictionary Methods

Clear method

deletes all the elements in a dictionary, leaving it empty
Format: dictionary.clear()

Get method

gets a value associated with the specified key from the dictionary

Format: dictionary.get(key, default)
default is returned if the key is not found

  • Alternative to [] operator
  • Cannot raise KeyError exception
Items method

returns all the dictionaries keys and associated values
Format: dictionary.items()
Returned as a dictionary view

Each element in the dictionary view is a tuple which contains a key and its associated value

Use a for loop to iterate over the tuples in the sequence

Can use a variable which receives a tuple, or can use two variables which receive key and value

Keys method

returns all the dictionaries keys as a sequence
Format: dictionary.keys()

Pop method

returns the value associated with the specified key and removes that key-value pair from the dictionary
Format: dictionary.pop(key, default)
default is returned if key is not found

Values method

returns all the dictionaries values as a sequence
Format: dictionary.values()
Use a for loop to iterate over the values

The following example includes a simple function named func_name. Try running the program, try to predict what the output put will be.

 

How to access and modify dictionary

As you might have noticed in the examples, dictionaries in python are mutable objects that can be easily modified.

We can modify the values or keys used in our dictionary. For example, in the phonebook dict, we can update phone number of anyone if someone changes his or her phone number. Say Allison-Neu who is one of the names in our phonebook dictionary changes phone number from 555-8396 to 555-8283. We can do the following:

  1. We can lookup Allison-Neu in the phonebook dictionary, and
  2. We then update her phone number by modifying the corresponding value.
 

Please, print the built-in methods and attributes of dict and try to use them. e.g. print(dir(phonebook))

For example, one of the methods you will see is update(), which allows you to insert a new item (key-value) into your dict.

 

Ordered Dicts

Python provides a dictionary subclass called OrderedDict, which is different from the regular dict you will have been using.

With the type of dictionary we have been using, items are not ordered but we know that we can use sort() or sorted() methods to order by keys or values.

So then, why and when would you need an ordered dictionary in python?

In this section, you will learn the differences between the two types of python dictionary and understand the pros and cons of using either of the two (OrderedDict and dict).

Some features of OrderedDict that make it valuable and different from a regular dict:

  • Intent Signaling: In situations where the order of the items in your dictionary is important to how your application behaves, it is better to use OrderedDict from start rather than regular dict. You are stating unequivocally that your application or code requires or depends on the arrangement of the entries in the underlying dictionary.
  • Control over the order of items: In situations where you need to re-order the items in a dictionary, OrderedDict provides methods such as .move_to_end(), popitem() to do this.
  • When Testing for Equality Between 2 Dicts: When testing if two dicts are equal, and the order of items is important to that comparison, it is better to use OrderedDict()
How to use OrderedDict in python?

You can import Ordered Dict from the python Collection module. Try to re-implement your solutions for the word count example above using OrderedDict. That is word-count should be ordered.

Note the following constraints in OrderedDict:

  • If you create an OrderedDict by passing your regular dict as an argument, then the ordering may be lost because dict does not maintain the insertion order.
  • If an item is overwritten in the OrderedDict, its position is maintained.
  • If an item is deleted and added again, then it is moved to the last.
  • Popitem function of OrderedDict removes items in FIFO order.
  • It accepts a boolean argument last, if it’s set to True then items are returned in LIFO order.
  • You can use the move_to_end function to move items of OrderedDict from beginning or end. It accepts a boolean argument last, if it’s set to False then item is moved to the start of the ordered dict.

Dictionary Comprehensions

Dictionary comprehensions provide a concise way to create dictionaries from other dictionaries. With dictionary comprehensions, you can create a new dictionary by specifying key-value pairs that depend on key-value pairs in another dictionary. You can also transform the values in the new dictionary as needed.

Dictionary comprehension is a powerful concept that can be used to substitute for loops and lambda functions. While not all for loops can be written as a dictionary comprehension, all dictionary comprehension can be written with a for loop.

Case 1: To do word frequency count

Study the function below:

 

Given str='Run the example to show the frequency of words in the given text. This example prints word count (as key-value) as a dict and removes unwanted characters like - ,.'

Predict how the function will behave (what the output will be) when print func_name(str) is called.

It does the same thing as our word count example, func_name(), using fewer lines of code. We can even make this significantly shorter using dictionary comprehension:

 

funct_name2(str) is only 4 lines!

Activity

Case 2: To filter key-values and modify dictionary

You may want to filter a dictionary and create a new dictionary based on a list of keys. You may also want to modify or update some items in a dictionary.

Dictionary comprehension becomes handy for these purposes.

Now, your turn to write some code! You will need to modify the code to perform case 2 for the word count dictionary.

Run and see if your results are what you expected.

Case 3: To invert and sort by key a dictionary.

In a case where you have a dictionary that both the keys and values are unique, and you would love to swap keys for values and sort accordingly.

Again, write some code! Create a new dictionary that has unique keys and values. Then implement the above requirements for the dictionary.

Run and see if your results are what you expect.

Case 4: Flattening List of Dictionaries into Single Dictionary

Assuming you have a list of dictionaries, and you would like to flatten the list into a single dictionary such that all the sub-dictionaries are combined into a single dictionary. We can achieve this with a single line of code.

The following Repl demonstrates how it works.

Share your attempt in the forum.

Using Python’s Zip () function to create Dictionary

Example 4: Study the example code below and predict how it should behave:

 

The following Repl demonstrates how it works.

Zip() function is defined as zip(*iterable). Iterables are passed as arguments into the function, which return iterator. This iterator produces a string of tuples made up of components from each iterable. Any type of iterable, including lists, tuples, dictionaries, sets, etc., can be accepted by the zip() function.

Counter in Python (very similar to a dictionary but specifically designed for counting in a pythonic way)

Study the code examples provided here. In this counter.py, three functions (read_story(filename),count_words_fast(text) and word_count_distribution(text)) were provided with a file named ‘romeo-juliet.txt’.

Do the following before you read the explanations of what these functions do:

  1. Study each function and predict how it will behave
  2. Study how counter works in count_words_fast(text) and word_count_distribution(text) functions.
  3. Analyse the built function called from counter
  4. Modify the code to behave differently based on your thought.
  5. Run again

When you are satisfied, read the explanations below.

As you may have guessed, function read_story(filename) takes a filename as input (in this case, file is Romeo_juliet.txt) , read and parsed the file as a string.

For function count_word_fast(text), the name already gave it up!... Like our first example in dictionary, it counts the number of times each word occurs in a text and returns key-value pairs of unique word counts.

For function word_count_distribution (text), it calls the count_word_fast(text) function and processes the resulting dictionary from the function and returns a new dictionary with items corresponding to the count of items a collection of words appears in translation, and values as the number of numbers of words that appear with that frequency.

You can find below the explanation of the Counter class at docs.python.org

Module Linking
Main Topic Image
A close view of a programmer with code reflected in his glasses
Is Study Guide?
Off
Is Assessment Consultation?
Off