Class 11

Tuples and the Python memory model

Objectives for today

  • Explain when tuples are used
  • Describe tuples as an immutable ordered collection of objects of arbitrary type
  • Create a tuple
  • Explain and use functions and operators on tuples
  • Use tuples to return multiple values from a function
  • Describe the Python memory model
  • Explain the behavior of references
  • Differentiate between mutable and immutable types
  • Predict how immutable and mutable values referenced by variables will be affected by assignments, operators, functions and methods

Warmup

Let’s start off by reviewing what we did last week (files) and last class (sets, dictionaries). Write a program to compute the frequency of words in the song “Here Comes the Sun” by The Beatles. Punctuation has been removed so you don’t have to treat punctuation but you should account for the case of the letters (i.e. Sun and sun are the same word). You can treat the word it's as a single word.

{'here': 17, 'comes': 15, 'the': 12, 'sun': 25, 'doo': 20, 'and': 4, 'i': 5, 'say': 4, "it's": 10, 'all': 6, 'right': 6, 'little': 6, 'darling': 6, 'been': 4, 'a': 1, 'long': 1, 'cold': 1, 'lonely': 1, 'winter': 1, 'it': 8, 'feels': 1, 'like': 3, 'years': 3, 'since': 3, 'smiles': 1, 'returning': 1, 'to': 1, 'faces': 1, 'seems': 2, 'feel': 1, 'that': 1, 'ice': 1, 'is': 1, 'slowly': 1, 'melting': 1, 'clear': 1}

How can we print the in order or most frequent to least frequent words? For example, here are the words that appear more than once in order of decreasing frequency:

sun appears 25 times
doo appears 20 times
here appears 17 times
comes appears 15 times
the appears 12 times
it's appears 10 times
it appears 8 times
all appears 6 times
right appears 6 times
little appears 6 times
darling appears 6 times
i appears 5 times
and appears 4 times
say appears 4 times
been appears 4 times
like appears 3 times
years appears 3 times
since appears 3 times
seems appears 2 times

After converting the words (which are the dictionary keys) to a list, we can sort them according to the frequency using a “lambda” (a special function that allows us to sort according to our own criterion). This is similar to how we used key=len when we talked about sort a few weeks ago.

words = list(frequency.keys())
words.sort(reverse=True, key=lambda x: frequency[x])

Let’s come up with another way that’s closer to the Python tools we are familiar with. But first, let’s talk about tuples.

Tuples

A tuple is an immutable sequence. That said, tuples are not just used as a immutable list. Often tuples are heterogeneous structures, e.g. ("January", 1, 1970) could represent a date, where specific slots refer to values with a consistent meaning, e.g. month.

Tuples can be created with parentheses, e.g. (1, 2) or the named constructor, tuple(), or often just with a comma (and no parentheses).

>>> my_tuple = (1, 2, 3, 4)
>>> my_tuple
(1, 2, 3, 4)
>>> another_tuple = ("a", "b", "c", "d")
>>> another_tuple
('a', 'b', 'c', 'd')
>>> my_tuple[0]
1
>>> my_tuple[1]
2
>>> for i in my_tuple:
...     print(i)
... 
1
2
3
4
>>> my_tuple[1:3]
(2, 3)

Python allows for very concise unpacking of tuples into individual variables (termed “tuple unpacking”). This is the feature we used when iterating through the items of a dictionary.

>>> var1, var2, var3, var4 = another_tuple
>>> var1
'a'
>>> var4
'd'

We can use this feature to elegantly implement swap:

>>> x = 10
>>> y = 20
>>> (y, x) = (x, y)
>>> x
20
>>> y
10

Recall that tuples are immutable, like strings.

>>> my_tuple[0] = 5
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment
>>> my_tuple.append(6)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

But as with strings, that immutability doesn’t prevent reassignment of a variable “pointing to” a tuple, just changing the elements of a tuple.

>>> my_tuple = another_tuple
>>> my_tuple
('a', 'b', 'c', 'd')
frequency = {}
with open("here_comes_the_sun.txt", "r") as file:
    for line in file:
            line = line.strip()
            if line == "":
                continue
            words = line.split()
            for word in words:
              word = word.lower()
              if word in frequency:
                frequency[word] += 1
              else:
                frequency[word] = 1

word_pairs = []
for word, count in frequency.items():
    word_pairs.append((count, word))
word_pairs.sort(reverse=True)
for count,word in word_pairs:
    print(word + " appears " + str(count) + " times")

Notice we need to handle two different cases:

  1. we have not previously seen a value, i.e. item in frequency is False, and
  2. we have previously seen a value.

An important part of developing and implementing an algorithm is enumerating all the cases our program needs to handle (and then making sure our approach does so correctly). That is asking ourselves will our approach handle for instance: even and odd length strings, empty lists, inputs with dissimilar lengths, the first occurrence and all future occurrences, etc.

Recall from last time that the items method returns an iterable of (key, value) tuples, i.e., each iteration of the above loop with assign the next key-value pair in the loop variable as a tuple. By specified multiple variables as the loop variables, we leverage tuple unpacking to automatically split that tuple into its individual parts, assigning each part to a different variable.

Modifying a collection while iterating?

As we work more and more with mutable data structures, we may encounter (or be tempted to create) a situation in which we modify a collection while iterating through it. For example, notice that we get an error with the following:

d = { 1 : "one", 2 : "two", 3 : "three" }
for key in d:
    if key == 2:
        d.pop(key)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[3], line 2
      1 d = { 1 : "one", 2 : "two", 3 : "three" }
----> 2 for key in d:
      3     if key == 2:
      4         d.pop(key)

RuntimeError: dictionary changed size during iteration

Depending on the data structure and the operation, we may or may not get an error. However, we will have a hard time reasoning about the behavior. It is generally not a good approach to modify a collection while iterating. Make a copy instead, something we talk more about next.

Objects and references

Almost every value in Python is actually an object. ints, floats, etc. are objects, just like strings are objects. For example, here are all the methods available for for integers:

x= -10
dir(x)
['__abs__',
 '__add__',
 '__and__',
 '__bool__',
 '__ceil__',
 '__class__',
 '__delattr__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floor__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getnewargs__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__le__',
 '__lshift__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rfloordiv__',
 '__rlshift__',
 '__rmod__',
 '__rmul__',
 '__ror__',
 '__round__',
 '__rpow__',
 '__rrshift__',
 '__rshift__',
 '__rsub__',
 '__rtruediv__',
 '__rxor__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__truediv__',
 '__trunc__',
 '__xor__',
 'as_integer_ratio',
 'bit_count',
 'bit_length',
 'conjugate',
 'denominator',
 'from_bytes',
 'imag',
 'is_integer',
 'numerator',
 'real',
 'to_bytes']

When we create a new object, we are allocating memory for that object’s data (e.g. the integer above). When we assign that object to a variable that variable is now a reference to that object. Let’s look at the memory model for the following code. When we reassign a variable we are changing the object the variable “points to”. When we assign one variable to another, e.g. y=x, we are setting both variables to point to the same object in memory. Changing these pointers does not change the underlying objects or result in “copying” objects.

Mutability

We saw that lists are very similar to strings in many respects. But one important difference is mutability. In Python strings are immutable, i.e., they can’t be modified. Python lists are mutable. Let’s see the impact of mutability.

a = [1, 2, 3]
b = a
a[1] = 4
a
[1, 4, 3]

a has the value we would expect, but notice that b has also changed!

b
[1, 4, 3]

Let’s draw out the memory model. Notice that changes to the list “pointed to” by a are reflected in b, since they both reference the same underlying list object. As we continue in the semester we will learn how to take advantage of mutability. In the meantime we need to know about the interaction between mutability and aliasing. In the above code, a and b point to the same underlying list. When we modify that list via a or b, the changes are reflected in both variables. We would say a and b are aliased.

Parameters as references

Aliasing can occur with function parameters as well.

def aliasing(param):
    param[1] = 4
a = [1, 2, 3]
aliasing(a)
a
[1, 4, 3]

Again, let’s check out the memory model. Since strings, integers, floats, etc. are also objects, why don’t we have the same aliasing problems? Recall that strings are immutable. So are integers, etc. There are no operations on integers that modify the underlying objects, all methods and operators create new integers.

As a more general example, consider the following function:

def my_function(a, b):
    return a+b

Recall that a and b are the parameters, specifically “formal parameters”. When we call my_function we:

  1. Evaluate the arguments (or “actual parameters”) left to right
  2. The arguments are bound to the function’s formal parameters (similar to assignment). The formal parameters are effectively new variables that are references to the same objects as the actual parameters
  3. Execute the body of the function

When the arguments are immutable, the multiple references to the same objects don’t matter. But as we saw earlier for mutable objects, like lists, function calls create the possibility for aliasing. Modifications applied via the function parameters are reflected in other variables that are references to the same underlying object.

Consider another example:

def my_function(a):
    a = [0]*5
    a[0] = 6
x = [1, 2, 3, 4, 5]
my_function(x)
x
[1, 2, 3, 4, 5]

Let’s check out the memory model. In my_function, why isn’t x modified? The statement a = [0] * 5 we are creating a new list object, different than that originally pointed to by a (and x), assigning it to a. Any subsequent change to a effects this new object and not the list pointed to be x.

Shallow vs. deep copies

Why does Python implement variable and parameter assignment via references, termed “shallow copies”? Performance. We don’t need to copy the entire object. Often we just want to “use” the value, not modify it.

What if we need a “deep copy”? That is, to perform variable assignment without the potential for aliasing.

For lists, slicing creates a shallow copy.

>>> x = [1, 2, 3, 4, 5]
>>> y = x[0:2]
>>> y
[1, 2]
>>> y[0] = 6
>>> y
[6, 2]
>>> x
[1, 2, 3, 4, 5]
>>> y = x[:]
>>> y[4] = 12
>>> y
[1, 2, 3, 4, 12]
>>> x
[1, 2, 3, 4, 5]

Note that slices aren’t truly deep copies. If the list contains mutable values, e.g. other lists, those values are not deep copied. Check out the the memory model for the following example:

>>> x = [1, 2, [3, 4], 5, 6]
>>> y = x[:]
>>> y[2][0] = 7
>>> y[3] = 8
>>> y
[1, 2, [7, 4], 8, 6]
>>> x
[1, 2, [7, 4], 5, 6]

For truly deep copies, i.e., of arbitrary nested lists, you will want to use the copy module.

A deeper example

Consider the following two functions:

def insert_after(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list and a copy of n2 after each occurrence of n1.
    
    (list[int], int, int) -> list of int
    
    >>> insert_after([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """

def insert_after2(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list[int], int, int) -> NoneType

    >>> x = [3, 4, 5]
    >>> insert_after2(x, 3, 10) 
    >>> x
    [3, 10, 4, 5]
    """

How would the implementations differ? The first will need to create a copy of the list to be returned. The second would directly modify its argument. How might you approach the implementations differently?

def insert_after(a_list, n1, n2):
    """
    Return a new list consisting of all elements from a_list,
    plus a copy of n2 after each occurrence of n1.
    
    (list of int, int, int) -> list of int
    
    >>> insert_after([3, 4, 5], 3, 10) 
    [3, 10, 4, 5]
    """
    new_list = []
    for element in a_list:
        new_list.append(element)
        if element == n1:
            new_list.append(n2)
    return new_list

def insert_after2(a_list, n1, n2):
    """
    Insert n2 after each occurrence of n1 in a_list.
    
    (list of int, int, int) -> NoneType

    >>> x = [3, 4, 5]
    >>> insert_after2(x, 3, 10) 
    >>> x
    [3, 10, 4, 5]
    """
    i=0
    while i < len(a_list):
        if a_list[i] == n1:
            a_list.insert(i+1, n2)
            i += 1
        i += 1

Why a while loop in the second function? Recall our note about changing a collection while iterating through it with a for loop. Since we are modifying the list, the indices are changing and so we need to (re)evaluate the stopping conditional every iteration.