Python Collections

This section presents some commonly used data collections, namely lists and dictionaries, and their use.

Lists

Lists are the most commonly used data collection in Python. A list is an ordered container of values which grows as needed and whose items can be accessed by their index.

Initializing

Empty lists are created either using the empty square brackets notation [] or by using the list( ) function. The square bracket notation is the more concise of the two.

empty = []
empty = list( )

Initialization can be performed using literal values or using other variables.

numbers = [1, 2, 3]

a = 1
b = 2
c = 3
numbers = [a, b, c]

The following syntax can be used to create a list with an initial number of elements of a specific value.

zeros = [0] * 10
#-- ... instead of writing
zeros = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

It is possible to place lists within lists forming nested or multi-dimensional lists. By nesting lists it is possible to represent concepts such as lists of points, matrices, spreadsheets, raster images etc.

points = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]

row = [0, 0, 0, 0]
mat = [row, row, row]
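
One caveat about the last example is worth sketching. Writing [row, row, row] stores three references to the same list object, so changing one row changes all of them. The snippet below contrasts this with creating a new row per iteration, using the comprehension syntax covered in the Collecting section. The names shared and independent are illustrative only.

```python
#-- Three references to one and the same row object
row = [0, 0, 0, 0]
shared = [row, row, row]
shared[0][0] = 1
print( shared )       #-- [[1, 0, 0, 0], [1, 0, 0, 0], [1, 0, 0, 0]]

#-- A new row is created on every iteration
independent = [[0] * 4 for _ in range( 3 )]
independent[0][0] = 1
print( independent )  #-- [[1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
```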

Collecting

Adding items to lists is performed using the append( ) method as seen below. The append method places the item at the end of the list. Note that lists grow automatically.

sequence = []
for value in range( 10 ):
    sequence.append( value )

The same operation can be written more concisely using the following syntax, which is preferred for simple data operations.

sequence = [value
    for value in range( 10 )]

List generators can be nested, as seen below, to create both flat as well as nested lists such as tables, grids of points etc.

grid = [[( i, j )
    for i in range( 2 )]
        for j in range( 3 )]
""" Grid of Points
    [[( 0, 0 ), ( 1, 0 )],
     [( 0, 1 ), ( 1, 1 )],
     [( 0, 2 ), ( 1, 2 )]]
"""

rows = [( i, j )
    for i in range( 2 )
        for j in range( 3 )]
""" List of Points
    [( 0, 0 ),
     ( 0, 1 ),
     ( 0, 2 ),
     ( 1, 0 ),
     ( 1, 1 ),
     ( 1, 2 )]
"""

The insert( ) method is used for placing items at specific locations. In the sample below, each item is placed at the start of the list, so the list ends up containing the numbers in reverse order.

sequence = []
for value in range( 10 ):
    sequence.insert( 0, value )

Measuring

The number of items in a list, also known as the list's length, is a read-only value obtained using the len( ) function. It is not possible to set the length of the list directly.

items = [1, 2, 3, 4]
count = len( items )

Accessing

Getting and setting items within a list is performed using the square bracket notation. The number within the square brackets is the index of the item. Note that indexing starts at 0, that is, the first element of a list has index 0. Therefore, the last element has index len( items ) - 1.

items = [1, 2, 3, 4]

print( items[0] )
print( items[1] )
print( items[2] )
print( items[3] )

Accessing a list's item with an index equal to or larger than the number of items available results in an index out of range error (IndexError).

items = [1, 2, 3, 4]
print( items[4] ) #-- Exception!

Python supports negative values for list indices. Those represent addressing items from the end of the list. The most commonly used case is getting the value of the last item using items[-1] notation as opposed to the equivalent but more verbose items[len( items ) - 1].

print( items[ 0] ) #-- First item
print( items[-1] ) #-- Last item
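
As a brief sketch of how negative indices behave, the snippet below reads the last two items and shows that a negative index pointing before the start of the list raises the same IndexError as one past the end. The variable names are illustrative only.

```python
items = [1, 2, 3, 4]

last = items[-1]                 #-- 4
second_last = items[-2]          #-- 3
same = items[len( items ) - 1]   #-- 4, the verbose equivalent of items[-1]

#-- Valid negative indices run from -1 down to -len( items )
try:
    items[-5]
except IndexError as error:
    print( 'Out of range:', error )
```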

Removing

The del keyword can be used for removing an item from a specific index within a list. In the example below the first item is deleted.

numbers = [1, 2, 3, 4, 5]
del numbers[0]

The pop( ) method removes an item from a list at a specific index. Unlike del, pop( ) returns the value of the removed element, which is often useful. If the index is not provided, pop( ) removes the last item from the list.

numbers = [1, 2, 3, 4, 5]
numbers.pop( 0 ) #-- Remove first item
numbers.pop(   ) #-- Remove last item

The remove( ) method can be used for removing items by value. Note that it removes only the first item encountered with the provided value, not all of them. Also, if the item does not exist it will raise an exception (ValueError).

numbers = [1, 2, 888, 3, 4]
numbers.remove( 888 )
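
The snippet below is a minimal sketch of both caveats: it shows that only the first match is removed, guards a removal with the in operator to avoid the exception, and removes all occurrences by rebuilding the list with a filter. The in operator and filtering are covered in the Searching and Filtering sections.

```python
numbers = [888, 1, 888, 2]

#-- Only the first 888 is removed
numbers.remove( 888 )
print( numbers )  #-- [1, 888, 2]

#-- Check before removing to avoid the exception
if( 999 in numbers ):
    numbers.remove( 999 )

#-- Keep everything except the unwanted value
numbers = [number
    for number in numbers
        if( number != 888 )]
print( numbers )  #-- [1, 2]
```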

Enumerating

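Enumerating means accessing items one by one from the start to the end of a list. Below are three versions: the first yields only the values, while the other two are equivalent ways of obtaining both the index and the value.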

numbers = [1, 2, 3, 4, 5]

#-- Need only values
#--
for number in numbers:
    print( number )

#-- Need index and value
#--
for index in range( len( numbers ) ):
    number = numbers[index]
    print( index, number )

for index, number in enumerate( numbers ):
    print( index, number )

Searching

Testing whether a value is already in a list can be performed with the in operator. Its result is a boolean value which is typically used in conditional operations.

numbers = [1, 2, 3, 4, 5]
print( 4 in numbers )

number = 8
if( not ( number in numbers ) ):
    numbers.append( number )

The index( ) method is used for finding the location of a value in a list. If the value is not present in the list then an exception (ValueError) is raised.

numbers = [1, 2, 3, 4, 5]
print( numbers.index( 3 ) )

Generally it is not convenient to manage code which throws exceptions. Below is an alternative implementation which either returns the index of the item if it is present in the list or -1 to signal that the item was not found.

def IndexOf( items, item ):
    for index, value in enumerate( items ):
        if( value == item ): return index
    return -1

numbers = [1, 2, 3, 4, 5]
print( IndexOf( numbers, 3 ) )

Merging

The addition operator + can be used for concatenating lists. The result of the operation is a new list with the items appended in sequence.

a = [1, 2, 3]
b = [4, 5, 6]
c = a + b
""" c = [1, 2, 3, 4, 5, 6] """

Slicing

Extracting sub-lists of items from lists, also known as slices, is performed using the square brackets notation with a range instead of a single index. The expression in the square brackets has the form start:stop:step and follows the semantics of the range( ) function: the start is included, the stop is excluded, and any of the three parts can be omitted.

Some very common idioms are presented below, namely items[1:] which creates a list without the first item, items[:-1] which creates a list without the last item and items[::2] which skips every other item.

items = [1, 2, 3, 4, 5]

trim_head = items[1:]   #-- [2, 3, 4, 5]
trim_tail = items[:-1]  #-- [1, 2, 3, 4]
trim_odds = items[::2]  #-- [1, 3, 5]
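
To illustrate the correspondence with range( ), the sketch below compares a slice against an explicit loop over range( start, stop, step ). It also shows two details that are easy to miss: a full slice creates a copy, and slice bounds beyond the end of the list are clamped instead of raising an error. All variable names are illustrative only.

```python
items = [1, 2, 3, 4, 5]

start, stop, step = 1, 5, 2
by_slice = items[start:stop:step]
by_range = [items[index]
    for index in range( start, stop, step )]
print( by_slice )  #-- [2, 4]
print( by_range )  #-- [2, 4]

#-- A full slice is a new list with the same items
duplicate = items[:]
print( duplicate == items, duplicate is items )  #-- True False

#-- Out of range bounds are clamped
beyond = items[2:100]
print( beyond )  #-- [3, 4, 5]
```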

Filtering

List generators can be used for filtering by adding a conditional after the for-loop. In Python such generator and filter expressions are known as list comprehensions.

numbers = [0, -3, 8, 1, -10]
positive = [number
    for number in numbers
        if( number >= 0 )]
""" positive = [0, 8, 1] """

items = [item
    for item in range( 10 )
        if( item % 2 == 0 )]
""" items = [0, 2, 4, 6, 8] """

Patterning

The zip( ) function can be used for enumerating multiple lists simultaneously, in the sense of lacing items across. It is commonly used for creating pairs as seen below.

source = [1, 2, 3, 4]
target = [5, 6, 7, 8]

for s, t in zip( source, target ):
    print( s, t )

list_of_pairs = [( s, t )
    for s, t in zip( source, target )]
""" list_of_pairs = [
        ( 1, 5 ),
        ( 2, 6 ),
        ( 3, 7 ),
        ( 4, 8 )]
"""

Additionally, it can be used for pairing consecutive items using the idiom below. The expression items[:-1] produces the list [1, 2, 3, 4] and items[1:] results in [2, 3, 4, 5]. Zipping the two lists creates pairs of consecutive numbers [( 1, 2 ), ( 2, 3 ), ( 3, 4 ), ( 4, 5 )].

items = [1, 2, 3, 4, 5]

for a, b in zip( items[:-1], items[1:] ):
    print( a, b )

list_of_pairs = [( a, b )
    for a, b in zip( items[:-1], items[1:] )]
""" list_of_pairs = [
        ( 1, 2 ),
        ( 2, 3 ),
        ( 3, 4 ),
        ( 4, 5 )]
"""

Ordering

The reverse( ) method is used for flipping the order of the items in a list. Note that reversing means swapping positions front to back. It has nothing to do with the values of the items.

items = [4, 3, 1, 2]
items.reverse( )
""" items = [2, 1, 3, 4] """

The sort( ) method is used for ordering a list's values in ascending or descending fashion. This works for lists which contain values that are comparable, such as numbers.

items = [9, 3, 5, 8]

items.sort( )
""" items = [3, 5, 8, 9] """

items.sort( reverse = True )
""" items = [9, 8, 5, 3] """

For sorting more complex data types we need to pass a function using the key parameter. Given an item from the list, this function returns the value by which the item is ordered, that is, the item's priority.

In the example below the list vectors contains lists of pairs of numbers we can interpret as 2D vectors. In addition, there are three functions ByX, ByY and ByLength which allow us to sort the list based on their X-component, Y-component or Euclidean norm of the vector.

import math

vectors = [[5, 0], [1, 9], [1, 3], [2, 1]]

def ByX( vector ):
    return vector[0]

def ByY( vector ):
    return vector[1]

def ByLength( vector ):
    return math.sqrt(
        vector[0] ** 2 +
        vector[1] ** 2 )

vectors.sort( key = ByX )
vectors.sort( key = ByY )
vectors.sort( key = ByLength )

The same example can be simplified using lambda expressions as follows:

import math

vectors = [[5, 0], [1, 9], [1, 3], [2, 1]]

vectors.sort( key = lambda vector : vector[0] )
vectors.sort( key = lambda vector : vector[1] )
vectors.sort( key = lambda vector :
    math.sqrt( vector[0] ** 2 + vector[1] ** 2 ) )

It is possible to sort lists in a cascading manner, where first the items are sorted by a primary key, and then items with the same primary key are sorted by a secondary key. To achieve this, the key function needs to return a list of values instead of a single one, representing the first and second priority.

For example, we may wish to sort the vectors first by their X-component and then by their Y-component. In this scenario we are in luck because we can either return the vector itself or omit the key altogether, since lists are compared item by item. To sort the vectors in a polar coordinate sense, we can return a list containing their angle and radius.

import math

vectors = [[5, 0], [1, 9], [1, 3], [2, 1]]

vectors.sort( key = lambda vector : vector )
vectors.sort( ) #-- also works in this case

#-- Polar Sorting
#--
vectors.sort( key = lambda vector : [
    math.atan2( vector[1], vector[0] ),
    math.sqrt( vector[0] ** 2 + vector[1] ** 2 )] )
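
The sketch below shows the results of cascaded sorting under these assumptions. The first sort orders the vectors by X and breaks ties by Y. The second uses a hypothetical list named rays, chosen so that several vectors share the same angle and the radius decides their order.

```python
import math

vectors = [[5, 0], [1, 9], [1, 3], [2, 1]]

#-- Primary key X, secondary key Y
vectors.sort( key = lambda vector : [vector[0], vector[1]] )
print( vectors )  #-- [[1, 3], [1, 9], [2, 1], [5, 0]]

#-- Primary key angle, secondary key radius
rays = [[2, 0], [0, 1], [1, 0], [0, 3]]
rays.sort( key = lambda vector : [
    math.atan2( vector[1], vector[0] ),
    math.sqrt( vector[0] ** 2 + vector[1] ** 2 )] )
print( rays )     #-- [[1, 0], [2, 0], [0, 1], [0, 3]]
```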

Dictionaries

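Dictionaries are the second most important built-in collection supported by Python. A dictionary represents a collection of items identified by unique keys. Since Python 3.7 dictionaries preserve the order in which items were inserted, but they have no sort( ) method and items cannot be accessed by position. Keys cannot be duplicated: setting an existing key replaces its value.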

Dictionaries are used for associating a simple key, which is often a number or string, with a more complex data value, such as a surface object. Because of this behavior they are also known as key-value stores.

Initializing

There are two ways to construct an empty dictionary, namely using the empty curly brackets notation as well as the dict( ) function.

pairs = { }
pairs = dict( )

Initializing dictionaries follows the key:value notation seen below. In the example code we define two dictionaries. The first one maps numbers to their string representation, which could also be achieved with a list. The second one maps strings to numbers, which is not possible with a list.

num_to_str = { 0: 'zero', 1: 'one', 2: 'two', 3: 'three' }
str_to_num = { 'zero': 0, 'one': 1, 'two': 2, 'three': 3 }

The dict( ) function can also be used, in which case the named parameters passed are converted to string keys with the associated values. Because parameter names have to follow Python's rules for identifiers, we cannot pass numerals for keys. Therefore, we can only define the string-to-number dictionary using this approach.

""" Invalid syntax!
"""
num_to_str = dict( 0 = 'zero', 1 = 'one', 2 = 'two', 3 = 'three' )

""" Valid syntax
"""
str_to_num = dict( zero = 0, one = 1, two = 2, three = 3 )
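
The limitation above applies only to named parameters. As a brief sketch, dict( ) also accepts a sequence of key-value pairs, where the keys can be numbers or any other valid key. The names list below is illustrative only.

```python
#-- Keys given as pairs may be numbers
num_to_str = dict( [( 0, 'zero' ), ( 1, 'one' ), ( 2, 'two' )] )
print( num_to_str[1] )       #-- one

#-- Pairs can also be produced with zip( )
names = ['zero', 'one', 'two']
str_to_num = dict( zip( names, range( len( names ) ) ) )
print( str_to_num['two'] )   #-- 2
```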

Collecting

Building dictionaries can be performed in the same way as with lists. However, unlike lists there is no append( ) method. Instead, items are set in the dictionary using the square bracket notation. Additionally, we can use the comprehension short-hand notation to construct dictionaries, but need to provide both the key and the value as seen below.

squares = { }
for value in range( 10 ):
    squares[value] = value ** 2

squares = { value: value ** 2
    for value in range( 10 ) }

Measuring

The len( ) function, used for lists earlier, can also be used for dictionaries to compute the number of key-value pairs stored.

str_to_num = { 'zero': 0, 'one': 1, 'two': 2, 'three': 3 }
print( len( str_to_num ) )

Accessing

The square brackets notation is used for getting and setting values associated with keys. Setting a value for a key that does not exist adds the key to the dictionary. Getting a value for a key that does not exist raises a key not found exception (KeyError).

point = { 'X': 0.0, 'Y': 0.0 }

point['Y'] = 1.0    #-- Set the Y-coordinate
point['Z'] = 1.0    #-- Add the Z-coordinate

print( point['X'] ) #-- Get the X-coordinate
print( point['W'] ) #-- Key Not Found Exception!

The get( ) method returns the value for a key without raising an exception when the key is missing, which avoids having to check first whether the key exists. In the example below we get the value for 'X', which exists, and for 'Z', which does not. The second parameter is the default value that will be returned in the case of a missing key.

point = { 'X': 0.0, 'Y': 0.0 }
x = point.get( 'X', 0.0 )
z = point.get( 'Z', 0.0 )
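
Two details of get( ) are sketched below: when the second parameter is omitted the default is None, and a missing key is never added to the dictionary as a side effect. The variable names are illustrative only.

```python
point = { 'X': 0.0, 'Y': 0.0 }

z = point.get( 'Z', 0.0 )  #-- 0.0, the provided default
w = point.get( 'W' )       #-- None, as no default was provided

#-- The dictionary itself is left unchanged
print( 'Z' in point )      #-- False
print( len( point ) )      #-- 2
```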

Removing

Both the del statement and the pop( ) method are applicable to dictionaries, addressing items by key instead of by index. Unlike with lists, pop( ) on a dictionary always requires a key.

point = { 'X': 0.0, 'Y': 0.0 }

del point['X']
point.pop( 'Y' ) #-- Return deleted value
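
As a short sketch of removing keys that may be missing: pop( ) accepts an optional second parameter which is returned instead of raising an exception when the key is not found. Without it, a missing key raises KeyError, as does del.

```python
point = { 'X': 0.0, 'Y': 1.0 }

y = point.pop( 'Y' )        #-- 1.0, the key exists
z = point.pop( 'Z', None )  #-- None, the default is returned

#-- Without a default a missing key raises an exception
try:
    point.pop( 'Z' )
except KeyError:
    print( 'No such key' )

print( point )  #-- {'X': 0.0}
```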

Enumerating

There are several ways to enumerate dictionaries depending on whether we are interested in their keys, values or both. This is performed using the keys( ), values( ) and items( ) methods.

point = { 'X': 0.0, 'Y': 1.0, 'Z': 2.0 }

#-- Only enumerate keys
#--
for key in point:
    print( key )

for key in point.keys( ):
    print( key )

#-- Only enumerate values
#--
for value in point.values( ):
    print( value )

#-- Enumerate both keys and values
#--
for key, value in point.items( ):
    print( key, value )

Extracting

Sometimes it is useful to get only the keys or values from a dictionary and collect them in a list. This can be done in a single line using the list construction function, namely list( ). Note that in Python 3 the keys( ), values( ) and items( ) methods return view objects, not lists.

point = { 'X': 0.0, 'Y': 1.0, 'Z': 2.0 }

keys = list( point.keys( ) )
vals = list( point.values( ) )
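
The difference matters when the dictionary changes afterwards. The sketch below assumes Python 3.7 or later, where keys are reported in insertion order: the object returned by keys( ) reflects later changes, whereas the list made with list( ) is a snapshot. The names live and snapshot are illustrative only.

```python
point = { 'X': 0.0, 'Y': 1.0 }

live = point.keys( )              #-- Follows the dictionary
snapshot = list( point.keys( ) )  #-- Copy taken at this moment

point['Z'] = 2.0

print( list( live ) )  #-- ['X', 'Y', 'Z']
print( snapshot )      #-- ['X', 'Y']
```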

Searching

The in operator has slightly different semantics when used with dictionaries as opposed to lists. The expression something in dictionary tests whether something is one of the dictionary's keys, not one of its values.

point = { 'X': 0.0, 'Y': 1.0 }

if( 'Z' in point ):
    print( 'Point 3D' )
else:
    print( 'Point 2D' )
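
When the question is about values instead of keys, one option, sketched below, is to test against values( ) or to collect the matching keys with a comprehension. The name keys_for_value is illustrative only.

```python
point = { 'X': 0.0, 'Y': 1.0 }

print( 'X' in point )            #-- True, 'X' is a key
print( 1.0 in point )            #-- False, 1.0 is not a key
print( 1.0 in point.values( ) )  #-- True, 1.0 is a value

#-- Find which keys hold a given value
keys_for_value = [key
    for key, value in point.items( )
        if( value == 1.0 )]
print( keys_for_value )          #-- ['Y']
```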