Notes on learning Numpy
My notes while reading this. More or less copy paste. Take with a grain of salt…or sugar.
Abstract
NumPy is a Python package. It stands for ‘Numerical Python’. It is a library consisting of multidimensional array classes and a collection of routines for processing of an array.
From the officials:
NumPy is the fundamental package for scientific computing with Python. It contains among other things:
- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.
In layman’s terms (according to me) - better python-arrays with native C-code under the hood so that everything is blazingly fast. Easy array generation, generation of ranges, intervals and distributions of random numbers in some interval. Conveniently iterating over ranges and the generated arrays. Numpy also provides some basic/advanced statistical functions. Linear Algebra is also something that is already built-in.
Installation:
pip3 install numpy
Ndarray
The most important object defined in NumPy is an N-dimensional array type called ndarray. This is the main workhorse of the package. It describes the collection of items of the same type. Items in the collection can be accessed using a zero-based index. It’s much better than the standard python arrays and lists. Basic construction:
numpy.array(object, dtype = None, copy = True, order = None, subok = False, ndmin = 0)
object
- Any object exposing the array interface method returns an array, or any (nested) sequence.dtype
- the desired data type of array, optionalcopy
- Optional. By default (true), the object is copiedorder
- C (row-major) or F (column-major) or A (any) (default)subok
- By default, the returned array forced to be a base class array. If true, sub-classes passed throughndmin
- Specifies minimum dimensions of the resultant array
import numpy as np
a = np.array([1, 2,3,4])
print(a)
a = np.array([[1, 2], [3, 4]])
print(a)
a = np.array([1, 2,3,4], dtype=complex)
print(a)
[1 2 3 4]
[[1 2]
[3 4]]
[1.+0.j 2.+0.j 3.+0.j 4.+0.j]
Data Types
There are a lot of possible data types that are supported by numpy. Some of the scalar types include:
- bool_, int_, intc, intp, int8, int16, int32, uint8, uint16, uint32
- float_, float16, float32, flaot64
- complex, complex64, complex128
The types can be encapsulated in Data Type Object.
numpy.dtype(object, align, copy)
- Object − To be converted to data type object
- Align − If true, adds padding to the field to make it similar to C-struct
- Copy − Makes a new copy of dtype object. If false, the result is a reference to builtin data type object
The object can be later used when a given function or a constructor takes dtype argument. Those objects can also create struct-like types for structured data. Think of it like defining a custom struct in C++ and then creating a vector with elements of this struct.
import numpy as np
dt = np.dtype([('age',np.int8),('time', np.int32)])
a = np.array([(10,30),(20,30),(30,30)], dtype = dt)
print(a)
print(a['age'])
print(a['time'])
Which yields:
[(10, 30) (20, 30) (30, 30)]
[10 20 30]
[30 30 30]
This way we can define a map-like structure that contains a bunch of different arrays, each addressable with a key. The following code defines a ‘human’ data type with string field name, int field age, and float field social class.
import numpy as np
human = np.dtype([('name', 'S20'), ('age', 'i1'), ('social_class', 'f4')])
print(human)
a = np.array([('Lenin', 45, 100),('John', 19, 3.5)], dtype = human)
print(a)
[('name', 'S20'), ('age', 'i1'), ('social_class', '<f4')]
[(b'Lenin', 45, 100. ) (b'John', 19, 3.5)]
This shows another important point. Each time can be specified with a single character.
'b'
− boolean'i'
− (signed) integer'u'
− unsigned integer'f'
− floating-point'c'
− complex-floating point'm'
− time delta'M'
− DateTime'O'
− (Python) objects'S'
, ‘a’ − (byte-)string'U'
− Unicode'V'
− raw data (void)
Array Attributes
Attributes, as the name implies, give us some information about a given object. In the case of numpy, the most important thing about an array is its shape and ndim
Shape and dimension
Shape of an array is its layout in memory. Normally numpy represents the ndarray as a n-dimensional array in memory. This means that that the shapes is nothing more than a tuple that consists of each dimension’s size. For example, the array [1,2,3]
has shape (3)
, the array [[1,2],[1,2]]
has shape (2,2)
because it consists of two array and each of them contains two elements.
import numpy as np
a = np.array([1,2,3,4])
print(a.shape)
a = np.array([[1,2,3,4],[1,2,3,4]])
print(a.shape)
(4,)
(2, 4)
Ndim of an numpy array on the other hand is the number of dimensions. I. e. [1,2]
has ndim equal to 1, and [[1,3],[1.3]]
has ndim 2
import numpy as np
import numpy as np
a = np.arange(24)
print(a.ndim)
a = np.array([[1,2,3,4],[1,2,3,4]])
print(a.ndim)
1
2
Reshaping
A lot of times we want to change the basic structure of and array without changing the information in it. In such cases, the reshape method comes to help.
import numpy as np
a = np.arange(24)
print(a.ndim)
# now reshape it
b = a.reshape(2,4,3)
print(b)
The elements stay the same and occupy the same memory location but the access to them is happening in different way.** ‘Size’ of an array numpy.itemsize
- this array attribute returns the length of each element of array in bytes.
import numpy as np
x = np.array([1,2,3,4,5], dtype = np.int8)
print(x.itemsize)
1
Num elements
The attribute that everyone has been waiting for… numpy.size
import numpy as np
a = np.arange(24)
print(a.size)
24
Slicing
Basic slicing
Basic slicing is an extension of Python’s basic concept of slicing to n dimensions. A Python slice object is constructed by giving start, stop, and step parameters to the built-in slice function. This slice object is passed to the array to extract a part of array.
import numpy as np
a = np.arange(10)
b = a[2:7:2]
print(b)
print(a[2:])
[2 4 6]
[2 3 4 5 6 7 8 9]
Advanced
This mechanism helps in selecting any arbitrary item in an array based on its Ndimensional index. Each integer array represents the number of indexes into that dimension. When the index consists of as many integer arrays as the dimensions of the target ndarray, it becomes straightforward.
import numpy as np
x = np.array([[ 0, 1, 2],[ 3, 4, 5],[ 6, 7, 8],[ 9, 10, 11]])
print('Our array is:')
print(x)
print('\n')
rows = np.array([[0,0],[3,3]])
cols = np.array([[0,2],[0,2]])
y = x[rows,cols]
print('The corner elements of this array are:' )
print(y)
Our array is:
[[ 0 1 2]
[ 3 4 5]
[ 6 7 8]
[ 9 10 11]]
The corner elements of this array are:
[[ 0 2]
[ 9 11]]
Boolean Array Indexing
This type of advanced indexing is used when the resultant object is meant to be the result of Boolean operations, such as comparison operators.
import numpy as np
x = np.arange(0,20)
print(x[x>5])
[ 6 7 8 9 10 11 12 13 14 15 16 17 18 19]
Broadcasting
This refers to the way arithmetic operations are handled. Numpy is intelligent enough to perform arithmetical operations on arrays element-wise as long as the shapes of the objects are ‘compatible’.
import numpy as np
a = np.array([1,2,3,4])
b = np.array([10,20,30,40])
c = a * b
print(c)
[ 10 40 90 160]
Iterating
numpy.nditer()
is your best friend. It returns an iterable (for loop) object that can be further used to go through an array. It can also be constructed with two arrays to go through the both of them simultaneously. The array must be broadcastable. Constructing an iterator that can modify the values of an array happens through a flag in the constructor.
import numpy as np
a = np.arange(0, 60, 5)
b = np.arange(0, 12)
for num in np.nditer(a):
print(str(num))
print('\n')
for a_num,b_num in np.nditer([a,b]):
print(str(a_num) + '+' + str(b_num))
print('\n')
for num in np.nditer(a, op_flags=["readwrite"]):
num += 10
print(a)
0
5
10
15
20
25
30
35
40
45
50
55
0+0
5+1
10+2
15+3
20+4
25+5
30+6
35+7
40+8
45+9
50+10
55+11
[10 15 20 25 30 35 40 45 50 55 60 65]
Manipulating
Changing shape
Method | Description |
---|---|
reshape |
Change shape |
flat |
1D Iterator |
flatten |
Returns a new array |
revel |
Returns a contiguous flattened array |
Transpose
Method | Description |
---|---|
transpose |
Permutes the dimensions of an array |
ndarray.T |
Same as self.transpose() |
rollaxis |
Rolls the specified axis backwards |
swapaxes |
Interchanges the two axes of an array |
Joining Arrays
Method | Desc |
---|---|
concatenate |
Joins several arrays along an existing axes |
stack |
Joins several arrays along a new axes |
hstack |
Stacks arrays in sequence horizontally |
vstack |
Stacks arrays in sequence vertically |
Adding and removing elements
Method | Description |
---|---|
append | Pushes new value at the end |
insert | Inserts value along a given axis before a given index |
delete | Return a new array with sub-arrays along an axis deleted |
unique | Finds all unique elements in an array |
I/O
There are two flavors of saving/loading a ndarray to/from file.
.npy files
This .npy file stores data, shape, dtype and other information required to reconstruct the ndarray in a disk file such that the array is correctly retrieved even if the file is on another machine with different architecture.
The IO is simple and is done through the functions numpy.save()
and numpy.load()
import numpy as np
a = np.array([1,2,3,4,5])
np.save('outfile',a)
b = np.load('outfile.npy')
.txt files
This is just a simple, cheap and dirty way to save an array to a file “symbolically”. The function that come into play are numpy.savetxt()
and numpy.loadtxt()
import numpy as np
a = np.array([1,2,3,4,5])
np.savetxt('out.txt',a)
b = np.loadtxt('out.txt')
print(b)