Python/SciPy

From Christoph's Personal Wiki

SciPy (pronounced "Sigh Pie") is an open-source Python library for scientific and technical computing.

The SciPy Library/Package

The SciPy package contains key algorithms and functions core to Python's scientific computing capabilities. Available sub-packages include:

  • constants: physical constants and conversion factors
  • cluster: hierarchical clustering, vector quantization, K-means
  • fftpack: Discrete Fourier Transform algorithms
  • integrate: numerical integration routines
  • interpolate: interpolation tools
  • io: data input and output
  • lib: Python wrappers to external libraries
  • linalg: linear algebra routines
  • misc: miscellaneous utilities (e.g. image reading/writing)
  • ndimage: various functions for multi-dimensional image processing
  • optimize: optimization algorithms including linear programming
  • signal: signal processing tools
  • sparse: sparse matrix and related algorithms
  • spatial: KD-trees, nearest neighbors, distance functions
  • special: special functions
  • stats: statistical functions
  • weave: tool for writing C/C++ code as Python multi-line strings (removed from recent SciPy releases; maintained separately as the weave package)
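
As a rough illustration of how these sub-packages are used, the sketch below imports two of them, constants and integrate. The helper function standard_normal_pdf is a hypothetical name introduced only for this example; it is not part of SciPy.

```python
import numpy as np
from scipy import constants, integrate

# scipy.constants exposes physical constants as plain floats (SI units).
speed_of_light = constants.c  # metres per second

# Hypothetical helper for this sketch: the standard normal density.
def standard_normal_pdf(x):
    return np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)

# scipy.integrate.quad numerically integrates a function over an interval
# and returns the estimate together with an absolute error bound.
# A probability density should integrate to (very nearly) 1.
area, abs_error = integrate.quad(standard_normal_pdf, -np.inf, np.inf)
print(speed_of_light, area, abs_error)
```

Each sub-package is imported by name from scipy, so only the parts that are actually needed get loaded.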

Examples

Note: These examples are biased towards preparation for deep learning and machine learning courses; they are not meant to show every feature of these libraries.

from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt # needed for the plotting examples further down

Gaussian probability density function (PDF)

norm.pdf(0)
norm.pdf(0, loc=5, scale=10) # scale => stdev

r = np.random.randn(10)
norm.pdf(r) # calculate the PDF of all values r at the same time

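# The log of the Gaussian PDF is cheaper to calculate, since it avoids the exponential.
# With log-densities, the joint likelihood of independent samples becomes a sum
# instead of a product; adding is cheaper than multiplying and less prone to underflow.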
norm.logpdf(r)
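
To make the PDF and log-PDF calls above more concrete, here is a minimal sketch that compares norm.pdf with the closed-form Gaussian density and then shows why log-densities are preferred for joint likelihoods. The values of x, mu, sigma and the seeded generator are illustrative assumptions, not part of the original examples.

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.0, 0.0, 2.5])
mu, sigma = 5.0, 10.0  # illustrative mean and standard deviation

# Closed-form density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))
manual_pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
matches_scipy = np.allclose(manual_pdf, norm.pdf(x, loc=mu, scale=sigma))

# Joint likelihood of many independent samples: the product of densities
# underflows in floating point, while the sum of log-densities stays finite.
rng = np.random.default_rng(0)  # seeded only for reproducibility
samples = rng.standard_normal(2000)
likelihood = np.prod(norm.pdf(samples))
log_likelihood = np.sum(norm.logpdf(samples))
print(matches_scipy, likelihood, log_likelihood)
```

This is why machine learning code usually works with log-likelihoods rather than raw likelihoods.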

Gaussian cumulative distribution function (CDF)

norm.cdf(r)
norm.logcdf(r)
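
A common use of the CDF is computing the probability that a value falls in an interval. The sketch below assumes a standard normal distribution and also uses norm.ppf, the inverse of the CDF, which does not appear in the examples above.

```python
from scipy.stats import norm

# Probability that a standard normal value lies within one standard
# deviation of the mean: P(-1 <= X <= 1) = CDF(1) - CDF(-1)
p_within_one_sd = norm.cdf(1) - norm.cdf(-1)

# norm.ppf is the quantile function, i.e. the inverse of the CDF
z_975 = norm.ppf(0.975)
round_trip = norm.cdf(z_975)  # should recover 0.975
print(p_within_one_sd, z_975, round_trip)
```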

Sampling from a Gaussian distribution (1D)

r = np.random.randn(10000)
plt.hist(r, bins=100)
plt.show()

r = 10 * np.random.randn(10000) + 5 # 10 = stdev; 5 = mean
plt.hist(r, bins=100)
plt.show()
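
Besides plotting a histogram, the scale-and-shift step can be checked numerically. The following sketch assumes NumPy's newer Generator API (np.random.default_rng) instead of np.random.randn, and also draws the same distribution through SciPy's norm.rvs for comparison; the seed and sample size are arbitrary choices.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)  # seeded only for reproducibility
mean, stdev = 5.0, 10.0

# Scale and shift a standard normal sample: stdev * z + mean
r_numpy = stdev * rng.standard_normal(100_000) + mean

# Equivalent draw through SciPy, with loc = mean and scale = stdev
r_scipy = norm(loc=mean, scale=stdev).rvs(size=100_000, random_state=rng)

# Both samples should have a mean near 5 and a standard deviation near 10
print(r_numpy.mean(), r_numpy.std())
print(r_scipy.mean(), r_scipy.std())
```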

Sampling from a Gaussian distribution (spherical and axis-aligned elliptical)

r = np.random.randn(10000, 2)
plt.scatter(r[:,0], r[:,1])
plt.show()

# Rescale the second dimension: stdev = 5, mean = 2
r[:,1] = 5 * r[:,1] + 2
plt.scatter(r[:,0], r[:,1])
plt.show()

# Same data with equal axis scaling, so the elliptical shape is visible
plt.scatter(r[:,0], r[:,1])
plt.axis('equal')
plt.show()
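
"Axis-aligned" means the two dimensions are stretched independently, so the covariance matrix is (approximately) diagonal. A minimal sketch to check this numerically, assuming a seeded Generator and a larger sample than in the plots above:

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only for reproducibility
r = rng.standard_normal((50_000, 2))
r[:, 1] = 5 * r[:, 1] + 2  # second axis: stdev 5, mean 2

sample_mean = r.mean(axis=0)
sample_cov = np.cov(r, rowvar=False)  # 2x2 empirical covariance matrix

# Expect a mean near [0, 2], variances near 1 and 25 on the diagonal,
# and off-diagonal entries near 0 (no correlation between the axes)
print(sample_mean)
print(sample_cov)
```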

Sampling from a General Multivariate Normal

from scipy.stats import multivariate_normal as mvn

cov = np.array([ [1, 0.8], [0.8, 3] ])
mu = np.array([0, 2]) # mean
r = mvn.rvs(mean=mu, cov=cov, size=1000) # 1k data points
plt.scatter(r[:,0], r[:,1])
plt.axis('equal')
plt.show()

# Do the same in NumPy
r = np.random.multivariate_normal(mean=mu, cov=cov, size=1000)
plt.scatter(r[:,0], r[:,1])
plt.axis('equal')
plt.show()
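
One common way to draw such samples by hand is through a Cholesky factor of the covariance matrix. The sketch below illustrates that idea under the same cov and mu as above; it is not a claim about how mvn.rvs or np.random.multivariate_normal are implemented internally.

```python
import numpy as np

rng = np.random.default_rng(7)  # seeded only for reproducibility
cov = np.array([[1.0, 0.8], [0.8, 3.0]])
mu = np.array([0.0, 2.0])

# Factor cov = L @ L.T, then map standard normal rows z to mu + z @ L.T.
# The transformed rows then have covariance L @ L.T = cov.
L = np.linalg.cholesky(cov)
z = rng.standard_normal((100_000, 2))
r = mu + z @ L.T

empirical_mean = r.mean(axis=0)
empirical_cov = np.cov(r, rowvar=False)
print(empirical_mean)
print(empirical_cov)
```

The empirical mean and covariance should land close to mu and cov, which is a quick sanity check for any multivariate sampler.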

Other interesting SciPy functions

x = np.linspace(0, 100, 10000)
# Angular frequencies: 1, 3, 5 (radians per unit of x)
y = np.sin(x) + np.sin(3*x) + np.sin(5*x)
plt.plot(y)
plt.show()

Y = np.fft.fft(y)
plt.plot(np.abs(Y))
plt.show()

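# Find the original frequencies: the peaks in the plot sit near bins 16, 48 and 80.
# The signal spans 100 units of x, so bin k corresponds to angular frequency 2*pi*k/100.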
2 * np.pi * 16/100 # => ~1
2 * np.pi * 48/100 # => ~3
2 * np.pi * 80/100 # => ~5
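
Instead of reading the peak positions off the plot, they can be located programmatically. The sketch below uses scipy.fft and scipy.signal.find_peaks rather than np.fft, and the threshold of a quarter of the tallest peak is an arbitrary assumption chosen for this particular signal.

```python
import numpy as np
from scipy.fft import rfft, rfftfreq
from scipy.signal import find_peaks

x = np.linspace(0, 100, 10000)
y = np.sin(x) + np.sin(3 * x) + np.sin(5 * x)

dx = x[1] - x[0]                # sample spacing
magnitude = np.abs(rfft(y))     # spectrum at non-negative frequencies only
freqs = rfftfreq(len(y), d=dx)  # bin frequencies in cycles per unit of x

# Keep local maxima at least a quarter as tall as the largest one
peak_bins, _ = find_peaks(magnitude, height=0.25 * magnitude.max())

# Convert cycles per unit of x to angular frequency (radians per unit of x)
angular = 2 * np.pi * freqs[peak_bins]
print(peak_bins, angular)
```

The recovered values are only approximately equal to the original frequencies, because the true frequencies fall between FFT bins.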
