Programming Maths
Must Watch!
MustWatch
1. Sets
Sets are collections of objects.
Georg Cantor, a German mathematician initiated the development of set theory.
A set is mathematically represented by items inside curly braces ({}
).
For example:
a set of natural numbers (N): {1, 2, 3, }
a set of whole numbers : {0,1, 2, 3, }
a set of all numbers: consisting of a set of real numbers and complex numbers
Notations
is a Null set i.e. A set that does not have any elements.
is the Universal Set i.e. the set of all the elements of all the sets which are being considered for set operations
A is the complement of a Set: A set that includes all the elements that are not part of the universal set i.e. (A = A)
x A represents that x is a member of A. \nFor example, a person (x) in a set of the population of France (A).
A B represents that A is a subset of B. \nFor example, the set of the population of France (A) is a subset of the set of the population of Europe (B).
represents the Union of two sets i.e. the combination of all the elements of two sets into a single set
represents the Intersection of two sets i.e. combining the common elements of the two sets into a new set
2. Venn Diagrams
These are a scheme of representing sets and their operations diagrammatically.
Intersection of Two Sets
Intersection of two sets (A B)
Union of Two Sets
Union of two sets (A B)
3. Prime Numbers & Riemann Hypothesis
A Prime Number is any natural number greater than 1, that can only be divided exactly by itself or 1.
A set of Prime numbers can be represented as {2,3,5,7,11,13,17,...}
.
Riemann Hypothesis
It is a hypothesis that suggests that the distribution of prime numbers can be predicted and the frequency of prime numbers is very closely related to the behavior of the Reimann Zeta function.
The Reimann Zeta FunctionTo be precise, this conjecture tells that the Riemann zeta function has its zeros only at:
the negative even integers (trivial zeros) and
complex numbers with real part 1/2 (non-trivial zeros)
This is one of the unsolved Millennium problems in mathematics and the first person to provide acceptable proof will be given a $1 million prize by the Clay Mathematics Institute.
Millennium Problems
If it is easy to check that a solution to a problem is correct, is it also easy to solve the problem? This is the
www.claymath.org
4. Probability
It is a mathematical branch that deals with the likelihood of an event occurrence.
Mathematically represented by P, theprobability of an event X occurring is P(X).
P (X) = Number of events where X occurs / Total number of possible outcomes
For example, the probability of getting heads in a coin toss is 1/2 or 50%.
Rules of Probability
The probability of an event can only be between 0 and 1.
If two events are independent of each others occurrence, then the probability of both happening is given by multiplying their individual probabilities.
P (X and Y) = P (X) * P (Y)
If two events are mutually exclusive, the probability of either one occurring is given by adding their individual probabilities.
P (X or Y) = P (X) + P (Y)
5. Differential Calculus
It is a mathematical branch that is concerned with continuous change.
Its two major branches deal with:
the rate of change/ gradient of a quantity (differential calculus)
summation of infinitely many small quantities to calculate the whole (integral calculus)
Differentiation
It is used to calculate the gradient/ slope/ tangent of a mathematical curve.
It can also be used to find the smallest (minima) and largest value (maxima) of a function.
For a curve with an equation of y= x^n
, the gradient of y with respect to x (dy/dx
) is given by n * x^(n-1)
.
Rules of Differentiation
d(n)/dx = 0
d(x)/dx = 1
d(x^n)/dx = n * x^(n-1)
d(e^x)/dx = e^x
d(ln x)/dx = 1/x
d(n^x)/dx = n^x * ln(x)
d(sin x)/dx = cos x
d(cos x)/dx = sin x
Integration
It is the inverse of differentiation.
It is commonly used for calculating areas and volumes of shapes.
For an equation y = x^n
, the integral of y is given by:
y dx = (x^n) dx = (1/n+1) * (x^n+1) + C
where C is a numerical constant.
6. Correlation
A term introduced by Francis Galton, Correlation is a property that demonstrates how two variables are associated with each other.
It determines how varying one variable changes the other.
The Pearson correlation coefficient / Pearsons r () measures this correlation on a scale from -1 to 1 where:
0 No correlation
-1 Anti-correlation
1 Perfect correlation
Pearsons coefficient
Remember that:
Correlation does not imply causation.
The number of ice creams sold on a particular day and the number of children born on that day may correlate but this does not mean that one is the cause of another.
7. Regression
It is a process of mathematically estimating the relationships between a dependent variable and one or more independent variables.
Two types of regression models are explained below.
8. Linear Regression
It is an approach to modeling the linear relationship between different variables.
This relation can be between:
two variables (plotted on a 2-D plane) or
multiple variables (plotted on a multi-dimensional plane)
The relation between two variables can be represented as:
y = m*x + c
where x is the independent variable and y is the dependent variable.
The line that minimizes the sum of the squared vertical distance between the data points is considered the best-fitting line for linear regression.
Line of best fit in Linear Regression
Linear regression as a machine learning model can be used to predict continuous variables as opposed to classification.
For example, estimating the temperature (a continuous variable) at a particular time of the day.
Coefficient Of Determination (R2)
It is the proportion of the variation in the dependent variable that is predictable from the independent variable.
R2 is calculated as follows where:
SS(res) is the sum of squares of residuals (the sum of squared distance between the line of best fit and observations)
SS(tot) is the total sum of squares (the sum over all squared differences between the observations and their overall mean)
Formula for calculating R2Note that R2 should not be confused with the square of Pearsons coefficient.
s coefficient.
To learn more, check out this great article by Krishna Rao:
r2 or R2 When to Use What
Graphical explanation of the squared Pearson correlation coefficient and coefficient of determination
towardsdatascience.com
R2 can range between any negative number to +1, where
+1 indicates a perfect match of observations to predictions
0 indicates that predictions are random
a negative number indicates that the predictions are worse than random
9. Logistic Regression
This approach estimates the probability of an event occurring (that lies between 0 and 1) based on given independent variables.
Logistic Regression as a machine learning model is used for classification tasks (as opposed to regression tasks).
These classification tasks can either be:
binary (classification between 2 outcomes)
multi-class (classification between more than 2 outcomes)
The Logistic function is written as follows where :
is a location parameter or the midpoint of the logistic curve (0.5 in the plot below)
s is a scale parameter
Logistic FunctionThe following logistic plot is used to classify the probability of passing the exam based on the number of hours of study.
The Logistic Regression Plot
10. Matrices
A Matrix is an array/table of numbers.
They are extensively used in algebra to perform operations in a condensed manner.
An m * n
matrix has m
rows (horizontal) and n
columns (vertical).
The elements in a matrix are notated as follows where for element a
:
the first number in the subscript is the row number and,
the second number in the subscript is the column number
An m*n matrix with a(mn) elements
Matrix Arithmetic
Addition & Subtraction
Prerequisite: \nTwo matrices must have the same dimensions i.e. with the same number of rows and columns for them to be added or subtracted.
The number at a location can be added/ subtracted from a similarly located element in another matrix as follows:
Graphical representation of Matrix Addition
Numerical Representation of Matrix Subtraction (Example from Wikipedia) Matrix Multiplication
Prerequisite:\nTwo matrices A & B can only be multiplied if A has the same number of columns as the number of rows in B.
Numerical Representation of Matrix Multiplication Matrix Transposition
Matrix transposition flips a matrix over its diagonal.
This operation switches the row and column indices of matrix A by producing another matrix, denoted by A^T.
Graphical Representation of Matrix Multiplication
Matrix Inversion
In number multiplication, if x = 10
, then the inverse of x (x-1)i.e. 1/x = 1/10
Hence, the following holds true:
x * x^-1 = 1
For matrices, the inverse of a matrix A
must satisfy the above condition.
A * A^-1 = 1
For example, the inverse of a 2x2 matrix can be demonstrated below:
Numerical Representation of Matrix InversionThis is because A
multiplied by its inverse A^-1
returns the following:
Numerical Representation of calculating the Identity Matrix Identity Matrix (I)
It is the matrix counterpart of 1.
The identity matrix of size n x n
square matrix is one with ones on the main diagonal and zeros elsewhere.
Identity Matrix
Matrix Division
Two matrices A
and B
can be divided as follows:
A / B = A * B^-1
Remember that this will only be true if matrix B
has a possible inverse.
Also,
A^-1 * B
is not equal to A * B^-1
as for numbers.
Matrix Properties
Matrix multiplication is non-commutative
A X B
is not equal to B X A
Matrix multiplication is distributive with respect to matrix addition
For matrix A
of size m n
and B
of size n p
:
A(B+C) = AB + AC
(left distributivity)
For matrix C
of size n p
and D
of size p q
:
(B+C)D = BD + CD
(right distributivity)
If A
is a matrix and c
a scalar, then the matrices cA
and Ac
are obtained by left and right multiplying the entries of A
with c
.
11. Vectors
A Vector is a one-dimensional matrix.
It can be written as a list of numbers.
Matrices with a single row are called row vectors (1 x n
dimensional).
Matrices with a single column are called column vectors (n x 1
dimensional).
12. Fibonacci Numbers
Fibonacci numbers were first described in India by Pingala.
1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
It is a sequence of whole numbers that can be obtained by adding the previous two numbers in the series.
Calculation of nth Fibonacci NumberThis sequence commonly occurs in nature in form of:
branching in trees
the arrangement of leaves on a stem
number of spirals formed from the number of seeds in the spirals of a sunflower
The Fibonacci SpiralApplications of Fibonacci numbers in Computer science include:
Fibonacci Search
Fibonacci Heap
Fibonacci Cubes
Fibonacci Coding
Check out one of my articles below to read more about recursion explained with the use of Fibonacci numbers:
Understanding Recursion By Dropping It Eventually
Lets understand Recursion--a junior developers nightmare!
levelup.gitconnected.com
13. The Golden Ratio ()
If we divide a line into two segments of different lengths i.e. A
and B
, both of these are in a golden ratio (
; phi) if:
The Golden Ratio LineThis equation can be rearranged into a quadratic equation as follows:The positive solution of the quadratic equation is:
Value of (Golden Ratio/ Mean)The Golden Ratio (
) can also be obtained by dividing a number in the Fibonacci sequence by the number behind it (especially true for large Fibonacci numbers)
For example, the two subsequent Fibonacci numbers 233
and 114
when divided:
233 / 144 = 1.618 =
Interesting, isnt it?
13. The Super-Golden Ratio ()
The Super-golden ratio comes from Narayanas cows sequence (which represents the breeding of the cattle population)
This sequence is similar to the Fibonacci sequence and is obtained as follows:
a(0) = a(1) = a(2) = 1
, thereafter:
a(n) = a(n-1) + a(n-3)
The first few numbers in the sequence are:
1, 1, 1, 2, 3, 4, 6, 9, 13, 19, 28, 41, 60, 88,
Similar to the Golden ratio, the super-golden ratio can be calculated by dividing two subsequent numbers in this series.
This ratio is represented by psi (\n
).
\n = 1.465571231876768026
14. Factorials
The Factorial of a number n
is represented by n!
.
It is the product of all positive integers less than or equal to n.
Calculation of Factorial
Calculation of FactorialNote that the value of the factorial of 0 i.e.0! = 1
15. Permutations
It is the number of ways in which items can be arranged in a particular order.
For given n
items, there are n!
possible permutations.
If picking k
items from a total of n
items, the possible permutations are n! / (n-k)!
.
For example, there are six permutations of the set {1, 2, 3}
:
(1, 2, 3)
(1, 3, 2)
(2, 1, 3)
(2, 3, 1)
(3, 1, 2)
(3, 2, 1)
16. Combinations
It is the number of ways in which items can be arranged without particular order.
If picking k
items from a total of n
items, the possible combinations (C (n, k)
) are:
n! / (k! * (n-k)!)
For example, when picking up two items from the set {1, 2, 3}
, there are three combinations of :
(1, 2)
(1, 3)
(2, 3)
17. Modulo Arithmetic
Modulo arithmetic can be represented by the word:
Wraparound
Imagine a clock.
When the 12-hour clock hand completes a circle and moves on 2, the time is still 2 and not 14.
In Python, the modulo of a number can be calculated as below.
This is the remainder left after dividing 12 by 2.
Congruence
Given two numbers a
and b
, if the difference of these numbers ( k = a-b
) is exactly divisible by another number n
:
a
and b
are said to be congruent modulo n
a b (mod n)
Applications
Modulo Arithmetic is used in:
Public Key Cryptography uses Modular Exponentiation
Bitwise Operations (e.g. XOR sums 2 bits, modulo 2)
International Bank Account Numbers (IBANs) use modulo 97 arithmetic to spot user input errors in bank account numbers
Computer algebra algorithms (e.g. Polynomial Factorisation)
18. Elliptic Curve
An elliptic curve is defined using the following equation:
The equation for an Elliptic curveAn elliptic curve used in the Bitcoin software is called secp256k1.
This curve is represented by the equation below where a = 0
and b = 7
.
y2 = x3 + 7
When plotted over real numbers, this curve looks as below.
The plot of secp256k1 over the real numbers ( Source: Bitcoin Wikipedia )
19. Field
A field is a set of numbers along with its two defined operations:
Addition (+) and
Multiplication ()
These operations are should satisfy certain properties called field axioms.
Associativity of addition and multiplication
Commutativity of addition and multiplication
Additive and Multiplicative identity
Additive and Multiplicative inverses
Distributivity of multiplication over addition
For example, there can be fields of:
real numbers
complex numbers (i.e. real + imaginary numbers)
rational numbers, and more
Note that: All of these fields contain infinite elements.
20. Finite Field
It is a field with finite elements.
The order of the field is the number of elements contained in the field.
The order of a finite field is a prime number or a positive power of a prime number.
The result of the addition or multiplication of two elements in the set is always within the set.
To ensure the above,
For a finite field F(n) of order n
with two of its elements a
and b
, their:
Addition is defined as ( a + b ) mod n
Multiplication is defined as ( a * b) mod n
The order of F(2) is two (a prime number).
This field is used extensively in computer science as the two possible values of a bit.
Elliptic curves, Finite fields along with their modular arithmetic operations lead to Public Key Cryptography (explained in depth in the article below).
The Mathematics Behind Bitcoin
levelup.gitconnected.com
21. Decimal Numbers
The numbers that we use on a day-to-day basis are Decimals, which is also called the Base(10) number system.
This number system has numbers between 0
and 9
.
Number Representation
To represent a number in the Decimal number system, we use the Base(10) notation as follows:
1 = 1
x 10
25
= 2
x 101 + 5
x 10
200
= 2
x 102 + 0
x 101 + 0
x 10
22. Binary Numbers
Binary is the Base(2) number system with two possibilities : 0
and 1
.
Binary was developed by an Indian mathematician called Pingala within Sanskrit poetry.
Now, almost all modern-day computers (except quantum computers that follow the Qubit system) rely on Binary bits, which represent the on (1
) and off (0
) state of a switch.
Binary To Decimal Number Conversion
To convert a Decimal number to the Binary number system, we use the Base(2) notation as follows:
1
(binary) = 1
x 2 = 1
(decimal)
1010
(binary) = 1
x 23 + 0
x 22 + 1
x 21 + 0
x 2 = 10
(decimal)
Decimal To Binary Number Conversion
To convert a decimal number N
to binary, the steps are as follows:
Divide the number N
by 2 and store the remainder in a list.
Keep doing this till you reach 0
At this point reserve your list of remainders, which will be the resultant binary number
For example, 88 can be converted to binary as:
88/2 = 44
(remainder = 0
)
44/2 = 22
(remainder = 0
)
22/2 = 11
(remainder = 0
)
11/2 = 5
(remainder = 1
)
5/2 = 2
(remainder = 1
)
2/2 = 1
(remainder = 0
)
1/2 = 0
(remainder = 1
)
The remainder can be written as 0001101
.
To get the result, change the least significant bit to the most significant bit i.e. 1011000
Addition
When adding binary numbers, the result is the largest number out of the two added.
0
+ 0
= 0
1
+ 0
= 1
0
+ 1
= 1
In the case of 1
+ 1
, the result is 0
and we carry over 1
.
The 1's complement of a binary number is calculated by substitution of all its ones with zeros and vice versa.
For example, the 1s complement of 10101
is 01010
.
1s complement represents the bitwise NOT
logical operation.
2s Compliment of Binary Numbers
The 2s complement of a binary number is calculated by calculating the 1s complement and adding 1 to the least significant bit (LSB) of the given result.
For example:
The 2s complement of 1010
is:
Step 1: Calculate the 1s complement i.e. 0101
Step 2: Add 1 to the LSB i.e. 0101
+ 0001
= 0110
2s complements are used to represent signed integers.
Subtraction
The following rules apply when subtracting two binary numbers:
0
C0
= 0
1
C0
= 1
1
C1
= 0
If the case of 0
C1
the result is 1
and we borrow 1
from the next higher order number.
For example:
1111
1010
= 0101
Multiplication
The following rules apply when multiplying two binary numbers:
0
x 0
= 0
0
1
= 0
1
0
= 0
1
1
= 1
Heres a short video from Khan Academy on YouTube that better explains multiplication than just reading it.
Division
Binary division again is very tough to understand just by reading.
Check out this great video by Neso Academy on YouTube that explains it best.
23. Hexadecimal Numbers
There are numbers that follow the Base(16) number system.
These are represented by the set below:
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F}
Hexadecimal numbers are commonly used:
To represent memory addresses
To represent Media Access Control (MAC) addresses
To define colors
Conversion of Hexadecimal To Decimal
To convert a Hexadecimal number to a Decimal, we use the Base(16) notation as follows:
1
(hexadecimal) = 1
x 16= 1
(decimal)
1ED2
(hexadecimal) = 1
x 163+ E
x 162+ D
x 161+ 2
x 16= 7890
(decimal)
Conversion of Hexademical To Binary
The steps to perform this are as follows:
Split the hexadecimal number into single values
Convert each value to a decimal number
Convert the decimal number to a binary number with 4 bits
Combine them all together
For example, 1ED2
is binary is:
1
-> 1
(decimal) -> 0001
(binary)
E
-> 14
(decimal) -> 1110
(binary)
D
-> 13
(decimal) -> 1101
(binary)
2
-> 2
(decimal) -> 0010
(binary)
Combine them all to get the result: 0001111011010010
or 1111011010010
(removing 0
s at the beginning)
24. Endianness
It is the order or sequence of bytes used when storing data in computer memory.
Big-endian System
Here, the most significant byte of multi-byte data is stored at the smallest memory address and the least significant byte at the largest.
Before transferring data on a network, data is first converted to big-endian.
Big-endian Memory Storage
Little-endian System
In this system, the least-significant byte is stored at the smallest memory address and vice versa.
Intel-based processors are little-endians.
Little-endian Memory Storage
Note that Bi-endian processors can run in both little and big endian modes.
25. Boolean Logic
Developed by George Boole, it is a systematic way of encoding Boolean values ( True
and False
) into variables and performing operations on them (like in algebra).
Boolean variables can be combined using Logic Gates.
Logic Gates perform a logical operation on one or more binary inputs and produce a single binary output.
In a computer, logic gates are implemented using diodes and transistors.
Such logic gates can be combined to form Logic circuits in devices such as multiplexers, registers, arithmetic logic units (ALUs), computer memory, and microprocessors.
0 and 1
in a Truth table represent the off and on states respectively.
The commonly used logic gates are as follows:
NOT Gate/ Inverter
It negates/ inverts the input passed to it.
It is a gate that requires only one input.
The NOT
logical operation is represented by the negation operator (
).
NOT gate
The NOT
gate truth table is as follows:
NOT gate Truth table AND Gate
This gate performs the logical conjunction operation.
This operation is represented by
.
AND Gate
The AND
gate truth table is as follows:
AND gate Truth Table OR Gate
This gate performs the logical disjunction operation.
This operation is represented by
.
OR Gate Truth Table
The OR
gate truth table is as follows:
OR gate Truth Table NAND Gate
It is the complement of the AND
gate.
NAND Gate
NOR Gate
It is the complement of the OR
gate.
NOR Gate
XOR Gate
Also called the Exclusive OR gate, it represents the inequality function, (the output is True
if the inputs are not equal and vice versa).
XOR Gate
XNOR Gate
It is the complement of the XOR
gate.
XNOR Gate
The truth tables for all the gates described above can be summarized as:
Truth table of Logic Gates
26. De Morgans Laws
Named after Augustus De Morgan, these laws are as follows:
Law 1: Not (A and B) is the same as Not A or Not B
Law 2: Not (A or B) is the same as Not A and Not BIn the above laws,
.
represents the logical AND
operation
+
represents the logical OR
operation
overbar represents the NOT
operation
means identical to
In view of sets, De Morgans laws can be represented using Venn Diagrams as follows:
De Morgans laws in view of Sets represented with Venn diagrams
27. Eulers Theorem & Graph Theory
The Graph theory originated in the 18th century with an interesting story.
Knigsberg was a city in historic Prussia (modern-day Russia) with 7 bridges that crossed river Pregel.
The map of Knigsberg
It was asked:Is it possible to walk around Knigsberg crossing each bridge exactly once?
Note that it did not matter that we finished exactly where we started.
We will come back to this after learning a few terminologies.
Graph
A Graph is a mathematical structure that consists of:
vertices (also called nodes or points) (V) which are connected by
edges (also called links or lines) (E)
It is represented as:
G = (V, E)
Degree
The number of edges of a particular vertex is called its Degree.
A graph with 6 vertices and 7 edges
Graphs are commonly used in computer science to describe relationships between different objects.
For example, Facebook as a graph represents different people (vertices) and their relationships (edges).
Similarly, Wikipedia editors (edges) contributing to different Wikipedia language versions (vertices) during one month in summer, can be described as a graph as below.
Diagramatic representation of River Pregel in Knigsberg
Euler first represented the above diagram using a Graph as below.
He described each land mass using a vertex or node, and each bridge with an edge.
Graph representation of River Pregel in Knigsberg
Euler came up with a theorem that states that:The bridges of a city can be transversed exactly once if apart from at most two, all points have an even degree.
Looking at the graph representing Knigsberg, each vertex has an odd degree and hence it is not possible to walk around the city crossing each bridge exactly once.
This theorem gave rise to the modern graph theory i.e. the study of graphs.
28. Graphs & Types
Directed Graph/ Digraph
A graph in which edges have orientations.
This means that an edge can only be transversed in one direction.
For example, a graph representing a Medium newsletter and its subscribers.
Undirected Graph
A graph in which edges do not have orientations.
This means that an edge can be transversed bidirectionally.
For example, a graph representing the relationship between friends on Facebook.
Cycle
A cycle is a graph with some of its vertices (at least 3) connected in a closed chain.
Cyclic Graph
It is a graph with at least one cycle.
A Directed Cyclic Graph
Acyclic Graph
It is a graph with no cycles.
Connected Graph
It is a graph with an edge from any vertex to another.
It can be:
Strongly connected: if there any bi-directional edge connections between all vertices
Weakly connected: if there are no bi-directional connections between all vertices
Disconnected Graph
A graph with no connected vertices is called to be disconnected.
29. Graph Transversal Algorithms
Depth-first Search
Depth-first search (DFS) is an algorithm for searching graph data structures.
The algorithm starts at the root node and explores as far as possible along each branch before going back to the start.
Depth-first Search can be defined in Python as below.
Here we define a Node
class where its constructor defines its children
(connected vertices) and name
.
The addChild
method adds new children to a node.
The depthFirstSeach
method recursively implements the Depth First Search algorithm.
Breadth-first Search
Breadth-first search (BFS) is another algorithm for searching a graph data structure.
It starts at the root node and explores all nodes at a present branch prior to moving on to the nodes in the other branches.
The algorithm can be defined below using the breadthFirstSearch
method of the Node
class.
Portrait of Euclid
30. Benfords Law
It is an observation followed by many real-life numerical datasets.
In datasets that follow the law, the leading digits are more likely to be small than large.
For example, the number 1
appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time.
Benfords law applies to a surprisingly large number of datasets including:
stock prices
population numbers
death rates
physical and mathematical constants etc.
Benfords law: The height of the bar is the percentage of numbers that start with that digit
Benfords Law for numbers expressed in base 10, can be mathematically represented as :
where,
P(d)
= Probability of occurrence of a leading digit (d
)
This law is commonly used to detect fake/ randomly generated datasets (e.g. randomly generated population datasets) as these datasets will not follow Benfords law.
31. Zipfs Law
It states that the rank-frequency distribution is an inverse relation.
Applied to natural language, it states the frequency of any word is inversely proportional to its rank in the frequency table.
For example, the most frequent word will occur approximately twice as often as the second most frequent word, and three times as often as the third most frequent word.
The n
th most common word will occur with a probability proportional to 1/n
.
Similarly, the law holds true for other datasets such as:
mathematical expressions
ranks of notes in music
income rankings
ranks of the number of people watching the same TV channel
A great video describing Zipfs law can be found below:
32. Pareto Principle
This principle states that for many real-world outcomes, roughly 80%
of consequences come from 20%
of causes.
When raising funds, 20% of the donors contribute 80% of the total
Observations following the Pareto principle can be represented by Pareto distributions.
Pareto distribution - Wikipedia
The Pareto distribution, named after the Italian civil engineer, economist, and sociologist Vilfredo Pareto (
en.wikipedia.org
Surprisingly, the Pareto principle holds true in computer science as well.
Microsoft noted that by fixing the top 20% of the most-reported bugs, 80% of the related errors and crashes in a given system would be eliminated.
33. Prices Law
It states that 50% of any given result is generated by the square root of the number of those who contribute to it.
For example, 50% of all publications on a given subject are published by the square root of all authors.
Or, in a company with 100 employees, 10 will produce 50% of all the output.
![]()
Portrait of Indian Mathematician Srinivasa Ramanujan
34. Exponentiation
It is a mathematical operation of raising a number b
, called the base, to another number n
, called the exponent or power.
When n
is a positive number in b^n
, it is similar to multiplying n
factors of b
.
When n
is 0
, b^n
is 1
.
When n
is negative, it denotes an inverse i.e. b^-n = 1 / b^n
.
When n
is a fractional number, it denotes a root i.e. b^1/n = nb
Image from Wikipedia Exponent Rules
The following identities apply to exponentiation.
37. Infinite Series
It is an ordered sum of infinite terms represented as follows:
It can also be represented using the \n (sigma) symbol, where the:
subscript represents the lower limit of the series
superscript represents the upper limit of the series
Convergent Series
It is a series whose partial sums (the sum of the first nth terms) tend to a specific number (limit).
A Convergent Series with a limit of 1In the above, the sum of the first:
two terms is: 1/2
three terms is: 3/4
four terms is: 7/8
One can see that these sums are approaching 1
(i.e. the limit).
Divergent Series
It is a series whose partial sums do tend to a specific number and it has a limit of
or -
.
Here, the nth partial sums do not tend to a finite limit and keep on growing.
Ramanujan Summation
It is a technique by the mathematician Srinivasa Ramanujan for assigning a value to divergent series.
Note that partial sums of these series do not converge to the Ramanujan sum (denoted by R
).
Ramanujan Summation of a divergent series (2)
Ramanujan Summation of a divergent series (3) Geometric Series
It is a series where each successive term is obtained by multiplying the previous term by a constant number (called the common ratio).
Geometric series with a common ratio of 1/2 Harmonic Series
It is an infinite series formed by summing all positive unit fractions.
It is always Divergent.
Also, e
can be represented as:
38. Taylor Series
It is a series that can represent a mathematical function f(x)
at a point a
, using polynomials as follows:
or,
sin (x) and its Taylor approximations by polynomials of degrees 1, 3, 5, 7, 9, 11, and 13 at x = 0
As the degree of the Taylor polynomial rises, it approaches the correct function.
Below-mentioned is a beautiful demonstration of the Taylor series and a highly recommended video.
39. Eulers Formula
This formula forms a link between trigonometry and complex exponential functions.
Eulers identitywhere:
e
is Eulers number
i
is the imaginary unit
is the ratio of the circumference of a circle to its diameter
40. Quantum Algorithms
Classical computers understand instructions represented in the form of binary bits ( 0
and 1
).
On the other hand, quantum computers work on instructions represented by Qubits.
A Qubit represents 0
and 1
at the same time (called Superposition).
A Quantum algorithm is a finite sequence of instructions for solving a problem using a Quantum computer (machines that follow principles of Quantum Superposition & Entanglement).
It is a quantum algorithm that is used to find the prime factors of an integer (N
) in Polylogarithmic time (O((log(N))c) where c is a constant
).
Note that there is no classical computer algorithm known that can factor integers in polynomial time.
Shors algorithm can break public-key cryptography (e.g. the RSA scheme) but if this happens, this can be replaced by quantum-resistant cryptographic techniques.
Grovers algorithm
It is a quantum algorithm used to search unstructured data in O(N)
time.
Note that classical computer algorithms usually perform search operations in O(N)
time.
41. P vs. NP Problem
A problem that can be solved in the polynomial time complexity is called to be a problem of type/class P
or polynomial time.
On the other hand, a problem whose solution can be verified in polynomial time is of the type/class NP
or nondeterministic polynomial time.
It remains one of the largest unsolved problems in mathematics and computer science whether P = NP
.
In other words:
If the solution to a problem is easy to verify, must the problem be easy to solve?
For example, Sudoku is in NP
(quickly checkable) but does not seem to be in P
(quickly solvable).
Similarly finding Proof of Work for a blockchain network, is an NP
type problem.
If you are new to the concept of Proof of Work, check it out here:
100 Essential Web-3 Concepts That One Should Know About (Part 2: 11C20)
Your guide to demystifying Web-3
levelup.gitconnected.com
NP-Complete Problems
These are problems that do not have a polynomial-time algorithm to solve them to date nor is proof available that no polynomial-time algorithm exists to solve them.
Interestingly, one NP
-complete problem can simulate every other i.e. if any one of the NP
-complete problems can be solved in polynomial time, then all of them could be.
To prove that a problem is NP
-complete:
first, prove that it is in NP
,
then to reduce a known NP
-complete problem to it
Some common NP
-complete problems are:
Boolean satisfiability problem (SAT)
Knapsack problem
Hamiltonian path problem
Travelling salesman problem
Subgraph isomorphism problem
Subset sum problem
Clique problem
Vertex cover problem
Independent set problem
Dominating set problem
Graph coloring problem
42. Optimization
It is a technique for finding the parameters that maximize or minimize the value of a mathematical function.
The Diet Problem
It is a real-world optimization problem proposed by George Stigler that states:
What is the cheapest combination of 77 available foods to feed a moderately active man on a daily basis so that his recommended dietary intake of 9 key nutrients is satisfied?
This is a problem to which Linear Optimization can be applied.
In other words, the problem can be represented using linear functions.
According to Stiglers calculations, the annual cost of his solution was $39.93
.
Gradient Descent
Shown below is Gradient descent, an optimization algorithm commonly used to train machine learning models and neural networks.
The algorithm aims to minimize the loss function associated with a machine learning model with respect to its parameters, to make it more accurate.
Gradient Descent Optimization Algorithm (Source: https://www.ibm.com/topics/gradient-descent)
Image by Freepik
43. Chaos Theory
It is a branch of mathematics that deals with apparently random or unpredictable behavior in deterministic systems.
Chaotic behavior is commonly seen around us in flowing fluids, weather, road traffic, swinging pendulums, and more.
Founder of the modern chaos theory, Edward Lorenz noticed that minute differences in initial conditions can yield widely diverging outcomes for deterministic systems (systems whose future behavior is fully determined by their initial conditions)
This makes them unpredictable in the long term.
Lorenz noticed this during his weather prediction computations if the initial conditions were entered with 6 decimal places, the results were massively different from when 3 decimal places were used.
Starting the pendulum from a slightly different initial condition results in a vastly different trajectory
Butterfly Effect
It is an important principle of the Chaos theory that states that tiny changes in the initial conditions can lead to drastic changes in the long-term state of a system.
As Lorenz metaphorically stated, a butterfly flapping its wings in Brazil can influence a tornado in Texas.
An attractor a set of states towards which neighboring states eventually approach in their course.
In the example of a swinging pendulum, the point towards which the pendulum eventually goes stationary is the attractor of this dynamic system.
Lorenz described the earths atmosphere with rolling fluid convection using three nonlinear equations.
Lorenz EquationsThe outputs of these equations form a mysterious curve that never crosses itself, never reaches equilibrium, and demonstrates chaos.
This curve is called the Lorenz Attractor.
Solution of Lorenz equations plotted in the form of the Lorenz attractor with = 28, = 10, and = 8/3
44. Knapsack Problem
Lets say that there is a thief who has a knapsack/ backpack that can hold a limited amount of weight.
Which boxes from the following should be chosen to maximize the monetary value of stolen goods while keeping the overall weight under or equal to 15 kg?
Image from WikipediaThis is an optimization problem commonly encountered during resource allocation under constraints.
There is no known algorithm that can give the correct and fastest (polynomial-time) solution to all cases of the problem (The problem is NP-complete).
If we check all possible combinations to find the optimal solution to the problem, this brute-force algorithm takes O(2^n)
time i.e. non-polynomial time.
45. Travelling Salesman Problem
Imagine that there is a salesperson with a list of cities for package delivery.
The problem starts with a question:
What should be the order for them to travel to different cities so that they minimize their total distance?
A Crazy Difficult Algorithmic Problem: Can You Help This Person?
Solving this better will probably get you the Turning Award!
medium.com
The solutions to the problem have been described in this wonderful article by Pedram Ataee, in Towards Data Science:
How to Solve Traveling Salesman Problem A Comparative Analysis
Through 3 optimization methods: dynamic programming, simulated annealing, and 2-opt
towardsdatascience.com
A new solution to this problem was recently published and is well explained in the article below:
Computer Scientists Break Traveling Salesperson Record
"This is a result I have wanted all my career," said David Williamson of Cornell University, who has been studying the
www.quantamagazine.org
The problem has also been applied in other areas such as astronomy & DNA synthesis.
Mathematicians have found the shortest route to visit 2 million stars
We have found the best path to take between the stars. The travelling salesman problem, an infamous mathematical puzzle
www.newscientist.com
Traveling Salesman Applied To DNA Synthesis
One of the fun things about being a programmer is that algorithms that you know can often be applied in unexpected ways
www.i-programmer.info
46. Transcendental Numbers
These are numbers that are not the root of any integer polynomial.
(In contrary to 2
that can be written as 2 = 1.4142..
, where2
is a rational number and2
is an irrational number)
They are also called non-algebraic numbers.
Some important transcendental numbers are:
(Pi)
e
(Eulers number)
sin(x)
cos(x)
tan(x)
ln(x)
In the above examples, x
is algebraic and non-zero.
47. Picks Theorem
It is a theorem in Discrete Geometry by Georg Pick that is used to calculate the area of an arbitrarily shaped polygon.
To calculate the area, lay the polygon on a grid of points where every point is separated by a unit length.
Next, count the points on the interior (i
) and on the boundary of the polygon (including both vertices and points along the sides) (b
).
The area of the polygon is given by:
Picks theoremIn the example below, i = 7
and b = 8
.
Polygon on a grid with internal points in red and boundary points in green
Hence, its area is calculated as:
A = i + b/2 1 = 7 + 8/2 1 = 10
Picks theorem is popularly used in the field of Computer Graphics.
48. Imaginary & Complex Numbers
Imaginary numbers are real numbers multiplied by i
(the imaginary unit or iota).
i2 = 1
or i = -1
Complex numbers (z
) are a combination of real and imaginary numbers represented as:
z = a + i * b
wherea
,b
are real numbers and i
is the imaginary unit.
49. Argand Diagram
It is a diagram that represents a complex number on a 2D plane.
Its vertical axis corresponds to the imaginary part and the horizontal axis corresponds to the real part of the complex number.
It is also called the Complex plane.
The Complex Plane (The distance from the origin to the point z is the modulus or absolute value of z & angle is the argument of z)
50. Quaternions
Irish mathematician William Rowan Hamilton extended the Argand Diagram to add two more imaginary components and represented the complex numbers as follows:
z = a + bi + cj + dk
where a
, b
, c
, and d
are real numbers and i
, j
, k
are different imaginary units/ basis vectors.
i2 = j2 = k2 = ijk = -1
These 4D complex numbers are called Quaternions.
Multiplication of quaternion units represents 90 rotations in the planes of 4-dimensional space.
Multiplication of quaternion units
Complex numbers are extensively used in Quantum Mechanics, a branch of Physics essential in building Quantum computers.
IBM Quantum System One Quantum Computer