# Uncomment the code below and run it to enjoy.
# Comment it back out and run the cell again before exiting,
# as it might otherwise cause the notebook to crash
# the next time you open it.
# from IPython.display import HTML
# HTML("""
# <br/>
# <center><font color="red" size="16px">Messi. Messi. Messi.</font></center>
# <center>
# <video controls width="620" height="440" src="imgs/" type="video/mp4">
# </video>
# </center>
# <br/>
# """)
Spectral-response functions of each of the three types of cones
Most spectral colors can be represented as a positive linear combination of these primary colors (but...)
But some spectral colors cannot; matching them requires adding some red
Green light triggers the green cone more than the red cone, and red light triggers the red cone more than the green cone; when the two cone responses are balanced, human vision can't tell the difference
Color matching experiments [Wright & Guild, 1920s]
A new space with desired properties
$$x = \frac{X}{X+Y+Z}$$
$$y = \frac{Y}{X+Y+Z}$$
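The two normalizations above can be sketched in a few lines. The function name `xyz_to_xy` is made up for this illustration; it only assumes that `X`, `Y`, `Z` are tristimulus values whose sum is non-zero.

```python
def xyz_to_xy(X, Y, Z):
    """Project XYZ tristimulus values onto the (x, y) chromaticity plane."""
    total = X + Y + Z
    if total == 0:
        raise ValueError("X + Y + Z must be non-zero")
    return X / total, Y / total


# Scaling X, Y and Z by the same factor leaves (x, y) unchanged,
# so chromaticity is independent of overall brightness.
print(xyz_to_xy(0.25, 0.5, 0.25))
print(xyz_to_xy(1.0, 2.0, 1.0))
```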
Think of chroma (here a,b) defining a planar disc at each luminance level (L)
If hue values range in [0, 360], what is the absolute difference between the following pairs of hues?
225 and 75: 150
45 and 315: 90
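Because hue is an angle, the difference has to wrap around the circle. A minimal sketch, with `hue_difference` as an illustrative name and hues assumed to be given in degrees:

```python
def hue_difference(h1, h2, period=360):
    """Smallest angular distance between two hues on a circle."""
    d = abs(h1 - h2) % period
    return min(d, period - d)


print(hue_difference(225, 75))   # 150
print(hue_difference(45, 315))   # 90 (going through 0, not 270)
```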
"Picture element" at location $\color{blue}{(x,y)}$, value or color $\color{blue}{c}$
What does this view enable us to do?
%matplotlib inline
import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
from IPython.display import clear_output, Image as NoteImage, display
import PIL.Image
from io import BytesIO
def imshow(im, fmt='jpeg'):
    # Encode the array in memory and show it inline in the notebook
    f = BytesIO()
    PIL.Image.fromarray(im).save(f, fmt)
    display(NoteImage(data=f.getvalue()))
def imsave(im, filename, fmt='jpeg'):
    # Write the array to disk in the given format
    PIL.Image.fromarray(im).save(filename, fmt)
def imread(filename):
    # OpenCV loads images as BGR; convert to RGB for display
    img = cv.imread(filename)
    img = cv.cvtColor(img, cv.COLOR_BGR2RGB)
    return img
def rgb_to_color(rgb):
    # Format an (R, G, B) triple as a '#rrggbb' hex string
    return "#{:02x}{:02x}{:02x}".format(int(rgb[0]), int(rgb[1]), int(rgb[2]))
def img_to_colors(img):
    return [rgb_to_color(i) for i in img]
def plot_in_rgb_space(img):
fig, ax = plt.subplots(1, 1, subplot_kw={'projection':'3d', 'aspect':'equal'})
c = img_to_colors(img.reshape(-1,3))
img = imread("imgs/L938.png")
def cfilter(img, r, g, b):
    # Paint white every pixel whose R, G or B value exceeds its threshold
    img2 = img.copy()
    img2[(img2[:,:,0] > r) | (img2[:,:,1] > g) | (img2[:,:,2] > b)] = [255,255,255]
    return img2
Define intensity ($\color{blue}{Y}$) as some combination of $\color{blue}{R}$,$\color{blue}{G}$,$\color{blue}{B}$
$$\color{blue}{Y = W_R\times R + W_G \times G + W_B \times B}$$
$$\color{blue}{= 0.299\times R + 0.587 \times G + 0.114 \times B}$$
Then compute new color values, taking out intensity
$$\color{blue}{U = U_{max}\frac{B-Y}{1-W_B} \approx 0.492 \times (B-Y)}$$
$$\color{blue}{V = V_{max}\frac{R-Y}{1-W_R} \approx 0.877 \times (R-Y)}$$
Assuming $\color{blue}{R}$,$\color{blue}{G}$,$\color{blue}{B}$ and $\color{blue}{Y}$ are in the range $\color{blue}{[0,1]}$
$$\color{blue}{U \in [-U_{max}, U_{max}]}\color{black}{\text{ and }}\color{blue}{V \in [-V_{max}, V_{max}]}$$
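A small NumPy sketch of the conversion above. The weights come from the formulas; the values `U_MAX = 0.436` and `V_MAX = 0.615` are assumptions (commonly quoted constants that are consistent with the approximations 0.492 and 0.877), and `rgb_to_yuv` is an illustrative name.

```python
import numpy as np

W_R, W_G, W_B = 0.299, 0.587, 0.114
U_MAX, V_MAX = 0.436, 0.615  # assumed values, not stated in the notes


def rgb_to_yuv(rgb):
    """Convert RGB values in [0, 1] (last axis = channels) to YUV."""
    rgb = np.asarray(rgb, dtype=np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    y = W_R * r + W_G * g + W_B * b      # intensity
    u = U_MAX * (b - y) / (1.0 - W_B)    # blue minus intensity
    v = V_MAX * (r - y) / (1.0 - W_R)    # red minus intensity
    return np.stack([y, u, v], axis=-1)


# White has full intensity and no chroma; pure blue and pure red reach the
# positive ends of the U and V ranges.
print(rgb_to_yuv([1.0, 1.0, 1.0]))
print(rgb_to_yuv([0.0, 0.0, 1.0]))
print(rgb_to_yuv([1.0, 0.0, 0.0]))
```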
import cv2
import numpy as np
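def make_lut_u():
    # 1x256 lookup table mapping a U value to a color
    return np.array([[[i,255-i,0] for i in range(256)]],dtype=np.uint8)
def make_lut_v():
    # 1x256 lookup table mapping a V value to a color
    return np.array([[[0,255-i,i] for i in range(256)]],dtype=np.uint8)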
img_yuv = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)  # img comes from imread(), which returns RGB
y, u, v = cv2.split(img_yuv)
lut_u, lut_v = make_lut_u(), make_lut_v()
# Convert back to BGR so we can apply the LUT and stack the images
y = cv2.cvtColor(y, cv2.COLOR_GRAY2BGR)
u = cv2.cvtColor(u, cv2.COLOR_GRAY2BGR)
v = cv2.cvtColor(v, cv2.COLOR_GRAY2BGR)
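def yuvfilter_cv(yuvimg, ymin, ymax, umin, umax, vmin, vmax):
    # Set to [255,255,255] every pixel whose Y, U or V value falls outside its [min, max] range
    img2 = yuvimg.copy()
    outside = ((img2[:,:,0] < ymin) | (img2[:,:,0] > ymax) |
               (img2[:,:,1] < umin) | (img2[:,:,1] > umax) |
               (img2[:,:,2] < vmin) | (img2[:,:,2] > vmax))
    img2[outside] = [255,255,255]
    return img2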
print("RGB Color Filter")
yuv_img = cv2.cvtColor(img, cv2.COLOR_RGB2YUV)  # img comes from imread(), which returns RGB
print("YUV Filter")
Focus on HS projection
Colors spread along a single dimension! Hue
Treat hue as an angle
$$\color{blue}{SSD = \sum_{cluster\, C_i}\sum_{p \in C_i}||p - c_i||^2}$$
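To make the SSD objective concrete, here is a minimal NumPy sketch of the usual alternating scheme (assign each point to its nearest center, then move each center to the mean of its points). It is an illustration under simple assumptions (random initial centers picked from the data, a fixed number of iterations), not the OpenCV implementation; the names `assign`, `ssd` and `kmeans_np` are made up here.

```python
import numpy as np


def assign(points, centers):
    """Index of the nearest center for every point."""
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def ssd(points, centers, labels):
    """Sum of squared distances between each point and its cluster center."""
    return float(np.sum((points - centers[labels]) ** 2))


def kmeans_np(points, k, iters=20, seed=0):
    points = np.asarray(points, dtype=np.float64)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(iters):
        labels = assign(points, centers)
        for i in range(k):
            members = points[labels == i]
            if len(members) > 0:          # keep the old center if a cluster is empty
                centers[i] = members.mean(axis=0)
    return centers, assign(points, centers)


pts = np.array([[0, 0], [0, 1], [10, 10], [10, 11]], dtype=np.float64)
centers, labels = kmeans_np(pts, k=2)
print(labels, ssd(pts, centers, labels))
```

Neither step can increase the SSD, which is why the iteration settles, although possibly in a local minimum.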
import cv2
import numpy as np
def kmeans(image, segments):
    """Cluster the pixels of a color image into `segments` groups with cv2.kmeans.

    cv2.kmeans parameters:
    1. samples : It should be of np.float32 data type, and each feature should be put in a single column.
    2. nclusters(K) : Number of clusters required at the end.
    3. criteria : The iteration termination criteria. When these criteria are satisfied,
       the algorithm stops iterating. It should be a tuple of 3 parameters,
       ( type, max_iter, epsilon ):
       3.a - type of termination criteria : It has 3 flags as below:
             cv2.TERM_CRITERIA_EPS - stop the algorithm iteration if the specified accuracy, epsilon, is reached.
             cv2.TERM_CRITERIA_MAX_ITER - stop the algorithm after the specified number of iterations, max_iter.
             cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER - stop the iteration when either of the above conditions is met.
       3.b - max_iter - An integer specifying the maximum number of iterations.
       3.c - epsilon - Required accuracy.
    4. attempts : The number of times the algorithm is executed using different initial labellings.
       The algorithm returns the labels that yield the best compactness.
       This compactness is returned as output.
    5. flags : Specifies how the initial centers are taken. Normally two flags are
       used for this : cv2.KMEANS_PP_CENTERS and cv2.KMEANS_RANDOM_CENTERS.
    """
    # One row per pixel, one column per channel, as float32
    samples = np.float32(image.reshape((-1, 3)))
    criteria=(cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 10, 1.0)
    # attempts=10 and KMEANS_RANDOM_CENTERS are assumed values
    _, label, center = cv2.kmeans(samples, segments, None, criteria, 10, cv2.KMEANS_RANDOM_CENTERS)
    # Replace every pixel by the center of its cluster
    res = center[label.flatten()]
    segmented_image = res.reshape((image.shape))
    return label.reshape((image.shape[0],image.shape[1])),segmented_image.astype(np.uint8)
def extractComponent(image,label_image,label):
return component
def segment(image,segments=3):
label,result= kmeans(image,segments=segments)
messi = imread("imgs/messi_liverpool.jpg")
for i in range(2,20):
Depending on what we choose as the feature space, we can group pixels in different ways
Can be thought of as quantization of the feature space; segmentation label map
Grouping pixels based on color similarity.
K-means clustering based on intensity or color is essentially vector quantization of the image attributes
Grouping pixels based on intensity+position similarity
Can combine color and location...
The mean shift algorithm seeks modes or local maxima of density in the feature space
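The mode-seeking idea can be sketched with a single trajectory: start somewhere in feature space and repeatedly move to the weighted mean of the nearby points until the position stops changing. This is a minimal illustration with a Gaussian kernel (an assumption; other kernels are common), and `mean_shift_mode` and `bandwidth` are names chosen for the sketch, unrelated to the `pymeanshift` API.

```python
import numpy as np


def mean_shift_mode(points, start, bandwidth, iters=100, tol=1e-6):
    """Follow the mean shift vector from `start` to a local density maximum."""
    points = np.asarray(points, dtype=np.float64)
    x = np.asarray(start, dtype=np.float64)
    for _ in range(iters):
        # Gaussian weight of every sample relative to the current position
        w = np.exp(-0.5 * np.sum((points - x) ** 2, axis=1) / bandwidth ** 2)
        new_x = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


samples = np.array([[-0.1], [0.0], [0.1], [9.9], [10.0], [10.1]])
print(mean_shift_mode(samples, start=[1.0], bandwidth=1.0))  # ends near 0
print(mean_shift_mode(samples, start=[9.0], bandwidth=1.0))  # ends near 10
```

Points whose trajectories end at the same mode are grouped into the same segment.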
## You need to install pymeanshift from
import cv2
import pymeanshift as pms
original_image = imread("imgs/peppers.jpg")
segmented_image, labels_image, number_regions = pms.segment(original_image, spatial_radius=6,
range_radius=4.5, min_density=50)
print("Number of segments %d" % number_regions)
messi = imread("imgs/messi_liverpool.jpg")
segmented_messi, labels_image, number_regions = pms.segment(messi, spatial_radius=6,
range_radius=4.5, min_density=50)
print("Number of segments %d" % number_regions)
Fully-connected graph
$$\color{blue}{\text{aff}(x_i,x_j) = exp \left ( -\frac{1}{2\sigma^2}dist(x_i,x_j)^2 \right ) }$$
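A minimal NumPy sketch of the affinity above for a fully-connected graph, assuming `dist` is the Euclidean distance between feature vectors (the notes do not fix the distance); `affinity_matrix` is an illustrative name.

```python
import numpy as np


def affinity_matrix(features, sigma):
    """Pairwise Gaussian affinities exp(-dist^2 / (2 sigma^2))."""
    features = np.asarray(features, dtype=np.float64)
    diff = features[:, None, :] - features[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))


# Three 1-D features: two close together and one far away
W = affinity_matrix([[0.0], [1.0], [5.0]], sigma=1.0)
print(np.round(W, 4))
```

A small `sigma` groups only very similar points; a large `sigma` makes distant points look similar as well.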
Break Graph into segments
$$\color{blue}{cut(A,B) = \sum_{p\in A,q\in B}w_{pq}}$$
Set of edges whose removal makes a graph disconnected
Cost of a cut: Sum of weights of cut edges
A graph cut gives us a segmentation
Find minimum cut
Problem with min cut:
Fix bias of min cut by normalizing for size of segments:
$$\color{blue}{Ncut(A,B) = \frac{cut(A,B)}{assoc(A,V)} + \frac{cut(A,B)}{assoc(B,V)}}$$
$\color{blue}{assoc(A,V)}$ = sum of weights of all edges that touch A
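The effect of the normalization can be checked on a toy graph. The sketch below assumes a symmetric weight matrix `W` with zero diagonal and a boolean mask `in_A` marking the nodes of A; `assoc` is computed as the sum of the rows of `W` belonging to A. All function names are illustrative.

```python
import numpy as np


def cut_cost(W, in_A):
    """Sum of the weights of the edges going from A to B."""
    in_A = np.asarray(in_A, dtype=bool)
    return float(W[np.ix_(in_A, ~in_A)].sum())


def assoc(W, in_A):
    """Total weight of the connections from the nodes in A to all nodes."""
    in_A = np.asarray(in_A, dtype=bool)
    return float(W[in_A, :].sum())


def ncut(W, in_A):
    in_A = np.asarray(in_A, dtype=bool)
    c = cut_cost(W, in_A)
    return c / assoc(W, in_A) + c / assoc(W, ~in_A)


# Chain 0 - 1 - 2 - 3 with a weak link between nodes 1 and 2
W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
print(ncut(W, [True, True, False, False]))   # balanced split across the weak link
print(ncut(W, [True, False, False, False]))  # isolating a single node
```

Isolating a single node gives a much larger Ncut value than the balanced split, which is the bias correction the normalization is meant to provide.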
Approximate solution for minimizing the Ncut value: Generalized eigenvalue problem
$$\color{blue}{D(i,i) = \sum_jW(i,j)}$$
Where $\color{blue}{y}$ is an indicator vector with 1 in the $i^{th}$ position if the $i^{th}$ feature point belongs to A, and a negative constant otherwise
Represent the image as a weighted graph $\color{blue}{G = (V,E)}$, compute weight of each edge and summarize in $\color{blue}{D}$ and $\color{blue}{W}$
Solve $\color{blue}{(D-W)y = \lambda Dy}$ for the eigenvector with the second smallest eigenvalue
Use the entries of the eigenvector to bipartition the graph
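The three steps can be sketched with NumPy alone. The generalized problem is rewritten as an ordinary symmetric one through the substitution $z = D^{1/2}y$, which gives $D^{-1/2}(D-W)D^{-1/2}z = \lambda z$. Thresholding the eigenvector at zero is one simple choice among several, and `ncut_bipartition` is an illustrative name.

```python
import numpy as np


def ncut_bipartition(W):
    """Split a weighted graph in two using the second smallest generalized eigenvector."""
    W = np.asarray(W, dtype=np.float64)
    d = W.sum(axis=1)                      # D(i,i) = sum_j W(i,j)
    d_inv_sqrt = 1.0 / np.sqrt(d)          # assumes every node has positive degree
    L_sym = np.diag(d_inv_sqrt) @ (np.diag(d) - W) @ np.diag(d_inv_sqrt)
    eigvals, eigvecs = np.linalg.eigh(L_sym)   # eigenvalues in ascending order
    y = d_inv_sqrt * eigvecs[:, 1]         # back-substitute y = D^{-1/2} z
    return y > 0, y, eigvals[1]


W = np.array([[0.0, 1.0, 0.0, 0.0],
              [1.0, 0.0, 0.1, 0.0],
              [0.0, 0.1, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])
in_A, y, lam = ncut_bipartition(W)
print(in_A, lam)
```

On this chain the split falls across the weak link between nodes 1 and 2; which side is called A depends on the arbitrary sign of the eigenvector.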
The GrabCut algorithm was designed by Carsten Rother, Vladimir Kolmogorov and Andrew Blake of Microsoft Research Cambridge, UK, in their paper *"GrabCut": interactive foreground extraction using iterated graph cuts*. An algorithm was needed for foreground extraction with minimal user interaction, and the result was GrabCut.
import numpy as np
import cv2
from matplotlib import pyplot as plt
img = imread('imgs/messi5.jpg')
mask = np.zeros(img.shape[:2],np.uint8)
bgdModel = np.zeros((1,65),np.float64)
fgdModel = np.zeros((1,65),np.float64)
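rect = (50,50,450,290)
# Run GrabCut initialized from the rectangle (5 iterations); updates mask in place
cv2.grabCut(img,mask,rect,bgdModel,fgdModel,5,cv2.GC_INIT_WITH_RECT)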
mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask2[:,:,np.newaxis]
img = imread('imgs/messi5.jpg')
# newmask is the mask image I manually labelled
newmask = imread('grabcut_mask.png')[:,:,0]
# wherever it is marked white (sure foreground), change mask=1
# wherever it is marked black (sure background), change mask=0
mask[newmask == 0] = 0
mask[newmask == 255] = 1
mask, bgdModel, fgdModel = cv2.grabCut(img,mask,None,bgdModel,fgdModel,5,cv2.GC_INIT_WITH_MASK)
mask2 = np.where((mask==2)|(mask==0),0,1).astype('uint8')
img = img*mask2[:,:,np.newaxis]
Sad Messi after the second goal against Getafe. I have never read a story or watched a movie where the hero promises something and doesn't deliver it in the end. Apparently reality doesn't make sense, and fiction is the correct adjustment of its logic.
How can we use a histogram to separate an image into 2 (or several) different regions?
Assumption: The histogram is bimodal
Method: Find the threshold $\color{blue}{t}$ that minimizes the weighted sum of within-group variances for the two groups that result from separating the gray tones at value $\color{blue}{t}$
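The method can be sketched as an exhaustive search over all 256 candidate thresholds. This is a plain NumPy illustration for 8-bit images, with the assumed convention that pixels below `t` form one group and pixels at or above `t` the other; `otsu_threshold` is an illustrative name.

```python
import numpy as np


def otsu_threshold(gray):
    """Threshold t minimizing the weighted sum of within-group variances."""
    gray = np.asarray(gray).astype(np.uint8)
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256, dtype=np.float64)
    best_t, best_within = 0, np.inf
    for t in range(1, 256):
        w0, w1 = prob[:t].sum(), prob[t:].sum()   # group weights
        if w0 == 0 or w1 == 0:
            continue                               # one group would be empty
        mu0 = (levels[:t] * prob[:t]).sum() / w0
        mu1 = (levels[t:] * prob[t:]).sum() / w1
        var0 = ((levels[:t] - mu0) ** 2 * prob[:t]).sum() / w0
        var1 = ((levels[t:] - mu1) ** 2 * prob[t:]).sum() / w1
        within = w0 * var0 + w1 * var1
        if within < best_within:
            best_t, best_within = t, within
    return best_t


# Bimodal toy image: dark tones around 50, bright tones around 200
toy = np.array([[40, 50, 60, 190, 200, 210]] * 4)
t = otsu_threshold(toy)
print(t, (toy >= t).astype(np.uint8))
```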
## from:
## it takes some time
import cv2
import numpy as np
img = cv2.imread('imgs/eGaIy.jpg', 0)
img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)[1] # ensure binary
ret, labels = cv2.connectedComponents(img)
def imshow_components(labels):
    # Map component labels to hue values
    label_hue = np.uint8(179*labels/np.max(labels))
    blank_ch = 255*np.ones_like(label_hue)
    labeled_img = cv2.merge([label_hue, blank_ch, blank_ch])
    # Convert to BGR for display
    labeled_img = cv2.cvtColor(labeled_img, cv2.COLOR_HSV2BGR)
    # Set the background label to black
    labeled_img[label_hue==0] = 0
    return labeled_img
Two basic operations
And several composite relations
Dilation expands the connected sets of 1s of a binary image.
It can be used for:
Erosion shrinks the connected sets of 1s of a binary image.
It can be used for:
# Python program to demonstrate erosion and
# dilation of images.
import cv2
import numpy as np
# Reading the input image
jimg = imread('imgs/j.png')[:,:,0]
# Taking a matrix of size 5 as the kernel
kernel = np.ones((5,5), np.uint8)
img_erosion = cv2.erode(jimg, kernel, iterations=1)
img_dilation = cv2.dilate(jimg, kernel, iterations=1)
# Demonstrate erosion and dilation on a photograph.
import cv2
import numpy as np
# Reading the input image
sadmessi = imread('imgs/sad_messi.jpg')[:,:,0]
# Taking a matrix of size 5 as the kernel
kernel = np.ones((5,5), np.uint8)
img_erosion = cv2.erode(sadmessi, kernel, iterations=1)
img_dilation = cv2.dilate(sadmessi, kernel, iterations=1)
A shape mask used in basic morphological ops.
Input: Binary image B, structuring element S
# Rectangular Kernel
# Elliptical Kernel
# Cross-shaped Kernel
Input: Binary image B, structuring element S
Intuitively, the opening is the area we can paint when the brush has a footprint B and we are not allowed to paint outside A.
Use large structuring element that fits into big objects
jblob = imread("imgs/j2.png")
opening = cv2.morphologyEx(jblob, cv2.MORPH_OPEN, kernel)
kernel11 = np.ones((11,11), np.uint8)
messi_opening = cv2.morphologyEx(sadmessi, cv2.MORPH_OPEN, kernel)
messi_opening = cv2.morphologyEx(sadmessi, cv2.MORPH_OPEN, kernel11)
Intuitively, the closing is the area we cannot paint when the brush has a footprint B and we are not allowed to paint inside A.
simple segmentation:
jblob2 = imread("imgs/j3.png")
closing = cv2.morphologyEx(jblob2, cv2.MORPH_CLOSE, kernel)
kernel11 = np.ones((11,11), np.uint8)
messi_closing = cv2.morphologyEx(sadmessi, cv2.MORPH_CLOSE, kernel)
messi_closing = cv2.morphologyEx(sadmessi, cv2.MORPH_CLOSE, kernel11)
messi_opening = cv2.morphologyEx(sadmessi, cv2.MORPH_OPEN, kernel)
messi_closing = cv2.morphologyEx(messi_opening, cv2.MORPH_CLOSE, kernel)
Let $\color{blue}{A \oplus B}$ denote the dilation of $\color{blue}{A}$ by $\color{blue}{B}$ and let $\color{blue}{A \ominus B}$ denote the erosion of $\color{blue}{A}$ by $\color{blue}{B}$
The boundary of $\color{blue}{A}$ can be computed as:
$$\color{blue}{A - (A \ominus B)}$$
where $\color{blue}{B}$ is a 3x3 square structuring element.
That is, we subtract an erosion of $\color{blue}{A}$ from $\color{blue}{A}$ to obtain its boundary
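Boundary extraction can be sketched without OpenCV. The erosion below uses a 3x3 square structuring element and treats everything outside the image as background (an assumption that matters only at the image border); `erode3x3` and `boundary` are illustrative names.

```python
import numpy as np


def erode3x3(A):
    """Binary erosion by a 3x3 square: keep a pixel only if all 9 neighbors are set."""
    A = np.asarray(A, dtype=bool)
    h, w = A.shape
    padded = np.pad(A, 1, mode="constant", constant_values=False)
    out = np.ones_like(A)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out


def boundary(A):
    """A minus its erosion: the pixels of A that touch the background."""
    A = np.asarray(A, dtype=bool)
    return A & ~erode3x3(A)


A = np.zeros((7, 7), dtype=bool)
A[1:6, 1:6] = True                 # a filled 5x5 square
print(boundary(A).astype(np.uint8))
```

For the filled square only the one-pixel-wide outer ring survives.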
$$\color{blue}{A \otimes B}$$ $$\color{blue}{= A - (A \oslash B)}$$ $$\color{blue}{= A \cap (A \oslash B)^C}$$
$$\color{blue}{A \odot B = A \cup (A \oslash B)}$$
gradient = cv2.morphologyEx(jimg, cv2.MORPH_GRADIENT, kernel)
It depends...
On almost "clean" binary images, morphology is very powerful, both for cleaning up images and for detecting variations from a desired image.
messi_gradient = cv2.morphologyEx(sadmessi, cv2.MORPH_GRADIENT, kernel)
Messi... There are still a few more battles. Get up and fight... You can't quit now... Fight until the last sweat.
Passive 3D sensing
Active 3D sensing
Bounce signal off of surface, record time to come back
$$\color{blue}{d = v \cdot \frac{t}{2}}$$
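As a quick numeric check of the formula, here is a tiny sketch. The default speed is the speed of light in vacuum, which is an assumption about the kind of signal; a sonar would use the speed of sound instead.

```python
def tof_distance(round_trip_time_s, speed_m_per_s=299_792_458.0):
    """Distance to the surface: the signal travels there and back, hence the /2."""
    return speed_m_per_s * round_trip_time_s / 2.0


# A light pulse that returns after 10 nanoseconds hit a surface about 1.5 m away
print(tof_distance(10e-9))
```

The example also shows why time-of-flight sensing with light needs very precise timing: one nanosecond of error corresponds to roughly 15 cm of depth.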
Cylindrical lens: Only focuses light in one direction
Randomly select a region anchor, a dot with unknown depth
a. Windowed search via normalized cross correlation along scanline (check that best match score is greater than threshold; if not, mark as "invalid" and go to 2)
b. Region growing
- Neighboring pixels are added to a queue
- For each pixel in queue, initialize by anchor's shift; then search small local neighborhood; if matched, add neighbors to queue
- Stop when no pixels are left in the queue
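Step (a) above can be sketched as follows; region growing is not covered. The sketch assumes two grayscale arrays of the same size (the observed IR image and the reference dot pattern), and the window size, search range and threshold are made-up parameters, as are the function names.

```python
import numpy as np


def ncc(a, b):
    """Normalized cross correlation of two equally sized patches."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)


def best_shift_along_scanline(observed, reference, row, col,
                              half=2, max_shift=10, threshold=0.8):
    """Search the same row of `reference` for the window around (row, col)."""
    patch = observed[row - half:row + half + 1, col - half:col + half + 1]
    best_score, best_shift = -1.0, None
    for s in range(-max_shift, max_shift + 1):
        c = col + s
        if c - half < 0 or c + half + 1 > reference.shape[1]:
            continue                      # window would leave the image
        cand = reference[row - half:row + half + 1, c - half:c + half + 1]
        score = ncc(patch, cand)
        if score > best_score:
            best_score, best_shift = score, s
    if best_score < threshold:
        return None, best_score           # mark the anchor as invalid
    return best_shift, best_score


rng = np.random.default_rng(0)
reference = rng.random((20, 40))
observed = np.roll(reference, 3, axis=1)  # pattern displaced by 3 pixels
print(best_shift_along_scanline(observed, reference, row=10, col=20))
```

The recovered shift plays the role of the disparity from which depth is triangulated.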
What are the advantages of IR?
Difficulties with both
Natural: depth image
A little more nuanced: point clouds
Take every depth pixel and put it out in the world
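Putting every depth pixel out in the world amounts to back-projecting it through the camera model. The sketch below assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy` (hypothetical parameters, not given in the notes) and treats a depth of 0 as "no measurement".

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth image (H x W) into an N x 3 array of 3-D points."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel column and row
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[depth.reshape(-1) > 0]             # drop invalid pixels


depth = np.array([[2.0, 2.0],
                  [2.0, 0.0]])     # the last pixel has no depth reading
print(depth_to_points(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0))
```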
At a point, take a ball of points around it
For every pair of points, find the relationship between the two points and their normals
Must be frame independent
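One common way to describe a pair of oriented points independently of the coordinate frame is to keep only the distance and the angles between the normals and the connecting line. The sketch below is one such choice, offered as an illustration rather than as the specific descriptor these notes have in mind; `pair_feature` is a made-up name.

```python
import numpy as np


def _angle(a, b):
    """Angle in radians between two non-zero vectors."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))


def pair_feature(p1, n1, p2, n2):
    """(distance, angle(n1, d), angle(n2, d), angle(n1, n2)) with d = p2 - p1."""
    p1, n1, p2, n2 = (np.asarray(a, dtype=np.float64) for a in (p1, n1, p2, n2))
    d = p2 - p1
    return np.array([np.linalg.norm(d), _angle(n1, d), _angle(n2, d), _angle(n1, n2)])


# The feature is unchanged when the whole scene is rotated and translated
theta = np.pi / 6
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([1.0, -2.0, 3.0])
p1, n1 = np.array([0.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])
p2, n2 = np.array([1.0, 2.0, 0.5]), np.array([0.0, 1.0, 0.0])
print(pair_feature(p1, n1, p2, n2))
print(pair_feature(R @ p1 + t, R @ n1, R @ p2 + t, R @ n2))
```

Distances and angles are preserved by rigid motions, which is what makes the feature frame independent.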