Day 2 Solutions
[2]:
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
from time import time
Reading the lens/source catalog
[1]:
!head -n 5 /home/idies/workspace/Storage/divyar/IAGRG_2022/DataStore/demo_objects.csv
# !wc -l /home/idies/workspace/Storage/divyar/IAGRG_2022/DataStore/demo_objects.csv
ra,dec,z,col_3,col_4,col_5
30.611734783139795,-6.3569091978972425,23.901826,23.043298,22.649281,0.5
30.61078116389429,-6.350381359287763,22.385773,21.548382,21.272576,0.38
30.596477830319948,-6.3461523588063935,23.948118,23.121471,22.773187,0.37
30.592680083175416,-6.3445214768250775,23.42731400000001,22.022285999999998,21.334948,0.45
[3]:
#reading lens file
#tries to read everthing in the file
path="/home/idies/workspace/Storage/divyar/IAGRG_2022/DataStore/demo_objects.csv"
df = pd.read_csv(path, sep=',')
# select required columns only
[4]:
df = pd.read_csv(path, sep=',', usecols=[0,1,2])
[5]:
df.head()
[5]:
| ra | dec | z | |
|---|---|---|---|
| 0 | 30.611735 | -6.356909 | 23.901826 |
| 1 | 30.610781 | -6.350381 | 22.385773 |
| 2 | 30.596478 | -6.346152 | 23.948118 |
| 3 | 30.592680 | -6.344521 | 23.427314 |
| 4 | 30.601111 | -6.338650 | 21.660045 |
Visualising the coordinate space of input data
Exercise:
write a python code to plot dec(on Y-axis) vs ra(on X-axis) of the objects passed in the dataFrame.
This plot will look very dense, so in order to visually see the points distributed on the (ra,dec) plane, downsample the number of points by 100 and again plot.
In order to downsample the points, use
numpy.random.choise()function.
[6]:
fig,ax = plt.subplots(1,2, figsize=(14,6))
ax1,ax2 = ax
#plot all the coordinates
ax1.scatter(df.ra.values,df.dec.values, s=2)
ax1.set_xlabel("ra (deg)", fontsize=14)
ax1.set_ylabel("dec (deg)", fontsize=14)
ax1.set_title("Plot looks too dense.")
#Randomly picking up points to reduce the density of points
Number_of_points = df.ra.size
down_sample_by = 100
random_indices = np.random.choice(np.arange(Number_of_points),
size=int(Number_of_points/down_sample_by),
replace=False)
#plot down sampled number coordinates
ax2.scatter(df.ra.values[random_indices], df.dec.values[random_indices], s=2)
ax2.set_xlabel("ra (deg)", fontsize=14)
ax2.set_ylabel("dec (deg)", fontsize=14)
ax2.set_title("Density of points reduced by %d."%down_sample_by)
plt.show()
On-sky distance between two points on the surface of a unit sphere
In spherical cartesian coordinates : \(\bar{r} = \bar{r}(r,\theta,\phi)\)
and the unit vector :
In coordinates of a galaxy in celestial coordinate system (unit shpere) : \((\alpha,\delta)\)
For \((\alpha_1,\delta_1)\), \((\alpha_2,\delta_2)\), the angle \(\psi\) between them is given by :
where,
Thus, given \((\alpha_1,\delta_1)\), \((\alpha_2,\delta_2)\), we can compute the angle \(\psi\).
Using : distance formula on the arc of a cirle :
Exercise: Using the above formula
Implement a function which takes tuples of (ra,dec) of two objects and returns the distance between them on the Unit sphere.
Your code should return the result in units of arc seconds and NOT in radians.
[7]:
rad_to_arcsec = 180.0*3600./np.pi
arcsec_to_rad = 1/rad_to_arcsec
deg_to_rad = np.pi/180.
def dist_betw_two_obj(obj1,obj2):
ra1,dec1 = obj1
ra2,dec2 = obj2
#convert angles from degree to radians
ra1 = ra1 * deg_to_rad
ra2 = ra2 * deg_to_rad
dec1 = dec1 * deg_to_rad
dec2 = dec2 * deg_to_rad
#calculate the distance in arcsec
psi = np.cos(dec2)* np.cos(dec1)* np.cos(ra1-ra2) + np.sin(dec1)* np.sin(dec2)
#due to precision errors, sometimes psi > 1 , e.g. 1.0000000000000000000001
#in such a case np.arccos funciton will complaint. So handle that now.
if psi>1.0:
psi = 1.0
dist = np.arccos(psi)* rad_to_arcsec
return dist
# Running and testing the distance computation code:
[8]:
# creating tuple of values from ra,dec of each object
coords = list( zip(df.ra.values,df.dec.values))
# print(coords[0])
[9]:
print( f"""obj1 0: {coords[0]}
obj2 1: {coords[1]}
distance: {dist_betw_two_obj(coords[0],coords[1])} arcsec""")
obj1 0: (30.611734783139795, -6.3569091978972425)
obj2 1: (30.61078116389429, -6.350381359287763)
distance: 23.7466131834144 arcsec
Brute force search for neighbour objects within a specified distance
time complexity ~ \(\mathcal{O}(n^2)\)
Exercise:
Write a function
find_neighbour_pointsthat uses (ra,dec) coordinates of an object, and within some givenrmaxdistance, finds its neighbour points in the given dataFrame catalog.Using this code you should be able to find neighbours of each object in the dataFrame - within the same dataFrame.
Say the \(1^{st}\) object in the dataFrame has 5 objects within say rmax=15 arcsec, then you should be able to print/store the
indicesanddistancesof these neighbours.
Write a function
query_neighbour_pointsthat can use (ra,dec) coordinates of one object or an list/array of objets, and within some givenrmaxdistance, finds its neighbour points in any other catalog.Using this code you should be able to access the coordinates of all the neighbours of each object that you are querying for. And hence you should be able to tell how far each neighbour is from the queried object.
[10]:
from collections import defaultdict
def find_neighbour_points(coords, rmax=10, verbose=True):
"""
finding neighbour objects within some distance, for all the objects in a given array of coordinates.
coords: an array/list of tuples (ra,dec)
Returns:
`nnids`: indices into `coords` of neighbours of each object in `coords` within `rmax`
`distances`: distance of objects matched in `nnids"""
# define containers for the distance and id information of matched objects
distances = defaultdict(list)
nnids = defaultdict(list)
c=0
for ii,obj1 in enumerate(coords):
c+=1
d=0
for jj,obj2 in enumerate(coords):
if jj==ii:
continue
d+=1
dist = dist_betw_two_obj(obj1,obj2)
if dist <= rmax:
nnids[ii].append(jj)
distances[ii].append(dist)
if verbose:
print(ii,nnids[ii],distances[ii])
return nnids,distances
def query_neighbour_points(coords1, coords2, rmax=10, verbose=True):
"""
coords1: a tuple or an array/list of tuples (ra,dec)
coords2: an array/list of tuples (ra,dec)
Returns:
`nnids`: indices into `coords2` of neighbours of each object `coords1` within `rmax`.
`distances`: distance of objects matched in `nnids`"""
coords1 = np.atleast_2d(coords1)
# define containers for the distance and id information of matched objects
distances = defaultdict(list)
nnids = defaultdict(list)
for ii,obj1 in enumerate(coords1):
for jj,obj2 in enumerate(coords2):
dist = dist_betw_two_obj(obj1,obj2)
if dist <= rmax:
nnids[ii].append(jj)
distances[ii].append(dist)
if verbose:
print(ii,nnids[ii],distances[ii])
return nnids,distances
Testing the above brute force search functions
Exercise:
Using the first 50 objects in the catalog, Search for their neighbours within a ball of radius
rmax, say forrmax(default value)=15 arcsec.Note: We’re only using 50 points in order to reduce the compute time, but the code has no such limitation. If you run your code for a large number of points, it will take more time to finish that’s all.
Access the above result and print indices,distances and coordinates of the neighbours of few objects.
Then test out the function
query_neighbour_pointsfor the same objects you take above. The neighbour matches fromquery_neighbour_pointsandfind_neighbour_pointsmust match.
[11]:
# Running and testing find_neighbour_points
print("object index, neighbours indices list, neighbours distances list ")
%time nnids,distances = find_neighbour_points(coords[:50],rmax=15)
object index, neighbours indices list, neighbours distances list
0 [] []
1 [] []
2 [3] [14.802299176580224]
3 [2] [14.802299176580224]
4 [36] [13.682331312465545]
5 [] []
6 [] []
7 [] []
8 [13] [9.465537930627812]
9 [10, 12, 38] [10.94680374083333, 5.876628853240463, 14.594679725290598]
10 [9, 12] [10.94680374083333, 10.280601486171243]
11 [42] [7.151725370768233]
12 [9, 10, 14] [5.876628853240463, 10.280601486171243, 11.314136132011965]
13 [8] [9.465537930627812]
14 [12, 15, 16] [11.314136132011965, 5.475958296970058, 11.765854084075805]
15 [14, 16, 44] [5.475958296970058, 13.47975484115049, 10.276694006812336]
16 [14, 15] [11.765854084075805, 13.47975484115049]
17 [46] [6.898087849306218]
18 [] []
19 [] []
20 [] []
21 [] []
22 [] []
23 [] []
24 [] []
25 [26] [13.29908354885402]
26 [25] [13.29908354885402]
27 [] []
28 [] []
29 [30] [3.6918325238285883]
30 [29] [3.6918325238285883]
31 [] []
32 [] []
33 [] []
34 [] []
35 [36] [10.896870606370635]
36 [4, 35] [13.682331312465545, 10.896870606370635]
37 [] []
38 [9, 40] [14.594679725290598, 14.719522078343205]
39 [40] [1.984703189527645]
40 [38, 39] [14.719522078343205, 1.984703189527645]
41 [] []
42 [11] [7.151725370768233]
43 [] []
44 [15] [10.276694006812336]
45 [] []
46 [17] [6.898087849306218]
47 [] []
48 [] []
49 [] []
CPU times: user 41.7 ms, sys: 520 µs, total: 42.2 ms
Wall time: 37.3 ms
[12]:
# Running and testing query_neighbour_points
print("object index, neighbours indices list, neighbours distances list ")
# Remember because I have passed same object catalog as coords and coords2,
# you are seeing same objects matched with each other. So if you ignore the same
# matches, you'll notice that the printed neighbours are exactly the same as shown in
# the above cell.
%time nnidss,distancess = query_neighbour_points(coords1=coords[:50],coords2=coords[:50],rmax=15)
object index, neighbours indices list, neighbours distances list
0 [0] [0.003073585126505738]
1 [1] [0.0]
2 [2, 3] [0.003073585126505738, 14.802299176580224]
3 [2, 3] [14.802299176580224, 0.0]
4 [4, 36] [0.0, 13.682331312465545]
5 [5] [0.0]
6 [6] [0.0]
7 [7] [0.0]
8 [8, 13] [0.0, 9.465537930627812]
9 [9, 10, 12, 38] [0.0, 10.94680374083333, 5.876628853240463, 14.594679725290598]
10 [9, 10, 12] [10.94680374083333, 0.0, 10.280601486171243]
11 [11, 42] [0.0, 7.151725370768233]
12 [9, 10, 12, 14] [5.876628853240463, 10.280601486171243, 0.003073585126505738, 11.314136132011965]
13 [8, 13] [9.465537930627812, 0.0]
14 [12, 14, 15, 16] [11.314136132011965, 0.003073585126505738, 5.475958296970058, 11.765854084075805]
15 [14, 15, 16, 44] [5.475958296970058, 0.0, 13.47975484115049, 10.276694006812336]
16 [14, 15, 16] [11.765854084075805, 13.47975484115049, 0.0]
17 [17, 46] [0.0, 6.898087849306218]
18 [18] [0.0]
19 [19] [0.0]
20 [20] [0.0]
21 [21] [0.0]
22 [22] [0.0]
23 [23] [0.0]
24 [24] [0.003073585126505738]
25 [25, 26] [0.0, 13.29908354885402]
26 [25, 26] [13.29908354885402, 0.0]
27 [27] [0.0]
28 [28] [0.003073585126505738]
29 [29, 30] [0.0, 3.6918325238285883]
30 [29, 30] [3.6918325238285883, 0.0]
31 [31] [0.0]
32 [32] [0.0]
33 [33] [0.0]
34 [34] [0.0]
35 [35, 36] [0.0, 10.896870606370635]
36 [4, 35, 36] [13.682331312465545, 10.896870606370635, 0.003073585126505738]
37 [37] [0.0]
38 [9, 38, 40] [14.594679725290598, 0.0, 14.719522078343205]
39 [39, 40] [0.0, 1.984703189527645]
40 [38, 39, 40] [14.719522078343205, 1.984703189527645, 0.0]
41 [41] [0.003073585126505738]
42 [11, 42] [7.151725370768233, 0.0]
43 [43] [0.0]
44 [15, 44] [10.276694006812336, 0.0]
45 [45] [0.0]
46 [17, 46] [6.898087849306218, 0.0]
47 [47] [0.0]
48 [48] [0.0]
49 [49] [0.0]
CPU times: user 53.5 ms, sys: 4.54 ms, total: 58.1 ms
Wall time: 49.6 ms
[13]:
nth = 15
print("Using result of find_neighbour_points :")
print(f"""Accessing the neighbour points for the {nth}th object using the output
of our brute force method:""")
print(f"neighbours: {nnids[nth]} \ndistances: {distances[nth]}")
print()
print(f"Accessing the neighbour coordinates of the {nth}th object:")
for ii in nnids[nth]:
print(ii, coords[ii])
Using result of find_neighbour_points :
Accessing the neighbour points for the 15th object using the output
of our brute force method:
neighbours: [14, 16, 44]
distances: [5.475958296970058, 13.47975484115049, 10.276694006812336]
Accessing the neighbour coordinates of the 15th object:
14 (30.564297942270603, -6.322030429805476)
16 (30.56234780563957, -6.319398928221537)
44 (30.568297869918087, -6.3202347701814166)
A visual sanity check for execution of find_neighbour_points function
Exercise:
Plot the coordinates of the neighbours of any one test point. This exercise can help us visually inspect whether our neighbour search code gave us reasonable results or not.
Along with plotting, also print the distances of each of these neighbours and visually confirm that the nearest looking object on the plot indeed has the smallest distance from the test point, and similarly check for other neighbour points.
I’m plotting the neighbours of \(15^{th}\) point from the catalog. And also printing their indices and distances to compare the stored distances by eye!
Point marked as \(\star\) is the \(15^{th}\) point. Other points are its neighbours.
[14]:
demoid = 15
plt.figure(figsize=(8,6))
#plot demo point
demora,demodec = coords[demoid]
plt.scatter(demora,demodec, c='black', s=100, marker=r'*')
#access all the neighbours
coords=np.array(coords)
neighbours = nnids[demoid]
#choose colors for a given number of neighbours
colors = ['r','g','b']
# colors = np.random.random( size=(len(neighbours),3) )
#plot all the neighbour points
plotra = [coord[0] for coord in coords[neighbours] ]
plotdec = [coord[1] for coord in coords[neighbours] ]
plt.scatter(plotra, plotdec, c=colors, s=50, marker='o')
for i in range(len(plotra)):
plt.plot([demora,plotra[i]],[demodec,plotdec[i]] , c=colors[i])
plt.xlabel("ra (deg)", fontsize=16)
plt.ylabel("dec (deg)", fontsize=16)
plt.show()
#plt.savefig("neighbours_of_15th_obj.png",bbox_inches="tight")
# Now visuall compare the distances
print("index--> distance\t\tcolor")
ndist = distances[demoid]
for nidx,dist,c in zip(neighbours,ndist,colors):
print(nidx,"--->",dist, '\t', c)
index--> distance color
14 ---> 5.475958296970058 r
16 ---> 13.47975484115049 g
44 ---> 10.276694006812336 b
Using k-d tree algorithm to query the neighbour points within a given distance
time complexity ~ \(\mathcal{O}(n\log{}n)\)
[15]:
from scipy import spatial
sin = np.sin
cos = np.cos
deg2rad = np.deg2rad
Exercise:
Write a code using cKDTree to find out the neighbours of a given
set1of objects (coordinates) from a catalog of otherset2objects.Write this code in a
class structurewhere it takes-in arrays ofra,decof the catalog objects (set2) andconverts (ra,dec) to (x,y,z) of cartesian coordinates,
creates k-d tree of the
set2catalog objects using (x,y,z),queries for the neighbour points of - each object in the
set1, and returns the indices inset2.
[16]:
class find_neighbours:
def __init__(self, tra=None, tdec=None, r=1):
"""r==1 --> unit sphere calculations"""
self.tra = tra
self.tdec= tdec
self.r = r
self.rad_to_arcsec = 180.0*3600./np.pi
self.arcsec_to_rad = 1/self.rad_to_arcsec
print(f"Initializing with sphere of radius, r={r}.")
# write a function that converts ra,dec to cartesian components.
def RA_DEC_to_xyz(self, ra,dec, units='deg'):
"""converts ra,dec to cartesian x,y,z components on a unit sphere"""
if units=="deg":
ra = deg2rad(ra)
dec= deg2rad(dec)
x = self.r *cos(dec) *cos(ra)
y = self.r *cos(dec) *sin(ra)
z = self.r *sin(dec)
return x,y,z
# write a function `create_tree` that takes arrays of ra,dec
# values as inputs and returns a cKDTree object into these coordinates.
def create_tree(self):
# create an array of tuples of x,y,z
x,y,z = self.RA_DEC_to_xyz(self.tra,self.tdec)
# make a tree using cKDTree method
tree = spatial.cKDTree(np.c_[x,y,z])
return tree
# write a function that takes arrays of ra,dec coordinates
# and returns the neighbour object's indices from the tree
# within some distance `rmax` arcsec .
def query_within_rmax(self, ra, dec, rmax=15, units="deg"):
print(f"Searching for {ra.size} number of objects in the catalog with rmax={rmax} arcsec.")
#convert rmax from arcsec to distance units
rmax = self.r *rmax *self.arcsec_to_rad
# array of tuples of x,y,z of input objetcs
x,y,z = self.RA_DEC_to_xyz(ra,dec)
cartesian_coords = np.c_[x,y,z]
# make tree from catalog
tree = self.create_tree()
# look for the neighbour objects
ids_into_tree = tree.query_ball_point(cartesian_coords,rmax)
return ids_into_tree
A sanity check
Testing conversion of (ra,dec) to (x,y,z)
[17]:
# Here we are not passing `tra` and `tdec` while initialising the class `find_neighbours`
# So these variables take default values of `None`.
# We only want to test the coordinate conversion code `RA_DEC_to_xyz`. So it's ok to not
# pass `tra`,`tdec`!
code = find_neighbours()
# a look at the cartesian coordinates
x,y,z = code.RA_DEC_to_xyz(coords[:,0], coords[:,1])
cartesian_coords = np.c_[x,y,z]
print("shapes of x,y,z arrays:",x.shape,y.shape,z.shape)
print("shapes of (x,y,z) tuple array:", cartesian_coords.shape)
#field on objects looks close to equator
cartesian_coords
Initializing with sphere of radius, r=1.
shapes of x,y,z arrays: (220622,) (220622,) (220622,)
shapes of (x,y,z) tuple array: (220622, 3)
[17]:
array([[ 0.8553461 , 0.50608675, -0.11072151],
[ 0.85536537, 0.50607894, -0.11060828],
[ 0.85549871, 0.50586954, -0.11053492],
...,
[ 0.78796508, 0.6142163 , -0.04300421],
[ 0.78737094, 0.61497898, -0.04298683],
[ 0.78774839, 0.61449982, -0.04292369]])
Compare the output of our brute force algorithm with cKDTree algorithm on dew Demo objects
Exercise:
Take the same bunch of objects for which we already have run our brute force search method,
feed them into the wrapper code of k-d tree written above,
find out their neighbour points within the same
rmax(default value)=15 arcsec.The neighbour objects found from both the methods should match!
[18]:
demo_ids = [15] #9,15
demo_ra =coords[demo_ids,0] #deg
demo_dec=coords[demo_ids,1] #deg
demox, demoy, demoz = code.RA_DEC_to_xyz(demo_ra, demo_dec)
demoobj = np.c_[demox,demoy,demoz]
print("""I had chosen following indices from set2(catalog) as my test objects:""",demo_ids)
print("coordinate of test objects:",demoobj)
# we want to look within `rmax` arcsec --> convert `rmax` from (arcsec) angle to length units
# rmax = 15
# print()
# print(f"rmax={rmax} arcsec in radians --->",rmax* arcsec_to_rad)
I had chosen following indices from set2(catalog) as my test objects: [15]
coordinate of test objects: [[ 0.85581232 0.50543305 -0.11010311]]
# This time Initialize class with by passing tra and tdec arguments and set up the tree
[19]:
code_tree = find_neighbours(tra=coords[:,0], tdec=coords[:,1])
Initializing with sphere of radius, r=1.
# Query the neighbours of demo objects
[20]:
idx=code_tree.query_within_rmax(demo_ra, demo_dec)
Searching for 1 number of objects in the catalog with rmax=15 arcsec.
# Go back to brute force output and compare this result…. does it match?
[21]:
idx
[21]:
array([list([14, 15, 16, 44])], dtype=object)
Compare the computer time required by our brute force algorithm vs the k-d tree algorithm
First let’s use k-d tree to query neighbours within rmax=15 arcsec for each object within the catalog.
[22]:
%time idx=code_tree.query_within_rmax(coords[:,0], coords[:,1])
Searching for 220622 number of objects in the catalog with rmax=15 arcsec.
CPU times: user 589 ms, sys: 15.6 ms, total: 605 ms
Wall time: 602 ms
Reminder: full catalog has about 2,20,000 objects… and k-d tree found neighbours for each one of them in 1-2 seconds. (This time will vary depending on your computer’s ability.)
Now let’s use the brute force method only for 1000 objects.
[23]:
%time nnids,distances = query_neighbour_points(coords[:1000], coords[:1000], rmax=15, verbose=False)
CPU times: user 7.94 s, sys: 0 ns, total: 7.94 s
Wall time: 7.94 s
You will notice that as the number of points to search for increases in set1, the performance of the k-d tree algorithm will keep on improving over the brute force method.
That’s just because the time required by k-d tree will grow as \(\sim \mathcal{O}(n_1\log{n_2})\) where as that of brute force will grow as \(\sim n_1 n_2\).
Congratulations! Now you have learnt to -
use
cKDTreeto find out neighbours of a givenset1of objects into some otherset2of objects.count the number of neighbours of any given object in
set1.access the results of cKDTree and identify neighbours of each obejct in
set1into theset2.access the coordinates and distances of each neighbour of every point in
set1.