Some of our projects (especially those that need paired datasets) require an algorithm that can find the closest image match. For example, in one of my projects I had 300 synthetic images and wanted to build a paired dataset for a pix2pix model. These synthetic images were generated from a dataset of 6,000 images. To create my paired dataset, I had to loop through each of the 300 synthetic images and find its closest match in the 6,000-image dataset.
In this post, we discuss methods we can use to achieve this goal. My first idea was to use OpenCV (cv2) to read the images and compute similarity via color histogram comparison, but I soon realized that even a small tilt in an image can break this method. After a bit of research, I narrowed it down to these approaches:
SSIM: Great for detecting structural/quality similarities in mostly identical or near-identical images (like small variations in the same photo).
ORB / SIFT: Best for local feature matching of the same scene/objects under different conditions (scale, rotation, partial occlusion).
Perceptual Hashing: Fast, robust to minor transformations, and easy to implement for near-duplicate detection (see the short sketch after this list).
Deep Learning Embeddings: The most semantic and flexible approach. You can detect images that are of the same “type” or have the same object(s), even if they look quite different at a pixel level.
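Since the list calls perceptual hashing easy to implement, here is a minimal sketch of that option. It assumes the third-party imagehash package (pip install imagehash), which we do not use anywhere else in this post, and the helper name phash_distance is mine:

from PIL import Image
import imagehash

def phash_distance(path_a, path_b):
    # pHash produces a 64-bit fingerprint per image; subtracting two
    # hashes gives their Hamming distance (0 = identical fingerprints).
    hash_a = imagehash.phash(Image.open(path_a))
    hash_b = imagehash.phash(Image.open(path_b))
    return hash_a - hash_b

Small distances (roughly 0-10) usually indicate near-duplicates; the method is cheap but, like histograms, it is not a semantic comparison.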
As a little searching quickly confirms, deep learning approaches work best when you want a semantic match between images rather than a pixel-level one. In the following section, we go through the code step by step.
You need torch, torchvision, Pillow, and Tkinter for this example. Tkinter ships with most Python installations, and you can install the other packages via pip:
pip install torch torchvision pillow
First, we import the packages:
import tkinter as tk
from tkinter import filedialog
import os
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image
Now, we can use Tkinter to choose the reference image, i.e., the image we want to find the closest match for.
def choose_image():
    root = tk.Tk()
    root.withdraw()  # hide the empty root window; we only need the dialog
    file_path = filedialog.askopenfilename(
        title="Choose a reference image",
        filetypes=[("Image files", "*.png *.jpg *.jpeg *.bmp *.gif *.tiff *.webp")]
    )
    return file_path
Next, we use Tkinter again to get the path of the folder whose contents we want to search for the closest match.
def choose_directory():
    root = tk.Tk()
    root.withdraw()
    folder_path = filedialog.askdirectory(
        title="Choose a folder containing images to compare"
    )
    return folder_path
Now, we load a ResNet50 pretrained on ImageNet and remove the final classification (FC) layer. What remains outputs a 2048-dimensional general-purpose feature vector (embedding) for an input image.
# pretrained=True is deprecated in recent torchvision; the weights API is the current equivalent
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop the final FC layer
resnet.eval()  # inference mode: freezes dropout and batch-norm behavior
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],  # ImageNet channel statistics
        std=[0.229, 0.224, 0.225]
    ),
])
Now we need a function that loads an image, applies the transforms, and extracts a 2048-d embedding from ResNet50, returning it as a 1D NumPy array.
def get_embedding(image_path):
    img = Image.open(image_path).convert('RGB')  # force 3 channels (handles grayscale/RGBA)
    img_t = transform(img).unsqueeze(0)  # add a batch dimension
    with torch.no_grad():  # no gradients needed for inference
        embedding = resnet(img_t)
    embedding = embedding.view(2048).cpu().numpy()  # flatten (1, 2048, 1, 1) to (2048,)
    return embedding
Then we compare two embeddings via cosine similarity, which ranges from -1 (opposite) to +1 (identical). In practice, values above roughly 0.8 or 0.9 typically indicate strong similarity in content.
def cosine_similarity(vec1, vec2):
    dot_product = (vec1 * vec2).sum()
    norm1 = (vec1 ** 2).sum() ** 0.5
    norm2 = (vec2 ** 2).sum() ** 0.5
    if norm1 < 1e-8 or norm2 < 1e-8:  # guard against division by (near-)zero
        return -1
    return dot_product / (norm1 * norm2)
Finally, we iterate through each image in the chosen folder, compute its embedding, measure its similarity to the reference, and keep track of the highest-scoring image.
def find_closest_match_deep(reference_image_path, folder_path):
    ref_emb = get_embedding(reference_image_path)
    best_score = -1
    best_image = None
    for filename in os.listdir(folder_path):
        # skip non-image files
        if not filename.lower().endswith((".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff", ".webp")):
            continue
        full_path = os.path.join(folder_path, filename)
        emb = get_embedding(full_path)
        score = cosine_similarity(ref_emb, emb)
        if score > best_score:
            best_score = score
            best_image = full_path
    return best_image, best_score
And in the final step, we need a main function that serves as the entry point of the app.
def main():
    print("Select the reference (query) image...")
    ref_image_path = choose_image()
    if not ref_image_path:
        print("No image selected; exiting.")
        return
    print("Select the folder of images to compare against...")
    folder_path = choose_directory()
    if not folder_path:
        print("No folder selected; exiting.")
        return
    best_image, best_score = find_closest_match_deep(ref_image_path, folder_path)
    if best_image:
        print(f"\nClosest match:\n {best_image}")
        print(f"Similarity score (closer to 1.0 = more similar): {best_score:.4f}")
    else:
        print("No valid images found in the folder or unable to compute similarity.")

if __name__ == "__main__":
    main()
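One practical note for the paired-dataset scenario from the introduction: find_closest_match_deep recomputes every folder embedding for each reference image, so matching 300 references against 6,000 candidates would run get_embedding on the same 6,000 files 300 times. A minimal sketch of one way around this, caching each folder embedding once and reusing it (the helper name find_closest_matches_batch is mine, not part of the script above):

def find_closest_matches_batch(reference_image_paths, folder_path):
    # Precompute every folder embedding once...
    folder_embs = {}
    for filename in os.listdir(folder_path):
        if not filename.lower().endswith((".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff", ".webp")):
            continue
        full_path = os.path.join(folder_path, filename)
        folder_embs[full_path] = get_embedding(full_path)
    # ...then score each reference against the cached embeddings.
    results = {}
    for ref_path in reference_image_paths:
        ref_emb = get_embedding(ref_path)
        best_image, best_score = None, -1
        for path, emb in folder_embs.items():
            score = cosine_similarity(ref_emb, emb)
            if score > best_score:
                best_image, best_score = path, score
        results[ref_path] = (best_image, best_score)
    return results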
The full version of the script:
import tkinter as tk
from tkinter import filedialog
import os
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

def choose_image():
    root = tk.Tk()
    root.withdraw()
    file_path = filedialog.askopenfilename(
        title="Choose a reference image",
        filetypes=[("Image files", "*.png *.jpg *.jpeg *.bmp *.gif *.tiff *.webp")]
    )
    return file_path

def choose_directory():
    root = tk.Tk()
    root.withdraw()
    folder_path = filedialog.askdirectory(
        title="Choose a folder containing images to compare"
    )
    return folder_path

# Load ResNet50 with ImageNet weights and drop the final FC layer
resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
resnet = torch.nn.Sequential(*list(resnet.children())[:-1])
resnet.eval()

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225]
    ),
])

def get_embedding(image_path):
    img = Image.open(image_path).convert('RGB')
    img_t = transform(img).unsqueeze(0)
    with torch.no_grad():
        embedding = resnet(img_t)
    embedding = embedding.view(2048).cpu().numpy()
    return embedding

def cosine_similarity(vec1, vec2):
    dot_product = (vec1 * vec2).sum()
    norm1 = (vec1 ** 2).sum() ** 0.5
    norm2 = (vec2 ** 2).sum() ** 0.5
    if norm1 < 1e-8 or norm2 < 1e-8:
        return -1
    return dot_product / (norm1 * norm2)

def find_closest_match_deep(reference_image_path, folder_path):
    ref_emb = get_embedding(reference_image_path)
    best_score = -1
    best_image = None
    for filename in os.listdir(folder_path):
        if not filename.lower().endswith((".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tiff", ".webp")):
            continue
        full_path = os.path.join(folder_path, filename)
        emb = get_embedding(full_path)
        score = cosine_similarity(ref_emb, emb)
        if score > best_score:
            best_score = score
            best_image = full_path
    return best_image, best_score

def main():
    print("Select the reference (query) image...")
    ref_image_path = choose_image()
    if not ref_image_path:
        print("No image selected; exiting.")
        return
    print("Select the folder of images to compare against...")
    folder_path = choose_directory()
    if not folder_path:
        print("No folder selected; exiting.")
        return
    best_image, best_score = find_closest_match_deep(ref_image_path, folder_path)
    if best_image:
        print(f"\nClosest match:\n {best_image}")
        print(f"Similarity score (closer to 1.0 = more similar): {best_score:.4f}")
    else:
        print("No valid images found in the folder or unable to compute similarity.")

if __name__ == "__main__":
    main()
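The script runs entirely on the CPU. If you have a CUDA-capable GPU and a CUDA build of PyTorch, moving the model and each input batch to the GPU speeds up embedding extraction considerably. A minimal sketch of the required changes:

# Pick the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
resnet = resnet.to(device)

def get_embedding(image_path):
    img = Image.open(image_path).convert('RGB')
    img_t = transform(img).unsqueeze(0).to(device)  # move the batch to the same device
    with torch.no_grad():
        embedding = resnet(img_t)
    return embedding.view(2048).cpu().numpy()  # .cpu() copies the result back for NumPy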