To **compare similarity between two lists in Python** we can calculate:

- set intersection
- cosine similarity
- etc

Similarity would depend also on the data types of the items. For example:

- integer
- float
`5.04`

vs`5.03`

- string
`grapefruit`

vs`grape`

Let's cover several cases on how to compute similarity between two Python lists or arrays.

We will answer on next questions:

- python get list difference
- similarity of two lists in python
- python - find similarity between two lists
- find similarity between two words in python

## 1. Calculate similarity with difflib

Quick and efficient way to compute similarity of two numeric or string lists is done by: `difflib.SequenceMatcher(None,s1,s2)`

:

### Similarity of numeric lists

```
import difflib
l1 = [1, 3, 4, 5, 2]
l2 = [2, 4, 6, 8, 5, 1]
sm = difflib.SequenceMatcher(None,l1,l2)
sm.ratio()
```

which give use:

```
0.18181818181818182
```

### How does it work?

While testing with:

```
l3 = [1, 3, 4, 5]
l4 = [2, 3, 4, 5]
sm = difflib.SequenceMatcher(None,l3,l4)
sm.ratio()
```

result:

```
0.75
```

So it seems that it compares element by element and checks if items match.

More information is available in the docs - link in the resources sections:

Return a measure of the sequences’ similarity as a float in the range [0, 1].

### Similarity two string lists

To find similarity of lists of words we can use the same method:

```
l5 = list("abcded")
l6 = list("acdefd")
sm = difflib.SequenceMatcher(None,l5,l6)
sm.ratio()
```

So for lists:

`['a', 'b', 'c', 'd', 'e', 'd']`

`['a', 'c', 'd', 'e', 'f', 'd']`

we get:

```
0.8333333333333334
```

## 2. Set Intersection

We can find the similarity of elements of lists by checking the intersection of their elements. This is helpful when the location of the element doesn't matter:

```
l1 = [1, 3, 4, 5, 2]
l2 = [2, 4, 6, 8, 5, 1]
intersection = set(l1) & set(l1)
similarity = len(intersection) / (len(l1) + len(l2) - len(intersection))
print(similarity)
```

- first we find the intersection
`{1, 2, 3, 4, 5}`

- then we calculate ratio intersection length and sum of list length
`0.8333333333333334`

- we can check also elements not present in the lists:
`set(l1) - intersection`

- elements of l1 not present in l2`set()`

`set(l2) - intersection`

-elements of l2 not present in l1`{6, 8}`

## 3. Jaccard index in Python

The Jaccard index(Jaccard similarity coefficient), is a statistical method used for finding the similarity and diversity of sample sets.

We can use Jaccard index to calculate similarity of two lists in Python:

```
l1 = [1, 3, 4, 5, 2]
l2 = [2, 4, 6, 8, 5, 1]
intersection = len(set(l1) & set(l2))
union = len(set(l1) | set(l2))
similarity = intersection / union
print(similarity)
```

result:

```
0.5714285714285714
```

## 5. Cosine Similarity in Python

In data science, cosine similarity helps to measure similarity between two non-zero arrays/vectors.

To calculate cosine similarity in Python we can:

```
from math import sqrt
l1 = [1, 3, 4, 5, 2]
l2 = [2, 4, 6, 8, 5, 1]
dot_product = sum(i * j for i, j in zip(l1, l2))
magnitude1 = sqrt(sum(i ** 2 for i in l1))
magnitude2 = sqrt(sum(j ** 2 for j in l2))
similarity = dot_product / (magnitude1 * magnitude2)
print(similarity)
```

result:

```
0.9820303262342949
```

## 6. Euclidean Distance - compare lists in Python

In mathematics, the Euclidean distance between two points in Euclidean space is the length of a line segment between the two points.

We can implement euclidean distance in Python to compare two lists:

```
import math
l1 = [1, 3, 4, 5, 2]
l2 = [2, 4, 6, 8, 5, 1]
squared_distance = sum([(i - j) ** 2 for i, j in zip(l1, l2)])
distance = math.sqrt(squared_distance)
similarity = 1 / (1 + distance)
print(similarity)
```

result:

```
0.16952084719853724
```

## 7. Hamming distance in Python

In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different.

Examples:

- "ka
**rol**in" and "ka**thr**in" - 3 - "k
**a**r**ol**in" and "k**e**r**st**in" - 3 - "k
**athr**in" and "k**erst**in" - 4 **0000**and**1111**- 4- 2
**17**3**8**96 and 2**23**3**7**96 - 3

To compare two lists of strings or numbers and find similarity we can do:

```
list1 = [1, 2, 3, 4, 5]
list2 = [2, 4, 6, 8]
distance = sum(i != j for i, j in zip(list1, list2))
similarity = 1 - (distance / len(list1))
print(similarity)
```

Similarity of the both lists by comparing with Hamming distance is:

```
0.0
```

we can measure hamming distance also by using module - `scipy`

- but arrays should have equal size:

```
from scipy.spatial import distance
l1 = [1, 3, 4, 5, 2]
l2 = [2, 4, 6, 8, 5]
d = round(distance.hamming(l1, l2) * len(l1))
print(d)
```

result:

```
5
```

## Summary

In this post, we saw how to compare two lists in Python and calculate the similarity between them.

We saw detailed examples for 7 different techniques to compute similarity. The examples show the basics for comparison of sets, arrays or lists. To learn more you can read the materials in Resources.