How to Extract Location Names (Country, City, Tourist Places) Using Python and NLP

In this short article we will see how to extract structured location data in Python, such as:

continent
country
city
location
and tourist places

from free text.

This is a common task in data mining, NLP, travel apps, and web scraping. Python offers several practical approaches, depending on the accuracy you need and where your data comes from.

Problem Overview

Free-form text often contains ambiguous location references like:

"Paris, France"
"Eiffel Tower"
"New York City"
"London"
"Visited Rome and Vatican City"

The challenge is to normalize and classify these into structured fields like:

Country
City
Continent
Landmark / Tourist place
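One way to make the target structure concrete is a small record type. The sketch below is only an illustration: the `Place` class and its field names are assumptions chosen to mirror the list above, not part of any library, and the two records are filled in by hand.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class Place:
    """Hypothetical container for one normalized location mention."""
    raw_text: str                       # the mention as it appeared in the text
    city: Optional[str] = None
    country: Optional[str] = None
    continent: Optional[str] = None
    landmark: Optional[str] = None      # tourist place, monument, etc.


# Hand-filled examples showing the intended shape of the output.
paris = Place(raw_text="Paris, France", city="Paris",
              country="France", continent="Europe")
tower = Place(raw_text="Eiffel Tower", landmark="Eiffel Tower",
              city="Paris", country="France", continent="Europe")

print(asdict(paris))
print(asdict(tower))
```

The methods below each fill in only some of these fields, which is one reason they are often combined.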

1. Named Entity Recognition (NER) with spaCy

spaCy can automatically detect geopolitical entities (GPE) and other locations (LOC).

Example

import spacy

nlp = spacy.load("en_core_web_sm")
text = "I visited Paris and the Eiffel Tower in France. London is capital of England. Real Madrid will play in Madrid."

doc = nlp(text)

for ent in doc.ents:
    if ent.label_ in ("GPE", "LOC"):
        print(ent.text, ent.label_)

result:

Paris GPE
France GPE
London GPE
England GPE
Madrid GPE
Madrid GPE

As we can see, for the sentence "Real Madrid will play in Madrid", Madrid is extracted twice: once from the club name "Real Madrid" and once as the city.
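If repeated mentions are not wanted, the pairs can be deduplicated after extraction. The helper below is a minimal sketch; `unique_entities` is a hypothetical name, and it works on plain `(text, label)` tuples so it does not depend on spaCy itself.

```python
def unique_entities(pairs):
    """Drop repeated (text, label) pairs while keeping first-seen order."""
    seen = set()
    result = []
    for text, label in pairs:
        key = (text.strip().lower(), label)   # compare case-insensitively
        if key not in seen:
            seen.add(key)
            result.append((text, label))
    return result


# Pairs copied from the output above.
pairs = [("Paris", "GPE"), ("France", "GPE"), ("London", "GPE"),
         ("England", "GPE"), ("Madrid", "GPE"), ("Madrid", "GPE")]

print(unique_entities(pairs))
```

With spaCy, the input could be built as `[(ent.text, ent.label_) for ent in doc.ents]`. Deduplication only removes the repeat; it does not tell the football club apart from the city.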

Use Case

Good for:

Quick extraction
General country and city names
Lightweight processing

Limitations:

May miss tourist attractions
Limited normalization
Can return mixed results (for example, "Madrid" taken from "Real Madrid")
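One possible response to the first limitation is to also accept spaCy's FAC label (buildings, airports, bridges and similar facilities) and to sort entities into fields by label. The sketch below assumes that mapping; the field names and the `group_by_field` helper are hypothetical, and the sample pairs are written by hand rather than produced by a model. Whether a given model tags a particular landmark as FAC is not guaranteed.

```python
# Assumed mapping from spaCy's English label names to the fields used here.
LABEL_TO_FIELD = {
    "GPE": "countries_or_cities",   # countries, cities, states
    "LOC": "locations",             # mountain ranges, rivers, regions
    "FAC": "landmarks",             # buildings, bridges, airports
}


def group_by_field(pairs):
    """Sort (text, label) pairs into fields; other labels are ignored."""
    grouped = {field: [] for field in LABEL_TO_FIELD.values()}
    for text, label in pairs:
        field = LABEL_TO_FIELD.get(label)
        if field is not None:
            grouped[field].append(text)
    return grouped


# Hand-written sample, not real model output.
sample = [("Paris", "GPE"), ("Eiffel Tower", "FAC"),
          ("Real Madrid", "ORG"), ("Alps", "LOC")]

print(group_by_field(sample))
```

In the spaCy loop above, this would correspond to widening the filter to `("GPE", "LOC", "FAC")`.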

2. Geocoding APIs (Google, Nominatim, OpenCage)

APIs can resolve place names into structured components.

Example with geopy (Nominatim)

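from geopy.geocoders import Nominatim

# user_agent identifies your application to the Nominatim service
geolocator = Nominatim(user_agent="measurements")
location = geolocator.geocode("Eiffel Tower")

# geocode() returns None when nothing matches the query
if location is not None:
    print(location.address)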

result:

Tour Eiffel, 5, Avenue Anatole France, Quartier du Gros-Caillou, Paris 7e Arrondissement, Paris, Île-de-France, France métropolitaine, 75007, France

P.S. If you get the error below, try using 'Photon' instead of 'Nominatim', or change the user_agent to a value specific to your application:

geopy.exc.GeocoderInsufficientPrivileges: HTTP Error 403: Forbidden

Photon example:

from geopy.geocoders import Photon
geolocator = Photon(user_agent="measurements")
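The formatted address is a single string, but structured fields are usually more useful. With Nominatim in geopy, passing `addressdetails=True` to `geocode` adds an `address` dictionary to `location.raw`. The sketch below shows one way to pick city and country out of such a dictionary. The `pick_components` helper is hypothetical, and the sample dictionary is written by hand to resemble a response, so the exact keys for a real place may differ.

```python
def pick_components(address):
    """Pull city and country out of a Nominatim-style address dict."""
    # Settlements of different sizes are reported under different keys.
    city = next(
        (address[key] for key in ("city", "town", "village") if key in address),
        None,
    )
    return {
        "city": city,
        "country": address.get("country"),
        "country_code": address.get("country_code"),
    }


# Illustrative dict shaped like location.raw["address"]; not a real response.
sample_address = {
    "tourism": "Tour Eiffel",
    "road": "Avenue Anatole France",
    "city": "Paris",
    "state": "Île-de-France",
    "postcode": "75007",
    "country": "France",
    "country_code": "fr",
}

print(pick_components(sample_address))
```

In a live call the input would come from `geolocator.geocode("Eiffel Tower", addressdetails=True).raw["address"]`. Other geocoders such as Photon lay out `raw` differently, so the keys would need adjusting.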

Use Case

Best for:

Tourist places
Exact city/country resolution
High accuracy

Limitations:

API rate limits
Requires internet access
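Rate limits are easier to respect when repeated queries are cached and requests are spaced out. The wrapper below is a minimal sketch of that idea using only the standard library; `make_polite_geocoder` is a hypothetical name, and a stand-in function replaces the real geocoder so the example runs offline. geopy also ships a ready-made `RateLimiter` in `geopy.extra.rate_limiter`, which may be the simpler choice in practice.

```python
import time


def make_polite_geocoder(geocode, min_delay_seconds=1.0):
    """Wrap a geocode callable with a cache and a minimum delay between calls."""
    cache = {}
    last_call = None

    def lookup(query):
        nonlocal last_call
        if query in cache:                  # repeated queries never hit the API
            return cache[query]
        if last_call is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call)
            if wait > 0:
                time.sleep(wait)
        last_call = time.monotonic()
        cache[query] = geocode(query)
        return cache[query]

    return lookup


# Stand-in for geolocator.geocode so the sketch runs without network access.
calls = []

def fake_geocode(query):
    calls.append(query)
    return f"result for {query}"


lookup = make_polite_geocoder(fake_geocode, min_delay_seconds=0.01)
lookup("Eiffel Tower")
lookup("Eiffel Tower")   # served from the cache
lookup("Rome")
print(calls)
```

With a real service, `fake_geocode` would be replaced by `geolocator.geocode`, and the delay should follow the provider's usage policy (the public Nominatim server asks for at most one request per second).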

3. LocationTagger Library

LocationTagger combines NER + GeoNames data.

Example

import locationtagger
import nltk
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')

text = "Unlike India and Japan, A winter weather advisory remains in effect through 5 PM along and east of a line from Blue Earth, to Red Wing line in Minnesota and continuing to along an Ellsworth, to Menomonie, and Chippewa Falls line in Wisconsin."

entities = locationtagger.find_locations(text = text)

print(entities.countries)
print(entities.cities)

Output:

['India', 'Japan']
['Ellsworth', 'Red Wing', 'Blue Earth', 'Chippewa Falls', 'Menomonie']

P.S. If you get this error:

ImportError: cannot import name 'CACHE_DIRECTORY' from 'newspaper.settings'

you may need to update the version of the newspaper package.

Use Case

Useful for:

Batch text processing
Simple country/city extraction

4. Country named entity recognition in Python

We are going to use the Python package country-named-entity-recognition. Its basic usage looks like this:

from country_named_entity_recognition import find_countries
print(find_countries("We are expanding in the UK, then in Spain"))

The result is the extraction of two countries, the UK and Spain:

[(Country(alpha_2='GB', alpha_3='GBR', flag='🇬🇧', name='United Kingdom', numeric='826', official_name='United Kingdom of Great Britain and Northern Ireland'), <re.Match object; span=(24, 26), match='UK'>), (Country(alpha_2='ES', alpha_3='ESP', flag='🇪🇸', name='Spain', numeric='724', official_name='Kingdom of Spain'), <re.Match object; span=(36, 41), match='Spain'>)]
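The raw result is a list of (country, match) pairs, which is verbose to read. The sketch below flattens such pairs into small dictionaries. It assumes only what the output above shows: a country object with `name` and `alpha_2` attributes, and a standard `re.Match`. The `summarize` helper and the `FakeCountry` stand-in are hypothetical and exist so the example runs without the package installed.

```python
import re
from collections import namedtuple


def summarize(results):
    """Turn (country, match) pairs into plain dictionaries."""
    rows = []
    for country, match in results:
        rows.append({
            "name": country.name,
            "alpha_2": country.alpha_2,
            "matched": match.group(0),   # the text that triggered the match
            "span": match.span(),        # (start, end) offsets in the input
        })
    return rows


# Stand-ins that mimic the shape of the output shown above.
FakeCountry = namedtuple("FakeCountry", ["name", "alpha_2"])
text = "We are expanding in the UK, then in Spain"
fake_results = [
    (FakeCountry("United Kingdom", "GB"), re.search(r"UK", text)),
    (FakeCountry("Spain", "ES"), re.search(r"Spain", text)),
]

for row in summarize(fake_results):
    print(row)
```

With the real package, `summarize(find_countries(text))` would be the equivalent call, provided the returned objects keep the attributes seen in the output above.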

When to Use Each Method

Scenario	Best Tool
Quick city/country extraction	spaCy
Tourist places & accuracy	Geocoding APIs
Bulk text processing	LocationTagger
Country-only extraction	country-named-entity-recognition

Conclusion

To extract country, city, and tourist places in Python:

Use spaCy for fast NER
Use geocoding APIs for accurate place resolution
Use LocationTagger for large-scale text mining
Use country-named-entity-recognition when only countries are needed

For production systems, combining NER with geocoding provides the best results.
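A combined pipeline can be sketched as two pluggable steps: extract candidate names, then resolve each distinct name once. The code below is an assumption about how such a pipeline might be wired, not a reference implementation. `extract_and_resolve` is a hypothetical helper, and both steps are replaced by offline stand-ins with hand-written data.

```python
def extract_and_resolve(text, extract, geocode):
    """Run an extraction step, then geocode each distinct name once."""
    resolved = {}
    for name in extract(text):
        if name not in resolved:
            resolved[name] = geocode(name)   # None when nothing is found
    return resolved


# Offline stand-ins for an NER step and a geocoding step.
def fake_extract(text):
    return [name for name in ("Rome", "Vatican City") if name in text]

FAKE_INDEX = {"Rome": "Rome, Italy"}

def fake_geocode(name):
    return FAKE_INDEX.get(name)


result = extract_and_resolve("Visited Rome and Vatican City",
                             fake_extract, fake_geocode)
print(result)
```

In a real setup, `extract` could wrap the spaCy loop from section 1 and `geocode` could wrap `geolocator.geocode` from section 2.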
