In this short article we will see how to extract structured location data in Python, such as:

  • continent
  • country
  • city
  • location
  • and tourist places

from free text.

This is a common task in data mining, NLP, travel apps, and web scraping. Python offers several practical approaches, and the right one depends on the accuracy you need and on your data source.

Problem Overview

Free-form text often contains ambiguous location references like:

  • "Paris, France"
  • "Eiffel Tower"
  • "New York City"
  • "London"
  • "Visited Rome and Vatican City"

The challenge is to normalize and classify these into structured fields like:

  • Country
  • City
  • Continent
  • Landmark / Tourist place etc
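To make the target concrete, here is a minimal sketch of what one structured record could look like. The class name PlaceRecord and its field names are hypothetical, chosen only for illustration, and the two records are filled in by hand rather than extracted.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PlaceRecord:
    """One normalized location mention (hypothetical schema for illustration)."""
    raw_text: str                     # the mention exactly as it appeared
    landmark: Optional[str] = None    # e.g. a tourist place
    city: Optional[str] = None
    country: Optional[str] = None
    continent: Optional[str] = None


# Two of the mentions listed above, filled in by hand to show the target shape.
records = [
    PlaceRecord(raw_text="Paris, France", city="Paris",
                country="France", continent="Europe"),
    PlaceRecord(raw_text="Eiffel Tower", landmark="Eiffel Tower", city="Paris",
                country="France", continent="Europe"),
]

for record in records:
    print(asdict(record))
```

The methods below differ mainly in which of these fields they can fill in on their own.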

1. Named Entity Recognition (NER) with spaCy

spaCy can automatically detect geopolitical entities (GPE) and other locations (LOC).

Example

import spacy

nlp = spacy.load("en_core_web_sm")
text = "I visited Paris and the Eiffel Tower in France. London is capital of England. Real Madrid will play in Madrid."

doc = nlp(text)

for ent in doc.ents:
    if ent.label_ in ("GPE", "LOC"):
        print(ent.text, ent.label_)

result:

Paris GPE
France GPE
London GPE
England GPE
Madrid GPE
Madrid GPE

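As we can see, Madrid is extracted twice from "Real Madrid will play in Madrid": once from the club name and once as the city. The Eiffel Tower is not extracted at all with these two labels.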

Use Case

Good for:

  • Quick extraction
  • General country and city names
  • Lightweight processing

Limitations:

  • May miss tourist attractions
  • Limited normalization
  • Can return duplicates or mislabeled entities (e.g. part of a club name tagged as a city)
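Two of these limitations can be softened with a little post-processing. The sketch below removes repeated entities and also accepts the FAC label, which spaCy's English models use for buildings and similar facilities. The helper names unique_places and places_in_text are hypothetical, and whether the small model actually tags a given landmark as FAC is not guaranteed.

```python
from types import SimpleNamespace


def unique_places(ents, labels=("GPE", "LOC", "FAC")):
    """Return (text, label) pairs for place-like entities, without repeats.

    `ents` is any iterable of objects with `.text` and `.label_`
    attributes, such as spaCy's `doc.ents`.
    """
    seen = set()
    places = []
    for ent in ents:
        key = (ent.text.strip().lower(), ent.label_)
        if ent.label_ in labels and key not in seen:
            seen.add(key)
            places.append((ent.text.strip(), ent.label_))
    return places


def places_in_text(text, model="en_core_web_sm"):
    """Run spaCy on `text` and return its unique place-like entities."""
    import spacy  # imported lazily so unique_places works without spaCy

    nlp = spacy.load(model)
    return unique_places(nlp(text).ents)


# Stand-in entities, so the helper can be tried without loading a model.
demo_ents = [
    SimpleNamespace(text="Paris", label_="GPE"),
    SimpleNamespace(text="Madrid", label_="GPE"),
    SimpleNamespace(text="Madrid", label_="GPE"),
    SimpleNamespace(text="Eiffel Tower", label_="FAC"),
    SimpleNamespace(text="Real Madrid", label_="ORG"),
]
print(unique_places(demo_ents))
```

Note that deduplication only hides the repeated "Madrid"; it does not tell you which mention was the football club. For that you still need to inspect the surrounding entity or resolve the name with a geocoder.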

2. Geocoding APIs (Google, Nominatim, OpenCage)

APIs can resolve place names into structured components.

Example with geopy (Nominatim)

from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="measurements")
location = geolocator.geocode("Eiffel Tower")

print(location.address)

result:

Tour Eiffel, 5, Avenue Anatole France, Quartier du Gros-Caillou, Paris 7e Arrondissement, Paris, Île-de-France, France métropolitaine, 75007, France

P.S. If you get the error below, try changing the user_agent to a name specific to your application, or use 'Photon' instead of 'Nominatim':

geopy.exc.GeocoderInsufficientPrivileges: HTTP Error 403: Forbidden

For example:

from geopy.geocoders import Photon
geolocator = Photon(user_agent="measurements")

Use Case

Best for:

  • Tourist places
  • Exact city/country resolution
  • High accuracy

Limitations:

  • API rate limits
  • Requires internet access
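The address string above is convenient to read but awkward to parse. A sketch of one way to get separate fields while staying under the rate limit: ask Nominatim for address details and wrap the call in geopy's RateLimiter. The helper names split_address and geocode_structured, the user_agent value, and the sample dictionary are assumptions made for illustration; the sample is hand-written in the shape of a Nominatim address object, not a recorded response.

```python
def split_address(address):
    """Map a Nominatim-style address dict to a few structured fields."""
    # Nominatim reports the settlement under different keys depending on size.
    settlement_keys = ("city", "town", "village", "municipality")
    city = next((address[k] for k in settlement_keys if k in address), None)
    return {
        "city": city,
        "country": address.get("country"),
        "country_code": (address.get("country_code") or "").upper() or None,
    }


def geocode_structured(queries, user_agent="my-location-extractor"):
    """Geocode each query, at most one request per second, and yield fields."""
    # Imported lazily so split_address can be used without geopy installed.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
    for query in queries:
        location = geocode(query, addressdetails=True, language="en")
        if location is None:          # nothing matched this query
            yield query, None
        else:
            yield query, split_address(location.raw.get("address", {}))


# Hand-written sample in the shape of a Nominatim "address" object.
sample = {"tourism": "Eiffel Tower", "city": "Paris",
          "country": "France", "country_code": "fr"}
print(split_address(sample))
```

With network access, list(geocode_structured(["Eiffel Tower", "Vatican City"])) would run the real lookups, one per second.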

3. LocationTagger Library

LocationTagger combines NER + GeoNames data.

Example

import locationtagger
import nltk
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')

text = "Unlike India and Japan, A winter weather advisory remains in effect through 5 PM along and east of a line from Blue Earth, to Red Wing line in Minnesota and continuing to along an Ellsworth, to Menomonie, and Chippewa Falls line in Wisconsin."

entities = locationtagger.find_locations(text = text)

print(entities.countries)
print(entities.cities)

Output:

['India', 'Japan']
['Ellsworth', 'Red Wing', 'Blue Earth', 'Chippewa Falls', 'Menomonie']

P.S. If you get the error:

ImportError: cannot import name 'CACHE_DIRECTORY' from 'newspaper.settings'

you may need to upgrade the newspaper package.

Use Case

Useful for:

  • Batch text processing
  • Simple country/city extraction

4. Country named entity recognition in Python

We are going to use the Python package country-named-entity-recognition, which finds country mentions in text:

from country_named_entity_recognition import find_countries
print(find_countries("We are expanding in the UK, then in Spain"))

The result is the extraction of two countries, the UK and Spain:

[(Country(alpha_2='GB', alpha_3='GBR', flag='🇬🇧', name='United Kingdom', numeric='826', official_name='United Kingdom of Great Britain and Northern Ireland'), <re.Match object; span=(24, 26), match='UK'>), (Country(alpha_2='ES', alpha_3='ESP', flag='🇪🇸', name='Spain', numeric='724', official_name='Kingdom of Spain'), <re.Match object; span=(36, 41), match='Spain'>)]
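Each result is a (Country, match) pair, so the ISO alpha_2 code can serve as a key for the one field none of the tools above returns directly: the continent. The following is a minimal sketch that assumes a hand-written lookup table. CONTINENT_BY_ALPHA2 and countries_with_continent are hypothetical names, the table is a tiny illustrative subset, and the demo uses stand-in objects instead of calling find_countries so that it runs without the package.

```python
import re
from types import SimpleNamespace

# Illustrative subset only: a real table needs every ISO 3166-1 alpha-2 code.
CONTINENT_BY_ALPHA2 = {
    "GB": "Europe", "ES": "Europe", "FR": "Europe",
    "IN": "Asia", "JP": "Asia",
    "US": "North America",
}


def countries_with_continent(matches):
    """Turn (country, regex_match) pairs into plain dicts with a continent."""
    rows = []
    for country, match in matches:
        rows.append({
            "mention": match.group(0),     # the text that was matched
            "country": country.name,
            "alpha_2": country.alpha_2,
            # None when the code is missing from the lookup table
            "continent": CONTINENT_BY_ALPHA2.get(country.alpha_2),
        })
    return rows


# Stand-ins with the same attributes as the pairs shown in the output above.
text = "We are expanding in the UK, then in Spain"
stand_ins = [
    (SimpleNamespace(name="United Kingdom", alpha_2="GB"), re.search("UK", text)),
    (SimpleNamespace(name="Spain", alpha_2="ES"), re.search("Spain", text)),
]
for row in countries_with_continent(stand_ins):
    print(row)
```

In real use, countries_with_continent(find_countries(text)) should work the same way, given the output shape shown above.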

When to Use Each Method

Scenario                         Best Tool
Quick city/country extraction    spaCy
Tourist places & accuracy        Geocoding APIs
Bulk text processing             LocationTagger
General extraction               country-named-entity-recognition

Conclusion

To extract country, city, and tourist places in Python:

  • Use spaCy for fast NER
  • Use geocoding APIs for accurate place resolution
  • Use LocationTagger for large-scale text mining
  • Use country-named-entity-recognition when you only need countries

For production systems, combining NER with geocoding provides the best results.
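One way to sketch that combination is a small pipeline that takes the NER step and the geocoding step as plain functions and caches lookups, so each distinct name is resolved only once. The name build_pipeline and the two fake functions are hypothetical; in practice the first would wrap something like spaCy and the second a geocoder.

```python
from functools import lru_cache


def build_pipeline(extract_names, resolve_name):
    """Combine an NER step with a geocoding step.

    extract_names(text) -> iterable of place-name strings
    resolve_name(name)  -> structured dict, or None if nothing matched
    Both callables are supplied by the caller.
    """
    # Cache results so each distinct name triggers only one lookup.
    cached_resolve = lru_cache(maxsize=1024)(resolve_name)

    def run(text):
        results = {}
        for name in extract_names(text):
            if name not in results:
                results[name] = cached_resolve(name)
        return results

    return run


# Stand-ins so the sketch runs offline; swap in real NER and geocoding calls.
lookups = []


def fake_ner(text):
    return [name for name in ("Madrid", "Paris", "Atlantis") if name in text]


def fake_geocode(name):
    lookups.append(name)  # record each real lookup to show the cache working
    known = {"Madrid": {"country": "Spain"}, "Paris": {"country": "France"}}
    return known.get(name)


pipeline = build_pipeline(fake_ner, fake_geocode)
print(pipeline("From Paris to Madrid, then on to Atlantis"))
```

Names the geocoder cannot resolve come back as None, which is a useful signal that the NER step may have picked up something that is not a real place.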
