In this short article we will see how to extract structured location data in Python, such as:
- continent
- country
- city
- location
- tourist places
from free text.
This is a common task in data mining, NLP, travel apps, and web scraping. Python offers several practical approaches, depending on the accuracy you need and where the data comes from.
Problem Overview
Free-form text often contains ambiguous location references like:
- "Paris, France"
- "Eiffel Tower"
- "New York City"
- "London"
- "Visited Rome and Vatican City"
The challenge is to normalize and classify these into structured fields like:
- Country
- City
- Continent
- Landmark / tourist place, etc.
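As a rough illustration of the target, the structured fields above could be held in a small record type. This is only a sketch: the `PlaceRecord` name and its fields are hypothetical and chosen for this example, and the two records are filled in by hand rather than produced by any library.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class PlaceRecord:
    """One normalized location mention (hypothetical schema for illustration)."""
    raw_text: str                      # the mention as it appeared in the text
    landmark: Optional[str] = None     # e.g. a tourist place
    city: Optional[str] = None
    country: Optional[str] = None
    continent: Optional[str] = None


# Two of the ambiguous mentions above, filled in by hand to show the goal.
records = [
    PlaceRecord("Paris, France", city="Paris", country="France", continent="Europe"),
    PlaceRecord("Eiffel Tower", landmark="Eiffel Tower", city="Paris",
                country="France", continent="Europe"),
]

for record in records:
    print(asdict(record))
```

The methods in the following sections each fill in some of these fields automatically.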
1. Named Entity Recognition (NER) with spaCy
spaCy can automatically detect geopolitical entities (GPE, such as countries and cities) and other locations (LOC, such as mountain ranges and bodies of water).
Example
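import spacy
# Load the small English pipeline (install it with: python -m spacy download en_core_web_sm)
nlp = spacy.load("en_core_web_sm")
text = "I visited Paris and the Eiffel Tower in France. London is capital of England. Real Madrid will play in Madrid."
doc = nlp(text)
# Keep only geopolitical entities (GPE) and other locations (LOC)
for ent in doc.ents:
    if ent.label_ in ("GPE", "LOC"):
        print(ent.text, ent.label_)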
result:
Paris GPE
France GPE
London GPE
England GPE
Madrid GPE
Madrid GPE
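As we can see, in the sentence "Real Madrid will play in Madrid" the name Madrid is extracted twice: once from the club name Real Madrid and once as the city itself. The Eiffel Tower is not returned at all with these two labels.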
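Repeated mentions such as the two Madrid entries can be collapsed after extraction. The helper below is a minimal sketch, not part of spaCy: `unique_places` is a hypothetical name, and it works on plain (text, label) pairs so it runs without the model installed.

```python
def unique_places(entities):
    """Return (text, label) pairs with case-insensitive duplicates removed, keeping order."""
    seen = set()
    unique = []
    for text, label in entities:
        key = (text.strip().lower(), label)
        if key not in seen:
            seen.add(key)
            unique.append((text.strip(), label))
    return unique


# The pairs printed by the spaCy example above.
found = [
    ("Paris", "GPE"), ("France", "GPE"), ("London", "GPE"),
    ("England", "GPE"), ("Madrid", "GPE"), ("Madrid", "GPE"),
]
print(unique_places(found))
```

With spaCy, the input could be built as `[(ent.text, ent.label_) for ent in doc.ents if ent.label_ in ("GPE", "LOC")]`. Note that this only removes repeats; it cannot tell that one of the mentions came from an organization name.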
Use Case
Good for:
- Quick extraction
- General country and city names
- Lightweight processing
Limitations:
- May miss tourist attractions
- Limited normalization
- Can return mixed results, such as a place name that is part of an organization name
2. Geocoding APIs (Google, Nominatim, OpenCage)
APIs can resolve place names into structured components.
Example with geopy (Nominatim)
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="measurements")
location = geolocator.geocode("Eiffel Tower")
print(location.address)
result:
Tour Eiffel, 5, Avenue Anatole France, Quartier du Gros-Caillou, Paris 7e Arrondissement, Paris, Île-de-France, France métropolitaine, 75007, France
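The address above is a single string. Nominatim can also return the address split into components: with geopy, passing `addressdetails=True` to `geocode` makes them available under `location.raw["address"]`. The sketch below shows one way such a dict might be reduced to the fields we care about. The `pick_components` helper is hypothetical, the sample dict is hand-written for illustration rather than a captured response, and the key names (`tourism`, `city`, `town` and so on) are assumptions that vary from place to place.

```python
def pick_components(address):
    """Reduce a Nominatim-style address dict to landmark, city and country."""
    # Settlements can appear under different keys depending on their size.
    city = next(
        (address[key] for key in ("city", "town", "village", "municipality")
         if key in address),
        None,
    )
    return {
        "landmark": address.get("tourism"),
        "city": city,
        "country": address.get("country"),
        "country_code": (address.get("country_code") or "").upper() or None,
    }


# Illustrative dict, hand-written for this example (not a captured API response).
sample_address = {
    "tourism": "Tour Eiffel",
    "road": "Avenue Anatole France",
    "city": "Paris",
    "state": "Île-de-France",
    "postcode": "75007",
    "country": "France",
    "country_code": "fr",
}
print(pick_components(sample_address))
```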
P.S. If you get the following error, try changing the user_agent or using 'Photon' instead of 'Nominatim':
geopy.exc.GeocoderInsufficientPrivileges: HTTP Error 403: Forbidden
For example:
from geopy.geocoders import Photon
geolocator = Photon(user_agent="measurements")
Use Case
Best for:
- Tourist places
- Exact city/country resolution
- High accuracy
Limitations:
- API rate limits
- Requires internet access
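Rate limits are easier to live with if repeated queries are cached and new ones are spaced out (the public Nominatim service, for instance, asks for at most one request per second). The wrapper below is a minimal sketch under those assumptions; `make_polite` is a hypothetical helper, and a stand-in function replaces the real geocoder so the example runs offline.

```python
import time
from functools import lru_cache


def make_polite(geocode, min_delay_seconds=1.0, cache_size=1024):
    """Wrap a geocode callable so repeated queries are cached and new ones are spaced out."""
    last_call = [float("-inf")]  # mutable cell holding the time of the last real request

    @lru_cache(maxsize=cache_size)
    def wrapper(query):
        wait = min_delay_seconds - (time.monotonic() - last_call[0])
        if wait > 0:
            time.sleep(wait)
        try:
            return geocode(query)
        finally:
            last_call[0] = time.monotonic()

    return wrapper


# Stand-in for geolocator.geocode so the example runs offline.
calls = []


def fake_geocode(query):
    calls.append(query)
    return f"resolved:{query}"


polite = make_polite(fake_geocode, min_delay_seconds=0.01)
print(polite("Paris"), polite("Paris"), polite("Rome"))
print(calls)  # the second "Paris" was served from the cache
```

geopy also ships its own `RateLimiter` class in `geopy.extra.rate_limiter`, which takes a `min_delay_seconds` argument and may be preferable when caching is not needed.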
3. LocationTagger Library
LocationTagger combines NER + GeoNames data.
Example
import locationtagger
import nltk
nltk.download('averaged_perceptron_tagger_eng')
nltk.download('maxent_ne_chunker_tab')
text = "Unlike India and Japan, A winter weather advisory remains in effect through 5 PM along and east of a line from Blue Earth, to Red Wing line in Minnesota and continuing to along an Ellsworth, to Menomonie, and Chippewa Falls line in Wisconsin."
entities = locationtagger.find_locations(text=text)
print(entities.countries)
print(entities.cities)
Output:
['India', 'Japan']
['Ellsworth', 'Red Wing', 'Blue Earth', 'Chippewa Falls', 'Menomonie']
P.S. If you get the error:
ImportError: cannot import name 'CACHE_DIRECTORY' from 'newspaper.settings'
you may need to update the newspaper package to a newer version.
Use Case
Useful for:
- Batch text processing
- Simple country/city extraction
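For batch processing, the per-text results can be tallied across a whole collection. The sketch below assumes an extractor that maps a text to a list of place names; `count_mentions` and the toy extractor are hypothetical and exist only so the example runs without extra packages.

```python
from collections import Counter


def count_mentions(texts, extract):
    """Apply `extract` (text -> list of place names) to each text and tally the names."""
    counts = Counter()
    for text in texts:
        # Count each name once per document so one long text does not dominate.
        counts.update(set(extract(text)))
    return counts


# Toy extractor so the example runs without extra packages.
KNOWN = ("India", "Japan", "Minnesota", "Wisconsin")


def toy_extract(text):
    return [name for name in KNOWN if name in text]


docs = [
    "Unlike India and Japan, the advisory covers Minnesota.",
    "Snow is expected in Minnesota and Wisconsin.",
]
print(count_mentions(docs, toy_extract).most_common(1))
```

With LocationTagger, the extractor could be something like `lambda t: locationtagger.find_locations(text=t).countries`.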
4. Country named entity recognition in Python
Next, we use the Python package country-named-entity-recognition:
from country_named_entity_recognition import find_countries
print(find_countries("We are expanding in the UK, then in Spain"))
The result contains two countries, UK and Spain, each paired with the regex match that located it in the text:
[(Country(alpha_2='GB', alpha_3='GBR', flag='🇬🇧', name='United Kingdom', numeric='826', official_name='United Kingdom of Great Britain and Northern Ireland'), <re.Match object; span=(24, 26), match='UK'>), (Country(alpha_2='ES', alpha_3='ESP', flag='🇪🇸', name='Spain', numeric='724', official_name='Kingdom of Spain'), <re.Match object; span=(36, 41), match='Spain'>)]
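The alpha_2 codes in this result also give a simple route to the continent. The sketch below uses a tiny hand-written lookup table for illustration only; `CONTINENT_BY_ALPHA2` and `continent_for` are hypothetical names, and a real table would need to cover every ISO 3166-1 code.

```python
# Tiny hand-written table for illustration; a real one would cover every ISO code.
CONTINENT_BY_ALPHA2 = {
    "GB": "Europe",
    "ES": "Europe",
    "FR": "Europe",
    "IN": "Asia",
    "JP": "Asia",
    "US": "North America",
}


def continent_for(alpha_2):
    """Return the continent name for an ISO 3166-1 alpha-2 code, or None if unknown."""
    return CONTINENT_BY_ALPHA2.get(alpha_2.upper())


# Codes taken from the result above (UK -> GB, Spain -> ES).
for code in ("GB", "ES"):
    print(code, continent_for(code))
```

Since each item in the result is a (Country, match) pair, the lookup could be applied as `[continent_for(country.alpha_2) for country, match in find_countries(text)]`.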
When to Use Each Method
| Scenario | Best Tool |
|---|---|
| Quick city/country extraction | spaCy |
| Tourist places & accuracy | Geocoding APIs |
| Bulk text processing | LocationTagger |
| Country-only extraction with ISO codes | country-named-entity-recognition |
Conclusion
To extract country, city, and tourist places in Python:
- Use spaCy for fast NER
- Use geocoding APIs for accurate place resolution
- Use LocationTagger for large-scale text mining
- Use country-named-entity-recognition when only countries are needed
For production systems, combining NER with geocoding provides the best results.
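One possible shape for such a combined pipeline is sketched below. It is an assumption about how the pieces might fit together, not a reference implementation: `extract_places` is a hypothetical function, and both the NER step and the geocoder are passed in as callables, with offline stand-ins used here so the sketch runs as-is.

```python
def extract_places(text, find_entities, geocode):
    """Run NER over `text`, then resolve each unique mention with a geocoder."""
    results = []
    seen = set()
    for mention in find_entities(text):
        if mention in seen:
            continue
        seen.add(mention)
        resolved = geocode(mention)  # may be None when nothing matches
        results.append({"mention": mention, "resolved": resolved})
    return results


# Offline stand-ins so the sketch runs as-is.
def fake_ner(text):
    return [name for name in ("Rome", "Vatican City", "Atlantis") if name in text]


FAKE_GAZETTEER = {"Rome": "Rome, Italy", "Vatican City": "Vatican City"}


def fake_geocode(name):
    return FAKE_GAZETTEER.get(name)


for row in extract_places("Visited Rome and Vatican City", fake_ner, fake_geocode):
    print(row)
```

In real use, `find_entities` could wrap the spaCy loop from section 1 and `geocode` could call `geolocator.geocode`, which returns None when no match is found.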