Python flatten dictionary with pandas
Dictionaries (maps) are very common data structures in programming and data work. Sometimes you need to access their data in a flattened format. This can be done in several ways. One example is shown below, which collects the inner values from the lists nested inside a dictionary:
data = {
    'Java': {
        'OOP': ['a', 'v', 'x'],
        'NN': ['y', 't', 'z'],
        'DS': ['o', 'b', 'd'],
    },
    'Python': {
        'OOP': ['a', 'v', 'z'],
        'NN': ['y', 'p', 'o'],
        'DS': ['q', 'd', 'f'],
    },
    'Lua': {
        'OOP': ['x', 'v', 'n'],
        'NN': ['h', 'm', 'x'],
        'DS': ['e', 'i', 'c'],
    }
}

# walk languages, then topics, then the items of each inner list
dict_flatted = [i for names in data.values() for unit in names.values() for i in unit]
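result (the items follow the dictionary's iteration order; the output below comes from an interpreter that did not preserve insertion order, while Python 3.7+ yields the items in insertion order, starting with 'a', 'v', 'x'):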
['y', 'p', 'o', 'a', 'v', 'z', 'q', 'd', 'f', 'y', 't', 'z', 'a', 'v', 'x', 'o', 'b', 'd', 'h', 'm', 'x', 'x', 'v', 'n', 'e', 'i', 'c']
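Since the text mentions that there are several ways to do this, here is a minimal sketch of one alternative using itertools.chain.from_iterable from the standard library. It assumes the same two-level layout (dictionary of dictionaries of lists); the variable names flat and topics are only illustrative.

```python
from itertools import chain

data = {
    'Java': {'OOP': ['a', 'v', 'x'], 'NN': ['y', 't', 'z'], 'DS': ['o', 'b', 'd']},
    'Python': {'OOP': ['a', 'v', 'z'], 'NN': ['y', 'p', 'o'], 'DS': ['q', 'd', 'f']},
    'Lua': {'OOP': ['x', 'v', 'n'], 'NN': ['h', 'm', 'x'], 'DS': ['e', 'i', 'c']},
}

# Collect every inner list, then let chain.from_iterable concatenate them
inner_lists = (values for topics in data.values() for values in topics.values())
flat = list(chain.from_iterable(inner_lists))

print(flat)
```

Both versions visit the inner lists in dictionary iteration order, so they should produce the same list; which one reads better is mostly a matter of taste.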
You can get a similar result by combining the dictionary with pandas. Let's look at the different stages of the data transformation with pandas. To achieve the same result we will use json_normalize:
# pandas >= 1.0 exposes json_normalize at the top level;
# older releases used: from pandas.io.json import json_normalize
from pandas import json_normalize

data = {
    'Java': {
        'OOP': ['a', 'v', 'x'],
        'NN': ['y', 't', 'z'],
        'DS': ['o', 'b', 'd'],
    },
    'Python': {
        'OOP': ['a', 'v', 'z'],
        'NN': ['y', 'p', 'o'],
        'DS': ['q', 'd', 'f'],
    },
    'Lua': {
        'OOP': ['x', 'v', 'n'],
        'NN': ['h', 'm', 'x'],
        'DS': ['e', 'i', 'c'],
    }
}

# one row, one column per language/topic pair
norm = json_normalize(data)
print(norm)
result:
Java.DS Java.NN Java.OOP ... Python.DS Python.NN Python.OOP
0 [o, b, d] [y, t, z] [a, v, x] ... [q, d, f] [y, p, o] [a, v, z]
[1 rows x 9 columns]
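The column names above (Java.DS, Java.NN and so on) are the nested keys joined with a dot. The following is a rough pure-Python sketch of that key-joining idea, not pandas's actual implementation; the helper name dotted_keys and the small sample dictionary are made up for illustration.

```python
def dotted_keys(nested, prefix="", sep="."):
    """Return a flat dict whose keys join the nested keys with `sep`."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # descend into nested dictionaries, carrying the joined prefix
            flat.update(dotted_keys(value, name, sep))
        else:
            # lists and scalars are kept as the value of the joined key
            flat[name] = value
    return flat


# hypothetical, shortened sample in the same shape as the article's data
sample = {
    'Java': {'OOP': ['a', 'v', 'x'], 'NN': ['y', 't', 'z']},
    'Lua': {'DS': ['e', 'i', 'c']},
}

record = dotted_keys(sample)
print(record)
```

Each key of record corresponds to one column of the normalized frame, and the inner lists stay intact as cell values, which is why further loops are needed to reach the individual items.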
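The previous result shows the normalized form of the dictionary data: a single row with one column per language/topic pair. Iterating over the DataFrame yields its column names, so we can access the data like this:
for x in norm:
    print(x)        # column name, e.g. Java.DS
    print(norm[x])  # the column as a Series
This results in: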
Java.DS
0 [o, b, d]
Name: Java.DS, dtype: object
Java.NN
0 [y, t, z]
Name: Java.NN, dtype: object
Java.OOP
0 [a, v, x]
Name: Java.OOP, dtype: object
Lua.DS
0 [e, i, c]
Name: Lua.DS, dtype: object
...
If we want, we can get the inner lists of each row gathered into a single list:
for x in norm.values:
    print(x.tolist())
result:
[['o', 'b', 'd'], ['y', 't', 'z'], ['a', 'v', 'x'], ['e', 'i', 'c'], ['h', 'm', 'x'], ['x', 'v', 'n'], ['q', 'd', 'f'], ['y', 'p', 'o'], ['a', 'v', 'z']]
Getting the inner lists one by one can be done by nesting for loops:
for x in norm.values:
    for y in x:
        print(y)
which results in (first three lines shown):
['o', 'b', 'd']
['y', 't', 'z']
['a', 'v', 'x']
Finally, to get the fully flattened values from the dictionary with pandas, add one more loop over each inner list:
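# pandas >= 1.0 exposes json_normalize at the top level;
# older releases used: from pandas.io.json import json_normalize
from pandas import json_normalize

data = {
    'Java': {
        'OOP': ['a', 'v', 'x'],
        'NN': ['y', 't', 'z'],
        'DS': ['o', 'b', 'd'],
    },
    'Python': {
        'OOP': ['a', 'v', 'z'],
        'NN': ['y', 'p', 'o'],
        'DS': ['q', 'd', 'f'],
    },
    'Lua': {
        'OOP': ['x', 'v', 'n'],
        'NN': ['h', 'm', 'x'],
        'DS': ['e', 'i', 'c'],
    }
}

norm = json_normalize(data)

for x in norm.values:      # each row of the frame
    for y in x:            # each cell, which holds an inner list
        for z in y:        # each item of that list
            print(z, end='; ')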
result:
o; b; d; y; t; z; a; v; x; e; i; c; h; m; x; x; v; n; q; d; f; y; p; o; a; v; z;
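The three nested loops above rely on every value sitting exactly two dictionary levels deep. When the nesting depth varies, a recursive generator is one possible pandas-free alternative. This is only a sketch under that assumption; the function name iter_leaves and the uneven sample data are hypothetical and not part of the original example.

```python
def iter_leaves(obj):
    """Yield the scalar items from arbitrarily nested dicts, lists and tuples."""
    if isinstance(obj, dict):
        for value in obj.values():
            yield from iter_leaves(value)
    elif isinstance(obj, (list, tuple)):
        for item in obj:
            yield from iter_leaves(item)
    else:
        # anything else (here: a string) is treated as a leaf value
        yield obj


# hypothetical data with uneven nesting depth
uneven = {
    'Java': {'OOP': ['a', 'v'], 'NN': {'deep': ['y']}},
    'Lua': ['e', 'i'],
}

leaves = list(iter_leaves(uneven))
print('; '.join(leaves))
```

Strings are treated as leaves rather than iterated character by character, which matches how the single-letter values are used in this article.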