Basemap实战---图形化显示海地地震危机数据
·
数据来源
https://github.com/wesm/pydata-book
>>> import pandas as pd
>>> from pandas import DataFrame,Series
>>> import numpy as np
>>> data = pd.read_csv('D:\python\DataAnalysis\data\Haiti.csv')
>>> data[:5]
Serial ... VERIFIED
0 4052 ... NO
1 4051 ... NO
2 4050 ... NO
3 4049 ... NO
4 4042 ... NO
[5 rows x 10 columns]
查看数据
看看那些数据是我们想要的,每行表示一条从某人手机上发生的紧急或其他问题的报告,每条报告包含一个时间戳和位置(经度和纬度)
['INCIDENT DATE','LATITUDE','LONGITUDE'] 将要查看的列索引
[:10] 前十行
>>> data[['INCIDENT DATE','LATITUDE','LONGITUDE']][:10]
INCIDENT DATE LATITUDE LONGITUDE
0 05/07/2010 17:26 18.233333 -72.533333
1 28/06/2010 23:06 50.226029 5.729886
2 24/06/2010 16:21 22.278381 114.174287
3 20/06/2010 21:59 44.407062 8.933989
4 18/05/2010 16:26 18.571084 -72.334671
5 26/04/2010 13:14 18.593707 -72.310079
6 26/04/2010 14:19 18.482800 -73.638800
7 26/04/2010 14:27 18.415000 -73.195000
8 15/03/2010 10:58 18.517443 -72.236841
9 15/03/2010 11:00 18.547790 -72.410010
category字段含有一组以逗号分隔的代码,这些代码表示消息的类型:
>>> data['CATEGORY'][:6]
0 1. Urgences | Emergency, 3. Public Health,
1 1. Urgences | Emergency, 2. Urgences logistiqu...
2 2. Urgences logistiques | Vital Lines, 8. Autr...
3 1. Urgences | Emergency,
4 1. Urgences | Emergency,
5 5e. Communication lines down,
Name: CATEGORY, dtype: object
调用describe还能发现一些异常的地理位置,根据经纬度判断
>>> data.describe()
Serial LATITUDE LONGITUDE
count 3593.000000 3593.000000 3593.000000
mean 2080.277484 18.611495 -72.322680
std 1171.100360 0.738572 3.650776
min 4.000000 18.041313 -74.452757
25% 1074.000000 18.524070 -72.417500
50% 2163.000000 18.539269 -72.335000
75% 3088.000000 18.561820 -72.293570
max 4052.000000 50.226029 114.174287
清理数据
错误位置信息并移除缺失分类信息,海地的经纬度的处理。
通过条件选择,过滤掉不在海地范围内的数据,并排除分类信息为空的数据
>>> data = data[(data.LATITUDE >18)&(data.LATITUDE<20)&(data.LONGITUDE > -75)&(data.LONGITUDE < -70)&(data.CATEGORY.notnull())]
>>> data.describe()
Serial LATITUDE LONGITUDE
count 3569.000000 3569.000000 3569.000000
mean 2081.498459 18.592503 -72.424994
std 1170.311824 0.273695 0.291018
min 4.000000 18.041313 -74.452757
25% 1074.000000 18.524200 -72.417498
50% 2166.000000 18.539269 -72.335000
75% 3089.000000 18.561800 -72.293939
max 4052.000000 19.940630 -71.099489
由于每个分类可能包含多个分类,每个名称除了英文还包含法文,所以编写了三个函数,前两个用于获取所有分类列表,第三个是获取编码和英文名称
to_cat_list(catstr):
移除字符串的首位空格并返回
get_all_categories(cat_series):
将传入的字符串处理后放到set集合中,并排序返回
get_english(cat):
处理名称,获得‘|’后面的英文字符。
>>> def to_cat_list(catstr):
... stripped = (x.strip() for x in catstr.split(','))
... return [x for x in stripped if x]
...
>>> def get_all_categories(cat_series):
... cat_sets = (set(to_cat_list(x)) for x in cat_series)
... return sorted(set.union(*cat_sets))
...
>>> def get_english(cat):
... code,names = cat.split('.')
... if '|' in names:
... names = names.split('|')[1]
... return code,names.strip()
...
测试get_english
>>> get_english('2.Urgences logistiques | Vital Lines')
('2', 'Vital Lines')
制作一个将编码和名称映射的字典
>>> all_cats = get_all_categories(data.CATEGORY)
>>> english_mapping = dict(get_english(x) for x in all_cats)
>>> english_mapping['2a']
'Food Shortage'
>>> english_mapping['6c']
'Earthquake and aftershocks'
构造全零的DataFrame,列为分类编码,索引跟data的索引一样
>>> def get_code(seq):
... return [x.split('.')[0] for x in seq if x]
...
>>> all_codes = get_code(all_cats)
>>> code_index = pd.Index(np.unique((all_codes)))
>>> dummy_frame = DataFrame(np.zeros((len(data),len(code_index))),index=data.index,columns=code_index)
>>> dummy_frame.ix[:,:6]
1 1a 1b 1c 1d 2
0 0.0 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 0.0 0.0
5 0.0 0.0 0.0 0.0 0.0 0.0
6 0.0 0.0 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0 0.0
8 0.0 0.0 0.0 0.0 0.0 0.0
9 0.0 0.0 0.0 0.0 0.0 0.0
10 0.0 0.0 0.0 0.0 0.0 0.0
11 0.0 0.0 0.0 0.0 0.0 0.0
12 0.0 0.0 0.0 0.0 0.0 0.0
13 0.0 0.0 0.0 0.0 0.0 0.0
14 0.0 0.0 0.0 0.0 0.0 0.0
15 0.0 0.0 0.0 0.0 0.0 0.0
16 0.0 0.0 0.0 0.0 0.0 0.0
17 0.0 0.0 0.0 0.0 0.0 0.0
18 0.0 0.0 0.0 0.0 0.0 0.0
19 0.0 0.0 0.0 0.0 0.0 0.0
20 0.0 0.0 0.0 0.0 0.0 0.0
21 0.0 0.0 0.0 0.0 0.0 0.0
22 0.0 0.0 0.0 0.0 0.0 0.0
23 0.0 0.0 0.0 0.0 0.0 0.0
24 0.0 0.0 0.0 0.0 0.0 0.0
25 0.0 0.0 0.0 0.0 0.0 0.0
26 0.0 0.0 0.0 0.0 0.0 0.0
27 0.0 0.0 0.0 0.0 0.0 0.0
28 0.0 0.0 0.0 0.0 0.0 0.0
29 0.0 0.0 0.0 0.0 0.0 0.0
30 0.0 0.0 0.0 0.0 0.0 0.0
31 0.0 0.0 0.0 0.0 0.0 0.0
32 0.0 0.0 0.0 0.0 0.0 0.0
... ... ... ... ... ...
3563 0.0 0.0 0.0 0.0 0.0 0.0
3564 0.0 0.0 0.0 0.0 0.0 0.0
3565 0.0 0.0 0.0 0.0 0.0 0.0
3566 0.0 0.0 0.0 0.0 0.0 0.0
3567 0.0 0.0 0.0 0.0 0.0 0.0
3568 0.0 0.0 0.0 0.0 0.0 0.0
3569 0.0 0.0 0.0 0.0 0.0 0.0
3570 0.0 0.0 0.0 0.0 0.0 0.0
3571 0.0 0.0 0.0 0.0 0.0 0.0
3572 0.0 0.0 0.0 0.0 0.0 0.0
3573 0.0 0.0 0.0 0.0 0.0 0.0
3574 0.0 0.0 0.0 0.0 0.0 0.0
3575 0.0 0.0 0.0 0.0 0.0 0.0
3576 0.0 0.0 0.0 0.0 0.0 0.0
3577 0.0 0.0 0.0 0.0 0.0 0.0
3578 0.0 0.0 0.0 0.0 0.0 0.0
3579 0.0 0.0 0.0 0.0 0.0 0.0
3580 0.0 0.0 0.0 0.0 0.0 0.0
3581 0.0 0.0 0.0 0.0 0.0 0.0
3582 0.0 0.0 0.0 0.0 0.0 0.0
3583 0.0 0.0 0.0 0.0 0.0 0.0
3584 0.0 0.0 0.0 0.0 0.0 0.0
3585 0.0 0.0 0.0 0.0 0.0 0.0
3586 0.0 0.0 0.0 0.0 0.0 0.0
3587 0.0 0.0 0.0 0.0 0.0 0.0
3588 0.0 0.0 0.0 0.0 0.0 0.0
3589 0.0 0.0 0.0 0.0 0.0 0.0
3590 0.0 0.0 0.0 0.0 0.0 0.0
3591 0.0 0.0 0.0 0.0 0.0 0.0
3592 0.0 0.0 0.0 0.0 0.0 0.0
将需要的项设置为1,在于data进行连接。
for row,cat in zip(data.index,data.CATEGORY):
codes = get_code(to_cat_list(cat))
dummy_frame.ix[row,codes] = 1
data = data.join(dummy_frame.add_prefix('category_'))
绘制海地地图
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
def basic_haiti_map(ax=None,lllat=17.25,urlat=20.25,lllon=-75,urlon=-71):
m = Basemap(ax=ax,projection='stere',
lon_0 = (urlon+lllon)/2,
lat_0 = (urlat+lllat)/2,
llcrnrlat = lllat,urcrnrlat = urlat,
llcrnrlon = lllon,urcrnrlon = urlon,
resolution = 'f')
m.drawcoastlines()
m.drawstates()
m.drawcountries()
return m
绘制点
fig,axes = plt.subplots(nrows = 2,ncols = 2,figsize=(12,10))
fig.subplots_adjust(hspace=0.05,wspace=0.05)
to_plot = ['2a','1','3c','7a']
maplist = []
for code,ax in zip(to_plot,axes.flat):
m = basic_haiti_map(ax)
maplist.append(m)
for code,ax ,m in zip(to_plot,axes.flat,maplist):
cat_data = data[data['category_%s' %code] == 1]
x, y = m(cat_data.LONGITUDE.values, cat_data.LATITUDE values)
m.plot(x,y,'k.',alpha=0.5)
ax.set_title('%s: %s' % (code,english_mapping[code]))
更多推荐
已为社区贡献4条内容
所有评论(0)