Why Pandas?
Vectorization
Pandas DS - Series, DataFrame, Panel
No extensive data analysis/modeling.
Python - lists, dict
Dont reinvent wheel even if it is easy in Python.
Pandas - Defacto data analysis toolkit.
Useful for cleaning, analysing & modeling data.
Built on top of NumPy
Compact, Fast & Easy data manipulation.
Series - Fixed length ordered dict
In [28]: from pandas import Series
In [29]: s = Series([1, 2, 3, 4])
In [30]: s
Out[30]:
0 1
1 2
2 3
3 4
dtype: int64
In [17]: s = Series([34, -4, 45, 89], index=['a', 'b', 'c', 'd'])
In [18]: s['b']
Out[18]: -4
In [19]: s > 0
Out[19]:
a True
b False
c True
d True
dtype: bool
In [20]: s[s > 0]
Out[20]:
a 34
c 45
d 89
dtype: int64
Batch operations on data without for loops.
In [21]: l = [34, 67, 89]
In [22]: l * 2
Out[22]: [34, 67, 89, 34, 67, 89]
In [23]: s = Series([34, -4, 45, 89])
In [24]: s * 2
Out[24]:
0 68
1 -8
2 90
3 178
dtype: int64
In [30]: from pandas import DataFrame
In [31]: df = DataFrame([ [71, -2, -93, -34], [44, 5, 34, 7] ])
In [32]: df
Out[32]:
0 1 2 3
0 71 -2 -93 -34
1 44 5 34 7
In [34]: df[2]
Out[34]:
0 -93
1 34
Name: 2, dtype: int64
In [35]: df.ix[1]
Out[35]:
0 44
1 5
2 34
3 7
Name: 1, dtype: int64
In [36]: df.ix[1][2]
Out[36]: 34
In [37]: df[2].ix[1]
Out[37]: 34
In [64]: from pandas import Series, DataFrame, Panel
In [65]: p = Panel( [ [[1, 4, 1], [4, 66, 7]], [[23, 45, 56], [23, 4, 6]] ])
In [66]: p
Out[66]:
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 2 (major_axis) x 3 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 1
Minor_axis axis: 0 to 2
In [67]: p[0]
Out[67]:
0 1 2
0 1 4 1
1 4 66 7
[Docs] http://pandas.pydata.org/
[Book] Python for data analysis.