Wednesday, April 22, 2020

Data Handling Using Pandas and Data Visualization

UNIT-1
Data Handling using Pandas-I
i) Introduction to Python Libraries: Pandas, Matplotlib

A Python library is a collection of functions and methods that lets you perform many actions without writing the code yourself. For example, the Python Imaging Library (PIL) is one of the core libraries for image manipulation in Python.

❖ Python library: Matplotlib
Matplotlib is a Python library for plotting: with it you can draw line graphs, histograms, bar plots and many other kinds of charts. It provides many interfaces and functions for 2D graphics, in a style similar to MATLAB. It offers both a very quick way to visualize data from Python and publication-quality figures in many formats. The Matplotlib library offers several named collections of methods; Pyplot is one such interface.
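A minimal sketch of the Pyplot interface, assuming Matplotlib is installed. The month names, sales figures and marks are invented sample data. It draws the three kinds of chart mentioned above (a line graph, a bar plot and a histogram) side by side.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: draw without opening a window
import matplotlib.pyplot as plt

# Hypothetical sample data, made up only for this illustration
months = ["Jan", "Feb", "Mar", "Apr"]
sales = [120, 95, 140, 110]
marks = [45, 67, 72, 88, 90, 56, 61, 75, 80, 69]

plt.figure(figsize=(12, 3.5))

plt.subplot(1, 3, 1)                 # 1 row, 3 columns, first panel
plt.plot(months, sales, marker="o")  # line graph
plt.title("Line graph")

plt.subplot(1, 3, 2)
plt.bar(months, sales)               # bar plot
plt.title("Bar plot")

plt.subplot(1, 3, 3)
plt.hist(marks, bins=5)              # histogram with 5 bins
plt.title("Histogram")

plt.tight_layout()

# Save to an in-memory PNG; plt.savefig("charts.png") would write a file,
# and plt.show() would display the figure in an interactive session.
buffer = io.BytesIO()
plt.savefig(buffer, format="png")
plt.close()
```

Each `plt.` call acts on the "current" figure and axes, which is what makes Pyplot quick to use for simple charts.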


❖ Pandas: 
Pandas is an open-source, BSD-licensed (Berkeley Software Distribution) Python library providing high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Python with Pandas is used in a wide range of academic and commercial domains, including finance, economics, statistics and analytics. Pandas builds on packages like NumPy and Matplotlib to give us a single, convenient place for data analysis and visualization work.
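To illustrate how pandas sits on top of NumPy and Matplotlib, here is a small sketch assuming both are installed; the student names and marks are invented. A NumPy array becomes a labelled DataFrame, summary statistics are computed, and the built-in `plot` method hands the drawing to Matplotlib.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so no window is opened
import numpy as np
import pandas as pd

# Hypothetical marks stored first as a plain 2-D NumPy array
data = np.array([[72, 65], [88, 79], [65, 70], [91, 85]])

# pandas wraps the array and adds row labels (index) and column names
df = pd.DataFrame(data, columns=["Maths", "Science"],
                  index=["Asha", "Ravi", "Meena", "John"])

print(df.describe())   # count, mean, std, min, quartiles, max per column

arr = df.to_numpy()    # back to a NumPy array when needed

# DataFrame.plot draws with Matplotlib and returns a Matplotlib Axes
ax = df.plot(kind="bar", title="Marks by student")
# ax.figure.savefig("marks.png") would save the chart to a file
```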

     
ii) Data Structures in Pandas
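The pandas website calls it the “Python Data Analysis Library”; the name itself is derived from “panel data” and is also a play on “Python data analysis”. pandas is a fast, powerful, flexible and easy-to-use open-source data analysis and manipulation tool, built on top of the Python programming language.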
Key Features of Pandas
• Fast and efficient DataFrame object with default and customized indexing.
• Tools for loading data into in-memory data objects from different file formats.
• Data alignment and integrated handling of missing data.
• Reshaping and pivoting of data sets.
• Label-based slicing, indexing and subsetting of large data sets.
• Columns can be inserted into or deleted from a data structure.
• Grouping of data for aggregation and transformations.
• High-performance merging and joining of data.
• Time series functionality.
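A few of the features listed above can be tried in one short sketch. It assumes pandas is installed; the CSV text, employee names, departments and department heads are all hypothetical. It covers loading data, filling missing values, inserting and deleting a column, label-based slicing, grouping and merging.

```python
import io

import pandas as pd

# Hypothetical CSV text; pd.read_csv("file.csv") would read a real file the same way
csv_text = """name,dept,salary
Asha,Sales,52000
Ravi,IT,61000
Meena,Sales,
John,IT,58000
"""
staff = pd.read_csv(io.StringIO(csv_text))  # load into an in-memory DataFrame

# Missing data: Meena's salary is NaN; fill it with the column mean
staff["salary"] = staff["salary"].fillna(staff["salary"].mean())

# Insert a column, then delete it again
staff["bonus"] = staff["salary"] * 0.10
staff = staff.drop(columns="bonus")

# Label-based slicing: row labels 1 to 2 (inclusive with .loc), two columns
subset = staff.loc[1:2, ["name", "salary"]]

# Group by department and aggregate with the mean
avg_by_dept = staff.groupby("dept")["salary"].mean()

# Merge (join) with another hypothetical table on the common "dept" column
heads = pd.DataFrame({"dept": ["Sales", "IT"], "head": ["Kiran", "Dev"]})
merged = staff.merge(heads, on="dept", how="left")

print(subset)
print(avg_by_dept)
print(merged)
```

Filling with the mean is only one choice; `dropna()` would remove the incomplete row instead.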
The two main data structures of pandas are:
• Series
• DataFrame
Series:
A Series is a pandas data structure that represents a one-dimensional, array-like object containing an array of data and an associated array of data labels, called its index. Its characteristics are:
a) Size immutable
b) Data values mutable
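A small sketch of a Series, assuming pandas is installed and using invented student marks. The names act as the index, a value is changed in place (values are mutable), and `drop` shows that a size-changing operation returns a new Series by default instead of shrinking the original.

```python
import pandas as pd

# Hypothetical marks; the student names become the index (data labels)
marks = pd.Series([72, 88, 65, 91], index=["Asha", "Ravi", "Meena", "John"])

print(marks.index.tolist())  # ['Asha', 'Ravi', 'Meena', 'John']
print(marks["Ravi"])         # 88, looked up by label

# Data values are mutable: an existing entry can be changed in place
marks["Meena"] = 70
print(marks.tolist())        # [72, 88, 70, 91]

# drop() returns a new, shorter Series; the original keeps its size
shorter = marks.drop("John")
print(len(marks), len(shorter))  # 4 3
```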
DataFrame:
A DataFrame is a two-dimensional, array-like pandas data structure that stores an ordered collection of columns, and the columns can store data of different types.
Major characteristics of a DataFrame data structure are:
a) It has two indices, or two axes: a row index (axis=0) and a column index (axis=1).
b) Each value in a DataFrame is located by the combination of its row index and column index. The row index is generally just called the index, and the column index holds the column names.
c) The indices can be numbers, letters or strings.
d) You can easily change its values, i.e. it is mutable.
e) You can easily add or delete rows/columns in a DataFrame.
f) The columns do not all need to hold data of the same type.
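The characteristics above can be seen in a short sketch, assuming pandas is installed; the student records and row labels are invented. It uses string row labels, columns of different types, label-based access to a single value, in-place modification, and adding then deleting a row (axis=0) and a column (axis=1).

```python
import pandas as pd

# Hypothetical student records; the columns hold text, integers and floats
df = pd.DataFrame(
    {
        "name": ["Asha", "Ravi", "Meena"],
        "age": [16, 17, 16],
        "marks": [72.5, 88.0, 65.0],
    },
    index=["s1", "s2", "s3"],   # string row labels
)

print(df.index.tolist())    # ['s1', 's2', 's3']  row index (axis=0)
print(df.columns.tolist())  # ['name', 'age', 'marks']  column index (axis=1)
print(df.dtypes)            # one dtype per column: text, integer and float

# A single value is addressed by row label + column label
print(df.loc["s2", "marks"])  # 88.0

# Values are mutable
df.loc["s3", "marks"] = 70.0

# Add a row (pd.concat returns a new DataFrame) and a computed column
new_row = pd.DataFrame({"name": ["John"], "age": [17], "marks": [91.5]},
                       index=["s4"])
df = pd.concat([df, new_row])
df["passed"] = df["marks"] >= 70

# Delete them again: axis=0 refers to rows, axis=1 to columns
df = df.drop("s4", axis=0)
df = df.drop("passed", axis=1)
print(df)
```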