CREATING A DataFrame
AND ACCESSING A SUBSET OF A DataFrame
Creating a Pandas DataFrame
A Pandas DataFrame is usually created by loading a dataset from
existing storage, such as a SQL database, a CSV file, or an Excel file. A
DataFrame can also be created from lists, from a dictionary, from a list of
dictionaries, and so on. Here are some of the ways to create a DataFrame:
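As a minimal sketch of two of the in-memory options mentioned above, the snippet below builds the same small table from a list of dictionaries and from a list of lists. The two-row sample data and the variable names are illustrative assumptions, not part of the original examples.

```python
import pandas as pd

# Hypothetical sample rows, reusing product names from this tutorial.
rows = [
    {"ITEM NAME": "Galaxy 5G", "EXPENDITURE": 25000},
    {"ITEM NAME": "GALAXY S", "EXPENDITURE": 28000},
]

# From a list of dictionaries: the dictionary keys become column names.
df_from_dicts = pd.DataFrame(rows)

# From a list of lists: the column names are supplied separately.
df_from_lists = pd.DataFrame(
    [["Galaxy 5G", 25000], ["GALAXY S", 28000]],
    columns=["ITEM NAME", "EXPENDITURE"],
)

print(df_from_dicts)
print(df_from_lists)
```

Both calls should give a table with two rows and the same two columns.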
1. Create a Pandas DataFrame from Lists
Code #1:
import pandas as pd

# Initialise a dictionary of lists describing Samsung products.
data = {
    'ITEM NAME': ['Galaxy 5G', 'GALAXY S', 'GALAXY TAB S6', 'GALAXY A', 'QLED 8K'],
    'EXPENDITURE': [25000, 28000, 32000, 29500, 45000],
    'ITEM CATEGORY': ['SMART PHONE', 'SMART PHONE', 'TABLET', 'SMART PHONE', 'TV']
}

# Create the DataFrame; df is the DataFrame object.
df = pd.DataFrame(data)

# Print the output.
print(df)
Output (the top row shows the column names):
       ITEM NAME  EXPENDITURE ITEM CATEGORY
0      Galaxy 5G        25000   SMART PHONE
1       GALAXY S        28000   SMART PHONE
2  GALAXY TAB S6        32000        TABLET
3       GALAXY A        29500   SMART PHONE
4        QLED 8K        45000            TV
· If we do not declare index names, the index defaults to integers
starting from 0.
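The default index can be inspected directly. The following is a small sketch, using a hypothetical one-column DataFrame, that shows what pandas assigns when no index argument is given.

```python
import pandas as pd

# Hypothetical one-column table; no index argument is passed.
df = pd.DataFrame({"EXPENDITURE": [25000, 28000, 32000]})

# pandas assigns consecutive integer labels starting at 0.
print(df.index)
print(list(df.index))
```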
Code #2: DataFrame from lists, with index and column names:
import pandas as pd

# Initialise a dictionary of lists describing Samsung products.
data = {
    'ITEM NAME': ['Galaxy 5G', 'GALAXY S', 'GALAXY TAB S6', 'GALAXY A', 'QLED 8K'],
    'EXPENDITURE': [25000, 28000, 32000, 29500, 45000],
    'ITEM CATEGORY': ['SMART PHONE', 'SMART PHONE', 'TABLET', 'SMART PHONE', 'TV']
}

# Create the DataFrame with explicit row labels.
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e'])

# Print the output.
print(df)
Output (the top row shows the column names):
       ITEM NAME  EXPENDITURE ITEM CATEGORY
a      Galaxy 5G        25000   SMART PHONE
b       GALAXY S        28000   SMART PHONE
c  GALAXY TAB S6        32000        TABLET
d       GALAXY A        29500   SMART PHONE
e        QLED 8K        45000            TV
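Selecting/accessing a column:
· Select one column using this syntax:
<DataFrame object>[<column name>]
Or
<DataFrame object>.<column name>
The dot form works only when the column name is a valid Python identifier,
so it cannot be used for a name that contains a space, such as 'ITEM NAME'.
· Select multiple columns using this syntax:
<DataFrame object>[[<column name>, <column name>, ...]]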
Example:
From Code #2
df['ITEM NAME']
       ITEM NAME
a      Galaxy 5G
b       GALAXY S
c  GALAXY TAB S6
d       GALAXY A
e        QLED 8K
df[['ITEM NAME', 'EXPENDITURE']]
       ITEM NAME  EXPENDITURE
a      Galaxy 5G        25000
b       GALAXY S        28000
c  GALAXY TAB S6        32000
d       GALAXY A        29500
e        QLED 8K        45000
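The two bracket forms return different kinds of object, which the short sketch below tries to make visible. It uses a hypothetical two-row DataFrame; the variable names are illustrative only.

```python
import pandas as pd

# Hypothetical two-row table in the style of Code #2.
df = pd.DataFrame(
    {"ITEM NAME": ["Galaxy 5G", "GALAXY S"], "EXPENDITURE": [25000, 28000]},
    index=["a", "b"],
)

one_column = df["EXPENDITURE"]    # single brackets give a Series
as_frame = df[["EXPENDITURE"]]    # double brackets give a DataFrame
same_series = df.EXPENDITURE      # dot access works: the name has no space

print(type(one_column).__name__)
print(type(as_frame).__name__)
```

A column such as 'ITEM NAME' has to be selected with brackets, because the space makes dot access impossible.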
Selecting/accessing a subset from a DataFrame using Row/Column Names
We can select/access a subset of a DataFrame object by row and column labels.
Both the start and the end labels are included in the result.
Syntax:
<DataFrame object>.loc[<start row>:<end row>, <start column>:<end column>]
· To access a single row, give just the row name/label.
Syntax:
<DataFrame object>.loc[<row label>, :]
Example:
df.loc['d']
· To access multiple rows
Syntax:
<DataFrame object>.loc[<start row>:<end row>, :]
Example:
df.loc['a':'b', :]
· To access selected columns
Syntax:
<DataFrame object>.loc[:, <start column>:<end column>]
Example:
df.loc[:, 'ITEM NAME':'EXPENDITURE']
· To access a range of columns from a range of rows
Syntax:
<DataFrame object>.loc[<start row>:<end row>, <start column>:<end column>]
Example:
df.loc['a':'b', 'ITEM NAME':'EXPENDITURE']
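The label-based forms above can be tried end to end with the sketch below, which rebuilds the DataFrame from Code #2. The list-based selection at the end (non-adjacent rows and columns) is an extra illustration not covered in the text above.

```python
import pandas as pd

data = {
    "ITEM NAME": ["Galaxy 5G", "GALAXY S", "GALAXY TAB S6", "GALAXY A", "QLED 8K"],
    "EXPENDITURE": [25000, 28000, 32000, 29500, 45000],
    "ITEM CATEGORY": ["SMART PHONE", "SMART PHONE", "TABLET", "SMART PHONE", "TV"],
}
df = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])

# Label slices include both endpoints: rows b, c, d and the first two columns.
subset = df.loc["b":"d", "ITEM NAME":"EXPENDITURE"]
print(subset)

# Lists of labels pick rows or columns that are not adjacent.
picked = df.loc[["a", "e"], ["ITEM NAME", "ITEM CATEGORY"]]
print(picked)
```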
Obtaining a subset/slice from a DataFrame using Row/Column Numeric Index/Position
We can also extract a subset of a DataFrame using numeric row and column
positions. For this we use iloc (integer location) instead of loc. Unlike loc,
the end position of an iloc slice is excluded from the result.
Syntax:
<DataFrame object>.iloc[<start row index>:<end row index>, <start col index>:<end col index>]
Example:
df.iloc[0:2, 0:2]
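To make the difference in slice endpoints concrete, the sketch below selects the same block once by position and once by label, again using the data from Code #2. The variable names are illustrative.

```python
import pandas as pd

data = {
    "ITEM NAME": ["Galaxy 5G", "GALAXY S", "GALAXY TAB S6", "GALAXY A", "QLED 8K"],
    "EXPENDITURE": [25000, 28000, 32000, 29500, 45000],
    "ITEM CATEGORY": ["SMART PHONE", "SMART PHONE", "TABLET", "SMART PHONE", "TV"],
}
df = pd.DataFrame(data, index=["a", "b", "c", "d", "e"])

# Positions 0 and 1 only: the end position 2 is excluded.
by_position = df.iloc[0:2, 0:2]

# Labels 'a' through 'b': the end label is included.
by_label = df.loc["a":"b", "ITEM NAME":"EXPENDITURE"]

print(by_position)
print(by_label)
```

Both selections should cover rows a and b and the columns ITEM NAME and EXPENDITURE.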
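· Selecting/accessing an individual value
Syntax:
<DataFrame object>.<column>[<row name or numeric index>]
Example:
df.EXPENDITURE['a']
For a column whose name contains a space, use the bracket form instead:
df['ITEM NAME']['a']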
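As an alternative to chaining a column selection with a row lookup, pandas also provides the scalar accessors at (by label) and iat (by position). The sketch below is one possible way to read a single cell, shown on a hypothetical two-row DataFrame.

```python
import pandas as pd

# Hypothetical two-row table in the style of Code #2.
df = pd.DataFrame(
    {"ITEM NAME": ["Galaxy 5G", "GALAXY S"], "EXPENDITURE": [25000, 28000]},
    index=["a", "b"],
)

value_by_label = df.at["a", "ITEM NAME"]    # row label, column label
value_by_position = df.iat[0, 0]            # row position, column position
value_via_loc = df.loc["a", "ITEM NAME"]    # loc with two scalar labels

print(value_by_label, value_by_position, value_via_loc)
```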
Questions:
· Create a DataFrame object df1 with the columns rollno, class, and phone
number, using the names of the students as the index.