【Vol.5】 Pandas info and describe for data structure & stats

Before analyzing data in Python, it helps to understand its overall structure. The pandas methods info() and describe() show a DataFrame’s schema (columns, types, missing values) and its summary statistics. In this article, we’ll walk through beginner-friendly examples of both.

Dataset Used
Inspecting Structure with info()
Examining Summary Statistics with describe()
1. Including All Columns
2. Categorical Statistics Explained
Author’s Takeaway
Summary

Dataset Used

We’ll convert the following dictionary into a pandas DataFrame:

import pandas as pd

data = {
    "Name": ["Taro", "Hanako", "Jiro", "Mika", "Kenichi", "Keiko", "Sho", "Akane", "Takashi", "Aoi"],
    "Age": [23, 29, 35, 42, 18, 33, 27, 24, 31, 30],
    "Occupation": ["Engineer", "Designer", "Teacher", "Doctor", "Student", "Nurse", "Programmer", "Sales", "Lawyer", "Researcher"],
    "Annual Income (¥)": [4500000, 5500000, 4900000, 7300000, 0, 4000000, 6000000, 3200000, 8000000, 5800000],
    "Location": ["Tokyo", "Osaka", "Nagoya", "Sapporo", "Fukuoka", "Tokyo", "Kobe", "Sendai", "Yokohama", "Chiba"],
    "Years Employed": [2, 4, 10, 15, 1, 5, 3, 1, 12, 8]
}

df = pd.DataFrame(data)

DataFrame output:

df

	Name	Age	Occupation	Annual Income (¥)	Location	Years Employed
0	Taro	23	Engineer	4500000	Tokyo	2
1	Hanako	29	Designer	5500000	Osaka	4
2	Jiro	35	Teacher	4900000	Nagoya	10
3	Mika	42	Doctor	7300000	Sapporo	15
4	Kenichi	18	Student	0	Fukuoka	1
5	Keiko	33	Nurse	4000000	Tokyo	5
6	Sho	27	Programmer	6000000	Kobe	3
7	Akane	24	Sales	3200000	Sendai	1
8	Takashi	31	Lawyer	8000000	Yokohama	12
9	Aoi	30	Researcher	5800000	Chiba	8

Inspecting Structure with info()

df.info()

From this output, you can see:

Entries: 10 rows indexed 0–9, and 6 columns
Data types:
- Name (object): strings
- Age (int64): integers
- Occupation (object): strings
- Annual Income (¥) (int64): integers
- Location (object): strings
- Years Employed (int64): integers
Non-null counts: All columns have 10 non-null entries (no missing values)
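Memory usage: about 612 bytes (a small dataset). pandas marks this figure with “+” because, by default, it does not measure the string contents of object columns; df.info(memory_usage='deep') measures them.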

Note: int64 is a 64-bit integer type; object is the dtype pandas uses for strings or mixed types; a non-null value is one that is not missing (NaN or None).
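If you need the same facts as values in code rather than as a printed report, standard pandas attributes return them directly. This is a minimal sketch that rebuilds the article’s DataFrame and reads its shape, dtypes, non-null counts, and memory usage; it is one way to get these numbers, not the only one.

```python
import pandas as pd

data = {
    "Name": ["Taro", "Hanako", "Jiro", "Mika", "Kenichi", "Keiko", "Sho", "Akane", "Takashi", "Aoi"],
    "Age": [23, 29, 35, 42, 18, 33, 27, 24, 31, 30],
    "Occupation": ["Engineer", "Designer", "Teacher", "Doctor", "Student", "Nurse", "Programmer", "Sales", "Lawyer", "Researcher"],
    "Annual Income (¥)": [4500000, 5500000, 4900000, 7300000, 0, 4000000, 6000000, 3200000, 8000000, 5800000],
    "Location": ["Tokyo", "Osaka", "Nagoya", "Sapporo", "Fukuoka", "Tokyo", "Kobe", "Sendai", "Yokohama", "Chiba"],
    "Years Employed": [2, 4, 10, 15, 1, 5, 3, 1, 12, 8],
}
df = pd.DataFrame(data)

# (rows, columns): the "entries" and "columns" lines of info()
n_rows, n_cols = df.shape
print(n_rows, n_cols)  # 10 6

# dtype of each column: the "Dtype" column of info()
print(df.dtypes)

# Non-null values per column: the "Non-Null Count" column of info()
print(df.notna().sum())

# Total bytes; deep=True also measures the Python string objects
print(df.memory_usage(deep=True).sum())
```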

▶️ For reference, see the official info documentation:
pandas DataFrame info Documentation

Examining Summary Statistics with describe()

df.describe()

	Age	Annual Income (¥)	Years Employed
count	10	10	10
mean	29.2	4920000	6.1
std	6.76	2251321	4.91
min	18	0	1
25%	24.75	4125000	2.25
50%	29.5	5200000	4.5
75%	32.5	5950000	9.5
max	42	8000000	15

The key statistics are:

count: number of non-missing values
mean: average
std: sample standard deviation (divides by n - 1)
min / max: minimum and maximum
25% / 50% / 75%: quartiles; the 50% value is the median
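To see where these numbers come from, the sketch below recomputes some of them for the Age column with ordinary Series methods. It relies on pandas’ defaults: std() uses the sample formula (ddof=1), and quantile() interpolates linearly between sorted values, which is why the Age quartiles come out as 24.75 and 32.5 even though neither value occurs in the data. The percentiles argument of describe() lets you request other quantiles.

```python
import pandas as pd

ages = pd.Series([23, 29, 35, 42, 18, 33, 27, 24, 31, 30], name="Age")

# Sample standard deviation (divides by n - 1); ddof=1 is the default
sample_std = ages.std()
# Population standard deviation (divides by n), for comparison
population_std = ages.std(ddof=0)

# Quartiles via linear interpolation between sorted values (the default)
q1, median, q3 = ages.quantile([0.25, 0.5, 0.75])

print(round(sample_std, 2))      # 6.76
print(round(population_std, 2))  # 6.42
print(q1, median, q3)            # 24.75 29.5 32.5

# Ask describe() for the 10th and 90th percentiles instead of the quartiles
print(ages.describe(percentiles=[0.1, 0.9]))
```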

Including All Columns

By default, describe() summarizes only the numeric columns of a DataFrame. Pass include='all' to summarize the text (categorical) columns as well:

df.describe(include='all')

	Name	Age	Occupation	Annual Income (¥)	Location	Years Employed
count	10	10.0	10	10.0	10	10.0
unique	10	NaN	10	NaN	9	NaN
top	Taro	NaN	Engineer	NaN	Tokyo	NaN
freq	1	NaN	1	NaN	2	NaN
mean	NaN	29.2	NaN	4920000	NaN	6.1
std	NaN	6.76	NaN	2251321	NaN	4.91
min	NaN	18	NaN	0	NaN	1
25%	NaN	24.75	NaN	4125000	NaN	2.25
50%	NaN	29.5	NaN	5200000	NaN	4.5
75%	NaN	32.5	NaN	5950000	NaN	9.5
max	NaN	42	NaN	8000000	NaN	15
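include='all' is not the only choice: the include and exclude parameters also accept dtypes, so you can summarize just the text columns or just the numeric ones. A minimal sketch using exclude=[np.number] and include=[np.number], the dtype form shown in the pandas describe() documentation:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["Taro", "Hanako", "Jiro", "Mika", "Kenichi", "Keiko", "Sho", "Akane", "Takashi", "Aoi"],
    "Age": [23, 29, 35, 42, 18, 33, 27, 24, 31, 30],
    "Occupation": ["Engineer", "Designer", "Teacher", "Doctor", "Student", "Nurse", "Programmer", "Sales", "Lawyer", "Researcher"],
    "Annual Income (¥)": [4500000, 5500000, 4900000, 7300000, 0, 4000000, 6000000, 3200000, 8000000, 5800000],
    "Location": ["Tokyo", "Osaka", "Nagoya", "Sapporo", "Fukuoka", "Tokyo", "Kobe", "Sendai", "Yokohama", "Chiba"],
    "Years Employed": [2, 4, 10, 15, 1, 5, 3, 1, 12, 8],
})

# Non-numeric columns only: each gets count / unique / top / freq
text_summary = df.describe(exclude=[np.number])
print(text_summary)

# Numeric columns only: for this data, the same as the default describe()
num_summary = df.describe(include=[np.number])
print(num_summary)
```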

Categorical Statistics Explained

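unique: number of distinct values
top: most frequent value
freq: number of times the top value appears
NaN: the statistic does not apply to that column type (e.g., mean for a text column, or unique for a numeric column)

Note: In Location, “Tokyo” appears twice, so top is Tokyo and freq is 2. When several values tie for the highest count, as in Name and Occupation where every value appears once, describe() shows only one of them as top.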
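The categorical statistics can be reproduced with nunique() and value_counts(), which is also a way to see every tied value instead of the single one describe() reports. A minimal sketch using the Location and Occupation columns from the dataset:

```python
import pandas as pd

locations = pd.Series(
    ["Tokyo", "Osaka", "Nagoya", "Sapporo", "Fukuoka", "Tokyo", "Kobe", "Sendai", "Yokohama", "Chiba"],
    name="Location",
)
occupations = pd.Series(
    ["Engineer", "Designer", "Teacher", "Doctor", "Student", "Nurse", "Programmer", "Sales", "Lawyer", "Researcher"],
    name="Occupation",
)

# unique: number of distinct values
print(locations.nunique())  # 9

# top / freq: value_counts() sorts by count, most frequent first
counts = locations.value_counts()
print(counts.index[0], counts.iloc[0])  # Tokyo 2

# All values sharing the highest count, so ties are not hidden
occ_counts = occupations.value_counts()
tied = occ_counts[occ_counts == occ_counts.max()].index.tolist()
print(len(tied))  # 10: every occupation appears exactly once
```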

Author’s Takeaway

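The first time I ran df.describe(), I assumed the data had no issues. Later I found missing values in categorical columns that broke my model training: because describe() summarizes only numeric columns by default, those gaps never appeared in its output. Now I always start with df.info() and follow it with df.describe(include='all').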

Lesson learned: Don’t rely on numeric summaries alone; check structure and statistics together!
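This pitfall is easy to reproduce. The sketch below uses a small hypothetical DataFrame (not the article’s dataset) in which one Occupation value is missing: the default describe() does not mention that column at all, while the null counts that info() reports and describe(include='all') both reveal the gap.

```python
import pandas as pd

# Hypothetical data: one Occupation value is missing
df = pd.DataFrame({
    "Age": [23, 29, 35, 42],
    "Occupation": ["Engineer", None, "Teacher", "Doctor"],
})

# Default describe(): numeric columns only, so the missing text value is invisible
print(df.describe().columns.tolist())  # ['Age']

# Missing values per column (the same information info() shows as non-null counts)
print(df.isna().sum())

# include='all' reports count 3 for Occupation out of 4 rows
print(df.describe(include="all").loc["count", "Occupation"])

# The structural overview: Occupation shows 3 non-null entries
df.info()
```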

▶️ For reference, see the official describe documentation:
pandas DataFrame describe Documentation

Summary

Use info() to inspect types, non-null counts, and dimensions
Use describe() to review numeric statistics
Use include='all' to include categorical columns
NaN indicates that a statistic is not applicable

Key Statistic Terms

Statistic	Numeric	Categorical
count	non-missing entries	non-missing entries
mean	average	–
std	standard deviation	–
min/max	minimum/maximum	–
25%/50%/75%	quartiles	–
unique	–	distinct values
top	–	most frequent value
freq	–	frequency of top value

Next time, we’ll cover loc for label-based row and column selection!
