Two lines of code using Sweetviz

Sweetviz report

Before starting with Sweetviz, let’s talk about EDA. In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual (graphical) methods. In data science projects, EDA is used to find out data types, missing information, distributions of values, correlations, and so on.
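
To make those four checks concrete, here is a minimal sketch of doing them by hand with pandas. The helper name quick_eda and the tiny sample frame are made up for illustration; only the pandas calls themselves are standard.

```python
import pandas as pd


def quick_eda(df: pd.DataFrame) -> dict:
    """Collect the basic EDA facts for one data frame (hypothetical helper)."""
    numeric = df.select_dtypes(include="number")
    return {
        "dtypes": df.dtypes.astype(str).to_dict(),  # data type of each column
        "missing": df.isna().sum().to_dict(),       # missing values per column
        "summary": numeric.describe(),              # distribution of numeric columns
        "correlations": numeric.corr(),             # pairwise Pearson correlations
    }


# Tiny made-up frame, only to show the shape of the output.
sample = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550],
    "SalePrice": [208500, 181500, 223500, 140000],
    "Street": ["Pave", "Pave", None, "Pave"],
})
facts = quick_eda(sample)
print(facts["missing"])
```

Doing this for every column, and plotting the results, is the repetitive part that a report generator is meant to take over.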

Sweetviz

Sweetviz is a recent open-source Python library created and maintained by Francois Bertrand, with graphic design by Jean-Francois Hains. It takes pandas data frames and creates a self-contained HTML report.

In addition, Sweetviz creates attractive visualizations with just two lines of code, which saves time when putting together a report. It has two main components:

Comparison of 2 datasets (e.g. Train vs Test)

Visualization of the target value against all other variables (e.g. “What was the sale price of the house” etc.)

Analyzing the House Prices: Advanced Regression Techniques

We will get the dataset from the link https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data.

Install the Sweetviz library with pip install sweetviz.

Let’s start by loading the datasets:

import sweetviz
import pandas as pd

# Paths are relative to the current working directory
train = pd.read_csv("Downloads/train.csv")
test = pd.read_csv("Downloads/test.csv")

We now have two data frames (train and test), and we would like to analyze the target value “SalePrice”.
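
Before building the report, it can help to confirm that the target column really is in the training frame and to see which feature columns the two frames share. The sketch below is one way to do that; check_frames is a hypothetical helper, and the two small frames are made-up stand-ins for train.csv and test.csv.

```python
import pandas as pd


def check_frames(train: pd.DataFrame, test: pd.DataFrame, target: str) -> list:
    """Return the feature columns present in both frames (hypothetical helper)."""
    if target not in train.columns:
        raise KeyError(f"{target!r} is missing from the training frame")
    # Kaggle-style test files normally ship without the target column,
    # so the target is excluded from the shared feature list.
    return [c for c in train.columns if c in test.columns and c != target]


# Made-up miniature frames standing in for the real files.
train_demo = pd.DataFrame({"Id": [1, 2], "LotArea": [8450, 9600], "SalePrice": [208500, 181500]})
test_demo = pd.DataFrame({"Id": [3, 4], "LotArea": [11250, 9550]})

shared_cols = check_frames(train_demo, test_demo, "SalePrice")
print(shared_cols)
```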

train.head()
House price data sets

We can generate a report with this line of code:

my_report = sweetviz.compare([train, "Train"], [test, "Test"], "SalePrice")

Running this command will perform the analysis and create the report object. To get the output, simply use the show_html() command:

my_report.show_html("Report.html") # Not providing a filename will default to SWEETVIZ_REPORT.html

After generating the file, Sweetviz opens it in your default browser, and it should look something like this:

This is the report Sweetviz generates from those two lines of code. It shows many things at a glance, such as the dataset summary, the SalePrice target, and missing values.
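
The two calls above can also be wrapped in a small function with a couple of optional settings. This is only a sketch: build_comparison_report is a made-up name, and skipping the "Id" column is an assumption about what you want left out of the analysis. The FeatureConfig class and the open_browser argument are taken from the Sweetviz documentation as I understand it, so check them against the version you have installed.

```python
import pandas as pd


def build_comparison_report(train: pd.DataFrame, test: pd.DataFrame,
                            target: str = "SalePrice",
                            out_path: str = "Report.html"):
    """Compare train vs. test with Sweetviz and write the HTML report."""
    # Imported here so the function can be defined even if sweetviz is absent.
    import sweetviz

    # Assumption: the row identifier "Id" should not be analyzed as a feature.
    config = sweetviz.FeatureConfig(skip="Id")

    report = sweetviz.compare([train, "Train"], [test, "Test"],
                              target_feat=target, feat_cfg=config)
    # open_browser=False writes the file without opening a browser tab.
    report.show_html(out_path, open_browser=False)
    return report
```

Calling build_comparison_report(train, test) with the frames loaded earlier would write Report.html to the current directory.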

Associations

Moving the mouse over the “Associations” button in the summary makes the Associations graph appear on the right-hand side:

This graph is a heatmap-style view of how the features relate to one another, similar to a correlation matrix.
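
For numeric columns, a plain pandas correlation matrix gives a rough text counterpart to that graph. The sketch below uses a made-up frame with a few column names from the House Prices data; the values are invented, and this is not how Sweetviz computes its own graph internally.

```python
import pandas as pd

# Made-up values; only the column names come from the House Prices data.
demo = pd.DataFrame({
    "LotArea": [8450, 9600, 11250, 9550, 14260],
    "OverallQual": [7, 6, 7, 7, 8],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})

# Pairwise Pearson correlations between the numeric columns.
corr = demo.select_dtypes(include="number").corr()

# Correlation of every other numeric feature with the target, strongest first.
target_corr = corr["SalePrice"].drop("SalePrice").sort_values(ascending=False)
print(target_corr)
```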

Target variable

The target variable shows up first in the report, in a special black box.
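
To get a feel for what "target against another variable" means, here is a minimal pandas sketch that averages the target per category of one feature. The frame is made up; "Street" with the values "Pave" and "Grvl" is a column from the House Prices data, but the prices here are invented.

```python
import pandas as pd

# Made-up rows, only to illustrate the idea.
demo = pd.DataFrame({
    "Street": ["Pave", "Pave", "Grvl", "Pave", "Grvl"],
    "SalePrice": [200000, 180000, 120000, 220000, 100000],
})

# Average target value for each category of the feature.
mean_price = demo.groupby("Street")["SalePrice"].mean()
print(mean_price)
```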

Conclusion

All this information from just two lines of code!

Sweetviz summarizes our datasets with just two lines of code. It is very useful for analyzing data and gives a quick, graphical overview of how the data varies. I hope you find Sweetviz a useful tool for quickly analyzing data.

I am a Data Scientist working at Cognizant | Writing about Data Science, AI, ML, DL, Stats, Math

Jane Alam
