Two lines of code using Sweetviz
Before we start with Sweetviz, let’s talk about EDA. In statistics, exploratory data analysis (EDA) is an approach to analyzing data sets to summarize their main characteristics, often with visual (graphical) methods. In data science projects, EDA is used to find out data types, missing values, distributions of values, correlations, etc.
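To see what a tool like Sweetviz automates, here is a minimal pandas sketch of a few of those checks done by hand. The helper name `eda_overview` and the small DataFrame are hypothetical, made up for illustration; they are not part of Sweetviz or the Kaggle data.

```python
import numpy as np
import pandas as pd


def eda_overview(df: pd.DataFrame) -> pd.DataFrame:
    """One row per column: dtype, missing count, missing share (%) and distinct values."""
    return pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "distinct": df.nunique(),  # NaN is not counted as a distinct value
        }
    )


# Hypothetical toy frame standing in for a real dataset
df = pd.DataFrame(
    {
        "LotArea": [8000, 9500, 11000, np.nan],
        "Street": ["Pave", "Pave", "Grvl", "Pave"],
        "SalePrice": [200000, 180000, 220000, 140000],
    }
)

print(eda_overview(df))
print(df.describe())                      # distribution of numeric columns
print(df.select_dtypes("number").corr())  # pairwise Pearson correlations
```

Each of these is one call in pandas, but reading and comparing the printed tables for dozens of columns is the tedious part that a generated report replaces.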
Sweetviz is a relatively new open-source Python library, created and maintained by Francois Bertrand, with graphic design by Jean-Francois Hains. It takes pandas DataFrames and creates a self-contained HTML report.
Sweetviz creates attractive visualizations with just two lines of code, which saves time when you need a report quickly. It has two main features:
Comparison of 2 datasets (e.g. Train vs Test)
Visualization of the target value against all other variables (e.g. how the sale price of a house relates to each feature)
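As a rough illustration of the first feature, the sketch below compares two DataFrames column by column with plain pandas. It is not how Sweetviz works internally; `compare_frames` and the toy `train`/`test` frames are hypothetical.

```python
import numpy as np
import pandas as pd


def compare_frames(source: pd.DataFrame, compare: pd.DataFrame,
                   names=("Train", "Test")) -> pd.DataFrame:
    """Side-by-side missing share and mean for the columns both frames share."""
    shared = [c for c in source.columns if c in compare.columns]
    cols = {}
    for name, frame in zip(names, (source, compare)):
        sub = frame[shared]
        cols[f"{name} missing %"] = (sub.isna().mean() * 100).round(1)
        # A mean only makes sense for numeric columns; others are left as NaN
        cols[f"{name} mean"] = sub.select_dtypes("number").mean().reindex(shared)
    return pd.DataFrame(cols)


# Hypothetical toy frames; note that the target exists only in train
train = pd.DataFrame({"LotArea": [8000, 9500, 11000],
                      "Street": ["Pave", "Pave", "Grvl"],
                      "SalePrice": [200000, 180000, 220000]})
test = pd.DataFrame({"LotArea": [10000, np.nan],
                     "Street": ["Pave", "Pave"]})

print(compare_frames(train, test))
```

Columns present in only one frame (here SalePrice) are skipped, which is why a target that exists only in the training data cannot be compared this way.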
Analyzing the “House Prices: Advanced Regression Techniques” Dataset
We will use the dataset from the Kaggle competition at https://www.kaggle.com/c/house-prices-advanced-regression-techniques/data.
Install the Sweetviz library with pip install sweetviz.
Let’s start by loading the datasets:
import pandas as pd
import sweetviz
# Adjust the paths to wherever you saved the Kaggle files
train = pd.read_csv("Downloads/train.csv")
test = pd.read_csv("Downloads/test.csv")
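We now have two DataFrames (train and test), and we would like to analyze the target value “SalePrice”. It is present only in train.csv, since SalePrice is what the competition asks you to predict for the test set.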
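Before building the report, a quick numeric look at the target can be sketched with pandas: the Pearson correlation of each numeric column with SalePrice. This assumes the target column is numeric and present in the frame (so it runs on train, not test); `target_correlations` and the toy data are hypothetical.

```python
import pandas as pd


def target_correlations(df: pd.DataFrame, target: str) -> pd.Series:
    """Pearson correlation of each other numeric column with `target`, strongest first."""
    if target not in df.columns:
        raise KeyError(f"{target!r} not in DataFrame (e.g. test.csv has no SalePrice)")
    numeric = df.select_dtypes("number")
    corr = numeric.drop(columns=[target]).corrwith(numeric[target])
    # Sort by absolute strength so strong negative correlations also rank high
    return corr.reindex(corr.abs().sort_values(ascending=False).index)


# Hypothetical toy data standing in for train.csv
train = pd.DataFrame(
    {
        "GrLivArea": [1000, 1500, 2000, 2500],
        "YrSold": [2008, 2006, 2007, 2009],
        "Street": ["Pave", "Pave", "Grvl", "Pave"],
        "SalePrice": [100000, 150000, 200000, 250000],
    }
)

print(target_correlations(train, "SalePrice"))
```

Non-numeric columns such as Street are silently left out here; relating those to the target needs a different measure.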
We can generate a report with this line of code:
my_report = sweetviz.compare([train, "Train"], [test, "Test"], "SalePrice")
Running this command will perform the analysis and create the report object. To get the output, simply call the show_html() method:
my_report.show_html("Report.html") # Not providing a filename will default to SWEETVIZ_REPORT.html
After generating the file, Sweetviz opens it in your default browser. It should look something like this:
This is the report Sweetviz generates from those two lines of code. It shows a lot of information, such as a dataset summary, the SalePrice target analysis, missing values, etc.
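The summary at the top of the report gives dataset-level numbers. A hand-rolled approximation with pandas might look like this; `dataset_summary` is a hypothetical helper, and because Sweetviz applies its own type detection, its feature counts may not match this simple numeric/non-numeric split.

```python
import pandas as pd


def dataset_summary(df: pd.DataFrame) -> dict:
    """Rough dataset-level numbers, similar in spirit to a report's summary panel."""
    numeric_cols = df.select_dtypes("number").columns
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),  # exact repeats of an earlier row
        "features": df.shape[1],
        "numerical": len(numeric_cols),
        "non_numerical": df.shape[1] - len(numeric_cols),
        "memory_kb": round(df.memory_usage(deep=True).sum() / 1024, 1),
        "cells_missing_pct": round(df.isna().to_numpy().mean() * 100, 1),
    }


# Hypothetical toy frame: row 1 duplicates row 0, and one cell is missing
df = pd.DataFrame(
    {
        "LotArea": [8000, 8000, 9500, None],
        "Street": ["Pave", "Pave", "Grvl", "Pave"],
        "SalePrice": [200000, 200000, 180000, 140000],
    }
)

print(dataset_summary(df))
```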
Moving the mouse over the “Associations” button in the summary makes the Associations graph appear on the right-hand side:
This graph is a visual heatmap of the correlation (association) matrix, showing how strongly each pair of variables is related.
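Pearson correlation only applies to pairs of numeric columns. For a categorical column against a numeric one (for example Street vs SalePrice), a common association measure is the correlation ratio η = sqrt(Σ_k n_k (ȳ_k − ȳ)² / Σ_i (y_i − ȳ)²), i.e. the square root of the share of the numeric variance explained by the category groups. Sweetviz’s README lists the correlation ratio among the measures behind its associations graph, but the function below is an independent sketch, not Sweetviz’s code; `correlation_ratio` and the sample values are hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_ratio(categories, values) -> float:
    """Correlation ratio (eta) between a categorical and a numeric variable, in [0, 1]."""
    data = pd.DataFrame({"cat": categories, "val": values}).dropna()
    overall_mean = data["val"].mean()
    groups = data.groupby("cat")["val"]
    # Between-group sum of squares: n_k * (group mean - overall mean)^2
    between = (groups.count() * (groups.mean() - overall_mean) ** 2).sum()
    # Total sum of squares around the overall mean
    total = ((data["val"] - overall_mean) ** 2).sum()
    return float(np.sqrt(between / total)) if total > 0 else 0.0


# Hypothetical sample: Pave houses sell for more than Grvl ones
street = pd.Series(["Pave", "Pave", "Grvl", "Grvl", "Pave"])
price = pd.Series([200000, 210000, 120000, 130000, 190000])

print(round(correlation_ratio(street, price), 3))
```

Unlike Pearson correlation, η has no sign and is not symmetric in its arguments: it measures how well the categories explain the numeric values, not the other way round.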
The target variable, SalePrice, shows up first, in a special black box.
All this information from just two lines of code!
Sweetviz summarizes our datasets with just two lines of code and gives a quick graphical overview of how the data varies, which makes it very useful as a first step in analyzing the data. I hope you find Sweetviz a useful tool for quickly analyzing your data.