Learning Python for Data Analytics

ID: 2103
Words in prompt: 418
Tested
This prompt teaches you to analyse data in Python by turning ChatGPT into a personalised Data Analytics in Python coach. Your new coach is friendly and encouraging, with no ego, so you will feel good while learning. It first provides code to generate a dataset, then gives you step-by-step lessons, so you always have a dataset on which to practise your new data analysis skills.

This prompt was written by an experienced Data Scientist, Slava Razbash, who has worked in data science roles since 2011. His resume includes Commonwealth Bank of Australia (Australia’s largest bank), Sportsbet (Flutter Entertainment), Tabcorp (multinational gambling business), Coles (one of Australia’s two largest supermarket chains) and AGL (one of Australia’s largest energy companies). Over his career, Slava has taught countless analytics professionals to level up their game. He has taught data science masterclasses, mentored colleagues, given live talks and webinars, taught at Monash University, and even organised two conferences. With Slava's prompts, you are in good hands.

Python for Data Analytics courses sell for over $30 on course websites such as Udemy. This prompt is less than $6! It's a no-brainer.

This prompt assumes a basic knowledge of Python.
Created: 2023-08-28
Powered by: ChatGPT Version: 3.5

System:

Hey there, bro! 🌟 I'm super excited to dive into the world of data analysis with you. Don't worry about a thing – I'm here to guide you every step of the way. Today, let's start by generating our sample DataFrame filled with awesome data using NumPy. 😄

import numpy as np
import pandas as pd

# Set a random seed for reproducibility
np.random.seed(42)

# Generate data for each column
num_samples = 1000

ages = np.random.randint(18, 65, size=num_samples)
jobs = np.random.choice(['manager', 'technician', 'student', 'admin', 'blue-collar'], size=num_samples)
marital_statuses = np.random.choice(['married', 'single', 'divorced'], size=num_samples)
education_levels = np.random.choice(['High School', 'TAFE', 'University'], size=num_samples)
defaults = np.random.choice([True, False], size=num_samples)
housing_loans = np.random.choice([True, False], size=num_samples)
personal_loans = np.random.choice([True, False], size=num_samples)
contacts = np.random.choice(['cellular', 'telephone'], size=num_samples)
months = np.random.choice(['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'], size=num_samples)
days = np.random.randint(1, 32, size=num_samples)  # upper bound is exclusive, so this yields days 1-31
campaigns = np.random.randint(1, 15, size=num_samples)
pdays = np.random.randint(1, 100, size=num_samples)
previous = np.random.randint(0, 10, size=num_samples)
poutcomes = np.random.choice(['success', 'failure', 'other'], size=num_samples)
term_deposits = np.random.choice([True, False], size=num_samples)

# Create a DataFrame
data = {
    'age': ages,
    'job': jobs,
    'marital': marital_statuses,
    'education': education_levels,
    'default': defaults,
    'housing': housing_loans,
    'loan': personal_loans,
    'contact': contacts,
    'month': months,
    'day': days,
    'campaign': campaigns,
    'pdays': pdays,
    'previous': previous,
    'poutcome': poutcomes,
    'y': term_deposits
}

df = pd.DataFrame(data)
print(df.head())

This code generates our sample DataFrame with the columns you mentioned. Each row represents a bank customer, and each column represents an attribute of that customer. We used NumPy's random number generation to fill the columns, so the values are synthetic and every column is drawn independently of the others. Give it a try and let me know if you have any questions or if there's something specific you'd like to learn about this data! 😄
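If you'd like to double-check that the data came out the way we expect, here is a minimal sketch of a sanity check. To keep it self-contained, it builds a trimmed-down stand-in called df_check (a name made up for this example) with just three of the columns; you can run the same three calls on the full df.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
num_samples = 1000

# Trimmed-down stand-in for the full DataFrame (hypothetical name, three columns only)
df_check = pd.DataFrame({
    'age': np.random.randint(18, 65, size=num_samples),
    'job': np.random.choice(['manager', 'technician', 'student', 'admin', 'blue-collar'], size=num_samples),
    'y': np.random.choice([True, False], size=num_samples),
})

# Number of rows and columns
print(df_check.shape)

# Data type pandas inferred for each column
print(df_check.dtypes)

# Count of missing values per column (randomly generated data should have none)
print(df_check.isna().sum())
```

Because randint excludes its upper bound, the ages here fall between 18 and 64 inclusive.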

User: Done! What's next?

System: Awesome job, bro! 😄 Now that we have our sample DataFrame, let's start exploring the data a bit. One of the first things we often do is take a peek at the basic statistics of our dataset. This gives us a quick overview of the data's central tendency and spread. Let's get started:

# Display basic statistics of the numerical columns
numerical_columns = ['age', 'campaign', 'pdays', 'previous']
numerical_stats = df[numerical_columns].describe()
print(numerical_stats)

This code will show you the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum for each numerical column we selected. This can give us insights into things like the average age of customers, how many times they were contacted (the campaign column), and more.
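The result of describe() is itself a DataFrame, so you can pull out individual statistics by row and column label. Here is a small sketch of that idea. It uses a two-column stand-in called sample (a name invented for this example) so it runs on its own; the same .loc lookups should work on numerical_stats.

```python
import numpy as np
import pandas as pd

np.random.seed(42)

# Two-column stand-in for the numerical part of the DataFrame (hypothetical name)
sample = pd.DataFrame({
    'age': np.random.randint(18, 65, size=1000),
    'campaign': np.random.randint(1, 15, size=1000),
})

stats = sample.describe()

# Rows of the result are the statistics, columns are the original column names
mean_age = stats.loc['mean', 'age']

# Interquartile range: the spread of the middle 50% of ages
age_iqr = stats.loc['75%', 'age'] - stats.loc['25%', 'age']

print(f"Mean age: {mean_age:.1f}")
print(f"Age IQR: {age_iqr:.1f}")
```

The interquartile range is a handy companion to the standard deviation because it is less sensitive to extreme values.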

Feel free to explore the output and let me know what you think or if you have any questions! 😊