Learning Python for Data Analytics
System: Hey there, bro! 🌟 I'm super excited to dive into the world of data analysis with you. Don't worry about a thing. I'm here to guide you every step of the way. Today, let's start by generating our sample DataFrame filled with awesome data using NumPy. 😄
import numpy as np
import pandas as pd

# Set a random seed for reproducibility
np.random.seed(42)

# Generate data for each column
# Note: np.random.randint excludes the upper bound, so randint(18, 65) gives ages 18 to 64
num_samples = 1000
ages = np.random.randint(18, 65, size=num_samples)
jobs = np.random.choice(['manager', 'technician', 'student', 'admin', 'blue-collar'], size=num_samples)
marital_statuses = np.random.choice(['married', 'single', 'divorced'], size=num_samples)
education_levels = np.random.choice(['High School', 'TAFE', 'University'], size=num_samples)
defaults = np.random.choice([True, False], size=num_samples)
housing_loans = np.random.choice([True, False], size=num_samples)
personal_loans = np.random.choice([True, False], size=num_samples)
contacts = np.random.choice(['cellular', 'telephone'], size=num_samples)
months = np.random.choice(['jan', 'feb', 'mar', 'apr', 'may', 'jun', 'jul', 'aug', 'sep', 'oct', 'nov', 'dec'], size=num_samples)
days = np.random.randint(1, 32, size=num_samples)  # upper bound 32 so that day 31 can occur
campaigns = np.random.randint(1, 15, size=num_samples)
pdays = np.random.randint(1, 100, size=num_samples)
previous = np.random.randint(0, 10, size=num_samples)
poutcomes = np.random.choice(['success', 'failure', 'other'], size=num_samples)
term_deposits = np.random.choice([True, False], size=num_samples)

# Create a DataFrame
data = {
    'age': ages,
    'job': jobs,
    'marital': marital_statuses,
    'education': education_levels,
    'default': defaults,
    'housing': housing_loans,
    'loan': personal_loans,
    'contact': contacts,
    'month': months,
    'day': days,
    'campaign': campaigns,
    'pdays': pdays,
    'previous': previous,
    'poutcome': poutcomes,
    'y': term_deposits
}
df = pd.DataFrame(data)

# Show the first five rows
print(df.head())
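This code generates our sample DataFrame with the columns you mentioned. Each row represents a bank customer, and each column represents an attribute of that customer. We used NumPy's random number generation to fill the columns, so every column is drawn independently: the data is great for practice, but don't expect real relationships between attributes. Because we set the seed, you'll get the same values every time you run it. Give it a try and let me know if you have any questions or if there's something specific you'd like to learn about this data! 😄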
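If you'd like to double-check the result before moving on, here's a minimal sketch of a few sanity checks. So that it runs on its own, it builds a trimmed-down stand-in called `df_check` with just three of the columns (that name and the reduced column set are my assumptions for illustration); the same calls should work on the full `df` too.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
num_samples = 1000

# Hypothetical trimmed-down stand-in for the full DataFrame (three columns only)
df_check = pd.DataFrame({
    'age': np.random.randint(18, 65, size=num_samples),
    'job': np.random.choice(['manager', 'technician', 'student', 'admin', 'blue-collar'], size=num_samples),
    'y': np.random.choice([True, False], size=num_samples),
})

print(df_check.shape)                   # (number of rows, number of columns)
print(df_check.dtypes)                  # data type of each column
print(df_check['job'].value_counts())   # how many customers fall into each job
print(df_check.isna().sum())            # number of missing values per column
```

Since the data is generated rather than loaded from a file, there shouldn't be any missing values, but these checks are a good habit to carry over to real datasets.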
User: Done! What's next?
System: Awesome job, bro! 😄 Now that we have our sample DataFrame, let's start exploring the data a bit. One of the first things we often do is to take a peek at the basic statistics of our dataset. This gives us a quick overview of the data's central tendencies and spreads. Let's get started:
# Display basic statistics of the numerical columns
numerical_columns = ['age', 'campaign', 'pdays', 'previous']
numerical_stats = df[numerical_columns].describe()
print(numerical_stats)
This code will show you the count, mean, standard deviation, minimum, quartiles (25%, 50%, 75%), and maximum for the numerical columns in our DataFrame. This can give us insights into things like the average age of customers, the number of times they've been contacted (the campaign column), and more.
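The table that describe() returns is itself a DataFrame, so you can pull single numbers out of it. Here's a rough sketch of how that might look. To keep it self-contained it rebuilds a small stand-in called `df_demo` (a name I made up for illustration) with three of the columns. It also shows that describe() on a text column reports different statistics: count, unique, top, and freq.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
n = 1000

# Hypothetical small stand-in for the full DataFrame
df_demo = pd.DataFrame({
    'age': np.random.randint(18, 65, size=n),
    'campaign': np.random.randint(1, 15, size=n),
    'job': np.random.choice(['manager', 'technician', 'student', 'admin', 'blue-collar'], size=n),
})

# describe() returns a DataFrame indexed by statistic name
stats = df_demo[['age', 'campaign']].describe()

# Look up a single statistic with .loc[row_label, column_label]
mean_age = stats.loc['mean', 'age']
print(f"Average age: {mean_age:.1f}")

# Select several statistics at once
print(stats.loc[['min', 'max']])

# For a text column, describe() gives count, unique, top, and freq instead
job_stats = df_demo['job'].describe()
print(job_stats)
```

Here 'top' is the most frequent value and 'freq' is how many times it appears, which is handy for a quick look at categorical columns like job or marital.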
Feel free to explore the output and let me know what you think or if you have any questions! 😊