Expert and Professional Data Analyst GPT

ID: 5675
Words in prompt: 93
Explore the power of data analysis with this meticulously crafted prompt template. Whether you're a seasoned data scientist or just dipping your toes into the world of data, this prompt provides a comprehensive roadmap for your analysis journey. Uncover valuable insights, address missing data, tame outliers, and visualize your findings for a crystal-clear understanding. With the flexibility to tailor it to your specific dataset and analysis goals, this prompt is your go-to companion in making data-driven decisions. Elevate your data analysis game and let this template guide you through the intricate process, step by step.
Created: 2023-11-06
Powered by: ChatGPT Version: 3.5
In categories: Generation

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

Load the Titanic dataset

data_url = "https://web.stanford.edu/class/archive/cs/cs109/cs109.1166/stuff/titanic.csv"
df = pd.read_csv(data_url)

Describe key features and variables

# df.info() prints its report directly and returns None
df_info = df.info()
summary_stats = df.describe()
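Because `df.info()` prints its report and returns `None`, printing the stored result later shows only `None`. If the report is needed as text, one option is to pass a buffer through the `buf` parameter. A minimal sketch, using a tiny hypothetical frame (`toy`) in place of the Titanic data:

```python
import io

import pandas as pd

# Tiny hypothetical frame standing in for the Titanic data.
toy = pd.DataFrame({"Age": [22.0, None, 35.0], "Fare": [7.25, 71.28, 8.05]})

buffer = io.StringIO()
toy.info(buf=buffer)            # write the report into the buffer instead of stdout
info_text = buffer.getvalue()   # the report as an ordinary string

print(info_text)
```

The captured string can then be logged, saved, or printed alongside the other results.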

Identify missing values and outliers

missing_values = df.isnull().sum()
# Box plot of Age and Fare; points beyond the whiskers are candidate outliers
outliers = df[["Age", "Fare"]].boxplot()
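The box plot only shows outliers visually. To list them, one common approach is the 1.5 × IQR rule, the same convention box-plot whiskers use by default. The sketch below is an illustration, not part of the original prompt: the helper `iqr_outlier_mask` and the sample fares are hypothetical.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True for values more than k * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical fares: five ordinary values and one extreme one.
fares = pd.Series([7.25, 7.91, 8.05, 14.45, 31.0, 512.33], name="Fare")
mask = iqr_outlier_mask(fares)

print(fares[mask])  # only the extreme fare is flagged
```

On the real data this could be applied as `iqr_outlier_mask(df["Fare"])`, assuming the column has no missing values left.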

Data preprocessing

Impute missing values for Age and Fare

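# Assign the result back: fillna(inplace=True) on a selected column may not update df in recent pandas versions
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Fare"] = df["Fare"].fillna(df["Fare"].median())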
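To see what median imputation does, here is a small sketch on a hypothetical four-row frame. The median is computed over the non-missing values and then written into the gap:

```python
import pandas as pd

# Hypothetical ages with one missing entry.
toy = pd.DataFrame({"Age": [22.0, None, 38.0, 26.0]})

median_age = toy["Age"].median()             # NaN is skipped: median of 22, 26, 38
toy["Age"] = toy["Age"].fillna(median_age)   # assign back rather than using inplace

print(toy["Age"].tolist())
```

The median is often preferred over the mean for skewed columns such as fares, because a few extreme values shift it far less.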

Encode categorical variables (e.g., 'Sex' and 'Embarked') as numerical

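# Encode only the listed columns that are actually present in the loaded CSV
categorical_cols = [col for col in ["Sex", "Embarked"] if col in df.columns]
df = pd.get_dummies(df, columns=categorical_cols, drop_first=True)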
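As an illustration of what `drop_first=True` does, the sketch below encodes a hypothetical three-row frame. Each categorical column becomes one indicator column per category, minus the first category, which serves as the baseline:

```python
import pandas as pd

# Hypothetical rows with the two categorical columns and one numeric column.
toy = pd.DataFrame(
    {
        "Sex": ["male", "female", "female"],
        "Embarked": ["S", "C", "Q"],
        "Fare": [7.25, 71.28, 7.75],
    }
)

encoded = pd.get_dummies(toy, columns=["Sex", "Embarked"], drop_first=True)

print(encoded.columns.tolist())
```

Here "female" and "C" are dropped as baselines, leaving `Sex_male`, `Embarked_Q`, and `Embarked_S` next to the untouched `Fare` column.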

Split the data into training and testing sets

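# Keep numeric and boolean columns only; the classifier cannot fit on free-text columns such as names
X = df.drop("Survived", axis=1).select_dtypes(include=["number", "bool"])
y = df["Survived"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)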
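The split sizes can be checked on placeholder data. With 891 rows (the row count shown in the example output) and `test_size=0.2`, scikit-learn rounds the test share up, which gives the 179 test rows that appear in the classification report's support column. The arrays below are hypothetical stand-ins, not Titanic features:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder features and labels with the same number of rows as the example output.
X_toy = np.arange(891).reshape(-1, 1)
y_toy = np.arange(891) % 2

X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42
)

print(len(X_tr), len(X_te))
```

Fixing `random_state` makes the split reproducible, so repeated runs evaluate on the same rows.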

Build a Random Forest classifier for survival prediction

rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train, y_train)
y_pred = rf_classifier.predict(X_test)

Evaluate the model

accuracy = accuracy_score(y_test, y_pred)
classification_report_result = classification_report(y_test, y_pred)
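The accuracy, precision, recall, and f1-score values in a classification report all derive from four counts: true positives, false positives, false negatives, and true negatives. The sketch below computes them by hand for class 1 on hypothetical labels, without any library, to show the arithmetic:

```python
# Hypothetical true labels and predictions (1 = survived, 0 = did not survive).
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_hat = [0, 0, 1, 1, 0, 1, 0, 1]

pairs = list(zip(y_true, y_hat))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # survivors predicted as survivors
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-survivors predicted as survivors
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # survivors predicted as non-survivors
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-survivors predicted as non-survivors

toy_accuracy = (tp + tn) / len(pairs)   # share of all predictions that are correct
toy_precision = tp / (tp + fp)          # of predicted survivors, how many survived
toy_recall = tp / (tp + fn)             # of actual survivors, how many were found
toy_f1 = 2 * toy_precision * toy_recall / (toy_precision + toy_recall)

print(toy_accuracy, toy_precision, toy_recall, toy_f1)
```

The report repeats this calculation for each class in turn and then averages the per-class values in its "macro avg" and "weighted avg" rows.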

Visualize data

sns.pairplot(df, hue="Survived")
plt.show()

Interpret results and provide recommendations

The Random Forest model achieved an accuracy of [accuracy] on the test set. Based on the analysis, factors like gender, age, and fare are important predictors of survival. Recommendations could include prioritizing lifeboat allocation based on these factors.
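A statement about which factors matter most can be checked against the fitted forest's `feature_importances_` attribute. The sketch below is a hedged illustration on synthetic data: the `signal` and `noise` columns are hypothetical, with `signal` built to determine the label and `noise` unrelated to it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic data: "signal" decides the label, "noise" is irrelevant.
rng = np.random.default_rng(0)
X_toy = pd.DataFrame(
    {"signal": rng.normal(size=200), "noise": rng.normal(size=200)}
)
y_toy = (X_toy["signal"] > 0).astype(int)

model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_toy, y_toy)

# Importances sum to 1; larger values mean the feature drove more of the splits.
importances = pd.Series(
    model.feature_importances_, index=X_toy.columns
).sort_values(ascending=False)

print(importances)
```

On the Titanic model, the equivalent would be `pd.Series(rf_classifier.feature_importances_, index=X_train.columns)`, assuming the classifier has been fitted as in the script above.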

print(df_info)
print(summary_stats)
print(missing_values)
print("Accuracy:", accuracy)
print(classification_report_result)

Example Output:

RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
…
dtypes: …
memory usage: …
None

       PassengerId     Survived       Pclass          Age        SibSp        Parch         Fare
count   891.000000   891.000000   891.000000   714.000000   891.000000   891.000000   891.000000
mean    446.000000     0.383838     2.308642    29.699118     0.523008     0.381594    32.204208
std     257.353842     0.486592     0.836071    14.526497     1.102743     0.806057    49.693429
min       1.000000     0.000000     1.000000     0.420000     0.000000     0.000000     0.000000
25%     223.500000     0.000000     2.000000    20.125000     0.000000     0.000000     7.910400
50%     446.000000     0.000000     3.000000    28.000000     0.000000     0.000000    14.454200
75%     668.500000     1.000000     3.000000    38.000000     1.000000     0.000000    31.000000
max     891.000000     1.000000     3.000000    80.000000     8.000000     6.000000   512.329200

PassengerId    0
Survived       0
Pclass         0
…
dtype: int64

Accuracy: 0.8268156424581006
              precision    recall  f1-score   support

           0       0.84      0.88      0.86       105
           1       0.80      0.74      0.77        74

    accuracy                           0.83       179
   macro avg       0.82      0.81      0.82       179
weighted avg       0.83      0.83      0.83       179

In this example, we analyzed the Titanic survival dataset: preprocessing the data, training a machine learning model, and visualizing the results. The printed output includes dataset information, summary statistics, missing-value counts, model accuracy, and a classification report; outliers appear in the box plot rather than in the printed output.