Quant Developers' Tools and Techniques
About the book "Quant Developers' Tools and Techniques" by Hindering, Manfred, published by Bookllo Publishing in 2023. The book is available in EPUB format, in English. "Quant Developers' Tools and Techniques" is listed in the Uncategorized category.
QUANT DEVELOPERS' TOOLS AND TECHNIQUES Vol. 1 is the first part of a comprehensive series of books that enables QUANT developers to succeed in this ever-changing and challenging field of business. The series is intended for finance professionals, such as controllers, equity analysts, quantitative developers, and data scientists, and for everybody interested in trading and winning in the financial markets. If you plan to become a QUANT developer, this first volume is the prelude to the series, which presents and demonstrates the tools and techniques required to succeed in this new data-driven world:

Volume 1: Statistics, Visualization, Pandas DataFrame, Simple & Multiple Linear Regression. Builds the basis for a solid understanding of the covered parts of the QUANT body of knowledge, in a learning-by-example fashion.

Volume 2: Multiple & Logistic Regression, SQLite Database Programming, QUANT Tools, Ratios, Indicators, Time Series Analysis, Autoregression Models, Monte Carlo Simulation. Further develops your understanding of the tools and techniques required to succeed in the QUANT profession, again in a learning-by-example format (to be released shortly).

Volume 3: Data Management, Clustering & Forecasting Techniques, Stochastic Programming, Microstructure Analysis, Trading Strategy Development, Neural Network & Machine Learning Techniques. Sets up the foundation for a very lucrative career in QUANT development. Learning by example is the most effective way to learn and master this complex material, which is essential for the QUANT profession, the most lucrative, although challenging, profession I know of. Becoming a QUANT pro certainly requires in-depth mastery of the QUANT body of knowledge in order to excel in this ever-evolving and challenging profession (release: TBT).

Validation
This EPUB is validated against the 3.3 rule set set forth by the DAISY Consortium.
The validation report is generated by the industry-standard validation tool epubcheck 5.0.0, validating using EPUB version 3.3 rules. No errors or warnings detected. Messages: 0 fatals / 0 errors / 0 warnings / 0 infos. EPUBCheck completed.

QUANT DEVELOPERS’ TOOLS AND TECHNIQUES
Introduction
About the Series
The Author
Installing Python
Creating a Virtual Environment
Installing Packages
Caveat Emptor
1. Gaussian Normal Distribution
1.1. Probability density function PDF
1.1.1. Plot density function with different standard deviations
1.1.2. Plot density function with different means
1.1.3. Random numbers drawn from the standard normal distribution
1.2. Cumulative density function CDF
1.2.1. Calculate probabilities
1.2.2. Visualize PDF and CDF
1.2.3. Compute the interval between values
1.3. Survival function
1.4. Computing quantile
1.4.1. Compute 20 %-quantile manually
1.4.2. Compute 20 %-quantile using the NumPy percentile() function
1.4.3. Compute quantile using cdf() function
1.4.4. Compute quantile using cdf() and ppf() function with plot
1.5. Probability density function PDF
1.6. Inverse survival function
1.7. Confidence Interval around the mean
1.8. Standard normal distribution
1.8.1. Standardized normal variable
1.8.2. Finding the standard normal variable
2. Map data to a normal distribution
2.1. log-normal distribution to normal distribution
2.2. Chi-square distribution to normal distribution
2.3. Uniform distribution to normal distribution
3. Generate Random Numbers using Normalvariate
3.1. Random number, drawn from normal distribution
3.2. Random number, with seed
3.3. Generate synthetic charts
3.4. Generate a histogram, which converges to normal distribution
4. Test Distribution for Normality
4.1. Visualize the data in plots
4.2. Plot theoretical versus ordered quantile using probplot
4.3. Shapiro-Wilk (SW) Test for Normality
4.3.1. Sample drawn from normal distribution
4.3.2. Sample drawn from the uniform distribution
4.4. Skewness and Kurtosis
4.4.1. Sample drawn from the normal distribution
4.4.2. Sample drawn from the uniform distribution
4.5. Sample drawn from the log-normal distribution
4.5.1. Create a Histogram
4.5.2. Visual Method - Create a Q-Q plot
4.5.3. Statistical Method - Shapiro-Wilk (SW) Test
4.5.4. Statistical Method - Kolmogorov-Smirnov (KS) Test
4.6. Preprocessing non-normal input data
5. Statistical Significance Test
5.1. Hypothesis Testing
5.1.1. One-tailed test
5.1.2. Two-tailed test
5.1.3. Alpha (α) value
5.1.4. Tail probability ()
5.2. Student-t Test for equality of means
5.2.1. Samples v1 and v2 with equal means
5.2.2. Samples v1 and v2 with unequal means
5.3. Kolmogorov-Smirnov (KS) Test for Normality
5.3.1. Sample v is drawn from the normal distribution
5.3.2. Sample v is drawn from the log-normal distribution
5.4. Extract summary statistics from the describe() results
5.5. Skewness and Kurtosis
5.6. Testing for Normality using the normaltest() method
5.6.1. Sample v drawn from the normal distribution
5.6.2. Sample v drawn from the log-normal distribution
6. Runs Test to confirm randomness
6.1. Runs Test for Randomness with the runsTest() function
6.1.1. Sample drawn from the normal distribution
6.1.2. Sample drawn from the uniform distribution
6.2. Runs Test for Randomness using the runstest_1samp() function
6.2.1. Sample dataset containing “runs”
6.2.2. Sample dataset generated at “random”
7. Kolmogorov-Smirnov (KS) Test
7.1. One Sample Kolmogorov-Smirnov (KS) Test
7.1.1. KS-Test for uniformity of pseudo random numbers with plot
7.1.2. KS-Test for uniform distribution with plot
7.1.3. KS-Test for normal distribution with plot
7.2. Two Sample Kolmogorov-Smirnov (KS) Test
7.2.1. KS-Test on uniform vs. log-normal distributed samples
7.2.2. KS-Test on log-normal vs. log-normal distributed samples
7.2.3. KS-Test on uniform vs. uniform distributed samples
7.2.4. KS-Test on normal vs. normal distributed samples
8. Compute Z-score and Z-critical Values
8.1. Z-score Values
8.1.1. Compute Z-score values in one-dimensional NumPy array
8.1.2. Compute Z-score values in multidimensional NumPy array
8.1.3. Compute Z-score values in a Pandas DataFrame
8.2. Z-critical Values
8.2.1. Left-tailed test
8.2.2. Right-tailed test
8.2.3. Two-tailed test
9. Pearson Correlation Coefficient
9.1. Student-t Test for the correlation coefficient
9.2. Five underlying assumptions for the Pearson correlation analysis
9.2.1. Assumption 1: Level of Measurement
9.2.2. Assumption 2: Linear Relationship
9.2.3. Assumption 3: Normal Distribution
9.2.4. Assumption 4: Related Pairs
9.2.5. Assumption 5: No Outliers
9.3. Three methods to compute the correlation coefficient
9.3.1. Pearson correlation coefficient using NumPy corrcoef()
9.3.2. Pearson correlation coefficient using Scipy pearsonr()
9.3.3. Pearson correlation coefficient using Pandas corr()
9.3.4. Comparison of the three methods used
10. F-Test for regression model significance
10.1. F-Test in regression analysis
10.2. Ordinary least square OLS regression model
10.3. Influence measures reported by the summary_table() function
10.4. Analysis of Variation ANOVA
10.5. ANOVA table results
11. F-regression (FR) and Mutual Information (MI)
11.1. F-regression (FR) analysis
11.2. Mutual Information (MI) analysis
11.3. Visualizing data as scatter plots
12. Coefficient of Variation CV
12.1. Usage of the coefficient of variation
12.2. Coefficient of variation custom function mycv()
12.3. Use mycv() on a single array
12.4. Apply mycv() on a Pandas DataFrame
12.5. Explore NaN or Missing Values in Pandas DataFrame
13. Covariance and Correlation Matrix
13.1. Covariance Matrix
13.1.1. Dataset
13.1.2. Create the covariance matrix using the cov() function
13.1.3. Visualize covariance matrix as heatmap
13.2. Correlation Matrix
13.2.1. Dataset
13.2.2. Create a correlation matrix using the corr() function
13.2.3. Visualize correlation matrix as heatmap
14. Breusch-Pagan Test for Heteroscedasticity
14.1. Linear Multiple Regression model
14.2. Breusch-Pagan Test for Heteroscedasticity
14.3. Heteroscedasticity and how to correct for it
15. Chi-square Test for Independence
15.1. Gender versus ice-cream-flavor preference
15.2. Compute the chi-square statistic
15.3. Quarter of residency versus occupation type
15.4. Compute the chi-square statistic
16. Mann-Kendall Test for Trends
16.1. MK-Test on sample containing no trend
16.2. Plot the data series containing no trend
16.3. MK-Test for Trends on sample with a major trend
16.4. Plot the data series containing a major trend
17. Analysis of Variance ANOVA
17.1. Dataset: Moore Car Dataset
17.1.1. ANOVA table type 1
17.1.2. ANOVA table type 2
17.1.3. ANOVA table type 3
18. Regression Diagnostics
18.1. Loading the dataset
18.2. Ordinary Least Square (OLS) regression model
18.3. Test for Normality of the residuals
18.3.1. Jarque-Bera (JB) Test for Normality
18.3.2. Chi-square Omni Test for Normality
18.4. Influence Analysis
18.4.1. Influence analysis of the observations
18.4.2. Plot Influence analysis with bubble plots
18.4.3. Leverage versus Normalized residuals squared Analysis
18.5. Detecting Multicollinearity
18.5.1. Evaluate Condition Number to detect multicollinearity
18.6. Detecting Heteroscedasticity
18.6.1. Breusch-Pagan Test for Heteroscedasticity
18.6.2. Goldfeld-Quandt Test for Heteroscedasticity
18.7. Linearity
18.7.1. Harvey-Collier Multiplier Test for Linear Specification
19. Test for Stationarity and detrending
19.1. Loading the Sunspots dataset
19.2. Visualize and inspect the dataset with plots
19.3. Assess Stationarity
19.3.1. Augmented Dickey-Fuller (ADF) Test
19.3.2. Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
19.3.3. Detrending by First Differencing
20. Create Plots and Charts from Pandas DataFrame
20.1. Scatter Plots
20.2. Line Charts
20.3. Bar Charts
20.4. Pie Charts
21. Plot a histogram using matplotlib
21.1. Preparing the dataset
21.2. Plot a histogram using the hist() method
21.3. Change the styling with predefined styles
21.4. List all available styles
21.5. Compute statistical moments for the sample data
21.6. Compute the appropriate number of bins to use
21.7. Plot histogram using custom bins
22. Creating Charts with mplfinance Library
22.1. Daily data
22.1.1. OHLC Charts on daily data
22.1.2. Candle Charts on daily data
22.1.3. Line Charts on daily data
22.1.4. Renko Charts on daily data
22.1.5. Point & Figure Charts on daily data
22.1.6. OHLC Charts on daily data with moving average overlays
22.1.7. Candle Charts with multiple moving average overlays
22.1.8. OHLC Charts on daily data with a volume in extra diagram
22.1.9. Candle Charts with multiple moving averages and volume plot
22.2. Intraday data
22.2.1. Candle Charts with 1 min resolution and multiple overlay studies
22.2.2. Candle Charts with 1 min resolution
22.2.3. Candle Charts with 1 min resolution
22.2.4. OHLC Charts with 1 min resolution
22.2.5. Line Charts with 1 min resolution, showing non-trading days
22.2.6. Candle Charts on daily data with moving average overlay studies
23. Live Chart Animation using IPython display
23.1. Animated cosine function
23.2. Animated arbitrary function
24. Customize the size and orientation of Charts
24.1. Syntax of Matplotlib Figsize
24.2. Default Figsize
24.3. Customize figures using rcParams
24.4. Set the figures size in centimeters instead of inches
24.5. Change the height and width of the Charts
24.6. Generate sample data using Faker library
24.7. Change the orientation of the Chart
24.8. Change the Figsize in Subplots
24.9. Set the figure size in pixels instead of inches
24.10. Choose a pre-defined styles for the Charts
25. Receiver Operating Characteristics ROC Curve
25.1. Create a ROC Curve
25.2. Interpreting the ROC Curve
25.3. Comparing three logistic regression models
26. Create Plots directly from a Pandas DataFrame
26.1. Prepare sample dataset
26.2. Plot histogram from single column
26.3. Plot histogram from multiple columns
26.4. Horizontal histogram of two columns
26.5. Line Chart of two columns
26.6. Box Plots from three columns
26.7. Three charts side-by-side using subplots
27. Visualize Pima Indian Diabetes dataset
27.1. Inspect the dataset
27.2. Visualize dataset with Box Plots
27.3. Visualize the dataset with multiple histogram Plots
27.4. Extract single plot from the scatter_matrix object
27.5. Visualize dataset with grouped histograms
27.6. Visualize the dataset with a scatter matrix plot
27.7. Extract single plot from the scatter_matrix object
28. Pandas DataFrame - creating, editing, viewing
28.1. Create empty DataFrame
28.2. Create new DataFrame from Python dictionary
28.3. Create new DataFrame from utf-8 encoded csv file
28.4. Create new DataFrame from latin-1 encoded csv file
28.5. Create new DataFrame from SQL query result set
28.6. Create new DataFrame from Excel file
28.7. Inspect DataFrame using the slice operator [] and column list
28.8. Inspect DataFrame using the describe() method
28.9. Inspect DataFrame using the info() method
29. Combine Pandas DataFrames using Merge
29.1. Two DataFrame to be joined
29.2. Applying the INNER Join
29.3. Applying the LEFT Join
29.4. Applying the RIGHT Join
29.5. Applying the OUTER Join
30. Multiple ways to set the Index of a Pandas DataFrame
30.1. Set Index while creating the DataFrame
30.2. Set Index from a column while replacing the previous column
30.3. Set Index from column while retaining the previous column
30.4. Set Index to multiple existing columns of the DataFrame
30.5. Set Index from a Python list() while creating a DataFrame
30.6. Set Index from a Python range() while creating a DataFrame
30.7. Set Index from a Python list() in an existing DataFrame
30.8. Set Index in a DataFrame while retaining the existing index
31. Select rows and columns from a Pandas DataFrame
31.1. Select rows using .loc[] method with “==” condition
31.2. Select rows using .loc[] method with “>=” condition
31.3. Select rows using .loc[] method with AND operator
31.4. Select rows using .loc[] method with OR operator
31.5. Select rows using .loc[] method with NOT equal condition
31.6. Select a slice of rows using iloc[] method
31.7. Select a slice of rows and columns using iloc[] method
32. NaN Values in DataFrame
32.1. Check for NaN Values in a single DataFrame column
32.2. Check for NaN Values by using the df.isnull() method
32.3. Check for NaN Values by using the df.isnull().sum() method
32.4. Check for NaN Values and count their occurrences
32.5. Report NaN Values in the DataFrame and count occurrences
32.6. Report NaN Values using the df.isnull().any() method
32.7. Report NaN Values using the df.isnull().sum() method
33. Replace NaN values with zeros in DataFrame
33.1. Replace NaN Values with zeros using fillna()
33.2. Replace NaN values using np.nan and pd.replace()
33.3. Replace NaN Values for all columns using pd.fillna() method
33.4. Replace NaN Values with zeros using pd.replace()
33.5. Drop rows containing NaN Values using pd.dropna()
33.6. Reset Index after drop rows using pd.reset_index()
34. Create Pandas DataFrame from NumPy array
34.1. Using first row from NumPy array as headers information
34.2. Using first column from NumPy array as indexes information
35. Add headers to Pandas DataFrame
35.1. Add headers to existing DataFrame using columns attribute
35.2. Add headers to existing DataFrame using set_axis()
35.3. Add multilevel column headers
36. Concatenate two columns in Pandas DataFrame
36.1. Using the pd.concat() method
36.2. Using the df.join() method
37. Rename columns in Pandas DataFrame
37.1. Rename a single column
37.2. Rename multiple columns simultaneously
37.3. Assign a list of new names to the columns attribute
37.4. Replace column header strings using the .str.replace() method
37.5. Rename the columns using the set_axis() method
38. Multiple ways to assign column names in DataFrames
38.1. Assign column names while creating a DataFrame
38.2. Apply lambda function to generate column values
38.3. Identify and exclude outliers in a column
38.4. Select columns using the startswith() method
38.4.1. Select columns using the .loc() method that start with “year”
38.4.2. Drop columns using the “~” operator that start with “year”
38.5. Filter rows using isin(search) names contained in search list
39. Join two text columns into a single combined column
39.1. Using the cat() method
39.2. Using the lambda() function
39.3. Using the + operator
40. Subtract two columns within a Pandas DataFrame
40.1. Using the slice operator [] to manipulate columns
40.2. Helper function diff() applied within lambda expression
40.3. Using the apply() method and lambda expression directly
40.4. Using the assign() method on the DataFrame
41. Plot a scatter matrix using Pandas
41.1. Loading dataset from online resource
41.2. Plot scatter diagram using the scatter() method
41.3. Plot two dimensional Histograms using the hist2d() method
41.4. Plot line chart with two individual scales
41.5. Plot scatter matrix from Pandas DataFrame
41.5.1. Scatter matrix with histograms as diagonal items
41.5.2. Scatter matrix with kernel density estimate as diagonal items
42. Linear Regression Modelling
42.1. Part I: Simple Linear Regression
42.1.1. Estimating model coefficients
42.1.2. Loading dataset
42.1.3. Feature variables and response variable
42.1.4. Visual inspection of the data using a scatter diagram
42.1.5. Visual inspection of the kernel density estimate
42.1.6. Study the impact of TV Ads on Sales
42.1.7. Study the impact of Radio Ads on Sales
42.2. Part II: Multiple Linear Regression using statsmodels
42.2.1. Summary table of key performance figures
42.2.2. Feature Selection
42.2.3. The coefficient of determination
42.2.4. Cross-validation of the model
42.2.5. Prediction with new data item
42.3. Part III: Multiple Linear Regression with scikit-learn
42.3.1. Evaluate the regression results
42.3.2. Prediction with new data items
42.3.3. The coefficient of determination
42.4. Part IV: Categorical predictors
42.4.1. Predictors with two categories
42.4.2. Predictors with more than two categories
42.4.3. Modeling with dummy variables
42.5. Limitation of linear regression
43. Linear Regression with Forecasts
43.1. Loading dataset from online resource
43.2. Split dataset into training and testing dataset
43.3. Simple Linear Regression
43.3.1. Visualize as scatter diagram with regression line
43.3.2. Make predictions using the testing dataset
43.3.3. Make predictions with out-of-sample data items
43.3.4. Two step ahead predictions with new data items
43.3.5. Visualize regression line with forecast values
44. The Regression Summary Report
44.1. Visualizing the dataset
44.2. Typical model summary report
44.3. Explain the summary report table-by-table
44.4. Summary report on table 0
44.4.1. Extracting individual values from table 0
44.4.2. Explain reported metric line-items on table 0
44.5. Summary report table 1
44.5.1. Extracting individual values from table 1
44.5.2. Explain reported metric line-items on table 1
44.5.3. Renaming coefficients
44.6. Summary report table 2
44.6.1. Extract specific values from table 2
44.6.2. Explain reported metric line-items on table 2
45. Data Fitting
45.1. Generate dataset and visually inspect
45.2. Part 1: Data Fitting using the NumPy library
45.2.1. Linear relationship
45.2.2. Quadratic relationship
45.2.3. 3rd order polynomial
45.2.4. 4th order polynomial
45.3. Part 2: Data Fitting using the Scikit-learn library
45.3.1. Linear relationship
45.3.2. Quadratic relationship
45.3.3. 3rd order polynomial
45.3.4. 4th order polynomial
46. Linear Regression using Ordinary Least Square
46.1. Part 1: Multiple linear regression model
46.1.1. Fitting the regression model
46.1.2. Extract specific values from the result object
46.1.3. Visualize the data in a scatter plot with confidence interval
46.1.4. Investigate the impact of added noise to the data
46.1.5. F-Test for joint hypothesis testing
46.2. Part 2: Ordinary Least Squares regression on non-linear curve
46.2.1. Fit the model and report the summary table
46.2.2. Visualize the data in a plot with confidence interval
46.3. Part 3: Modeling categorical features as dummy variables
46.3.1. Dataset without dummy variables
46.3.2. Visualize the data in a plot with confidence interval
46.3.3. Dataset with 3 dummy variables
46.3.4. F-Test to assess the model significance
46.3.5. F-Test for joint hypothesis testing
46.4. Part 4: Study the impact of multicollinearity
46.4.1. Condition number
46.4.2. Drop feature or outlier observations to remedy multicollinearity
46.4.3. Influencing factors
47. Multiple Linear Regression
47.1. Dataset: unemployment rate versus interest rate
47.2. Checking for Linearity
47.3. Part 1: Multiple Linear Regression using the sklearn
47.3.1. Make predictions with new data items
47.3.2. Extract specific values from the regression result object
47.3.3. Report performance metrics for the metrics object
47.4. Part 2: Multiple Linear Regression using statsmodels
47.4.1. Ordinary Least Square regression
47.4.2. Report specific values from the result object
47.4.3. The Durbin-Watson (DW) statistic
47.4.4. Renaming regressor variables
47.4.5. Extract specific values from the model object
47.4.6. Statistical evaluation of the model
47.4.7. Make predictions using the fitted model
48. Cook’s Distance
48.1. Fit the linear regression model
48.2. Test Cook’s distance for significance
48.3. Calculate the value
48.4. Visualize example dataset without outliers
48.5. Visualize example dataset with outliers
48.6. The summary_table report
48.6.1. Read summary_table and save as image file
48.6.2. Load the summary_table report into a Pandas DataFrame
48.6.3. Extract specific line-items from the OLSInfluence object
48.7. The summary_frame report
48.7.1. Load the summary_frame report into a Pandas DataFrame
48.7.2. Extract specific line-items from the OLSInfluence object
48.8. Scrutinize outlier and highly influential observations
49. Datasets used
50. Bibliography
Index (Symbols, A to Z)