Dinarvand, Pouyan (2020) Detecting Financial Fraud and Crimes in Capital Markets: a Study of Data-driven and Computational Approaches. Doctoral thesis (PhD), Manchester Metropolitan University.
Available under License Creative Commons Attribution Non-commercial No Derivatives.
Abstract
Automatic surveillance of abnormal trading behaviours/patterns (ATPs) in capital markets is essential to protect the capital of legitimate traders from price distortion of financial assets. Detecting ATPs involves finding single anomalies (one trading order with large trading volume and long cancellation time, e.g. several minutes) or sequential anomalies (multiple correlated trading orders with small volume and short cancellation time, e.g. milliseconds) in trading data. However, accurate and timely identification of ATPs remains an open challenge because the data are high-volume, high-frequency and unlabelled. In this research, we have investigated anomaly detection approaches to address these challenges and fill the knowledge gap through the following four contributions. Firstly, we have performed a literature review and conducted a thorough benchmark evaluation of existing state-of-the-art anomaly detection algorithms (i.e. Artificial Neural Network Auto Encoder, Isolation Forest, Local Outlier Factor (LOF), Histogram-based Outlier Score (HBOS), Angle-based Outlier Detection (ABOD), Principal Component Analysis (PCA) and K-Nearest Neighbors (KNN)) using publicly available datasets from different domains such as health and finance. The experimental results show that Isolation Forest, HBOS and PCA are robust algorithms, combining high detection performance (Area Under the ROC Curve (AUC) = 0.95) with low computational time on large datasets. Secondly, as one of the major contributions of this research, we have proposed a novel generic unsupervised anomaly detection model, which can be applied to both financial and non-financial datasets. The proposed model partitions a bounded D-dimensional space (e.g. the unit hypercube I^D) with a sequence of random shapes, so that each data point falls either inside or outside each shape, and then models probabilistically the pattern of inside/outside outcomes for each data point. Anomalous data points, which are rare and different from the rest of the dataset, are assigned a higher anomaly score. Thirdly, to investigate the robustness of the proposed anomaly detection model, we have performed a thorough sensitivity analysis under different hyper-parameter settings (e.g. the number of random shapes and the form of the shapes) and on different publicly available datasets. The results show that the model performance stabilises as the number of random shapes increases. Furthermore, the form of the random shapes can affect the performance of the algorithm and needs to be optimised for a given dataset. The results also indicate that the algorithm's computational time increases linearly with the number of random shapes, which indicates that the algorithm can detect anomalies in a timely manner. Finally, we have applied the proposed algorithm to real Bitcoin prices as a case study and tested, evaluated and compared its performance against the benchmark algorithms (Auto Encoder, Isolation Forest, LOF, HBOS, ABOD, PCA and KNN). The results show that the proposed algorithm achieves AUC = 0.94. It outperforms the benchmark algorithms by 8.5 percent while keeping computational time low.
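The random-shape idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the shape family or the probabilistic model, so everything below is an assumption made for illustration and is not the thesis's implementation: the shapes are taken to be random hyperspheres, the radius range is arbitrary, and the inside/outside pattern of a point is scored by the average negative log-frequency of the side it falls on. The function name `random_shape_scores` and its parameters are hypothetical.

```python
import numpy as np


def random_shape_scores(X, n_shapes=200, seed=0):
    """Anomaly scores for rows of X, assumed to be scaled to [0, 1]^D.

    Illustrative assumptions (not taken from the thesis):
    - each random shape is a hypersphere with a uniform random centre;
    - radii are drawn from an arbitrary range scaled by sqrt(D);
    - a point's score is the mean negative log-frequency of the side
      (inside or outside) it falls on, across all shapes.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, d = X.shape

    # Random hyperspheres in the unit hypercube.
    centers = rng.random((n_shapes, d))
    radii = rng.uniform(0.05, 0.5, size=n_shapes) * np.sqrt(d)

    # inside[i, j] is True when point i lies inside shape j.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    inside = dists <= radii

    # Smoothed fraction of points inside each shape (avoids log(0)).
    p_inside = (inside.sum(axis=0) + 1.0) / (n + 2.0)

    # Frequency of the side each point actually falls on, per shape.
    p_side = np.where(inside, p_inside, 1.0 - p_inside)

    # Rare inside/outside patterns give a higher score.
    return -np.log(p_side).mean(axis=1)
```

Under these assumptions the cost grows linearly with `n_shapes`, since each extra shape adds one distance computation per data point, which is in line with the linear scaling the abstract reports.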