Time Complexity for Machine Learning Algorithms

Raisul Hazari
2 min read · Sep 20, 2020

Time complexity is an important property of any algorithm. It gives us an intuitive idea of how long an algorithm will take to run, and it helps us choose the right algorithm when low latency is a requirement.

In this article we will discuss the time complexity of both supervised and unsupervised algorithms. However, we will not cover the mathematical formulation here or how each complexity is derived.

Supervised Algorithms

K-Nearest Neighbors
Technically, the model doesn't learn anything during the training phase. Logistic regression, for example, learns its model weights (parameters) during training; in contrast, K-NN has no real training step: it simply stores the data.
So the test-time complexity with brute force is O(n*d) per query, where 'n' is the number of training points and 'd' is the number of features (assuming 'k' is small).
With a KD-Tree, a query costs roughly O(d*log(n)) in low dimensions.
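Here is a minimal timing sketch of that difference, assuming scikit-learn and NumPy are available (the data is synthetic, and absolute times will vary with hardware):

```python
# A rough timing sketch, assuming scikit-learn and NumPy are installed.
import time
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n, d = 20_000, 10                          # n training points, d features
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n)
X_query = rng.standard_normal((1_000, d))

for algo in ("brute", "kd_tree"):
    knn = KNeighborsClassifier(n_neighbors=5, algorithm=algo)
    knn.fit(X, y)                          # "training" just stores (or indexes) the data
    t0 = time.perf_counter()
    knn.predict(X_query)                   # brute force: O(n*d) per query; KD-tree: ~O(d*log(n)) in low dimensions
    print(f"{algo}: {time.perf_counter() - t0:.3f}s")
```

Note that the KD-tree's advantage holds only in low dimensions; as 'd' grows, it degrades toward brute-force behavior.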

  1. Naive Bayes:
    Train: O(n*d)
    Test: O(c*d), where 'c' is the number of class labels. Naive Bayes is typically quite fast.
  2. Logistic Regression:
    Train: O(n*d)
    Test: O(d), which makes logistic regression extremely useful for low-latency requirements.
  3. Support Vector Machine (Kernel SVM):
    Train: O(n²) at best (in practice it can approach O(n³))
    Test: O(k*d), where 'k' is the number of support vectors. Kernel SVM training is typically slow, as the benchmark sketch after this list illustrates.
  4. Linear Regression (with Gradient Descent):
    Train: O(t*n*d), where 't' is the number of iterations; each gradient step over the full dataset costs O(n*d)
    Test: O(d)
  5. Decision Tree:
    Train: O(n*log(n)*d), where 'd' is the number of features
    Test: O(depth), where 'depth' is the depth of the tree (roughly O(log(n)) for a reasonably balanced tree)
  6. Random Forest:
    Since it's an ensemble model, the time complexity of a Random Forest is (number of trees) * (time complexity of a Decision Tree). Training of the individual trees is easily parallelized.
  7. Stochastic Gradient Descent:
    Honestly, we can't state a fixed time complexity for SGD, because it is an optimization procedure that reduces the error by updating the weights at each iteration. We can't say in advance how many iterations it will take to reach a minimum, and the error function may even be non-convex. So the total cost in terms of the number of data points (n) can't be pinned down; what we can say is that each individual update is linear in the number of parameters, i.e. O(d) per sample, as the sketch after this list shows.
  8. Gradient Boosting:
    Train: O(m*n*log(n)*d), where 'm' is the number of trees and 'd' the number of features (trees are built sequentially, so training is harder to parallelize than a Random Forest)
    Test: O(m*depth), where 'depth' is the depth of each tree (usually small, since boosting works well with shallow trees)
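To make these numbers concrete, here is a hedged benchmark sketch using scikit-learn's implementations of the models above on synthetic data. The absolute times depend on hardware and implementation details; it is the relative ordering (e.g. kernel SVM's slow training versus logistic regression's fast prediction) that tends to match the stated complexities.

```python
# A benchmark sketch, assuming scikit-learn; relative ordering is what matters.
import time
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

rng = np.random.default_rng(0)
n, d = 5_000, 20
X = rng.standard_normal((n, d))
y = rng.integers(0, 2, size=n)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1_000),
    "Kernel SVM": SVC(kernel="rbf"),                   # roughly O(n^2) training
    "Decision Tree": DecisionTreeClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100),
    "Gradient Boosting": GradientBoostingClassifier(n_estimators=100),
}

for name, model in models.items():
    t0 = time.perf_counter()
    model.fit(X, y)                                    # training cost
    t1 = time.perf_counter()
    model.predict(X)                                   # prediction (test) cost
    t2 = time.perf_counter()
    print(f"{name:<20} train: {t1 - t0:.3f}s  test: {t2 - t1:.3f}s")
```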
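And as promised in item 7, a minimal NumPy sketch of one SGD epoch for linear regression with squared loss. Each update touches a single row, so an update costs O(d) and a full pass over the data costs O(n*d); what cannot be bounded in advance is how many epochs convergence takes.

```python
# A minimal SGD sketch for linear regression with squared loss (NumPy only).
import numpy as np

def sgd_epoch(X, y, w, lr=0.01):
    """One pass over the data: n updates, each costing O(d)."""
    for i in np.random.permutation(len(X)):  # visit samples in random order
        residual = X[i] @ w - y[i]           # O(d) prediction error
        w -= lr * residual * X[i]            # O(d) gradient step
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((1_000, 5))
true_w = rng.standard_normal(5)
y = X @ true_w                               # noiseless synthetic targets

w = np.zeros(5)
for epoch in range(20):                      # the number of epochs is the unknowable part
    w = sgd_epoch(X, y, w)
print(np.allclose(w, true_w, atol=1e-2))     # True: converged on this easy problem
```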

Unsupervised Algorithms

  1. K-means clustering:
    Train: O(n*k*d*i), where 'k' is the number of clusters and 'i' the number of iterations
    Generally k and i are small in comparison with n and d, so we can approximate this as O(n*d); typically fast (see the timing sketch after this list).
  2. Agglomerative Hierarchical Clustering:
    O(n³) in the naive implementation; extremely slow. For certain linkage criteria (e.g. single linkage via SLINK) this can be brought down to O(n²), but it remains impractical for large datasets.
  3. DBSCAN:
    O(n²) in the worst case, although with efficient index structures (e.g. KD-trees) the average case can be closer to O(n*log(n)) in low dimensions.
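Here is a rough timing sketch for the three clustering algorithms, again assuming scikit-learn and synthetic 2-D data: agglomerative clustering's quadratic-to-cubic cost shows up quickly as n grows, while k-means stays close to linear in n.

```python
# A timing sketch for the clustering algorithms above (scikit-learn assumed).
import time
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 2))                        # low-dimensional synthetic data

for name, model in [
    ("KMeans", KMeans(n_clusters=5, n_init=10)),           # ~O(n*k*d*i)
    ("Agglomerative", AgglomerativeClustering(n_clusters=5)),  # O(n^2) to O(n^3)
    ("DBSCAN", DBSCAN(eps=0.3, min_samples=10)),           # O(n^2) worst case
]:
    t0 = time.perf_counter()
    model.fit(X)
    print(f"{name:<15} {time.perf_counter() - t0:.3f}s")
```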
