I have a dataset of around 430K GPS locations in a CSV (they come from my Google Location data, if that helps) that I would like to denoise before proceeding with my project. Some of the noise is, I guess, easier to remove, like in the image below, where the latitude varies by 3 degrees within a couple of seconds. Other cases are harder, caused by GPS noise when moving slowly or not at all. There are also stretches from flying in a plane, with points roughly 500 km apart because of few measurements at high speed, and car rides with no cellular connection to send the location to Google's servers.
I would like to clear up as much of the noise as possible. I searched for a solution and found Kalman filtering, but I have only seen it applied to much smaller datasets, and I guess that would not work on one this big. I also just can't wrap my head around Kalman filtering, so it is hard for me to design one myself. In addition, because the dataset is so large, I guess it would have to be done iteratively, e.g. grabbing a few values, applying the filter, then the next few, and so on. I also guess one could use the Time column to predict more precisely?
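For the obvious jumps I wonder if a plain speed check before any filtering would already help; something like the sketch below, where the column names ("Time", "Latitude", "Longitude"), the use of pandas and the 1000 km/h cut-off are just guesses for my CSV:

import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance between two points in kilometres
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

df = pd.read_csv('locations.csv')          # assumes Time, Latitude, Longitude columns
df['Time'] = pd.to_datetime(df['Time'])

# implied speed between consecutive fixes
dt_h = df['Time'].diff().dt.total_seconds() / 3600.0
dist_km = haversine_km(df['Latitude'].shift(), df['Longitude'].shift(),
                       df['Latitude'], df['Longitude'])
speed_kmh = dist_km / dt_h

# keep the first row and everything below ~1000 km/h: plane rides survive,
# the "3 degrees in a couple of seconds" spikes do not
df_clean = df[(speed_kmh < 1000) | speed_kmh.isna()]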
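In case it clarifies what I mean by using the Time column: my rough understanding is that with a state like [lat, lat_velocity, lon, lon_velocity], the actual time gap between two fixes could go into the transition matrix instead of a fixed step of 1. This is just a sketch of the idea, and the state layout is my guess:

import numpy as np

# My (possibly wrong) understanding: the prediction step could use the real time
# gap dt (in seconds) between two fixes instead of assuming one fixed step per row.
def transition_matrix_for(dt):
    return np.array([[1, dt, 0,  0],
                     [0,  1, 0,  0],
                     [0,  0, 1, dt],
                     [0,  0, 0,  1]])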
EDIT:
Thanks to this question and answer, I've found this code, which I modified to read from my CSV:
from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt

# Read the CSV and build an (N, 2) array of [latitude, longitude] observations.
measurements = np.genfromtxt('locations.csv', delimiter=',', names=True)
finalArray = []
for i in measurements:
    measureArray = np.asarray([i["Latitude"], i["Longitude"]])
    finalArray.append(measureArray)
measurements = np.asarray(finalArray)

# Constant-velocity model: the state is [lat, lat_velocity, lon, lon_velocity].
initial_state_mean = [measurements[0, 0],
                      0,
                      measurements[0, 1],
                      0]

transition_matrix = [[1, 1, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 0, 1]]

# Only latitude and longitude are observed, not the velocities.
observation_matrix = [[1, 0, 0, 0],
                      [0, 0, 1, 0]]

kf1 = KalmanFilter(transition_matrices=transition_matrix,
                   observation_matrices=observation_matrix,
                   initial_state_mean=initial_state_mean)

# Learn the covariances from the data via EM, then smooth.
kf1 = kf1.em(measurements, n_iter=5)
(smoothed_state_means, smoothed_state_covariances) = kf1.smooth(measurements)

# Second pass with an inflated observation covariance for stronger smoothing.
kf2 = KalmanFilter(transition_matrices=transition_matrix,
                   observation_matrices=observation_matrix,
                   initial_state_mean=initial_state_mean,
                   observation_covariance=10 * kf1.observation_covariance,
                   em_vars=['transition_covariance', 'initial_state_covariance'])
kf2 = kf2.em(measurements, n_iter=5)
(smoothed_state_means, smoothed_state_covariances) = kf2.smooth(measurements)

# Plot raw latitude/longitude (dots) against the smoothed estimates (dashes).
plt.figure(2)
times = range(measurements.shape[0])
plt.plot(times, measurements[:, 0], 'bo',
         times, measurements[:, 1], 'ro',
         times, smoothed_state_means[:, 0], 'b--',
         times, smoothed_state_means[:, 2], 'r--')
plt.show()
This works pretty well for a smaller dataset, but it would take very long on a much larger one.
Any idea how to do this iteratively?
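For reference, this is roughly what I imagine "iteratively" could look like, reusing the arrays from the snippet above: fit the parameters with EM on a small subsample, then push the rows through pykalman's filter_update() one at a time, so the full 430K points never go through smooth() in a single pass. The subsample size is an arbitrary guess, and I'm not sure whether plain filtering instead of smoothing is acceptable:

# Rough sketch of the incremental idea (reuses measurements, transition_matrix,
# observation_matrix and initial_state_mean from above; the subsample size is a guess).
sample = measurements[:5000]

kf = KalmanFilter(transition_matrices=transition_matrix,
                  observation_matrices=observation_matrix,
                  initial_state_mean=initial_state_mean)
kf = kf.em(sample, n_iter=5)          # learn covariances on the subsample only

filtered_means = np.zeros((len(measurements), 4))
mean = np.asarray(kf.initial_state_mean)
cov = np.asarray(kf.initial_state_covariance)
for t, observation in enumerate(measurements):
    # one predict/update step per GPS fix, constant memory
    mean, cov = kf.filter_update(mean, cov, observation=observation)
    filtered_means[t] = mean

# filtered_means[:, 0] is then the filtered latitude, filtered_means[:, 2] the longitude.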