Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
237 views
in Technique[技术] by (71.8m points)

Denoise GPS data iteravly from large CSV Table using Python

I have a dataset of around 430K GPS Locations in a CSV (They are from my Google Location Data if that helps), that i would like to denoise before further proceding with my project. There are some, i guess, easier ones to denoise, like in the image below, where the latitude varies by 3 degrees in a couple of seconds, but also a couple harder ones due to GPS Noise when moving slowly or not at all. There are also some that were from flying in a plane with points far away from each other (approx. 500km inbetween), due to few measurements and high speed or Car rides with no Cellular to send the Location to Google servers.

enter image description here

I would like to clear as much of the noise as possible up. I searched for a soulution and found "Kalman Filtering". I have very much only seen some stuff on smaller datasets that i guess would not work on such big one. Also i just can't wrap my head around Kalman Filtering so it is hard for me to Design one by myself. In addition because it's such a huge dataset i guess it would have to be done iterativly? Eg. grapping a few Values and then apply the filter, than the next few and so on. Also i guess one could use the Time Column to predict more precise?

EDIT: Thanks to this question and answer, I've found this code which i modified to read from my csv:

from pykalman import KalmanFilter
import numpy as np
import matplotlib.pyplot as plt
import time

measurements = np.genfromtxt('locations.csv', delimiter=',', names=True)

finalArray = []
for i in measurements:
    measureArray = np.asarray([i["Latitude"], i["Longitude"]])
    finalArray.append(measureArray)

measurements = np.asarray(finalArray)



initial_state_mean = [measurements[0, 0],
                      0,
                      measurements[0, 1],
                      0]

transition_matrix = [[1, 1, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 1, 1],
                     [0, 0, 0, 1]]

observation_matrix = [[1, 0, 0, 0],
                      [0, 0, 1, 0]]



kf1 = KalmanFilter(transition_matrices = transition_matrix,
                  observation_matrices = observation_matrix,
                  initial_state_mean = initial_state_mean)

kf1 = kf1.em(measurements, n_iter=5)
(smoothed_state_means, smoothed_state_covariances) = kf1.smooth(measurements)


kf2 = KalmanFilter(transition_matrices = transition_matrix,
                  observation_matrices = observation_matrix,
                  initial_state_mean = initial_state_mean,
                  observation_covariance = 10*kf1.observation_covariance,
                  em_vars=['transition_covariance', 'initial_state_covariance'])

kf2 = kf2.em(measurements, n_iter=5)
(smoothed_state_means, smoothed_state_covariances)  = kf2.smooth(measurements)

plt.figure(2)
times = range(measurements.shape[0])
plt.plot(times, measurements[:, 0], 'bo',
         times, measurements[:, 1], 'ro',
         times, smoothed_state_means[:, 0], 'b--',
         times, smoothed_state_means[:, 2], 'r--',)
plt.show()

This works pretty well for a smaller Dataset, but would take very long with a much larger one. Any Idea how to do this iteravly?


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)
等待大神答复

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

1.4m articles

1.4m replys

5 comments

57.0k users

...