💾 Archived View for bacaliu.de › analyzing_gadgetbridge_data_in_python.gmi captured on 2023-07-22 at 16:45:25. Gemini links have been rewritten to link to archived content
```python
import hsluv
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

plt.rcParams["figure.figsize"] = (8, 4)
```
The FOSS application Gadgetbridge (“Gadgetbridge for android” 2022) can export the collected data into an SQLite file. Loading it into Python is not that difficult. In my case the file is automatically mirrored into my `~/Sync` folder through Syncthing (“Syncthing” 2019) and named `ggb.sqlite`.
```python
import pandas as pd
import sqlite3

# Gadgetbridge export, mirrored into ~/Sync by Syncthing
conn = sqlite3.connect("/home/adrian/Sync/ggb.sqlite")
df = pd.read_sql_query(
    """SELECT TIMESTAMP, RAW_INTENSITY, STEPS, RAW_KIND, HEART_RATE
       FROM MI_BAND_ACTIVITY_SAMPLE;""",
    conn,
)
df.describe().to_markdown(tablefmt="orgtbl")  # to_markdown needs the tabulate package
```
```
|       |   TIMESTAMP | RAW_INTENSITY |   STEPS | RAW_KIND | HEART_RATE |
|-------+-------------+---------------+---------+----------+------------|
| count |      331004 |        331004 |  331004 |   331004 |     331004 |
| mean  | 1.66892e+09 |       24.3221 | 4.33082 |  125.133 |    76.3764 |
| std   |  5.7575e+06 |       28.2967 | 15.5202 |  84.1054 |    33.7458 |
| min   | 1.65897e+09 |            -1 |       0 |        1 |         -1 |
| 25%   | 1.66394e+09 |             0 |       0 |       80 |         60 |
| 50%   |  1.6689e+09 |            17 |       0 |       90 |         71 |
| 75%   | 1.67387e+09 |            38 |       0 |      240 |         81 |
| max   | 1.67895e+09 |           198 |     144 |      251 |        255 |
```
I did not include the columns `DEVICE_ID` and `USER_ID`: they always have the same value if you use only one device as one user, so I leave them out to keep the tables smaller. Otherwise a `SELECT * FROM MI_BAND_ACTIVITY_SAMPLE` would be sufficient.
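If you are unsure about the table or column names in your own export (I'm assuming they can differ between Gadgetbridge versions and devices), SQLite can list them itself. A minimal sketch using the `sqlite_master` table and `PRAGMA table_info`; the helper names are hypothetical, and the demo uses a toy in-memory table instead of the real `ggb.sqlite`:

```python
import sqlite3


def list_tables(conn: sqlite3.Connection) -> list[str]:
    """Return the names of all tables in the database (hypothetical helper)."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def list_columns(conn: sqlite3.Connection, table: str) -> list[str]:
    """Return the column names of `table` (hypothetical helper)."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    # The table name is interpolated, so only pass trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [row[1] for row in rows]


if __name__ == "__main__":
    # Toy stand-in for the export; only a few columns are mimicked.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE MI_BAND_ACTIVITY_SAMPLE "
        "(TIMESTAMP INTEGER, DEVICE_ID INTEGER, USER_ID INTEGER, HEART_RATE INTEGER)"
    )
    print(list_tables(conn))
    print(list_columns(conn, "MI_BAND_ACTIVITY_SAMPLE"))
```

Running `list_columns` on the real file shows every column at once, which makes it easy to decide which ones are worth loading.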
But what is this strange `TIMESTAMP` column? Maybe it's just a Unix timestamp. Let's throw it into `pd.to_datetime`:
```python
pd.to_datetime(df.TIMESTAMP) \
    .describe(datetime_is_numeric=True) \
    .to_markdown(tablefmt="orgtbl")
```
```
|       | TIMESTAMP                     |
|-------+-------------------------------|
| count | 331004                        |
| mean  | 1970-01-01 00:00:01.668917075 |
| min   | 1970-01-01 00:00:01.658969040 |
| 25%   | 1970-01-01 00:00:01.663935105 |
| 50%   | 1970-01-01 00:00:01.668900930 |
| 75%   | 1970-01-01 00:00:01.673866575 |
| max   | 1970-01-01 00:00:01.678946580 |
```
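Hmmm... This doesn't look right. I ran into this kind of problem last year when analyzing Deutsche Bahn data (results, not the process: [Momentane Pünktlichkeit der Deutschen Bahn]). `pd.to_datetime` interprets plain integers as nanoseconds since 1970-01-01, but to save memory and network capacity, timestamps are often stored in coarser units, i.e. the nanosecond value divided by a factor of `1e6` (milliseconds) or `1e9` (seconds).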
```python
pd.to_datetime(df.TIMESTAMP * 1e9) \
    .describe(datetime_is_numeric=True) \
    .to_markdown(tablefmt="orgtbl")
```
```
|       | TIMESTAMP                     |
|-------+-------------------------------|
| count | 331004                        |
| mean  | 2022-11-20 04:04:35.349853696 |
| min   | 2022-07-28 00:44:00           |
| 25%   | 2022-09-23 12:11:45           |
| 50%   | 2022-11-19 23:35:30           |
| 75%   | 2023-01-16 10:56:15           |
| max   | 2023-03-16 06:03:00           |
```
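Multiplying by `1e9` works, but it turns the integers into floats first. `pd.to_datetime` also accepts a `unit` parameter, and `pd.read_sql_query` can do the conversion while loading via `parse_dates`. A small sketch using the smallest timestamp from the table above; the in-memory table is a toy stand-in, not the real export:

```python
import sqlite3

import pandas as pd

ts = pd.Series([1658969040])  # the smallest TIMESTAMP from the table above

# Plain integers are read as nanoseconds since 1970-01-01 ...
wrong = pd.to_datetime(ts)
# ... so tell pandas that the values are seconds instead.
right = pd.to_datetime(ts, unit="s")
print(wrong[0], right[0])

# Alternatively convert while loading (toy table, one row).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MI_BAND_ACTIVITY_SAMPLE (TIMESTAMP INTEGER)")
conn.execute("INSERT INTO MI_BAND_ACTIVITY_SAMPLE VALUES (1658969040)")
loaded = pd.read_sql_query(
    "SELECT TIMESTAMP FROM MI_BAND_ACTIVITY_SAMPLE",
    conn,
    parse_dates={"TIMESTAMP": "s"},  # integer column, unit = seconds
)
print(loaded.TIMESTAMP[0])
```

Both variants keep the values as integers until pandas converts them, so no detour through floating point is needed.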
Yes! This matches the period in which I used Gadgetbridge with my watch. Let's make some useful columns out of it. With the `.dt` accessor object (“pandas.Series.dt — pandas 1.5.3 documentation” 2023), attributes like `date`, `hour`, etc. are easy to use:
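```python
# Seconds -> nanoseconds, which pd.to_datetime expects for plain numbers.
# The result is a naive timestamp in UTC.
df["utc"] = pd.to_datetime(df.TIMESTAMP * 1e9)
df["date"] = df.utc.dt.date
df["weekday"] = df.utc.dt.day_name()
df["hour"] = df.utc.dt.hour
df["hourF"] = df.utc.dt.hour + df.utc.dt.minute / 60  # e.g. 14:30 -> 14.5
```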
`date` and `weekday` can be used for grouping data; `hour` and especially `hourF` (the hour as a floating-point number) for x/y diagrams.
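To see what these accessors return, here is a sketch on the first and last timestamp of the export. Since Unix timestamps are UTC, `date` and `hour` are UTC values too; if you want to group by local time, the column can be localized and converted first. The time zone `"Europe/Berlin"` below is only an assumed example:

```python
import pandas as pd

# Smallest and largest TIMESTAMP from the tables above (seconds since the epoch)
utc = pd.to_datetime(pd.Series([1658969040, 1678946580]), unit="s")

hour_f = utc.dt.hour + utc.dt.minute / 60  # 00:44 -> 0.733..., 06:03 -> 6.05
weekday = utc.dt.day_name()

# The values are naive UTC times. For local-time grouping they can be
# converted; "Europe/Berlin" is only an assumed example time zone.
local = utc.dt.tz_localize("UTC").dt.tz_convert("Europe/Berlin")

print(pd.DataFrame({
    "utc": utc,
    "hourF": hour_f,
    "weekday": weekday,
    "local_hour": local.dt.hour,
}))
```

Note that the offset changes with daylight saving time (UTC+2 in July, UTC+1 in mid-March), so a fixed shift would not be enough.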
Momentane Pünktlichkeit der Deutschen Bahn
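```python
# `file` (the output path) is assumed to be set by the org-babel block header
plt.hist(
    df.HEART_RATE,
    color=hsluv.hsluv_to_hex((0, 75, 25)),
    bins=256,
)
plt.title("Histogram: Heart rate")
plt.yscale("log")  # log scale, so that rare values stay visible
plt.savefig(file)
plt.close()
file
```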
Looking at the histogram of the heart rate, it's obvious that values of `255` and values of `0` or below are errors or failed measurements. Therefore I set them to `None`.
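```python
# Copy of the column as float, so that it can hold NaN
df["heartRate"] = df.HEART_RATE.astype(float)
df.loc[df.heartRate <= 0, "heartRate"] = None    # failed measurements
df.loc[df.heartRate >= 255, "heartRate"] = None  # error value
```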
To avoid strange problems when executing the org-babel blocks in the wrong order, I follow the best practice of copying and *not overwriting* the original data.
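The two `.loc` assignments can also be written as one expression that returns a new Series and leaves `HEART_RATE` untouched, in line with the copy-don't-overwrite rule. A sketch with `Series.where` and `Series.between`; the helper name is hypothetical, and it assumes integer heart rates, so that 1 to 254 is exactly "above 0 and below 255":

```python
import pandas as pd


def clean_heart_rate(raw: pd.Series) -> pd.Series:
    """Keep 1..254 and replace everything else with NaN (returns a new Series)."""
    # between() is inclusive on both ends by default; where() keeps values
    # where the condition is True and puts NaN elsewhere, without touching `raw`.
    return raw.where(raw.between(1, 254))


raw = pd.Series([-1, 0, 39, 71, 178, 255])
print(clean_heart_rate(raw).tolist())
```

With the real data this would read `df["heartRate"] = clean_heart_rate(df.HEART_RATE)`.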
```python
df[["HEART_RATE", "heartRate"]].describe().to_markdown(tablefmt="orgtbl")
```
```
|       | HEART_RATE | heartRate |
|-------+------------+-----------|
| count |     331004 |    321445 |
| mean  |    76.3764 |   71.0781 |
| std   |    33.7458 |   14.0401 |
| min   |         -1 |        39 |
| 25%   |         60 |        59 |
| 50%   |         71 |        71 |
| 75%   |         81 |        81 |
| max   |        255 |       178 |
```
This is much better!
Now I want to see how the data looks. Today is a good day, because
```python
fig, ax = plt.subplots(figsize=(8, 4))
span = df[
    (df.utc > datetime(2023, 3, 15, 3))     # times are UTC
    &  # & is the element-wise (bitwise) AND of the two boolean Series
    (df.utc < datetime(2023, 3, 15, 15))
]
ax.plot(
    span.utc, span.RAW_INTENSITY,
    label="Intensity",
    color=hsluv.hsluv_to_hex((240, 80, 20)),
    linewidth=0.75,
)
ax.plot(
    span.utc, span.RAW_KIND,
    label="Kind",
    color=hsluv.hsluv_to_hex((120, 80, 40)),
    linewidth=0.5,
)
bx = ax.twinx()  # second y-axis for the heart rate
bx.plot(
    span.utc, span.heartRate,
    label="Heart Rate",
    color=hsluv.hsluv_to_hex((0, 80, 60)),
    linewidth=0.25,
)
ax.set_ylim([0, 256])
ax.set_yticks(list(range(0, 256, 32)))
bx.set_ylim([0, 160])
ax.set_xlim([span.utc.min(), span.utc.max()])
fig.legend()
ax.grid()
fig.autofmt_xdate()  # tilting the x-labels
fig.tight_layout()   # less space around the plot
fig.savefig(file)
plt.close(fig)
file
```
You can't clearly see what's going on because of the wiggly wobbliness of the lines. Let's try a rolling mean:
```python
fig, ax = plt.subplots(figsize=(8, 4))
span = df[
    (df.utc > datetime(2023, 3, 15, 3))
    & (df.utc < datetime(2023, 3, 15, 15))
]
ax.plot(
    span.utc, span.RAW_INTENSITY.rolling(5, min_periods=1).mean(),
    label="Intensity",
    color=hsluv.hsluv_to_hex((240, 80, 20)),
    linewidth=0.75,
)
ax.plot(
    span.utc, span.RAW_KIND.rolling(5, min_periods=1).median(),  # ! median, not mean
    label="Kind",
    color=hsluv.hsluv_to_hex((120, 80, 40)),
    linewidth=0.5,
)
bx = ax.twinx()
bx.plot(
    span.utc, span.heartRate.rolling(5, min_periods=1).mean(),
    label="Heart Rate",
    color=hsluv.hsluv_to_hex((0, 80, 60)),
    linewidth=0.25,
)
ax.set_ylim([0, 256])
ax.set_yticks(list(range(0, 256, 32)))
bx.set_ylim([0, 160])
ax.set_xlim([span.utc.min(), span.utc.max()])
fig.legend()
ax.grid()
fig.autofmt_xdate()
fig.tight_layout()
fig.savefig(file)
plt.close(fig)
file
```
/Now/ you can clearly see
For `RAW_KIND` I used the rolling *median*, because its values look more discrete than continuous. There might be some strange encoding going on: sleep is very high, the spikes towards around 100 are short moments of me waking up and turning around, while working the value is around 80, and during sport it drops below 20.
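Why the median helps here: a rolling mean smears a short spike into in-between values that are not one of the observed levels, while the rolling median stays on a level that actually occurs. A toy example with made-up numbers; the time-based window at the end is an alternative I'm adding for illustration, not what the plots above use:

```python
import pandas as pd

# Toy RAW_KIND-like values: steady 80 with one short spike (made-up numbers)
kind = pd.Series([80, 80, 250, 80, 80])

mean5 = kind.rolling(5, min_periods=1).mean()
median5 = kind.rolling(5, min_periods=1).median()
print(pd.DataFrame({"kind": kind, "mean": mean5, "median": median5}))

# A time-based window ("5min") needs a DatetimeIndex; it covers the last
# 5 minutes instead of the last 5 rows, so gaps in the samples don't stretch it.
kind_t = pd.Series(
    kind.to_numpy(),
    index=pd.date_range("2023-03-15 03:00", periods=5, freq="min"),
)
median5_t = kind_t.rolling("5min").median()
```

The mean jumps to about 137 right after the spike, while the median stays at 80 throughout.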
Now let's combine features. To add a color dimension, let's assume that `RAW_KIND` above 192 means sleep and 32 or below means activity.
```python
# Label every sample according to the assumption above
df["assumption"] = [
    "sleep" if r > 192 else "normal" if r > 32 else "activity"
    for r in df.RAW_KIND
]
fig, ax = plt.subplots(figsize=(6, 6))
sns.scatterplot(
    ax=ax,
    data=df.sample(2048),  # plot a random sample of 2048 points, not /all/
    x="heartRate",
    y="RAW_INTENSITY",
    hue="assumption",
    palette={
        "sleep": hsluv.hsluv_to_hex((240, 60, 60)),
        "normal": hsluv.hsluv_to_hex((120, 80, 40)),
        "activity": hsluv.hsluv_to_hex((0, 100, 20)),
    },
)
ax.set_xlim([30, 130])
ax.set_ylim([0, None])
fig.savefig(file)
plt.close(fig)
file
```
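The list comprehension loops over all ~331,000 values in plain Python. A vectorized alternative with `pd.cut` should give the same labels; here is a sketch checked on a few boundary values:

```python
import pandas as pd

raw_kind = pd.Series([1, 32, 33, 192, 193, 251])

# Same thresholds as the list comprehension above, vectorized with pd.cut
# (intervals are closed on the right by default):
# (-inf, 32] -> activity, (32, 192] -> normal, (192, inf] -> sleep
assumption = pd.cut(
    raw_kind,
    bins=[float("-inf"), 32, 192, float("inf")],
    labels=["activity", "normal", "sleep"],
)
print(assumption.tolist())

# Cross-check against the original list comprehension
listcomp = [
    "sleep" if r > 192 else "normal" if r > 32 else "activity" for r in raw_kind
]
assert assumption.tolist() == listcomp
```

Independently of that, `df.sample(2048, random_state=0)` would make the scatter plot reproducible between runs; the seed `0` is arbitrary.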
It seems intuitive that intensity and heart rate are lower while sleeping. But do you notice something strange? There are lines of frequently occurring heart-rate values while awake, but not while asleep.
I assume my watch has high precision but only medium accuracy. Randall Munroe made a useful table to keep the difference between the two in mind (Munroe 2022):
Maybe it works like this: while sleeping I don't move much (as the low position on the y-axis shows), so the precision is as high as possible. But when I'm moving around, the watch measures only at the moments it can and estimates the pulse with lower precision.
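To check the stripes in numbers rather than by eye, one could measure how strongly each group's heart-rate readings cluster on a few values. A sketch of that idea; the helper, the choice of `k` and the toy data are my own assumptions, and the made-up numbers below say nothing about the real export:

```python
import pandas as pd


def top_value_share(df: pd.DataFrame, k: int = 5) -> pd.Series:
    """Share of samples per `assumption` group that fall on the group's k most
    frequent heart-rate values (hypothetical helper). A high share means the
    readings cluster on a few values, i.e. visible stripes in the scatter plot."""
    valid = df.dropna(subset=["heartRate"])
    return valid.groupby("assumption")["heartRate"].agg(
        lambda s: s.value_counts(normalize=True).head(k).sum()
    )


# Toy data, made up for illustration only
toy = pd.DataFrame({
    "assumption": ["sleep"] * 6 + ["normal"] * 6,
    "heartRate": [50, 51, 52, 53, 54, 55, 70, 70, 70, 70, 71, 72],
})
print(top_value_share(toy, k=1))
```

On the real data, `top_value_share(df)` would compare the groups "sleep", "normal" and "activity"; if the hypothesis holds, the awake groups should show the larger share.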
“Gadgetbridge for android.” 2022. September 10, 2022.
Munroe, R. 2022. “Precision vs Accuracy.” /Xkcd/, November 9, 2022.
“pandas.Series.dt — pandas 1.5.3 documentation.” 2023. January 19, 2023.
“Syncthing.” 2019. September 5, 2019.
License: CC BY-4.0 [Impressum und Datenschutz]