Archived View for bacaliu.de › analyzing_gadgetbridge_data_in_python.txt captured on 2023-07-10 at 13:48:38.
-=-=-=-=-=-=-
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
 ANALYZING GADGETBRIDGE-DATA WITH PYTHON

 Amazfit Neo → Gadgetbridge → Sqlite → Python-Pandas
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

2023-03-15


1 Some helpful imports
══════════════════════

┌────
│ import hsluv
│ import matplotlib.pyplot as plt
│ plt.rcParams["figure.figsize"] = (8, 4)
│ import seaborn as sns
│ from datetime import datetime
└────
Listing 1: some helpful imports


2 Getting the Data
══════════════════

The FOSS application Gadgetbridge (“Gadgetbridge for android” 2022)
supports exporting the collected data into an sqlite-file. Loading it
into Python is not that difficult. In my case the file is automatically
mirrored into my `~/Sync'-folder through Syncthing (“Syncthing” 2019)
and named `ggb.sqlite'.

┌────
│ import pandas as pd
│ import sqlite3
│ conn = sqlite3.connect("/home/adrian/Sync/ggb.sqlite")
│ df = pd.read_sql_query(
│     """SELECT TIMESTAMP, RAW_INTENSITY, STEPS, RAW_KIND, HEART_RATE
│     FROM MI_BAND_ACTIVITY_SAMPLE;""",
│     conn
│ )
│
│ df.describe().to_markdown(tablefmt="orgtbl")
└────

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        TIMESTAMP    RAW_INTENSITY  STEPS    RAW_KIND  HEART_RATE
─────────────────────────────────────────────────────────────────
 count  331004       331004         331004   331004    331004
 mean   1.66892e+09  24.3221        4.33082  125.133   76.3764
 std    5.7575e+06   28.2967        15.5202  84.1054   33.7458
 min    1.65897e+09  -1             0        1         -1
 25%    1.66394e+09  0              0        80        60
 50%    1.6689e+09   17             0        90        71
 75%    1.67387e+09  38             0        240       81
 max    1.67895e+09  198            144      251       255
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

I did not include the columns `DEVICE_ID' and `USER_ID'. They always
have the same value if you use only one device as one user, so I skip
them to keep the tables smaller; otherwise a
`SELECT * FROM MI_BAND_ACTIVITY_SAMPLE' would be sufficient.


3 Preparation
═════════════

3.1 Datetime
────────────

But what is this strange `TIMESTAMP'-column? Oh, maybe just a
unix-timestamp.
Throw it into `pd.to_datetime':

┌────
│ # datetime_is_numeric is a pandas 1.x argument (removed in pandas 2.0)
│ pd.to_datetime(df.TIMESTAMP) \
│     .describe(datetime_is_numeric=True) \
│     .to_markdown(tablefmt="orgtbl")
└────

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        TIMESTAMP
──────────────────────────────────────
 count  331004
 mean   1970-01-01 00:00:01.668917075
 min    1970-01-01 00:00:01.658969040
 25%    1970-01-01 00:00:01.663935105
 50%    1970-01-01 00:00:01.668900930
 75%    1970-01-01 00:00:01.673866575
 max    1970-01-01 00:00:01.678946580
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Hmmm… This doesn't look right. I ran into this type of problem last
year when analyzing the Deutsche Bahn (the results, not the process:
[Momentane Pünktlichkeit der Deutschen Bahn]). To save memory and
network capacity the unix-timestamps are stored in seconds (or
milliseconds), while `pd.to_datetime' reads plain numbers as
nanoseconds. So the values are off by a factor of `1e9' (or `1e6').

┌────
│ pd.to_datetime(df.TIMESTAMP * 1e9) \
│     .describe(datetime_is_numeric=True) \
│     .to_markdown(tablefmt="orgtbl")
└────

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        TIMESTAMP
──────────────────────────────────────
 count  331004
 mean   2022-11-20 04:04:35.349853696
 min    2022-07-28 00:44:00
 25%    2022-09-23 12:11:45
 50%    2022-11-19 23:35:30
 75%    2023-01-16 10:56:15
 max    2023-03-16 06:03:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Yes! This matches the span in which I used Gadgetbridge with my watch.
Let's make some useful columns out of this. With the `.dt'-accessor
object (“pandas.Series.dt - pandas 1.5.3 documentation” 2023)
attributes like `date', `hour', etc. can be used easily:

┌────
│ df["utc"] = pd.to_datetime(df.TIMESTAMP * 1e9)
│ df["date"] = df.utc.dt.date
│ df["weekday"] = df.utc.dt.day_name()
│ df["hour"] = df.utc.dt.hour
│ df["hourF"] = df.utc.dt.hour + df.utc.dt.minute/60
└────

`date' and `weekday' can be used for grouping data; `hour' and
especially `hourF' (meaning the hour as a floating point number) for
x/y diagrams.
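
Instead of multiplying by `1e9', pandas can also be told which unit the
numbers use. The following is only a minimal sketch, assuming the values
are seconds since the epoch. The two sample values are the minimum and
maximum from the table above; the names `raw' and `utc' are just for
illustration.

```python
import pandas as pd

# Sample values, assumed to be seconds since the epoch
# (min and max of the TIMESTAMP column shown above).
raw = pd.Series([1658969040, 1678946580], name="TIMESTAMP")

# unit="s" makes pandas read the numbers as seconds
# instead of its default, nanoseconds.
utc = pd.to_datetime(raw, unit="s")

print(utc.dt.strftime("%Y-%m-%d %H:%M:%S").tolist())
# ['2022-07-28 00:44:00', '2023-03-16 06:03:00']
```

This should give the same datetimes as the multiplication, but it
states the unit explicitly.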

[Momentane Pünktlichkeit der Deutschen Bahn] See file
momentane_puenktlichkeit_der_deutschen_bahn_in_nrw.org


3.2 Heart Rate
──────────────

┌────
│ # `file' is the output path provided by the org-babel block
│ plt.hist(
│     df.HEART_RATE,
│     color=hsluv.hsluv_to_hex((0, 75, 25)),
│     bins=256
│ )
│ plt.title("Histogram: Heart rate")
│ plt.yscale("log")
│ plt.savefig(file)
│ plt.close()
│ file
└────

<file:./images/20230315-01.png>

Looking at the histogram of the heart rate it's obvious that the values
`255' and `0' or below are errors or failed measurements. Therefore I
set them to `None'.

┌────
│ df["heartRate"] = df.HEART_RATE
│ df.loc[df.heartRate<=0, "heartRate"] = None
│ df.loc[df.heartRate>=255, "heartRate"] = None
└────

To avoid strange problems when executing the org-babel-blocks in the
wrong order, I follow the best practice of copying and *not
overwriting* the original data.

┌────
│ df[
│     ["HEART_RATE", "heartRate"]
│ ].describe().to_markdown(tablefmt="orgtbl")
└────

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
        HEART_RATE  heartRate
──────────────────────────────
 count  331004      321445
 mean   76.3764     71.0781
 std    33.7458     14.0401
 min    -1          39
 25%    60          59
 50%    71          71
 75%    81          81
 max    255         178
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

This is much better!


4 Plotting some Data
════════════════════

Now I want to see how the data looks.
Today is a good day, because
• I slept (not that surprising)
• I worked at the computer (doing /this/)
• I rode 47km with the bike

┌────
│ fig, ax = plt.subplots(figsize=(8, 4))
│ span = df[
│     (
│         df.utc > datetime(2023, 3, 15, 3)
│     ) & (  # & is the bitwise AND
│         df.utc < datetime(2023, 3, 15, 15)
│     )
│ ]
│ ax.plot(
│     span.utc, span.RAW_INTENSITY,
│     label="Intensity",
│     color=hsluv.hsluv_to_hex((240, 80, 20)),
│     linewidth=0.75
│ )
│ ax.plot(
│     span.utc, span.RAW_KIND,
│     label="Kind",
│     color=hsluv.hsluv_to_hex((120, 80, 40)),
│     linewidth=0.5
│ )
│ bx = ax.twinx()  # second y-axis for the heart rate
│ bx.plot(
│     span.utc, span.heartRate,
│     label="Heart Rate",
│     color=hsluv.hsluv_to_hex((0, 80, 60)),
│     linewidth=0.25
│ )
│ ax.set_ylim([0, 256])
│ ax.set_yticks(list(range(0, 256, 32)))
│ bx.set_ylim([0, 160])
│ ax.set_xlim([span.utc.min(), span.utc.max()])
│ fig.legend()
│ ax.grid()
│ fig.autofmt_xdate()  # tilting the x-labels
│ fig.tight_layout()   # less space around the plot
│ fig.savefig(file)
│ plt.close(fig)
│ file
└────

<file:./images/20230315-02.png>

You can't clearly see what's going on, because of the wiggly
wobbliness of the lines. Try using a rolling mean:

┌────
│ fig, ax = plt.subplots(figsize=(8, 4))
│ span = df[
│     (
│         df.utc > datetime(2023, 3, 15, 3)
│     ) & (
│         df.utc < datetime(2023, 3, 15, 15)
│     )
│ ]
│ ax.plot(
│     span.utc,
│     span.RAW_INTENSITY.rolling(5, min_periods=1).mean(),
│     label="Intensity",
│     color=hsluv.hsluv_to_hex((240, 80, 20)),
│     linewidth=0.75
│ )
│ ax.plot(
│     span.utc,
│     span.RAW_KIND.rolling(5, min_periods=1).median(),  # !
│     label="Kind",
│     color=hsluv.hsluv_to_hex((120, 80, 40)),
│     linewidth=0.5
│ )
│ bx = ax.twinx()
│ bx.plot(
│     span.utc,
│     span.heartRate.rolling(5, min_periods=1).mean(),
│     label="Heart Rate",
│     color=hsluv.hsluv_to_hex((0, 80, 60)),
│     linewidth=0.25
│ )
│ ax.set_ylim([0, 256])
│ ax.set_yticks(list(range(0, 256, 32)))
│ bx.set_ylim([0, 160])
│ ax.set_xlim([span.utc.min(), span.utc.max()])
│ fig.legend()
│ ax.grid()
│ fig.autofmt_xdate()
│ fig.tight_layout()
│ fig.savefig(file)
│ plt.close(fig)
│ file
└────

<file:./images/20230315-02-rolling.png>

/Now/ you can clearly see
• Low activity and pulse while sleeping until 06:30 UTC
• Normal activity while working until 11:00 UTC
• High activity and pulse from 11:00-13:30 UTC

For `RAW_KIND' I used the rolling *median*, because this column looks
more discrete than continuous. There might be some strange encoding
happening: during sleep the value is very high, and the spikes down
towards 100 are short occurrences of me waking up and turning around;
while working the value is around 80 and during sport it drops below
20.


4.1 x/y - combining features!
─────────────────────────────

Now combine features. And to add a color-dimension let's assume
`RAW_KIND' above 192 means sleep; below 32 activity.

┌────
│ df["assumption"] = [
│     "sleep" if r>192 else "normal" if r>32 else "activity"
│     for r in df.RAW_KIND
│ ]
│ fig, ax = plt.subplots(figsize=(6, 6))
│ sns.scatterplot(
│     ax=ax,
│     data=df.sample(2048),  # use not /all/ but only 2048 data-points
│     x="heartRate",
│     y="RAW_INTENSITY",
│     hue="assumption",
│     palette={
│         "sleep": hsluv.hsluv_to_hex((240, 60, 60)),
│         "normal": hsluv.hsluv_to_hex((120, 80, 40)),
│         "activity": hsluv.hsluv_to_hex((0, 100, 20)),
│     }
│ )
│ ax.set_xlim([30, 130])
│ ax.set_ylim([0, None])
│ fig.savefig(file)
│ plt.close(fig)
│ file
└────

<file:./images/20230315-03.png>

It seems intuitive that intensity and heart rate are lower while
sleeping. But do you see some strangeness?
There are vertical lines at certain frequent heart rates when awake,
but not while asleep. I assume my watch has a high precision, but a
medium accuracy. Randall Munroe made a useful table to keep in mind the
difference between them:

<https://imgs.xkcd.com/comics/precision_vs_accuracy.png>

Maybe it's like the following: While sleeping I don't move that much
(as the position on the y-axis implies), so the precision is as high as
possible. But when moving around the watch measures just the moments it
can and estimates the pulse with a lower precision.


Bibliography
════════════

“Gadgetbridge for android.” 2022. September 10, 2022, URL:
<https://www.gadgetbridge.org>.

Munroe, R. 2022. “Precision vs Accuracy,” /Xkcd/ November 9, 2022,
URL: <https://xkcd.com/2696>.

“pandas.Series.dt - pandas 1.5.3 documentation.” 2023. January 19,
2023, URL:
<https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html>.

“Syncthing.” 2019. September 5, 2019, URL: <https://syncthing.net>.


Footer
══════

License: CC BY-4.0

[Legal notice and privacy policy] <./impressum-datenschutz.gmi>