💾 Archived View for bacaliu.de › analyzing_gadgetbridge_data_in_python.md captured on 2023-07-10 at 13:48:37.
-=-=-=-=-=-=-
# Some helpful imports import hsluv import matplotlib.pyplot as plt plt.rcParams["figure.figsize"] = (8, 4) import seaborn as sns from datetime import datetime # Getting the Data The FOSS-Application Gadgedbridge (<a href="#citeproc_bib_item_1">“Gadgetbridge for android” 2022</a>) supports exporting the collected Data into an sqlite-file. Loading them into Python is not that difficult. In my case the file is automaticly mirrored into my `~/Sync`-folder through Syncthing (<a href="#citeproc_bib_item_4">“Syncthing” 2019</a>) and named `ggb.sqlite`. import pandas as pd import sqlite3 conn = sqlite3.connect("/home/adrian/Sync/ggb.sqlite") df = pd.read_sql_query( """SELECT TIMESTAMP, RAW_INTENSITY, STEPS, RAW_KIND, HEART_RATE FROM MI_BAND_ACTIVITY_SAMPLE;""", conn ) df.describe().to_markdown(tablefmt="orgtbl") <div id='getting-data,table--1' class='table'> <div class='gutter'><a href='#getting-data,table--1'>#</a></div> <div class='tabular'> <table> <colgroup> <col class="org-left"> <col class="org-right"> <col class="org-right"> <col class="org-right"> <col class="org-right"> <col class="org-right"> </colgroup> <thead> <tr> <th scope="col" class="org-left"> </th> <th scope="col" class="org-right">TIMESTAMP</th> <th scope="col" class="org-right">RAW_INTENSITY</th> <th scope="col" class="org-right">STEPS</th> <th scope="col" class="org-right">RAW_KIND</th> <th scope="col" class="org-right">HEART_RATE</th> </tr> </thead> <tbody> <tr> <td class="org-left">count</td> <td class="org-right">331004</td> <td class="org-right">331004</td> <td class="org-right">331004</td> <td class="org-right">331004</td> <td class="org-right">331004</td> </tr> <tr> <td class="org-left">mean</td> <td class="org-right">1.66892e+09</td> <td class="org-right">24.3221</td> <td class="org-right">4.33082</td> <td class="org-right">125.133</td> <td class="org-right">76.3764</td> </tr> <tr> <td class="org-left">std</td> <td class="org-right">5.7575e+06</td> <td class="org-right">28.2967</td> <td class="org-right">15.5202</td> <td class="org-right">84.1054</td> <td class="org-right">33.7458</td> </tr> <tr> <td class="org-left">min</td> <td class="org-right">1.65897e+09</td> <td class="org-right">-1</td> <td class="org-right">0</td> <td class="org-right">1</td> <td class="org-right">-1</td> </tr> <tr> <td class="org-left">25%</td> <td class="org-right">1.66394e+09</td> <td class="org-right">0</td> <td class="org-right">0</td> <td class="org-right">80</td> <td class="org-right">60</td> </tr> <tr> <td class="org-left">50%</td> <td class="org-right">1.6689e+09</td> <td class="org-right">17</td> <td class="org-right">0</td> <td class="org-right">90</td> <td class="org-right">71</td> </tr> <tr> <td class="org-left">75%</td> <td class="org-right">1.67387e+09</td> <td class="org-right">38</td> <td class="org-right">0</td> <td class="org-right">240</td> <td class="org-right">81</td> </tr> <tr> <td class="org-left">max</td> <td class="org-right">1.67895e+09</td> <td class="org-right">198</td> <td class="org-right">144</td> <td class="org-right">251</td> <td class="org-right">255</td> </tr> </tbody> </table> </div></div> I did not include the useless columns `DEVICE_ID` and `USER_ID`. They always have the same value if you use only one device as one user; I don't load them to make the tables smaller; otherwise a `SELECT * FROM MI_BAND_ACTIVITY_DATA` would be sufficient. # Preperation ## Datetime But what is this strange `TIMESTAMP`-column? Oh, maybe just an unix-timestamp. Throw it into `pd.to_datetime`: pd.to_datetime(df.TIMESTAMP) \ .describe(datetime_is_numeric=True) \ .to_markdown(tablefmt="orgtbl") <div id='datetime,table--1' class='table'> <div class='gutter'><a href='#datetime,table--1'>#</a></div> <div class='tabular'> <table> <colgroup> <col class="org-left"> <col class="org-left"> </colgroup> <thead> <tr> <th scope="col" class="org-left"> </th> <th scope="col" class="org-left">TIMESTAMP</th> </tr> </thead> <tbody> <tr> <td class="org-left">count</td> <td class="org-left">331004</td> </tr> <tr> <td class="org-left">mean</td> <td class="org-left">1970-01-01 00:00:01.668917075</td> </tr> <tr> <td class="org-left">min</td> <td class="org-left">1970-01-01 00:00:01.658969040</td> </tr> <tr> <td class="org-left">25%</td> <td class="org-left">1970-01-01 00:00:01.663935105</td> </tr> <tr> <td class="org-left">50%</td> <td class="org-left">1970-01-01 00:00:01.668900930</td> </tr> <tr> <td class="org-left">75%</td> <td class="org-left">1970-01-01 00:00:01.673866575</td> </tr> <tr> <td class="org-left">max</td> <td class="org-left">1970-01-01 00:00:01.678946580</td> </tr> </tbody> </table> </div></div> Hmmm… This don't look right. I ran into this type of problem last year when analyzing the Deutsche Bahn (results, not the progress: [Momentane PĂĽnktlichkeit der Deutschen Bahn](momentane_puenktlichkeit_der_deutschen_bahn_in_nrw.md)). To safe memory and network capacity they divided the unix-timestamps by a factor of `1e6` or `1e9`. pd.to_datetime(df.TIMESTAMP * 1e9) \ .describe(datetime_is_numeric=True) \ .to_markdown(tablefmt="orgtbl") <div id='datetime,table--2' class='table'> <div class='gutter'><a href='#datetime,table--2'>#</a></div> <div class='tabular'> <table> <colgroup> <col class="org-left"> <col class="org-left"> </colgroup> <thead> <tr> <th scope="col" class="org-left"> </th> <th scope="col" class="org-left">TIMESTAMP</th> </tr> </thead> <tbody> <tr> <td class="org-left">count</td> <td class="org-left">331004</td> </tr> <tr> <td class="org-left">mean</td> <td class="org-left">2022-11-20 04:04:35.349853696</td> </tr> <tr> <td class="org-left">min</td> <td class="org-left">2022-07-28 00:44:00</td> </tr> <tr> <td class="org-left">25%</td> <td class="org-left">2022-09-23 12:11:45</td> </tr> <tr> <td class="org-left">50%</td> <td class="org-left">2022-11-19 23:35:30</td> </tr> <tr> <td class="org-left">75%</td> <td class="org-left">2023-01-16 10:56:15</td> </tr> <tr> <td class="org-left">max</td> <td class="org-left">2023-03-16 06:03:00</td> </tr> </tbody> </table> </div></div> Yes! This matches the span in which I used Gadgedbridge with my watch. Let's make some useful columns out of this. By using the `.dt`-accessor Object (<a href="#citeproc_bib_item_3">“pandas.Series.dt — pandas 1.5.3 documentation” 2023</a>) attributes like `date`, `hour`, etc. can be used easily: df["utc"] = pd.to_datetime(df.TIMESTAMP * 1e9) df["date"] = df.utc.dt.date df["weekday"] = df.utc.dt.day_name() df["hour"] = df.utc.dt.hour df["hourF"] = df.utc.dt.hour + df.utc.dt.minute/60 `date` and `weekday` can be used for grouping data; `hour` and espeically `hourF` (meaning the hour as floating point number) for x/y diagrams. ## Heart Rate plt.hist( df.HEART_RATE, color=hsluv.hsluv_to_hex((0, 75, 25)), bins=256 ) plt.title("Histogram: Heart rate") plt.yscale("log") plt.savefig(file) plt.close() file ![img](./images/20230315-01.png "Histogram: Heart-Rate") Looking at the Histogram of the Heart Rate it's obvious that the Values of `255` and below `0` are errors or failed measures. Therefore I set them to `None`. df["heartRate"] = df.HEART_RATE df.loc[df.heartRate<=0, "heartRate"] = None df.loc[df.heartRate>=255, "heartRate"] = None To avoid strange problems when executing the org-babel-blocks in the wrong order, I follow the best-practise of copying and **not overwriting** the original data. df[ ["HEART_RATE", "heartRate"] ].describe().to_markdown(tablefmt="orgtbl") <div id='heart-rate,table--1' class='table'> <div class='gutter'><a href='#heart-rate,table--1'>#</a></div> <div class='tabular'> <table> <colgroup> <col class="org-left"> <col class="org-right"> <col class="org-right"> </colgroup> <thead> <tr> <th scope="col" class="org-left"> </th> <th scope="col" class="org-right">HEART_RATE</th> <th scope="col" class="org-right">heartRate</th> </tr> </thead> <tbody> <tr> <td class="org-left">count</td> <td class="org-right">331004</td> <td class="org-right">321445</td> </tr> <tr> <td class="org-left">mean</td> <td class="org-right">76.3764</td> <td class="org-right">71.0781</td> </tr> <tr> <td class="org-left">std</td> <td class="org-right">33.7458</td> <td class="org-right">14.0401</td> </tr> <tr> <td class="org-left">min</td> <td class="org-right">-1</td> <td class="org-right">39</td> </tr> <tr> <td class="org-left">25%</td> <td class="org-right">60</td> <td class="org-right">59</td> </tr> <tr> <td class="org-left">50%</td> <td class="org-right">71</td> <td class="org-right">71</td> </tr> <tr> <td class="org-left">75%</td> <td class="org-right">81</td> <td class="org-right">81</td> </tr> <tr> <td class="org-left">max</td> <td class="org-right">255</td> <td class="org-right">178</td> </tr> </tbody> </table> </div></div> This is much better! # Plotting some Data Now I want to see how the data looks. Today is a good day, because - I slept (not that surprise) - I worked at the Computer (doing *this*) - I rode 47km with the bike fig, ax = plt.subplots(figsize=(8, 4)) span = df[ ( df.utc > datetime(2023, 3, 15, 3) ) & ( # & is the bitwise AND df.utc < datetime(2023, 3, 15, 15) ) ] ax.plot( span.utc, span.RAW_INTENSITY, label="Intensity", color=hsluv.hsluv_to_hex((240, 80, 20)), linewidth=0.75 ) ax.plot( span.utc, span.RAW_KIND, label="Kind", color=hsluv.hsluv_to_hex((120, 80, 40)), linewidth=0.5 ) bx = ax.twinx() bx.plot( span.utc, span.heartRate, label="Heart Rate", color=hsluv.hsluv_to_hex((0, 80, 60)), linewidth=0.25 ) ax.set_ylim([0, 256]) ax.set_yticks(list(range(0, 256, 32))) bx.set_ylim([0, 160]) ax.set_xlim([span.utc.min(), span.utc.max()]) fig.legend() ax.grid() fig.autofmt_xdate() # tilting the x-labels fig.tight_layout() # less space around the plot fig.savefig(file) plt.close(fig) file ![img](./images/20230315-02.png "A normal day") You can't clearly see what's going on, because the wiggeli wobbelyness of the lines. Try using a rolling mean: fig, ax = plt.subplots(figsize=(8, 4)) span = df[ ( df.utc > datetime(2023, 3, 15, 3) ) & ( df.utc < datetime(2023, 3, 15, 15) ) ] ax.plot( span.utc, span.RAW_INTENSITY.rolling(5, min_periods=1).mean(), label="Intensity", color=hsluv.hsluv_to_hex((240, 80, 20)), linewidth=0.75 ) ax.plot( span.utc, span.RAW_KIND.rolling(5, min_periods=1).median(), # ! label="Kind", color=hsluv.hsluv_to_hex((120, 80, 40)), linewidth=0.5 ) bx = ax.twinx() bx.plot( span.utc, span.heartRate.rolling(5, min_periods=1).mean(), label="Heart Rate", color=hsluv.hsluv_to_hex((0, 80, 60)), linewidth=0.25 ) ax.set_ylim([0, 256]) ax.set_yticks(list(range(0, 256, 32))) bx.set_ylim([0, 160]) ax.set_xlim([span.utc.min(), span.utc.max()]) fig.legend() ax.grid() fig.autofmt_xdate() fig.tight_layout() fig.savefig(file) plt.close(fig) file ![img](./images/20230315-02-rolling.png "A normal day but with rolling mean")