gemini - kennedy.gemi.dev

💾 Archived View for bacaliu.de › analyzing_gadgetbridge_data_in_python.md captured on 2023-07-22 at 17:11:02.
-=-=-=-=-=-=-


# Some helpful imports

    import hsluv
    import matplotlib.pyplot as plt
    plt.rcParams["figure.figsize"] = (8, 4)
    import seaborn as sns
    from datetime import datetime


# Getting the Data

The FOSS-Application Gadgedbridge (<a href="#citeproc_bib_item_1">“Gadgetbridge for android” 2022</a>) supports exporting the collected Data into an sqlite-file. Loading them into Python is not that difficult. In my case the file is automaticly mirrored into my `~/Sync`-folder through Syncthing (<a href="#citeproc_bib_item_4">“Syncthing” 2019</a>) and named `ggb.sqlite`.  

    import pandas as pd
    import sqlite3
    conn = sqlite3.connect("/home/adrian/Sync/ggb.sqlite")
    df = pd.read_sql_query(
        """SELECT TIMESTAMP, RAW_INTENSITY, STEPS, RAW_KIND, HEART_RATE
        FROM MI_BAND_ACTIVITY_SAMPLE;""",
        conn
    )
    
    df.describe().to_markdown(tablefmt="orgtbl")

<div id='getting-data,table--1' class='table'>
<div class='gutter'><a href='#getting-data,table--1'>#</a></div>
<div class='tabular'>
<table>


<colgroup>
<col  class="org-left">

<col  class="org-right">

<col  class="org-right">

<col  class="org-right">

<col  class="org-right">

<col  class="org-right">
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">&#xa0;</th>
<th scope="col" class="org-right">TIMESTAMP</th>
<th scope="col" class="org-right">RAW_INTENSITY</th>
<th scope="col" class="org-right">STEPS</th>
<th scope="col" class="org-right">RAW_KIND</th>
<th scope="col" class="org-right">HEART_RATE</th>
</tr>
</thead>

<tbody>
<tr>
<td class="org-left">count</td>
<td class="org-right">331004</td>
<td class="org-right">331004</td>
<td class="org-right">331004</td>
<td class="org-right">331004</td>
<td class="org-right">331004</td>
</tr>


<tr>
<td class="org-left">mean</td>
<td class="org-right">1.66892e+09</td>
<td class="org-right">24.3221</td>
<td class="org-right">4.33082</td>
<td class="org-right">125.133</td>
<td class="org-right">76.3764</td>
</tr>


<tr>
<td class="org-left">std</td>
<td class="org-right">5.7575e+06</td>
<td class="org-right">28.2967</td>
<td class="org-right">15.5202</td>
<td class="org-right">84.1054</td>
<td class="org-right">33.7458</td>
</tr>


<tr>
<td class="org-left">min</td>
<td class="org-right">1.65897e+09</td>
<td class="org-right">-1</td>
<td class="org-right">0</td>
<td class="org-right">1</td>
<td class="org-right">-1</td>
</tr>


<tr>
<td class="org-left">25%</td>
<td class="org-right">1.66394e+09</td>
<td class="org-right">0</td>
<td class="org-right">0</td>
<td class="org-right">80</td>
<td class="org-right">60</td>
</tr>


<tr>
<td class="org-left">50%</td>
<td class="org-right">1.6689e+09</td>
<td class="org-right">17</td>
<td class="org-right">0</td>
<td class="org-right">90</td>
<td class="org-right">71</td>
</tr>


<tr>
<td class="org-left">75%</td>
<td class="org-right">1.67387e+09</td>
<td class="org-right">38</td>
<td class="org-right">0</td>
<td class="org-right">240</td>
<td class="org-right">81</td>
</tr>


<tr>
<td class="org-left">max</td>
<td class="org-right">1.67895e+09</td>
<td class="org-right">198</td>
<td class="org-right">144</td>
<td class="org-right">251</td>
<td class="org-right">255</td>
</tr>
</tbody>
</table>
</div></div>

I did not include the useless columns `DEVICE_ID` and `USER_ID`. They always have the same value if you use only one device as one user; I don't load them to make the tables smaller; otherwise a `SELECT * FROM MI_BAND_ACTIVITY_DATA` would be sufficient.  


# Preperation


## Datetime

But what is this strange `TIMESTAMP`-column? Oh, maybe just an unix-timestamp. Throw it into `pd.to_datetime`:  

    pd.to_datetime(df.TIMESTAMP) \
      .describe(datetime_is_numeric=True) \
      .to_markdown(tablefmt="orgtbl")

<div id='datetime,table--1' class='table'>
<div class='gutter'><a href='#datetime,table--1'>#</a></div>
<div class='tabular'>
<table>


<colgroup>
<col  class="org-left">

<col  class="org-left">
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">&#xa0;</th>
<th scope="col" class="org-left">TIMESTAMP</th>
</tr>
</thead>

<tbody>
<tr>
<td class="org-left">count</td>
<td class="org-left">331004</td>
</tr>


<tr>
<td class="org-left">mean</td>
<td class="org-left">1970-01-01 00:00:01.668917075</td>
</tr>


<tr>
<td class="org-left">min</td>
<td class="org-left">1970-01-01 00:00:01.658969040</td>
</tr>


<tr>
<td class="org-left">25%</td>
<td class="org-left">1970-01-01 00:00:01.663935105</td>
</tr>


<tr>
<td class="org-left">50%</td>
<td class="org-left">1970-01-01 00:00:01.668900930</td>
</tr>


<tr>
<td class="org-left">75%</td>
<td class="org-left">1970-01-01 00:00:01.673866575</td>
</tr>


<tr>
<td class="org-left">max</td>
<td class="org-left">1970-01-01 00:00:01.678946580</td>
</tr>
</tbody>
</table>
</div></div>

Hmmm&#x2026; This don't look right. I ran into this type of problem last year when analyzing the Deutsche Bahn (results, not the progress: [Momentane Pünktlichkeit der Deutschen Bahn](momentane_puenktlichkeit_der_deutschen_bahn_in_nrw.md)). To safe memory and network capacity they divided the unix-timestamps by a factor of `1e6` or `1e9`.  

    pd.to_datetime(df.TIMESTAMP * 1e9) \
      .describe(datetime_is_numeric=True) \
      .to_markdown(tablefmt="orgtbl")

<div id='datetime,table--2' class='table'>
<div class='gutter'><a href='#datetime,table--2'>#</a></div>
<div class='tabular'>
<table>


<colgroup>
<col  class="org-left">

<col  class="org-left">
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">&#xa0;</th>
<th scope="col" class="org-left">TIMESTAMP</th>
</tr>
</thead>

<tbody>
<tr>
<td class="org-left">count</td>
<td class="org-left">331004</td>
</tr>


<tr>
<td class="org-left">mean</td>
<td class="org-left">2022-11-20 04:04:35.349853696</td>
</tr>


<tr>
<td class="org-left">min</td>
<td class="org-left">2022-07-28 00:44:00</td>
</tr>


<tr>
<td class="org-left">25%</td>
<td class="org-left">2022-09-23 12:11:45</td>
</tr>


<tr>
<td class="org-left">50%</td>
<td class="org-left">2022-11-19 23:35:30</td>
</tr>


<tr>
<td class="org-left">75%</td>
<td class="org-left">2023-01-16 10:56:15</td>
</tr>


<tr>
<td class="org-left">max</td>
<td class="org-left">2023-03-16 06:03:00</td>
</tr>
</tbody>
</table>
</div></div>

Yes! This matches the span in which I used Gadgedbridge with my watch. Let's make some useful columns out of this. By using the `.dt`-accessor Object (<a href="#citeproc_bib_item_3">“pandas.Series.dt — pandas 1.5.3 documentation” 2023</a>) attributes like `date`, `hour`, etc. can be used easily:  

    df["utc"] = pd.to_datetime(df.TIMESTAMP * 1e9)
    df["date"] = df.utc.dt.date
    df["weekday"] = df.utc.dt.day_name()
    df["hour"] = df.utc.dt.hour
    df["hourF"] = df.utc.dt.hour + df.utc.dt.minute/60

`date` and `weekday` can be used for grouping data; `hour` and espeically `hourF` (meaning the hour as floating point number) for x/y diagrams.  


## Heart Rate

    plt.hist(
        df.HEART_RATE,
        color=hsluv.hsluv_to_hex((0, 75, 25)),
        bins=256
    )
    plt.title("Histogram: Heart rate")
    plt.yscale("log")
    plt.savefig(file)
    plt.close()
    file

![img](./images/20230315-01.png "Histogram: Heart-Rate")  

Looking at the Histogram of the Heart Rate it's obvious that the Values of `255` and below `0` are errors or failed measures. Therefore I set them to `None`.  

    df["heartRate"] = df.HEART_RATE
    df.loc[df.heartRate<=0, "heartRate"] = None
    df.loc[df.heartRate>=255, "heartRate"] = None

To avoid strange problems when executing the org-babel-blocks in the wrong order, I follow the best-practise of copying and **not overwriting** the original data.  

    df[
        ["HEART_RATE", "heartRate"]
    ].describe().to_markdown(tablefmt="orgtbl")

<div id='heart-rate,table--1' class='table'>
<div class='gutter'><a href='#heart-rate,table--1'>#</a></div>
<div class='tabular'>
<table>


<colgroup>
<col  class="org-left">

<col  class="org-right">

<col  class="org-right">
</colgroup>
<thead>
<tr>
<th scope="col" class="org-left">&#xa0;</th>
<th scope="col" class="org-right">HEART_RATE</th>
<th scope="col" class="org-right">heartRate</th>
</tr>
</thead>

<tbody>
<tr>
<td class="org-left">count</td>
<td class="org-right">331004</td>
<td class="org-right">321445</td>
</tr>


<tr>
<td class="org-left">mean</td>
<td class="org-right">76.3764</td>
<td class="org-right">71.0781</td>
</tr>


<tr>
<td class="org-left">std</td>
<td class="org-right">33.7458</td>
<td class="org-right">14.0401</td>
</tr>


<tr>
<td class="org-left">min</td>
<td class="org-right">-1</td>
<td class="org-right">39</td>
</tr>


<tr>
<td class="org-left">25%</td>
<td class="org-right">60</td>
<td class="org-right">59</td>
</tr>


<tr>
<td class="org-left">50%</td>
<td class="org-right">71</td>
<td class="org-right">71</td>
</tr>


<tr>
<td class="org-left">75%</td>
<td class="org-right">81</td>
<td class="org-right">81</td>
</tr>


<tr>
<td class="org-left">max</td>
<td class="org-right">255</td>
<td class="org-right">178</td>
</tr>
</tbody>
</table>
</div></div>

This is much better!  


# Plotting some Data

Now I want to see how the data looks. Today is a good day, because  

-   I slept (not that surprise)
-   I worked at the Computer (doing *this*)
-   I rode 47km with the bike

    fig, ax = plt.subplots(figsize=(8, 4))
    span = df[
        (
    	df.utc > datetime(2023, 3, 15, 3)
        ) & (  # & is the bitwise AND
    	df.utc < datetime(2023, 3, 15, 15)
        )
    ]
    ax.plot(
        span.utc, span.RAW_INTENSITY,
        label="Intensity",
        color=hsluv.hsluv_to_hex((240, 80, 20)),
        linewidth=0.75
    )
    ax.plot(
        span.utc, span.RAW_KIND,
        label="Kind",
        color=hsluv.hsluv_to_hex((120, 80, 40)),
        linewidth=0.5
    )
    bx = ax.twinx()
    bx.plot(
        span.utc, span.heartRate,
        label="Heart Rate",
        color=hsluv.hsluv_to_hex((0, 80, 60)),
        linewidth=0.25
    )
    ax.set_ylim([0, 256])
    ax.set_yticks(list(range(0, 256, 32)))
    bx.set_ylim([0, 160])
    ax.set_xlim([span.utc.min(), span.utc.max()])
    fig.legend()
    ax.grid()
    fig.autofmt_xdate()  # tilting the x-labels
    fig.tight_layout()  # less space around the plot
    fig.savefig(file)
    plt.close(fig)
    file

![img](./images/20230315-02.png "A normal day")  

You can't clearly see what's going on, because the wiggeli wobbelyness of the lines. Try using a rolling mean:  

    fig, ax = plt.subplots(figsize=(8, 4))
    span = df[
        (
    	df.utc > datetime(2023, 3, 15, 3)
        ) & (
    	df.utc < datetime(2023, 3, 15, 15)
        )
    ]
    ax.plot(
        span.utc,
        span.RAW_INTENSITY.rolling(5, min_periods=1).mean(),
        label="Intensity",
        color=hsluv.hsluv_to_hex((240, 80, 20)),
        linewidth=0.75
    )
    ax.plot(
        span.utc,
        span.RAW_KIND.rolling(5, min_periods=1).median(), # !
        label="Kind",
        color=hsluv.hsluv_to_hex((120, 80, 40)),
        linewidth=0.5
    )
    bx = ax.twinx()
    bx.plot(
        span.utc,
        span.heartRate.rolling(5, min_periods=1).mean(),
        label="Heart Rate",
        color=hsluv.hsluv_to_hex((0, 80, 60)),
        linewidth=0.25
    )
    ax.set_ylim([0, 256])
    ax.set_yticks(list(range(0, 256, 32)))
    bx.set_ylim([0, 160])
    ax.set_xlim([span.utc.min(), span.utc.max()])
    fig.legend()
    ax.grid()
    fig.autofmt_xdate()
    fig.tight_layout()
    fig.savefig(file)
    plt.close(fig)
    file

![img](./images/20230315-02-rolling.png "A normal day but with rolling mean")  


	Now* you can clearly see  


-   Low activity and pulse while sleeping until 06:30 UTC
-   Normal activity while working until 11:00 UTC
-   High activity and pulse from 11:00-13:30 UTC

For `RAW_KIND` I used the rolling **median**, because this looks more discrete than continuous. There might be some strange encoding happening: Sleep is very high, the spikes towards arround 100 are short occurrences of me waking up and turning around; while working the value is arround 80 and during sport it drops to below 20.  


## x/y - combining features!

Now combine features. And to add a color-dimension let's assume `RAW_KIND` above 192 means sleep; below 32 activity.  

    df["assumption"] = [
        "sleep" if r>192 else "normal" if r>32 else "activity"
        for r in df.RAW_KIND
    ]
    fig, ax = plt.subplots(figsize=(6, 6))
    sns.scatterplot(
        ax=ax,
        data=df.sample(2048), # use not /all/ but only 2048 data-points
        x="heartRate",
        y="RAW_INTENSITY",
        hue="assumption",
        palette={
    	"sleep": hsluv.hsluv_to_hex((240, 60, 60)),
    	"normal": hsluv.hsluv_to_hex((120, 80, 40)),
    	"activity": hsluv.hsluv_to_hex((0, 100, 20)),
        }
    )
    ax.set_xlim([30, 130])
    ax.set_ylim([0, None])
    fig.savefig(file)
    plt.close(fig)
    file

![img](./images/20230315-03.png "heartRate against Raw-Intensity")  

It seems intuitive that intensity and heart rate are lower while sleeping. But do you see some strangeness? There are Lines of frequent heart rates when awake but not while sleep.  

I assume my watch has a high precission, but a medium accuracy. Randall Munroe made a useful table to keep in mind the difference between them:  

![img](https://imgs.xkcd.com/comics/precision_vs_accuracy.png "Precision vs Accuracy (<a href="#citeproc_bib_item_2">Munroe 2022</a>)")  

Maybe it's like the following: While sleeping I don't move that much (like the position on the y-axis implies) so the precision is as high as possible. But when moving around the watch measures just the moments it can and estimates the pulse with a lower precision.  


# Bibliography

<style>.csl-entry{text-indent: -0; margin-left: 0;}</style><div class="csl-bib-body">
  <div class="csl-entry"><a id="citeproc_bib_item_1"></a>“Gadgetbridge for android,”. 2022. September 10, 2022, URL: <a href="https://www.gadgetbridge.org">https://www.gadgetbridge.org</a>.</div>
  <div class="csl-entry"><a id="citeproc_bib_item_2"></a>Munroe, R. 2022. “Precision vs Accuracy,” <i>Xkcd</i> November 9, 2022, URL: <a href="https://xkcd.com/2696">https://xkcd.com/2696</a>.</div>
  <div class="csl-entry"><a id="citeproc_bib_item_3"></a>“pandas.Series.dt — pandas 1.5.3 documentation,”. 2023. January 19, 2023, URL: <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html">https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dt.html</a>.</div>
  <div class="csl-entry"><a id="citeproc_bib_item_4"></a>“Syncthing,”. 2019. September 5, 2019, URL: <a href="https://syncthing.net">https://syncthing.net</a>.</div>
</div>


# Nav

-   Tags: [Python](./tags/Python.md) - [Data](./tags/Data.md)

<!-- BEGIN insert Backlinks (but there are no) -->

-   Formats: [md](./analyzing_gadgetbridge_data_in_python.md) - [txt](./analyzing_gadgetbridge_data_in_python.txt) - [html](./analyzing_gadgetbridge_data_in_python.html) - [gmi](./analyzing_gadgetbridge_data_in_python.gmi)


# Footer

License: CC BY-4.0  
[Impressum und Datenschutz](./impressum-datenschutz.gmi)