Analyzing Gadgetbridge-Data with Python

Some helpful imports

import hsluv
import matplotlib.pyplot as plt
plt.rcParams["figure.figsize"] = (8, 4)
import seaborn as sns
from datetime import datetime

Getting the Data

The FOSS application Gadgetbridge (“Gadgetbridge for android” 2022) supports exporting the collected data into an SQLite file. Loading it into Python is not that difficult. In my case the file is automatically mirrored into my `~/Sync` folder through Syncthing (“Syncthing” 2019) and named `ggb.sqlite`.

import pandas as pd
import sqlite3
conn = sqlite3.connect("/home/adrian/Sync/ggb.sqlite")
df = pd.read_sql_query(
    """SELECT TIMESTAMP, RAW_INTENSITY, STEPS, RAW_KIND, HEART_RATE
    FROM MI_BAND_ACTIVITY_SAMPLE;""",
    conn
)

df.describe().to_markdown(tablefmt="orgtbl")
          TIMESTAMP  RAW_INTENSITY    STEPS  RAW_KIND  HEART_RATE 
------------------------------------------------------------------
 count       331004         331004   331004    331004      331004 
 mean   1.66892e+09        24.3221  4.33082   125.133     76.3764 
 std     5.7575e+06        28.2967  15.5202   84.1054     33.7458 
 min    1.65897e+09             -1        0         1          -1 
 25%    1.66394e+09              0        0        80          60 
 50%     1.6689e+09             17        0        90          71 
 75%    1.67387e+09             38        0       240          81 
 max    1.67895e+09            198      144       251         255 

I did not include the columns `DEVICE_ID` and `USER_ID`. They always hold the same value if you use only one device as one user, so leaving them out keeps the tables smaller; otherwise a `SELECT * FROM MI_BAND_ACTIVITY_SAMPLE` would be sufficient.
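A quick way to confirm that `DEVICE_ID` and `USER_ID` really are constant before leaving them out is to count their distinct values. The sketch below runs against a small in-memory database with invented rows, not the real export; the table and column names come from the text, while the `INTEGER` column types are an assumption.

```python
import sqlite3

import pandas as pd

# In-memory stand-in for the exported file; rows and column types are
# invented for illustration and are not taken from a real export.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE MI_BAND_ACTIVITY_SAMPLE (
        TIMESTAMP INTEGER, DEVICE_ID INTEGER, USER_ID INTEGER,
        RAW_INTENSITY INTEGER, STEPS INTEGER, RAW_KIND INTEGER,
        HEART_RATE INTEGER
    )"""
)
conn.executemany(
    "INSERT INTO MI_BAND_ACTIVITY_SAMPLE VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1658969040, 1, 1, 17, 0, 80, 71),
        (1658969100, 1, 1, 38, 12, 90, 81),
        (1658969160, 1, 1, 0, 0, 240, 60),
    ],
)

full = pd.read_sql_query("SELECT * FROM MI_BAND_ACTIVITY_SAMPLE", conn)
conn.close()

# Columns with a single distinct value carry no information for the analysis.
constant = [col for col in full.columns if full[col].nunique() == 1]
df_small = full.drop(columns=constant)
print(constant)
```

If the list contains more than the two ID columns, that would be worth a closer look before dropping anything.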

Preparation

Datetime

But what is this strange `TIMESTAMP` column? Oh, maybe just a Unix timestamp. Throw it into `pd.to_datetime`:

pd.to_datetime(df.TIMESTAMP) \
  .describe(datetime_is_numeric=True) \
  .to_markdown(tablefmt="orgtbl")
        TIMESTAMP                     
--------------------------------------
 count  331004                        
 mean   1970-01-01 00:00:01.668917075 
 min    1970-01-01 00:00:01.658969040 
 25%    1970-01-01 00:00:01.663935105 
 50%    1970-01-01 00:00:01.668900930 
 75%    1970-01-01 00:00:01.673866575 
 max    1970-01-01 00:00:01.678946580 

Hmmm... This doesn't look right. I ran into this type of problem last year when analyzing the Deutsche Bahn (results, not the process: [Momentane Pünktlichkeit der Deutschen Bahn]). To save memory and network capacity they stored the Unix timestamps in a coarser unit than the nanoseconds `pd.to_datetime` assumes for plain numbers, which amounts to a division by a factor of `1e6` (milliseconds) or `1e9` (seconds).

pd.to_datetime(df.TIMESTAMP * 1e9) \
  .describe(datetime_is_numeric=True) \
  .to_markdown(tablefmt="orgtbl")
        TIMESTAMP                     
--------------------------------------
 count  331004                        
 mean   2022-11-20 04:04:35.349853696 
 min    2022-07-28 00:44:00           
 25%    2022-09-23 12:11:45           
 50%    2022-11-19 23:35:30           
 75%    2023-01-16 10:56:15           
 max    2023-03-16 06:03:00           

Yes! This matches the span in which I used Gadgetbridge with my watch. Let's make some useful columns out of this. Through the `.dt` accessor object (“pandas.Series.dt — pandas 1.5.3 documentation” 2023), attributes like `date`, `hour`, etc. can be used easily:

df["utc"] = pd.to_datetime(df.TIMESTAMP * 1e9)
df["date"] = df.utc.dt.date
df["weekday"] = df.utc.dt.day_name()
df["hour"] = df.utc.dt.hour
df["hourF"] = df.utc.dt.hour + df.utc.dt.minute/60

`date` and `weekday` can be used for grouping data; `hour` and especially `hourF` (the hour as a floating-point number) for x/y diagrams.
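The multiplication by `1e9` works because `pd.to_datetime` reads plain numbers as nanoseconds. As a sketch of an alternative, the `unit` parameter states the unit directly. The two timestamps below are the minimum and maximum from the first conversion attempt above.

```python
import pandas as pd

# Minimum and maximum TIMESTAMP values shown in the describe() output above.
ts = pd.Series([1658969040, 1678946580], name="TIMESTAMP")

# unit="s" tells pandas the numbers are seconds since 1970-01-01 (UTC).
utc = pd.to_datetime(ts, unit="s")
print(utc)

# Same derived column as in the text: the hour as a floating-point number.
hour_f = utc.dt.hour + utc.dt.minute / 60
print(hour_f.tolist())
```

The resulting values are still in UTC, so the `hour` columns are not local wall-clock hours.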

Momentane Pünktlichkeit der Deutschen Bahn

Heart Rate

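# `file` holds the output path of the figure; it is presumably supplied
# by the org-babel block header and is not defined in this snippet.
plt.hist(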
    df.HEART_RATE,
    color=hsluv.hsluv_to_hex((0, 75, 25)),
    bins=256
)
plt.title("Histogram: Heart rate")
plt.yscale("log")
plt.savefig(file)
plt.close()
file

Looking at the histogram of the heart rate, it's obvious that the values of `255` and of `0` or below are errors or failed measurements. Therefore I set them to `None` (which pandas stores as `NaN`).

df["heartRate"] = df.HEART_RATE
df.loc[df.heartRate<=0, "heartRate"] = None
df.loc[df.heartRate>=255, "heartRate"] = None

To avoid strange problems when executing the org-babel blocks in the wrong order, I follow the best practice of copying and *not overwriting* the original data.
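The two `.loc` assignments can also be collapsed into a single expression. This is a sketch with a handful of made-up readings; `Series.where` keeps the values for which the condition holds and replaces the rest with `NaN`, and the original column stays untouched here as well.

```python
import pandas as pd

# Made-up readings, including the values treated as errors above (-1, 0, 255).
df = pd.DataFrame({"HEART_RATE": [-1, 0, 39, 71, 178, 255]})

# Keep readings from 1 to 254 (between() is inclusive on both ends);
# everything else becomes NaN in the new column.
df["heartRate"] = df.HEART_RATE.where(df.HEART_RATE.between(1, 254))
print(df)
```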

df[
    ["HEART_RATE", "heartRate"]
].describe().to_markdown(tablefmt="orgtbl")
        HEART_RATE  heartRate 
------------------------------
 count      331004     321445 
 mean      76.3764    71.0781 
 std       33.7458    14.0401 
 min            -1         39 
 25%            60         59 
 50%            71         71 
 75%            81         81 
 max           255        178 

This is much better!

Plotting some Data

Now I want to see how the data looks. Today is a good day, because

fig, ax = plt.subplots(figsize=(8, 4))
span = df[
    (
        df.utc > datetime(2023, 3, 15, 3)
    ) & (  # & is the bitwise AND
        df.utc < datetime(2023, 3, 15, 15)
    )
]
ax.plot(
    span.utc, span.RAW_INTENSITY,
    label="Intensity",
    color=hsluv.hsluv_to_hex((240, 80, 20)),
    linewidth=0.75
)
ax.plot(
    span.utc, span.RAW_KIND,
    label="Kind",
    color=hsluv.hsluv_to_hex((120, 80, 40)),
    linewidth=0.5
)
bx = ax.twinx()
bx.plot(
    span.utc, span.heartRate,
    label="Heart Rate",
    color=hsluv.hsluv_to_hex((0, 80, 60)),
    linewidth=0.25
)
ax.set_ylim([0, 256])
ax.set_yticks(list(range(0, 256, 32)))
bx.set_ylim([0, 160])
ax.set_xlim([span.utc.min(), span.utc.max()])
fig.legend()
ax.grid()
fig.autofmt_xdate()  # tilting the x-labels
fig.tight_layout()  # less space around the plot
fig.savefig(file)
plt.close(fig)
file

You can't clearly see what's going on because of the wiggly-wobbliness of the lines. Try using a rolling mean:

fig, ax = plt.subplots(figsize=(8, 4))
span = df[
    (
        df.utc > datetime(2023, 3, 15, 3)
    ) & (
        df.utc < datetime(2023, 3, 15, 15)
    )
]
ax.plot(
    span.utc,
    span.RAW_INTENSITY.rolling(5, min_periods=1).mean(),
    label="Intensity",
    color=hsluv.hsluv_to_hex((240, 80, 20)),
    linewidth=0.75
)
ax.plot(
    span.utc,
    span.RAW_KIND.rolling(5, min_periods=1).median(), # !
    label="Kind",
    color=hsluv.hsluv_to_hex((120, 80, 40)),
    linewidth=0.5
)
bx = ax.twinx()
bx.plot(
    span.utc,
    span.heartRate.rolling(5, min_periods=1).mean(),
    label="Heart Rate",
    color=hsluv.hsluv_to_hex((0, 80, 60)),
    linewidth=0.25
)
ax.set_ylim([0, 256])
ax.set_yticks(list(range(0, 256, 32)))
bx.set_ylim([0, 160])
ax.set_xlim([span.utc.min(), span.utc.max()])
fig.legend()
ax.grid()
fig.autofmt_xdate()
fig.tight_layout()
fig.savefig(file)
plt.close(fig)
file

/Now/ you can clearly see

For `RAW_KIND` I used the rolling *median*, because the values look more discrete than continuous. There might be some strange encoding happening: sleep is very high, the spikes towards roughly 100 are short occurrences of me waking up and turning around; while working the value is around 80, and during sport it drops below 20.
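Why the median suits code-like values better can be seen on a tiny example. The numbers below are invented to resemble the `RAW_KIND` levels described above, and the window of 3 is chosen only to keep the example short. With these numbers the median only ever returns values that occur in the input, while the mean produces in-between values such as 193.3 that do not correspond to any level.

```python
import pandas as pd

# Made-up RAW_KIND-like codes: a plateau at 240, one short dip to 100, then 80.
kind = pd.Series([240, 240, 100, 240, 240, 80, 80, 80])

median = kind.rolling(3, min_periods=1).median()
mean = kind.rolling(3, min_periods=1).mean()

print(median.tolist())
print(mean.round(1).tolist())
```

The median also removes the single-sample dip entirely, which is worth keeping in mind if the short wake-up spikes are what you want to see.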

x/y - combining features!

Now combine features. To add a color dimension, let's assume `RAW_KIND` above 192 means sleep and 32 or below means activity.

df["assumption"] = [
    "sleep" if r>192 else "normal" if r>32 else "activity"
    for r in df.RAW_KIND
]
fig, ax = plt.subplots(figsize=(6, 6))
sns.scatterplot(
    ax=ax,
    data=df.sample(2048), # use not /all/ but only 2048 data-points
    x="heartRate",
    y="RAW_INTENSITY",
    hue="assumption",
    palette={
        "sleep": hsluv.hsluv_to_hex((240, 60, 60)),
        "normal": hsluv.hsluv_to_hex((120, 80, 40)),
        "activity": hsluv.hsluv_to_hex((0, 100, 20)),
    }
)
ax.set_xlim([30, 130])
ax.set_ylim([0, None])
fig.savefig(file)
plt.close(fig)
file
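The labelling step at the start of the block above can also be written with `pd.cut`, which may read more easily once there are more than two thresholds. This is a sketch on a few made-up `RAW_KIND` values placed around the thresholds; the bin edges mirror the assumed limits of 32 and 192.

```python
import numpy as np
import pandas as pd

# A few RAW_KIND values around the two thresholds (made up for illustration).
raw_kind = pd.Series([1, 32, 33, 80, 192, 193, 251])

# Bins are right-inclusive by default: (-inf, 32], (32, 192], (192, inf).
assumption = pd.cut(
    raw_kind,
    bins=[-np.inf, 32, 192, np.inf],
    labels=["activity", "normal", "sleep"],
).astype(str)

# Same rule as the list comprehension in the text, for comparison.
expected = [
    "sleep" if r > 192 else "normal" if r > 32 else "activity"
    for r in raw_kind
]
print(assumption.tolist() == expected)
```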

It seems intuitive that intensity and heart rate are lower while sleeping. But do you see something strange? There are lines of frequently occurring heart rates while awake, but not while asleep.

I assume my watch has a high precision but only a medium accuracy. Randall Munroe made a useful table to keep in mind the difference between them (Munroe 2022).

Maybe it's like the following: While sleeping I don't move that much (as the position on the y-axis implies), so the precision is as high as possible. But when I move around, the watch measures only in the moments it can and estimates the pulse with a lower precision.
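One way to put a number on the lines in the scatter plot is to ask how concentrated the heart-rate values are within each group. The following sketch uses invented samples and a hypothetical measure, the share of a group's samples that sit on its single most common value; it only illustrates the mechanics and says nothing about the real data.

```python
import pandas as pd

# Invented samples, only to show the mechanics; not real measurements.
df = pd.DataFrame({
    "assumption": ["sleep"] * 6 + ["normal"] * 6,
    "heartRate": [52, 53, 54, 55, 56, 57, 70, 70, 70, 75, 75, 80],
})

# Share of each group's samples that fall on its most common heart rate.
# value_counts() sorts by frequency, so the first entry is the largest share.
top_share = df.groupby("assumption")["heartRate"].agg(
    lambda s: s.value_counts(normalize=True).iloc[0]
)
print(top_share)
```

A clearly higher share in one group would be consistent with that group's readings snapping to a few preferred values.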

Bibliography

“Gadgetbridge for android.” 2022. September 10, 2022, URL: .

Munroe, R. 2022. “Precision vs Accuracy.” /Xkcd/, November 9, 2022, URL: .

“pandas.Series.dt — pandas 1.5.3 documentation.” 2023. January 19, 2023, URL: .

“Syncthing.” 2019. September 5, 2019, URL: .

License: CC BY-4.0 [Legal notice and privacy policy]
