In 2017 I announced a git repo containing *handwritten digits* suitable for machine learning. Some coworkers and I wanted to get some hands-on experience with Deeplearning4j and that included the entire experience of collecting and preparing the data. That’s why we didn’t want to just use the MNIST database. («The MNIST database of handwritten digits, available from this page, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.»)
In addition to that, the MNIST database has handwritten digits *by Americans*. But here in Switzerland we don’t necessarily write the digits in exactly the same way, so that was an additional incentive to collect our own data. And we did!
This year, however, I talked to one of our customers. We had trained a neural network to recognise digits on a much, much larger set of data. I wanted to make these digits available, too. And they agreed! Thank you!
That’s why there are now more than 800,000 handwritten digits in this repository!
https://github.com/kensanata/numbers
#Programming #Machine Learning
(Please contact me if you want to remove your comment.)
⁂
800,000! Tremendous!
– Chris 2018-12-13 18:49 UTC
---
`git add` took over an hour on Windows 10, so I left the computer running overnight. `git commit` then took about another hour. After that, the upload was relatively quick, haha.
– Alex Schroeder 2018-12-13 22:24 UTC