NumerAI: Week 2

My second week had me actually submit my solution. It is a straightforward neural network, so I didn’t expect anything great. With that said, they do give you 0.01 NMR (~$4), so it feels like you have some skin in the game. Once I get a model a bit higher (>0.02 Spearman correlation) I will put in more money. I added $100 that I can stake at some point.

Google Colab

It appears that if you add [“”] to the Colab settings you get 2x the memory. It is probably for the best that I didn’t find this sooner, since it forced me to clean up my code.

Large CSV Files

After looking around at some documentation, I saw that converting each of the numeric fields to float16 saved me over 74% on memory. It still takes a while to load the CSV into my code, but after that I can do quite a bit with the DataFrame pretty quickly.
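
A rough sketch of what that looks like with pandas (the file name and the "feature"/"target" column prefixes are assumptions about the Numerai CSV layout):

import numpy as np
import pandas as pd

# Read only the header so we can build a dtype map for the numeric columns.
columns = pd.read_csv("numerai_training_data.csv", nrows=0).columns
dtypes = {c: np.float16 for c in columns if c.startswith("feature") or c.startswith("target")}

# Loading with the dtype map keeps each numeric value at 2 bytes
# instead of pandas' 8-byte float64 default.
train_df = pd.read_csv("numerai_training_data.csv", dtype=dtypes)
print(train_df.memory_usage(deep=True).sum() / 1e9, "GB")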

Custom Loss/Metric Method

I am still trying to find the TF version of the Spearman loss function. There is an implementation in PyTorch that I might be able to use.
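
In the meantime, here is the kind of thing I have been experimenting with: a TF 2.x metric (not a differentiable loss) that just wraps scipy's spearmanr in tf.py_function. This is only a sketch; the real work is finding a properly differentiable version.

import numpy as np
import tensorflow as tf
from scipy.stats import spearmanr

def spearman_metric(y_true, y_pred):
    def _spearman(a, b):
        # scipy does the ranking; cast so the dtype matches Tout below
        return np.float32(spearmanr(a.numpy().ravel(), b.numpy().ravel()).correlation)
    # py_function lets eager scipy code run inside the TF graph
    return tf.py_function(_spearman, inp=[y_true, y_pred], Tout=tf.float32)

# model.compile(optimizer="adam", loss="mse", metrics=[spearman_metric])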

NumerAI: Week 1

My first week was really just getting everything set up. My biggest issue was figuring out the best way to handle over 3 GB of data in a CSV file.

Overall, I think it was a good start but I will have to keep touching up my model to see if I can squeeze out some performance. Below are some things I discovered.

Google Colab

I have never stored data in the local session before and was surprised when all of my work was gone when I came back to it. Note to self: store it in Google Drive.

I also ran into issues while trying to build all the datasets. Since they are so large, that is a lot to store in memory.

I really wish I knew more about how to release memory from within Python/Colab.
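
The best I have come up with so far is dropping references and asking the garbage collector to do its thing (train_df stands in for whatever big object I am done with):

import gc

del train_df   # drop the reference to the big DataFrame
gc.collect()   # ask Python to actually reclaim the memory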

Large CSV Files

I have either used a relational database for large datasets or CSVs for smaller datasets. This is the first time I am getting large datasets in CSV format: the training CSV is ~750 MB and the validation dataset is over 2 GB.

I need to be aware of the sizes and not load too much into memory.
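
One pattern that seems to help is streaming the big validation file in chunks instead of reading it all at once (the file name, chunk size, and "feature" column prefix are assumptions, and model is the trained Keras model):

import pandas as pd

predictions = []
for chunk in pd.read_csv("numerai_tournament_data.csv", chunksize=100_000):
    # score each ~100k-row slice on its own so only one chunk sits in memory
    features = chunk.filter(like="feature")
    predictions.append(model.predict(features))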

Custom Loss/Metric Method

I am trying to implement their scoring metric (Spearman correlation, i.e., Pearson correlation on ranks) and it is a giant pain in the butt. I have seen a few places that have done it, but it doesn’t work with TF v2. Fun times.

NumerAI Tournament

I was sent a link about NumerAI’s tournament. The idea is that the knowledge of the crowd is better than the knowledge of a few. A bunch (2,424 as of right now) of different models are created from the same dataset, and then the company uses all of them to predict price movement. My assumption is that something like a random forest is used to combine them.

I am going to use this opportunity to create a web series about my experiences. This should keep me busy and my data science skills sharp until next basketball season, when I release my soon-to-be-built models.

GDG San Diego Talk

Overview

This will be the fourth time I am giving this talk about my work on the Raspberry Pi. I last gave it at the Google Developers ML Summit, where it was recorded, and that was kind of odd. We will see if this one goes a little better with some audience feedback.

TensorFlow Lite

Since I had some extra time, I was able to implement TensorFlow Lite on the Raspberry Pi. I went to the TF Lite quickstart [Link] page, and since I had already converted the Keras model into a tflite model, I only needed the interpreter. That meant I did not need the full TF install.*

*I still used the Keras preprocessing and decoding methods that come with the MobileNetV2 model. I probably should not have done that, but at this point I don’t think it matters much. Maybe in the future I will do ONLY TF Lite.
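
For reference, the interpreter-only path looks roughly like this (the model path and input shape are assumptions; in my case the image array still comes out of the Keras MobileNetV2 preprocessing mentioned above):

import numpy as np
from tflite_runtime.interpreter import Interpreter

interpreter = Interpreter(model_path="mobilenet_v2.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# image: a (1, 224, 224, 3) float32 array that has already been preprocessed
image = np.zeros(input_details[0]["shape"], dtype=np.float32)
interpreter.set_tensor(input_details[0]["index"], image)
interpreter.invoke()
scores = interpreter.get_tensor(output_details[0]["index"])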

Links

Original Blog Series: Here
Meetup: https://gdg.community.dev/events/details/google-gdg-san-diego-presents-tensorflow-lite-on-the-raspberry-pi/
Code: Main, MobileNetV2Base, and PiCameraManager
Presentation: ImageDetection/GDG-SanDiego.pptx