NumerAI: Week 4

My fourth week was spent just submitting the results from my vanilla neural network. I didn't get enough time to actually work through my era-based neural network. I am also concerned that once I do get it working, it will take more weekend time to run since I have to find the matching eras.

Round 257 Results

While I won't know my real results yet (how my model's predictions fare against real-life market movement), I did get the results from the "test" data. They weren't great: I was in the bottom 15%. I expected this, so it wasn't a big deal.

CORR Rank: 4850/6229 (+246)
CORR Reputation: -0.0944 (+0.0033)
MMC Reputation: -0.0955 (+0.0029)
FNC Reputation: -0.0957 (+0.0021)

Current Reputation (built over the last 20 rounds)

Validation Sharpe: 0.6363 (14.97%)
Validation Corr: 0.0160 (20.05%)
Validation FNC: 0.0112 (32.74%)
Corr + MMC Sharpe: 0.5124 (11.01%)
MMC Mean: 0.0041 (75.73%)
Corr With Example Preds: 0.4026

Diagnostic Results

NumerAI: Week 3

My third week included getting my results back (bottom 15%) and looking at different ways that I can use the features.

Round 256 Results

While I won't know my real results yet (how my model's predictions fare against real-life market movement), I did get the results from the "test" data. They weren't great: I was in the bottom 15%. I expected this, so it wasn't a big deal.

CORR Rank: 5096/6025
CORR Reputation: -0.0977
MMC Reputation: -0.0984
FNC Reputation: -0.0978

Current Reputation (built over the last 20 rounds)

Validation Sharpe: 0.6363 (14.97%)
Validation Corr: 0.0160 (20.05%)
Validation FNC: 0.0112 (32.74%)
Corr + MMC Sharpe: 0.5124 (11.01%)
MMC Mean: 0.0041 (75.73%)
Corr With Example Preds: 0.4026

Diagnostic Results

Data Analysis

The first thing I am looking at is trying to figure out what each column/section actually means. While doing so, I checked the average of each column, and it turns out each one sits right around 0.5.
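For reference, here is roughly the sanity check I ran. This is a minimal pandas sketch; the file name and the "feature" column prefix are assumptions based on how the downloaded data is laid out.

```python
import pandas as pd

# Assumed file name for the downloaded training data.
train = pd.read_csv("numerai_training_data.csv")

# Assume feature columns share a "feature" prefix; average each one.
feature_cols = [c for c in train.columns if c.startswith("feature")]
means = train[feature_cols].mean()

print(means.describe())  # the per-column means all land right around 0.5
```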

NumerAI: Week 2

My second week had me actually submit my solution. It is a straightforward neural network, so I didn't expect anything great. With that said, they do give you 0.01 NMR (~$4), so it feels like you have some skin in the game. Once I get a model a bit higher (>0.02 Spearman correlation) I will put in more money. I added $100 that I can stake at some point.

Google Colab

It appears that if you add [""] to the Colab settings you get 2x the memory. It is probably for the best that I didn't find this sooner, since it forced me to clean up my code.

Large CSV Files

After looking around at some documentation, I saw that converting each of the numeric fields to float16 saved me over 74% on memory. It still takes a while to load the CSV, but after that I can do quite a bit with the DataFrame pretty quickly.
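The downcast looks roughly like this. It is only a sketch: the file name is assumed, and I am applying float16 to every column whose name starts with "feature" or "target".

```python
import pandas as pd

csv_path = "numerai_training_data.csv"  # assumed file name

# Read just the header row so a dtype map can be built without loading data.
columns = pd.read_csv(csv_path, nrows=0).columns
dtypes = {c: "float16" for c in columns if c.startswith(("feature", "target"))}

# Re-read the full file with the numeric columns downcast to float16.
train = pd.read_csv(csv_path, dtype=dtypes)
print(f"{train.memory_usage(deep=True).sum() / 1e9:.2f} GB in memory")
```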

Custom Loss/Metric Method

I am still trying to find a TensorFlow version of the Spearman loss function. There is an implementation in PyTorch that I might be able to adapt.
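In the meantime, a rank correlation can at least be tracked during training as a Keras metric. This is just my own sketch, not an official implementation: it ranks both tensors with tf.argsort and takes the Pearson correlation of the ranks, so it is fine for monitoring but not usable as a differentiable training loss.

```python
import tensorflow as tf

def spearman_metric(y_true, y_pred):
    """Spearman rank correlation, usable as a Keras monitoring metric."""
    y_true = tf.reshape(tf.cast(y_true, tf.float32), [-1])
    y_pred = tf.reshape(tf.cast(y_pred, tf.float32), [-1])

    # argsort of argsort gives each element's rank (ties ignored).
    rank_true = tf.cast(tf.argsort(tf.argsort(y_true)), tf.float32)
    rank_pred = tf.cast(tf.argsort(tf.argsort(y_pred)), tf.float32)

    # Pearson correlation of the two rank vectors.
    cov = tf.reduce_mean(
        (rank_true - tf.reduce_mean(rank_true)) *
        (rank_pred - tf.reduce_mean(rank_pred))
    )
    return cov / (tf.math.reduce_std(rank_true) *
                  tf.math.reduce_std(rank_pred) + 1e-8)
```

It can be dropped into model.compile(..., metrics=[spearman_metric]) while I keep looking for a loss that actually gives useful gradients.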

NumerAI: Week 1

My first week was really just getting everything set up. My biggest issue was trying to figure out the best way to handle over 3GB of data in a CSV file.

Overall, I think it was a good start but I will have to keep touching up my model to see if I can squeeze out some performance. Below are some things I discovered.

Google Colab

I have never stored data in the local session before and was surprised when all of my work was gone when I came back to it. Note to self: store it in Google Drive.
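Mounting Drive at the top of the notebook is the fix; the folder name below is just a placeholder.

```python
from google.colab import drive

# Mount Google Drive so data and model checkpoints survive the session.
drive.mount("/content/drive")

# Anything written under this path persists between Colab sessions.
DATA_DIR = "/content/drive/MyDrive/numerai"  # placeholder folder name
```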

I also ran into issues while trying to build all the datasets. Since they are so large, that is a lot to hold in memory at once.

I really wish I had more knowledge of Python/Colab so I could release some of that memory.
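The one trick I have found so far is to delete the references to big intermediate DataFrames and force a garbage collection pass. A self-contained sketch:

```python
import gc

import numpy as np
import pandas as pd

# Stand-in for one of the large intermediate DataFrames.
big_df = pd.DataFrame(np.random.rand(1_000_000, 50).astype("float16"))

# Drop the only reference, then ask Python to reclaim the memory now
# instead of waiting for the next collection cycle.
del big_df
gc.collect()
```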

Large CSV Files

I have either used a relational database for large datasets or CSV files for smaller ones. This is the first time I am getting large datasets in CSV format. The training CSV is ~750MB and the validation dataset is >2GB.

I need to be aware of the sizes and not load too much into memory.
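One option I am considering is streaming the file in chunks instead of loading it all at once. This is only a sketch; the file name and column prefixes are assumptions.

```python
import pandas as pd

csv_path = "numerai_tournament_data.csv"  # assumed file name

chunks = []
for chunk in pd.read_csv(csv_path, chunksize=100_000):
    # Keep only the columns that are actually needed before accumulating.
    keep = [c for c in chunk.columns
            if c in ("id", "era") or c.startswith(("feature", "target"))]
    chunks.append(chunk[keep])

tournament = pd.concat(chunks, ignore_index=True)
print(tournament.shape)
```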

Custom Loss/Metric Method

I am trying to implement their scoring metric (Spearman rank correlation) and it is a giant pain in the butt. I have seen a few places that have done it, but it doesn't work with TFv2. Fun times.
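For now I can at least score predictions offline with SciPy, outside of TensorFlow entirely. A small self-contained check with made-up data:

```python
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

# Fake targets on a 0-1 scale (roughly like the tournament targets),
# plus noisy predictions derived from them.
targets = rng.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=1_000)
preds = targets + rng.normal(0, 0.3, size=1_000)

corr, _ = spearmanr(preds, targets)
print(f"Spearman correlation: {corr:.4f}")
```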