NumerAI: Week 4

My fourth week was spent just submitting the results from my vanilla neural network. I didn't get enough time to actually work through my era-based neural network. I am also concerned that once I do get it working, it will take more work on the weekends to run, since I have to find the matching eras.

Round 257 Results

While I won’t know my real results (how my model’s predictions play out against real-life market movement), I did get the results from the “test” data. They weren’t great: I was in the bottom 15%. I expected this, so it wasn’t a big deal.

Current Reputation (built over the last 20 rounds)
CORR Rank: 4850/6229 (+246)
CORR Reputation: -0.0944 (+0.0033)
MMC Reputation: -0.0955 (+0.0029)
FNC Reputation: -0.0957 (+0.0021)

Diagnostic Results
Validation Sharpe: 0.6363 (14.97%)
Validation Corr: 0.0160 (20.05%)
Validation FNC: 0.0112 (32.74%)
Corr + MMC Sharpe: 0.5124 (11.01%)
MMC Mean: 0.0041 (75.73%)
Corr With Example Preds: 0.4026

NumerAI: Week 3

My third week included getting my results back (bottom 15%) and looking at different ways that I can use the features.

Round 256 Results

While I won’t know my real results (how my model’s predictions play out against real-life market movement), I did get the results from the “test” data. They weren’t great: I was in the bottom 15%. I expected this, so it wasn’t a big deal.

Current Reputation (built over the last 20 rounds)
CORR Rank: 5096/6025
CORR Reputation: -0.0977
MMC Reputation: -0.0984
FNC Reputation: -0.0978

Diagnostic Results
Validation Sharpe: 0.6363 (14.97%)
Validation Corr: 0.0160 (20.05%)
Validation FNC: 0.0112 (32.74%)
Corr + MMC Sharpe: 0.5124 (11.01%)
MMC Mean: 0.0041 (75.73%)
Corr With Example Preds: 0.4026

Data Analysis

The first thing I am looking at is trying to figure out what each column/section actually means. As a start, I checked the average of each feature column, and it turns out every one of them averages out to 0.5.
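That check is quick to do, a minimal sketch (the file name is an assumption; it relies on the "feature" prefix Numerai puts on its feature columns):

    import pandas as pd

    # Assumed file name; the actual download may be named differently.
    train_df = pd.read_csv("numerai_training_data.csv")

    # Numerai feature columns share a common "feature" prefix.
    feature_cols = [c for c in train_df.columns if c.startswith("feature")]

    # Per-column averages; for this dataset they all land around 0.5.
    print(train_df[feature_cols].mean())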

NumerAI: Week 2

My second week had me actually submit my solution. It is a straightforward neural network, so I didn’t expect anything great. With that said, they do give you 0.01 NMR (~$4), so it feels like you have some skin in the game. Once I get a model a bit better (>0.02 Spearman correlation) I will put in more money. I added $100 that I can stake at some point.

Google Colab

It appears that if you add [“”] to the Colab settings you get 2x the memory. It is probably for the best that I didn’t find this sooner, since it forced me to clean up my code.

Large CSV Files

After looking around at some documentation, I saw that converting each of the numeric fields to float16 saved me over 74% on memory. It still takes a while to read the CSV into my code, but after that I can do quite a bit with the DataFrame fairly quickly.
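Roughly what that looks like, as a sketch (the file name is an assumption; the float16 dtype mapping is the important part):

    import pandas as pd

    # Read just the header row to discover the column names.
    cols = pd.read_csv("numerai_training_data.csv", nrows=0).columns

    # Map every feature/target column to float16; id and era columns stay as strings.
    dtypes = {c: "float16" for c in cols if c.startswith("feature") or c.startswith("target")}

    train_df = pd.read_csv("numerai_training_data.csv", dtype=dtypes)
    print(train_df.memory_usage(deep=True).sum() / 1e9, "GB")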

Custom Loss/Metric Method

I am still trying to find a TensorFlow version of the Spearman loss function. There is an implementation in PyTorch that I might be able to port.
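In the meantime, one stopgap I am considering is wrapping SciPy’s spearmanr in tf.py_function so Keras can at least report the score as a metric during training. A sketch only; it is not differentiable, so it can’t be used as the loss itself:

    import tensorflow as tf
    from scipy.stats import spearmanr

    def spearman_metric(y_true, y_pred):
        # tf.py_function lets a Keras metric call plain NumPy/SciPy code eagerly.
        return tf.py_function(
            lambda t, p: spearmanr(t.numpy().ravel(), p.numpy().ravel())[0],
            inp=[y_true, y_pred],
            Tout=tf.float64,
        )

    # Usage: model.compile(optimizer="adam", loss="mse", metrics=[spearman_metric])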

NumerAI: Week 1

My first week was really just getting everything set up. My biggest issue was trying to figure out the best way to handle over 3 GB of data in a CSV file.

Overall, I think it was a good start but I will have to keep touching up my model to see if I can squeeze out some performance. Below are some things I discovered.

Google Colab

I have never stored data in the local session before and was surprised when all of my work was gone when I came back to it. Note to self: store it in Google Drive.

I also ran into issues while trying to build all the datasets. Since they are so large, that is a lot to hold in memory.

I really wish I had more knowledge of how to release memory from within Python/Colab.
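The little I have picked up so far is the basic del/gc.collect() pattern (a minimal sketch; train_df is just a stand-in for whatever large object is finished being used):

    import gc
    import pandas as pd

    # Stand-in for a large DataFrame that is no longer needed.
    train_df = pd.DataFrame({"feature": range(1_000_000)})

    # Drop the reference...
    del train_df

    # ...and ask the garbage collector to reclaim the memory right away.
    gc.collect()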

Large CSV Files

I have either used a relational database for large datasets or CSV files for smaller ones. This is the first time I am working with large datasets in CSV format. The training CSV is ~750 MB and the validation dataset is over 2 GB.

I need to be aware of the sizes and not load too much into memory.
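One option for that is pandas’ chunked reading, which streams the CSV a slice at a time instead of pulling it all into memory (a sketch; the file name is an assumption):

    import pandas as pd

    total = None
    rows = 0

    # Stream the CSV in 100,000-row chunks instead of loading it all at once.
    for chunk in pd.read_csv("numerai_tournament_data.csv", chunksize=100_000):
        feature_cols = [c for c in chunk.columns if c.startswith("feature")]
        total = chunk[feature_cols].sum() if total is None else total + chunk[feature_cols].sum()
        rows += len(chunk)

    # Exact per-feature means assembled from chunk-level sums.
    print(total / rows)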

Custom Loss/Metric Method

I am trying to implement their scoring metric (Spearman correlation, essentially the Pearson correlation of the ranked predictions) and it is a giant pain in the butt. I have seen a few implementations elsewhere, but they don’t work with TF v2. Fun times.
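Outside of TensorFlow the calculation itself is manageable (a sketch with NumPy/SciPy; the sample data is made up):

    import numpy as np
    from scipy.stats import rankdata

    def spearman_style_corr(targets, predictions):
        # Rank the predictions (scaled to (0, 1]), then take the Pearson
        # correlation of those ranks against the targets.
        ranked = rankdata(predictions, method="average") / len(predictions)
        return np.corrcoef(targets, ranked)[0, 1]

    targets = np.random.choice([0.0, 0.25, 0.5, 0.75, 1.0], size=1000)
    predictions = np.random.rand(1000)
    print(spearman_style_corr(targets, predictions))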