Numerai: Week 1

My first week was really just getting everything set up. My biggest issue was figuring out the best way to handle over 3 GB of data in CSV files.

Overall, I think it was a good start but I will have to keep touching up my model to see if I can squeeze out some performance. Below are some things I discovered.

Google Colab

I had never stored data in the local session before and was surprised to find all of my work gone when I came back to it. Note to self: store it in Google Drive.
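For next time, mounting Drive from a notebook cell is simple enough. This uses Colab's built-in google.colab.drive helper; the numerai folder path is just my own choice:

    from google.colab import drive

    # Mount Google Drive so files persist across Colab sessions.
    drive.mount('/content/drive')

    # Anything written under this path survives a session reset.
    data_dir = '/content/drive/My Drive/numerai'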

I also ran into issues while trying to build all of the datasets. Since they are so large, that is a lot to hold in memory at once.

I really wish I had a better handle on how to release memory from within Python/Colab.
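From what I can tell, the best you can do is delete the references and ask the garbage collector to run. A rough sketch (the filename is the Numerai training file as I have it, and pandas may still hold on to some memory this won't reclaim):

    import gc

    import pandas as pd

    training_data = pd.read_csv('numerai_training_data.csv')
    # ... build features, train, etc. ...

    # Drop the reference and ask Python's garbage collector to free it.
    del training_data
    gc.collect()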

Large CSV Files

Up to now I have either used a relational database for large datasets or CSV files for smaller ones. This is the first time I am getting large datasets in CSV format: the training CSV is ~750 MB and the validation dataset is >2 GB.

I need to be aware of the sizes and not load too much into memory.
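The approach I'm leaning toward is reading the CSV in chunks and downcasting the feature columns to float32, which roughly halves the footprint. A sketch (the filename and chunk size here are my own choices):

    import pandas as pd

    chunks = []
    for chunk in pd.read_csv('numerai_tournament_data.csv', chunksize=100_000):
        # Downcast float64 columns to float32 to halve their memory usage.
        float_cols = chunk.select_dtypes('float64').columns
        chunk[float_cols] = chunk[float_cols].astype('float32')
        chunks.append(chunk)

    validation_data = pd.concat(chunks, ignore_index=True)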

Custom Loss/Metric Method

I am trying to implement their scoring metric (Spearman rank correlation, which is just the Pearson correlation computed on ranks) and it is a giant pain in the butt. I have seen a few places that have done it, but their code doesn't work with TFv2. Fun times.
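The workaround I'm experimenting with is wrapping scipy's spearmanr in tf.py_function so it at least runs as a Keras metric in TFv2. This is a sketch, not Numerai's exact scoring code, and since it isn't differentiable it only works as a metric, not as a loss:

    import tensorflow as tf
    from scipy.stats import spearmanr

    def spearman_corr(y_true, y_pred):
        # Delegate to scipy; runs eagerly inside the graph via tf.py_function.
        def _spearman(a, b):
            return spearmanr(a.numpy().ravel(), b.numpy().ravel()).correlation
        return tf.py_function(_spearman, inp=[y_true, y_pred], Tout=tf.float64)

    # Usage: model.compile(optimizer='adam', loss='mse', metrics=[spearman_corr])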
