Tools and Tips for Remote Machine Learning Work

March 27, 2020



A lot goes into effective remote work: clear communication, milestones, expectation management and, for larger teams, project management. But if you’re working as a distributed team or interacting with a client remotely, good tools are indispensable too: they make the work easier and much more enjoyable.

Here are a few tools that may help specifically with the machine learning aspect of distributed or remote machine learning work. Video conferencing, communication and project management tools are important and help structure your work processes, and if you’re working remotely, chances are you’re using them already.

We’ll look at three major building blocks of a machine learning workflow:

Data: storing, accessing and labeling data
Modeling: training models
Sharing results: presenting the results of model training to your client, or sharing them within your team

Interfacing with the client


This is the “outward” view: you’re a machine learning professional or a team of said professionals, interacting with a client. Why is it relevant to bring the client into the picture? Well, there are some peculiarities here, such as:

Here’s what you may consider using.


Data

There could be a few challenges on the data front. First, the client may want to keep the data on-prem, and you’ll need to work with whatever stack they’ve got. That is more of an engineering challenge.

Second, you may discover that the data your client has is unlabeled, and you’ll need to do some annotations on your end.

If the dataset that you need labeled can be made public:

If the data is confidential, or the labeling needs specialized expertise (think medical images, and perhaps your client has radiologists who can annotate the images), there are a few options:


Modeling

If there are restrictions on data access from the client, you may still be able to use your favorite tools for modeling depending on exactly where the data is located. Jupyter notebooks in particular can be launched on:


Sharing results

Great, you’ve trained your model, and now it’s time to show the client just how shiny it is. You can present your results in a PowerPoint or as a PDF, of course. But you can also go a step further and build a dashboard, especially if the model can provide insights for the client based on new data. Among the many dashboard tools are:



Distributed team


The “inward” view is that of a distributed team working on a machine learning project. Yes, you need to talk and use some controls, however simple, to keep the project on track. But you may also need to work collaboratively on a model or data. And you definitely want to share your own modeling progress with the rest of the team.


Data

It’s really helpful to agree on one source of truth, a single data repository, for the entire team. That agreement is more important than the actual tool that you use for that purpose, which doesn’t have to be super complicated. For example:
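
Whatever store the team settles on, one lightweight way to check that everyone is really reading the same data is to keep a checksum manifest alongside it. The following is only a minimal sketch under that assumption, not something any particular tool prescribes; the function names `file_sha256`, `build_manifest` and `verify_manifest` are hypothetical, and it uses nothing beyond the Python standard library.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): file_sha256(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def verify_manifest(data_dir: Path, manifest: dict) -> list:
    """Return relative paths that are missing, changed, or unexpected."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )
```

One person could build the manifest once and save it (for instance with `json.dump`) somewhere outside the data directory, such as the team's code repository. Everyone else can then call `verify_manifest` on their local copy before training; an empty list means the copy matches.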


Modeling

Sometimes a single person is responsible for training a particular model. Sometimes that model needs to be worked on by several people. To collaborate on building a model, you can use:


Sharing results

Sharing the progress of hyperparameter tuning and (hopefully) increasing model performance can be very helpful to keep the distributed team aware of where everyone stands. Here are a couple of suggestions:
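
Independently of any dedicated tool, the underlying idea can be sketched in a few lines: every run appends its hyperparameters and metrics to a shared log that the whole team can read. This is a minimal illustration only, assuming a JSON Lines file in a location everyone can reach; the names `log_run` and `best_runs`, and the record fields, are hypothetical.

```python
import json
import time
from pathlib import Path


def log_run(log_path: Path, author: str, params: dict, metrics: dict) -> None:
    """Append one training run to the log as a single JSON line."""
    record = {
        "time": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "author": author,
        "params": params,
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def best_runs(log_path: Path, metric: str, top: int = 3) -> list:
    """Return up to `top` runs with the highest value of `metric`."""
    with log_path.open(encoding="utf-8") as handle:
        runs = [json.loads(line) for line in handle if line.strip()]
    # Runs that did not report this metric are skipped, not treated as zero.
    scored = [run for run in runs if metric in run["metrics"]]
    return sorted(scored, key=lambda run: run["metrics"][metric], reverse=True)[:top]
```

A teammate might call `log_run(path, "ana", {"lr": 0.01}, {"val_accuracy": 0.87})` at the end of each training run, and anyone can call `best_runs(path, "val_accuracy")` to see where things stand. A plain file like this does not guard against simultaneous writes, so it is best suited to small teams.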




In the end, you may find yourself using a combination of tools from the “inward” and “outward” views. And of course, instead of a cocktail of different tools, you can go with an integrated solution like Dataiku. But a custom stack is likely to be better tailored to the needs of your clients and your team. It’s also more flexible, and more fun.