Intro to TensorFlow
Deep Learning
R Interfaces to TensorFlow
Supporting Tools
Learning More
Very general built-in optimization algorithms (SGD, Adam) that don’t require that all data is in RAM
Robust foundation for machine learning and deep learning applications
TensorFlow models can be deployed with a low-latency C++ runtime
R has a lot to offer as an interface language for TensorFlow
Tensors
Data flow
Runtime execution
Some example uses
Dimension | R object |
---|---|
0D | 42 |
1D | c(42, 42, 42) |
2D | matrix(42, nrow = 2, ncol = 2) |
3D | array(42, dim = c(2,3,2)) |
4D | array(42, dim = c(2,3,2,3)) |
Data | Tensor |
---|---|
Vector data | 2D tensors of shape (samples, features) |
Timeseries data | 3D tensors of shape (samples, timesteps, features) |
Images | 4D tensors of shape (samples, height, width, channels) |
Video | 5D tensors of shape (samples, frames, height, width, channels) |
Note that samples is always the first dimension.
head(data.matrix(iris), n = 10)
      Sepal.Length Sepal.Width Petal.Length Petal.Width Species
 [1,]          5.1         3.5          1.4         0.2       1
 [2,]          4.9         3.0          1.4         0.2       1
 [3,]          4.7         3.2          1.3         0.2       1
 [4,]          4.6         3.1          1.5         0.2       1
 [5,]          5.0         3.6          1.4         0.2       1
 [6,]          5.4         3.9          1.7         0.4       1
 [7,]          4.6         3.4          1.4         0.3       1
 [8,]          5.0         3.4          1.5         0.2       1
 [9,]          4.4         2.9          1.4         0.2       1
[10,]          4.9         3.1          1.5         0.1       1
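To make the (samples, features) convention concrete, the sketch below checks the shape of the iris matrix and of a hypothetical timeseries array. The timeseries sizes (10 samples, 24 timesteps, 3 features) are invented for illustration.

```r
# 2D tensor: vector data of shape (samples, features)
x <- data.matrix(iris)
iris_shape <- dim(x)             # 150 samples, 5 features

# 3D tensor: hypothetical timeseries of shape (samples, timesteps, features)
ts_data <- array(0, dim = c(10, 24, 3))

# The first dimension indexes samples in both cases
first_sample <- x[1, ]           # one sample's feature vector
first_series <- ts_data[1, , ]   # one sample's 24 x 3 series

print(iris_shape)
print(dim(first_series))
```

Indexing along the first dimension always returns a single sample, whatever the rank of the tensor.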
You define the graph in R
Graph is compiled and optimized
Graph is executed on devices
Nodes represent computations
Data (tensors) flows between them
R Code
TensorFlow Graph
I’m confident the R community will also find novel new uses for TensorFlow…
# Greta
theta <- normal(0, 32, dim = 2)
mu <- alpha + beta * Z
X <- normal(mu, sigma)
p <- ilogit(theta[1] + theta[2] * X)
distribution(y) <- binomial(n, p)
# BUGS/JAGS
for(j in 1 : J) {
  y[j] ~ dbin(p[j], n[j])
  logit(p[j]) <- theta[1] + theta[2] * X[j]
  X[j] ~ dnorm(mu[j], tau)
  mu[j] <- alpha + beta * Z[j]
}
theta[1] ~ dnorm(0.0, 0.001)
theta[2] ~ dnorm(0.0, 0.001)
What is deep learning?
What is it useful for?
Why should R users care?
How does it work?
Some examples
Special thanks to François Chollet (creator of Keras) for the concepts and figures used to explain deep learning! (all drawn from Chapter 1 of his Deep Learning with R book)
A layer is a geometric transformation function on the data that goes through it (transformations must be differentiable for stochastic gradient descent)
Weights determine the data transformation behavior of a layer
The layers of representation in a deep learning model are the feature engineering for the model (i.e. feature transformations are learned rather than hard coded).
A new take on learning representations from data that puts an emphasis on learning successive layers of increasingly meaningful representations.
Other possibly more appropriate names for the field:
Modern deep learning often involves tens or even hundreds of successive layers of representation
Other approaches to machine learning tend to focus on learning only one or two layers of representations of the data
Deep learning is proven to be effective at various complex “perceptual” tasks but not yet proven to be of widespread benefit in other domains.
A simple mechanism that, once scaled, ends up looking like magic
Statistics: Often focused on inferring the process by which data is generated.
Machine learning: Principally focused on predicting future data.
Breiman, L. (2001). Statistical modeling: The two cultures. Statistical Science, 16(3), 199–215.
Ayres, I. (2008). Super Crunchers: Why Thinking-By-Numbers is the New Way To Be Smart, Bantam.
Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310.
Breiman, L. (1997). No Bayesians in foxholes. IEEE Expert, 12(6), 21–24.
Boulesteix, A.-L., & Schmid, M. (2014). Machine learning versus statistical modeling. Biometrical Journal, 56(4), 588–593. (and other articles in this issue)
library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
                input_shape = c(28,28,1)) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax')
The loss function takes the predictions of the network and the true targets (what you wanted the network to output) and computes a distance score, capturing how well the network has done on a batch of examples.
The fundamental trick in deep learning is to use this score as a feedback signal to adjust the value of the weights a little, in a direction that will lower the loss score for the current batch of examples. This adjustment is the job of the optimizer.
Updates are done via the backpropagation algorithm, using the chain rule to iteratively compute gradients for each layer.
Traditional optimization algorithms update weights by averaging the gradients of all data points. To economize computational cost we use stochastic gradient descent and only calculate gradients for a small random sample of the data.
The training loop, repeated a sufficient number of times (typically tens of iterations over thousands of examples), yields weight values that minimize the loss function.
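The loop described above (predict, score with a loss function, compute gradients, let the optimizer nudge the weights) can be sketched in base R for a one-variable linear model. This is a hand-written illustration, not how Keras or TensorFlow implement it: the synthetic data, learning rate, batch size, and step count are all assumptions chosen for the example, and the gradients are derived by hand rather than by backpropagation through a graph.

```r
set.seed(42)

# Synthetic data for illustration: y = 0.1 * x + 0.3 plus a little noise
n <- 1000
x <- runif(n)
y <- 0.1 * x + 0.3 + rnorm(n, sd = 0.01)

# Model weights, started at arbitrary values
w <- 0
b <- 0

# Loss function: mean squared error between predictions and targets
mse <- function(pred, target) mean((pred - target)^2)

learning_rate <- 0.1
batch_size <- 32
initial_loss <- mse(w * x + b, y)

for (step in 1:5000) {
  # Stochastic part: use a small random batch, not the whole dataset
  idx <- sample(n, batch_size)
  err <- (w * x[idx] + b) - y[idx]

  # Gradients of the batch MSE with respect to w and b (chain rule)
  grad_w <- mean(2 * err * x[idx])
  grad_b <- mean(2 * err)

  # Optimizer step: move the weights a little against the gradient
  w <- w - learning_rate * grad_w
  b <- b - learning_rate * grad_b
}

final_loss <- mse(w * x + b, y)
cat("w =", w, "b =", b, "loss =", final_loss, "\n")
```

Each step looks at only 32 of the 1000 points, which is what makes the descent stochastic; the weights should still drift toward the values used to generate the data.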
Deep-learning models are mathematical machines for uncrumpling complicated manifolds of high-dimensional data.
Deep learning is turning meaning into vectors, into geometric spaces, and then incrementally learning complex geometric transformations that map one space to another.
How can we do this with simple parametric models trained with gradient descent? We just need sufficiently large parametric models trained with gradient descent on sufficiently many examples.
summary(model)
______________________________________________________________________________________
Layer (type)                          Output Shape                       Param #
======================================================================================
conv2d_3 (Conv2D)                     (None, 26, 26, 32)                 320
______________________________________________________________________________________
conv2d_4 (Conv2D)                     (None, 24, 24, 64)                 18496
______________________________________________________________________________________
max_pooling2d_2 (MaxPooling2D)        (None, 12, 12, 64)                 0
______________________________________________________________________________________
flatten_2 (Flatten)                   (None, 9216)                       0
______________________________________________________________________________________
dense_3 (Dense)                       (None, 128)                        1179776
______________________________________________________________________________________
dense_4 (Dense)                       (None, 10)                         1290
======================================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
______________________________________________________________________________________
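The output shapes and parameter counts in this summary can be reproduced by hand. The sketch below assumes unpadded convolutions with stride 1 and one bias per filter or unit, which is consistent with the numbers shown; the helper functions and layer labels are hypothetical names introduced for the example.

```r
# Parameters of a 2D convolution layer: one kernel_h x kernel_w x in_channels
# filter per output channel, plus one bias per output channel
conv_params <- function(kernel_h, kernel_w, in_channels, filters) {
  (kernel_h * kernel_w * in_channels + 1) * filters
}

# Parameters of a dense layer: one weight per input/unit pair, plus one bias per unit
dense_params <- function(inputs, units) {
  (inputs + 1) * units
}

# Output side length of an unpadded convolution with stride 1
conv_out <- function(size, kernel) size - kernel + 1

side1 <- conv_out(28, 3)       # first convolution
side2 <- conv_out(side1, 3)    # second convolution
pooled <- side2 / 2            # 2 x 2 max pooling halves each side
flat <- pooled * pooled * 64   # values per sample after flattening

params <- c(
  conv_first   = conv_params(3, 3, 1, 32),
  conv_second  = conv_params(3, 3, 32, 64),
  dense_hidden = dense_params(flat, 128),
  dense_output = dense_params(128, 10)
)
print(params)
print(sum(params))
```

Almost all of the parameters sit in the first dense layer, because it connects every one of the 9216 flattened values to each of its 128 units.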
summary(vgg16_imagenet_model)
Layer (type)                          Output Shape                       Param #
======================================================================================
input_1 (InputLayer)                  (None, 224, 224, 3)                0
______________________________________________________________________________________
block1_conv1 (Conv2D)                 (None, 224, 224, 64)               1792
______________________________________________________________________________________
block1_conv2 (Conv2D)                 (None, 224, 224, 64)               36928
______________________________________________________________________________________
block1_pool (MaxPooling2D)            (None, 112, 112, 64)               0
______________________________________________________________________________________
... (entire model not shown) ...
______________________________________________________________________________________
fc2 (Dense)                           (None, 4096)                       16781312
______________________________________________________________________________________
predictions (Dense)                   (None, 1000)                       4097000
======================================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
______________________________________________________________________________________
Computer vision
Natural language processing
Time series
Biomedical
What next?
ImageNet: 3.2 million labelled images, separated into 5,247 categories, sorted into 12 subtrees like “mammal,” “vehicle,” and “furniture.”
ImageNet Challenge: An annual competition (2010-2017) to see which algorithms could identify objects in the dataset’s images with the lowest error rate.
Accuracy improved from 71.8% to 97.3% over the lifetime of the contest.
Deep learning was used for the first time in 2012, and beat the field by 10.8%.
The growth in paper submissions could indicate a passing fad or a fundamental transformation; we don’t know yet!
Google’s Neural Machine Translation System
Time series classification based on convolutional neural networks (same technique commonly used for images)
Convolution and pooling operations are applied alternately to generate deep features of the raw data, which are then fed through standard dense layers.
Wavelet transforms used to address noise in financial time series.
Stacked autoencoder used to learn the deep features of financial time series in an unsupervised manner.
Applications of deep learning to a variety of biomedical problems—patient classification, fundamental biological processes, and treatment of patients.
January 2018 study by Google, UCSF, Stanford, and U of Chicago Medicine
Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient’s record.
Produced predictions for a wide range of clinical problems and outcomes that outperformed state-of-the-art traditional predictive models.
Of course we can help here by promoting a more balanced dialog about the strengths and weaknesses of these methods.
High-level R interfaces for neural nets and traditional models
Low-level interface to enable new applications (e.g. Greta)
Tools to facilitate productive workflow / experiment management
Straightforward access to GPUs for training models
Breadth and depth of educational resources
library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3,3), activation = 'relu',
                input_shape = input_shape) %>%
  layer_conv_2d(filters = 64, kernel_size = c(3,3), activation = 'relu') %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_dropout(rate = 0.25) %>%
  layer_flatten() %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 10, activation = 'softmax')
Estimator | Description |
---|---|
linear_regressor() | Linear regressor model. |
linear_classifier() | Linear classifier model. |
dnn_regressor() | Deep neural network regression. |
dnn_classifier() | Deep neural network classification. |
dnn_linear_combined_regressor() | DNN Linear Combined Regression. |
dnn_linear_combined_classifier() | DNN Linear Combined Classification. |
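library(tensorflow)

# Example data (an assumption added so the snippet runs): y = 0.1 * x + 0.3
x_data <- runif(100, min = 0, max = 1)
y_data <- x_data * 0.1 + 0.3

# Find values of W and b such that y_data = W * x_data + b
W <- tf$Variable(tf$random_uniform(shape(1L), -1.0, 1.0))
b <- tf$Variable(tf$zeros(shape(1L)))
y <- W * x_data + b

# Minimize the mean squared error
loss <- tf$reduce_mean((y - y_data) ^ 2)
optimizer <- tf$train$GradientDescentOptimizer(0.5)
train <- optimizer$minimize(loss)

# Launch the graph, initialize the variables, and fit the line
sess <- tf$Session()
sess$run(tf$global_variables_initializer())
for (step in 1:200)
  sess$run(train)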
library(keras)

# Load MNIST images datasets (built in to Keras)
c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_mnist()

# Flatten images and transform grayscale values into [0,1] range
x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
x_train <- x_train / 255
x_test <- x_test / 255

# Convert class vectors to binary class matrices
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
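The last two preprocessing steps are easy to see on a few values. The sketch below uses a hypothetical one_hot() helper to imitate the kind of binary class matrix that to_categorical() produces; it is an illustration in base R, not the keras implementation, and the sample labels and pixel values are made up.

```r
# Hypothetical helper that builds a binary class matrix from 0-based labels
one_hot <- function(labels, num_classes) {
  out <- matrix(0, nrow = length(labels), ncol = num_classes)
  # Labels are 0-based digits, R columns are 1-based
  out[cbind(seq_along(labels), labels + 1)] <- 1
  out
}

y <- c(5, 0, 4)             # three example digit labels
encoded <- one_hot(y, 10)   # 3 x 10 matrix, a single 1 per row
print(encoded)

# Scaling 8-bit pixel intensities into the [0, 1] range
pixels <- c(0, 128, 255)
scaled <- pixels / 255
print(scaled)
```

Each row of the encoded matrix lines up with the 10 softmax outputs of the model, which is the form the categorical cross-entropy loss expects.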
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
# Modify model object in place (note that the result is not assigned back to model)
model %>% compile(
  optimizer = 'rmsprop',
  loss = 'binary_crossentropy',
  metrics = c('accuracy')
)
Keras models are directed acyclic graphs of layers whose state is updated during training.
Keras layers can be shared by multiple parts of a Keras model.
history <- model %>% fit(
  x_train, y_train,
  batch_size = 128,
  epochs = 10,
  validation_split = 0.2
)
Feed 128 samples at a time to the model (batch_size = 128)
Traverse the input dataset 10 times (epochs = 10)
Hold out 20% of the data for validation (validation_split = 0.2)
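A quick back-of-the-envelope calculation shows what these three settings imply for the number of weight updates. The sketch assumes the standard MNIST training set of 60,000 images; the variable names are illustrative.

```r
n_train <- 60000           # size of the MNIST training set
validation_split <- 0.2
batch_size <- 128
epochs <- 10

n_val <- round(n_train * validation_split)       # samples held out for validation
n_fit <- n_train - n_val                         # samples used for weight updates
steps_per_epoch <- ceiling(n_fit / batch_size)   # batches per pass over the data
total_updates <- steps_per_epoch * epochs        # optimizer steps over the whole run

cat(n_fit, "training samples,", steps_per_epoch, "batches per epoch,",
    total_updates, "updates in total\n")
```

Halving batch_size would double the number of updates per epoch, at the cost of noisier gradient estimates.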
plot(history)
model %>% evaluate(x_test, y_test)
$loss
[1] 0.1078904

$acc
[1] 0.9815
model %>% predict_classes(x_test[1:100,])
 [1] 7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4 9 6 6 5 4 0 7 4 0 1 3 1 3 4 7
[36] 2 7 1 2 1 1 7 4 2 3 5 1 2 4 4 6 3 5 5 6 0 4 1 9 5 7 8 9 3 7 4 6 4 3 0
[71] 7 0 2 9 1 7 3 2 9 7 7 6 2 7 8 4 7 3 6 1 3 6 9 3 1 4 1 7 6 9
Layer | Description |
---|---|
layer_dense() | Add a densely-connected NN layer to an output. |
layer_dropout() | Applies Dropout to the input. |
layer_batch_normalization() | Batch normalization layer (Ioffe and Szegedy, 2014). |
layer_conv_2d() | 2D convolution layer (e.g. spatial convolution over images). |
layer_max_pooling_2d() | Max pooling operation for spatial data. |
layer_gru() | Gated Recurrent Unit - Cho et al. |
layer_lstm() | Long Short-Term Memory unit. |
layer_embedding() | Turns positive integers (indexes) into dense vectors of fixed size. |
layer_reshape() | Reshapes an output to a certain shape. |
layer_flatten() | Flattens an input. |
layer_dense(units = 128)
output = activation(dot(input, kernel) + bias)
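The formula above can be written out in base R for a small batch. This is a sketch of the arithmetic only, not of how Keras executes a layer: the input size (4 samples, 3 features), the 2 units, the random weights, and the choice of sigmoid are assumptions made for the example.

```r
set.seed(1)

# A batch of 4 samples with 3 features each: shape (samples, features)
input <- matrix(rnorm(4 * 3), nrow = 4, ncol = 3)

units <- 2
kernel <- matrix(rnorm(3 * units), nrow = 3, ncol = units)  # weights matrix
bias <- c(0.5, -0.5)                                        # one bias per unit

# Sigmoid activation, applied element-wise
sigmoid <- function(z) 1 / (1 + exp(-z))

# output = activation(dot(input, kernel) + bias)
# sweep() adds bias[j] to every row of column j
output <- sigmoid(sweep(input %*% kernel, 2, bias, "+"))
print(dim(output))
```

The result has one row per sample and one column per unit, so layer_dense(units = 128) would turn each sample into a vector of 128 values.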
kernel is a weights matrix
bias is a bias vector
activation is a function often used to introduce non-linearity (e.g. sigmoid)
layer_conv_2d()
layer_simple_rnn() layer_gru() layer_lstm()
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = 10000, output_dim = 8, input_length = 20) %>%
  layer_flatten() %>%
  layer_dense(units = 1, activation = "sigmoid")
Learn the embeddings jointly with the main task you care about (e.g. classification); or
Load pre-trained word embeddings (e.g. Word2vec, GloVe)
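Conceptually, an embedding layer is a lookup table: each integer index selects one row of a weight matrix. The base R sketch below mirrors the dimensions of the model above (10,000 words, 8-dimensional vectors, 20 words per sample); the random weights and word indexes stand in for learned or pre-trained values and are purely illustrative.

```r
set.seed(1)

input_dim <- 10000    # vocabulary size
output_dim <- 8       # length of each word vector
input_length <- 20    # words per sample

# The embedding weights: one row per vocabulary index.
# Random here; in a real model they are learned or loaded.
embedding <- matrix(rnorm(input_dim * output_dim),
                    nrow = input_dim, ncol = output_dim)

# One sample: 20 word indexes (0-based)
words <- sample(0:(input_dim - 1), input_length)

# Lookup: pick the row for each index (+1 because R is 1-based)
vectors <- embedding[words + 1, ]    # 20 x 8

# Flattening then yields a single vector of 20 * 8 values, word by word
flattened <- as.vector(t(vectors))
print(length(flattened))
```

Whether the rows are learned jointly with the task or loaded from Word2vec or GloVe, the lookup step is the same; only the source of the matrix changes.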
Model compilation prepares the model for training by:
model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
loss_binary_crossentropy()
loss_categorical_crossentropy()
loss_categorical_hinge()
loss_cosine_proximity()
loss_hinge()
loss_kullback_leibler_divergence()
loss_logcosh()
loss_mean_absolute_error()
loss_mean_absolute_percentage_error()
loss_mean_squared_error()
loss_mean_squared_logarithmic_error()
loss_poisson()
loss_sparse_categorical_crossentropy()
loss_squared_hinge()
optimizer_adadelta()
optimizer_adagrad()
optimizer_adam()
optimizer_adamax()
optimizer_nadam()
optimizer_rmsprop()
optimizer_sgd()
metric_binary_accuracy()
metric_binary_crossentropy()
metric_categorical_accuracy()
metric_categorical_crossentropy()
metric_cosine_proximity()
metric_hinge()
metric_kullback_leibler_divergence()
metric_mean_absolute_error()
metric_mean_absolute_percentage_error()
metric_mean_squared_error()
metric_mean_squared_logarithmic_error()
metric_poisson()
metric_sparse_categorical_crossentropy()
metric_sparse_top_k_categorical_accuracy()
metric_squared_hinge()
metric_top_k_categorical_accuracy()
Demonstrates transfer learning (training a new model with a pre-trained model as a starting point) for image classification.
Forecasting temperatures with a weather timeseries dataset recorded at the Weather Station at the Max Planck Institute for Biogeochemistry in Jena, Germany.
Reviews three advanced techniques for improving the performance and generalization power of recurrent neural networks: recurrent dropout, stacking recurrent layers, and bidirectional recurrent layers.
Illustrates how deep learning is successfully being applied to model key molecular interactions in the human immune system.
Molecular interactions are highly context dependent and therefore non-linear. Deep learning is a powerful tool for modeling this non-linearity.
An autoencoder is a neural network that is used to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction.
For this problem we train an autoencoder to encode non-fraud observations from our training set.
Since frauds are supposed to have a different distribution than normal transactions, we expect that our autoencoder will have higher reconstruction errors on frauds than on normal transactions.
Uses the Kaggle Quora Question Pairs dataset, which consists of approximately 400,000 pairs of questions along with a column indicating whether each pair is considered a duplicate.
Learn a function that maps input patterns into a target space such that a similarity measure in the target space approximates the “semantic” distance in the input space.
Bonus: Shiny front-end application to score example questions.
Word embedding is a method used to map words of a vocabulary to dense vectors of real numbers where semantically similar words are mapped to nearby points.
This in turn can improve natural language processing tasks like syntactic parsing and sentiment analysis by grouping similar words.
preds <- model %>% predict(img)
imagenet_decode_predictions(preds, top = 3)[[1]]
  class_name class_description       score
1  n02504458  African_elephant 0.909420729
2  n01871265            tusker 0.086183183
3  n02504013   Indian_elephant 0.004354581
For any given prediction and any given classifier, determine a small set of features in the original data that has driven the outcome of the prediction.
Many fields of inquiry have a deep learning frontier that has not yet been reached or even well approached.
In these cases traditional methods are invariably cheaper and more accurate.
To approach the frontier you need understanding of the full range of DL techniques, iteration, patience, and lots of compute power!
Over time you will develop stronger intuitions about the various tools available and how they might be successfully combined to model your data.
Need lots of compute power!
Need tools that help you get the most out of the compute you have.
Tool | Description |
---|---|
GPUs | Using GPUs locally or in the cloud. |
tfruns | Track and manage training runs and experiments. |
cloudml | R interface to Google Cloud Machine Learning Engine. |
tfdeploy | Exporting and deploying TensorFlow models. |
Some applications—in particular, image processing with convolutional networks and sequence processing with recurrent neural networks—will be excruciatingly slow on CPU.
Successful deep learning requires a huge amount of experimentation.
This requires a systematic approach to conducting and tracking the results of experiments.
The training_run() function is like the source() function, but it automatically tracks and records output and metadata for the execution of the script:
library(tfruns)
training_run("mnist_mlp.R")
ls_runs()
Data frame: 4 x 28
                    run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
1 runs/2017-12-09T21-01-11Z    0.1485   0.9562      0.2577     0.9240          0.1482         0.9545
2 runs/2017-12-09T21-00-11Z    0.1438   0.9573      0.2655     0.9208          0.1505         0.9559
3 runs/2017-12-09T19-59-44Z    0.1407   0.9580      0.2597     0.9241          0.1402         0.9578
4 runs/2017-12-09T19-56-48Z    0.1437   0.9555      0.2610     0.9227          0.1459         0.9551
ls_runs(eval_acc > 0.9570, order = eval_acc)
Data frame: 2 x 28
                    run_dir eval_acc eval_loss metric_loss metric_acc metric_val_loss metric_val_acc
1 runs/2017-12-09T19-59-44Z   0.9580    0.1407      0.2597     0.9241          0.1402         0.9578
2 runs/2017-12-09T21-00-11Z   0.9573    0.1438      0.2655     0.9208          0.1505         0.9559
# define flags and their defaults
FLAGS <- flags(
  flag_integer("dense_units1", 128),
  flag_numeric("dropout1", 0.4),
  flag_integer("dense_units2", 128),
  flag_numeric("dropout2", 0.3)
)
# use flag
layer_dropout(rate = FLAGS$dropout1)
# train with flag
training_run("mnist_mlp.R", flags = list(dropout1 = 0.3))
# run various combinations of dropout1 and dropout2
runs <- tuning_run("mnist_mlp.R", flags = list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
))
# find the best evaluation accuracy
runs[order(runs$eval_acc, decreasing = TRUE), ]
Data frame: 9 x 28
                    run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
9 runs/2018-01-26T13-21-03Z    0.1002   0.9817      0.0346     0.9900          0.1086         0.9794
6 runs/2018-01-26T13-23-26Z    0.1133   0.9799      0.0409     0.9880          0.1236         0.9778
5 runs/2018-01-26T13-24-11Z    0.1056   0.9796      0.0613     0.9826          0.1119         0.9777
4 runs/2018-01-26T13-24-57Z    0.1098   0.9788      0.0868     0.9770          0.1071         0.9771
2 runs/2018-01-26T13-26-28Z    0.1185   0.9783      0.0688     0.9819          0.1150         0.9783
3 runs/2018-01-26T13-25-43Z    0.1238   0.9782      0.0431     0.9883          0.1246         0.9779
8 runs/2018-01-26T13-21-53Z    0.1064   0.9781      0.0539     0.9843          0.1086         0.9795
7 runs/2018-01-26T13-22-40Z    0.1043   0.9778      0.0796     0.9772          0.1094         0.9777
1 runs/2018-01-26T13-27-14Z    0.1330   0.9769      0.0957     0.9744          0.1304         0.9751
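The nine rows correspond to the 3 x 3 combinations of the two dropout flags. The sketch below enumerates such a grid with base R's expand.grid() and sorts it by a metric; it illustrates the idea of a grid search and is not a description of tfruns internals. The accuracy values are made-up placeholders, not results from any run.

```r
flag_values <- list(
  dropout1 = c(0.2, 0.3, 0.4),
  dropout2 = c(0.2, 0.3, 0.4)
)

# Every combination of the candidate values: 3 x 3 = 9 rows
grid <- expand.grid(flag_values)
print(grid)

# Placeholder metric (NOT real results), only to show the sorting step
grid$eval_acc <- seq(0.970, 0.978, length.out = nrow(grid))

# Same pattern as runs[order(runs$eval_acc, decreasing = TRUE), ]
ranked <- grid[order(grid$eval_acc, decreasing = TRUE), ]
best <- ranked[1, ]
print(best)
```

The grid grows multiplicatively with each added flag, which is one reason a systematic way of tracking runs matters.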
Scalable training of models built with the keras, tfestimators, and tensorflow R packages.
On-demand access to training on GPUs, including the new Tesla P100 GPUs from NVIDIA®.
Hyperparameter tuning to optimize key attributes of model architectures in order to maximize predictive accuracy.
Train on default CPU instance:
library(cloudml)
cloudml_train("mnist_mlp.R")
Automatically uploads contents of working directory along with script
Automatically installs all required R packages on CloudML servers
# Train on a GPU instance
cloudml_train("mnist_mlp.R", master_type = "standard_gpu")

# Train on an NVIDIA Tesla P100 GPU
cloudml_train("mnist_mlp.R", master_type = "standard_p100")
job_collect()
Collects job metadata and all files created by the job (e.g. event logs, saved models)
Uses tfruns to allow inspection, enumeration, and comparison of jobs
ls_runs()
Data frame: 6 x 37
                            run_dir eval_loss eval_acc metric_loss metric_acc metric_val_loss metric_val_acc
6 runs/cloudml_2018_01_26_135812740    0.1049   0.9789      0.0852     0.9760          0.1093         0.9770
2 runs/cloudml_2018_01_26_140015601    0.1402   0.9664      0.1708     0.9517          0.1379         0.9687
5 runs/cloudml_2018_01_26_135848817    0.1159   0.9793      0.0378     0.9887          0.1130         0.9792
3 runs/cloudml_2018_01_26_135936130    0.0963   0.9780      0.0701     0.9792          0.0969         0.9790
1 runs/cloudml_2018_01_26_140045584    0.1486   0.9682      0.1860     0.9504          0.1453         0.9693
4 runs/cloudml_2018_01_26_135912819    0.1141   0.9759      0.1272     0.9655          0.1087         0.9762
# ... with 30 more columns:
#   flag_dense_units1, flag_dropout1, flag_dense_units2, flag_dropout2, samples, validation_samples,
#   batch_size, epochs, epochs_completed, metrics, model, loss_function, optimizer, learning_rate,
#   script, start, end, completed, output, source_code, context, type, cloudml_console_url,
#   cloudml_created, cloudml_end, cloudml_job, cloudml_log_url, cloudml_ml_units, cloudml_start,
#   cloudml_state
FLAGS <- flags(
  flag_integer("dense_units1", 128),
  flag_numeric("dropout1", 0.4),
  flag_integer("dense_units2", 128),
  flag_numeric("dropout2", 0.3)
)

model <- keras_model_sequential() %>%
  layer_dense(units = FLAGS$dense_units1, activation = 'relu',
              input_shape = c(784)) %>%
  layer_dropout(rate = FLAGS$dropout1) %>%
  layer_dense(units = FLAGS$dense_units2, activation = 'relu') %>%
  layer_dropout(rate = FLAGS$dropout2) %>%
  layer_dense(units = 10, activation = 'softmax')
trainingInput:
  hyperparameters:
    goal: MAXIMIZE
    hyperparameterMetricTag: val_acc
    maxTrials: 10
    params:
      - parameterName: dropout1
        type: DOUBLE
        minValue: 0.2
        maxValue: 0.6
        scaleType: UNIT_LINEAR_SCALE
      - parameterName: dropout2
        type: DOUBLE
        minValue: 0.1
        maxValue: 0.5
        scaleType: UNIT_LINEAR_SCALE
cloudml_train("mnist_mlp.R", config = "tuning.yml")
job_trials()
   finalMetric.objectiveValue finalMetric.trainingStep hyperparameters.dropout1 hyperparameters.dropout2 trialId
1                    0.973854                       19       0.2011326172916916      0.32774705750441724      10
2                    0.973458                       19      0.20090378506439671      0.10079321757280404       3
3                    0.973354                       19       0.5476299090261757      0.49998941144858033       6
4                    0.972875                       19        0.597820322273044       0.4074512354566201       7
5                    0.972729                       19      0.25969787952729828      0.42851076497180118       1
6                    0.972417                       19      0.20045494784980847      0.15927383711937335       4
7                    0.972188                       19      0.33367593781223304      0.10077055587860367       5
8                    0.972188                       19      0.59880072314674071      0.10476853415572558       9
9                    0.972021                       19         0.40078175292512      0.49982245025905447       8
10
job_collect(trials = "best")
Collects trial with best objective metric by default
Specify trials = "all" to collect all trials (runs) and then perform offline analysis of hyperparameter interactions via ls_runs().
TensorFlow was built from the ground up to enable deployment using a low-latency C++ runtime.
Deploying TensorFlow models requires no runtime R or Python code.
Key enabler for this is the TensorFlow SavedModel:
SavedModel provides a language-neutral format to save machine-learned models that is recoverable and hermetic. It enables higher-level systems and tools to produce, consume and transform TensorFlow models.
TensorFlow models can be deployed to servers, embedded devices, mobile phones, and even to a web browser!
model <- keras_model_sequential() %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784),
              name = "image") %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dense(units = 10, activation = 'softmax',
              name = "prediction")
Note that we give the input and output layers names (“image” and “prediction”)
# ...compile and fit model

# export model
library(tfdeploy)
export_savedmodel(model, "savedmodel")
serve_savedmodel("savedmodel")
TensorFlow Serving is a flexible, high-performance serving system for machine learning models, designed for production environments.
TensorFlow Serving makes it easy to deploy new algorithms and experiments, while keeping the same server architecture and APIs.
TensorFlow Serving provides out-of-the-box integration with TensorFlow models, but can be easily extended to serve other types of models and data.
library(rsconnect)
deployTFModel("savedmodel", account = <username>, server = <internal_connect_server>)
library(cloudml)
cloudml_deploy("savedmodel", name = "keras_mnist")
Copying file://savedmodel/variables/variables.data-00000-of-00001 ...
Copying file://savedmodel/saved_model.pb ...
Copying file://savedmodel/variables/variables.index ...
/ [3/3 files][ 1.9 MiB/ 1.9 MiB] 100% Done
Operation completed over 3 objects/1.9 MiB.

Model created and available in https://console.cloud.google.com/mlengine/models/keras_mnist
Straightforward to load already trained models into Shiny applications.
Training code (and hardware!) not required for inference.
# Keras
model <- load_model_hdf5("model.hdf5")
model %>% predict(input)

# SavedModel
predict_savedmodel(input, "savedmodel")
Run on embedded devices (e.g. Raspberry Pi)
Keras models can be converted to iOS CoreML
Keras models can be deployed to the browser with Keras.js
TensorFlow is a new general purpose numerical computing library with lots to offer the R community.
Deep learning has made great progress and will likely increase in importance in various fields in the coming years.
R now has a great set of APIs and supporting tools for using TensorFlow and doing deep learning.
Slides: https://rstd.io/ml-with-tensorflow-and-r/
Subscribe to the blog to stay up to date!