t3f - library for working with Tensor Train decomposition built on top of TensorFlow

t3f is a library for working with Tensor Train decomposition. Tensor Train decomposition is a generalization of the low-rank decomposition from matrices to tensors (=multidimensional arrays), i.e. it’s a tool to efficiently work with structured tensors. t3f is implemented on top of TensorFlow which gives it a few nice properties:

  • GPU support – just run your model on a machine with a CUDA-enabled GPU and the GPU version of TensorFlow, and t3f will execute most of the operations on it.
  • Autodiff – TensorFlow can automatically compute the derivative of a function with respect to the underlying parameters of the Tensor Train decomposition (the TT-cores). Also, if you are into Riemannian optimization, you can automatically compute the Riemannian gradient of a given function. Don’t worry if you don’t know what that is :)
  • Batch processing – you can run a single vectorized operation on a set of Tensor Train objects.
  • Easy to use with Deep Learning, e.g. you can define a layer parametrized with a Tensor Train object and use it as a part of your favorite neural network implemented in TensorFlow.

Installation

t3f assumes you have Python 3.6 and a working TensorFlow installation (tested with TF 2.4; see the TensorFlow documentation for installation instructions).

We don’t include TF in the pip requirements since the installation of TensorFlow varies depending on your setup.

Then, to install the stable version, run

pip install t3f

To install the latest version, run

git clone https://github.com/Bihaqo/t3f.git
cd t3f
pip install .

Quick start

Open this page in an interactive mode via Google Colaboratory.

In this quick start guide we show the basics of working with the t3f library. The main concept of the library is a TensorTrain object – a compact (factorized) representation of a tensor (=multidimensional array). This is a generalization of the matrix low-rank decomposition.

To begin, let’s import some libraries.

[1]:
import numpy as np

# Import TF 2.
%tensorflow_version 2.x
import tensorflow as tf

# Fix seed so that the results are reproducible.
tf.random.set_seed(0)
np.random.seed(0)
try:
    import t3f
except ImportError:
    # Install T3F if it's not already installed.
    !git clone https://github.com/Bihaqo/t3f.git
    !cd t3f; pip install .
    import t3f
TensorFlow 2.x selected.

Converting to and from TT-format

Let’s start with converting a dense (numpy) matrix into the TT-format, which in this case coincides with the low-rank format.

[2]:
# Generate a random dense matrix of size 3 x 4.
a_dense = np.random.randn(3, 4)
# Convert the matrix into the TT-format with TT-rank = 3 (the larger the TT-rank,
# the more accurately the tensor is represented, but the more memory and time
# everything takes). For matrices, the matrix rank coincides with the TT-rank.
a_tt = t3f.to_tt_tensor(a_dense, max_tt_rank=3)
# a_tt stores the factorized representation of the matrix, namely it stores the matrix
# as a product of two smaller matrices which are called TT-cores. You can
# access the TT-cores directly.
print('factors of the matrix: ', a_tt.tt_cores)
# To check that the conversion into the TT-format didn't change the matrix too much,
# let's convert it back and compare to the original.
reconstructed_matrix = t3f.full(a_tt)
print('Original matrix: ')
print(a_dense)
print('Reconstructed matrix: ')
print(reconstructed_matrix)

factors of the matrix:  (<tf.Tensor: shape=(1, 3, 3), dtype=float64, numpy=
array([[[-0.86358906, -0.23239721,  0.44744327],
        [-0.42523249,  0.81253763, -0.3986978 ],
        [-0.27090823, -0.53457847, -0.80052145]]])>, <tf.Tensor: shape=(3, 4, 1), dtype=float64, numpy=
array([[[-2.2895998 ],
        [-0.04123559],
        [-1.28825847],
        [-2.2648235 ]],

       [[ 1.16267886],
        [-1.10656759],
        [ 0.46752401],
        [-1.42118407]],

       [[ 0.12735099],
        [ 0.23999328],
        [-0.05617841],
        [-0.10115877]]])>)
Original matrix:
[[ 1.76405235  0.40015721  0.97873798  2.2408932 ]
 [ 1.86755799 -0.97727788  0.95008842 -0.15135721]
 [-0.10321885  0.4105985   0.14404357  1.45427351]]
Reconstructed matrix:
tf.Tensor(
[[ 1.76405235  0.40015721  0.97873798  2.2408932 ]
 [ 1.86755799 -0.97727788  0.95008842 -0.15135721]
 [-0.10321885  0.4105985   0.14404357  1.45427351]], shape=(3, 4), dtype=float64)

The same idea applies to tensors

[3]:
# Generate a random dense tensor of size 3 x 2 x 2.
a_dense = np.random.randn(3, 2, 2).astype(np.float32)
# Convert the tensor into the TT-format with TT-rank = 3.
a_tt = t3f.to_tt_tensor(a_dense, max_tt_rank=3)
# The 3 TT-cores are available in a_tt.tt_cores.
# To check that the conversion into the TT-format didn't change the tensor too much,
# let's convert it back and compare to the original.
reconstructed_tensor = t3f.full(a_tt)
print('The difference between the original tensor and the reconstructed '
      'one is %f' % np.linalg.norm(reconstructed_tensor - a_dense))

The difference between the original tensor and the reconstructed one is 0.000002

Arithmetic operations

t3f implements many operations that can be applied to tensors in the TT-format by working directly with the compact representation, i.e. without the need to materialize the tensors themselves. Here are some basic examples.

[4]:
# Create a random tensor of shape (3, 2, 2) directly in the TT-format
# (in contrast to generating a dense tensor and then converting it to TT).
b_tt = t3f.random_tensor((3, 2, 2), tt_rank=2)
# Compute the Frobenius norm of the tensor.
norm = t3f.frobenius_norm(b_tt)
print('Frobenius norm of the tensor is %f' % norm)
# Compute the TT-representation of the sum or elementwise product of two TT-tensors.
sum_tt = a_tt + b_tt
prod_tt = a_tt * b_tt
twice_a_tt = 2 * a_tt
# Most operations on TT-tensors increase the TT-rank. After applying a sequence of
# operations the TT-rank can grow too large and we may want to reduce it.
# To do that there is a rounding operation, which finds a tensor of
# smaller TT-rank that is as close to the original one as possible.
rounded_prod_tt = t3f.round(prod_tt, max_tt_rank=3)
a_max_tt_rank = np.max(a_tt.get_tt_ranks())
b_max_tt_rank = np.max(b_tt.get_tt_ranks())
exact_prod_max_tt_rank = np.max(prod_tt.get_tt_ranks())
rounded_prod_max_tt_rank = np.max(rounded_prod_tt.get_tt_ranks())
difference = t3f.frobenius_norm(prod_tt - rounded_prod_tt)
print('The TT-ranks of a and b are %d and %d. The TT-rank '
      'of their elementwise product is %d. The TT-rank of '
      'their product after rounding is %d. The difference '
      'between the exact and the rounded elementwise '
      'product is %f.' % (a_max_tt_rank, b_max_tt_rank,
                         exact_prod_max_tt_rank,
                         rounded_prod_max_tt_rank,
                         difference))

Frobenius norm of the tensor is 2.943432
The TT-ranks of a and b are 3 and 2. The TT-rank of their elementwise product is 6. The TT-rank of their product after rounding is 3. The difference between the exact and the rounded elementwise product is 0.003162.

Working with TT-matrices

Recall that for 2-dimensional tensors the TT-format coincides with the matrix low-rank format. However, sometimes matrices have full matrix rank but still possess some tensor structure (for example, they are a Kronecker product of smaller matrices). In this case there is a special object called the Matrix TT-format. You can think of it as a sum of Kronecker products (although it’s a bit more complicated than that).

Let’s say that you have a matrix of size 8 x 27. You can convert it into the matrix TT-format of tensor shape (2, 2, 2) x (3, 3, 3) (in which case the matrix will be represented with 3 TT-cores) or, for example, into the matrix TT-format of tensor shape (4, 2) x (3, 9) (in which case the matrix will be represented with 2 TT-cores).

[5]:
a_dense = np.random.rand(8, 27).astype(np.float32)
a_matrix_tt = t3f.to_tt_matrix(a_dense, shape=((2, 2, 2), (3, 3, 3)), max_tt_rank=4)
# Now you can work with 'a_matrix_tt' like with any other TT-object, e.g.
print('Frobenius norm of the matrix is %f' % t3f.frobenius_norm(a_matrix_tt))
twice_a_matrix_tt = 2.0 * a_matrix_tt  # multiplication by a number.
prod_tt = a_matrix_tt * a_matrix_tt  # Elementwise product of two TT-matrices.

Frobenius norm of the matrix is 7.805310

Additionally, you can compute matrix multiplication between TT-matrices

[6]:
vector_tt = t3f.random_matrix(((3, 3, 3), (1, 1, 1)), tt_rank=3)
matvec_tt = t3f.matmul(a_matrix_tt, vector_tt)
# Check that the result coincides with np.matmul.
matvec_expected = np.matmul(t3f.full(a_matrix_tt), t3f.full(vector_tt))
difference = np.linalg.norm(matvec_expected - t3f.full(matvec_tt))
print('Difference between multiplying matrix by vector in '
      'the TT-format and then converting the result into '
      'dense vector and multiplying dense matrix by '
      'dense vector is %f.' % difference)
Difference between multiplying matrix by vector in the TT-format and then converting the result into dense vector and multiplying dense matrix by dense vector is 0.000001.

Frequently asked questions

What is a tensor anyway?

For most purposes, a tensor is just a multidimensional array. For example, a matrix is a 2-dimensional array.

How to convert large-scale tensors into the TT-format

For small tensors, you can directly convert them into the TT-format with an SVD-based algorithm (e.g. with t3f.to_tt_tensor). For large tensors that do not fit into memory there are two options: either approximate the tensor in the TT-format by sampling a fraction of its elements (see TT-cross), or express your tensor analytically. For example, sometimes tensors can be expressed as arithmetic operations on simpler tensors, and for some functions (e.g. exp or sin) there are analytical expressions for the TT-format of their values on a grid.

What do people do with this Tensor Train format?

In machine learning, TT-format is used for compressing neural network layers (fully-connected, convolutional, recurrent), speeding up training of Gaussian processes, theoretical analysis of expressive power of Recurrent Neural Networks (one and two), reinforcement learning, etc. See an overview paper for more information.

The TT-format is also known in the physics community under the name Matrix Product State (MPS) and is used there extensively.

Are there other tensor decompositions?

Yes! Most notably, there are the Canonical, Tucker, and Hierarchical Tucker decompositions. They all have their pros and cons: for instance, many operations on the Canonical decomposition are NP-hard; the Tucker decomposition scales exponentially with the dimensionality of the tensor (and thus is inapplicable to tensors of dimensionality > 40); Hierarchical Tucker is very similar to the TT-decomposition.

Where can I read more about this Tensor Train format?

Look at the paper that proposed it. You can also check out my (Alexander Novikov’s) slides, from slide 3 to 14.

By the way, "train" here means an actual train, with wheels. The name comes from pictures like the one below, which illustrate the Tensor Train format and naturally look like a train (at least that's what they say).

[Figure: an illustration of the Tensor Train format (TT.png)]

API

Here you can find descriptions of the functions and methods available in t3f.

Module contents

class t3f.TensorTrain(tt_cores, shape=None, tt_ranks=None, convert_to_tensors=True, name='TensorTrain')

Bases: t3f.tensor_train_base.TensorTrainBase

Represents a Tensor Train object (a TT-tensor or TT-matrix).

t3f represents a Tensor Train object as a tuple of TT-cores.

left_tt_rank_dim

The dimension of the left TT-rank in each TT-core.

right_tt_rank_dim

The dimension of the right TT-rank in each TT-core.

tt_cores

A tuple of TT-cores.

Returns:
A tuple of 3d or 4d tensors of shape
[r_k-1, n_k, r_k]
or
[r_k-1, n_k, m_k, r_k]
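
A minimal construction sketch (the core shapes and variable names below are illustrative, chosen to follow the [r_k-1, n_k, r_k] convention; they are not part of the API):

import tensorflow as tf
import t3f

# Two 3d TT-cores describing a 3 x 4 TT-tensor with TT-ranks (1, 2, 1).
core1 = tf.random.normal((1, 3, 2))
core2 = tf.random.normal((2, 4, 1))
tt = t3f.TensorTrain((core1, core2))
dense = t3f.full(tt)  # tf.Tensor of shape (3, 4)
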
class t3f.TensorTrainBase(tt_cores)

Bases: object

An abstract class that represents a collection of Tensor Train cores.

dtype

The DType of elements in this tensor.

eval(feed_dict=None, session=None)

Evaluates this tensor in a Session.

Calling this method will execute all preceding operations that produce the inputs needed for the operation that produces this tensor. N.B. Before invoking eval(), its graph must have been launched in a session, and either a default session must be available, or session must be specified explicitly.

Parameters:
  • feed_dict – A dictionary that maps Tensor objects to feed values. See Session.run() for a description of the valid feed values.
  • session – (Optional.) The Session to be used to evaluate this tensor. If none, the default session will be used.
get_raw_shape()

Get tuple of TensorShapes representing the shapes of the underlying TT-tensor.

Tuple contains one TensorShape for TT-tensor and 2 TensorShapes for TT-matrix

Returns:A tuple of TensorShape objects.
get_shape()

Get the TensorShape representing the shape of the dense tensor.

Returns:A TensorShape object.
get_tt_ranks()

Get the TT-ranks in an array of size `num_dims`+1.

The first and the last TT-rank are guaranteed to be 1.

Returns:TensorShape of size `num_dims`+1.
graph

The Graph that contains the tt_cores tensors.

is_tt_matrix()

Returns True if the TensorTrain object represents a TT-matrix.

is_variable()

True if the TensorTrain object is a variable (e.g. is trainable).

name

The name of the TensorTrain.

Returns:String, the scope in which the TT-cores are defined.
ndims()

Get the number of dimensions of the underlying TT-tensor.

Returns:A number.
op

The Operation that evaluates all the cores.

tt_cores

A tuple of TT-cores.

class t3f.TensorTrainBatch(tt_cores, shape=None, tt_ranks=None, batch_size=None, convert_to_tensors=True, name='TensorTrainBatch')

Bases: t3f.tensor_train_base.TensorTrainBase

Represents a batch of Tensor Train objects (TT-tensors or TT-matrices).

t3f represents a Tensor Train object as a tuple of TT-cores.

batch_size

The number of elements or None if not known.

get_shape()

Get the TensorShape representing the shape of the dense tensor.

The first dimension is the batch_size.

Returns:A TensorShape object.
left_tt_rank_dim

The dimension of the left TT-rank in each TT-core.

right_tt_rank_dim

The dimension of the right TT-rank in each TT-core.

tt_cores

A tuple of TT-cores.

Returns:
A tuple of 4d or 5d tensors of shape
[batch_size, r_k-1, n_k, r_k]
or
[batch_size, r_k-1, n_k, m_k, r_k]
t3f.add(tt_a, tt_b, name='t3f_add')

Returns a TensorTrain corresponding to elementwise sum tt_a + tt_b.

The shapes of tt_a and tt_b should coincide. Supports broadcasting:

add(TensorTrainBatch, TensorTrain)

adds TensorTrain to each element in the batch of TTs in TensorTrainBatch.

Parameters:
  • tt_a – TensorTrain, TensorTrainBatch, TT-tensor, or TT-matrix
  • tt_b – TensorTrain, TensorTrainBatch, TT-tensor, or TT-matrix
  • name – string, name of the Op.
Returns
a TensorTrain object corresponding to the element-wise sum of arguments if
both arguments are `TensorTrain`s.
OR a TensorTrainBatch if at least one of the arguments is
TensorTrainBatch
Raises
ValueError if the arguments' shapes do not coincide
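
A minimal usage sketch (shapes, ranks and names are illustrative):

import t3f

a = t3f.random_tensor((3, 2, 2), tt_rank=2)
b = t3f.random_tensor((3, 2, 2), tt_rank=2)
# Elementwise sum in the TT-format; note that the TT-ranks add up.
c = t3f.add(a, b)  # equivalent to a + b
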
t3f.add_n_projected(tt_objects, coef=None)

Adds all input TT-objects that are projections on the same tangent space.

add_n_projected([a, b]) is equivalent to add(a, b) for a and b that are from the same tangent space, but doesn’t increase the TT-ranks.
Parameters:
  • tt_objects – a list of TT-objects that are projections on the same tangent space.
  • coef

    a list of numbers or anything else convertible to tf.Tensor. If provided, computes a weighted sum. The size of this array should be

    len(tt_objects) x tt_objects[0].batch_size
Returns:

A TT-object representing the sum of the tt_objects (a weighted sum if coef is provided). The TT-rank of the result equals the TT-ranks of the arguments.
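
A minimal usage sketch (shapes, ranks and names are illustrative); both summands are first projected onto the tangent space at the same point x:

import t3f

x = t3f.random_tensor((4, 4, 4), tt_rank=2)
a = t3f.random_tensor((4, 4, 4), tt_rank=2)
b = t3f.random_tensor((4, 4, 4), tt_rank=2)
# Project both tensors onto the tangent space at x ...
pa = t3f.project(a, x)
pb = t3f.project(b, x)
# ... and sum them without increasing the TT-ranks.
s = t3f.add_n_projected([pa, pb])
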

t3f.assign(ref, value, validate_shape=None, use_locking=None, name=None)
t3f.batch_size(tt, name='t3f_batch_size')

Return the number of elements in a TensorTrainBatch.

Parameters:
  • tt – TensorTrainBatch object.
  • name – string, name of the Op.
Returns:

0-D integer tensor.

Raises:

ValueError – if a TensorTrain (which doesn’t have a batch_size) is passed as input.

t3f.bilinear_form(A, b, c, name='t3f_bilinear_form')

Bilinear form b^T A c; A is a TT-matrix, b and c can be batches.

Parameters:
  • A – TensorTrain object containing a TT-matrix of size N x M.
  • b – TensorTrain object containing a TT-matrix of size N x 1 or TensorTrainBatch with a batch of TT-matrices of size N x 1.
  • c – TensorTrain object containing a TT-matrix of size M x 1 or TensorTrainBatch with a batch of TT-matrices of size M x 1.
  • name – string, name of the Op.
Returns:

A number, the value of the bilinear form if all the arguments are

`TensorTrain`s.

OR tf.Tensor of size batch_size if at least one of the arguments is

TensorTrainBatch

Raises:

ValueError – if the arguments are not TT-matrices or if the shapes are not consistent.

Complexity:
O(batch_size r_A r_c r_b n d (r_b + r_A n + r_c))

d is the number of TT-cores (A.ndims()); r_A is the largest TT-rank of A, i.e. max(A.get_tt_rank()); n is the size of the axis dimensions, e.g.

if b and c are tensors of shape (3, 3, 3) and A is a 27 x 27 matrix of tensor shape (3, 3, 3) x (3, 3, 3), then n is 3
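
A minimal usage sketch (shapes, ranks and names are illustrative): A is a 4 x 4 TT-matrix of tensor shape (2, 2) x (2, 2), while b and c are TT-vectors of size 4 x 1:

import t3f

A = t3f.random_matrix(((2, 2), (2, 2)), tt_rank=2)
b = t3f.random_matrix(((2, 2), (1, 1)), tt_rank=2)
c = t3f.random_matrix(((2, 2), (1, 1)), tt_rank=2)
value = t3f.bilinear_form(A, b, c)  # scalar b^T A c
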
t3f.cast(tt, dtype, name='t3f_cast')

Casts a tt-tensor to a new type.

Parameters:
  • tt – TensorTrain object.
  • dtype – The destination type.
  • name – string, name of the Op.
Raises:
  • TypeError – If tt cannot be cast to the dtype.
  • ValueError – If tt is not a TensorTrain or TensorTrainBatch.
t3f.clean_raw_shape(shape, name='t3f_clean_raw_shape')

Returns a tuple of TensorShapes for any valid shape representation.

Parameters:
  • shape – An np.array, a tf.TensorShape (for tensors), a tuple of tf.TensorShapes (for TT-matrices or tensors), or None
  • name – string, name of the Op.
Returns:

A tuple of tf.TensorShape, or None if the input is None

t3f.concat_along_batch_dim(tt_list, name='t3f_concat_along_batch_dim')

Concat all TensorTrainBatch objects along batch dimension.

Parameters:
  • tt_list – a list of TensorTrainBatch objects.
  • name – string, name of the Op.
Returns:

TensorTrainBatch

t3f.cores_regularizer(core_regularizer, scale, scope=None)

Returns a function that applies given regularization to each TT-core.

Parameters:
  • core_regularizer – a function with signature core_regularizer(core) that returns the penalty for the given TT-core.
  • scale – A scalar multiplier Tensor. 0.0 disables the regularizer.
  • scope – An optional scope name.
Returns:

A function with signature regularizer(weights) that applies the regularization.

Raises:

ValueError – If scale is negative or if scale is not a float.

t3f.expand_batch_dim(tt, name='t3f_expand_batch_dim')

Creates a 1-element TensorTrainBatch from a TensorTrain.

Parameters:
  • tt – TensorTrain or TensorTrainBatch.
  • name – string, name of the Op.
Returns:

TensorTrainBatch

t3f.eye(shape, dtype=tf.float32, name='t3f_eye')

Creates an identity TT-matrix.

Parameters:
  • shape – array which defines the shape of the matrix row and column indices.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing an identity TT-matrix of size np.prod(shape) x np.prod(shape)
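
A minimal usage sketch (the shape is illustrative):

import t3f

# Identity TT-matrix of size 24 x 24 with tensor shape (2, 3, 4) x (2, 3, 4).
identity = t3f.eye([2, 3, 4])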

t3f.flat_inner(a, b, name='t3f_flat_inner')

Inner product along all axes.

The shapes of a and b should coincide.

Parameters:
  • a – TensorTrain, TensorTrainBatch, tf.Tensor, or tf.SparseTensor
  • b – TensorTrain, TensorTrainBatch, tf.Tensor, or tf.SparseTensor
  • name – string, name of the Op.
Returns
a number
sum of products of all the elements of a and b
OR a tf.Tensor of size batch_size
sum of products of all the elements of a and b for each element in the batch.
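
A minimal usage sketch (shapes, ranks and names are illustrative):

import t3f

a = t3f.random_tensor((3, 2, 2), tt_rank=2)
b = t3f.random_tensor((3, 2, 2), tt_rank=2)
inner = t3f.flat_inner(a, b)  # scalar: sum of elementwise products of a and b
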
t3f.frobenius_norm(tt, epsilon=1e-05, differentiable=False, name='t3f_frobenius_norm')

Frobenius norm of TensorTrain or of each TT in TensorTrainBatch

Frobenius norm is the sqrt of the sum of squares of all elements in a tensor.

Parameters:
  • tt – TensorTrain or TensorTrainBatch object
  • epsilon – the function actually computes sqrt(norm_squared + epsilon) for numerical stability (e.g. gradient of sqrt at zero is inf).
  • differentiable – bool, whether to use a differentiable implementation or a fast and stable implementation based on QR decomposition.
  • name – string, name of the Op.
Returns
a number which is the Frobenius norm of tt, if it is TensorTrain OR a Tensor of size tt.batch_size, consisting of the Frobenius norms of each TensorTrain in tt, if it is TensorTrainBatch
t3f.frobenius_norm_squared(tt, differentiable=False, name='t3f_frobenius_norm_squared')

Frobenius norm squared of TensorTrain or of each TT in TensorTrainBatch.

Frobenius norm squared is the sum of squares of all elements in a tensor.

Parameters:
  • tt – TensorTrain or TensorTrainBatch object
  • differentiable – bool, whether to use a differentiable implementation or a fast and stable implementation based on QR decomposition.
  • name – string, name of the Op.
Returns
a number which is the Frobenius norm squared of tt, if it is TensorTrain OR a Tensor of size tt.batch_size, consisting of the Frobenius norms squared of each TensorTrain in tt, if it is TensorTrainBatch
t3f.full(tt, name='t3f_full')

Converts a TensorTrain into a regular tensor or matrix (tf.Tensor).

Parameters:
  • tt – TensorTrain or TensorTrainBatch object.
  • name – string, name of the Op.
Returns:

tf.Tensor.

t3f.gather_nd(tt, indices, name='t3f_gather_nd')

out[i] = tt[indices[i, 0], indices[i, 1], …]

Equivalent to
tf.gather_nd(t3f.full(tt), indices)

but much faster, since it does not materialize the full tensor.

For batches of TTs, indices should include the batch dimension as well.

Parameters:
  • tt – TensorTrain or TensorTrainBatch object representing a tensor (TT-matrices are not implemented yet)
  • indices

    numpy array, tf.Tensor, or placeholder with 2 or more dimensions. The last dimension indices.shape[-1] should be equal to the number of dimensions in TT:

    indices.shape[-1] = tt.ndims for TensorTrain
    indices.shape[-1] = tt.ndims + 1 for TensorTrainBatch
  • name – string, name of the Op.
Returns:

tf.Tensor with elements specified by indices.

Raises:
  • ValueError – if indices have the wrong shape.
  • NotImplementedError – if tt is a TT-matrix.
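
A minimal usage sketch (the shape and indices are illustrative):

import numpy as np
import t3f

tt = t3f.random_tensor((4, 4, 4), tt_rank=2)
indices = np.array([[0, 1, 2], [3, 0, 1]])
# Gathers the elements tt[0, 1, 2] and tt[3, 0, 1] without materializing tt.
elements = t3f.gather_nd(tt, indices)
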
t3f.get_variable(name, dtype=None, initializer=None, regularizer=None, trainable=True, collections=None, caching_device=None, validate_shape=True)

Returns TensorTrain object with tf.Variables as the TT-cores.

Parameters:
  • name – The name of the new or existing TensorTrain variable. Used to name the TT-cores.
  • dtype – Type of the new or existing TensorTrain variable TT-cores (defaults to DT_FLOAT).
  • initializer – TensorTrain or TensorTrainBatch, initializer for the variable if one is created.
  • regularizer – A (TensorTrain -> Tensor or None) function; the result of applying it on a newly created variable will be added to the collection GraphKeys.REGULARIZATION_LOSSES and can be used for regularization.
  • trainable – If True also add the variable to the graph collection GraphKeys.TRAINABLE_VARIABLES (see tf.Variable).
  • collections – List of graph collections keys to add the Variables (underlying TT-cores). Defaults to [GraphKeys.GLOBAL_VARIABLES] (see tf.Variable).
  • caching_device – Optional device string or function describing where the Variable should be cached for reading. Defaults to the Variable’s device. If not None, caches on another device. Typical use is to cache on the device where the Ops using the Variable reside, to deduplicate copying through Switch and other conditional statements.
  • validate_shape – If False, allows the variable to be initialized with a value of unknown shape. If True, the default, the shape of initial_value must be known.
Returns:

The created or existing TensorTrain object with tf.Variables TT-cores.

Raises:

ValueError – when creating a new variable and shape is not declared, when violating reuse during variable creation, or when initializer dtype and dtype don’t match. Reuse is set inside variable_scope.
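
A minimal usage sketch (the variable name and shape are illustrative):

import t3f

initializer = t3f.random_matrix(((2, 2, 2), (3, 3, 3)), tt_rank=2)
# An 8 x 27 TT-matrix whose TT-cores are tf.Variables.
W = t3f.get_variable('W', initializer=initializer)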

t3f.glorot_initializer(shape, tt_rank=2, dtype=tf.float32, name='t3f_glorot_initializer')

Constructs a random TT matrix with entrywise variance 2.0 / (n_in + n_out)

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    glorot_initializer([[2, 2, 2], None])
    and
    glorot_initializer([None, [2, 2, 2]])

    will create 8-element column and row vectors, respectively.

  • tt_rank – a number or a (d+1)-element array with ranks.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-matrix of size

np.prod(shape[0]) x np.prod(shape[1])
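
A minimal usage sketch (the raw shape is illustrative; it factorizes a 784 x 625 weight matrix):

import t3f

W_init = t3f.glorot_initializer([[4, 7, 4, 7], [5, 5, 5, 5]], tt_rank=2)
W = t3f.get_variable('W_tt', initializer=W_init)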

t3f.gradients(func, x, name='t3f_gradients', runtime_check=True)

Riemannian autodiff: returns gradient projected on tangent space of TT.

Computes projection of the gradient df/dx onto the tangent space of TT tensor at point x.

Warning: this is an experimental feature and it may not work for some functions, e.g. ones that include QR or SVD decomposition (t3f.project, t3f.round) or functions that work with TT-cores directly (in contrast to working with the TT-object only via t3f functions). In these cases this function can silently return wrong results!

Example

# Scalar product with some predefined tensor squared 0.5 * <x, t>**2.
# Its gradient is <x, t> t and its Riemannian gradient is
#     t3f.project(<x, t> * t, x)
f = lambda x: 0.5 * t3f.flat_inner(x, t)**2
projected_grad = t3f.gradients(f, x)  # t3f.project(t3f.flat_inner(x, t) * t, x)

Parameters:
  • func – function that takes TensorTrain object as input and outputs a number.
  • x – point at which to compute the gradient and on which tangent space to project the gradient.
  • name – string, name of the Op.
  • runtime_check – [True] whether to do a sanity check that the passed function is invariant to different TT representations (otherwise the Riemannian gradient doesn’t even exist). It makes things slower, but helps catch bugs, so turn it off during production deployment.
Returns:

TensorTrain, projection of the gradient df/dx onto the tangent space at point x.

See also

t3f.hessian_vector_product

t3f.gram_matrix(tt_vectors, matrix=None, name='t3f_gram_matrix')

Computes Gramian matrix of a batch of TT-vectors.

If matrix is None, computes
res[i, j] = t3f.flat_inner(tt_vectors[i], tt_vectors[j]).
If matrix is present, computes
res[i, j] = t3f.flat_inner(tt_vectors[i], t3f.matmul(matrix, tt_vectors[j]))
or more shortly
res[i, j] = tt_vectors[i]^T * matrix * tt_vectors[j]

but is more efficient.

Parameters:
  • tt_vectors – TensorTrainBatch.
  • matrix – None, or TensorTrain matrix.
  • name – string, name of the Op.
Returns:

tf.tensor with the Gram matrix.

Complexity:
If the matrix is not present, the complexity is O(batch_size^2 d r^3 n)

where d is the number of TT-cores (tt_vectors.ndims()), r is the largest TT-rank

max(tt_vectors.get_tt_rank())
and n is the size of the axis dimension, e.g.
for a tensor of size 4 x 4 x 4, n is 4; for a 27 x 64 matrix of raw shape (3, 3, 3) x (4, 4, 4), n is 12
If the matrix of TT-rank R is present, the complexity is
O(batch_size^2 d R r^2 n (r + nR))

where the matrix is of raw-shape (n, n, …, n) x (n, n, …, n); r is the TT-rank of vectors tt_vectors; R is the TT-rank of the matrix.
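
A minimal usage sketch (the batch of TT-vectors is illustrative):

import t3f

# A batch of 4 TT-vectors of size 8 x 1 (tensor shape (2, 2, 2) x (1, 1, 1)).
vectors = t3f.random_matrix_batch(((2, 2, 2), (1, 1, 1)), tt_rank=2, batch_size=4)
gram = t3f.gram_matrix(vectors)  # 4 x 4 matrix of pairwise inner products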

t3f.he_initializer(shape, tt_rank=2, dtype=tf.float32, name='t3f_he_initializer')

Constructs a random TT matrix with entrywise variance 2.0 / n_in

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    he_initializer([[2, 2, 2], None])
    and
    he_initializer([None, [2, 2, 2]])

    will create 8-element column and row vectors, respectively.

  • tt_rank – a number or a (d+1)-element array with ranks.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-matrix of size

np.prod(shape[0]) x np.prod(shape[1])

t3f.hessian_vector_product(func, x, vector, name='t3f_hessian_vector_product', runtime_check=True)

P_x [d^2f/dx^2] P_x vector, i.e. Riemannian hessian by vector product.

Computes
P_x [d^2f/dx^2] P_x vector

where P_x is projection onto the tangent space of TT at point x and d^2f/dx^2 is the Hessian of the function.

Note that the true Riemannian hessian also includes the manifold curvature term which is ignored here.

Warning: this is an experimental feature and it may not work for some functions, e.g. ones that include QR or SVD decomposition (t3f.project, t3f.round) or functions that work with TT-cores directly (in contrast to working with the TT-object only via t3f functions). In these cases this function can silently return wrong results!

Example

# Quadratic form with matrix A: <x, A x>.
# Its gradient is (A + A.T) x, its Hessian is (A + A.T).
# Its Riemannian Hessian by vector product is
#     proj_vec = t3f.project(vector, x)
#     t3f.project(t3f.matmul(A + t3f.transpose(A), proj_vec), x)
f = lambda x: t3f.bilinear_form(A, x, x)
res = t3f.hessian_vector_product(f, x, vector)

Parameters:
  • func – function that takes a TensorTrain object as input and outputs a number.
  • x – point at which to compute the Hessian and on which tangent space to project the gradient.
  • vector – TensorTrain object to multiply the Hessian by.
  • name – string, name of the Op.
  • runtime_check – [True] whether to do a sanity check that the passed function is invariant to different TT representations (otherwise the Riemannian gradient doesn’t even exist). It makes things slower, but helps catch bugs, so turn it off during production deployment.
Returns:

TensorTrain, result of the Riemannian hessian by vector product.

See also

t3f.gradients

t3f.is_batch_broadcasting_possible(tt_a, tt_b)

Check whether batch broadcasting is possible for the given batch sizes.

Returns true if the batch sizes are the same or if one of them is 1.

If the batch size that is supposed to be 1 is not known at the graph compilation stage, broadcasting is not allowed.

Parameters:
  • tt_a – TensorTrain or TensorTrainBatch
  • tt_b – TensorTrain or TensorTrainBatch
Returns:

Bool

t3f.l2_regularizer(scale, scope=None)

Returns a function that applies L2 regularization to TensorTrain weights.

Parameters:
  • scale – A scalar multiplier Tensor. 0.0 disables the regularizer.
  • scope – An optional scope name.
Returns:

A function with signature l2(tt) that applies L2 regularization.

Raises:

ValueError – If scale is negative or if scale is not a float.

t3f.lazy_batch_size(tt, name='t3f_lazy_batch_size')

Return static batch_size if available and dynamic otherwise.

Parameters:
  • tt – TensorTrainBatch object.
  • name – string, name of the Op.
Returns:

A number or a 0-D tf.Tensor

Raises:

ValueError – if a TensorTrain (which doesn’t have a batch_size) is passed as input.

t3f.lazy_raw_shape(tt, name='t3f_lazy_raw_shape')

Returns static raw shape of a TensorTrain if defined, and dynamic otherwise.

This operation returns a 2-D integer numpy array representing the raw shape of the input if it is available on the graph compilation stage and 2-D integer tensor of dynamic shape otherwise. If the input is a TT-tensor, the raw shape will have 1 x ndims() elements. If the input is a TT-matrix, the raw shape will have 2 x ndims() elements representing the underlying tensor shape of the matrix.

Parameters:
  • tt – TensorTrain object.
  • name – string, name of the Op.
Returns:

A 2-D numpy array or tf.Tensor of size 1 x ndims() or 2 x ndims()

t3f.lazy_shape(tt, name='t3f_lazy_shape')

Returns static shape of a TensorTrain if defined, and dynamic otherwise.

This operation returns a 1-D integer numpy array representing the shape of the input if it is available on the graph compilation stage and 1-D integer tensor of dynamic shape otherwise.

Parameters:
  • tt – TensorTrain object.
  • name – string, name of the Op.
Returns:

A 1-D numpy array or tf.Tensor

t3f.lazy_tt_ranks(tt, name='t3f_lazy_tt_ranks')

Returns static TT-ranks of a TensorTrain if defined, and dynamic otherwise.

This operation returns a 1-D integer numpy array of TT-ranks if they are available on the graph compilation stage and 1-D integer tensor of dynamic TT-ranks otherwise.

Parameters:
  • tt – TensorTrain object.
  • name – string, name of the Op.
Returns:

A 1-D numpy array or tf.Tensor

t3f.lecun_initializer(shape, tt_rank=2, dtype=tf.float32, name='t3f_lecun_initializer')

Constructs a random TT matrix with entrywise variance 1.0 / n_in

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    lecun_initializer([[2, 2, 2], None])
    and
    lecun_initializer([None, [2, 2, 2]])

    will create 8-element column and row vectors, respectively.

  • tt_rank – a number or a (d+1)-element array with ranks.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-matrix of size

np.prod(shape[0]) x np.prod(shape[1])

t3f.matmul(a, b, name='t3f_matmul')

Multiplies two matrices that can be TT-, dense, or sparse.

Note that multiplication of two TT-matrices returns a TT-matrix with much larger ranks. Also works for multiplying two batches of TT-matrices or a product between a TT-matrix and a batch of TT-matrices (with broadcasting).

Parameters:
  • a – TensorTrain, TensorTrainBatch, tf.Tensor, or tf.SparseTensor of size M x N
  • b – TensorTrain, TensorTrainBatch, tf.Tensor, or tf.SparseTensor of size N x P
  • name – string, name of the Op.
Returns
If both arguments are TensorTrain objects, returns a TensorTrain
object containing a TT-matrix of size M x P.
If at least one of the arguments is a TensorTrainBatch object, returns
a TensorTrainBatch object containing a batch of TT-matrices of size M x P.

Otherwise, returns tf.Tensor of size M x P.
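
A minimal usage sketch (shapes and ranks are illustrative):

import t3f

A = t3f.random_matrix(((2, 2, 2), (3, 3, 3)), tt_rank=2)  # 8 x 27 TT-matrix
B = t3f.random_matrix(((3, 3, 3), (4, 4, 4)), tt_rank=2)  # 27 x 64 TT-matrix
C = t3f.matmul(A, B)  # 8 x 64 TT-matrix (with larger TT-ranks)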

t3f.matrix_batch_with_random_cores(shape, tt_rank=2, batch_size=1, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_matrix_batch_with_random_cores')

Generate a batch of TT-matrices of given shape with N(mean, stddev^2) cores.

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    matrix_batch_with_random_cores([[2, 2, 2], None])
    and
    matrix_batch_with_random_cores([None, [2, 2, 2]])

    will create a batch with one 8-element column or row vector, respectively.
  • tt_rank – a number or a (d+1)-element array with ranks.
  • batch_size – an integer.
  • mean – a number, the mean of the normal distribution used for initializing TT-cores.
  • stddev – a number, the standard deviation of the normal distribution used for initializing TT-cores.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrainBatch containing a batch of TT-matrices of size

np.prod(shape[0]) x np.prod(shape[1])

t3f.matrix_ones(shape, dtype=tf.float32, name='t3f_matrix_ones')

Generate a TT-matrix of the given shape with each entry equal to 1.

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    matrix_ones([[2, 2, 2], None])
    and
    matrix_ones([None, [2, 2, 2]])

    will create 8-element column and row vectors, respectively.

  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-matrix of size

np.prod(shape[0]) x np.prod(shape[1]) with each entry equal to 1

t3f.matrix_with_random_cores(shape, tt_rank=2, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_matrix_with_random_cores')

Generate a TT-matrix of given shape with N(mean, stddev^2) cores.

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    matrix_with_random_cores([[2, 2, 2], None])
    and
    matrix_with_random_cores([None, [2, 2, 2]])

    will create 8-element column and row vectors, respectively.

  • tt_rank – a number or a (d+1)-element array with ranks.
  • mean – a number, the mean of the normal distribution used for initializing TT-cores.
  • stddev – a number, the standard deviation of the normal distribution used for initializing TT-cores.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-matrix of size

np.prod(shape[0]) x np.prod(shape[1])

t3f.matrix_zeros(shape, dtype=tf.float32, name='t3f_matrix_zeros')

Generate a TT-matrix of the given shape with each entry equal to 0.

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    matrix_zeros([[2, 2, 2], None])
    and
    matrix_zeros([None, [2, 2, 2]])

    will create 8-element column and row vectors, respectively.

  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-matrix of size

np.prod(shape[0]) x np.prod(shape[1]) with each entry equal to 0

t3f.multiply(tt_left, right, name='t3f_multiply')

Returns a TensorTrain corresponding to element-wise product tt_left * right.

Supports broadcasting:

multiply(TensorTrainBatch, TensorTrain) returns TensorTrainBatch consisting of element-wise products of TT in TensorTrainBatch and TensorTrain

multiply(TensorTrainBatch_a, TensorTrainBatch_b) returns TensorTrainBatch consisting of element-wise products of TT in TensorTrainBatch_a and TT in TensorTrainBatch_b

Batch sizes should support broadcasting

Parameters:
  • tt_left – TensorTrain OR TensorTrainBatch
  • right – TensorTrain OR TensorTrainBatch OR a number.
  • name – string, name of the Op.
Returns
a TensorTrain or TensorTrainBatch object corresponding to the element-wise product of the arguments.
Raises
ValueError if the arguments' shapes do not coincide or broadcasting is not possible.
t3f.multiply_along_batch_dim(batch_tt, weights, name='t3f_multiply_along_batch_dim')

Multiply each TensorTrain in a batch by a number.

Parameters:
  • batch_tt – TensorTrainBatch object, TT-matrices or TT-tensors.
  • weights – 1-D tf.Tensor (or something convertible to it like np.array) of size tt.batch_size with weights.
  • name – string, name of the Op.
Returns:

TensorTrainBatch
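
A minimal usage sketch (the batch and the weights are illustrative):

import t3f

batch = t3f.random_tensor_batch((3, 3, 3), tt_rank=2, batch_size=4)
# Scale the i-th TT-tensor in the batch by weights[i].
scaled = t3f.multiply_along_batch_dim(batch, [0.5, 1.0, 2.0, 4.0])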

t3f.ones_like(tt, name='t3f_ones_like')

Constructs t3f.ones with the shape of tt.

In the case when tt is TensorTrainBatch constructs t3f.ones with the shape of a TensorTrain in tt.

Parameters:
  • tt – TensorTrain object
  • name – string, name of the Op.
Returns:

TensorTrain object of the same shape as tt but with all entries equal to 1.

t3f.orthogonalize_tt_cores(tt, left_to_right=True, name='t3f_orthogonalize_tt_cores')

Orthogonalize TT-cores of a TT-object.

Parameters:
  • tt – TensorTrain or a TensorTrainBatch.
  • left_to_right – bool, the direction of orthogonalization.
  • name – string, name of the Op.
Returns:

The same type as the input tt (TensorTrain or a TensorTrainBatch).

t3f.pairwise_flat_inner(tt_1, tt_2, matrix=None, name='t3f_pairwise_flat_inner')

Computes all scalar products between two batches of TT-objects.

If matrix is None, computes
res[i, j] = t3f.flat_inner(tt_1[i], tt_2[j]).
If matrix is present, computes
res[i, j] = t3f.flat_inner(tt_1[i], t3f.matmul(matrix, tt_2[j]))
or more shortly
res[i, j] = tt_1[i]^T * matrix * tt_2[j]

but is more efficient.

Parameters:
  • tt_1 – TensorTrainBatch.
  • tt_2 – TensorTrainBatch.
  • matrix – None, or TensorTrain matrix.
  • name – string, name of the Op.
Returns:

tf.tensor with the matrix of pairwise scalar products (flat inners).

Complexity:
If the matrix is not present, the complexity is O(batch_size^2 d r^3 n)

where d is the number of TT-cores (tt_1.ndims()), r is the largest TT-rank

max(tt_1.get_tt_rank())
and n is the size of the axis dimension, e.g.
for a tensor of size 4 x 4 x 4, n is 4; for a 27 x 64 matrix of raw shape (3, 3, 3) x (4, 4, 4), n is 12
A more precise complexity is
O(batch_size^2 d r1 r2 n max(r1, r2))

where r1 is the largest TT-rank of tt_1 and r2 is the largest TT-rank of tt_2.

If the matrix is present, the complexity is
O(batch_size^2 d R r1 r2 (n r1 + n m R + m r2))

where the matrix is of raw-shape (n, n, …, n) x (m, m, …, m) and TT-rank R; tt_1 is of shape (n, n, …, n) and is of the TT-rank r1; tt_2 is of shape (m, m, …, m) and is of the TT-rank r2;

t3f.pairwise_flat_inner_projected(projected_tt_vectors_1, projected_tt_vectors_2)

Scalar products between two batches of TTs from the same tangent space.

res[i, j] = t3f.flat_inner(projected_tt_vectors_1[i], projected_tt_vectors_2[j]).

pairwise_flat_inner_projected(projected_tt_vectors_1, projected_tt_vectors_2) is equivalent to

pairwise_flat_inner(projected_tt_vectors_1, projected_tt_vectors_2)

, but works only on objects from the same tangent space and is much faster than general pairwise_flat_inner.

Parameters:
  • projected_tt_vectors_1 – TensorTrainBatch of tensors projected on the same tangent space as projected_tt_vectors_2.
  • projected_tt_vectors_2 – TensorTrainBatch.
Returns:

tf.tensor with the scalar product matrix.

Complexity:
O(batch_size^2 d r^2 n), where

d is the number of TT-cores (projected_tt_vectors_1.ndims()); r is the largest TT-rank max(projected_tt_vectors_1.get_tt_rank())

(i.e. 2 * {the TT-rank of the object we projected vectors onto}),
and n is the size of the axis dimension, e.g.
for a tensor of size 4 x 4 x 4, n is 4; for a 27 x 64 matrix of raw shape (3, 3, 3) x (4, 4, 4), n is 12.
t3f.project(what, where)

Project what TTs on the tangent space of where TT.

project(what, x) = P_x(what)
project(batch_what, x) = batch(P_x(batch_what[0]), …, P_x(batch_what[N]))

This function implements the algorithm from the paper [1], theorem 3.1.

[1] C. Lubich, I. Oseledets and B. Vandereycken, Time integration of
Tensor Trains.
Parameters:
  • what – TensorTrain or TensorTrainBatch. In the case of batch returns batch with projection of each individual tensor.
  • where – TensorTrain, TT-tensor or TT-matrix on which tangent space to project
Returns:

a TensorTrain with the TT-ranks equal 2 * tangent_space_tens.get_tt_ranks()

Complexity:
O(d r_where^3 m) for orthogonalizing the TT-cores of where

+O(batch_size d r_what r_where n (r_what + r_where))

d is the number of TT-cores (what.ndims()); r_what is the largest TT-rank of what, i.e. max(what.get_tt_rank()); r_where is the largest TT-rank of where; n is the size of the axis dimension of what and where, e.g.

for a tensor of size 4 x 4 x 4, n is 4; for a 27 x 64 matrix of raw shape (3, 3, 3) x (4, 4, 4), n is 12
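
A minimal usage sketch (shapes and ranks are illustrative):

import t3f

x = t3f.random_tensor((4, 4, 4), tt_rank=3)
z = t3f.random_tensor((4, 4, 4), tt_rank=5)
# Project z onto the tangent space of the TT-manifold at x.
pz = t3f.project(z, x)  # TT-ranks of pz are 2 * TT-ranks of x
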
t3f.project_matmul(what, where, matrix)

Project matrix * what TTs on the tangent space of where TT.

project(what, x) = P_x(what)
project(batch_what, x) = batch(P_x(batch_what[0]), …, P_x(batch_what[N]))

This function implements the algorithm from the paper [1], theorem 3.1.

[1] C. Lubich, I. Oseledets and B. Vandereycken, Time integration of
Tensor Trains.
Parameters:
  • what – TensorTrain or TensorTrainBatch. In the case of batch returns batch with projection of each individual tensor.
  • where – TensorTrain, TT-tensor or TT-matrix on which tangent space to project
  • matrix – TensorTrain, TT-matrix to multiply by what
Returns:

a TensorTrain with the TT-ranks equal 2 * tangent_space_tens.get_tt_ranks()

Complexity:
O(d r_where^3 m) for orthogonalizing the TT-cores of where

+O(batch_size d R r_what r_where (n r_what + n m R + m r_where))

d is the number of TT-cores (what.ndims()); r_what is the largest TT-rank of what, i.e. max(what.get_tt_rank()); r_where is the largest TT-rank of where; the matrix is of TT-rank R and of raw-shape (m, m, …, m) x (n, n, …, n).

t3f.project_sum(what, where, weights=None)

Project sum of what TTs on the tangent space of where TT.

project_sum(what, x) = P_x(what)
project_sum(batch_what, x) = P_x(sum_i batch_what[i])
project_sum(batch_what, x, weights) = P_x(sum_j weights[j] * batch_what[j])

This function implements the algorithm from the paper [1], theorem 3.1.

[1] C. Lubich, I. Oseledets and B. Vandereycken, Time integration of
Tensor Trains.
Parameters:
  • what – TensorTrain or TensorTrainBatch. In the case of batch returns projection of the sum of elements in the batch.
  • where – TensorTrain, TT-tensor or TT-matrix on which tangent space to project
  • weights – python list or tf.Tensor of numbers or None, weights of the sum
Returns:

a TensorTrain with the TT-ranks equal 2 * tangent_space_tens.get_tt_ranks()

Complexity:
O(d r_where^3 m) for orthogonalizing the TT-cores of where

+O(batch_size d r_what r_where n (r_what + r_where))

d is the number of TT-cores (what.ndims()); r_what is the largest TT-rank of what, i.e. max(what.get_tt_rank()); r_where is the largest TT-rank of where; n is the size of the axis dimension of what and where, e.g.

for a tensor of size 4 x 4 x 4, n is 4; for a 27 x 64 matrix of raw shape (3, 3, 3) x (4, 4, 4), n is 12
t3f.quadratic_form(A, b, c, name='t3f_bilinear_form')

Outdated, see bilinear_form.

t3f.random_matrix(shape, tt_rank=2, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_random_matrix')

Generate a random TT-matrix of the given shape with given mean and stddev.

Entries of the generated matrix (in the full format) will be iid and satisfy E[x_{i1i2..id}] = mean, Var[x_{i1i2..id}] = stddev^2, but the distribution is in fact not Gaussian.

In the current implementation only mean 0 is supported. To get a random_matrix with specified mean but tt_rank greater by 1 you can call

x = t3f.random_matrix(shape, tt_rank, stddev=stddev)
x = mean * t3f.ones_like(x) + x

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    random_matrix([[2, 2, 2], None])
    and
    random_matrix([None, [2, 2, 2]])

    will create 8-element column and row vectors, respectively.

  • tt_rank – a number or a (d+1)-element array with ranks.
  • mean – a number, the desired mean for the distribution of entries.
  • stddev – a number, the desired standard deviation for the distribution of entries.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-matrix of size

np.prod(shape[0]) x np.prod(shape[1])

t3f.random_matrix_batch(shape, tt_rank=2, batch_size=1, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_random_matrix_batch')

Generate a batch of TT-matrices with given shape, mean and stddev.

Entries of the generated matrices (in the full format) will be iid and satisfy E[x_{i1i2..id}] = mean, Var[x_{i1i2..id}] = stddev^2, but the distribution is in fact not Gaussian.

In the current implementation only mean 0 is supported. To get a random_matrix_batch with specified mean but tt_rank greater by 1 you can call

x = t3f.random_matrix_batch(shape, tt_rank, batch_size=bs, stddev=stddev)
x = mean * t3f.ones_like(x) + x

Parameters:
  • shape

    2d array, shape[0] is the shape of the matrix row-index, shape[1] is the shape of the column index. shape[0] and shape[1] should have the same number of elements (d). Also supports omitting one of the dimensions for vectors, e.g.

    random_matrix_batch([[2, 2, 2], None])
    and
    random_matrix_batch([None, [2, 2, 2]])

    will create a batch with one 8-element column or row vector, respectively.
  • tt_rank – a number or a (d+1)-element array with ranks.
  • batch_size – an integer.
  • mean – a number, the desired mean for the distribution of entries.
  • stddev – a number, the desired standard deviation for the distribution of entries.
  • dtype – [tf.float32] dtype of the resulting matrix.
  • name – string, name of the Op.
Returns:

TensorTrainBatch containing a batch of TT-matrices of size

np.prod(shape[0]) x np.prod(shape[1])

t3f.random_tensor(shape, tt_rank=2, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_random_tensor')

Generate a random TT-tensor of the given shape with given mean and stddev.

Entries of the generated tensor (in the full format) will be iid and satisfy E[x_{i1i2..id}] = mean, Var[x_{i1i2..id}] = stddev^2, but the distribution is in fact not Gaussian (but is close for large tensors).

In the current implementation only mean 0 is supported. To get a random_tensor with specified mean but tt_rank greater by 1 you can call

x = t3f.random_tensor(shape, tt_rank, stddev=stddev)
x = mean * t3f.ones_like(x) + x

Parameters:
  • shape – array representing the shape of the future tensor.
  • tt_rank – a number or a (d+1)-element array with the desired ranks.
  • mean – a number, the desired mean for the distribution of entries.
  • stddev – a number, the desired standard deviation for the distribution of entries.
  • dtype – [tf.float32] dtype of the resulting tensor.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-tensor

t3f.random_tensor_batch(shape, tt_rank=2, batch_size=1, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_random_tensor_batch')

Generate a batch of TT-tensors with given shape, mean and stddev.

Entries of the generated tensors (in the full format) will be iid and satisfy E[x_{i1i2..id}] = mean, Var[x_{i1i2..id}] = stddev^2, but the distribution is in fact not Gaussian (but is close for large tensors).

In the current implementation only mean 0 is supported. To get a random_tensor_batch with specified mean but tt_rank greater by 1 you can call

x = t3f.random_tensor_batch(shape, tt_rank, batch_size=bs, stddev=stddev)
x = mean * t3f.ones_like(x) + x

Parameters:
  • shape – array representing the shape of the future tensor.
  • tt_rank – a number or a (d+1)-element array with ranks.
  • batch_size – an integer.
  • mean – a number, the desired mean for the distribution of entries.
  • stddev – a number, the desired standard deviation for the distribution of entries.
  • dtype – [tf.float32] dtype of the resulting tensor.
  • name – string, name of the Op.
Returns:

TensorTrainBatch containing TT-tensors.

t3f.raw_shape(tt, name='t3f_raw_shape')

Returns the shape of a TensorTrain.

This operation returns a 2-D integer tensor representing the shape of the input. If the input is a TT-tensor, the shape will have 1 x ndims() elements. If the input is a TT-matrix, the shape will have 2 x ndims() elements representing the underlying tensor shape of the matrix.

Parameters:
  • tt – TensorTrain or TensorTrainBatch object.
  • name – string, name of the Op.
Returns:

A 2-D Tensor of size 1 x ndims() or 2 x ndims()

t3f.renormalize_tt_cores(tt, epsilon=1e-08, name='t3f_renormalize_tt_cores')

Renormalizes TT-cores to make them of the same Frobenius norm.

Doesn’t change the tensor represented by tt object, but renormalizes the TT-cores to make further computations more stable.

Parameters:
  • tt – TensorTrain or TensorTrainBatch object
  • epsilon – parameter for numerical stability of sqrt
  • name – string, name of the Op.
Returns:

TensorTrain or TensorTrainBatch which represents the same tensor as tt, but with all cores having equal norm. In the batch case applies to each TT in TensorTrainBatch.

t3f.round(tt, max_tt_rank=None, epsilon=None, name='t3f_round')

TT-rounding procedure, returns a TT object with smaller TT-ranks.

Parameters:
  • tt – TensorTrain object, TT-tensor or TT-matrix
  • max_tt_rank

    a number or a list of numbers. If a number, then it defines the maximal TT-rank of the result. If a list of numbers, then its length should be d+1 (where d is the number of dimensions of tt) and max_tt_rank[i] defines the maximal (i+1)-th TT-rank of the result. The following two versions are equivalent

    max_tt_rank = r
    and
    max_tt_rank = r * np.ones(d-1)
  • epsilon

    a floating point number or None. If the TT-ranks are not restricted (max_tt_rank=np.inf), then the result would be guaranteed to be epsilon-close to tt in terms of relative Frobenius error:

    ||res - tt||_F / ||tt||_F <= epsilon

    If the TT-ranks are restricted, providing a loose epsilon may reduce the TT-ranks of the result. E.g.

    round(tt, max_tt_rank=100, epsilon=0.9)

    will probably return you a TT-tensor with TT-ranks close to 1, not 100. Note that providing a nontrivial (= not equal to None) epsilon will make the TT-ranks of the result undefined on the compilation stage (e.g. res.get_tt_ranks() will return None, but t3f.tt_ranks(res).eval() will work).

  • name – string, name of the Op.
Returns:

TensorTrain object containing a TT-tensor.

Raises:

ValueError – if max_tt_rank is less than 0, if max_tt_rank is not a number and not a vector of length d + 1 where d is the number of dimensions (rank) of the input tensor, or if epsilon is less than 0.
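
A minimal usage sketch (shapes and ranks are illustrative):

import t3f

a = t3f.random_tensor((4, 4, 4), tt_rank=2)
b = a + a  # the TT-rank grows to 4
# Since b is just 2 * a, rounding back to TT-rank 2 is exact here.
b_rounded = t3f.round(b, max_tt_rank=2)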

t3f.shape(tt, name='t3f_shape')

Returns the shape of a TensorTrain.

This operation returns a 1-D integer tensor representing the shape of
the input. For TT-matrices the shape would have two values, see raw_shape for the tensor shape.
If the input is a TensorTrainBatch, the first dimension of the output is the
batch_size.
Parameters:
  • tt – TensorTrain or TensorTrainBatch object.
  • name – string, name of the Op.
Returns:

A Tensor

t3f.squeeze_batch_dim(tt, name='t3f_squeeze_batch_dim')

Converts batch size 1 TensorTrainBatch into TensorTrain.

Parameters:
  • tt – TensorTrain or TensorTrainBatch.
  • name – string, name of the Op.
Returns:

TensorTrain if the input is a TensorTrainBatch with batch_size == 1 (known at the graph compilation stage) or if it is already a TensorTrain.

TensorTrainBatch otherwise.

t3f.tangent_space_to_deltas(tt, name='t3f_tangent_space_to_deltas')

Convert an element of the tangent space to deltas representation.

Tangent space elements (outputs of t3f.project) look like:
dP1 V2 … Vd + U1 dP2 V3 … Vd + … + U1 … Ud-1 dPd.

This function takes as input an element of the tangent space and converts it to the list of deltas [dP1, …, dPd].

Parameters:
  • tt – TensorTrain or TensorTrainBatch that is a result of t3f.project, t3f.project_matmul, or other similar functions.
  • name – string, name of the Op.
Returns:

A list of delta-cores (tf.Tensors).

t3f.tensor_batch_with_random_cores(shape, tt_rank=2, batch_size=1, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_tensor_batch_with_random_cores')

Generate a batch of TT-tensors of given shape with N(mean, stddev^2) cores.

Parameters:
  • shape – array representing the shape of the future tensor.
  • tt_rank – a number or a (d+1)-element array with ranks.
  • batch_size – an integer.
  • mean – a number, the mean of the normal distribution used for initializing TT-cores.
  • stddev – a number, the standard deviation of the normal distribution used for initializing TT-cores.
  • dtype – [tf.float32] dtype of the resulting tensor.
  • name – string, name of the Op.
Returns:

TensorTrainBatch containing TT-tensors

t3f.tensor_ones(shape, dtype=tf.float32, name='t3f_tensor_ones')

Generate TT-tensor of the given shape with all entries equal to 1.

Parameters:
  • shape – array representing the shape of the future tensor
  • dtype – [tf.float32] dtype of the resulting tensor.
  • name – string, name of the Op.
Returns:

TensorTrain object containing a TT-tensor

t3f.tensor_with_random_cores(shape, tt_rank=2, mean=0.0, stddev=1.0, dtype=tf.float32, name='t3f_tensor_with_random_cores')

Generate a TT-tensor of the given shape with N(mean, stddev^2) cores.

Parameters:
  • shape – array representing the shape of the future tensor.
  • tt_rank – a number or a (d+1)-element array with the desired ranks.
  • mean – a number, the mean of the normal distribution used for initializing TT-cores.
  • stddev – a number, the standard deviation of the normal distribution used for initializing TT-cores.
  • dtype – [tf.float32] dtype of the resulting tensor.
  • name – string, name of the Op.
Returns:

TensorTrain containing a TT-tensor
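
For example, the following sketch creates a random 3-dimensional TT-tensor and checks its TT-ranks (assuming import t3f; variable names are ours):

    a = t3f.tensor_with_random_cores((4, 5, 6), tt_rank=3, stddev=0.1)
    print(t3f.tt_ranks(a))   # [1 3 3 1]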

t3f.tensor_zeros(shape, dtype=tf.float32, name='t3f_tensor_zeros')

Generate TT-tensor of the given shape with all entries equal to 0.

Parameters:
  • shape – array representing the shape of the future tensor
  • dtype – [tf.float32] dtype of the resulting tensor.
  • name – string, name of the Op.
Returns:

TensorTrain object containing a TT-tensor
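
A quick sketch contrasting tensor_ones and tensor_zeros (assuming import t3f; variable names are ours):

    ones = t3f.tensor_ones((3, 4, 5))
    zeros = t3f.tensor_zeros((3, 4, 5))
    print(t3f.full(ones))      # dense 3 x 4 x 5 array of ones
    print(t3f.tt_ranks(zeros)) # [1 1 1 1] -- the all-zero tensor has TT-ranks equal to 1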

t3f.to_tt_matrix(mat, shape, max_tt_rank=10, epsilon=None, name='t3f_to_tt_matrix')

Converts a given matrix or vector to a TT-matrix.

The matrix dimensions should factorize into d numbers. If e.g. the dimensions are prime numbers, it’s usually better to pad the matrix with zeros until the dimensions factorize into (ideally) 3-8 numbers.

Parameters:
  • mat – two dimensional tf.Tensor (a matrix).
  • shape – two dimensional array (np.array or list of lists) representing the tensor shape of the matrix. E.g. for a (a1 * a2 * a3) x (b1 * b2 * b3) matrix, shape should be ((a1, a2, a3), (b1, b2, b3)); shape[0] and shape[1] should have the same length. For vectors you may use ((a1, a2, a3), (1, 1, 1)) or, equivalently, ((a1, a2, a3), None).
  • max_tt_rank

    a number or a list of numbers. If a number, it defines the maximal TT-rank of the result. If a list of numbers, its length should be d+1 (where d is the length of shape[0]) and max_tt_rank[i] defines the maximal (i+1)-th TT-rank of the result. The following two versions are equivalent

    max_tt_rank = r
    and
    max_tt_rank = r * np.ones(d + 1)
  • epsilon

    a floating point number or None. If the TT-ranks are not restricted (max_tt_rank=np.inf), then the result is guaranteed to be epsilon-close to mat in terms of relative Frobenius error:

    ||res - mat||_F / ||mat||_F <= epsilon

    If the TT-ranks are restricted, providing a loose epsilon may reduce the TT-ranks of the result. E.g.

    to_tt_matrix(mat, shape, max_tt_rank=100, epsilon=0.9)

    will probably return you a TT-matrix with TT-ranks close to 1, not 100. Note that providing a nontrivial (= not equal to None) epsilon will make the TT-ranks of the result undefined on the compilation stage (e.g. res.get_tt_ranks() will return None, but t3f.tt_ranks(res).eval() will work).

  • name – string, name of the Op.
Returns:

TensorTrain object containing a TT-matrix.

Raises:

ValueError if max_tt_rank is less than 0, if max_tt_rank is neither a number nor a vector of length d + 1 (where d is the length of shape[0]), or if epsilon is less than 0.
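
For example, a sketch of tensorizing a small dense matrix (assuming import tensorflow as tf and import t3f; variable names are ours):

    mat = tf.random.normal((6, 8))
    # Factorize the row dimension as 2 * 3 and the column dimension as 2 * 4.
    tt_mat = t3f.to_tt_matrix(mat, shape=((2, 3), (2, 4)), max_tt_rank=4)
    reconstructed = t3f.full(tt_mat)  # dense 6 x 8 tf.Tensor, close to `mat` if the rank is large enough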

t3f.to_tt_tensor(tens, max_tt_rank=10, epsilon=None, name='t3f_to_tt_tensor')

Converts a given tf.Tensor to a TT-tensor of the same shape.

Parameters:
  • tens – tf.Tensor
  • max_tt_rank

    a number or a list of numbers. If a number, it defines the maximal TT-rank of the result. If a list of numbers, its length should be d+1 (where d is the rank of tens) and max_tt_rank[i] defines the maximal (i+1)-th TT-rank of the result. The following two versions are equivalent

    max_tt_rank = r
    and
    max_tt_rank = r * np.ones(d + 1)
  • epsilon

    a floating point number or None. If the TT-ranks are not restricted (max_tt_rank=np.inf), then the result is guaranteed to be epsilon-close to tens in terms of relative Frobenius error:

    ||res - tens||_F / ||tens||_F <= epsilon

    If the TT-ranks are restricted, providing a loose epsilon may reduce the TT-ranks of the result. E.g.

    to_tt_tensor(tens, max_tt_rank=100, epsilon=0.9)

    will probably return you a TT-tensor with TT-ranks close to 1, not 100. Note that providing a nontrivial (= not equal to None) epsilon will make the TT-ranks of the result undefined on the compilation stage (e.g. res.get_tt_ranks() will return None, but t3f.tt_ranks(res).eval() will work).

  • name – string, name of the Op.
Returns:

TensorTrain object containing a TT-tensor.

Raises:

ValueError if the rank (number of dimensions) of the input tensor is not defined, if max_tt_rank is less than 0, if max_tt_rank is neither a number nor a vector of length d + 1 (where d is the number of dimensions (rank) of the input tensor), or if epsilon is less than 0.

t3f.transpose(tt_matrix, name='t3f_transpose')

Transpose a TT-matrix or a batch of TT-matrices.

Parameters:
  • tt_matrix – TensorTrain or TensorTrainBatch object containing a TT-matrix (or a batch of TT-matrices).
  • name – string, name of the Op.
Returns:

TensorTrain or TensorTrainBatch object containing a transposed TT-matrix

(or a batch of TT-matrices).

Raises:

ValueError if the argument is not a TT-matrix.
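
For example (assuming import t3f; variable names are ours):

    W = t3f.random_matrix([[3, 4], [5, 6]], tt_rank=2)   # a 12 x 30 TT-matrix
    Wt = t3f.transpose(W)                                # a 30 x 12 TT-matrix
    print(t3f.shape(Wt))                                 # [30 12]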

t3f.tt_ranks(tt, name='t3f_tt_ranks')

Returns the TT-ranks of a TensorTrain.

This operation returns a 1-D integer tensor representing the TT-ranks of the input.

Parameters:
  • tt – TensorTrain or TensorTrainBatch object.
  • name – string, name of the Op.
Returns:

A Tensor

t3f.zeros_like(tt, name='t3f_zeros_like')

Constructs t3f.zeros with the shape of tt.

In the case when tt is a TensorTrainBatch constructs t3f.zeros with the shape of a TensorTrain in tt.

Parameters:
  • tt – TensorTrain object
  • name – string, name of the Op.
Returns:

TensorTrain object of the same shape as tt but with all entries equal to 0.

t3f.nn module

Utilities for simplifying the construction of neural networks with TT-layers

class t3f.nn.KerasDense(input_dims, output_dims, tt_rank=2, activation=None, use_bias=True, kernel_initializer='glorot', bias_initializer=0.1, **kwargs)

Bases: tensorflow.python.keras.engine.base_layer.Layer

call(x)

This is where the layer’s logic lives.

Note that the call() method in tf.keras is a little different from the keras API: in the keras API you can pass masking support for layers as additional arguments, whereas tf.keras has a compute_mask() method to support masking.

Parameters:
  • inputs – Input tensor, or list/tuple of input tensors.
  • **kwargs – Additional keyword arguments. Currently unused.
Returns:

A tensor or list/tuple of tensors.

compute_output_shape(input_shape)

Computes the output shape of the layer.

If the layer has not been built, this method will call build on the layer. This assumes that the layer will later be used with inputs that match the input shape provided here.

Parameters:
  • input_shape – Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
Returns:

An output shape tuple.

t3f.utils module

t3f.utils.in_eager_mode()

Checks whether TensorFlow eager mode is available and active.

t3f.utils.replace_tf_svd_with_np_svd()

Replaces tf.svd with np.svd. Slow, but a workaround for tf.svd bugs.

t3f.utils.unravel_index(indices, shape)

t3f.kronecker module

t3f.kronecker.cholesky(kron_a, name='t3f_kronecker_cholesky')

Computes the Cholesky decomposition of a given Kronecker-factorized matrix.

Parameters:
  • kron_a – TensorTrain or TensorTrainBatch object containing a matrix or a batch of matrices of size N x N, factorized into a Kronecker product of square matrices (all tt-ranks are 1 and all tt-cores are square). All the cores must be symmetric positive-definite.
  • name – string, name of the Op.
Returns:

TensorTrain object containing a TT-matrix of size N x N if the argument is a TensorTrain.

TensorTrainBatch object containing TT-matrices of size N x N if the argument is a TensorTrainBatch.

Raises:

ValueError if the tt-cores of the provided matrix are not square or the tt-ranks are not 1.

t3f.kronecker.determinant(kron_a, name='t3f_kronecker_determinant')

Computes the determinant of a given Kronecker-factorized matrix.

Note that this method can suffer from overflow.

Parameters:
  • kron_a – TensorTrain or TensorTrainBatch object containing a matrix or a batch of matrices of size N x N, factorized into a Kronecker product of square matrices (all tt-ranks are 1 and all tt-cores are square).
  • name – string, name of the Op.
Returns:

A number or a Tensor with numbers for each element in the batch. The determinant of the given matrix.

Raises:

ValueError if the tt-cores of the provided matrix are not square or the tt-ranks are not 1.

t3f.kronecker.inv(kron_a, name='t3f_kronecker_inv')

Computes the inverse of a given Kronecker-factorized matrix.

Parameters:
  • kron_a – TensorTrain or TensorTrainBatch object containing a matrix or a batch of matrices of size N x N, factorized into a Kronecker product of square matrices (all tt-ranks are 1 and all tt-cores are square).
  • name – string, name of the Op.
Returns:

TensorTrain object containing a TT-matrix of size N x N if the argument is a TensorTrain.

TensorTrainBatch object containing TT-matrices of size N x N if the argument is a TensorTrainBatch.

Raises:

ValueError if the tt-cores of the provided matrix are not square or the tt-ranks are not 1.

t3f.kronecker.slog_determinant(kron_a, name='t3f_kronecker_slog_determinant')

Computes the sign and log-det of a given Kronecker-factorized matrix.

Parameters:
  • kron_a – TensorTrain or TensorTrainBatch object containing a matrix or a batch of matrices of size N x N, factorized into a Kronecker product of square matrices (all tt-ranks are 1 and all tt-cores are square).
  • name – string, name of the Op.
Returns:

Two numbers or two Tensors with numbers for each element in the batch: the sign of the determinant and the log-determinant of the given matrix. If the determinant is zero, then sign will be 0 and logdet will be -Inf. In all cases, the determinant is equal to sign * np.exp(logdet).

Raises:

ValueError if the tt-cores of the provided matrix are not square or the tt-ranks are not 1.
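
A small sketch tying these ops together (assuming import tensorflow as tf and import t3f; variable names are ours). A Kronecker-factorized matrix is just a TT-matrix with all TT-ranks equal to 1 and square cores, so one can be built with t3f.random_matrix and compared against the dense computation:

    kron = t3f.random_matrix([[3, 4], [3, 4]], tt_rank=1)  # 12 x 12, Kronecker product of a 3x3 and a 4x4 core
    det = t3f.kronecker.determinant(kron)
    dense_det = tf.linalg.det(t3f.full(kron))              # should match `det` up to numerical error
    kron_inv = t3f.kronecker.inv(kron)                     # again a Kronecker-factorized TT-matrix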

t3f.approximate module

t3f.approximate.add_n(tt_objects, max_tt_rank, name='t3f_approximate_add_n')

Adds a list of TT-objects and rounds after each summation.

This version is slow to compile but fast to execute (at least on a GPU): it sums in binary tree order, i.e. it uses the following idea:

round(a + b + c + d) ~= round(round(a + b) + round(c + d))

and so is able to compute the answer in log(N) parallel adds/rounds.

Parameters:
  • tt_objects – a list of TensorTrainBase objects.
  • max_tt_rank – a number, TT-rank for each individual rounding.
  • name – string, name of the Op.
Returns:

Object of the same type as each input.

See also

t3f.approximate.reduce_sum_batch
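
For example, a sketch of summing four random TT-tensors while keeping the TT-rank bounded (assuming import t3f; variable names are ours):

    tts = [t3f.random_tensor((8, 8, 8), tt_rank=3) for _ in range(4)]
    approx_sum = t3f.approximate.add_n(tts, max_tt_rank=6)
    # For comparison: exact sum followed by a single rounding.
    rounded_exact_sum = t3f.round(tts[0] + tts[1] + tts[2] + tts[3], max_tt_rank=6)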

t3f.approximate.reduce_sum_batch(tt_batch, max_tt_rank, coef=None, name='t3f_approximate_reduce_sum_batch')

Sum of all TT-objects in the batch with rounding after each summation.

This version is slow to compile but fast to execute (at least on a GPU): it sums in binary tree order, i.e. it uses the following idea:

round(a + b + c + d) ~= round(round(a + b) + round(c + d))

and so is able to compute the answer in log(batch_size) parallel adds/rounds.

Parameters:
  • tt_batch – TensorTrainBatch object.
  • max_tt_rank – a number, TT-rank for each individual rounding.
  • coef

    tf.Tensor, its shape is either batch_size, or batch_size x N. If coef is a vector of size batch_size, the result will be an (approximate) weighted sum. If coef is a matrix of shape batch_size x N, the result will be a TensorTrainBatch res containing N TT-objects such that

    res[j] ~= sum_i tt_batch[i] coef[i, j]
  • name – string, name of the Op.
Returns:

If coef is absent or is a vector of numbers, returns a TensorTrain object representing the (approximate) element-wise sum of all the objects in the batch, weighted if coef is provided.

If coef is a matrix, returns a TensorTrainBatch.

See also

t3f.approximate.add_n
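
For example, a sketch of a plain and a weighted batch sum (assuming import tensorflow as tf and import t3f; variable names are ours):

    batch = t3f.tensor_batch_with_random_cores((8, 8, 8), tt_rank=3, batch_size=16)
    plain_sum = t3f.approximate.reduce_sum_batch(batch, max_tt_rank=6)
    weights = tf.random.normal((16,))
    weighted_sum = t3f.approximate.reduce_sum_batch(batch, max_tt_rank=6, coef=weights)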

Comparison to other libraries

A brief overview of other libraries that support the Tensor Train decomposition (also known as Matrix Product State in the physics community).

Library          Language            GPU      autodiff  Riemannian  DMRG/AMen/TT-cross
t3f              Python/TensorFlow   Yes      Yes       Yes         No
tntorch          Python/PyTorch      Yes      Yes       No          No
ttpy             Python              No       No        Yes         Yes
mpnum            Python              No       No        No          DMRG
scikit_tt        Python              No       No        No          No
mpys             Python              No       No        No          No
TT-Toolbox       Matlab              Partial  No        No          Yes
TENSORBOX        Matlab              Partial  No        ??          ??
Tensorlab        Matlab              Partial  No        ??          ??
ITensor          C++                 No       No        No          DMRG
libtt            C++                 No       No        No          TT-cross

If you use Python, we would suggest t3f if you need extensive Riemannian optimization support, t3f or tntorch if you need GPU or autodiff support, and ttpy if you need advanced algorithms such as AMen.

The performance of the libraries is a bit tricky to measure fairly and is actually not that different across libraries, because they all rely on the same BLAS/MKL subroutines. However, a GPU can help a lot for operations that can be expressed as large matrix-by-matrix multiplications, e.g. computing the Gram matrix of a batch of tensors, as in the sketch below. For more details on benchmarking t3f see Benchmark.
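
A rough sketch of this kind of batched operation using t3f.gram_matrix (assuming import t3f; variable names are ours):

    batch = t3f.tensor_batch_with_random_cores((10, 10, 10), tt_rank=10, batch_size=100)
    gram = t3f.gram_matrix(batch)  # 100 x 100 matrix of pairwise inner products <batch[i], batch[j]>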

Benchmark

The performance of different libraries implementing the Tensor Train decomposition is a bit tricky to compare fairly and is actually not that different, because they all rely on the same BLAS/MKL subroutines.

So the main purpose of this section is not to prove that T3F is faster than every other library (it is not), but rather to assess GPU gains on different ops and identify bottlenecks by comparing to some other library. As a reference implementation, we decided to use the ttpy library.

See the following table for the time in ms of different operations run in ttpy (second column) and in t3f (other columns).

Operation        ttpy, one on CPU  t3f, one on CPU  t3f, one on GPU  t3f, batch on CPU  t3f, batch on GPU
matvec           11.142            1.19             0.744            1.885              0.14
matmul           86.191            9.849            0.95             17.483             1.461
norm             3.79              2.136            1.019            0.253              0.044
round            73.027            86.04            165.969          8.234              161.102
gram             0.145             0.606            0.973            0.021              0.001
project_rank100  116.868           3.001            13.239           1.645              0.226

The timings in the “batch” columns represent running the operation for 100 objects at the same time and then reporting the time per object. E.g. the last number in the first row (0.14) means that multiplying a single TT-matrix by 100 different TT-vectors takes 14 ms on GPU when using T3F, which translates to 0.14 ms per vector.

Note that the rounding operation is slow on GPU. This is due to a known TensorFlow issue: the SVD implementation is slower on GPU than on CPU.

The benchmark was run on an NVIDIA DGX-1 server with a Tesla V100 GPU and an Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz with 80 logical cores.

To run this benchmark on your own hardware, see the docs/benchmark folder.

Troubleshooting

If something does not work, try

  • Installing the latest version of the library (see Installation)
  • Installing TensorFlow version 2.0
  • If you see the following error: NotImplementedError: QrGrad not implemented when ncols > nrows or full_matrices is true and ncols != nrows., see discussion in this issue: https://github.com/Bihaqo/t3f/issues/193
  • Creating an issue on GitHub

Tensor Nets (compressing neural networks)

Open this page in an interactive mode via Google Colaboratory.

In this notebook we provide an example of how to build a simple Tensor Net (see https://arxiv.org/abs/1509.06569).

The main ingredient is the so-called TT-Matrix, a generalization of the Kronecker product matrices, i.e. matrices of the form

\[A = A_1 \otimes A_2 \otimes \cdots \otimes A_n\]

In t3f TT-Matrices are represented using the TensorTrain class.

[1]:
# Import TF 2.
%tensorflow_version 2.x
import tensorflow as tf
import numpy as np
import tensorflow.keras.backend as K

# Fix seed so that the results are reproducable.
tf.random.set_seed(0)
np.random.seed(0)

try:
    import t3f
except ImportError:
    # Install T3F if it's not already installed.
    !git clone https://github.com/Bihaqo/t3f.git
    !cd t3f; pip install .
    import t3f
TensorFlow 2.x selected.
Cloning into 't3f'...
remote: Enumerating objects: 321, done.
remote: Counting objects: 100% (321/321), done.
remote: Compressing objects: 100% (182/182), done.
remote: Total 4715 (delta 209), reused 226 (delta 139), pack-reused 4394
Receiving objects: 100% (4715/4715), 1.52 MiB | 1.26 MiB/s, done.
Resolving deltas: 100% (3203/3203), done.
Processing /content/t3f
Requirement already satisfied: numpy in /usr/local/lib/python3.6/dist-packages (from t3f==1.1.0) (1.18.1)
Building wheels for collected packages: t3f
  Building wheel for t3f (setup.py) ... done
  Created wheel for t3f: filename=t3f-1.1.0-cp36-none-any.whl size=75051 sha256=a20c22745abcbe82d9a467cf607135da9d5399940712bfbf134bbf7e40ac53b3
  Stored in directory: /tmp/pip-ephem-wheel-cache-vnw71g5i/wheels/66/f2/16/8d2b16c34f7e12d446db3584514f9e34e681f4c602325d175c
Successfully built t3f
Installing collected packages: t3f
Successfully installed t3f-1.1.0
[3]:
W = t3f.random_matrix([[4, 7, 4, 7], [5, 5, 5, 5]], tt_rank=2)

print(W)
A TT-Matrix of size 784 x 625, underlying tensor shape: (4, 7, 4, 7) x (5, 5, 5, 5), TT-ranks: (1, 2, 2, 2, 1)

Using TT-Matrices we can compactly represent densely connected layers in neural networks, which allows us to greatly reduce the number of parameters. Matrix multiplication can be handled by the t3f.matmul method, which supports multiplying dense (ordinary) matrices and TT-Matrices. A very simple neural network could look as follows (for initialization, several options such as t3f.glorot_initializer, t3f.he_initializer or t3f.random_matrix are available):

[ ]:
class Learner:
  def __init__(self):
    initializer = t3f.glorot_initializer([[4, 7, 4, 7], [5, 5, 5, 5]], tt_rank=2)
    self.W1 = t3f.get_variable('W1', initializer=initializer)
    self.b1 = tf.Variable(tf.zeros([625]))
    self.W2 = tf.Variable(tf.random.normal([625, 10]))
    self.b2 = tf.Variable(tf.random.normal([10]))

  def predict(self, x):
    # Multiply the (dense) input by the TT-matrix W1; t3f.matmul supports dense x TT.
    h1 = t3f.matmul(x, self.W1) + self.b1
    h1 = tf.nn.relu(h1)
    return tf.matmul(h1, self.W2) + self.b2

  def loss(self, x, y):
    y_ = tf.one_hot(y, 10)
    logits = self.predict(x)
    return tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))

For convenience we have implemented a layer analogous to the Keras Dense layer but with a TT-Matrix instead of an ordinary matrix. An example of a fully trainable net is provided below.

[ ]:
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout, Flatten
from tensorflow.keras.utils import to_categorical
from tensorflow.keras import optimizers
[9]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
11493376/11490434 [==============================] - 0s 0us/step

Some preprocessing…

[ ]:
x_train = x_train / 127.5 - 1.0
x_test = x_test / 127.5 - 1.0

y_train = to_categorical(y_train, num_classes=10)
y_test = to_categorical(y_test, num_classes=10)
[ ]:
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
tt_layer = t3f.nn.KerasDense(input_dims=[7, 4, 7, 4], output_dims=[5, 5, 5, 5],
                             tt_rank=4, activation='relu',
                             bias_initializer=1e-3)
model.add(tt_layer)
model.add(Dense(10))
model.add(Activation('softmax'))
[68]:
model.summary()
Model: "sequential_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_12 (Flatten)         (None, 784)               0
_________________________________________________________________
tt_dense_1 (KerasDense)      (None, 625)               1725
_________________________________________________________________
dense_8 (Dense)              (None, 10)                6260
_________________________________________________________________
activation_7 (Activation)    (None, 10)                0
=================================================================
Total params: 7,985
Trainable params: 7,985
Non-trainable params: 0
_________________________________________________________________

Note that in the dense layer we only have \(1725\) parameters instead of \(784 * 625 = 490000\).
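
As a sanity check of where the \(1725\) comes from (assuming the usual convention that the \(k\)-th TT-core of a TT-matrix has shape \(r_{k-1} \times m_k \times n_k \times r_k\)): with input dims \((7, 4, 7, 4)\), output dims \((5, 5, 5, 5)\) and TT-ranks \((1, 4, 4, 4, 1)\), the cores contribute \(1 \cdot 7 \cdot 5 \cdot 4 + 4 \cdot 4 \cdot 5 \cdot 4 + 4 \cdot 7 \cdot 5 \cdot 4 + 4 \cdot 4 \cdot 5 \cdot 1 = 140 + 320 + 560 + 80 = 1100\) parameters, and the bias vector adds another \(625\), for \(1725\) in total.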

[ ]:
optimizer = optimizers.Adam(lr=1e-2)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
[70]:
model.fit(x_train, y_train, epochs=3, batch_size=64, validation_data=(x_test, y_test))
Train on 60000 samples, validate on 10000 samples
Epoch 1/3
60000/60000 [==============================] - 4s 69us/sample - loss: 0.2549 - accuracy: 0.9248 - val_loss: 0.1195 - val_accuracy: 0.9638
Epoch 2/3
60000/60000 [==============================] - 4s 62us/sample - loss: 0.1448 - accuracy: 0.9574 - val_loss: 0.1415 - val_accuracy: 0.9585
Epoch 3/3
60000/60000 [==============================] - 4s 62us/sample - loss: 0.1308 - accuracy: 0.9619 - val_loss: 0.1198 - val_accuracy: 0.9638
[70]:
<tensorflow.python.keras.callbacks.History at 0x7fd5263629b0>

Compression of Dense layers

Let us now train an ordinary DNN (without TT-Matrices) and show how we can compress it using the TT decomposition. (In contrast to directly training a TT-layer from scratch in the example above.)

[ ]:
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
model.add(Dense(625, activation='relu'))
model.add(Dense(10))
model.add(Activation('softmax'))
[72]:
model.summary()
Model: "sequential_13"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_13 (Flatten)         (None, 784)               0
_________________________________________________________________
dense_9 (Dense)              (None, 625)               490625
_________________________________________________________________
dense_10 (Dense)             (None, 10)                6260
_________________________________________________________________
activation_8 (Activation)    (None, 10)                0
=================================================================
Total params: 496,885
Trainable params: 496,885
Non-trainable params: 0
_________________________________________________________________
[ ]:
optimizer = optimizers.Adam(lr=1e-3)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
[74]:
model.fit(x_train, y_train, epochs=5, batch_size=64, validation_data=(x_test, y_test))
Train on 60000 samples, validate on 10000 samples
Epoch 1/5
60000/60000 [==============================] - 3s 57us/sample - loss: 0.2779 - accuracy: 0.9158 - val_loss: 0.1589 - val_accuracy: 0.9501
Epoch 2/5
60000/60000 [==============================] - 3s 52us/sample - loss: 0.1297 - accuracy: 0.9610 - val_loss: 0.1632 - val_accuracy: 0.9483
Epoch 3/5
60000/60000 [==============================] - 3s 53us/sample - loss: 0.0991 - accuracy: 0.9692 - val_loss: 0.1083 - val_accuracy: 0.9674
Epoch 4/5
60000/60000 [==============================] - 3s 54us/sample - loss: 0.0835 - accuracy: 0.9742 - val_loss: 0.1191 - val_accuracy: 0.9619
Epoch 5/5
60000/60000 [==============================] - 3s 55us/sample - loss: 0.0720 - accuracy: 0.9771 - val_loss: 0.0918 - val_accuracy: 0.9714
[74]:
<tensorflow.python.keras.callbacks.History at 0x7fd5260c8240>

Let us convert the matrix used in the Dense layer into a TT-Matrix with TT-ranks equal to 16 (since we trained the network without the low-rank structure assumption, we may wish to start with high rank values).

[75]:
W = model.trainable_weights[0]
print(W)
Wtt = t3f.to_tt_matrix(W, shape=[[7, 4, 7, 4], [5, 5, 5, 5]], max_tt_rank=16)
print(Wtt)
<tf.Variable 'dense_9/kernel:0' shape=(784, 625) dtype=float32, numpy=
array([[-0.03238887,  0.06103956,  0.03255948, ..., -0.02577683,
         0.06993102, -0.00263362],
       [-0.05367032, -0.0324776 , -0.04441883, ...,  0.0338573 ,
         0.01554517,  0.04145934],
       [ 0.03441307,  0.04183276,  0.05157001, ...,  0.00082603,
         0.03731582, -0.01392014],
       ...,
       [ 0.03070629,  0.02113252,  0.01526976, ..., -0.00541451,
         0.03794012,  0.04027091],
       [-0.01376432, -0.0064889 , -0.03118961, ...,  0.06237663,
        -0.000577  , -0.02628548],
       [-0.01680673,  0.00364697,  0.01722438, ...,  0.01579029,
        -0.00826585,  0.03203061]], dtype=float32)>
A TT-Matrix of size 784 x 625, underlying tensor shape: (7, 4, 7, 4) x (5, 5, 5, 5), TT-ranks: (1, 16, 16, 16, 1)

We need to evaluate the tt-cores of Wtt. We also need to store other parameters for later (biases and the second dense layer).

[ ]:
cores = Wtt.tt_cores
other_params = model.get_weights()[1:]

Now we can construct a tensor network with the first Dense layer replaced by Wtt initialized using the previously computed cores.

[ ]:
model = Sequential()
model.add(Flatten(input_shape=(28, 28)))
tt_layer = t3f.nn.KerasDense(input_dims=[7, 4, 7, 4], output_dims=[5, 5, 5, 5],
                             tt_rank=16, activation='relu')
model.add(tt_layer)
model.add(Dense(10))
model.add(Activation('softmax'))
[ ]:
optimizer = optimizers.Adam(lr=1e-3)
model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
[ ]:
model.set_weights(list(cores) + other_params)
[97]:
print("new accuracy: ", model.evaluate(x_test, y_test)[1])
10000/10000 [==============================] - 1s 91us/sample - loss: 1.0276 - accuracy: 0.6443
new accuracy:  0.6443
[98]:
model.summary()
Model: "sequential_16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
flatten_16 (Flatten)         (None, 784)               0
_________________________________________________________________
tt_dense_2 (KerasDense)      (None, 625)               15585
_________________________________________________________________
dense_13 (Dense)             (None, 10)                6260
_________________________________________________________________
activation_11 (Activation)   (None, 10)                0
=================================================================
Total params: 21,845
Trainable params: 21,845
Non-trainable params: 0
_________________________________________________________________

We see that even though we now have about 5% of the original number of parameters we still achieve a relatively high accuracy.

Finetuning the model

We can now finetune this tensor network.

[99]:
model.fit(x_train, y_train, epochs=2, batch_size=64, validation_data=(x_test, y_test))
Train on 60000 samples, validate on 10000 samples
Epoch 1/2
60000/60000 [==============================] - 5s 81us/sample - loss: 0.1349 - accuracy: 0.9594 - val_loss: 0.0982 - val_accuracy: 0.9703
Epoch 2/2
60000/60000 [==============================] - 5s 75us/sample - loss: 0.0822 - accuracy: 0.9750 - val_loss: 0.0826 - val_accuracy: 0.9765
[99]:
<tensorflow.python.keras.callbacks.History at 0x7fd526574198>

We see that we were able to achieve higher validation accuracy than we had in the plain DNN, while keeping the number of parameters extremely small (21845 vs 496885 parameters in the uncompressed model).

[ ]:

Tensor completion (example of minimizing a loss w.r.t. TT-tensor)

Open this page in an interactive mode via Google Colaboratory.

In this example we will see how we can do tensor completion with t3f, i.e. observe a fraction of values in a tensor and recover the rest by assuming that the original tensor has low TT-rank. Mathematically it means that we have a binary mask \(P\) and a ground truth tensor \(A\), but we observe only a noisy and sparsified version of \(A\): \(P \odot (\hat{A})\), where \(\odot\) is the elementwise product (applying the binary mask) and \(\hat{A} = A + \text{noise}\). In this case our task reduces to the following optimization problem:

\[\begin{split}\begin{aligned} & \underset{X}{\text{minimize}} & & \|P \odot (X - \hat{A})\|_F^2 \\ & \text{subject to} & & \text{tt_rank}(X) \leq r_0 \end{aligned}\end{split}\]
[1]:
import numpy as np
import matplotlib.pyplot as plt
# Import TF 2.
%tensorflow_version 2.x
import tensorflow as tf

# Fix seed so that the results are reproducable.
tf.random.set_seed(0)
np.random.seed(0)

try:
    import t3f
except ImportError:
    # Install T3F if it's not already installed.
    !git clone https://github.com/Bihaqo/t3f.git
    !cd t3f; pip install .
    import t3f
TensorFlow 2.x selected.

Generating problem instance

Let's generate a random tensor \(A\), noise, and mask \(P\).

[ ]:
shape = (3, 4, 4, 5, 7, 5)
# Generate ground truth tensor A. To make sure that it has low TT-rank,
# let's generate a random tt-rank 5 tensor and apply t3f.full to it to convert to actual tensor.
ground_truth = t3f.full(t3f.random_tensor(shape, tt_rank=5))
# Make a (non trainable) variable out of ground truth. Otherwise, it will be randomly regenerated on each sess.run.
ground_truth = tf.Variable(ground_truth, trainable=False)
noise = 1e-2 * tf.Variable(tf.random.normal(shape), trainable=False)
noisy_ground_truth = ground_truth + noise
# Observe 25% of the tensor values.
sparsity_mask = tf.cast(tf.random.uniform(shape) <= 0.25, tf.float32)
sparsity_mask = tf.Variable(sparsity_mask, trainable=False)
sparse_observation = noisy_ground_truth * sparsity_mask

Initialize the variable and compute the loss

[ ]:
observed_total = tf.reduce_sum(sparsity_mask)
total = np.prod(shape)
initialization = t3f.random_tensor(shape, tt_rank=5)
estimated = t3f.get_variable('estimated', initializer=initialization)

SGD optimization

The simplest way to solve the optimization problem is Stochastic Gradient Descent: let TensorFlow differentiate the loss w.r.t. the factors (cores) of the TensorTrain decomposition of the estimated tensor and minimize the loss with your favourite SGD variation.

[ ]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)

def step():
  with tf.GradientTape() as tape:
    # Loss is MSE between the estimated and ground-truth tensor as computed in the observed cells.
    loss = 1.0 / observed_total * tf.reduce_sum((sparsity_mask * t3f.full(estimated) - sparse_observation)**2)

  gradients = tape.gradient(loss, estimated.tt_cores)
  optimizer.apply_gradients(zip(gradients, estimated.tt_cores))

  # Test loss is MSE between the estimated tensor and full (and not noisy) ground-truth tensor A.
  test_loss = 1.0 / total * tf.reduce_sum((t3f.full(estimated) - ground_truth)**2)
  return loss, test_loss
[7]:
train_loss_hist = []
test_loss_hist = []
for i in range(10000):
  tr_loss_v, test_loss_v = step()
  tr_loss_v, test_loss_v = tr_loss_v.numpy(), test_loss_v.numpy()
  train_loss_hist.append(tr_loss_v)
  test_loss_hist.append(test_loss_v)
  if i % 1000 == 0:
    print(i, tr_loss_v, test_loss_v)
0 1.768507 1.6856995
1000 0.0011041266 0.001477238
2000 9.759675e-05 3.4615714e-05
3000 8.749525e-05 2.0825255e-05
4000 9.1277245e-05 2.188003e-05
5000 9.666496e-05 3.5304438e-05
6000 8.7534434e-05 2.1069698e-05
7000 8.753277e-05 2.1103975e-05
8000 9.058935e-05 2.6075113e-05
9000 8.8796776e-05 2.2456348e-05
[8]:
plt.loglog(train_loss_hist, label='train')
plt.loglog(test_loss_hist, label='test')
plt.xlabel('Iteration')
plt.ylabel('MSE Loss value')
plt.title('SGD completion')
plt.legend()

[8]:
<matplotlib.legend.Legend at 0x7f242ae44d68>
_images/tutorials_tensor_completion_9_1.png

Speeding it up

The simple solution we have so far assumes that the loss is computed by materializing the full estimated tensor and then zeroing out unobserved elements. If the tensors are really large and the fraction of observed values is small (e.g. less than 1%), it may be much more efficient to work directly with the observed elements only.

[ ]:
shape = (10, 10, 10, 10, 10, 10, 10)

total_observed = np.prod(shape)
# Since now the tensor is too large to work with explicitly,
# we don't want to generate a binary mask,
# but would rather generate the indices of the observed cells.

ratio = 0.001

# Let us simply randomly pick some indices (a few duplicates
# may occur, but they are harmless for the loss below,
# so let's not bother about them for now).

num_observed = int(ratio * total_observed)
observation_idx = np.random.randint(0, 10, size=(num_observed, len(shape)))
# and let us generate some values of the tensor to be approximated
observations = np.random.randn(num_observed)

[ ]:
# Our strategy is to feed the observation_idx
# into the tensor in the Tensor Train format and compute MSE between
# the obtained values and the desired values
[ ]:
initialization = t3f.random_tensor(shape, tt_rank=16)
estimated = t3f.get_variable('estimated', initializer=initialization)
[ ]:
# To collect the values of a TT tensor (without forming the full tensor)
# we use the function t3f.gather_nd.
[ ]:
def loss():
  estimated_vals = t3f.gather_nd(estimated, observation_idx)
  return tf.reduce_mean((estimated_vals - observations) ** 2)
[ ]:
optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
def step():
  with tf.GradientTape() as tape:
    loss_value = loss()
  gradients = tape.gradient(loss_value, estimated.tt_cores)
  optimizer.apply_gradients(zip(gradients, estimated.tt_cores))
  return loss_value

Compiling the function to additionally speed things up

[ ]:
# In TF eager mode you're supposed to first implement and debug
# a function, and then compile it to make it faster.
faster_step = tf.function(step)
[16]:
loss_hist = []
for i in range(2000):
    loss_v = faster_step().numpy()
    loss_hist.append(loss_v)
    if i % 100 == 0:
        print(i, loss_v)
0 2.513642
100 0.09261158
200 0.016660467
300 0.0062909224
400 0.0030982601
500 0.0018596936
600 0.0012290174
700 0.00086869544
800 0.00065623457
900 0.00052747165
1000 0.00044029654
1100 0.00038606362
1200 0.00033268757
1300 0.0002910529
1400 0.00028836995
1500 0.00023541097
1600 0.00022489333
1700 0.00022316887
1800 0.00039261775
1900 0.0003216249
[17]:
plt.loglog(loss_hist)
plt.xlabel('Iteration')
plt.ylabel('MSE Loss value')
plt.title('smarter SGD completion')
plt.legend()
No handles with labels found to put in legend.
[17]:
<matplotlib.legend.Legend at 0x7f242a629fd0>
_images/tutorials_tensor_completion_20_2.png
[18]:
print(t3f.gather_nd(estimated, observation_idx))
tf.Tensor(
[-0.12139133 -1.3777294  -0.5469675  ... -0.00776806  0.23622975
  0.7571926 ], shape=(10000,), dtype=float32)
[19]:
print(observations)
[-1.27225139e-01 -1.37794858e+00 -5.42469328e-01 ... -1.30643336e-03
  2.35629296e-01  7.53320726e-01]

Riemannian optimization

Open this page in an interactive mode via Google Colaboratory.

Riemannian optimization is a framework for solving optimization problems with a constraint that the solution belongs to a manifold.

Let us consider the following problem. Given some TT tensor \(A\) with large tt-ranks we would like to find a tensor \(X\) (with small prescribed tt-ranks \(r\)) which is closest to \(A\) (in the sense of Frobenius norm). Mathematically it can be written as follows:

\begin{equation*} \begin{aligned} & \underset{X}{\text{minimize}} & & \frac{1}{2}\|X - A\|_F^2 \\ & \text{subject to} & & \text{tt_rank}(X) = r \end{aligned} \end{equation*}

It is known that the set of TT tensors with elementwise fixed TT-ranks forms a manifold. Thus we can solve this problem using the so-called Riemannian gradient descent. Given some functional \(F\) on a manifold \(\mathcal{M}\), it is defined as

\[\hat{x}_{k+1} = x_{k} - \alpha P_{T_{x_k}\mathcal{M}} \nabla F(x_k),\]
\[x_{k+1} = \mathcal{R}(\hat{x}_{k+1}),\]

with \(P_{T_{x_k}\mathcal{M}}\) being the projection onto the tangent space of \(\mathcal{M}\) at the point \(x_k\), \(\mathcal{R}\) being a retraction (an operation which maps points back to the manifold), and \(\alpha\) being the learning rate.

We can implement this in t3f using the t3f.riemannian module. As a retraction it is convenient to use the rounding method (t3f.round).

[1]:
# Import TF 2.
%tensorflow_version 2.x
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

# Fix seed so that the results are reproducable.
tf.random.set_seed(0)
np.random.seed(0)

try:
    import t3f
except ImportError:
    # Install T3F if it's not already installed.
    !git clone https://github.com/Bihaqo/t3f.git
    !cd t3f; pip install .
    import t3f
TensorFlow 2.x selected.
[ ]:
# Initialize A randomly, with large tt-ranks
shape = 10 * [2]
init_A = t3f.random_tensor(shape, tt_rank=16)
A = t3f.get_variable('A', initializer=init_A, trainable=False)

# Create an X variable.
init_X = t3f.random_tensor(shape, tt_rank=2)
X = t3f.get_variable('X', initializer=init_X)

def step():
  # Compute the gradient of the functional. Note that it is simply X - A.
  gradF = X - A

  # Let us compute the projection of the gradient onto the tangent space at X.
  riemannian_grad = t3f.riemannian.project(gradF, X)

  # Compute the update by subtracting the Riemannian gradient
  # and retracting back to the manifold
  alpha = 1.0
  t3f.assign(X, t3f.round(X - alpha * riemannian_grad, max_tt_rank=2))

  # Let us also compute the value of the functional
  # to see if it is decreasing.
  return 0.5 * t3f.frobenius_norm_squared(X - A)
[3]:
log = []
for i in range(100):
    F = step()
    if i % 10 == 0:
        print(F)
    log.append(F.numpy())
tf.Tensor(749.22894, shape=(), dtype=float32)
tf.Tensor(569.4678, shape=(), dtype=float32)
tf.Tensor(502.00604, shape=(), dtype=float32)
tf.Tensor(490.0112, shape=(), dtype=float32)
tf.Tensor(489.01282, shape=(), dtype=float32)
tf.Tensor(488.71234, shape=(), dtype=float32)
tf.Tensor(488.56543, shape=(), dtype=float32)
tf.Tensor(488.47928, shape=(), dtype=float32)
tf.Tensor(488.4239, shape=(), dtype=float32)
tf.Tensor(488.38593, shape=(), dtype=float32)

It is instructive to compare the obtained result with the quasi-optimum delivered by the TT-round procedure.

[4]:
quasi_sol = t3f.round(A, max_tt_rank=2)

val = 0.5 * t3f.frobenius_norm_squared(quasi_sol - A)
print(val)
tf.Tensor(518.3871, shape=(), dtype=float32)

We see that the value is slightly bigger than the exact minimum, but TT-round is faster and cheaper to compute, so it is often used in practice.

[5]:
plt.semilogy(log, label='Riemannian gradient descent')
plt.axhline(y=val.numpy(), lw=1, ls='--', color='gray', label='TT-round(A)')
plt.xlabel('Iteration')
plt.ylabel('Value of the functional')
plt.legend()
[5]:
<matplotlib.legend.Legend at 0x7f4102dbab70>
_images/tutorials_riemannian_8_1.png

Citation

If you use T3F in your research work, we kindly ask you to cite the paper describing this library

@article{JMLR:v21:18-008,
  author  = {Alexander Novikov and Pavel Izmailov and Valentin Khrulkov and Michael Figurnov and Ivan Oseledets},
  title   = {Tensor Train Decomposition on TensorFlow (T3F)},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {30},
  pages   = {1-7},
  url     = {http://jmlr.org/papers/v21/18-008.html}
}