# Rknots Package Vignette

## Abstract

From a topological point of view, polymers can be modeled as open polygonal paths. Upon closure, these paths generate topological objects called knots. Multi-component knots are known as links. Rknots provides functions for the topological analysis of knots and links with a particular focus on biological polymers (e.g. proteins).

This vignette is divided in three main parts:

1. Data import and datasets
2. Methods
3. Case study illustrating how to use Rknots to identify knots in proteins.

For more details on the methods, please refer to Comoglio and Rinaldi, 2011 and Comoglio and Rinaldi, 2012.

#### Remarks

Rknots has been developed as a generalized framework. Theoretically, any arbitrary structure compliant with the requirements below can be loaded. Practically, this depends on the complexity of the structure to be analyzed, the task to be performed, and available memory.

## Data import and structure

Rknots deals with knots and links. These are represented as the coordinates of $$N$$ points in the three-dimensional space and a set of integer separators. $$N$$ points in 3D can be naturally represented by a $$N 3$$ matrix, where each row is a point and columns are $$x,y,z$$ coordinates. A knot is entirely defined by its points. However, an integer vector $$S$$ of length $$n-1$$ is required to represent $$n$$-component links. The elements of $$S$$ are called separators and describe the boundaries between components. Particularly, we defined a separator as the index of the edge that if not removed would connect components across.

For example, the Hopf link represented above is defined by an $$8 3$$ matrix and separator $$S={4}$$. The edge that would connect the two components is indicated by the gray dashed line.

### Datasets

Rknots provides two datasets:

1. The Rolfsen table of 250 knots with less than 11 crossings (enumerated according to Rolfsen). Full and minimal stickies representations are stored in the Rolfsen.table and Rolfsen.table.ms datasets, respectively.

library(Rknots)
## Loading required package: rgl
## Loading required package: rSymPy
## Loading required package: rJython
## Loading required package: rJava
## Loading required package: rjson
## Loading required package: bio3d
str( head(Rolfsen.table, 5) )
## List of 5
##  $3.1: num [1:48, 1:3] -5 -4.88 -4.39 -3.58 -2.54 ... ##$ 4.1: num [1:70, 1:3] -6.43 -6.26 -5.89 -5.32 -4.56 ...
##  $5.1: num [1:70, 1:3] -5.58 -5.85 -5.75 -5.32 -4.66 ... ##$ 5.2: num [1:82, 1:3] -1.191 -0.262 0.803 1.922 3.027 ...
##  $6.1: num [1:97, 1:3] -6.9 -6.46 -5.84 -5.05 -4.13 ... head( names(Rolfsen.table) ) ## [1] "3.1" "4.1" "5.1" "5.2" "6.1" "6.2" Note that knots are elements of a named list. Thus, instances can also be accessed by name str( Rolfsen.table['3.1'] ) ## List of 1 ##$ 3.1: num [1:48, 1:3] -5 -4.88 -4.39 -3.58 -2.54 ...

The difference between the full representation of a trefoil knot and its minimal stickies representation is shown below.

2. Coordinates and separators of 130 links up to 4 components, stored in the link.table dataset.

str( head(link.table, 5) )
## List of 5
##  $2.2.1:List of 2 ## ..$ points3D: num [1:74, 1:3] -6.3 -5.77 -5.13 -4.4 -3.59 ...
##   ..$ends : num 37 ##$ 4.2.1:List of 2
##   ..$points3D: num [1:83, 1:3] -0.603 -1.426 -2.171 -2.802 -3.292 ... ## ..$ ends    : num 40
##  $5.2.1:List of 2 ## ..$ points3D: num [1:79, 1:3] 0.333 -0.598 -1.518 -2.4 -3.213 ...
##   ..$ends : num 39 ##$ 6.2.1:List of 2
##   ..$points3D: num [1:102, 1:3] 2.131 1.328 0.447 -0.459 -1.338 ... ## ..$ ends    : num 50
##  $6.2.2:List of 2 ## ..$ points3D: num [1:98, 1:3] -6.57 -6.2 -5.67 -4.98 -4.14 ...
##   ..$ends : num 48 head( names(link.table) ) ## [1] "2.2.1" "4.2.1" "5.2.1" "6.2.1" "6.2.2" "6.2.3" Each element of the list has two slots containing link coordinates and separators. We now randomly sample a knot and a link structure to illustrate the main features of the package in the next sections. set.seed(123) knot <- makeExampleKnot(k = TRUE) #a knot link <- makeExampleKnot(k = FALSE) #a link ### Creating objects of class Knot Rknots introduces an S4 class called Knot. An object of class Knot has two mandatory slots: 1. points3D: an $$N 3$$ matrix, where each row is a point and columns are $$x,y,z$$ coordinates of a polygonal knot or link 2. ends: a vector of integers containing the separators of the link components. This slot is by default set to numeric(0) for knots. An new instance of a Knot object can be created either by the class constructor newKnot or with the generic constructor new. knot.cls <- new('Knot', points3D = knot) link.cls <- new('Knot', points3D = link$points3D, ends = link$ends) ( knot.cls <- newKnot(points3D = knot) ) ## An object of class 'Knot' ## Slot points3D: 114 x 3 matrix ## x y z ## [1,] -3.621864 -7.859621 -2.811422 ## [2,] -4.646111 -7.423467 -2.425915 ## [3,] -5.546167 -6.849921 -1.935343 ## [4,] -6.327957 -6.169424 -1.388736 ## [5,] -7.004683 -5.402181 -0.820578 ## [6,] -7.585773 -4.559776 -0.254575 ## [7,] -8.073381 -3.646790 0.290954 ## [8,] -8.457810 -2.662413 0.797152 ## [9,] -8.714276 -1.603682 1.237616 ## [10,] -8.801561 -0.474100 1.570860 ## ........ ........ ........ ## Slot ends: ( link.cls <- newKnot(points3D = link$points3D, ends = link$ends) ) ## An object of class 'Knot' ## Slot points3D: 127 x 3 matrix ## x y z ## [1,] -9.162255 -1.509655 1.829496 ## [2,] -9.245968 -2.659378 1.805300 ## [3,] -9.113029 -3.803538 1.856751 ## [4,] -8.744250 -4.892910 1.956594 ## [5,] -8.136572 -5.871593 2.055047 ## [6,] -7.310295 -6.682295 2.092592 ## [7,] -6.311310 -7.268003 2.009992 ## [8,] -5.216097 -7.575875 1.761636 ## [9,] -4.130063 -7.568088 1.329447 ## [10,] -3.171682 -7.239913 0.731781 ## ........ ........ ........ ## Slot ends: 49 100 Slots can be accessed with the accessors getCoordinates and getEnds, or by subsetting with the [ operator. Direct access with @ is discouraged. head( getCoordinates(knot.cls), 5 ) ## x y z ## [1,] -3.621864 -7.859621 -2.811422 ## [2,] -4.646111 -7.423467 -2.425915 ## [3,] -5.546167 -6.849921 -1.935343 ## [4,] -6.327957 -6.169424 -1.388736 ## [5,] -7.004683 -5.402181 -0.820578 getEnds(knot.cls) ## numeric(0) head( link.cls['points3D'], 5) ## x y z ## [1,] -9.162255 -1.509655 1.829496 ## [2,] -9.245968 -2.659378 1.805300 ## [3,] -9.113029 -3.803538 1.856751 ## [4,] -8.744250 -4.892910 1.956594 ## [5,] -8.136572 -5.871593 2.055047 link.cls['ends'] ## [1] 49 100 Similarly, slots can be modified by the setters setCoordinates, setEnds or using the [<- operator. knot.bu <- knot.cls #copy new.coordinates <- matrix( runif(60), ncol = 3 ) #generate random coordinates setCoordinates(knot.cls) <- new.coordinates #replace knot.cls ## An object of class 'Knot' ## Slot points3D: 20 x 3 matrix ## [,1] [,2] [,3] ## [1,] 0.4089769 0.6405068 0.41372433 ## [2,] 0.8830174 0.9942698 0.36884545 ## [3,] 0.9404673 0.6557058 0.15244475 ## [4,] 0.0455565 0.7085305 0.13880606 ## [5,] 0.5281055 0.5440660 0.23303410 ## [6,] 0.8924190 0.5941420 0.46596245 ## [7,] 0.5514350 0.2891597 0.26597264 ## [8,] 0.4566147 0.1471136 0.85782772 ## [9,] 0.9568333 0.9630242 0.04583117 ## [10,] 0.4533342 0.9022990 0.44220007 ## ........ ........ ........ ## Slot ends: knot.cls <- knot.bu #restore link.bu <- link.cls #copy setEnds(link.cls) <- c(10, 50, 90) #replace separators getEnds(link.cls) ## [1] 10 50 90 link.cls <- link.bu #restore Note that Rknots does not expect the new vector of separators to be of the same length as the original one, i.e. it is possible to change the link type and this is useful when local operations on the link are performed. ## Simplyfing structures and computing invariants The computational cost of polynomial invariants is generally very high. Therefore, structures must be reduced beforehand by reducing the number of points while retaining the topology of the original structure. To this end, two structure reduction algorithms have been implemented in Rknots: 1. the Alexander-Briggs (AB) algorithm based on elementary deformations Alexander and Briggs, 1926; Reidemeister, 1926. 2. the Minimal Structure Reduction (MSR) algorithm Comoglio and Rinaldi, 2011 based on generalized Reidemeister moves. The AB algorithm is very efficient, whereas the MSR algorithm, by working on the knot projection, contains intrinsically more information but is significantly slower. The reduction can be applied directly to coordinates and separators knot.AB <- AlexanderBriggs(points3D = knot, ends = c()) str(knot.AB) ## List of 2 ##$ points3D: num [1:17, 1:3] -3.62 -8.07 -6 6.32 5.09 ...
##  $ends : num(0) knot.msr <- msr(points3D = knot, ends = c()) str(knot.msr) ## List of 3 ##$ points3D: num [1:20, 1:3] -3.622 -6.782 2.587 7.076 0.714 ...
##  $ends : NULL ##$ M       : num [1:19, 1:19] 0 0 0 0 0 0 0 0 0 -1 ...
link.AB <- AlexanderBriggs(points3D = link$points3D, ends = link$ends)
str(link.AB)
## List of 2
##  $points3D: num [1:19, 1:3] -9.16 -6.31 2.25 4.3 -2.53 ... ##$ ends    : num [1:2] 6 13
link.msr <- msr(points3D = link$points3D, ends = link$ends)
str(link.msr)
## List of 3
##  $points3D: num [1:17, 1:3] -9.162 -1.974 4.102 -0.367 -5.506 ... ##$ ends    : num [1:2] 6 13
##  $M : num [1:16, 1:16] 0 0 0 0 0 0 0 0 0 0 ... or to an object of class Knot with the reduceStructure function. knot.cls.AB <- reduceStructure(knot.cls, algorithm = 'AB' ) knot.cls.MSR <- reduceStructure(knot.cls, algorithm = 'MSR' ) link.cls.AB <- reduceStructure(link.cls, algorithm = 'AB' ) link.cls.MSR <- reduceStructure(link.cls, algorithm = 'MSR' ) link.cls.AB ## An object of class 'Knot' ## Slot points3D: 19 x 3 matrix ## x y z ## [1,] -9.162255 -1.509655 1.829496 ## [2,] -6.311310 -7.268003 2.009992 ## [3,] 2.249036 2.716286 -0.849297 ## [4,] 4.296789 9.491975 0.390813 ## [5,] -2.529795 6.531452 -0.926613 ## [6,] -9.162255 -1.509655 1.829496 ## [7,] -5.362280 3.025527 -1.999231 ## [8,] -6.574879 7.215227 -1.337988 ## [9,] -1.540334 8.744882 1.913969 ## [10,] 5.546724 4.548880 -2.562918 ## ........ ........ ........ ## Slot ends: 6 13 The function msr (and the MSR method) return both the simplified structure, as well as the intersection matrix $$M$$, which contains position and sign of the crossings in the knot/link diagram. This function can also be used to partially reduce structures by controlling the number of iterations (default to 100). The original and simplified structures can be inspected by plotting a knot diagram with plotDiagram. par( mfrow=c(1,2) ) #plotDiagram(knot.AB$points3D, knot.AB$ends, lwd = 2, main = 'Alexander-Briggs') #plotDiagram(knot.msr$points3D, knot.msr$ends, lwd = 2, main = 'MSR') plotDiagram(link.AB$points3D, link.AB$ends, lwd = 2, main = 'Alexander-Briggs') plotDiagram(link.msr$points3D, link.msr$ends, lwd = 2, main = 'MSR') or directly with the plot function is the object is of class Knot. plot(link.cls.AB, lend = 2, lwd = 3, main = 'using par()') ## Case study: Protein knot analysis In this section, we apply Rknots to detect and characterize knots in proteins. Two PDB files are part of the package data and will be used in the following case study: • The Rds3p protein (PDB identifier 2K0A), a member of the U2 snRNP essential for pre-mRNA splicing. Rds3p features a left-handed trefoil knotted structure van Roon et al. 2008 • The first chain (A) of the E. Coli alkaline phosphatase (D153G mutant). This protein presents a structural gap, a potential source of false positives. A protein can be loaded from the file system or fetched from the Protein Data Bank using the loadProtein function, which returns a list of matrices where each element contains the 3D coordinates of a given protein chain. By default, loadProtein performs gap finding for each chain backbone. A cutoff parameter corresponding to the maximum allowed euclidean distance between two consecutive alpha-Carbon residues facilitates a custom definition of a gap. If a distance is greater than cutoff, the chain is split at the corresponding position and subchains are generated. protein <- loadProtein(system.file("extdata/2K0A.pdb", package="Rknots")) ## PDB has multiple END/ENDMDL records ## multi=FALSE: taking first record only ## HEADER METAL BINDING PROTEIN 31-JAN-08 2K0A ## Warning in read.table(text = sapply(raw.atom, split.fields), ## stringsAsFactors = FALSE, : not all columns named in 'colClasses' exist ## Summary of the distance vector ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 3.744 3.789 3.799 3.800 3.810 3.856 ## No gap found protein<- loadProtein('2K0A') #from the PDB ## Note: Accessing on-line PDB file ## PDB has multiple END/ENDMDL records ## multi=FALSE: taking first record only ## HEADER METAL BINDING PROTEIN 31-JAN-08 2K0A ## Warning in read.table(text = sapply(raw.atom, split.fields), ## stringsAsFactors = FALSE, : not all columns named in 'colClasses' exist ## Summary of the distance vector ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 3.744 3.789 3.799 3.800 3.810 3.856 ## No gap found str(protein) ## List of 1 ##$ A: num [1:109, 1:3] -9.22 -9.93 -8.36 -11.83 -14.64 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$: chr [1:109] "2" "11" "18" "29" ... ## .. ..$ : chr [1:3] "x" "y" "z"

The 3D structure of the imported protein and the corresponding backbone model can be then visualized with plotKnot3D, which leverages on the superb graphics of the rgl package. Particularly, any parameter of the rgl functions lines3d and spheres3d can be supplied.

ramp <- colorRamp(c('blue', 'white', 'red'))
pal <- rgb( ramp(seq(0, 1, length = 109)), max = 255)

plotKnot3D(protein$A, colors = list( pal ), lwd = 8, radius = .4, showNC = TRUE, text = TRUE) By setting showNC = TRUE the chain is oriented by labeling the protein N-ter and C-ter, whereas text = TRUE adds a residue number to each point. A snaphot of the 3D plot is illustrated below. To find knots in proteins, a single chain has to be supplied and coerced to a Knot object. The available chains are returned by loadProtein. protein <- newKnot(protein$A)
protein
## An object of class 'Knot'
## Slot points3D:  109 x 3 matrix
##           x       y      z
## 2    -9.225 -24.265 -3.881
## 11   -9.927 -20.804 -5.217
## 18   -8.363 -19.745 -8.522
## 29  -11.829 -18.718 -9.673
## 40  -14.645 -16.690 -8.082
## 64  -12.639 -15.926 -4.953
## 82  -10.105 -13.650 -6.562
## 99  -10.944 -10.485 -4.591
## 119  -9.794  -8.744 -7.783
## 131  -6.918  -7.069 -5.969
##        ........  ........  ........
## Slot ends:

Closure and projection can then be performed using closeAndProject. This function applies a Principal Component Analysis on the closed structure to simplify the computation of invariants and for visualization purposes.

protein.cp <- closeAndProject( protein, w = 2 )
par(mfrow = c(1,2))
plot(protein, main = 'original', lwd = 2)
plot(protein.cp, main = 'closed & projected', lwd = 2)

Next, invariants can be computed with

( delta.p <- computeInvariant( protein.cp, invariant = 'Alexander') )
## [1] "-1 + t1 + 1/t1"
( jones.p <- computeInvariant( protein.cp, invariant = 'Jones', skein.sign = -1) )
## [1] "1/t - 1/t^4 + t^(-3)"
( homfly.p <- computeInvariant( protein.cp, invariant = 'HOMFLY', skein.sign = -1) )
## [1] "2*l^2 + l^2*m^2 - l^4"

For simple knots, the knot type can be assigned automatically using getKnotType

getKnotType(polynomial = delta.p, invariant = 'Alexander')
## [1] "Left-handed Trefoil knot (3_1*)" "Right-handed Trefoil knot (3_1)"
getKnotType(polynomial = homfly.p, invariant = 'HOMFLY')
## [1] "Left-handed Trefoil knot (3_1*)"
getKnotType(polynomial = homfly.p, invariant = 'HOMFLY', full.output = TRUE)
##                         Knot.type                            HOMFLY
## "Left-handed Trefoil knot (3_1*)"           "2*l^2 + l^2*m^2 - l^4"
##                             Jones                         Alexander
##            "1/t - 1/t^4 + t^(-3)"                  "-1 + t1 + 1/t1"
##      "http://katlas.org/wiki/3_1"

The results indicate the Rds3p protein has a left-handed knot, which can be compared with the right-handed trefoil polynomial in the Rolfsen knot table by means of another Rknots utility:

trefoil <- Rolfsen.table[[1]]
trefoil <- newKnot(trefoil)
( homfly.tr <- computeInvariant(trefoil, 'HOMFLY') )
## [1] "-1/l^4 + 2/l^2 + m^2/l^2"
( homfly.tl <- HOMFLY2mirror(homfly.tr) )
## [1] "2*l^2 + l^2*m^2 - l^4"
identical( homfly.p, homfly.tl )
## [1] TRUE

Finally, if a protein has more than one chain one can interate over all possible chains using lapply. The following code illustrates how to perform this analysis by fetching a protein with two chains from the PDB

processChain <- function(protein, i) {
chain <- newKnot(protein[[i]])
chain <- closeAndProject( chain )
return( computeInvariant(chain, 'HOMFLY') )
}

lengthChain <- function(protein, i) return( nrow(protein[[i]]))

protein <- loadProtein('1AJC', join.gaps = FALSE, cutoff = 7)
##   Note: Accessing on-line PDB file
##   HEADER    NON SPECIFIC MONO-ESTERASE              18-JUL-95   1AJC
## Warning in read.table(text = sapply(raw.atom, split.fields),
## stringsAsFactors = FALSE, : not all columns named in 'colClasses' exist
## Loading chains:
##   #aminoacids
## A         446
## B         449
## Summary of the distance vector
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   3.756   3.793   3.805   3.822   3.817  11.040
## Chain split at position
##  404
## Summary of the distance vector
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   3.741   3.793   3.805   3.806   3.817   3.864
## No gap found
str(protein)
## List of 3
##  $A1: num [1:404, 1:3] 66.9 68.4 67.3 67.4 63.9 ... ## ..- attr(*, "dimnames")=List of 2 ## .. ..$ : chr [1:404] "2" "9" "16" "25" ...
##   .. ..$: chr [1:3] "x" "y" "z" ##$ A2: num [1:42, 1:3] 67.8 67.3 66.5 69.4 70.4 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$: chr [1:42] "2962" "2970" "2976" "2985" ... ## .. ..$ : chr [1:3] "x" "y" "z"
##  $B : num [1:449, 1:3] 73.6 74.6 71.9 71.3 73.8 ... ## ..- attr(*, "dimnames")=List of 2 ## .. ..$ : chr [1:449] "3278" "3285" "3292" "3301" ...
##   .. ..$: chr [1:3] "x" "y" "z" chains <- names(protein) polynomials <- lapply( 1: length(chains) , function(i) { ifelse(lengthChain(protein, i) > 6, processChain(protein, i), 1) } ) cbind(chains, polynomials)  ## chains polynomials ## [1,] "A1" "-1/l^4 + 2/l^2 + m^2/l^2" ## [2,] "A2" "1" ## [3,] "B" "1" The results indicate that the first chain has been split and resulted in a two unknotted subchains. The second chain instead bears a right-handed trefoil knot. Note: by ignoring gap finding, we would have found a knot in the first chain. protein <- loadProtein('1AJC', join.gaps = TRUE) ## Note: Accessing on-line PDB file ## HEADER NON SPECIFIC MONO-ESTERASE 18-JUL-95 1AJC ## Warning in read.table(text = sapply(raw.atom, split.fields), ## stringsAsFactors = FALSE, : not all columns named in 'colClasses' exist ## Loading chains: ## #aminoacids ## A 446 ## B 449 str(protein) ## List of 2 ##$ A: num [1:446, 1:3] 66.9 68.4 67.3 67.4 63.9 ...
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$: chr [1:446] "2" "9" "16" "25" ... ## .. ..$ : chr [1:3] "x" "y" "z"
##  $B: num [1:449, 1:3] 73.6 74.6 71.9 71.3 73.8 ... ## ..- attr(*, "dimnames")=List of 2 ## .. ..$ : chr [1:449] "3278" "3285" "3292" "3301" ...
##   .. ..\$ : chr [1:3] "x" "y" "z"
chains <- names(protein)
polynomials <- lapply( 1: length(chains) ,
function(i) {
ifelse(lengthChain(protein, i) > 6, processChain(protein, i), 1) } )
cbind(chains, polynomials)        
##      chains polynomials
## [1,] "A"    "1"
## [2,] "B"    "1"

## Session Info

sessionInfo()
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.12 (Sierra)
##
## locale:
## [1] C/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] Rknots_1.3.2   bio3d_2.2-4    rSymPy_0.2-1.1 rJython_0.0-4
## [5] rjson_0.2.15   rJava_0.9-8    rgl_0.96.0
##
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.7     knitr_1.14      magrittr_1.5    xtable_1.8-2
##  [5] R6_2.1.3        stringr_1.1.0   tools_3.3.1     parallel_3.3.1
##  [9] grid_3.3.1      htmltools_0.3.5 yaml_2.1.13     digest_0.6.10
## [13] shiny_0.14      formatR_1.4     htmlwidgets_0.7 evaluate_0.9
## [17] mime_0.5        rmarkdown_1.0   stringi_1.1.1   jsonlite_1.0
## [21] httpuv_1.3.3

## References

Adler D. and Murdoch D. rgl: 3D visualization device system (OpenGL).

Alexander J.W. and Briggs G.B. (1926) On types of knotted curves. Ann of Math, 28, 562-586.

Comoglio F. and Rinaldi M. (2011) A Topological Framework for the Computation of the HOMFLY Polynomial and Its Application to Proteins. PLoS ONE 6(4), e18693.

Comoglio F. and Rinaldi M. (2012) Rknots: topological analysis of knotted biopolymers with R. Bioinformatics 28(10), 1400-1401.

Reidemeister K. (1926), Abh Math Sem Univ Hamburg 5: 24-32.

van Roon A.M., Loening N.M., Obayashi E., Yang J.C., Newman A.J., Hernandez H., Nagai K. and Neuhaus D., (2008) Solution structure of the U2 snRNP protein Rds3p reveals a knotted zinc-finger motif, Proc Natl Acad Sci USA, 105, 9621-9626.