The name FunCoup [fən kəp] stands for functional coupling. FunCoup is a framework to infer genome-wide functional couplings in 17 model organisms. Functional coupling, or functional association, is an unspecific form of association that encompasses direct physical interaction but also more general types of direct or indirect interaction, such as regulatory interaction or participation in the same process or pathway.
Briefly, the FunCoup framework integrates 10 different evidence types derived from high-throughput genomics and proteomics data in a naive Bayesian integration procedure. The evidence types are discussed in more detail below. Evidence is transferred across species using orthology assignments from InParanoid.
The naive Bayesian integration combines the likelihoods for coupling and no coupling in the form of log-likelihood ratios (LLRs) for all data sets. LLRs for data of the same type are corrected to account for cross-data redundancies. The sum of LLRs for a gene pair is called the final Bayesian score (FBS) and expresses the amount of support the data show for a coupling. To simplify interpretation, the FBS is transformed into a probabilistic confidence score that ranges from 0 to 1. For more details, please have a look at the FunCoup publications.
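The integration step can be sketched roughly as follows. This is a minimal illustration, not FunCoup's implementation: the function names, the example LLR values, the use of natural logarithms, and the logistic transform with prior odds are all assumptions. The redundancy correction is omitted, and the actual FBS-to-confidence transformation is the one described in the FunCoup publications.

```python
import math


def final_bayesian_score(llrs):
    """Sum the per-data-set log-likelihood ratios of one gene pair into an FBS."""
    return sum(llrs)


def confidence(fbs, prior_odds=1.0):
    """Map an FBS to a value between 0 and 1.

    Assumption: posterior odds = prior odds * exp(FBS), i.e. a plain
    logistic transform. FunCoup's real mapping may differ.
    """
    log_odds = fbs + math.log(prior_odds)
    return 1.0 / (1.0 + math.exp(-log_odds))


llrs = [1.2, -0.3, 0.8]  # hypothetical LLRs for one gene pair
fbs = final_bayesian_score(llrs)
print(fbs, confidence(fbs))
```

Positive LLRs raise the score and negative LLRs lower it, so a pair with mixed evidence ends up with an intermediate confidence.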
FunCoup differentiates between five different classes of functional couplings: protein-protein interaction (PPI), complex co-membership, co-membership in a metabolic pathway, co-membership in a signaling pathway and shared operon. For each class, a separate network is created. Additionally, a composite or summary network is created by taking the strongest coupling from the different classes for each pair.
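Building the summary network amounts to taking a per-pair maximum over the class networks. The sketch below assumes each class network is a dictionary keyed by an unordered gene pair; the data layout, class labels, and scores are hypothetical.

```python
def composite_network(class_networks):
    """Keep, for each gene pair, the strongest coupling over all classes.

    class_networks maps a class name to {frozenset({gene_a, gene_b}): score}.
    Returns {pair: (best_class, best_score)}.
    """
    best = {}
    for cls, links in class_networks.items():
        for pair, score in links.items():
            if pair not in best or score > best[pair][1]:
                best[pair] = (cls, score)
    return best


networks = {  # hypothetical scores
    "PPI": {frozenset({"A", "B"}): 0.9, frozenset({"A", "C"}): 0.4},
    "Metabolic": {frozenset({"A", "C"}): 0.7},
}
summary = composite_network(networks)
```

Using a frozenset as the key makes the pair unordered, so A-B and B-A refer to the same link.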
Evidence consists of the signals that support or contradict the presence of functional coupling. Typically, some kind of scoring function is used to convert raw data into evidence. For a complete list of all data see here. FunCoup integrates the 10 different evidence types listed below.
Physical protein interactions (PINs) from iRefIndex are combined, where interactions confirmed by multiple publications get a higher score. The scoring function further down-weights interactions from large-scale experiments and prey-prey interactions.
mRNA co-expression across multiple experimental conditions or tissues provides a strong signal for functional coupling. FunCoup evaluates co-expression as the Spearman correlation of expression profiles. For each species, multiple selected large-scale experiments from GEO are integrated.
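As an illustration of the co-expression measure, the Spearman correlation of two expression profiles is the Pearson correlation of their ranks. The sketch below uses only the standard library; the function names are ours, and it does not handle constant profiles (zero variance), which would need special treatment.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(profile_a, profile_b):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(profile_a), ranks(profile_b))
```

Because only ranks enter the calculation, the measure is insensitive to monotonic rescaling of the expression values.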
The concordance between mRNA and protein expression is low. Directly measured protein expression from the Human Protein Atlas provides a more accurate estimate of protein abundance and is used to complement the mRNA expression data.
FunCoup does not explicitly consider genetic interactions as functional coupling. Rather, between-pathway genetic interactions are integrated in the form of genetic interaction profile similarity. The underlying assumption is that genes in the same process or pathway have similar genetic interactions with genes in other, alternative processes or pathways.
Genes are regulated by multiple transcription factors (TFs), and FunCoup uses TF profile similarity as evidence for functional coupling.
Similar to shared transcription factor binding, co-regulation by multiple miRNAs is used as evidence for functional coupling.
Shared sub-cellular localization and dissimilar localization are good positive and negative indicators for functional coupling. FunCoup uses localizations from the cellular component GO ontology. Co-localization is weighted by the specificity of the localization, where specific localizations get a high weight and unspecific localizations get a low weight.
Predicted domain interactions from UniDomInt are used as evidence. The confidence score provided by UniDomInt is summed over all domain pairs of two proteins.
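The summation over domain pairs could look like the sketch below. The lookup table layout, the domain labels, and the convention that pairs without a predicted interaction contribute 0 are assumptions made for illustration.

```python
from itertools import product


def domain_interaction_score(domains_a, domains_b, pair_confidence):
    """Sum confidence scores over all domain pairs between two proteins.

    pair_confidence maps frozenset({domain_x, domain_y}) to a score;
    pairs without a predicted interaction contribute 0 (assumption).
    """
    return sum(
        pair_confidence.get(frozenset((da, db)), 0.0)
        for da, db in product(domains_a, domains_b)
    )


scores = {  # hypothetical domain-pair confidences
    frozenset({"D1", "D2"}): 0.5,
    frozenset({"D1", "D3"}): 0.25,
}
total = domain_interaction_score(["D1"], ["D2", "D3"], scores)
```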
A phylogenetic profile is a gene conservation pattern across multiple species. Phylogenetic profile similarity provides an indication of functional coupling. FunCoup scores profile similarity as the fraction of branch lengths shared by both genes or exclusively covered by only one gene in a phylogeny of 273 species derived from InParanoid.
Quantitative mass spectrometry (QMS) data sets were obtained via PaxDB (v. 4.0). In a preprocessing step, only the most abundant proteins per condition were extracted and labeled accordingly. These profiles were further evaluated using an adapted Jaccard index (12). Here, two proteins that are abundant across different tissues would achieve high similarity scores.
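For orientation, the plain (unadapted) Jaccard index on two sets of conditions is shown below. The adaptation used by FunCoup is not specified here and is not reproduced; representing each protein by the set of tissues in which it was labeled as highly abundant is an assumption.

```python
def jaccard(conditions_a, conditions_b):
    """Plain Jaccard index: shared conditions divided by all conditions."""
    a, b = set(conditions_a), set(conditions_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty profiles
    return len(a & b) / len(union)


similarity = jaccard({"liver", "brain", "heart"}, {"liver", "brain"})
```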
Schmitt, T., Ogris, C., & Sonnhammer, E. L. (2013).
FunCoup 3.0: database of genome-wide functional coupling networks.
Nucleic Acids Research, 42(Database issue), D380-8
Alexeyenko, A., Schmitt, T., Tjärnberg, A., Guala, D., Frings, O., & Sonnhammer, E. L. (2012).
Comparative interactomics with Funcoup 2.0.
Nucleic Acids Research, 40(Database issue), D821-8
Alexeyenko, A., & Sonnhammer, E. L. (2009).
Global networks of functional coupling in eukaryotes from comprehensive data integration.
Genome Research, 19(6), 1107-1116
The default query retrieves the genes most strongly connected to one or multiple query genes from the selected species network. The query searches for exact matches of symbols or identifiers and supports a variety of identifier types, including Ensembl gene, protein, and transcript IDs, NCBI gene IDs, RefSeq IDs and UniProt IDs. For a search with multiple genes, the identifiers should be separated by spaces. For more control and alternative query options, expand the advanced search options.
There are 4 different categories of advanced search options. The first category, "Sub-network selection", controls how the subnetwork around the query is retrieved. The sub-network retrieval starts from the query genes and adds the most strongly connected genes that have at least one connection to the query stronger than the given confidence threshold. Finally, all links between the retrieved genes that are stronger than the threshold are added. Three parameters can be adjusted for this expansion: the confidence threshold, the number of most strongly connected genes to add, and the number of expansion steps to perform. If more than one expansion step is used (one step is the default), the genes retrieved in the previous iteration are used as the query set in the next iteration and the process is repeated. If 0 expansion steps are selected, only links between the query genes are retrieved and no genes are added.
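The retrieval described above can be sketched as follows. This is an illustrative reading of the description, not the server's code: the function name, the dictionary layout of the network, and the default parameter values are assumptions.

```python
def expand_subnetwork(links, query, threshold=0.8, max_genes=15, depth=1):
    """Retrieve a sub-network around the query genes.

    links maps frozenset({gene_a, gene_b}) to a confidence score.
    Returns (selected genes, links among them above the threshold).
    """
    selected = set(query)
    frontier = set(query)
    for _ in range(depth):
        # Strongest above-threshold link from each outside gene to the frontier.
        best = {}
        for pair, score in links.items():
            if score <= threshold or len(pair) != 2:
                continue
            a, b = tuple(pair)
            for inside, outside in ((a, b), (b, a)):
                if inside in frontier and outside not in selected:
                    best[outside] = max(best.get(outside, 0.0), score)
        # Keep the max_genes candidates with the strongest link.
        top = sorted(best, key=lambda g: (-best[g], g))[:max_genes]
        if not top:
            break
        selected.update(top)
        frontier = set(top)  # genes added in this step seed the next one
    # Finally add every above-threshold link among the selected genes.
    sub_links = {p: s for p, s in links.items() if s > threshold and p <= selected}
    return selected, sub_links


network = {  # hypothetical confidences
    frozenset({"Q", "A"}): 0.9,
    frozenset({"Q", "B"}): 0.85,
    frozenset({"Q", "C"}): 0.5,
    frozenset({"A", "B"}): 0.95,
    frozenset({"A", "D"}): 0.99,
}
genes, sub = expand_subnetwork(network, ["Q"], threshold=0.8, max_genes=2, depth=2)
```

With depth set to 0 the loop is skipped, so only links between the query genes themselves are returned.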
There are 3 different algorithms to expand the network, which differ in how multiple query genes are handled. The simplest algorithm retrieves the X strongest interactors of any of the query genes; only the strongest link to a query gene counts. If common neighbors are prioritized, all links to all query genes are considered and genes that are strongly linked to many query genes are prioritized. The third option is to treat the query genes as independent and retrieve the X strongest interactors for every query gene.
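The difference between the three strategies can be illustrated with a small ranking function. The mode names are ours, and scoring common neighbors by the sum of their link confidences is an assumption; the exact prioritization used by FunCoup is not specified here.

```python
def neighbor_scores(links, query):
    """Collect, per non-query gene, its link scores to each query gene."""
    query = set(query)
    scores = {}
    for pair, score in links.items():
        if len(pair) != 2:
            continue
        a, b = tuple(pair)
        for q, other in ((a, b), (b, a)):
            if q in query and other not in query:
                scores.setdefault(other, {})[q] = score
    return scores


def top_interactors(links, query, x, mode="any"):
    """Pick interactors of the query genes.

    mode "any": rank by the single strongest link to any query gene.
    mode "common": rank by the summed links to all query genes (assumption).
    mode "independent": take the top x per query gene (returned sorted by name).
    """
    scores = neighbor_scores(links, query)
    if mode == "independent":
        chosen = set()
        for q in query:
            linked = [g for g in scores if q in scores[g]]
            linked.sort(key=lambda g: (-scores[g][q], g))
            chosen.update(linked[:x])
        return sorted(chosen)
    if mode == "common":
        key = lambda g: (-sum(scores[g].values()), g)
    else:
        key = lambda g: (-max(scores[g].values()), g)
    return sorted(scores, key=key)[:x]


network = {  # hypothetical confidences
    frozenset({"Q1", "A"}): 0.9,
    frozenset({"Q1", "B"}): 0.6,
    frozenset({"Q2", "B"}): 0.6,
    frozenset({"Q2", "C"}): 0.7,
}
```

In this toy network, gene B is only moderately linked to each query gene but is linked to both, so it wins only when common neighbors are prioritized.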
The focus of FunCoup is the prediction of novel couplings, but known coupled genes that are part of the PPI or Complex gold standard can be added.
The next advanced search option tab allows you to run a comparative query across the networks of multiple species. This query retrieves the orthologs of the query genes and the sub-networks around them that maximize the number of conserved links. If the checkbox at the bottom of the tab is checked, the search requires sufficient species-specific evidence (determined by the automatically lowered threshold in the sub-network tab). If this option is not checked, orthology transfer might lead to spurious sub-network conservation.
The next tab allows you to restrict the search to a specific functional coupling class; by default the search operates on the strongest coupling class for each link. Furthermore, it is possible to require sufficient evidence from a subset of the evidence types or from a subset of species. Note, however, that the displayed sub-network will always show all classes, species, and evidence types.
The last advanced search option tab allows you to restrict the gene set from which the subnetwork is drawn, either to a user-defined set of genes or to genes with a given annotation.
It is possible to combine search options from different tabs whenever this is sensible.
The MaxLink search provides an alternative to the standard search. MaxLink has been successfully applied to predict novel cancer genes and was first described in Network-based Identification of Novel Cancer Genes (Östlund 2010). It is meant to be used with a long list of related query genes and retrieves genes that are significantly more strongly connected to the query than expected by chance.
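One way to quantify "more connected than expected by chance" is a hypergeometric tail probability: given a candidate gene with a certain number of links, how likely is it that at least k of them hit the query set? The sketch below illustrates that idea only; whether MaxLink uses exactly this test and these parameters is an assumption, so refer to the MaxLink publication for the actual procedure.

```python
from math import comb


def hypergeom_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws).

    population: number of possible link partners of the candidate gene
    successes:  number of query genes among them
    draws:      number of links (degree) of the candidate gene
    k:          observed number of links to query genes
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, upper + 1)
    )
    return favourable / total


# Hypothetical candidate: 4 links, 3 of them to a query set of 3 genes,
# in a network with 10 possible partners.
p_value = hypergeom_tail(3, population=10, successes=3, draws=4)
```

A small tail probability means the candidate has more links to the query set than its overall degree would suggest.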
The network view displays the retrieved sub-network as a graph. Please note that the displayed network includes only the strongest links between the non-query subnetwork genes.
By default the viewer shows a summary network with the links from the strongest coupling class for each gene pair. The menu box on the left is grouped into three sections: Info, Nodes and Links. The Nodes and Links sections have various options to manipulate the network. The Info section displays additional information about a node or a link when the user hovers over it; otherwise the total number of genes and links within the subnetwork is shown.
Within the Nodes section the user can vary the node Label and node Size, highlight a Pathway, or adjust the node Charge.
Label: The default node label is the query identifier, but it can be set to the UniProt, Ensembl or NCBI ID. The label can also display the species name or node degree or, if set to none, hide all labels.
Size: Node sizes scale with node degree to emphasize gene importance. This can be changed to scale with the number of pathways a gene participates in, or not to scale at all if set to none.
Pathway: This option is disabled by default. If a pathway is chosen, the viewer highlights participating nodes in black.
Charge: This slider alters the tension between the nodes.
The Links section contains three options: Evidence source, Min confidence and Link distance.
Evidence source: By default, a link represents the functional association inferred using all gold standards.
The interactions view lists all interactions between subnetwork genes and shows details about how the links have been derived. The query genes are highlighted in yellow; query orthologs are highlighted in green. Clicking on a gene identifier will bring up a box that allows you to use the gene or the pair as a query, or to add the gene to the current query. Furthermore, cross-references and the gene description are given. The green and red boxes represent positive and negative LLRs for the different evidence types and species; hovering with the cursor over a box will display the LLR. Known coupled pairs in the PPI or Complex gold standard are highlighted with a blue box. Initially only the strongest coupling class for each pair is shown. Clicking on the little triangle in front of the interaction partner will expand all other coupling classes.
Clicking on the green or red boxes or on the info symbol on the right will bring up a box displaying all evidence that led to the prediction. A green or red box shows whether the evidence is positive or negative; hovering over the box will display the LLR. Next to the box is a description of the evidence with cross-links to data sources. The following two columns show the species from which the evidence stems and the type of the evidence.
The interactors view gives an overview of all subnetwork genes. Query genes and other subnetwork genes are displayed in separate boxes. For each gene, the symbol/displayed identifier and the Ensembl gene ID are shown. A grey circle in front of the identifier shows the degree of the gene in the sub-network; clicking on the circle will highlight all connected genes. A network symbol to the left of the circle allows you to use the gene as a query or to add it to the current query. The plus button shows cross-references for the gene. If the results of a MaxLink search are shown, the number of links to the query (MaxLink score) and the significance of the hit are shown in separate columns.
Enriched terms in the subnetwork are shown below or next to the gene list. The individual terms can be collapsed, and clicking on the number of annotated terms will highlight all genes that are annotated with the term, as shown in the picture below for "chromatin modification".
The save view allows you to download the query subnetwork either in XML format or as a tab-separated values (TSV) file. TSV is a de facto standard for network data and can, among other things, be imported into Cytoscape. The different columns are: the confidence score, the FBS score, the gene pair, FBS scores for the 4 different coupling classes, LLRs for the different evidence types, LLRs for the species, and the class with the strongest coupling. The links in the network file correspond to the strongest coupling class. For a comparative query, the networks for the different species are given in separate TSV files.
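A downloaded TSV file can be processed with standard tools. The sketch below assumes the file starts with a header row and filters rows on a score column; the column names in the sample ("confidence", "gene_a", "gene_b") are hypothetical placeholders, so check the header of the actual download and pass the real column name.

```python
import csv
import io


def read_links(tsv_text, score_column, min_score=0.0):
    """Parse TSV text with a header row; keep rows whose score passes the cut."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if float(row[score_column]) >= min_score]


# Hypothetical sample with made-up column names and values.
sample = "confidence\tgene_a\tgene_b\n0.95\tA\tB\n0.40\tA\tC\n"
strong = read_links(sample, "confidence", min_score=0.8)
```

For a file on disk, read its contents first (for example with `open(path, encoding="utf-8").read()`) and pass the text to the function.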
The modify search view brings back the current search and allows you to change the keywords or to review and modify the parameters.
Please cite the latest paper from here if you are using the database or if you want to refer to the FunCoup algorithm. If you are using MaxLink, please also cite Network-based Identification of Novel Cancer Genes (Östlund 2010).
By default the FunCoup search returns the subnetwork of your query and the genes most strongly coupled to your query. If you are only interested in links between your query genes, go into the advanced search options and set the "Expansion depth" on the "Sub-network selection" tab to 0.
Have you tried lowering the confidence threshold?
It should be noted that the main objective of FunCoup is the prediction of
Creating a functional coupling network for a species requires a lot of data, including high-quality known couplings. The process involves a lot of manual work and is computationally demanding; we therefore focus on a small number of well-studied model organisms.