There are currently three ways to find the data you need on CGC
Please read tutorial here
Seven Bridges’ SPARQL console, available at https://opensparql.sbgenomics.com.
Please read following tutorials first
Here let me show you an example here, you will need R package “SPARQL”
library(SPARQL)
endpoint = "https://opensparql.sbgenomics.com/bigdata/namespace/tcga_metadata_kb/sparql"
query = "
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix tcga: <https://www.sbgenomics.com/ontologies/2014/11/tcga#>
select distinct ?case ?sample ?file_name ?path ?xs_label ?subtype_label
where
{
?case a tcga:Case .
?case tcga:hasDiseaseType ?disease_type .
?disease_type rdfs:label 'Lung Adenocarcinoma' .
?case tcga:hasHistologicalDiagnosis ?hd .
?hd rdfs:label 'Lung Adenocarcinoma Mixed Subtype' .
?case tcga:hasFollowUp ?follow_up .
?follow_up tcga:hasDaysToLastFollowUp ?days_to_last_follow_up .
filter(?days_to_last_follow_up>550)
?follow_up tcga:hasVitalStatus ?vital_status .
?vital_status rdfs:label ?vital_status_label .
filter(?vital_status_label='Alive')
?case tcga:hasDrugTherapy ?drug_therapy .
?drug_therapy tcga:hasPharmaceuticalTherapyType ?pt_type .
?pt_type rdfs:label ?pt_type_label .
filter(?pt_type_label='Chemotherapy')
?case tcga:hasSample ?sample .
?sample tcga:hasSampleType ?st .
?st rdfs:label ?st_label
filter(?st_label='Primary Tumor')
?sample tcga:hasFile ?file .
?file rdfs:label ?file_name .
?file tcga:hasStoragePath ?path.
?file tcga:hasExperimentalStrategy ?xs.
?xs rdfs:label ?xs_label .
filter(?xs_label='WXS')
?file tcga:hasDataSubtype ?subtype .
?subtype rdfs:label ?subtype_label
}
"
qd <- SPARQL(endpoint,query)
df <- qd$results
head(df)
You can use the CGC API to access the TCGA files found using SPARQL queries. To get files that have download links, you will need to use the SPARQL variable path in your query.
## api(api_url=base,auth_token=auth_token,path='action/files/get_ids', method='POST',query=None,data=filelist)
df.path <- df[,"path"]
df.path
You can directly copy those files to a project, make sure if the files is controled access
library(sevenbridges)
a = Auth(platform = "cgc", username = "tengfei")
## get id (only works for CGC platform)
ids = a$get_id_from_path(df.path)
## copy file from id to project with controlled access
(p = a$project(id = "tengfei/control-test"))
a$copyFile(ids, p$id)
Now have fun with more examples in this tutorial