Publications by Nguyen Ngoc Thieu

Fix DADA2 Errors in R

05.12.2024

DADA2 is a QIIME 2 plugin needed for targeted-gene metagenomic data analysis (16S sequencing). I get an error when running DADA2. The error message is: “An error was encountered while running DADA2 in R”. I want to run this code: system("qiime dada2 denoise-single \ --i-demultiplexed-seqs demux.qza \ --p-trim-left 13 \ --p-trunc-len ...

1439 sym R (722 sym/6 pcs)

ChIP-seq Data Analysis-Part 2-Visualizing Using Genome Browsers and R

25.11.2024

R packages needed for visualizing ChIP-seq data library(clusterProfiler) library(ChIPseeker) library(TxDb.Hsapiens.UCSC.hg19.knownGene) library(EnsDb.Hsapiens.v75) library(AnnotationDbi) library(org.Hs.eg.db) library(dplyr) library(ComplexHeatmap) library(circlize) library(GenomicRanges) library(GenomicFeatures) 1. Convert files for visualizing T...

1620 sym R (10958 sym/20 pcs) 1 img

ChIP-seq Data Analysis - Part 1

25.11.2024

1. Introduction ChIP-Seq (Chromatin Immunoprecipitation followed by Sequencing) data analysis aims to identify and understand the binding sites of DNA-associated proteins (such as transcription factors, histones, or chromatin remodelers) across the genome. This information is crucial for studying gene regulation, chromatin states, and epigenetic me...

2992 sym

R Packages for Bioinformatics

20.11.2024

R 4.4.3 is needed. Reference: https://compgenomr.github.io/book/software-information-and-conventions.html#packages-needed-to-run-the-book-code 1. Core Bioconductor packages and dependencies install.packages("BiocManager") # install the Bioconductor package "rtracklayer" BiocManager::install("rtracklayer") install.packages(c("tidyverse", "snpStats", "k...

516 sym Python (3368 sym/10 pcs)

HISAT2: Gene Expression Counts

17.11.2024

This post shows how to count gene expression using HISAT2. Loading and preparing data: the data I need are cfRDNA sequences downloaded from ENR Browser. SRR15852393.fastq SRR15852394.fastq SRR15852395.fastq The first step is to check the quality of the data: system("fastqc SRR15852393.fastq") Then I trim the data as: system(" java -jar /U...

1351 sym 3 img

Determine Fetal Gender Using cfDNA Sequences

15.11.2024

This post shows how to determine fetal gender using cfDNA, as in NIPT. 1. Download data Download open-access data from the NCBI: system("prefetch SRR31264073") The output file is SRR31264073.sralite. Convert SRR31264073.sralite to a SRR31264073.fastq.gz file: system("fasterq-dump --split-files ./SRR31264073/SRR31264073.sralite ") The output...

1186 sym

Somatic Variant Calling Using Mutect2 and GATK

05.11.2024

1. Data In this post I use fastq.gz files from normal duodenal tissue: HG008-N-D_CGGACAAC-AATCCGGA_H3LLJDSXC_L001_001.R1.fastq.gz and HG008-N-D_CGGACAAC-AATCCGGA_H3LLJDSXC_L001_001.R2.fastq.gz, and from a tumour sample of the pancreas: HG008-T_TTCCTGTT-AAGATACT_HJVY2DSX7_L001_001.R1.fastq.gz and HG008-T_TTCCTGTT-AAGATACT_HJVY2DSX7_L001_001.R2.fastq.gz syste...

2042 sym Python (8612 sym/20 pcs) 2 img

Variant Calling Using GATK for One Paired-End Sample

02.11.2024

1. Data download Script to call germline variants in human WGS paired-end reads (2 x 100 bp), following the GATK4 best practices workflow - https://gatk.broadinstitute.org/hc/en-us/articles/360035535932-Germline-short-variant-discovery-SNPs-Indels- directories: ref="/Users/nnthieu/vcfgatk/refgenome/hg38.fa" known_sites="/Users/nnthieu/vcfgatk/refvc...

1918 sym 1 tbl

Variant Calling Using GATK

23.10.2024

This post shows the necessary steps to download and process data in order to call variants from sequencing FASTQ files mapped to a reference genome. 1. Acquiring the raw data in FASTQ files I will use human raw data from the 1000 Genomes Project to demonstrate variant calling using GATK4 best practices. These fastq files that I need to download ...

4530 sym Python (10123 sym/43 pcs) 10 img

Read Sequence Alignment-Mapping Using BWA, Bowtie2

11.10.2024

Except for sequencing applications that use de novo genome assembly, read alignment/mapping to a reference genome is the most fundamental step in the sequencing data analysis workflow. # setwd("/Users/nnthieu/refgenome") Download data system("mkdir data") setwd("data") # each system() call runs in its own shell, so system("cd data") would not persist system("prefetch SRR769545") system("fasterq-dump SRR769545.sra"...

2466 sym