Search - Tencent Cloud Developer Community - Tencent Cloud

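Column: Bioinformatics Training Manual (生信修炼手册)
Working with BAM files using pysam
The pysam module wraps samtools and tabix, so the corresponding files can be read and manipulated from inside a Python program. Specifically, it supports the following 4 file types 1. Fasta/Fastq 2. VCF 3. Fasta and Fastq Fasta and Fastq are often referred to together as the fastx format; for reading, pysam provides the following interface >>> with pysam.FastxFile(fastq) as f: ... For fasta files that have a fai index, the bases of a given region can also be extracted with the fetch function, in which case the file is read as follows >>> import pysam >>> fasta = pysam.FastaFile('input.fasta BAM For BAM files, records are iterated over as follows >>> bam = pysam.AlignmentFile('input.bam') >>> for i in bam: ... Besides file access, samtools functionality can also be invoked: because pysam is a wrapper around samtools, the samtools subcommands are available in the module as functions, used as follows >>> print(pysam.view.usage
Published 2020-12-24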
Column: Bioinformatics, python, R, linux (生物信息学、python、R、linux)
pysam: a handy tool for processing BAM/SAM files in Python
In Python, BAM/SAM files can be read and processed with the pysam package. A brief introduction to using it follows. Reading a file import pysam samfile = pysam.AlignmentFile("ENCFF191HCE.sort.bam", "rb") Reading only the reads in a given region of a given chromosome: # MarkDuplicates.2'), ('XG', 0), ('NM', 0), ('XM', 0), ('XO', 0), ('XT', 'U')] Writing a file # write paired-end reads to a new file pairedreads = pysam.AlignmentFile ): if read.is_paired: pairedreads.write(read) pairedreads.close() samfile.close() sort pysam.sort For details, see the manual: https://buildmedia.readthedocs.org/media/pdf/pysam/v0.11.2.2/pysam.pdf
Published 2020-04-09
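The write step in the excerpt above (keep only reads with `read.is_paired` set and write them to a new file) can be sketched roughly as follows. This is a minimal sketch, not the article's full script: the function names `paired_only` and `write_paired_reads` and the `FakeRead` stand-in are assumptions made for illustration, and pysam is imported lazily so the filtering logic can be tried without a BAM file.

```python
from collections import namedtuple


def paired_only(reads):
    """Yield only the reads whose is_paired flag is set."""
    return (read for read in reads if read.is_paired)


def write_paired_reads(in_path, out_path):
    """Copy paired reads from one BAM file to another; return how many were written."""
    import pysam  # lazy import: paired_only() itself does not need pysam

    written = 0
    with pysam.AlignmentFile(in_path, "rb") as samfile, \
            pysam.AlignmentFile(out_path, "wb", template=samfile) as pairedreads:
        for read in paired_only(samfile):
            pairedreads.write(read)
            written += 1
    return written


# Hypothetical stand-in records, used only to exercise the filter without a BAM file.
FakeRead = namedtuple("FakeRead", "query_name is_paired")
demo = [FakeRead("r1", True), FakeRead("r2", False), FakeRead("r3", True)]
print([read.query_name for read in paired_only(demo)])
```

Using `with` blocks closes both files automatically, which replaces the explicit `pairedreads.close()` and `samfile.close()` calls shown in the excerpt.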
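Column: Genes in Brief (简说基因)
Bioinformatics basics: reading and writing genome files (pysam)
Using the reading and writing of Fasta/Fastq files as an example, this article introduces how to use Pysam; for a detailed tutorial, see the official site. Install pip install pysam or conda install pysam Fasta files Fasta files support random access, provided that a faidx index has been created first. import pysam # Build a FastaFile object; random access needs a faidx index, which is created automatically here if it does not exist fa = pysam.FastaFile("ex1.fa") # Fasta import pysam with pysam.FastxFile("ex1.fa") as fh: for record in fh: print(record.name) References [1] Pysam: https://pysam.readthedocs.io/en/latest/index.html [2] htslib: http://www.htslib.org/
Published 2020-11-19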
Column: Xiaoming's Data Analysis Notebook (小明的数据分析笔记本)
Learning Python: counting spliced-alignment reads in a BAM file with the pysam module
what-is-a-cigar So a read with a spliced alignment has an N in the middle of its CIGAR string, and it is enough to count on the CIGAR string. Python's pysam module can count all reads in a given interval and can also report properties of each read import pysam bamfile = pysam.AlignmentFile(".. There is a lot more to explore. Combining this with a gtf file to count the spliced-alignment reads within each gene interval import argparse import pysam (".")[0] Sam = args.bam.split("/")[-2] new_df = df.loc[df['chromosome'] == chromo_name] bamfile = pysam.AlignmentFile Only spliced alignments in read 1 are counted here. For paired-end sequencing data, pysam counts each pair as 2 reads, read1 and read2 How to run the script python stat_spliced_junction_read_orientation.py
Edited 2023-01-06
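The idea in the excerpt above (a spliced read has an `N` operation in its CIGAR, and only read 1 is counted) can be sketched as below. This is an illustration under stated assumptions rather than the article's script: the function names and the `FakeRead` stand-in are hypothetical, and `count_spliced_in_region` assumes a coordinate-sorted, indexed BAM file because `fetch` needs an index. For single-end data the `is_read1` test would have to be dropped, since that flag is not set there.

```python
from collections import namedtuple

CIGAR_REF_SKIP = 3  # numeric BAM code for the CIGAR operation 'N' (skipped reference region)


def is_spliced(read):
    """Return True if the read's CIGAR contains at least one N operation."""
    cigar = read.cigartuples  # list of (operation, length) pairs, or None if unaligned
    return bool(cigar) and any(op == CIGAR_REF_SKIP for op, _ in cigar)


def count_spliced_read1(reads):
    """Count spliced alignments, considering only read 1 of each pair."""
    return sum(1 for read in reads if read.is_read1 and is_spliced(read))


def count_spliced_in_region(bam_path, contig, start, end):
    """Count spliced read-1 alignments overlapping contig:start-end in an indexed BAM."""
    import pysam  # lazy import so the counting logic can be tested without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bamfile:
        return count_spliced_read1(bamfile.fetch(contig, start, end))


# Hypothetical stand-in reads: (0, n) is a match block 'M', (3, n) is a skip 'N'.
FakeRead = namedtuple("FakeRead", "cigartuples is_read1")
demo = [
    FakeRead([(0, 50), (3, 1000), (0, 50)], True),   # spliced, read 1: counted
    FakeRead([(0, 50), (3, 1000), (0, 50)], False),  # spliced, read 2: skipped
    FakeRead([(0, 100)], True),                      # unspliced
    FakeRead(None, True),                            # no CIGAR (unaligned)
]
print(count_spliced_read1(demo))
```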
Column: Bioinformatics Skill Tree (生信技能树)
pip may install dependencies more than once when installing Python packages
subprocess32, setuptools, kiwisolver, pytz, python-dateutil, pyparsing, matplotlib, pandas, pyfaidx, pysam matplotlib-2.2.2 numpy-1.14.5 pandas-0.23.3 pillow-5.2.0 pip-10.0.1 pyfaidx-0.5.4.1 pyparsing-2.2.0 pysam 1.1.0 setuptools-40.0.0 six-1.11.0 subprocess32-3.5.2 The following will then be installed: Installing collected packages: numpy, pysam , HTSeq Successfully installed HTSeq-0.10.0 numpy-1.14.5 pysam-0.14.1 Clearly, pysam-0.14.1 was installed twice.
Published 2018-07-27
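Column: Genes in Brief (简说基因)
Bioinformatics algorithms in Python | Rosalind problem notes: 005 Computing GC content
TGGGAACCTGCGGGCAGTAGGTGGAAT Sample output Rosalind_0808 60.919540 Python implementation Computing_GC_Content.py import sys import pysam return (s.count('G') + s.count('C')) * 100 / len(s) def max_gc_content(infasta): dna = {} with pysam.FastxFile item = max_gc_content('rosalind_gc.txt') print(item[0]) print(gc_content(item)) Key points of this problem: read the Fasta file with pysam and store it in a dictionary (for detailed usage see: reading and writing genome files (pysam)); and use the max function, in particular by building a key function and passing it in, which is the crux of the solution. GC content itself is easy to understand.
Published 2020-12-14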
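The two points named in the excerpt above (load the records into a dictionary, then call `max` with a key function) can be sketched as follows. This is a minimal sketch and not the article's code: `max_gc_content` here takes an iterable of `(name, sequence)` pairs rather than a file name, `read_fasta` is a hypothetical helper, and the demo records are made up.

```python
def gc_content(seq):
    """Return the GC percentage of a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) * 100 / len(seq)


def max_gc_content(records):
    """Return (name, gc_percent) for the record with the highest GC content.

    records is any iterable of (name, sequence) pairs.
    """
    dna = dict(records)
    # The key function makes max() compare records by GC content, not by name.
    best = max(dna, key=lambda name: gc_content(dna[name]))
    return best, gc_content(dna[best])


def read_fasta(path):
    """Yield (name, sequence) pairs from a Fasta file using pysam.FastxFile."""
    import pysam  # lazy import so the functions above work without pysam

    with pysam.FastxFile(path) as fh:
        for record in fh:
            yield record.name, record.sequence


# Made-up records for a quick check.
demo = [("seq_a", "ATAT"), ("seq_b", "GGCA"), ("seq_c", "GCGC")]
name, percent = max_gc_content(demo)
print(name, f"{percent:.6f}")
```

With a real file, `max_gc_content(read_fasta(path))` would combine the two pieces.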
Column: Bioinformatics, python, R, linux (生物信息学、python、R、linux)
Splitting a 10X BAM file by barcode
## OUTPUT: .bam file for each unique barcode, best to make a new directory ### Python 3.6.8 import pysam hold barcode index CB_hold = 'unset' itr = 0 # read in unsplit file and loop reads by line samfile = pysam.AlignmentFile =0): split_file.close() CB_hold = CB_itr itr+=1 split_file = pysam.AlignmentFile
Published 2020-08-12
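The excerpt above appears to walk through the reads in order and open a new output file whenever the `CB` barcode changes. A rough sketch of that pattern with `itertools.groupby` is given below. It is not the original script: the function names, the `CB_<barcode>.bam` file naming, and the `FakeRead` class are assumptions, and the approach only gives one file per barcode if the input is already sorted by the `CB` tag (otherwise a later run of the same barcode would overwrite the earlier file).

```python
import os
from itertools import groupby


def runs_by_barcode(reads):
    """Yield (barcode, reads) for each run of consecutive reads sharing a CB tag.

    Reads without a CB tag are skipped.
    """
    tagged = (read for read in reads if read.has_tag("CB"))
    for barcode, run in groupby(tagged, key=lambda read: read.get_tag("CB")):
        yield barcode, list(run)


def split_bam_by_barcode(in_path, out_dir):
    """Write one BAM file per barcode run into out_dir (input sorted by CB tag)."""
    import pysam  # lazy import so runs_by_barcode() can be tested without pysam

    with pysam.AlignmentFile(in_path, "rb") as samfile:
        for barcode, run in runs_by_barcode(samfile):
            out_path = os.path.join(out_dir, f"CB_{barcode}.bam")  # hypothetical naming
            with pysam.AlignmentFile(out_path, "wb", template=samfile) as split_file:
                for read in run:
                    split_file.write(read)


class FakeRead:
    """Hypothetical stand-in exposing the two tag methods used above."""

    def __init__(self, **tags):
        self.tags = tags

    def has_tag(self, name):
        return name in self.tags

    def get_tag(self, name):
        return self.tags[name]


demo = [FakeRead(CB="AAAC-1"), FakeRead(CB="AAAC-1"), FakeRead(CB="AAAG-1"),
        FakeRead(), FakeRead(CB="AAAG-1")]
print([(barcode, len(run)) for barcode, run in runs_by_barcode(demo)])
```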
Column: Genes in Brief (简说基因)
Bioinformatics algorithms in Python | Rosalind problem notes: 010 Computing a DNA consensus sequence
2 0 6 1 G: 1 1 6 3 0 1 0 0 T: 1 5 0 0 0 1 1 6 Python implementation Consensus_and_Profile.py import sys import pysam def consensus(infasta): # Create profile matrix base = 'ACGT' profile = [] with pysam.FastxFile
Published 2020-12-15
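The profile matrix and consensus named in the excerpt above can be sketched as below. The excerpt's own function is truncated, so this is an independent minimal version under assumptions: it takes a list of equal-length strings instead of a Fasta file, the function names are hypothetical, and ties in a column are broken in favour of the base that comes first in `ACGT`.

```python
def profile_matrix(seqs, bases="ACGT"):
    """Return {base: [count of that base in each column]} for equal-length sequences."""
    if len({len(seq) for seq in seqs}) > 1:
        raise ValueError("all sequences must have the same length")
    columns = list(zip(*seqs))  # one tuple of characters per alignment column
    return {base: [column.count(base) for column in columns] for base in bases}


def consensus(profile):
    """Return the consensus string: the most frequent base in each column."""
    bases = list(profile)
    length = len(profile[bases[0]])
    # max() returns the first maximal base, so ties follow the order of `bases`.
    return "".join(max(bases, key=lambda base: profile[base][i]) for i in range(length))


# Made-up input for a quick check.
demo = ["ACGT", "ACGA", "TCGA"]
profile = profile_matrix(demo)
print(consensus(profile))
for base, counts in profile.items():
    print(f"{base}:", *counts)
```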
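Column: Bioinformatics Skill Tree (生信技能树)
Reflections on installation errors with CPAT and CPC2
uninstall python # remove python 3.10.4, which does not meet the requirements $ conda install python=3.7 # install python 3.7 explicitly via conda $ cpat.py -h # fails because pysam is missing A search showed that pysam is a Python package and turned up the command to install it. $ conda install -c bioconda pysam # install pysam via conda $ cpat.py -h # the help text prints successfully.
Edited 2022-06-08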
Column: Genes in Brief (简说基因)
Bioinformatics algorithms in Python | Rosalind problem notes: 011 Six-frame translation of DNA
MLLGSFRLIPKETLIQVAGSSPCNLS M MGMTPRLGLESLLE MTPRLGLESLLE Python implementation Open_Reading_Frames.py import sys import pysam main__': if not test(): print("six_frame_translate: Failed") sys.exit(1) with pysam.FastxFile
Published 2020-12-15
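Six-frame translation, as named in the title above, means translating the three reading frames of the sequence and the three of its reverse complement. A minimal sketch follows, assuming the standard genetic code and plain string input; the excerpt only shows the name `six_frame_translate`, so the body here and the helper `open_reading_frames` are illustrative, not the article's code. Codons containing characters other than A, C, G, T are rendered as `X`.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translate(dna):
    """Return the 6 translations: forward frames 0-2, then reverse-complement frames 0-2."""
    dna = dna.upper()
    return [translate(strand[offset:])
            for strand in (dna, reverse_complement(dna))
            for offset in range(3)]


def open_reading_frames(dna):
    """Return the distinct peptides that run from an M to the next stop in any frame."""
    proteins = set()
    for peptide in six_frame_translate(dna):
        for start, amino_acid in enumerate(peptide):
            if amino_acid != "M":
                continue
            stop = peptide.find("*", start)
            if stop != -1:
                proteins.add(peptide[start:stop])
    return proteins


print(six_frame_translate("ATGGCCTAA"))
print(open_reading_frames("ATGGCCTAA"))
```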
Column: Beiye Chagangzi's column (北野茶缸子的专栏)
workflow03: building an alignment and variant-calling pipeline with snakemake
plot-quals.py" The py script is as follows: import matplotlib matplotlib.use("Agg") import matplotlib.pyplot as plt from pysam ├── A.fastq │ ├── B.fastq │ └── C.fastq Download: mamba create -n snakemake_wes_simple -y pysam py script: import matplotlib matplotlib.use("Agg") import matplotlib.pyplot as plt from pysam import VariantFile
Edited 2022-07-07
Column: Single-Cell World (单细胞天地)
The correct way to install velocyto
install python mamba install numpy scipy cython numba matplotlib scikit-learn h5py click mamba install pysam Verifying transaction: done Executing transaction: done (pyvelo) rstudio ~ 3. Then install the dependencies $ mamba install pysam 00s) No change Transaction Prefix: /home/rstudio/miniconda3/envs/pyvelo Updating specs: - pysam download: 3 MB ───────────────────────────────────────────────────────────────────────── Finished pysam tar.gz (41 kB) |################################| 41 kB 376 kB/s Requirement already satisfied: pysam
Edited 2022-01-10
Column: Tech Reporter (科技记者)
A viral metagenome binning tool: vRhyme tutorial
Create the conda virtual environment conda create -c bioconda -n vRhyme python=3 networkx pandas numpy numba scikit-learn pysam Success (v1.0.2) numpy: Success (v1.21.5) numba: Success (v0.56.4) pandas: Success (v1.3.5) pysam version >= 1.0.0), Numpy (version >= 1.17.0), Scikit-learn (version >= 0.23.0), Numba (version >= 0.50.0), PySam
Edited 2025-02-28
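Column: Bioinformatics Training Manual (生信修炼手册)
Life is short, I use Python
The most frequent use cases in practice: using the built-in standard library (os, sys, and so on); scientific computing modules (numpy, scipy, and others); data visualization (matplotlib, seaborn, and others); and learning bioinformatics-specific modules such as biopython, pysam
Published 2020-05-07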
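Column: Zuotuya (作图丫)
Differential alternative splicing analysis and visualization with rMATS
The following software must be installed in advance: 4) Install Python 2.7.x and corresponding versions of NumPy and SciPy 5) Download and install pysam NumPy, SciPy, pysam The installation of Python modules is described below using python setup.py install (the setuptools module must be installed beforehand). Both NumPy and pysam can be installed this way.
Edited 2022-03-29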
Column: EMQ IoT (EMQ 物联网)
Virtual environment support for Python plugins
{ "version": "v1.0.0", "language": "python", "executable": "pysam.py", "virtualEnvType
Edited 2023-03-07
Column: Xiaoming's Data Analysis Notebook (小明的数据分析笔记本)
Learning data analysis from Bioinformatics: visualizing genome-wide tandem repeats with StainedGlass
under R.yaml - pandas - numpy - numba - cooler - minimap2==2.18 - bedtools - samtools>=1.9 - pysam
Edited 2023-08-23
Column: Bioinformatics Training Manual (生信修炼手册)
HiC-Pro: a detailed hands-on guide
python yum install -y gcc gcc-c++ make yum install -y python2 python-devel python2-pip pip install pysam
Published 2019-12-19
Pre-class preparation: microbe detection and analysis in spatial transcriptomics
python3 import pysam import sys import gzip def read_cell_names1(pathseq_bam_file, write_bac): seqbam = pysam.AlignmentFile white_list: each_line = each_line.rstrip('\n') white_list_set.add(each_line) seqbam = pysam.AlignmentFile out_readname_cell_path,'w') unmap_cbub_fasta = open(unmap_cbub_fasta_file,'w') unmap_cbub_bam = pysam.AlignmentFile
Edited 2024-07-18
Column: Bioinformatics Training Manual (生信修炼手册)
Identifying circular RNAs with find_circ
github.com/marvin-jens/find_circ/archive/v1.2.tar.gz tar xzvf v1.2.tar.gz Note that this software was written in python2 syntax and depends on pysam
Published 2019-12-19
