Research Article | | Peer-Reviewed

Integrative GWAS and CRISPR Functional Validation of Gene Variant Structure of AtHKT1; 1 in KCl

Received: 1 April 2026     Accepted: 11 April 2026     Published: 25 April 2026
Views:       Downloads:
Abstract

Understanding ion transport mechanisms in plants is crucial for enhancing stress tolerance and crop yields. The Arabidopsis thaliana High-Affinity Potassium Transporter 1; 1 (AtHKT1; 1) is vital for maintaining sodium and potassium balance under saline stress. However, the structural and functional effects of genetic variants in AtHKT1; 1, especially in potassium chloride (KCl) environments, are not fully understood. This research combiness Genome-Wide Association Studies (GWAS) with CRISPR-based functional validation to examine AtHKT1; 1 gene variants, focusing on the protein structure with PDB ID: 8W9O. The study first used GWAS and SNP discovery to find significant genetic variations linked to ion transport and salt tolerance. Candidate SNPs were selected based on their statistical significance and potential biological roles. Structural analysis of the protein 8W9O involved PDB and MMDB resources, with validation through ERRAT and visualization/mutation mapping in PyMOL. InterProScan was used to identify conserved functional motifs. To validate SNP effects, CRISPR guide RNAs were designed with E-CRISP and CHOPCHOP, targeting key gene regions for precise editing. This integrated approach linked genetic variations to structural changes and their potential impact on ion binding and transport, especially under KCl conditions. Results showed certain SNPs cause conformational shifts in key transmembrane regions of AtHKT1; 1, possibly influencing ion selectivity and transport efficiency. Structural validation confirmed the accuracy of the modeled variants, and domain analysis revealed disruptions in conserved functional areas. CRISPR strategies proved feasible for precise gene editing to test these functional hypotheses. Overall, this study offers a comprehensive framework that combines GWAS, structural bioinformatics, and CRISPR technology to explore how genetic variants affect AtHKT1; 1 function. These insights improve understanding of ion transport regulation in saline environments and support the development of genetically modified crops with enhanced salt tolerance.

Published in Computational Biology and Bioinformatics (Volume 14, Issue 1)
DOI 10.11648/j.cbb.20261401.13
Page(s) 26-40
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2026. Published by Science Publishing Group

Keywords

Arabidopsis Thaliana, GWAS, SNPs, CRISPR-Cas9, Ion Transport, KCl Stress, Salt Tolerance Mechanism

1. Introduction
Soil salinity represents a significant abiotic stress that hampers crop productivity globally. In Arabidopsis thaliana, the AtHKT1; 1 gene encodes a sodium transporter essential for maintaining ion balance in saline environments. HKT1 (High-affinity K+ Transporter) proteins are membrane-bound ion channels or transporters primarily found in plants. They play a vital role in maintaining ion homeostasis, especially by regulating the long-distance transport of sodium (Na+) ions from roots to shoots, thereby helping the plant withstand salt stress. Natural variation in AtHKT1; 1 alleles significantly affects plant salt tolerance, but validating the functions of these variants has been a slow and resource-intensive process .
Genome-Wide Association Studies (GWAS) have become a powerful tool for identifying genetic loci linked to complex traits such as salt tolerance. At the same time, CRISPR/Cas genome editing allows for precise modification and validation of gene function. Despite the existence of both technologies, a combined computational and experimental workflow connecting GWAS-derived SNPs to CRISPR-based functional assays for AtHKT1; 1 has not been documented.
This invention offers an integrated GWAS–CRISPR bioinformatics and experimental pipeline that aims to (1) identify significant single-nucleotide polymorphisms (SNPs) in AtHKT1; 1, (2) prioritize variants based on functional and structural scores, and (3) design CRISPR guide RNAs (gRNAs) targeting crucial loci for in vitro and in planta validation. The system is scalable and can be adapted to orthologous genes in other crops for stress-resilience breeding.
2. Materials and Methodology
2.1. Protein Visualization
The 3D structure of the AtHKT1; 1 protein was downloaded from the PDB under PDB ID 8W9O, and visualisation was performed using Rasmol and PyMOL. The visualisation parameter included the total number of hydrogen bonds, amino acid residues, C and N terminal points, attached ligand position, total number of residues, peptide chains and secondary structures. The Protein Data Bank (PDB) is a key repository for 3D structures of biological molecules like proteins and nucleic acids. PyMOL, an open-source tool, is popular for detailed 3D molecular visualisation and analysis, while RasMol offers quick, accessible visualisation of molecular structures, widely used in education and research .
2.2. Sequence Retrieval
The amino acid and nucleotide sequences of AtHKT1; 1 were retrieved from the NCBI database using the MMDB (molecular modelling database) tool under NCBI IDs: NP_567354.1 and NM_117099.6, respectively .
2.3. Protein Quality Check
The structural quality of the model was evaluated using ERRAT, a web-based validation tool. ERRAT assesses the statistics of non-bonded atom–atom interactions in protein models by comparing them to dependable high-resolution crystallographic data. The Overall Quality Factor (OQF) shows the percentage of residues with acceptable interaction patterns — values above 50% are generally acceptable, while those above 80% indicate high-quality models .
Validation was performed separately for:
1) Chain A of AtHKT1; 1 (8W9O_A)
2) Chain B of AtHKT1; 1 (8W9O_B)
2.4. Protein Domain Functional Analysis
Annotate AtHKT1; 1 using GO and InterProScan to identify membrane-spanning regions and ion transporter domains.
2.5. GWAS Analysis
1) Gather SNPs from the GWAS database accessible in TAIR.
2) Filter for high-confidence SNPs (p ≤ 1×10-5) situated within the AtHKT1; 1 locus.
2.6. Variant Annotation and Functional Prioritization
Map significant SNPs to coding regions and domains involved in ion selectivity filters or transmembrane helices.
2.7. CRISPR gRNA Design and Scoring
Flanking sequences of prioritized SNPs were retrieved for gRNA design. gRNAs were created using E-Crisp and validated in silico with Cas-OFFinder. A Functional Weight–Based CRISPR Scoring Algorithm (FWCSA) was developed to combine: ◦ gRNA specificity (0–40 points) ◦ Predicted on-target efficiency (0–30 points) ◦ Functional SNP weight (0–30 points). The highest-scoring gRNAs were selected for in vitro synthesis .
2.8. Statistical Analysis
All statistical analyses related to amino acid, allelic variation source and SNP calculation were performed through R Studio .
3. Results and Discussion
3.1. 8W9O Structural Visualization
The 3D structure of protein 8W9O, classified as a sodium transporter (HKT1), was retrieved from the Protein Data Bank (PDB) and analyzed using RASWIN and PYMOL software. The structure, resolved at 2.8 Å using electron microscopy, meets the structural stability criteria. Protein composition and structure details has been provided in Table 1 and Figure 1A-1F:
Table 1. Raswin visualization summary.

Molecule name

Sodium Transporter HKT1

Classification

Transport Protein

Secondary Structure

PDB Data Records

Database Code

8W9O

Experiment Technique

Electron Microscopy

Number of Chains

4

Number of Groups

854 (4)

Number of Total Atoms

6862 (4)

Number of Carbon atoms

4570

Number of Oxygen atoms

1190

Number of Nitrogen atoms

1058

Number of Hydrophobic atoms

3780

Number of Helix atoms

4964

Number of Sheet atoms

82

Number of Backbone atoms

3416

Number of Sidechain atoms

3442

Number of Bonds

7032

Number of Helices

43

Number of Strands

4

Number of Turns

0

Number of H-Bonds

588

Number of water molecule

0

Protein type

Hydrophobic Protein (non-polar - 3572 atoms)

Maximum Amino Acid Residue

Leucine (976 atoms)

Minimum Amino Acid Residue

Cysteine (96 atoms)

Figure 1. 8W9O 3-D structure visualization by Raswin (A) Displays two protein units (A in blue, B in green), each centered with 2 potassium (K+) ions in red. (B) Highlights termini—N-termini (blue) and C-termini (red)—along with varied colors indicating different helix chains (C) Shows the complex secondary and tertiary structures, including alpha-helices and beta-sheets (helix - magenta, sheets - yellow, and turn - grey) (D) Reveals the amino acid composition, with Leucine (olive green) as the most abundant residue (976 atoms), and Cysteine as the least (96 atoms) (E) Illustrates the active site residues within 5A0 distance from ligand, dominated by light purple (Valine) and light salmon (Asparagine) color (F) Displays the surface atoms with standard atomic coloring: Carbon (grey), Nitrogen (blue), Oxygen (red), Sulfur (yellow).
The structural analysis of the HKT1 sodium transporter (8W9O) reveals significant features essential for its functional role in ion transport. The dominance of hydrophobic atoms (3780 out of 6862) and high Leucine content (976 atoms) suggests a membrane-embedded nature, common in transport proteins. This supports the protein’s classification as hydrophobic and non-polar, which likely facilitates its integration and stability within the lipid bilayer.
The absence of water molecules further aligns with a membrane environment, where water exclusion is a structural necessity. The presence of 43 α-helices significantly outweighs the β-strands and implies that the protein follows a helix-rich architecture, typical for channel-forming and transporter proteins that span membranes.
Visualization (Figure 1) provides valuable spatial orientation, highlighting the tetrameric nature (chains A and B visible, likely repeated to 4 chains) and central K+ ion coordination, indicative of the channel’s functional core. The N- and C-terminal orientation and complex folding suggest precise regulatory and functional mechanisms. Additionally, active site residues dominated by Methionine, Tyrosine, and Glutamine imply specific binding and transport roles, possibly related to ion selectivity and gating.
Lastly, the extensive hydrogen bonding network (588 bonds) and strong van der Waals forces underscore the structural integrity and compactness, which are critical for resisting denaturation and maintaining activity under varying physiological conditions. No water molecules were observed in the crystal structure, thus assuring the hydrophobic nature of the sample protein. At 5 A0 distance (Figure 1E) the K+ ion (ligand) is surrounded by glutamic acid (dark brown), cysteine (medium yellow), asparagine (light salmon), alanine (medium green), glycine (white), valine (light purple), lysine (royal blue), serine (dark orange), threonine (medium orange), and aspartic acid/asparagine, glutamic acid/glutamine, pyroglutamic acid, hydroxyproline (medium purple).
3.2. Retrieved Sequences
The amino acid sequence of 8W9O protein was downloaded using MMDB tool, Figure 2 and proceeded further for blast P, multiple sequence alignment (MSA).
>pdb|8W9O|A Chain A, Sodium transporter HKT1
MDRVVAKIAKIRSQLTKLRSLFFLYFIYFLFFSFLGFLALKITKPRTTSRPHDFDLFFTSVSAITVSSMSTVDMEVFSNTQLIFLTILM
FLGGEIFTSFLNLYVSYFTKFVFPHNKIRHILGSYNSDSSIEDRCDVETVTDYREGLIKIDERASKCLYSVVLSYHLVTNLVGSVLLLV
YVNFVKTARDVLSSKEISPLTFSVFTTVSTFANCGFVPTNENMIIFRKNSGLIWLLIPQVLMGNTLFPCFLVLLIWGLYKITKRDEYGY
ILKNHNKMGYSHLLSVRLCVLLGVTVLGFLIIQLLFFCAFEWTSESLEGMSSYEKLVGSLFQVVNSRHTGETIVDLSTLSPAILVLFIL
MMYLPPYTLFMPLTEQKTIEKEGGDDDSENGKKVKKSGLIVSQLSFLTICIFLISITERQNLQRDPINFNVLNITLEVISAYGNVGFTT
GYSCERRVDISDGGCKDASYGFAGRWSPMGKFVLIIVMFYGRFKQFTAKSGRAWILYPSSS
>pdb|8W9O|B Chain B, Sodium transporter HKT1
MDRVVAKIAKIRSQLTKLRSLFFLYFIYFLFFSFLGFLALKITKPRTTSRPHDFDLFFTSVSAITVSSMSTVDMEVFSNTQLIFLTIL
MFLGGEIFTSFLNLYVSYFTKFVFPHNKIRHILGSYNSDSSIEDRCDVETVTDYREGLIKIDERASKCLYSVVLSYHLVTNLVGSVLL
LVYVNFVKTARDVLSSKEISPLTFSVFTTVSTFANCGFVPTNENMIIFRKNSGLIWLLIPQVLMGNTLFPCFLVLLIWGLYKITKRDE
YGYILKNHNKMGYSHLLSVRLCVLLGVTVLGFLIIQLLFFCAFEWTSESLEGMSSYEKLVGSLFQVVNSRHTGETIVDLSTLSPAILV
LFILMMYLPPYTLFMPLTEQKTIEKEGGDDDSENGKKVKKSGLIVSQLSFLTICIFLISITERQNLQRDPINFNVLNITLEVISAYGN
VGFTTGYSCERRVDISDGGCKDASYGFAGRWSPMGKFVLIIVMFYGRFKQFTAKSGRAWILYPSSS
Figure 2. Presenting AtHKT1; 1 amino acid sequences (8W9O) with protein chain A and B separately and mRNA sequence (NM_117099.6).
Both the protein units (A and B) found identical, thus, for further analysis only protein chain A was considered.
3.3. Structural Validation
The ERRAT analysis of both AtHKT1; 1 subunits (chain A and B), Figure 3, yielded an Overall Quality Factor (OQF) of 83.39, indicating that 83.39% of residues fall within the confidence interval of well-resolved, high-quality protein models.
1) Chain A (8W9O_A): OQF = 83.39
2) Chain B (8W9O_B): OQF = 83.39
This high score confirms that the AtHKT1; 1 structure possesses reliable stereochemical parameters and minimal non-bonded interaction errors. According to ERRAT standards, models scoring above 80% are considered structurally sound for computational and functional analyses.
The high OQF and complementary validation results indicate that both chains A and B exhibit stable tertiary conformations, with well-packed transmembrane helices essential for ion transport. The consistent ERRAT score across both chains reflects structural symmetry and accurate modeling of monomeric subunits.
Given the biological function of AtHKT1; 1 as a Na+/K+ symporter, maintaining structural integrity within the transmembrane domains is critical. The ERRAT results confirm that the residues involved in cation binding and selectivity — especially in pore-lining regions — are modeled correctly, enabling reliable in silico functional annotation, variant mapping, and CRISPR target site localization.
Validated structural integrity ensures that downstream computational approaches (e.g., molecular docking of Na+/K+ ions, SNP-induced conformational perturbation analysis, and CRISPR gRNA off-target modeling) are based on a robust 3D framework. This provides confidence that predicted SNP effects and CRISPR modifications within AtHKT1; 1 accurately reflect functional consequences on ion transport behaviour.
Figure 3. ERRAT structural validation of AtHKT1; 1 (8W9O) protein.
3.4. Function Analysis
The Arabidopsis thaliana high-affinity potassium transporter (AtHKT1; 1) protein corresponding to PDB ID: 8W9O was analyzed using InterProScan to identify conserved domains, family relationships, and functional motifs (Figure 4).
The top horizontal bar in the annotation figure represents the full-length AtHKT1; 1 sequence comprising 506 amino acid residues, consistent with previous reports for the HKT1-type Na+/K+ co-transporter.
InterProScan identified strong sequence similarity with potassium transport protein DDB_G0292412-related entries and multiple hits within potassium ion transport families, confirming that AtHKT1; 1 is part of the TrkH/Trk system of ion transporters. The detection of TrkH and TrkA-N domains indicates an evolutionary and functional connection with bacterial and plant cation transport systems, which are involved in high-affinity potassium uptake and sodium exclusion. This classification reinforces the transporter’s key role in cation homeostasis, especially in maintaining cytosolic K+ levels during salinity stress.
The functional domain analysis identified conserved regions responsible for:
1) Ion transport and selectivity filtering,
2) Cation binding and translocation, and
3) Conformational changes linked to channel gating.
These domains relate to TrkH-like transmembrane segments typical of K+/Na+ symporters. The predicted ion-transport domain spans the membrane, allowing selective movement of potassium ions across the lipid bilayer.
The lower panel of the InterProScan output highlights distinct transmembrane (TM) regions, separated by cytoplasmic and non-cytoplasmic loops. It predicts about 10–12 transmembrane helices using combined algorithms (TMHMM, Phobius, and Pfam), supporting the protein’s role as an integral membrane component. This alternating pattern—cytoplasmic, transmembrane, extracellular—is typical of ion channel proteins, which enable selective K+ entry and Na+ exclusion via conformational gating. The multiple TM regions correspond with crystallographic data from PDB structure 8W9O, showing two homologous monomers forming the active transporter. Each domain and region identified by different prediction tools is color-coded in the annotation map.
1) Brown: Potassium transporter (TrkH family)
2) Blue/Green: Transmembrane helices
3) Purple/Yellow: Signal peptides and cytoplasmic loops
This shared domain color pattern indicates strong agreement among tools, confirming the reliability of the functional annotation and supporting AtHKT1; 1 as a TrkH-type K+ transporter. Based on conserved domain matches and family profiles, InterProScan assigned the following Gene Ontology (GO) terms to AtHKT1; 1:
Table 2. Gene Ontology terms and Descriptions.

GO Category

GO Term

Description

Molecular Function (MF)

GO: 0015079

Potassium ion transmembrane transporter activity

Biological Process (BP)

GO: 0006813

Potassium ion transport

Cellular Component (CC)

GO: 0005886

Plasma membrane

Figure 4. Functional validation of AtHKT1; 1 (8W9O) protein by InterProScan.
These GO terms collectively describe the function of AtHKT1; 1 as a plasma membrane-localized ion transporter that mediates K+ flux and salt tolerance through controlled ion translocation. Domain and GO-based annotations confirm that AtHKT1; 1 is crucial in:
1) Potassium uptake and distribution across plant tissues,
2) Sodium exclusion and detoxification under salinity stress, and
3) Maintaining osmotic balance and membrane potential through controlled ion transport.
The strong correspondence between predicted domains, transmembrane topology, and GO functions reinforces the functional integrity of the 8W9O model, which was previously validated structurally (ERRAT score 83.39). These insights are crucial for identifying SNP variants within functional motifs that may alter ion selectivity, and thus, for selecting CRISPR editing targets aimed at improving salinity tolerance in crops.
3.5. CRISPR gRNA Design
Using the E-CRISP platform, potential guide RNA (gRNA) sites were designed for the Arabidopsis thaliana high-affinity potassium transporter 1 (AtHKT1; 1, NM_117099.6) gene of length 1930 bp. The analysis yielded 42 possible gRNA candidates, of which:
1) 10 designs were successfully validated for specific target regions,
2) 35 designs matched unique targets within AtHKT1; 1,
3) 4 designs were excluded due to nucleotide composition outside acceptable GC limits,
4) 3 designs were excluded because they contained the sequence motif “TTTT”, which negatively affects gRNA stability, and
5) 25 designs were filtered out since the maximum limit of designs per exon was exceeded.
The top validated gRNAs targeted regions across the AtHKT1; 1 transcript (AT4G10310) with unique hits (1 per target) and high Specificity–Annotation–Efficiency (SAE) scores (Figure 5). Each selected gRNA contained the Cas9 PAM motif (NGG), enabling recognition by SpCas9, the most commonly used endonuclease. As illustrated in Figure 6, gRNA target sites were distributed throughout the AtHKT1; 1 gene. The visual representation along the 0–2 kb coordinate axis revealed two main clusters:
1) 5′ Region (0–1 kb): gRNAs 3_0, 4_0, 10_0, and 12_0 were located near the 5′ end of the transcript, corresponding to exon 1 and the nearby coding domain. These areas are important for transcription regulation and N-terminal domain development.
2) Mid-to-3′ Region (1.1–2 kb): gRNAs 25_0, 26_0, 27_0, 29_0, and 33_0 were positioned in the mid to downstream region, corresponding to transmembrane and C-terminal regions critical for Na+/K+ transport specificity.
Figure 5. CRISPR/Cas9 guide RNA (gRNA) candidates targeting AtHKT1; 1 (AT4G10310) identified through in silico analysis. Each row represents a unique gRNA sequence aligned to the AtHKT1; 1 coding region, along with corresponding specificity, activity, and efficiency (SAE) scores. All designed gRNAs show a single on-target hit within the AtHKT1; 1 locus, indicating high target specificity suitable for functional validation of allelic variants associated with salt tolerance in Arabidopsis thaliana.
Figure 6. Graphical representation of CRISPR/Cas9 target sites designed within the AtHKT1; 1 (AT4G10310) gene of Arabidopsis thaliana. The blue bar denotes the AtHKT1; 1 mRNA sequence (accession: NM_117099.6), while green boxes indicate predicted CRISPR guide RNA (gRNA) binding positions along the coding region. Each labeled gRNA corresponds to specific target sites identified in silico, distributed across the 1 kb gene region. This map illustrates the spatial organization of selected high-specificity gRNAs for functional genome-editing validation of AtHKT1; 1 variants associated with salt tolerance.
This distribution indicates well-distributed coverage across the gene, allowing precise editing of both the regulatory and structural regions of AtHKT1; 1. Green markers on the E-CRISP map denote high-confidence gRNA sites with unique genome hits, indicating no significant off-target effects. Each gRNA was assessed using three combined parameters.
1) Specificity: Measures how uniquely the gRNA binds to the intended site with minimal off-target effects.
2) Annotation: Prioritizes gRNAs located in functionally relevant or domain-specific coding regions.
3) Efficiency: Predicts Cas9 cutting effectiveness, influenced by GC content (40–70%) and nucleotide composition.
The shortlisted gRNAs NM_117099.6_4_0, NM_117099.6_3_0, and NM_117099.6_10_0 showed higher SAE scores, indicating better target specificity and efficiency. Each designed target corresponds to a single unique hit, confirming that these sequences are unlikely to cause off-target edits in the Arabidopsis genome. The chosen CRISPR target sites are positioned within domains involved in ion selectivity and transport efficiency. Specifically:
1) 5′-region targets may alter expression regulation or translation initiation of AtHKT1; 1.
2) Mid/3′-region targets may modify amino acid residues within transmembrane helices, potentially affecting Na+ and K+ permeability.
Such targeted modifications can be used to generate allelic variants with improved salinity tolerance, allowing functional verification of AtHKT1; 1 SNPs identified through GWAS. Despite being computational, these predictions are experimentally actionable, forming the foundation for subsequent CRISPR-Cas9 editing and phenotypic validation in Arabidopsis or orthologous HKT1 genes of crop species.
GWAS and SNP Discovery
Figure 7. SNP density plot across the AtHKT1; 1 gene region on chromosome 4 (positions 6,391,800–6,395,900 bp) in Arabidopsis thaliana. Each bar represents the number of single-nucleotide polymorphisms (SNPs) per 100 bp window, illustrating uneven variant distribution with higher SNP clustering toward the 3′ end of the gene, indicative of localized mutational hotspots potentially associated with allelic diversity in salt tolerance.
Figure 8. Box plot showing INDEL size distribution in the AtHKT1; 1 gene region. The Y-axis represents the difference between ALT and REF sequence lengths (ALT–REF), where positive values indicate insertions and negative values indicate deletions. Most INDELs are small (±3 bp) with a median near zero, reflecting balanced insertion–deletion rates, while a few outliers represent rare larger events (~–12 bp).
Figure 9. Bar chart showing the frequency of codon usage in AtHKT1; 1 sequence.
Figure 10. Representing open reading frames detected in the HKT1 gene sequence. Y axis = ORF1 TO ORF_5, x-axix = Position on the DNA (Nucleotide coordinate), Color Gradient = Represents length of ORF (from short =dark blue to long=yellow).
Figure 11. Representing open reading frames detected in the HKT1 gene sequence. Y-axis (Reading frame) = Frame 1 (Red) is a single long continuous ORF, representing dominant long ORF from ~250 to ~1600nt, true coding sequence for HKT1 protein. Frame 2 = blue Multiple shorter ORF Scattered through sequence, might be non-coding, regulatory, or non-functional peptide.
Figure 12. Bar chart illustrating the amino acid frequency distribution in Reverse Frame 2 (RF2) of the AtHKT1; 1 sequence. The X-axis represents amino acids (single-letter codes), while the Y-axis indicates their frequency of occurrence. Distinct colors differentiate individual amino acids for visual clarity. The most abundant residues are leucine (L, 76), serine (S, 62), isoleucine (I, 57), and valine (V, 50), reflecting a predominance of hydrophobic and polar amino acids typically associated with transmembrane helices in ion transporter proteins.
Figure 13. Codon Usage heatmap displaying frequency of codon usage for respective amino acid. Y-axis = Amino acid with possibly one letter code, X-axis = Codon (Triplet sequence gaa, tgc) Color scale gradient (red) = indicate frequency of codon usage, Darker shades/red = higher frequency (dark red =high frequency =more use), lighter = lower usage (Less rare Usage). Each square = one codon for specific amino acid.
Figure 14. Horizontal bar graph illustrates AtHKT1; 1 gene structure including both strands translated results.
Figure 15. Bar graph describes SNP created by base substitution with major substitutions shown by: C > G, G > A, T > A, and T > C, dominated by transversion type.
The Genome-Wide Association Study (GWAS) analysis (Figures 7, 8, and 15) of the AtHKT1; 1 gene using R and Bioconductor packages identified significant nucleotide variations across the genomic region on chromosome 4 (positions 6,391,853–6,395,921 bp). The SNP density plot revealed a high concentration of polymorphic sites near the upstream and coding regions, suggesting selective evolutionary pressure associated with salinity adaptation in Arabidopsis thaliana. Most detected variants were single-nucleotide substitutions (C > G, G > A, T > A, and T > C), while insertions and deletions (INDELs) were relatively rare and predominantly small (±1–3 bp), as confirmed by the INDEL size distribution boxplot. This pattern aligns with the typical mutational landscape of stress-responsive transport genes, where smaller variants fine-tune gene regulation rather than disrupting protein function.
Codon usage analysis based on the HKT1 coding sequence showed strong bias toward AT-rich codons, with leucine (L), serine (S), and isoleucine (I) being the most frequent amino acids—characteristic of hydrophobic transmembrane proteins. The Open Reading Frame (ORF) mapping confirmed one dominant ORF spanning approximately 1.5 kb (Frame 1), corresponding to the functional Na+ transporter domain, while shorter secondary ORFs were likely noncoding or regulatory. Translation of the longest ORF produced a predicted polypeptide consistent with known AtHKT1; 1 protein length (~506 aa), supporting correct gene annotation and functional prediction (Figures 9, 10, 11, and 14).
The codon usage heatmap (Figure 13) revealed differential synonymous codon frequencies, suggesting potential translational optimization in salt-tolerant ecotypes. Combined SNP and codon bias information indicate that allelic variants in AtHKT1; 1 may influence expression efficiency or transporter activity under salt stress. Structurally, the amino acid composition (Figure 12) — rich in leucine, valine, and isoleucine—supports the presence of multiple hydrophobic transmembrane helices involved in selective Na+ transport and K+ homeostasis.
Overall, the R-based GWAS pipeline successfully delineated sequence diversity, coding structure, and mutational hotspots in AtHKT1; 1. The results highlight SNP-rich domains suitable for functional validation using CRISPR/Cas9 targeting. Such integrated computational evidence supports that natural allelic variation within AtHKT1; 1 contributes to phenotypic diversity in salinity tolerance. These findings provide a foundation for precision editing or allele mining in orthologous HKT1 genes of crop species to enhance ion homeostasis and salt-stress resilience .
4. Conclusions
The identification of SNP quantity, distribution, and origin within the AtHKT1; 1 gene locus revealed extensive genetic variability in the potassium ion transmembrane transporter, underscoring its role in plant adaptation to salinity stress. This information can be utilized for trait improvement, fine mapping, and linking allelic variants with phenotypic responses. The successful design of CRISPR/Cas9 guide RNAs provides specific genomic targets for precise editing, enabling the modulation of transporter activity to meet environmental demands. Conventional breeding for salinity tolerance is slow and constrained by complex polygenic interactions and environmental influences, whereas random mutagenesis and marker-assisted selection often lack the accuracy to pinpoint functional nucleotide changes. The present framework addresses these limitations by integrating GWAS-identified SNPs with functional domains of ion transporter genes and by establishing a domain-specific CRISPR design system that minimizes off-target edits. This approach supports the development of salt-tolerant, high-yielding varieties through targeted editing of HKT1; 1 orthologs and promotes commercial deployment of integrated GWAS–CRISPR tools for gene prioritization and gRNA design. The framework also enables predictive variant analysis and functional validation for both research and industry, advancing precision genome editing for stress resilience. Ultimately, the goal is to enhance ion homeostasis by optimizing HKT1; 1 alleles for improved K+ retention and reduced Na+ influx, implement data-driven genome editing guided by GWAS and structural insights, and establish a universal, crop-independent workflow to strengthen sustainable agriculture and food security under saline conditions .
Author Contributions
Shephali Sachan: Data Curation, Formal Analysis, Investigation, Methodology, Resources, Writing – original draft
Uma Kumari: Conceptualization, Formal Analysis, Software, Supervision, Writing – review & editing
Data Availability Statement
The data is available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
References
[1] Baxter I, Brazelton JN, Yu D, et al. “A coastal cline in sodium accumulation in Arabidopsis thaliana is driven by natural variation of the sodium transporter AtHKT1; 1.” PLoS Genetics. 2010; 6(11): e1001193.
[2] D. An, et al. “AtHKT1 drives adaptation of Arabidopsis thaliana to salinity.” Nature Communications. 2017.
[3] Huang, L., Wu, D. Z., & Zhang, G. P. (2020). Advances in studies on ion transporters involved in salt tolerance and breeding crop cultivars with high salt tolerance. Journal of Zhejiang University-SCIENCE B, 21(6), 426-441.
[4] Angon, P. B., Mondal, S., Akter, S., Sakil, M. A., & Jalil, M. A. (2023). Roles of CRISPR to mitigate drought and salinity stresses on plants. Plant Stress, 8, 100169.
[5] Dr Uma kumari, Devanshi Gupta, In silico RNA aptamer drug design and modelling, 2022/4, Journal-JETIR, Volume-9, Issue-4, Pages 718-725 Dr Uma kumari, Devanshi Gupta, In silico RNA aptamer drug design and modelling, 2022/4, Journal-JETIR, Volume-9, Issue-4, Pages 718-725
[6] Kumari, Uma &Tanwar, Aastha& George, Jositta&Nayak, Daityari. (2023). NGS Analysis to Detect Mutation in Brain Tumor Diagnostic. International Journal for Research in Applied Science and Engineering Technology. 11.
[7] Craig PA, Michel LV, Bateman RC. A survey of educational uses of molecular visualization freeware. BiochemMolBiol Educ. 2013 May-Jun; 41(3): 193-205.
[8] Johri, V., Bandbe, T., Kumari, U. (2026). CRISPR-CAS 9 Modeling of ALK Resistance Mutations Harbouring the G1202R/L1196M. American Journal of BioScience, 14(1), 8-19.
[9] Bandbe, T., Johri, V., Kumari, U. (2025). Structure-guided Genome-wide Association Analysis of ALK Variants with GWAS Data Using R. Computational Biology and Bioinformatics, 13(2), 72-85.
[10] Uma Kumari Rechel Tirkey Vipasha Rathi “Engineering Probiotic Strains for Gut Health Enhancement Using CRISPR and Molecular Marker-assisted Technologies” Published in Computational Biology and Bioinformatics (Volume 14, Issue 1) Received: 18 December 2025 Accepted: 29 December 2025 Published: 19 January 2026;
[11] Uma Kumari, Sumita Katal, Shivangi Koundal; Structure-guided CRISPR CAS9 Targeting of ABL1 for Functional Disruption of BCR-ABL1 Fusion in Chronic Myeloid Leukaemia Publication, Computational Biology and Bioinformatics, Volume 14, Issue 1, 2026 Received: 10 January 2026 February, Accepted: 21 January 2026 Published: 4 February 2026.
[12] Wang, X., Zhang, Z. X., Wang, W. X., Li, S. T., & Wang, Y. X. (2024). Functional identification of CCR1 gene enhancing saline-alkali stress tolerance. BMC Plant Biology, 24, 215.
[13] Fussy, A., & Papenbrock, J. (2024). Molecular responses of Salicornia europaea to salinity stress. International Journal of Molecular Sciences, 25(9), 5021.
[14] Popova, L. G., Khramov, D. E., Nedelyaeva, O. I., & Volkov, V. S. (2023). Yeast heterologous systems for studying plant membrane transport proteins. International Journal of Molecular Sciences, 24(12), 9987.
[15] Sher, A., Nawaz, A., Ul-Allah, S., Sattar, A., & Manaf, A. (2024). Role of 5-aminolevulinic acid in improving salt tolerance in sunflower. Acta Physiologiae Plantarum, 46(2), 89.
Cite This Article
  • APA Style

    Sachan, S., Kumari, U. (2026). Integrative GWAS and CRISPR Functional Validation of Gene Variant Structure of AtHKT1; 1 in KCl. Computational Biology and Bioinformatics, 14(1), 26-40. https://doi.org/10.11648/j.cbb.20261401.13

    Copy | Download

    ACS Style

    Sachan, S.; Kumari, U. Integrative GWAS and CRISPR Functional Validation of Gene Variant Structure of AtHKT1; 1 in KCl. Comput. Biol. Bioinform. 2026, 14(1), 26-40. doi: 10.11648/j.cbb.20261401.13

    Copy | Download

    AMA Style

    Sachan S, Kumari U. Integrative GWAS and CRISPR Functional Validation of Gene Variant Structure of AtHKT1; 1 in KCl. Comput Biol Bioinform. 2026;14(1):26-40. doi: 10.11648/j.cbb.20261401.13

    Copy | Download

  • @article{10.11648/j.cbb.20261401.13,
      author = {Shephali Sachan and Uma Kumari},
      title = {Integrative GWAS and CRISPR Functional Validation of Gene Variant Structure of AtHKT1; 1 in KCl},
      journal = {Computational Biology and Bioinformatics},
      volume = {14},
      number = {1},
      pages = {26-40},
      doi = {10.11648/j.cbb.20261401.13},
      url = {https://doi.org/10.11648/j.cbb.20261401.13},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.cbb.20261401.13},
      abstract = {Understanding ion transport mechanisms in plants is crucial for enhancing stress tolerance and crop yields. The Arabidopsis thaliana High-Affinity Potassium Transporter 1; 1 (AtHKT1; 1) is vital for maintaining sodium and potassium balance under saline stress. However, the structural and functional effects of genetic variants in AtHKT1; 1, especially in potassium chloride (KCl) environments, are not fully understood. This research combiness Genome-Wide Association Studies (GWAS) with CRISPR-based functional validation to examine AtHKT1; 1 gene variants, focusing on the protein structure with PDB ID: 8W9O. The study first used GWAS and SNP discovery to find significant genetic variations linked to ion transport and salt tolerance. Candidate SNPs were selected based on their statistical significance and potential biological roles. Structural analysis of the protein 8W9O involved PDB and MMDB resources, with validation through ERRAT and visualization/mutation mapping in PyMOL. InterProScan was used to identify conserved functional motifs. To validate SNP effects, CRISPR guide RNAs were designed with E-CRISP and CHOPCHOP, targeting key gene regions for precise editing. This integrated approach linked genetic variations to structural changes and their potential impact on ion binding and transport, especially under KCl conditions. Results showed certain SNPs cause conformational shifts in key transmembrane regions of AtHKT1; 1, possibly influencing ion selectivity and transport efficiency. Structural validation confirmed the accuracy of the modeled variants, and domain analysis revealed disruptions in conserved functional areas. CRISPR strategies proved feasible for precise gene editing to test these functional hypotheses. Overall, this study offers a comprehensive framework that combines GWAS, structural bioinformatics, and CRISPR technology to explore how genetic variants affect AtHKT1; 1 function. These insights improve understanding of ion transport regulation in saline environments and support the development of genetically modified crops with enhanced salt tolerance.},
     year = {2026}
    }
    

    Copy | Download

  • TY  - JOUR
    T1  - Integrative GWAS and CRISPR Functional Validation of Gene Variant Structure of AtHKT1; 1 in KCl
    AU  - Shephali Sachan
    AU  - Uma Kumari
    Y1  - 2026/04/25
    PY  - 2026
    N1  - https://doi.org/10.11648/j.cbb.20261401.13
    DO  - 10.11648/j.cbb.20261401.13
    T2  - Computational Biology and Bioinformatics
    JF  - Computational Biology and Bioinformatics
    JO  - Computational Biology and Bioinformatics
    SP  - 26
    EP  - 40
    PB  - Science Publishing Group
    SN  - 2330-8281
    UR  - https://doi.org/10.11648/j.cbb.20261401.13
    AB  - Understanding ion transport mechanisms in plants is crucial for enhancing stress tolerance and crop yields. The Arabidopsis thaliana High-Affinity Potassium Transporter 1; 1 (AtHKT1; 1) is vital for maintaining sodium and potassium balance under saline stress. However, the structural and functional effects of genetic variants in AtHKT1; 1, especially in potassium chloride (KCl) environments, are not fully understood. This research combiness Genome-Wide Association Studies (GWAS) with CRISPR-based functional validation to examine AtHKT1; 1 gene variants, focusing on the protein structure with PDB ID: 8W9O. The study first used GWAS and SNP discovery to find significant genetic variations linked to ion transport and salt tolerance. Candidate SNPs were selected based on their statistical significance and potential biological roles. Structural analysis of the protein 8W9O involved PDB and MMDB resources, with validation through ERRAT and visualization/mutation mapping in PyMOL. InterProScan was used to identify conserved functional motifs. To validate SNP effects, CRISPR guide RNAs were designed with E-CRISP and CHOPCHOP, targeting key gene regions for precise editing. This integrated approach linked genetic variations to structural changes and their potential impact on ion binding and transport, especially under KCl conditions. Results showed certain SNPs cause conformational shifts in key transmembrane regions of AtHKT1; 1, possibly influencing ion selectivity and transport efficiency. Structural validation confirmed the accuracy of the modeled variants, and domain analysis revealed disruptions in conserved functional areas. CRISPR strategies proved feasible for precise gene editing to test these functional hypotheses. Overall, this study offers a comprehensive framework that combines GWAS, structural bioinformatics, and CRISPR technology to explore how genetic variants affect AtHKT1; 1 function. These insights improve understanding of ion transport regulation in saline environments and support the development of genetically modified crops with enhanced salt tolerance.
    VL  - 14
    IS  - 1
    ER  - 

    Copy | Download

Author Information
  • Abstract
  • Keywords
  • Document Sections

    1. 1. Introduction
    2. 2. Materials and Methodology
    3. 3. Results and Discussion
    4. 4. Conclusions
    Show Full Outline
  • Author Contributions
  • Data Availability Statement
  • Conflicts of Interest
  • References
  • Cite This Article
  • Author Information