Peer-Reviewed

Topic Modeling of Environmental Data on Social Networks Based on ED-LDA

Received: 24 April 2018    Accepted: 21 June 2018    Published: 23 July 2018
Abstract

Rapid advances in information and web technology have driven a sharp increase in the collection and storage of digital data. As online environmental monitoring and internet technology have developed, more and more environmental data are stored on the Internet and shared on social networks. There is therefore growing interest in environmental big data mining and in automatically identifying the environmental factors that contribute to public environmental risks, for example by mining online data on water quality, air pollution, and soil problems. A better understanding of these factors, and of the analyzed data, will enable more precise prediction of the location and time of high-risk events for environmental management. The environmental data in this study were collected from social networks using a web crawler on Twitter. Early research on environmental data analysis focused on field-specific analysis of traditional data, without considering data relationships and data structure on social networks. Traditional environmental data analysis methods have been well studied, but no algorithms have been designed for analyzing environmental data on social networks. This paper proposes a novel probabilistic generative model based on latent Dirichlet allocation (LDA), called ED-LDA. The model not only draws on traditional environmental data analysis methods but also incorporates the relationships and structure of environmental data, in order to extract useful information and to mine the relationship between users and the environmental data they post on social networks, giving a better understanding of what the data mean for environmental management. This research presents a Gibbs sampling implementation for inference in the model and uses it to identify environmental data topics on Twitter. The model can also be applied in many other environmental contexts.
The experimental results show that, compared with the traditional LDA clustering algorithm, the ED-LDA method can effectively mine and classify environmental data. This method can be a powerful computational approach for clustering environmental data on the Internet.
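
The abstract names Gibbs sampling as the inference method but gives no details of the ED-LDA sampler. As a rough illustration of the general technique only, the following is a minimal sketch of collapsed Gibbs sampling for plain LDA, not the authors' ED-LDA model. The function name `lda_gibbs`, the hyperparameter defaults (`alpha`, `beta`, `iters`), and the toy corpus are all assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for standard LDA (illustrative, not ED-LDA).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (doc_topic_counts, topic_word_counts, assignments).
    """
    rng = random.Random(seed)
    n_dk = [[0] * num_topics for _ in docs]                # topic counts per document
    n_kw = [[0] * vocab_size for _ in range(num_topics)]   # word counts per topic
    n_k = [0] * num_topics                                 # total tokens per topic
    z = []                                                 # topic assignment per token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Unnormalised full conditional p(z = t | all other assignments).
                weights = [
                    (n_dk[d][t] + alpha)
                    * (n_kw[t][w] + beta)
                    / (n_k[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the topic and put the token back.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw, z


if __name__ == "__main__":
    # Hypothetical toy vocabulary and corpus, for illustration only.
    vocab = ["water", "river", "quality", "air", "smog", "pollution"]
    docs = [[0, 1, 2, 0], [3, 4, 5, 3], [0, 2, 1], [4, 5, 3]]
    doc_topics, topic_words, _ = lda_gibbs(docs, num_topics=2, vocab_size=len(vocab))
    print(doc_topics)
```

The returned count matrices can be normalised (after adding `alpha` and `beta`) to estimate per-document topic proportions and per-topic word distributions. Any extension toward user relationships or social-network structure, as the abstract describes for ED-LDA, would need additional variables that this sketch does not model.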

Published in International Journal of Environmental Monitoring and Analysis (Volume 6, Issue 3)
DOI 10.11648/j.ijema.20180603.12
Page(s) 77-83
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2024. Published by Science Publishing Group

Keywords

ED-LDA, Probabilistic, Environmental Data, Social Network, Data Mining

Cite This Article
  • APA Style

    Lei Feng, Jose López, Li Feng, Sheng Zhang, Bormin Huang, et al. (2018). Topic Modeling of Environmental Data on Social Networks Based on ED-LDA. International Journal of Environmental Monitoring and Analysis, 6(3), 77-83. https://doi.org/10.11648/j.ijema.20180603.12


    ACS Style

    Lei Feng; Jose López; Li Feng; Sheng Zhang; Bormin Huang, et al. Topic Modeling of Environmental Data on Social Networks Based on ED-LDA. Int. J. Environ. Monit. Anal. 2018, 6(3), 77-83. doi: 10.11648/j.ijema.20180603.12


    AMA Style

    Lei Feng, Jose López, Li Feng, Sheng Zhang, Bormin Huang, et al. Topic Modeling of Environmental Data on Social Networks Based on ED-LDA. Int J Environ Monit Anal. 2018;6(3):77-83. doi: 10.11648/j.ijema.20180603.12


  • @article{10.11648/j.ijema.20180603.12,
      author = {Lei Feng and Jose López and Li Feng and Sheng Zhang and Bormin Huang and Fang Fang and Chongming Li},
      title = {Topic Modeling of Environmental Data on Social Networks Based on ED-LDA},
      journal = {International Journal of Environmental Monitoring and Analysis},
      volume = {6},
      number = {3},
      pages = {77-83},
      doi = {10.11648/j.ijema.20180603.12},
      url = {https://doi.org/10.11648/j.ijema.20180603.12},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ijema.20180603.12},
     year = {2018}
    }
    


  • TY  - JOUR
    T1  - Topic Modeling of Environmental Data on Social Networks Based on ED-LDA
    AU  - Lei Feng
    AU  - Jose López
    AU  - Li Feng
    AU  - Sheng Zhang
    AU  - Bormin Huang
    AU  - Fang Fang
    AU  - Chongming Li
    Y1  - 2018/07/23
    PY  - 2018
    N1  - https://doi.org/10.11648/j.ijema.20180603.12
    DO  - 10.11648/j.ijema.20180603.12
    T2  - International Journal of Environmental Monitoring and Analysis
    JF  - International Journal of Environmental Monitoring and Analysis
    JO  - International Journal of Environmental Monitoring and Analysis
    SP  - 77
    EP  - 83
    PB  - Science Publishing Group
    SN  - 2328-7667
    UR  - https://doi.org/10.11648/j.ijema.20180603.12
    VL  - 6
    IS  - 3
    ER  - 


Author Information
  • Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences, Chongqing, China; Institute for Applied Microelectronics (IUMA), ULPGC, Las Palmas de G.C., Spain; School of Urban Construction and Environmental Engineering, Chongqing University, Chongqing, China

  • Institute for Applied Microelectronics (IUMA), ULPGC, Las Palmas de G.C., Spain

  • Chongqing Academe of Environmental Science, Chongqing, China

  • Chongqing Academe of Environmental Science, Chongqing, China

  • Institute for Applied Microelectronics (IUMA), ULPGC, Las Palmas de G.C., Spain

  • School of Urban Construction and Environmental Engineering, Chongqing University, Chongqing, China

  • Chongqing Academe of Environmental Science, Chongqing, China
