Andreas Kopmann df837f9778 fixed accidental write to wrong log file 6 years ago
doc bfd1f56e7b Started to generate documentation of the scripts with Python docstrings and Sphinx 7 years ago
etc df837f9778 fixed accidental write to wrong log file 6 years ago
log 7fb235b19e Tuned author list for DTS setup 6 years ago
.gitignore 95e36f37d8 Added ufo.kit.edu site configuration + log file 7 years ago
README.md 5730f8a3fb Added Webers Scopus IDs and started affiliation splitting 6 years ago
ak_scopus.py bfd1f56e7b Started to generate documentation of the scripts with Python docstrings and Sphinx 7 years ago
ak_wordpress.py bfd1f56e7b Started to generate documentation of the scripts with Python docstrings and Sphinx 7 years ago
config.py.sample d04208d108 Fixed author id of Michele Caselle 6 years ago
create_scopus.sql e58e305e1d Initial version 7 years ago
scopus-update-database.py 0fa61ce682 Excluded the disabled publication already in SQL query 7 years ago
scopus_get_publications.py ad519b4511 Removed unnecessary message for new messages (no category now) 6 years ago
test-scopus.py b40eb84e14 Added script to remove unused publications from the database 7 years ago
test-scopus2.py 5730f8a3fb Added Webers Scopus IDs and started affiliation splitting 6 years ago
test-wp.py b388290e39 Extracted site specific configuration in config.py 7 years ago
test-wp2.py b388290e39 Extracted site specific configuration in config.py 7 years ago
update.sh 7fb235b19e Tuned author list for DTS setup 6 years ago

README.md

README scopus

Ak, 23.5.2017

Get information on publications of work groups from Elsevier's Scopus database for use on websites. For each publication, a post is created in a Wordpress CMS. Citations are mapped to Wordpress comments. The get-publications script is intended to run on a regular basis (e.g. by cron).

Note: All Scopus scripts run only with valid access to the Scopus database (e.g. from the KIT LAN). The Scopus service is not publicly available.
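
As an illustration of how the publications of a work group could be requested, the following sketch builds a Scopus search expression from a list of author ids. The `AU-ID(...)` field code belongs to the Scopus search syntax; the function name `build_author_query` and the way the ids are combined are assumptions for illustration and not necessarily what ak_scopus.py does.

```python
def build_author_query(author_ids):
    """Combine Scopus author ids into one search expression (hypothetical helper)."""
    if not author_ids:
        raise ValueError("at least one author id is required")
    # One AU-ID(...) term per author, joined with OR
    return " OR ".join("AU-ID({})".format(author_id) for author_id in author_ids)


# Author ids taken from the sample data further below
query = build_author_query(["35313939900", "15076530600"])
print(query)
```

The resulting string would be passed as the search query of a Scopus request, which still requires valid access as noted above.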

Version history

Todo:

  • Add script to list all affiliations for one of the authors
  • Change the author definition to author id + affiliation id. If an author has been in two places, add a second author with the same id and the other affiliation
  • Generate sensible API documentation; add basic user documentation along with the API description
  • Add maintenance scripts that check the consistency of Scopus data and cache database, update post categories, and warn in case of inconsistencies
  • Add configuration for all IPE scientists (and setup IPE publication website)

Version 1.3, 23.5.17 (ak):

  • generated python inline documentation

Version 1.2, 24.4.17 (ak):

  • move complete configuration of author lists to config file
  • removed old list of Scopus author keys in my_scopus.py
  • added Philipp Lösel to the author list
  • added config management in etc
  • added log file

Version 1.1, 12.4.17 (ak):

  • added a second script to synchronize publication database and posts in wordpress
  • pushed repository to IPE GIT server
  • using markup for documentation
  • added configuration file to become site independent

Version 1.0, 8.3.17 (ak):

  • initial version of a single script without any options. It runs in 4 phases: get publications for individual author groups, create posts, get all citations, create comments.
  • used with the test installation at the UFO server in March 2017

Content

readme.md       This file
config.py       Site-dependent configuration file (in GIT as config.py.sample)
my_scopus.py	List of scopus author ids
ak_scopus.py	Functions to access scopus
ak_wordpress.py Functions to create Wordpress posts + comments
scopus-get-publications.py  Script to query Scopus
scopus-update-database.py   Synchronize database and available Wordpress posts

test-scopus.py	Application with some functions to get publication entries
		Prints a list with some formatting
test-scopus2.py Example from one of the websites, only one query
test-wp.py	Test script for access to the wordpress API	
test-wp2.py 	Test script for wordpress - only query, no modification

etc		Configuration files of different installations
info            Documentation, website, etc (not in GIT)
log		Log files, named scopus-publications-<hostname>.log
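
The per-host log file name can be derived from the machine's hostname. The following is a minimal sketch under the assumption that the name returned by `socket.gethostname()` is the one used in the file name; the helper `log_file_path` is hypothetical and not part of the scripts.

```python
import os
import socket


def log_file_path(log_dir="log", hostname=None):
    """Build the path log/scopus-publications-<hostname>.log (hypothetical helper)."""
    if hostname is None:
        # Assumption: the local hostname is what appears in the file name
        hostname = socket.gethostname()
    return os.path.join(log_dir, "scopus-publications-{}.log".format(hostname))


print(log_file_path("log", "ufo"))
```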

Usage

  1. Go to Scopus and retrieve the scopus author ids for the scientists in your group. Define the ids in etc/config-.py and group them.

  2. Create a symbolic link

    ln -s etc/config-<hostname>.py  config.py
    
  3. Select one or more author groups and define the list sc_workgroups in config.py. Check the definition of the database and the Wordpress installation.

  4. Execute scopus-get-publications.py:

    python -W ignore scopus-get-publications.py

    Note: The -W ignore flag might be necessary if the INSERT IGNORE causes warnings.

    Example run:

    ufo:~/scopus # python -W ignore scopus-get-publications.py 
    
    ***********************************************
    **** scopus-get-publications / 2017-03-27 *****
    ***********************************************
    
    === Update of publications for the author group: Computing
    Total number of publications: 54
    === Update of publications for the author group: X-ray Imaging
    Total number of publications: 39
    === Update of publications for the author group: Electronics
    Total number of publications: 132
    === Update of publications for the author group: Morphology
    Total number of publications: 21
    
    === Create posts for newly registered publication in scopus
    Nothing new found
    
    === Update citatation of all publication in the database
    Total number of publications is 281
    
    === Create comments for newly registered citations in scopus
    Number of new citations is 0
    
    Summary: (see also logfile /root/scopus/scopus-publications.log) 
    Date       = 2017-03-27 21:28:36.002624
    NPubs      = 281
    NNewPubs   = 0
    NCites     = 4699
    NNewCites  = 0
    Runtime    = 0:00:11.496362
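
When the script runs from cron, the summary block at the end of the output is the easiest part to check automatically. The sketch below parses the `key = value` lines of such a summary into a dictionary. The function name `parse_summary` and the idea of post-processing the output are assumptions for illustration; the scripts themselves only print the summary and write the log file.

```python
import re

# Matches lines such as "NPubs      = 281"
SUMMARY_LINE = re.compile(r"^\s*(\w+)\s*=\s*(.+?)\s*$")


def parse_summary(text):
    """Return the 'key = value' lines of a run summary as a dict."""
    summary = {}
    for line in text.splitlines():
        match = SUMMARY_LINE.match(line)
        if match:
            key, value = match.groups()
            # Counters become integers, everything else stays a string
            summary[key] = int(value) if value.isdigit() else value
    return summary


sample = """
Date       = 2017-03-27 21:28:36.002624
NPubs      = 281
NNewPubs   = 0
NCites     = 4699
NNewCites  = 0
Runtime    = 0:00:11.496362
"""
result = parse_summary(sample)
print(result["NPubs"], result["NNewCites"])
```

A monitoring job could, for example, send a mail whenever `NNewPubs` or `NNewCites` is greater than zero.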
    

    Further enhancements

    Todo:

    • Reprocess all posts if the format has changed, e.g. when a button for emailing the author or a new category has been added
    • Query only the latest citations for each publication, not all of them.
    • Store JSON data of all publications
    • Get bibliographic information for display on the web page of a research group like UFO, or maybe later also for the DTS program.
    • Handle wrong publications in Scopus for authors with the same name
    • Automatically include reports and student theses by BibTeX definition and upload them to a server!? This would have the nice effect that all student work is organized systematically!

    Structure of the database

    Both tables keep the reference to the publications in Scopus and the Wordpress ids. With this information, reprocessing is possible (but not implemented now).

    Table publications:

    MariaDB [scopus]> describe publications;
    +--------------+--------------+------+-----+---------+----------------+
    | Field        | Type         | Null | Key | Default | Extra          |
    +--------------+--------------+------+-----+---------+----------------+
    | id           | int(11)      | NO   | PRI | NULL    | auto_increment |
    | scopusid     | varchar(255) | YES  | UNI | NULL    |                |
    | wpid         | int(11)      | YES  |     | NULL    |                |
    | citedbycount | int(11)      | YES  |     | NULL    |                |
    | citesloaded  | int(11)      | YES  |     | NULL    |                |
    | categories   | varchar(255) | YES  |     | NULL    |                |
    | doi          | varchar(255) | YES  |     | NULL    |                |
    | title        | varchar(255) | YES  |     | NULL    |                |
    | abstract     | text         | YES  |     | NULL    |                |
    | bibtex       | text         | YES  |     | NULL    |                |
    | ts           | datetime     | YES  |     | NULL    |                |
    | scopusdata   | text         | YES  |     | NULL    |                |
    | eid          | varchar(255) | YES  |     | NULL    |                |
    +--------------+--------------+------+-----+---------+----------------+
    

    Table citations:

    MariaDB [scopus]> describe citations;
    +--------------+--------------+------+-----+---------+----------------+
    | Field        | Type         | Null | Key | Default | Extra          |
    +--------------+--------------+------+-----+---------+----------------+
    | id           | int(11)      | NO   | PRI | NULL    | auto_increment |
    | scopusid     | varchar(255) | YES  |     | NULL    |                |
    | eid          | varchar(255) | YES  |     | NULL    |                |
    | wpid         | int(11)      | YES  | MUL | NULL    |                |
    | wpcommentid  | int(11)      | YES  |     | NULL    |                |
    | citedbycount | int(11)      | YES  |     | NULL    |                |
    | citesloaded  | int(11)      | YES  |     | NULL    |                |
    | categories   | varchar(255) | YES  |     | NULL    |                |
    | doi          | varchar(255) | YES  |     | NULL    |                |
    | scopusdata   | text         | YES  |     | NULL    |                |
    | title        | varchar(255) | YES  |     | NULL    |                |
    | abstract     | text         | YES  |     | NULL    |                |
    | bibtex       | text         | YES  |     | NULL    |                |
    | ts           | datetime     | YES  |     | NULL    |                |
    +--------------+--------------+------+-----+---------+----------------+
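
Because both tables store the Scopus id next to the Wordpress id, entries that still lack a post can be found with a simple query. The sketch below assumes that a `NULL` value in `wpid` means "no post created yet", which the source does not state explicitly. It uses an in-memory SQLite database with a reduced copy of the `publications` table only so that the example runs without a MariaDB server; the function name `publications_without_post` and the demo rows are made up.

```python
import sqlite3


def publications_without_post(cursor):
    """Return (scopusid, title) of publications without a Wordpress post id."""
    cursor.execute(
        "SELECT scopusid, title FROM publications "
        "WHERE wpid IS NULL ORDER BY scopusid"
    )
    return cursor.fetchall()


# Demo: reduced publications table in SQLite (stand-in for MariaDB)
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE publications ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "scopusid VARCHAR(255) UNIQUE, wpid INTEGER, title VARCHAR(255))"
)
cur.executemany(
    "INSERT INTO publications (scopusid, wpid, title) VALUES (?, ?, ?)",
    [("SCOPUS_ID:1", 10, "Posted paper"), ("SCOPUS_ID:2", None, "Missing post")],
)
missing = publications_without_post(cur)
print(missing)
```

The same SELECT statement should work unchanged with a pymysql cursor on the real database.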
    

    Setup of scopus database in mysql

    create database scopus;
    
    CREATE USER 'scopus'@'localhost';
    grant all on scopus.* to 'scopus'@'localhost' identified by '$scopus$';
    
    # create tables
    mysql -u scopus -p scopus < create_scopus.sql
    

    Publications in Scopus

    Sometimes (unfortunately quite often) an author id in Scopus is not unique but identifies several researchers with the same name, e.g. Michele Caselle (3 persons) and Matthias Balzer (2 persons).

    This case is currently handled manually by deleting all publications of the unknown authors. It might also be possible to implement a black list.
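
A black list as mentioned above could be as simple as a list of Scopus identifiers that are dropped before any post is created. The following is only a sketch of that idea and is not implemented in the scripts; the function name `remove_blacklisted` and the second demo entry are hypothetical, while the key `dc:identifier` is taken from the sample data.

```python
def remove_blacklisted(publications, blacklist):
    """Drop entries whose Scopus identifier is on the black list."""
    blocked = set(blacklist)
    return [p for p in publications if p.get("dc:identifier") not in blocked]


pubs = [
    {"dc:identifier": "SCOPUS_ID:84859045029", "dc:title": "Paper of the group"},
    # Made-up entry standing in for a publication of a namesake
    {"dc:identifier": "SCOPUS_ID:0000000001", "dc:title": "Paper by a namesake"},
]
kept = remove_blacklisted(pubs, ["SCOPUS_ID:0000000001"])
print([p["dc:identifier"] for p in kept])
```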

    Sample data from Scopus:

    {
        "abstracts-retrieval-response": {
            "authors": {
                "author": [
                    {
                        "@_fa": "true",
                        "@auid": "15076530600",
                        "@seq": "1",
                        "affiliation": {
                            "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                            "@id": "60102538"
                        },
                        "author-url": "http://api.elsevier.com/content/author/author_id/15076530600",
                        "ce:given-name": "Suren",
                        "ce:indexed-name": "Chilingaryan S.",
                        "ce:initials": "S.",
                        "ce:surname": "Chilingaryan",
                        "preferred-name": {
                            "ce:given-name": "Suren",
                            "ce:indexed-name": "Chilingaryan S.",
                            "ce:initials": "S.",
                            "ce:surname": "Chilingaryan"
                        }
                    },
                    {
                        "@_fa": "true",
                        "@auid": "35313939900",
                        "@seq": "2",
                        "affiliation": {
                            "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                            "@id": "60102538"
                        },
                        "author-url": "http://api.elsevier.com/content/author/author_id/35313939900",
                        "ce:given-name": "Andreas",
                        "ce:indexed-name": "Kopmann A.",
                        "ce:initials": "A.",
                        "ce:surname": "Kopmann",
                        "preferred-name": {
                            "ce:given-name": "Andreas",
                            "ce:indexed-name": "Kopmann A.",
                            "ce:initials": "A.",
                            "ce:surname": "Kopmann"
                        }
                    },
                    {
                        "@_fa": "true",
                        "@auid": "56001075000",
                        "@seq": "3",
                        "affiliation": {
                            "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60032633",
                            "@id": "60032633"
                        },
                        "author-url": "http://api.elsevier.com/content/author/author_id/56001075000",
                        "ce:given-name": "Alessandro",
                        "ce:indexed-name": "Mirone A.",
                        "ce:initials": "A.",
                        "ce:surname": "Mirone",
                        "preferred-name": {
                            "ce:given-name": "Alessandro",
                            "ce:indexed-name": "Mirone A.",
                            "ce:initials": "A.",
                            "ce:surname": "Mirone"
                        }
                    },
                    {
                        "@_fa": "true",
                        "@auid": "35277157300",
                        "@seq": "4",
                        "affiliation": {
                            "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                            "@id": "60102538"
                        },
                        "author-url": "http://api.elsevier.com/content/author/author_id/35277157300",
                        "ce:given-name": "Tomy",
                        "ce:indexed-name": "Dos Santos Rolo T.",
                        "ce:initials": "T.",
                        "ce:surname": "Dos Santos Rolo",
                        "preferred-name": {
                            "ce:given-name": "Tomy",
                            "ce:indexed-name": "Dos Santos Rolo T.",
                            "ce:initials": "T.",
                            "ce:surname": "Dos Santos Rolo"
                        }
                    },
                    {
                        "@_fa": "true",
                        "@auid": "35303862100",
                        "@seq": "5",
                        "affiliation": {
                            "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                            "@id": "60102538"
                        },
                        "author-url": "http://api.elsevier.com/content/author/author_id/35303862100",
                        "ce:given-name": "Matthias",
                        "ce:indexed-name": "Vogelgesang M.",
                        "ce:initials": "M.",
                        "ce:surname": "Vogelgesang",
                        "preferred-name": {
                            "ce:given-name": "Matthias",
                            "ce:indexed-name": "Vogelgesang M.",
                            "ce:initials": "M.",
                            "ce:surname": "Vogelgesang"
                        }
                    }
                ]
            },
            "coredata": {
                "citedby-count": "0",
                "dc:description": "X-ray tomography has been proven to be a valuable tool for understanding internal, otherwise invisible, mechanisms in biology and other fields. Recent advances in digital detector technology enabled investigation of dynamic processes in 3D with a temporal resolution down to the milliseconds range. Unfortunately it requires computationally intensive recon- struction algorithms with long post-processing times. We have optimized the reconstruction software employed at the micro-tomography beamlines at KIT and ESRF. Using a 4 stage pipelined architecture and the computational power of modern graphic cards, we were able to reduce the processing time by a factor 75 with a single server. The time required to reconstruct a typical 3D image is reduced down to several seconds only and online visualization is possible for the first time.Copyright is held by the author/owner(s).",
                "dc:identifier": "SCOPUS_ID:84859045029",
                "dc:title": "Poster: A GPU-based architecture for real-time data assessment at synchrotron experiments",
                "link": [
                    {
                        "@_fa": "true",
                        "@href": "http://api.elsevier.com/content/abstract/scopus_id/84859045029",
                        "@rel": "self"
                    }
                ],
                "prism:aggregationType": "Conference Proceeding",
                "prism:coverDate": "2011-12-01",
                "prism:doi": "10.1145/2148600.2148624",
                "prism:pageRange": "51-52",
                "prism:publicationName": "SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11",
                "prism:url": "http://api.elsevier.com/content/abstract/scopus_id/84859045029"
            }
        }
    }
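
The fields needed for a post (title, DOI, citation count, authors and their affiliation ids) can be read directly from such a record. The sketch below works on a shortened copy of the sample above. The function name `summarize_abstract` is hypothetical, and the handling of an author with several affiliations (a list instead of a single object) is an assumption about the Scopus response format.

```python
import json


def summarize_abstract(response):
    """Extract title, DOI, citation count and authors from a Scopus record."""
    record = response["abstracts-retrieval-response"]
    core = record["coredata"]
    authors = []
    for author in record["authors"]["author"]:
        affiliation = author.get("affiliation") or []
        if isinstance(affiliation, dict):
            # Single affiliation: normalize to a list (assumed format)
            affiliation = [affiliation]
        authors.append({
            "auid": author["@auid"],
            "name": author["ce:indexed-name"],
            "affiliations": [a["@id"] for a in affiliation],
        })
    return {
        "title": core["dc:title"],
        "doi": core.get("prism:doi"),
        "citedby": int(core.get("citedby-count", 0)),
        "authors": authors,
    }


# Shortened copy of the sample data
raw = """
{"abstracts-retrieval-response": {
  "authors": {"author": [
    {"@auid": "15076530600", "ce:indexed-name": "Chilingaryan S.",
     "affiliation": {"@id": "60102538"}},
    {"@auid": "56001075000", "ce:indexed-name": "Mirone A.",
     "affiliation": {"@id": "60032633"}}
  ]},
  "coredata": {
    "citedby-count": "0",
    "dc:title": "Poster: A GPU-based architecture for real-time data assessment at synchrotron experiments",
    "prism:doi": "10.1145/2148600.2148624"
  }
}}
"""
info = summarize_abstract(json.loads(raw))
print(info["doi"], [a["affiliations"] for a in info["authors"]])
```

Keeping the affiliation ids next to the author ids is what the todo item on author id + affiliation id would need.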
    

    Installation of tools

    Installation of python, mysql et al:

    pip install python-wordpress-xmlrpc
    

    Web server configuration (apparently has to be redone after every installation!)

    /etc/apache2/httpd.conf:
    LoadModule userdir_module libexec/apache2/mod_userdir.so
    LoadModule php5_module libexec/apache2/libphp5.so
    Include /private/etc/apache2/extra/httpd-userdir.conf
    
    /etc/apache2/extra/httpd-userdir.conf:
    Include /private/etc/apache2/users/*.conf
    
    /etc/php.ini:
    pdo_mysql.default_socket= /tmp/mysql.sock
    mysql.default_socket = /tmp/mysql.sock
    mysqli.default_socket = /tmp/mysql.sock
    
    
    sh-3.2# apachectl restart
    

    Install website:

    Create archive with WP Duplicator

    Save scopus database

    mysqldump -u scopus -p scopus > scopus-170322.sql
    

    Create database on remote system

    mysql:

    CREATE USER 'scopus'@'localhost' IDENTIFIED BY '$scopus$';
    GRANT ALL PRIVILEGES ON scopus.* TO 'scopus'@'localhost';
    
    CREATE DATABASE scopus;
    
    mysql -u scopus -p scopus < scopus-170322.sql
    

    Create database wp_ufo2;

    CREATE USER 'ufo'@'localhost' IDENTIFIED BY '$ipepdv$';
    GRANT ALL PRIVILEGES ON wp_ufo2.* TO 'ufo'@'localhost';
    
    CREATE DATABASE wp_ufo2;
    

    Import WP archive:

    mkdir ufo2
    chown -R wwwrun:www ufo2
    

    Run the installer:

    http://ufo.kit.edu/ufo2/installer.php
    

    Installation Scopus-Scripts:

    pip install requests
    pip install python-wordpress-xmlrpc
    pip install pymysql
    

    Check configurations:

    scopus-get-publications.py
    ak_wordpress.py
    

    Limitations

    Sometimes there are errors in the Scopus database. These cases require manual intervention.

    Examples of errors that have been observed:

    • The cover date is quite far in the future. In this case the post does not appear on the website but is marked as scheduled. The date of publication should be looked up on the journal page and corrected manually.
    • Some authors are listed with more than one id. In this case the merge of the author ids should be requested. The second author id can be added to the author list.
    • In rare cases (scientists with common names) several persons share the same id. Splitting of the accounts should be requested. Wrong publications need to be deleted manually.
    • Publications that are off topic cannot be excluded automatically
    • Reprocessing of categories is currently not foreseen
    • It is not clear how to deal with scientists who leave a group. When should the name be excluded? It might be desirable to check authors together with their affiliation.
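
The first case in the list, a cover date in the future, could at least be detected automatically before a post is created. The following sketch is an assumption about how such a check might look and is not part of the scripts. The function name `cover_date_in_future` and the `tolerance_days` parameter are hypothetical; the date format `YYYY-MM-DD` is the one of `prism:coverDate` in the sample data.

```python
from datetime import date, datetime, timedelta


def cover_date_in_future(cover_date, today=None, tolerance_days=0):
    """Return True if a 'YYYY-MM-DD' cover date lies after today (+ tolerance)."""
    if today is None:
        today = date.today()
    parsed = datetime.strptime(cover_date, "%Y-%m-%d").date()
    return parsed > today + timedelta(days=tolerance_days)


run_date = date(2017, 3, 27)
print(cover_date_in_future("2011-12-01", today=run_date))
print(cover_date_in_future("2018-01-01", today=run_date))
```

Publications flagged this way could be written to the log file so that the date is corrected manually before the post ends up as scheduled.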