Andreas Kopmann 0fa61ce682 Excluded the disabled publication already in SQL query 7 years ago
README.md bf0cc38b3e Updated documentation 7 years ago
ak_scopus.py e58e305e1d Initial version 7 years ago
ak_wordpress.py b388290e39 Extracted site specific configuration in config.py 7 years ago
create_scopus.sql e58e305e1d Initial version 7 years ago
my_scopus.py e58e305e1d Initial version 7 years ago
scopus-get-publications.py b388290e39 Extracted site specific configuration in config.py 7 years ago
scopus-update-database.py 0fa61ce682 Excluded the disabled publication already in SQL query 7 years ago
test-scopus.py b40eb84e14 Added script to remove unused publications from the database 7 years ago
test-scopus2.py b40eb84e14 Added script to remove unused publications from the database 7 years ago
test-wp.py b388290e39 Extracted site specific configuration in config.py 7 years ago
test-wp2.py b388290e39 Extracted site specific configuration in config.py 7 years ago
update.sh b388290e39 Extracted site specific configuration in config.py 7 years ago

README.md

README scopus

Ak, 27.3.2017

Get information on the publications of work groups from Elsevier's Scopus database for use on websites. For each publication, a post is created in a WordPress CMS. Citations are mapped to WordPress comments. The get-publications script is intended to run on a regular basis (e.g. by cron).

Note: All scopus scripts run only with valid access to the Scopus database (e.g. from the KIT LAN). The Scopus service is not publicly available.

Version history

Version 1.0, 8.3.17 (ak):

  • initial version of a single script without any options. It runs in 4 phases: get publications for individual author groups, create posts, get all citations, create comments.
  • used with the test installation at the UFO server in March 2017
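The four phases can be sketched roughly as follows. This is only an illustration of the control flow described above, not the code of scopus-get-publications.py: the function name `run_update`, the dictionaries `pubs` and `cites`, and the four callbacks are assumptions made for this example, and the real script works on the MariaDB tables instead of in-memory dictionaries.

```python
def run_update(groups, pubs, cites,
               fetch_pubs, fetch_cites, create_post, create_comment):
    """Run the four phases once (hypothetical sketch).

    pubs  maps scopusid -> WordPress post id (None until a post exists)
    cites maps (scopusid, citing_id) -> comment id (None until posted)
    Returns (number of new posts, number of new comments).
    """
    # Phase 1: get publications for the individual author groups
    for author_ids in groups.values():
        for scopusid in fetch_pubs(author_ids):
            pubs.setdefault(scopusid, None)  # known entries stay untouched

    # Phase 2: create posts for publications that have none yet
    new_pubs = [s for s, wpid in pubs.items() if wpid is None]
    for scopusid in new_pubs:
        pubs[scopusid] = create_post(scopusid)

    # Phase 3: get all citations of all registered publications
    for scopusid in pubs:
        for citing_id in fetch_cites(scopusid):
            cites.setdefault((scopusid, citing_id), None)

    # Phase 4: create comments for citations that have none yet
    new_cites = [key for key, comment_id in cites.items() if comment_id is None]
    for scopusid, citing_id in new_cites:
        cites[(scopusid, citing_id)] = create_comment(pubs[scopusid], citing_id)

    return len(new_pubs), len(new_cites)
```

Because already registered entries are skipped in every phase, a second run without new data in Scopus creates neither posts nor comments, which matches the "Nothing new found" case in the example run below.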

Content

readme.md       This file
my_scopus.py	List of scopus author ids
ak_scopus.py	Functions to access scopus
ak_wordpress.py Functions to create Wordpress posts + comments
scopus-get-publications.py  Script to query Scopus
scopus-update-database.py   Synchronize database and available Wordpress posts

test-scopus.py	Application with some functions to get publication entries
		Prints a list with some formatting
test-scopus2.py Example from one of the websites, only one query
test-wp.py	Test script for access to the wordpress API	
test-wp2.py 	Test script for wordpress - only query, no modification

info            Documentation, website, etc (not in GIT)

Usage

  1. Go to Scopus and retrieve the Scopus author ids for the scientists in your group. Define the ids in my_scopus.py and group them.

  2. Select one or more author groups in scopus-get-publications.py (main part at the end of the file). Check the definition of the database and the Wordpress installation.

  3. Execute scopus-get-publications.py:

     python -W ignore scopus-get-publications.py

Note: The -W ignore flag might be necessary if the INSERT IGNORE causes warnings.
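A grouping of author ids as described in step 1 might look like the sketch below. The actual layout of my_scopus.py is not shown in this README, so the name `AUTHOR_GROUPS`, the helper `scopus_query` and the ids are placeholders chosen for this example. `AU-ID(...)` is the Scopus search field for author ids; whether the scripts build their queries this way is an assumption.

```python
# Hypothetical layout: group name -> list of Scopus author ids.
# The ids below are placeholders, not real author ids.
AUTHOR_GROUPS = {
    "Computing": ["11111111111", "22222222222"],
    "Electronics": ["33333333333"],
}


def scopus_query(author_ids):
    """Build a Scopus search expression matching any of the given authors."""
    return " OR ".join("AU-ID(%s)" % author_id for author_id in author_ids)


if __name__ == "__main__":
    for group, ids in AUTHOR_GROUPS.items():
        print(group, "->", scopus_query(ids))
```

Keeping the ids in one dictionary means that adding a scientist to a group is a one-line change.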

Example run:

ufo:~/scopus # python -W ignore scopus-get-publications.py 

***********************************************
**** scopus-get-publications / 2017-03-27 *****
***********************************************

=== Update of publications for the author group: Computing
Total number of publications: 54
=== Update of publications for the author group: X-ray Imaging
Total number of publications: 39
=== Update of publications for the author group: Electronics
Total number of publications: 132
=== Update of publications for the author group: Morphology
Total number of publications: 21

=== Create posts for newly registered publication in scopus
Nothing new found

=== Update citatation of all publication in the database
Total number of publications is 281

=== Create comments for newly registered citations in scopus
Number of new citations is 0

Summary: (see also logfile /root/scopus/scopus-publications.log) 
Date       = 2017-03-27 21:28:36.002624
NPubs      = 281
NNewPubs   = 0
NCites     = 4699
NNewCites  = 0
Runtime    = 0:00:11.496362

Further enhancements

Todo:

  • Reprocess all posts if the format has changed, e.g. if a button to email the author or a new category has been added.
  • Query only the latest citations for each publication, not all of them.
  • Store the JSON data of all publications.
  • Get bibliographic information for display on the web page of a research group like UFO, or maybe later also for the DTS program.
  • Handle publications that Scopus wrongly assigns to an author because another author has the same name.
  • Automatically include reports and student theses by bibtex definition and upload them to a server!? This would have the nice effect that all student work is organized systematically!

Structure of the database

Both tables keep the reference to the publications in Scopus and the Wordpress ids. With this information, reprocessing is possible (but not yet implemented).

Table publications:

MariaDB [scopus]> describe publications;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| scopusid     | varchar(255) | YES  | UNI | NULL    |                |
| wpid         | int(11)      | YES  |     | NULL    |                |
| citedbycount | int(11)      | YES  |     | NULL    |                |
| citesloaded  | int(11)      | YES  |     | NULL    |                |
| categories   | varchar(255) | YES  |     | NULL    |                |
| doi          | varchar(255) | YES  |     | NULL    |                |
| title        | varchar(255) | YES  |     | NULL    |                |
| abstract     | text         | YES  |     | NULL    |                |
| bibtex       | text         | YES  |     | NULL    |                |
| ts           | datetime     | YES  |     | NULL    |                |
| scopusdata   | text         | YES  |     | NULL    |                |
| eid          | varchar(255) | YES  |     | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+

Table citations:

MariaDB [scopus]> describe citations;
+--------------+--------------+------+-----+---------+----------------+
| Field        | Type         | Null | Key | Default | Extra          |
+--------------+--------------+------+-----+---------+----------------+
| id           | int(11)      | NO   | PRI | NULL    | auto_increment |
| scopusid     | varchar(255) | YES  |     | NULL    |                |
| eid          | varchar(255) | YES  |     | NULL    |                |
| wpid         | int(11)      | YES  | MUL | NULL    |                |
| wpcommentid  | int(11)      | YES  |     | NULL    |                |
| citedbycount | int(11)      | YES  |     | NULL    |                |
| citesloaded  | int(11)      | YES  |     | NULL    |                |
| categories   | varchar(255) | YES  |     | NULL    |                |
| doi          | varchar(255) | YES  |     | NULL    |                |
| scopusdata   | text         | YES  |     | NULL    |                |
| title        | varchar(255) | YES  |     | NULL    |                |
| abstract     | text         | YES  |     | NULL    |                |
| bibtex       | text         | YES  |     | NULL    |                |
| ts           | datetime     | YES  |     | NULL    |                |
+--------------+--------------+------+-----+---------+----------------+
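The unique key on `scopusid` in the publications table is what makes repeated runs safe: an `INSERT IGNORE` skips publications that are already registered, and a missing `wpid` marks entries that still need a post. The sketch below illustrates this idea only. It uses Python's built-in sqlite3 with `INSERT OR IGNORE` as a stand-in for MariaDB's `INSERT IGNORE`, a reduced set of columns, and hypothetical function names.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a reduced publications table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE publications (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               scopusid TEXT UNIQUE,   -- same role as the UNI key above
               wpid INTEGER,           -- NULL until a post was created
               title TEXT)"""
    )
    return conn


def register(conn, scopusid, title):
    """Insert a publication unless its scopusid is already known.

    Returns True if a new row was created, False if it was ignored.
    """
    cur = conn.execute(
        "INSERT OR IGNORE INTO publications (scopusid, title) VALUES (?, ?)",
        (scopusid, title),
    )
    return cur.rowcount == 1


def pending_posts(conn):
    """Return the scopusids of publications without a Wordpress post."""
    rows = conn.execute(
        "SELECT scopusid FROM publications WHERE wpid IS NULL ORDER BY id"
    )
    return [row[0] for row in rows]
```

With MariaDB and pymysql the statements would use `INSERT IGNORE` and `%s` placeholders instead; the ignored duplicates are what produce the warnings mentioned in the usage note.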

Setup of scopus database in mysql

create database scopus;

CREATE USER 'scopus'@'localhost';
grant all on scopus.* to 'scopus'@'localhost' identified by '$scopus$';
# create tables
mysql -u scopus -p scopus < create_scopus.sql

Publications in Scopus

Sometimes (unfortunately quite often) an author id in Scopus is not unique but identifies several researchers with the same name, e.g. Michele Caselle (3 persons) and Matthias Balzer (2).

This case is currently handled manually by deleting all publications from the unknown authors. It might also be possible to implement a blacklist.
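A blacklist as suggested above could be as simple as the following sketch. It is not part of the current scripts. It assumes that each publication entry is a dictionary carrying the `dc:identifier` field in the form used by Scopus (`SCOPUS_ID:<number>`); the function name and the example identifiers are made up for illustration.

```python
def filter_blacklisted(publications, blacklist):
    """Return the publications whose 'dc:identifier' is not blacklisted."""
    blocked = set(blacklist)
    return [pub for pub in publications
            if pub.get("dc:identifier") not in blocked]


if __name__ == "__main__":
    # Placeholder identifiers, not real publications.
    entries = [
        {"dc:identifier": "SCOPUS_ID:1", "dc:title": "Paper of our author"},
        {"dc:identifier": "SCOPUS_ID:2", "dc:title": "Paper of a namesake"},
    ]
    for pub in filter_blacklisted(entries, ["SCOPUS_ID:2"]):
        print(pub["dc:title"])
```

Applied before new publications are registered, such a filter would keep a deleted publication from reappearing in the next run.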

Sample data from Scopus:

{
    "abstracts-retrieval-response": {
        "authors": {
            "author": [
                {
                    "@_fa": "true",
                    "@auid": "15076530600",
                    "@seq": "1",
                    "affiliation": {
                        "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                        "@id": "60102538"
                    },
                    "author-url": "http://api.elsevier.com/content/author/author_id/15076530600",
                    "ce:given-name": "Suren",
                    "ce:indexed-name": "Chilingaryan S.",
                    "ce:initials": "S.",
                    "ce:surname": "Chilingaryan",
                    "preferred-name": {
                        "ce:given-name": "Suren",
                        "ce:indexed-name": "Chilingaryan S.",
                        "ce:initials": "S.",
                        "ce:surname": "Chilingaryan"
                    }
                },
                {
                    "@_fa": "true",
                    "@auid": "35313939900",
                    "@seq": "2",
                    "affiliation": {
                        "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                        "@id": "60102538"
                    },
                    "author-url": "http://api.elsevier.com/content/author/author_id/35313939900",
                    "ce:given-name": "Andreas",
                    "ce:indexed-name": "Kopmann A.",
                    "ce:initials": "A.",
                    "ce:surname": "Kopmann",
                    "preferred-name": {
                        "ce:given-name": "Andreas",
                        "ce:indexed-name": "Kopmann A.",
                        "ce:initials": "A.",
                        "ce:surname": "Kopmann"
                    }
                },
                {
                    "@_fa": "true",
                    "@auid": "56001075000",
                    "@seq": "3",
                    "affiliation": {
                        "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60032633",
                        "@id": "60032633"
                    },
                    "author-url": "http://api.elsevier.com/content/author/author_id/56001075000",
                    "ce:given-name": "Alessandro",
                    "ce:indexed-name": "Mirone A.",
                    "ce:initials": "A.",
                    "ce:surname": "Mirone",
                    "preferred-name": {
                        "ce:given-name": "Alessandro",
                        "ce:indexed-name": "Mirone A.",
                        "ce:initials": "A.",
                        "ce:surname": "Mirone"
                    }
                },
                {
                    "@_fa": "true",
                    "@auid": "35277157300",
                    "@seq": "4",
                    "affiliation": {
                        "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                        "@id": "60102538"
                    },
                    "author-url": "http://api.elsevier.com/content/author/author_id/35277157300",
                    "ce:given-name": "Tomy",
                    "ce:indexed-name": "Dos Santos Rolo T.",
                    "ce:initials": "T.",
                    "ce:surname": "Dos Santos Rolo",
                    "preferred-name": {
                        "ce:given-name": "Tomy",
                        "ce:indexed-name": "Dos Santos Rolo T.",
                        "ce:initials": "T.",
                        "ce:surname": "Dos Santos Rolo"
                    }
                },
                {
                    "@_fa": "true",
                    "@auid": "35303862100",
                    "@seq": "5",
                    "affiliation": {
                        "@href": "http://api.elsevier.com/content/affiliation/affiliation_id/60102538",
                        "@id": "60102538"
                    },
                    "author-url": "http://api.elsevier.com/content/author/author_id/35303862100",
                    "ce:given-name": "Matthias",
                    "ce:indexed-name": "Vogelgesang M.",
                    "ce:initials": "M.",
                    "ce:surname": "Vogelgesang",
                    "preferred-name": {
                        "ce:given-name": "Matthias",
                        "ce:indexed-name": "Vogelgesang M.",
                        "ce:initials": "M.",
                        "ce:surname": "Vogelgesang"
                    }
                }
            ]
        },
        "coredata": {
            "citedby-count": "0",
            "dc:description": "X-ray tomography has been proven to be a valuable tool for understanding internal, otherwise invisible, mechanisms in biology and other fields. Recent advances in digital detector technology enabled investigation of dynamic processes in 3D with a temporal resolution down to the milliseconds range. Unfortunately it requires computationally intensive recon- struction algorithms with long post-processing times. We have optimized the reconstruction software employed at the micro-tomography beamlines at KIT and ESRF. Using a 4 stage pipelined architecture and the computational power of modern graphic cards, we were able to reduce the processing time by a factor 75 with a single server. The time required to reconstruct a typical 3D image is reduced down to several seconds only and online visualization is possible for the first time.Copyright is held by the author/owner(s).",
            "dc:identifier": "SCOPUS_ID:84859045029",
            "dc:title": "Poster: A GPU-based architecture for real-time data assessment at synchrotron experiments",
            "link": [
                {
                    "@_fa": "true",
                    "@href": "http://api.elsevier.com/content/abstract/scopus_id/84859045029",
                    "@rel": "self"
                }
            ],
            "prism:aggregationType": "Conference Proceeding",
            "prism:coverDate": "2011-12-01",
            "prism:doi": "10.1145/2148600.2148624",
            "prism:pageRange": "51-52",
            "prism:publicationName": "SC'11 - Proceedings of the 2011 High Performance Computing Networking, Storage and Analysis Companion, Co-located with SC'11",
            "prism:url": "http://api.elsevier.com/content/abstract/scopus_id/84859045029"
        }
    }
}
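The fields needed for a post can be pulled out of such a response with a few dictionary lookups. The sketch below assumes a response that has already been parsed from JSON and has the structure of the sample above; the function name `summarize` and the keys of the returned dictionary are choices made for this example, not the names used in ak_scopus.py.

```python
def summarize(response):
    """Extract id, title, DOI, citation count and authors from a response."""
    record = response["abstracts-retrieval-response"]
    core = record["coredata"]
    # Sort by the "@seq" attribute to keep the author order of the paper
    authors = sorted(record["authors"]["author"], key=lambda a: int(a["@seq"]))
    return {
        # "SCOPUS_ID:84859045029" -> "84859045029"
        "scopusid": core["dc:identifier"].split(":", 1)[1],
        "title": core["dc:title"],
        "doi": core.get("prism:doi"),  # not every entry has a DOI
        "citedbycount": int(core.get("citedby-count", 0)),
        "authors": [a["ce:indexed-name"] for a in authors],
    }
```

The response stores all values as strings, so the citation count is converted to an integer before it is compared with the `citedbycount` column.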

Installation of tools

Installation of python, mysql et al:

pip install python-wordpress-xmlrpc

Web server configuration (apparently has to be redone after every installation!)

/etc/apache2/httpd.conf:
LoadModule userdir_module libexec/apache2/mod_userdir.so
LoadModule php5_module libexec/apache2/libphp5.so
Include /private/etc/apache2/extra/httpd-userdir.conf

/etc/apache2/extra/httpd-userdir.conf:
Include /private/etc/apache2/users/*.conf

/etc/php.ini:
pdo_mysql.default_socket= /tmp/mysql.sock
mysql.default_socket = /tmp/mysql.sock
mysqli.default_socket = /tmp/mysql.sock


sh-3.2# apachectl restart

Install website:

Create an archive with the WP Duplicator plugin

Save scopus database

mysqldump -u scopus -p scopus > scopus-170322.sql

Create database on remote system

mysql:

CREATE USER 'scopus'@'localhost' IDENTIFIED BY '$scopus$';
GRANT ALL PRIVILEGES ON scopus.* TO 'scopus'@'localhost';

CREATE DATABASE scopus;

mysql -u scopus -p scopus < scopus-170322.sql

Create database wp_ufo2;

CREATE USER 'ufo'@'localhost' IDENTIFIED BY '$ipepdv$';
GRANT ALL PRIVILEGES ON wp_ufo2.* TO 'ufo'@'localhost';

CREATE DATABASE wp_ufo2;

Import WP archive:

mkdir ufo2
chown -R wwwrun:www ufo2

Run the installer:

http://ufo.kit.edu/ufo2/installer.php

Installation Scopus-Scripts:

pip install requests
pip install python-wordpress-xmlrpc
pip install pymysql
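Whether the three packages are available can be checked before the first run with a short script like the one below. It is a convenience sketch, not part of the repository. The mapping from pip package name to import name is the only project-specific part; the function name `missing_packages` is hypothetical.

```python
import importlib.util

# pip package name -> name of the module that is imported in Python
REQUIRED = {
    "requests": "requests",
    "python-wordpress-xmlrpc": "wordpress_xmlrpc",
    "pymysql": "pymysql",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of all packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: pip install " + " ".join(missing))
    else:
        print("All required packages are installed")
```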

Check configurations:

scopus-get-publications.py
ak_wordpress.py