Tuesday 2 October 2018

Identifying sentence boundaries in a paragraph using only full stops - Python


#open a file, clean its contents, tokenize it and identify sentence boundaries using '.'

#!/usr/bin/python
import re

with open('raw_corpus.txt') as fp:
    lines = fp.read().split("\n")   #here lines contains entire file contents

#sentence incremental variable
i=1;

#to access file contents line by line
for line in lines:

#if the line is empty, skip to the next line
    if line == "":
        continue

#convert to lowercase (disabled here)
    #line = line.lower()

#cleaning
    line = re.sub(r'\.', " .", line) #substitute . with space .
    line = re.sub(r',', " ,", line) #substitute , with space ,
    line = re.sub(r'\?', " ?", line) #substitute ? with space ?
    line = re.sub(r'!', " !", line)  #substitute ! with space !

#replace multiple spaces into single spaces
    line = re.sub(r'\s+', " ", line)

#get words in current line
    if line != "" and line != " ":
        sentences = line.split('.')
        
        for sentence in sentences:
            #print ("Iam|",sentence,"|",sep='') #debugging statement
            if sentence !="" and sentence !=" ":

                words = sentence.split(' ')

                print ("<Sentence Id='",i,"'>",sep='')  #use sep='' to suppress white space while printing

                j=1   #token counter
                for word in words:
                    if word != "" and word !=" ":
                        print (j,"\t",word)
                        j=j+1
                print (j,"\t",".")       #the full stop is the last token of the sentence
                print ("</Sentence>",sep='')
                i = i + 1                 #increment i 


This script opens 'raw_corpus.txt' and reads its contents line by line.

It then splits each line on '.', which is treated as the sentence boundary. Each sentence is split into tokens on spaces, and every token is printed with its position in the sentence, inside a numbered Sentence block.
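
The same idea can be sketched as a small function that works on a string instead of a file. This is only an illustration: the function name split_sentences and the sample text are made up for the example, and it treats '.' as the only sentence boundary, like the script above.

```python
import re

def split_sentences(text):
    """Split text on full stops and return one list of tokens per sentence."""
    sentences = []
    for chunk in text.split('.'):
        # put a space before , ? ! so that they become separate tokens
        chunk = re.sub(r'([,?!])', r' \1', chunk)
        tokens = chunk.split()      # split() without arguments drops empty strings
        if tokens:
            sentences.append(tokens + ['.'])
    return sentences

sample = "I like tea. You like coffee, too."
for number, tokens in enumerate(split_sentences(sample), start=1):
    print(number, tokens)
```

Using split() without arguments removes the need for the separate checks against "" and " ".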

Finding word frequency in Python - Dictionary


#Program to read a file(corpus) and find frequency of each token


#!/usr/bin/python
import re

#read file (read-only mode is enough, nothing is written back)
file=open("raw_corpus.txt","r")


#dictionary to save tokens as keys and their frequency as values
wordcount={}

for word in file.read().split():
#split() will split according to whitespace that includes ' ' '\t' and '\n'. It will split all the values into one list.
    #print (word)

    #cleaning corpus
    word = word.lower() #convert to lowercase
    word = re.sub(r'\.', "", word) #substitute . with empty

    #check if current token already exists in dictionary
    if word not in wordcount:
        wordcount[word] = 1
    else:
        wordcount[word] += 1


#print the dictionary with keys and values
#for k,v in wordcount.items():
    #print (k, v)

#print the dictionary with sorted keys(tokens) and values
for k in sorted(wordcount):
    print (k, wordcount[k])

This script opens the file 'raw_corpus.txt' and splits its contents into words. Each word is stored in a dictionary, with the word as the key and its frequency as the value. When the same word (key) is encountered again, its value is incremented by 1.
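
As an alternative sketch, the standard library class collections.Counter does the same counting without the explicit if/else. The function name word_frequency and the sample sentence are assumptions for illustration; unlike the script above, this version also skips tokens that become empty after the full stop is removed.

```python
import re
from collections import Counter

def word_frequency(text):
    """Return a Counter mapping each lowercased word to its frequency."""
    words = (re.sub(r'\.', '', w.lower()) for w in text.split())
    return Counter(w for w in words if w)

counts = word_frequency("The cat sat. The dog sat.")
for word in sorted(counts):
    print(word, counts[word])
```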


Finding character frequency using Python - Dictionary


#open a file and find its character frequency
with open('raw_corpus.txt') as fp:
    lines = fp.read().split("\n")   #here lines contains entire file contents

#incremental variable
i=1;

#dictionary to save characters as keys and their frequency as values
charcount={}


#to access file contents line by line
for line in lines:

    #convert to lowercase
    lower_line = line.lower()

    chars = lower_line

    #for loop to access current line characters 
    for char in chars:
        if char not in charcount:
            charcount[char] = 1
        else:
            charcount[char] += 1
    
    #print (i,"\t",lower_line)
    
    i = i + 1                 #increment i 


#print the dictionary with sorted keys(characters) and values
for k in sorted(charcount):
    print (k, charcount[k])

This script opens the file 'raw_corpus.txt', reads its contents line by line, then counts how often each character occurs and stores the counts in a dictionary.

A dictionary in Python is similar to a hash in Perl. It stores one value for each key; storing a value under a key that already exists overwrites the old value.
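
A short sketch of the overwrite behaviour, with made-up names and values:

```python
ages = {}
ages["john"] = 25
ages["john"] = 30      # same key again: the old value 25 is replaced
ages["mary"] = 22

print(ages)
print(len(ages))       # only two keys are stored
```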

Tokenization and sentence boundary assuming each sentence is on a new line


#open a file and clean its contents

#!/usr/bin/python
import re

with open('raw_corpus.txt') as fp:
    lines = fp.read().split("\n")   #here lines contains entire file contents

#incremental variable
i=1;

#to access file contents line by line
for line in lines:

#if the line is empty, skip to the next line
    if line == "":
        continue

#convert to lowercase
    line = line.lower()

#cleaning
    line = re.sub(r'\.', " .", line) #substitute . with space .
    line = re.sub(r',', " ,", line) #substitute , with space ,
    line = re.sub(r'\?', " ?", line) #substitute ? with space ?
    line = re.sub(r'!', " !", line)  #substitute ! with space !

#replace multiple spaces into single spaces
    line = re.sub(r'\s+', " ", line)

#get words in current line
    words = line.split(' ')

    print ("<Sentence Id='",i,"'>",sep='')  #use sep='' to suppress white space while printing

    j=1   #token counter
    for word in words:
        print (j,"\t",word)
        j=j+1
        
    
    print ("</Sentence>",sep='')
    
    i = i + 1                 #increment i 

This script opens 'raw_corpus.txt', converts each line to lowercase and separates punctuation from words.

It also prints sentence boundaries and tokenizes each sentence into words.

open file using 'open mode'


#open file using open file mode
fp = open('raw_corpus.txt') # Open file in read mode (the default)
lines = fp.read().split("\n") # Create a list containing all lines
fp.close() # Close file


#read file line by line
for line in lines:
    print (line)


This script will print contents of file 'raw_corpus.txt' line by line.

Reading a file using "with"


#file open example using "with" (recommended)
with open('raw_corpus.txt') as fp:
    lines = fp.read().split("\n")   #here lines contains entire file contents


#to access file contents line by line
for line in lines:
    print (line)

When you run this python script, the contents of the file 'raw_corpus.txt' are printed line by line.

Monday 17 September 2018

Variables in Python

In Python, a variable is created by assigning a value to a name, as in the following example:

x = 5 
y = "John"
print(x)
print(y)

Rules for variable names:

1. It should start with either a letter or the underscore character.
2. It cannot start with a number.
3. It can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9 and _).
4. Names are case-sensitive: uppercase and lowercase names are treated differently.
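
These rules can be checked from Python itself. The sketch below uses str.isidentifier() and keyword.iskeyword() from the standard library; the helper name is_valid_name and the candidate names are made up for illustration. Note that Python 3 is slightly more permissive than rule 3, since it also accepts many non-ASCII letters.

```python
import keyword

def is_valid_name(name):
    """True if name can be used as a variable name."""
    # isidentifier() checks the character rules, iskeyword() rejects words like 'for'
    return name.isidentifier() and not keyword.iskeyword(name)

for candidate in ["x", "_total", "name2", "2name", "my-var", "for"]:
    print(candidate, is_valid_name(candidate))
```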

Python has five standard data types −
  • Numbers
  • String
  • List
  • Tuple
  • Dictionary
In the example above x is a number type and y is a string type variable.

A list contains items separated by commas and enclosed within square brackets []. To some extent, lists are similar to arrays in C. One difference between them is that the items in a list can be of different data types.

Example: 

#!/usr/bin/python

mylist = [ 'abcd', 786 , 2.23, 'john', 70.2 ]
smalllist = [123, 'john']

print(mylist)              # Prints complete list
print(mylist[0])           # Prints first element of the list
print(mylist[1:3])         # Prints the 2nd and 3rd elements
print(mylist[2:])          # Prints elements starting from 3rd element
print(smalllist * 2)       # Prints list two times
print(mylist + smalllist)  # Prints concatenated lists
  

Tuples are similar to lists except that they are read-only. They are enclosed in parentheses instead of square brackets (used for lists). While lists can be updated, tuples cannot.

Example:

#!/usr/bin/python

mytuple = ( 'abcd', 786 , 2.23, 'john', 70.2 )
smalltuple = (123, 'john')

print(mytuple)               # Prints complete tuple
print(mytuple[0])            # Prints first element of the tuple
print(mytuple[1:3])          # Prints the 2nd and 3rd elements
print(mytuple[2:])           # Prints elements starting from 3rd element
print(smalltuple * 2)        # Prints tuple two times
print(mytuple + smalltuple)  # Prints concatenated tuples

Dictionary:

 Python's dictionaries are a kind of hash table. They work like hashes in Perl and consist of key-value pairs. Dictionaries are enclosed in curly braces { }, and values can be assigned and accessed using square brackets [].

 Example : 
 
#!/usr/bin/python

mydict = {}            # named mydict to avoid shadowing the built-in dict
mydict['one'] = "This is one"
mydict[2]     = "This is two"

dicts = {'name': 'john','code':6734, 'dept': 'sales'}


print(mydict['one'])   # Prints value for 'one' key
print(mydict[2])       # Prints value for 2 key
print(dicts)           # Prints complete dictionary
print(dicts.keys())    # Prints all the keys
print(dicts.values())  # Prints all the values

References:

https://www.tutorialspoint.com/python/python_variable_types.htm

https://www.w3schools.com/python/python_variables.asp

Monday 10 September 2018

How to remove 'whitespace' in the variable while 'print'ing?

Python3: 

The sep argument (separator) is used to suppress the whitespace that print inserts between its arguments:


print (name, ",How are you?",sep='')

To suppress the line terminator (\n) that print adds at the end, use end='':


print (name, ",How are you?",end='')
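
To see the difference side by side, here is a small sketch that writes the output to an in-memory buffer so the exact characters can be inspected. The value of name is a placeholder, since the snippets above do not define it.

```python
import io

name = "John"   # placeholder value for the example
buffer = io.StringIO()

print(name, ",How are you?", file=buffer)                  # default: space between arguments, newline at end
print(name, ",How are you?", sep='', file=buffer)          # no space between arguments
print(name, ",How are you?", sep='', end='', file=buffer)  # no space and no trailing newline

print(repr(buffer.getvalue()))
```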

Saturday 8 September 2018

Useful and Important commands for Fedora 28

After fresh installation:

sudo dnf update  

Enable and start SSH:

sudo systemctl start sshd.service

sudo systemctl enable sshd.service

RPM Fusion:

sudo dnf install https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm

Install Vlc:

sudo dnf install vlc

Restart apache:   

systemctl restart httpd

or

service httpd restart
  
Enable apache on startup:

systemctl enable httpd

Allow http connections:

firewall-cmd --add-service=http --permanent
firewall-cmd --reload

Install PHP and MySQL (MariaDB):

dnf install php-cli
dnf install mariadb mariadb-server
systemctl restart mariadb

Finalize the MariaDB installation:

/usr/bin/mysql_secure_installation

Install the PHP MySQL driver:

dnf install php-mysqlnd

After all these steps, restart Apache:

systemctl restart httpd

Make a bootable USB DRIVE(pendrive) in Linux

Since Ubuntu 12, making a bootable pen drive with Startup Disk Creator has not been smooth. The method below can of course be used on other Linux platforms too.

It uses Etcher, a cross-platform, open source tool for burning images to SD cards and USB drives.


Download Etcher AppImage from the link below:

Once downloaded, you need to make it executable. Right click on the downloaded file and go to Properties


And in here, check the “Allow executing file as program” option.

Then double click Etcher, click on Select image and browse to the location where you have downloaded the ISO. Etcher automatically recognizes the USB drive. You can change it if you have multiple USB drives plugged in. Once the ISO and the USB drive are selected, it gives you the option to flash the ISO to the USB drive. Click on Flash to start flashing the drive with the selected ISO.

By the end of this process you will have a bootable USB drive with your ISO. 

 Reference and Credits: https://itsfoss.com/create-fedora-live-usb-ubuntu/

Saturday 1 September 2018

Python Tutorial - Python Installation

From wikibooks.org, Python is an interpreted programming language. For those who don't know, a programming language is what you write down(instructions) to tell a computer what to do. However, the computer doesn't read the language directly—there are hundreds of programming languages, and it couldn't understand them all. So, when someone writes a program, they will write it in their language of choice, and then compile it—that is, turn it into lots of 0s and 1s, that the computer can easily and quickly understand. A Windows program that you buy is already compiled for Windows—if you opened the program file up, you'd just get a mass of weird characters and rectangles. Give it a go—find a small Windows program, and open it up in Notepad or Wordpad. See what garbled mess you get. 

Python Installation:

Most Linux distributions come with Python installed. To confirm this, type the following command:

python --version

If Python is installed you will see something like "Python 2.7", which means Python is installed and the version is 2.7.

Many Linux distributions install version 2.7 by default, but 3.x.x is the newest version and is not backward compatible. So I recommend installing Python 3.x.x, especially for beginners.

Installing Python 3.x.x:

sudo apt-get install python3

This will install the Python 3 version packaged for your distribution. You can confirm the same using

python3 --version

Now you have two versions of Python: Python 2 and Python 3.

By default, python2 will be used when you run the python command. So to use python3 we have to alias python to python3.

To make the alias:

In a terminal, open the file .bashrc located in your home directory with vi or any other editor:

vi ~/.bashrc

Add the line “alias python='python3'” without the double quotes and save the file, then run:
source ~/.bashrc

That's it. The python command now runs python3 in your shell.
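
To check from inside a script which interpreter is actually running it, a minimal sketch using sys.version_info can help. The function name describe_interpreter is made up for the example.

```python
import sys

def describe_interpreter():
    """Return the running interpreter's version as 'Python X.Y'."""
    major, minor = sys.version_info[:2]
    return "Python {}.{}".format(major, minor)

print(describe_interpreter())
print("Running Python 3:", sys.version_info[0] == 3)
```

Save it to a file and run it with both python and python3 to compare the results.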

Wednesday 29 August 2018

Pagination in PHP, MYSQL serverside API

Pagination is useful when a large amount of data has to be retrieved from a server. It reduces the load on the server and is also user friendly, as the end user sees less data at a time.

Here I am using PHP to retrieve rows from MySQL using pagination.

<?php
#retrieve start and limit values; cast to int so they are safe to use in the query
if(isset($_POST["start"])) {
        $start = max(0, (int)$_POST["start"] - 1);
} else {
        $start = 0;
}

##number of rows to be retrieved from the starting point, can be 10,20,30...
if(isset($_POST["limit"])) {
        $limit = max(1, (int)$_POST["limit"]);
} else {
        $limit = 10; //get 10 rows as default from starting point
}

#set header
header('Content-Type: application/json');

#connection to database
$conn = mysqli_connect(DB_HOST, DB_USER, DB_PASS, DB_NAME);

#form query
$sql = "SELECT * FROM mytable LIMIT $start, $limit";     ##retrieves up to $limit rows from mytable, starting at offset $start

#execute query
$retval = mysqli_query( $conn, $sql );

#fetch rows if query executed successfully else return error
if(! $retval ) {
        #log the details on the server and send only JSON to the client
        error_log('Could not fetch logs: ' . mysqli_error($conn));
        $data = array("status"=>"failure","message"=>"Log fetch error!!");
        mysqli_close($conn);
        die(json_encode( $data ));
}
else {
        $rows = array();
        while($row = mysqli_fetch_array($retval, MYSQLI_ASSOC)) {
              $rows[] = $row;
        }

        ##send response to client
        $data = array('status'=>'success','message'=>'Log fetched.','records'=>$rows);

        #close connection
        mysqli_close($conn);
        echo json_encode( $data );
}

This script fetches rows from the database based on the start and limit variables. If these variables are not passed in the client request, it defaults to the first 10 rows (offset 0, limit 10).
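
The arithmetic behind start and limit can be sketched in Python with a plain list standing in for the table. The function name page_window and the sample rows are assumptions for illustration; clamping a negative offset to 0 is an addition made for safety in this sketch.

```python
def page_window(start=None, limit=None):
    """Translate a 1-based start row and a limit into (offset, count)."""
    offset = max(0, int(start) - 1) if start is not None else 0
    count = int(limit) if limit is not None else 10
    return offset, count

rows = list(range(1, 101))                 # stand-in for 100 table rows
offset, count = page_window(start=21, limit=5)
print(rows[offset:offset + count])         # same window as LIMIT offset, count
```

A client that wants the next page would send start=26 with the same limit.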

Happy Coding.

Monday 27 August 2018

Crontabs - Job scheduler explained

Open crontab using below command:

 

crontab -e 

 

Its syntax is as follows:


* * * * * command/script to be executed

# Example of job definition:

 

# .---------------- minute (0 - 59)
# |  .------------- hour (0 - 23)
# |  |  .---------- day of month (1 - 31)
# |  |  |  .------- month (1 - 12) OR jan,feb,mar,apr ...
# |  |  |  |  .---- day of week (0 - 6) (Sunday=0 or 7) OR sun,mon,tue,wed,thu,fri,sat
# |  |  |  |  |
# *  *  *  *  *  command to be executed
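
To make the field order concrete, here is a small Python sketch that splits a crontab line into its five time fields and the command. The function name parse_cron_line is made up for the example, and it only handles the five-field form, not shortcuts such as @reboot.

```python
FIELD_NAMES = ["minute", "hour", "day_of_month", "month", "day_of_week"]

def parse_cron_line(line):
    """Split a five-field crontab line into named fields plus the command."""
    parts = line.split(None, 5)     # at most 5 splits: 5 time fields + the rest
    if len(parts) < 6:
        raise ValueError("expected five time fields followed by a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    schedule["command"] = parts[5]
    return schedule

job = parse_cron_line("0 0 * * * /my/path/to/script/backup.sh")
print(job)
```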

Cron jobs can be listed using the command below:

 

crontab -l 

 

 Examples:  

 

1. Take a backup every day at midnight.


0 0 * * * /my/path/to/script/backup.sh

2. Start mongo server @reboot


@reboot /usr/bin/mongod


3. MySQL database backup every Sunday at midnight


0 0 * * 0 /usr/bin/mysqldump -u root -p{root123} userdb2 > /home/nagaraju/myfile_$(date +\%Y-\%m-\%d).sql   #every week on Sunday

Friday 24 August 2018

Saving unicode or utf8 data using PHP-MYSQL

Saving data in MySQL is common to almost every website. When it comes to Unicode data, there are a few extra settings that need to be taken care of. I am listing them step by step.

1. Set the table's collation to "utf8_unicode_ci"

 

 ALTER TABLE <table_name> CONVERT TO CHARACTER SET utf8 COLLATE utf8_unicode_ci;  

 

2. Set the column's collation to "utf8_unicode_ci"

 

 ALTER TABLE <table_name> MODIFY <column_name> VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_unicode_ci;

3. In PHP, run the below query on the connection before inserting data into the table.


 mysqli_query($conn,"SET names 'utf8'");

Thursday 23 August 2018

HTTPS to HTTP Ajax Request, Same Origin Policy.

Often we need to make a request that does not obey the Same Origin Policy. Here I am going to address the different-protocol case of the Same Origin Policy. Suppose we are making an HTTP request from a page served over HTTPS, like the following:

$.ajax({
        url: 'http://MyAjaxHTTPPath',
        type: 'POST',
        data: 'param1=value1&param2=value',
        contentType: "application/x-www-form-urlencoded"
});

The above request cannot be made because it violates the Same Origin Policy. So we have to write a layer of code between the JavaScript and the HTTP server that talks to the HTTP server directly. First we have to choose a server-side language for this. I am choosing PHP.

In PHP (phplayer.php):


$param1 = $_POST["param1"];
$param2 = $_POST["param2"];
$data = array("param1"=>$param1, "param2"=>$param2);

$data = http_build_query($data);
header('content-type: application/json');
$context_options =  array(
        'http' => array(
                'method' => 'POST',
                'header' => 'Content-type: application/x-www-form-urlencoded',
                'content' => $data
        ));
$context  = stream_context_create($context_options);

//notice that our actual HTTP URL is called here
$result = file_get_contents("http://MyAjaxHTTPPath", false, $context);
echo $result;
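
For comparison, here is how the same forwarded request could be assembled in Python with the standard library. This is only a sketch: the function name build_forward_request and the example.invalid URL are placeholders, and the request is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_forward_request(url, params):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(params).encode("ascii")   # same role as http_build_query in PHP
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

request = build_forward_request(
    "http://example.invalid/endpoint",          # placeholder URL
    {"param1": "value1", "param2": "value"},
)
print(request.get_method(), request.data)
```

Passing the request to urllib.request.urlopen would send it and return the response that the layer then echoes back.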

In JavaScript everything remains the same, except that we call our PHP layer, which makes the actual HTTP request and returns the response.


$.ajax({
        url: 'phplayer.php',
        type: 'POST',
        data: "param1=value1&param2=value",
        contentType: "application/x-www-form-urlencoded"
});

Tuesday 21 August 2018

Installing Mysql in Ubuntu 16.04

To install MySQL manually on Linux, follow the steps below:

Installing MySQL 5.5.56 on Ubuntu 16.04
  1. Remove the data directory of any existing MySQL installation
     
    sudo rm /var/lib/mysql/ -R
     
  2. Delete the existing MySQL configuration
     
    sudo rm /etc/mysql/ -R
     
  3. Uninstall the MySQL packages

    sudo apt-get autoremove mysql* --purge
     
    sudo apt-get remove apparmor
     
  4. Download version 5.5.56 from the MySQL site

    wget https://dev.mysql.com/get/Downloads/MySQL-5.5/mysql-5.5.56-linux-glibc2.5-x86_64.tar.gz
     
  5. Add mysql user group

    sudo groupadd mysql
     
  6. Add a mysql user (not the current user) to the mysql user group

    sudo useradd -g  mysql mysql
     
  7. Extract the downloaded archive to /usr/local (run tar from the directory where the archive was downloaded)

    sudo tar -xvf mysql-5.5.56-linux-glibc2.5-x86_64.tar.gz -C /usr/local
    cd /usr/local
     
  8. Rename the extracted folder to mysql

    sudo mv mysql-5.5.56-linux-glibc2.5-x86_64 mysql
     
  9. Set mysql directory owner and user group

    cd mysql
    sudo chown -R mysql:mysql *
     
  10. Install the required lib package

    sudo apt-get install libaio1
     
  11. Execute mysql installation script

    sudo scripts/mysql_install_db --user=mysql
     
  12. Set the owner of the mysql directory to root (run from inside the mysql directory)

    sudo chown -R root .
     
  13. Set data directory owner from inside mysql directory

    sudo chown -R mysql data
     
  14. Copy the mysql configuration file

    sudo cp support-files/my-medium.cnf /etc/my.cnf 
     
  15. Start mysql

    sudo bin/mysqld_safe --user=mysql &
    sudo cp support-files/mysql.server /etc/init.d/mysql.server
     
  16. Initialize root user password

    sudo bin/mysqladmin -u root password '111111'
     
  17. Start mysql server

    sudo /etc/init.d/mysql.server start
     
  18. Stop mysql server

    sudo /etc/init.d/mysql.server stop
     
  19. Check status of mysql

    sudo /etc/init.d/mysql.server status
     
  20. Enable mysql on startup

    sudo update-rc.d -f mysql.server defaults 
     
  21. Disable mysql on startup (Optional)

    sudo update-rc.d -f mysql.server remove
     
  22. Link the mysql client into the system path

    sudo ln -s /usr/local/mysql/bin/mysql /usr/local/bin/mysql
     
  23. Now directly use the command below to start mysql

    mysql -u root -p 
    
PS: One needs to reboot in order for the changes to take effect.

Wednesday 21 March 2018

Developing a basic joomla plugin






Joomla is a flexible CMS that enables users to create a website with minimal effort. For developers, setting up Joomla is a cakewalk. However, if you are looking to develop an extension for Joomla, there are a few tips and a predefined set of rules to follow, which we are going to discuss in this article.
A Joomla module basically consists of 4 files. Let's look at each file while we develop a basic hello world module.

The naming convention of an extension is the extension type followed by the name of the extension, separated by an underscore ('_').
Let's name our module basic. Since we are developing a module type extension, I am naming it 'mod_basic'. So our file structure would be something like below:

1. mod_basic.php
2. helper.php
3. tmpl/default.php
4. mod_basic.xml

  1. mod_basic.php:
    This is the main file of a Joomla module; Joomla calls it first to initialize the module. It includes helper.php and calls the helper class method that is responsible for retrieving the data. It then includes the module's template, default.php, which is responsible for displaying the data.

Here's our code for mod_basic.php

// No direct access without joomla
defined('_JEXEC') or die;
 
// Include the syndicate functions only once
require_once dirname(__FILE__) . '/helper.php';

$hello = ModBasicHelper::getHello($params);
require JModuleHelper::getLayoutPath('mod_basic');
 


  2.   helper.php
   
    This file can be considered the brain of the module because it contains the instructions that enable the module to retrieve the data. In this file, we create the module's class and its methods to process and retrieve the data. This is the code that does the 'communicate with the database and retrieve data' part.

Here's our code for helper.php. If you notice the code below, it defines the "ModBasicHelper" class mentioned in mod_basic.php

class ModBasicHelper
{
    /**
     * Retrieves the hello message
     *
     * @param   array  $params An object containing the module parameters
     *
     * @access public
     */
    public static function getHello($params)
    {
        return 'Hello, World! I am a basic module';
    }
}
 
3. default.php

    This file includes the template of the module that displays the module's content. This includes the HTML structure to display the data provided by mod_basic.php file. The main module's file includes the template file with the JModuleHelper::getLayoutPath method, which first will check for any template overrides.

Place this file in a folder named "tmpl".

Here we just print the "hello world" message in the default.php file

<?php
// No direct access
defined('_JEXEC') or die; ?>
<?php echo $hello; ?>

4. mod_basic.xml

 This XML file acts as a guide for the module, as it tells Joomla about the module's structure during installation. It specifies the files that will be copied by the installer, and also contains information about the module parameters used by the module manager, as well as additional information about the module.
The type attribute takes values such as module or plugin, while client takes administrator or site, which decides whether the module is meant for the admin back end or for the site front end.

Here's the xml file for our module:

<?xml version="1.0" encoding="utf-8"?>
<extension type="module" version="3.1.0" client="administrator" 
 method="upgrade">
    <name> Module - Hello World!</name>
    <author>Sravan</author>
    <version>1.0.0</version>
    <description>A simple Hello, World! module.</description>
    <files>
        <filename>mod_basic.xml</filename>
        <filename module="mod_basic">mod_basic.php</filename>
        <filename>helper.php</filename>
        <filename>tmpl/default.php</filename>
    </files>
    <config>
    </config>
</extension>


We are done with the basic Joomla module. Zip these files into mod_basic.zip, which is our module installer file.
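
If you prefer to script the packaging step, the archive can be built with Python's zipfile module. This is a sketch under assumptions: the function name build_module_zip is made up, and the demo creates placeholder files in a temporary directory instead of using the real module files.

```python
import os
import tempfile
import zipfile

MODULE_FILES = ["mod_basic.php", "helper.php", "tmpl/default.php", "mod_basic.xml"]

def build_module_zip(source_dir, zip_path):
    """Pack the four module files into zip_path, keeping the tmpl/ folder."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for relative_path in MODULE_FILES:
            archive.write(os.path.join(source_dir, relative_path), arcname=relative_path)
    return zip_path

# Demo with placeholder files; point source_dir at your module folder instead
with tempfile.TemporaryDirectory() as workdir:
    for relative_path in MODULE_FILES:
        full_path = os.path.join(workdir, relative_path)
        os.makedirs(os.path.dirname(full_path), exist_ok=True)
        with open(full_path, "w") as handle:
            handle.write("placeholder")
    zip_path = build_module_zip(workdir, os.path.join(workdir, "mod_basic.zip"))
    with zipfile.ZipFile(zip_path) as archive:
        packaged = sorted(archive.namelist())

print(packaged)
```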

Assuming that you have already set up Joomla, log in to your Joomla admin and follow the steps below to install the module:

1. On the top menu, click on Manage > Extensions > Install
 
2. Click on "Upload From Package" option and select browse to upload the above zip file.




3. The Joomla installer will install this module and show an "installation successful" message. Click on "Extensions > Modules", which lists all the modules. From the drop-down filter under the "New" option, select "Administrator", since we have developed the module for the administrator.



4. Click on your module, which is listed under the name given in the <name> tag of the xml file. In the module settings, change the "position" to the home page by typing cpanel (select Isis). This decides the position of your module. You may also place it elsewhere on your Joomla admin page. Here I am placing it in the control panel of the admin.
Change the status to "published" and select the "save & close" option.



5. Now the module setup is ready, you may check your module by clicking on home(joomla icon on the top left) which shows up your custom module.









Similarly, you may customize the code in helper file to fetch the articles and other content from database which can be displayed in your module.

Hope this was helpful!

Happy learning!