MySQL and R




Example 1
Example 2
Example 3
Exploring data from database
Some Notes

Using MySQL with R is pretty easy, with RMySQL. Here are a few notes to keep me straight on a few things I always get snagged on.

Typically, most folks are going to want to analyze data that’s already in a MySQL database. Being a little bass-ackwards, I often want to go the other way. One reason to do this is to do some analysis in R and make the results available dynamically in a web app, which necessitates writing data from R into a database. As of this writing, INSERT isn’t even mentioned in the RMySQL docs, sadly for me, but it works just fine.

The docs are a bit clearer for RS-DBI, which is the standard R interface to relational databases and of which RMySQL is one implementation.

Opening and closing connections

The best way to close DB connections, like you would do in a finally clause in Java, is to use on.exit, like this:

con <- dbConnect(MySQL(),
         user="me", password="nuts2u",
         dbname="my_db", host="localhost")
on.exit(dbDisconnect(con))

Building queries

Using sprintf to build the queries feels a little primitive. As far as I can tell, there’s no prepared statements in RMySQL. I don’t suppose SQL-injection is a concern here, but prepared statements might be a little tidier, anyway.

Processing query results

You can process query results row by row, in blocks or all at once. The highly useful function dbGetQuery(con, sql) returns all query results as a data frame. With dbSendQuery, you can get all or partial results with fetch.

con <- dbConnect(MySQL(), user="network_portal", password="monkey2us", dbname=db.name, host="localhost")
rs <- dbSendQuery(con, "select name from genes limit 10;")
data <- fetch(rs, n=10)
huh <- dbHasCompleted(rs)
dbClearResult(rs)
dbDisconnect(con)

If there’s no more results, fetch returns a data frame with 0 columns and 0 rows. dbHasCompleted is supposed to indicate whether there are more records to be fetched, but seems broken. The value of huh in the code above is false, which seems wrong to me.

Retrieving AUTO_INCREMENT IDs

A standard newbie question with MySQL is how to retrieve freshly generated primary keys from AUTO_INCREMENT fields. That’s what MySQL’s LAST_INSERT_ID() is for.

You can retrieve the most recent AUTO_INCREMENT value with the LAST_INSERT_ID() SQL function or the mysql_insert_id() C API function. These functions are connection-specific, so their return values are not affected by another connection which is also performing inserts.

The same works with RMySQL, but there are some traps to watch out for. Let’s say you’re inserting a row into a table of networks. Don’t worry about the specifics. You want to insert related data in another table, so you need the ID of the newly inserted row.

create.network <- function(species.id, network.name, data.source, description) {
  
  con <- dbConnect(MySQL(),
           user="super_schmuck", password="nuts2u",
           dbname="my_db", host="localhost")
  on.exit(dbDisconnect(con))

  sql <- sprintf("insert into networks
                  (species_id, name, data_source, description, created_at)
                  values (%d, '%s', '%s', '%s', NOW());",
                 species.id, network.name, data.source, description)
  rs <- dbSendQuery(con, sql)
  dbClearResult(rs)

  id <- dbGetQuery(con, "select last_insert_id();")[1,1]

  return(id)
}

Don’t forget to clear the result of the insert. If you do, you’ll get 0 from the last_insert_id(). Also, using dbGetQuery for the insert produces an strange error when you go to call last_insert_id:

Error in mysqlExecStatement(conn, statement, ...) : 
  RS-DBI driver: (could not run statement: Commands out of sync; you can't run this command now)

Alternatively, you can also combine both SQL statements into one call to dbSendQuery, but, you have to remember to set a flag when you make the connection: client.flag=CLIENT_MULTI_STATEMENTS. Trying to use multiple queries seems not to work with dbGetQuery.

create.network <- function(species.id, network.name, data.source, description) {

  con <- dbConnect(MySQL(),
           user="super_schmuck", password="nuts2u",
           dbname="my_db", host="localhost",
           client.flag=CLIENT_MULTI_STATEMENTS)
  on.exit(dbDisconnect(con))

  sql <- sprintf("insert into networks
                  (species_id, name, data_source, description, created_at)
                  values (%d, '%s', '%s', '%s', NOW());
                  select last_insert_id();",
                 species.id, network.name, data.source, description)

  rs <- dbSendQuery(con, sql)

  if (dbMoreResults(con)) {
    rs <- dbNextResult(con)
    id <- fetch(rs)[1,1]
  } else {
    stop('Error getting last inserted id.')
  }

  dbClearResult(rs)

  return(id)
}

Any effort saved by combining the SQL queries is lost in the extra house-keeping so I prefer the first method.

In spite of these few quirks, RMySQL generally works fine and is pretty straightforward.


Accessing MySQL through R

Connecting to MySQL is made very easy with the RMySQL package. To connect to a MySQL database simply install the package and load the library.

install.packages("RMySQL")
library(RMySQL)

Connecting to MySQL:
Once the RMySQL library is installed create a database connection object.

mydb = dbConnect(MySQL(), user='user', password='password', dbname='database_name', host='host')

Listing Tables and Fields:
Now that a connection has been made we list the tables and fields in the database we connected to.

dbListTables(mydb)

This will return a list of the tables in our connection.

dbListFields(mydb, 'some_table')

This will return a list of the fields in some_table.

Running Queries:
Queries can be run using the dbSendQuery function.

dbSendQuery(mydb, 'drop table if exists some_table, some_other_table')

In my experience with this package any SQL query that will run on MySQL will run using this method.

Making tables:
We can create tables in the database using R dataframes.

dbWriteTable(mydb, name='table_name', value=data.frame.name)

Retrieving data from MySQL:
To retrieve data from the database we need to save a results set object.

rs = dbSendQuery(mydb, "select * from some_table")

I believe that the results of this query remain on the MySQL server, to access the results in R we need to use the fetch function.

data = fetch(rs, n=-1)

This saves the results of the query as a data frame object. The n in the function specifies the number of records to retrieve, using n=-1 retrieves all pending records.



R and MySQL Database Tutorial

library(RMySQL)
# Hello visitors, this tutorials is to connect and working with mysql server using RMySQl package
# Although, it is bit tricky and difficult to instal this package on lastest version on R, but you should do it before using this tutorial
# If in case, you find it difficult to instal RMySQl on your computer, please send me a message, so that I can develop a tutorials for that as well


library(RMySQL)

# mydb = dbConnect(MySQL(), user='manoj', password='password', host='localhost')

# creating a database using RMySQL in R

# dbSendQuery(mydb, "CREATE DATABASE bookstore;")

# dbSendQuery(mydb, "USE bookstore")

# reconnecting to database we just created using following command in R :

mydb = dbConnect(MySQL(), user='manoj', password='password', host='localhost', dbname="bookstore")

dbSendQuery(mydb, "drop table if exists books, authors")
# creating tables in bookstore:

dbSendQuery(mydb, "
CREATE TABLE books (
book_id INT,
title VARCHAR(50),
author VARCHAR(50));")
# Show table using R:

dbListTables(mydb)
## [1] "books"
# Considering our bookstore a bit more, we realize that we need to add a few more columns for data elements: publisher, publication year, ISBN number, genre (e.g., novel, poetry, drama), description of book, etc.

# We also realize that we want MySQL to automatically assign a number to the book_id column so that we don't have to bother creating one for each row or worry about duplicates.

# Additionally, we've decided to change the author column from the actual author's name to an identification number that we'll join to a separate table containing a list of authors.

# This will reduce typing, and will make sorting and searching easier, as the data will be uniform.

# To make these alterations to the table that we've already created, enter the following SQL command through R :

dbSendQuery(mydb, "ALTER TABLE books
CHANGE COLUMN book_id book_id INT AUTO_INCREMENT PRIMARY KEY,
CHANGE COLUMN author author_id INT,
ADD COLUMN description TEXT,
ADD COLUMN genre ENUM('novel','poetry','drama', 'tutorials', 'text', 'other'),
ADD COLUMN publisher_id INT,
ADD COLUMN pub_year VARCHAR(4),
ADD COLUMN isbn VARCHAR(20);")
# if R gives you an error:
#   Error in mysqlExecStatement(conn, statement, ...) : 
#   RS-DBI driver: (connection with pending rows, close resultSet before continuing)
# reconnect database using:

mydb = dbConnect(MySQL(), user='manoj', password='password', host='localhost', dbname="bookstore")

# and then run the above command of table alteration

# Now, Before moving on to adding data to our books table, let's quickly set up the authors table.

dbSendQuery(mydb, "CREATE TABLE authors
(author_id INT AUTO_INCREMENT PRIMARY KEY,
author_last VARCHAR(50),
author_first VARCHAR(50),
country VARCHAR(50));")
# Adding data into tables

dbSendQuery(mydb, "INSERT INTO authors
(author_last, author_first, country)
VALUES('Kumar','Manoj','India');")
# fetching last data insert id number:

last_id = fetch(dbSendQuery(mydb, "SELECT LAST_INSERT_ID();"))

# Inserting data into books table and using last insert ID number:

dbSendQuery(mydb, "INSERT INTO books
(title, author_id, isbn, genre, pub_year)
VALUES('R and MySQL', 1,'6900690075','tutorials','2014');")


# Note that I have not provided publisher's id and description in the above data insert, if you want you can modify that... at least try!

# Also note that we just entered for the author by using the LAST_INSERT_ID() function. you should also try:
# SELECT author_id, author_first FROM authors; # and insert data using that....
try1 = fetch(dbSendQuery(mydb, "SELECT book_id, title, description
FROM books
WHERE genre = 'tutorials';"))
# That's all for now... will come back soon with some advanced tutorials on R and MySQL


MySQL and R

install.packages("RMySQL")
library(RMySQL)

Opening and closing connections
 con <- dbConnect(MySQL(), user="me", password="nuts2u", dbname="my_db", host="localhost")
 on.exit(dbDisconnect(con))

Building queries
 rs <- dbSendQuery(con, "select name from genes limit 10;")
 data <- fetch(rs, n=10)
Processing query results
 huh <- dbHasCompleted(rs)
 dbClearResult(rs)
 dbDisconnect(con)

Retrieving AUTO_INCREMENT IDs


Accessing MySQL through R

Connecting to MySQL 
 mydb = dbConnect(MySQL(), user='user', password='password', dbname='database_name', host='host')
Listing Tables and Fields
 dbListTables(mydb)
Running Queries
 dbSendQuery(mydb, 'drop table if exists some_table, some_other_table')
Making tables
 dbWriteTable(mydb, name='table_name', value=data.frame.name)
Retrieving data from MySQL
 rs = dbSendQuery(mydb, "select * from some_table")
 data = fetch(rs, n=-1)


Another Trial
library(RMySQL)

con <- dbConnect(MySQL(), user="wkpc", password="a4", dbname="wkpc", host="localhost")
on.exit(dbDisconnect(con))

dbListTables(con)
archives case_actions case_activities case_file_drives case_files case_filters case_recents

dbListFields(con, 'users')

dbSendQuery(con, 'drop table if exists bookstore, andOtherbookstore')

dbWriteTable(con, name="mtcars", mtcars[1:10, ])

dbListFields(con, 'mtcars')
dbReadTable(con, "mtcars")

rs = dbSendQuery(con, "select * from mtcars")
data = fetch(rs, n=-1)

dbDisconnect(con)