How to split a data frame

To cut the data frame into an arbitrary number of smaller dataframes. Here, we cut into two dataframes.

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
set.seed(10)
split(x, sample(rep(1:2, 13)))

gives

$`1`
   num let LET
3    3   c   C
6    6   f   F
10  10   j   J
12  12   l   L
14  14   n   N
15  15   o   O
17  17   q   Q
18  18   r   R
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
26  26   z   Z

$`2`
   num let LET
1    1   a   A
2    2   b   B
4    4   d   D
5    5   e   E
7    7   g   G
8    8   h   H
9    9   i   I
11  11   k   K
13  13   m   M
16  16   p   P
19  19   s   S
24  24   x   X
25  25   y   Y

If you want to split a dataframe according to values of some variable, I'd suggest using daply() from the plyr package.

library(plyr)
x <- daply(df, .(splitting_variable), function(x)return(x))

Now, x is an array of dataframes. To access one of the dataframes, you can index it with the name of the level of the splitting variable.

x$Level1
#or
x[["Level1"]]

I just posted a kind of a RFC that might help you: http://stackoverflow.com/questions/3318333/split-a-vector-into-chunks-in-r

x = data.frame(num = 1:26, let = letters, LET = LETTERS)
## number of chunks
n <- 2
dfchunk <- split(x, factor(sort(rank(row.names(x))%%n)))
dfchunk
$`0`
   num let LET
1    1   a   A
2    2   b   B
3    3   c   C
4    4   d   D
5    5   e   E
6    6   f   F
7    7   g   G
8    8   h   H
9    9   i   I
10  10   j   J
11  11   k   K
12  12   l   L
13  13   m   M

$`1`
   num let LET
14  14   n   N
15  15   o   O
16  16   p   P
17  17   q   Q
18  18   r   R
19  19   s   S
20  20   t   T
21  21   u   U
22  22   v   V
23  23   w   W
24  24   x   X
25  25   y   Y
26  26   z   Z

You could also use

data2 <- data[data$sum_points == 2500, ]

This will make a dataframe with the values where sum_points = 2500

It gives :

airfoils sum_points field_points   init_t contour_t   field_t
...
491        5       2500         5625 0.000086  0.004272  6.321774
498        5       2500         5625 0.000087  0.004507  6.325083
504        5       2500         5625 0.000088  0.004370  6.336034
603        5        250        10000 0.000072  0.000525  1.111278
577        5        250        10000 0.000104  0.000559  1.111431
587        5        250        10000 0.000072  0.000528  1.111524
606        5        250        10000 0.000079  0.000538  1.111685
....
> data2 <- data[data$sum_points == 2500, ]
> data2
airfoils sum_points field_points   init_t contour_t   field_t
108        5       2500          625 0.000082  0.004329  0.733109
106        5       2500          625 0.000102  0.004564  0.733243
117        5       2500          625 0.000087  0.004321  0.733274
112        5       2500          625 0.000081  0.004428  0.733587

subset() is also useful

subset(DATAFRAME, COLUMNNAME == "")

For a survey package, maybe the "survey" package is pertinent?

http://faculty.washington.edu/tlumley/survey/

The answer you want depends very much on how and why you want to break up the data frame.

For example, if you want to leave out some variables, you can create new data frames from specific columns of the database. The subscripts in brackets after the data frame refer to row and column numbers. Check out Spoetry for a complete description.

newdf <- mydf[,1:3]

Or, you can choose specific rows.

newdf <- mydf[1:3,]

And these subscripts can also be logical tests, such as choosing rows that contain a particular value, or factors with a desired value.

What do you want to do with the chunks left over? Do you need to perform the same operation on each chunk of the database? Then you'll want to ensure that the subsets of the data frame end up in a convenient object, such as a list, that will help you perform the same command on each chunk of the data frame.

If you want to split by values in one of the columns, you can use lapply. For instance, to split ChickWeight into a separate dataset for each chick:

data(ChickWeight)
lapply(unique(ChickWeight$Chick), function(x) ChickWeight[ChickWeight$Chick == x,])