Descriptive Statistics

R Language

Nasdaq Index - Histogram

Descriptive statistics is the branch of statistics that describes and summarizes a data set, providing simple quantitative or visual summaries of the samples collected and the observations made.

For the data set to analyze, we use the quantmod library to fetch the NASDAQ Composite index (^IXIC) from Yahoo! Finance, covering September 4, 2018 through September 2, 2020:

> library(quantmod)
Loading required package: xts
Loading required package: zoo

Attaching package: 'zoo'

The following objects are masked from 'package:base':

    as.Date, as.Date.numeric

Loading required package: TTR
Version 0.4-0 included new data defaults. See ?getSymbols.
> start <- as.Date("2018-09-04")
> end <- as.Date("2020-09-03")
> getSymbols("^IXIC", src = "yahoo", from = start, to = end)
'getSymbols' currently uses auto.assign=TRUE by default, but will
use auto.assign=FALSE in 0.5-0. You will still be able to use
'loadSymbols' to automatically load data. getOption("getSymbols.env")
and getOption("getSymbols.auto.assign") will still be checked for
alternate defaults.

This message is shown once per session and may be disabled by setting 
options("getSymbols.warning4.0"=FALSE). See ?getSymbols for details.

[1] "^IXIC"
> # First six rows of IXIC
> head(IXIC)
           IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
2018-09-04   8087.95   8104.07  8042.14    8091.25  2229520000       8091.25
2018-09-05   8073.53   8077.84  7962.35    7995.17  2596780000       7995.17
2018-09-06   7998.27   8001.97  7885.49    7922.73  2368680000       7922.73
2018-09-07   7878.79   7962.53  7873.93    7902.54  2146380000       7902.54
2018-09-10   7939.57   7945.03  7890.39    7924.16  2041450000       7924.16
2018-09-11   7894.87   7986.32  7880.92    7972.47  2336600000       7972.47
> # Last six rows of IXIC
> tail(IXIC)
           IXIC.Open IXIC.High IXIC.Low IXIC.Close IXIC.Volume IXIC.Adjusted
2020-08-26  11516.62  11672.05 11507.46   11665.06  3441550000      11665.06
2020-08-27  11688.19  11730.01 11551.01   11625.34  3535800000      11625.34
2020-08-28  11689.28  11708.77 11634.77   11695.63  2997810000      11695.63
2020-08-31  11718.81  11829.84 11697.42   11775.46  3596980000      11775.46
2020-09-01  11850.96  11945.72 11794.78   11939.67  3480780000      11939.67
2020-09-02  12047.26  12074.06 11836.18   12056.44  3966140000      12056.44
> # Classes of IXIC
> # xts - eXtensible Time Series
> # zoo - Infrastructure for Regular and Irregular Time Series
> class(IXIC)
[1] "xts" "zoo"
> # Dimensions of IXIC
> dim(IXIC)
[1] 504   6

As shown above, the IXIC object has 504 rows and six columns (IXIC.Open, IXIC.High, IXIC.Low, IXIC.Close, IXIC.Volume and IXIC.Adjusted), indexed by date.
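
To make the structure of such an object concrete without downloading anything, here is a minimal sketch that builds a small xts object by hand. The dates, prices and the names toy, Open and Close are invented for illustration and are not NASDAQ data; the sketch assumes the xts package is installed.

```r
library(xts)  # xts depends on zoo, which provides index()

# Hypothetical prices for three trading days (not real NASDAQ data)
dates  <- as.Date(c("2020-01-02", "2020-01-03", "2020-01-06"))
prices <- matrix(c(100, 101, 103,    # first column: Open
                   102, 100, 104),   # second column: Close
                 ncol = 2,
                 dimnames = list(NULL, c("Open", "Close")))
toy <- xts(prices, order.by = dates)

class(toy)                  # the same two classes as IXIC
dim(toy)                    # rows and columns
index(toy)                  # the dates that index the rows
toy["2020-01-03", "Close"]  # select by date and by column name
```

Rows are looked up by date rather than by position, which is what "indexed by date" means in practice.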

The summary function returns a statistical summary of the object passed as its argument. The summary produced depends on the class of that object:

summary(object, ...)
> # Summary of the IXIC object
> summary(IXIC)
     Index              IXIC.Open       IXIC.High        IXIC.Low    
 Min.   :2018-09-04   Min.   : 6258   Min.   : 6355   Min.   : 6190  
 1st Qu.:2019-03-06   1st Qu.: 7633   1st Qu.: 7693   1st Qu.: 7562  
 Median :2019-09-04   Median : 8059   Median : 8108   Median : 8008  
 Mean   :2019-09-04   Mean   : 8358   Mean   : 8419   Mean   : 8292  
 3rd Qu.:2020-03-05   3rd Qu.: 8914   3rd Qu.: 8962   3rd Qu.: 8829  
 Max.   :2020-09-02   Max.   :12047   Max.   :12074   Max.   :11836  
   IXIC.Close     IXIC.Volume        IXIC.Adjusted  
 Min.   : 6193   Min.   :1.494e+08   Min.   : 6193  
 1st Qu.: 7637   1st Qu.:2.075e+09   1st Qu.: 7637  
 Median : 8049   Median :2.366e+09   Median : 8049  
 Mean   : 8361   Mean   :2.744e+09   Mean   : 8361  
 3rd Qu.: 8946   3rd Qu.:3.449e+09   3rd Qu.: 8946  
 Max.   :12056   Max.   :7.279e+09   Max.   :12056  
> # Summary of the 6th column, 'IXIC.Adjusted'
> summary(IXIC[,6])
     Index            IXIC.Adjusted  
 Min.   :2018-09-04   Min.   : 6193  
 1st Qu.:2019-03-06   1st Qu.: 7637  
 Median :2019-09-04   Median : 8049  
 Mean   :2019-09-04   Mean   : 8361  
 3rd Qu.:2020-03-05   3rd Qu.: 8946  
 Max.   :2020-09-02   Max.   :12056  
> summary(IXIC[,'IXIC.Adjusted'])
     Index            IXIC.Adjusted  
 Min.   :2018-09-04   Min.   : 6193  
 1st Qu.:2019-03-06   1st Qu.: 7637  
 Median :2019-09-04   Median : 8049  
 Mean   :2019-09-04   Mean   : 8361  
 3rd Qu.:2020-03-05   3rd Qu.: 8946  
 Max.   :2020-09-02   Max.   :12056  
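
As a small illustration of how the result depends on the class, the sketch below calls summary on a numeric vector and on a factor, and recomputes the numeric summary by hand. The values and the names x, s, manual and f are made up for the example and have nothing to do with the NASDAQ series.

```r
# Hypothetical toy data, not taken from the NASDAQ series
x <- c(2, 4, 4, 5, 7, 9, 10, 12)

# Numeric vector: minimum, quartiles, median, mean and maximum
s <- summary(x)
print(s)

# The same six numbers computed one by one
manual <- c(min(x), quantile(x, 0.25), median(x), mean(x),
            quantile(x, 0.75), max(x))
print(unname(manual))

# Factor: summary() returns the count of each level instead
f <- factor(c("up", "down", "up", "up", "flat"))
print(summary(f))
```

For an xts object such as IXIC, the summary also includes the Index column, as the output above shows.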

The stem function produces a stem-and-leaf plot of the values in x:

stem(x, scale = 1, width = 80, atom = 1e-08)
> # Stem-and-leaf plot
> stem(IXIC[,'IXIC.Adjusted'], scale = 1, width = 80, atom = 1000)

  The decimal point is 3 digit(s) to the right of the |

   6 | 23
   6 | 55666667788899999999
   7 | 00000000000000111111111122222222222233333333333333334444444444444444
   7 | 55555555555555555666666666666666666666677777777777777888888888888888+26
   8 | 00000000000000000000000000000000000000000000000000000111111111111111+45
   8 | 555555555555555666666666666666777777777778888999999999
   9 | 00000000000011111122222223333333333344444444
   9 | 555556666666677777788889999999
  10 | 0001112234444
  10 | 555555556667778899
  11 | 0000001112334
  11 | 567789
  12 | 1
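
With 504 observations the plot is dense, so a much smaller example may help in reading it. This is only a sketch with invented values (the vector x is hypothetical); it also shows the scale argument, which controls the length of the plot.

```r
# Hypothetical toy data, not taken from the NASDAQ series
x <- c(12, 15, 21, 23, 23, 28, 31, 34, 35, 42, 47, 50)

# Default plot length
stem(x)

# A larger scale asks for a longer plot, which may split the stems further
stem(x, scale = 2)
```

Each row shows a stem to the left of the bar and one leaf digit per observation to the right, and the header line says where the decimal point falls.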

The hist function produces a histogram of the data (shown in the featured image):

hist(x, ...)
> # Histogram
> hist(IXIC[,'IXIC.Adjusted'],
+      breaks=seq(from=6000, to=12500, by=500),
+      col="#009090", border="#20B2AA",
+      main="Nasdaq - 2018-09-04 to 2020-09-02",
+      xlab="Adjusted index", ylab="Frequency")
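
To see what the breaks argument does without drawing anything, hist can be called with plot = FALSE, which returns the computed bins instead of a picture. The sketch below uses invented values (x and h are hypothetical names, not part of the analysis above).

```r
# Hypothetical toy data, not taken from the NASDAQ series
x <- c(6100, 6400, 6900, 7200, 7300, 7800, 8100, 8100, 8600, 9400)

# plot = FALSE returns the histogram object without drawing it
h <- hist(x, breaks = seq(from = 6000, to = 9500, by = 500), plot = FALSE)

h$breaks  # bin edges
h$counts  # number of observations in each bin
h$mids    # bin midpoints
```

By default the bins are closed on the right, so a value that sits exactly on an edge is counted in the bin to its left.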
