package robotstxt

import "github.com/temoto/robotstxt"

Package robotstxt implements the robots.txt Exclusion Protocol as specified in http://www.robotstxt.org/wc/robots.html with various extensions.

Variables

var WhitespaceChars = []rune{' ', '\t', '\v'}

Types

type Group

type Group struct {
	Agent      string
	CrawlDelay time.Duration
	// contains filtered or unexported fields
}

func (*Group) Test

func (g *Group) Test(path string) bool
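
Test reports whether the given path is allowed for this group. For illustration, a minimal sketch (the robots.txt content and agent name are made up; FindGroup is documented below, and a nil check guards against no matching group):

    package main

    import (
        "fmt"

        "github.com/temoto/robotstxt"
    )

    func main() {
        robots, err := robotstxt.FromString("User-agent: *\nDisallow: /private/\n")
        if err != nil {
            panic(err)
        }
        group := robots.FindGroup("MyBot") // falls back to the "*" group
        if group != nil {
            fmt.Println(group.Test("/index.html")) // true: not disallowed
            fmt.Println(group.Test("/private/x"))  // false: matches Disallow
        }
    }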

type ParseError

type ParseError struct {
	Errs []error
}

func (ParseError) Error

func (e ParseError) Error() string
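
A sketch of surfacing parse diagnostics. Whether the From* constructors return a ParseError by value or by pointer is not shown above, so this checks both; the malformed input is illustrative and may parse cleanly, since robots.txt parsers tend to be lenient:

    package main

    import (
        "errors"
        "fmt"

        "github.com/temoto/robotstxt"
    )

    func report(err error) {
        var pePtr *robotstxt.ParseError
        var peVal robotstxt.ParseError
        switch {
        case errors.As(err, &pePtr):
            for _, e := range pePtr.Errs {
                fmt.Println("parse error:", e)
            }
        case errors.As(err, &peVal):
            for _, e := range peVal.Errs {
                fmt.Println("parse error:", e)
            }
        default:
            fmt.Println("other error:", err)
        }
    }

    func main() {
        // Illustrative input; may or may not produce a ParseError.
        if _, err := robotstxt.FromString("\x00garbage"); err != nil {
            report(err)
        }
    }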

type RobotsData

type RobotsData struct {
	Host     string
	Sitemaps []string
	// contains filtered or unexported fields
}

func FromBytes

func FromBytes(body []byte) (r *RobotsData, err error)
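
A minimal sketch of parsing a body already in memory; the file name is hypothetical:

    package main

    import (
        "fmt"
        "os"

        "github.com/temoto/robotstxt"
    )

    func main() {
        body, err := os.ReadFile("robots.txt") // hypothetical local copy
        if err != nil {
            panic(err)
        }
        robots, err := robotstxt.FromBytes(body)
        if err != nil {
            panic(err)
        }
        fmt.Println("host:", robots.Host)
        fmt.Println("sitemaps:", robots.Sitemaps)
    }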

func FromResponse

func FromResponse(res *http.Response) (*RobotsData, error)
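
A sketch of the usual fetch-and-parse flow. The URL is illustrative, and whether FromResponse closes the response body is not documented above, so the caller closes it here:

    package main

    import (
        "fmt"
        "net/http"

        "github.com/temoto/robotstxt"
    )

    func main() {
        resp, err := http.Get("https://example.com/robots.txt") // illustrative URL
        if err != nil {
            panic(err)
        }
        defer resp.Body.Close() // assumed not closed by FromResponse

        robots, err := robotstxt.FromResponse(resp)
        if err != nil {
            panic(err)
        }
        fmt.Println(robots.TestAgent("/search", "MyBot"))
    }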

func FromStatusAndBytes

func FromStatusAndBytes(statusCode int, body []byte) (*RobotsData, error)

func FromStatusAndString

func FromStatusAndString(statusCode int, body string) (*RobotsData, error)
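
These two variants take the HTTP status code alongside the body, useful when working from a cached fetch rather than a live *http.Response. A sketch, assuming (this is an assumption about the library's behavior, not documented above) the conventional crawler treatment of a 5xx status as disallow-all:

    package main

    import (
        "fmt"

        "github.com/temoto/robotstxt"
    )

    func main() {
        // Status and body captured earlier, e.g. from a cache.
        robots, err := robotstxt.FromStatusAndString(503, "")
        if err != nil {
            panic(err)
        }
        // Assumption: a 5xx status is treated as "disallow all",
        // so this is expected to print false.
        fmt.Println(robots.TestAgent("/", "MyBot"))
    }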

func FromString

func FromString(body string) (r *RobotsData, err error)
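
FromString is handy for inline rules, e.g. in tests. A sketch reading the group's crawl delay, assuming the Crawl-delay directive is parsed into Group.CrawlDelay as seconds:

    package main

    import (
        "fmt"

        "github.com/temoto/robotstxt"
    )

    func main() {
        robots, err := robotstxt.FromString("User-agent: *\nCrawl-delay: 2\n")
        if err != nil {
            panic(err)
        }
        if g := robots.FindGroup("AnyBot"); g != nil {
            fmt.Println(g.CrawlDelay) // expected 2s under the assumption above
        }
    }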

func (*RobotsData) FindGroup

func (r *RobotsData) FindGroup(agent string) (ret *Group)

FindGroup searches the blocks of declarations for the specified user-agent. From Google's spec: only one group of group-member records is valid for a particular crawler. The crawler must determine the correct group of records by finding the group with the most specific user-agent that still matches; all other groups of records are ignored. The user-agent comparison is case-insensitive, and the order of the groups within the robots.txt file is irrelevant.
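
A sketch of that matching rule, with made-up agent names (nil checks omitted for brevity):

    package main

    import (
        "fmt"

        "github.com/temoto/robotstxt"
    )

    const body = "User-agent: *\nDisallow: /private/\n\nUser-agent: FooBot\nDisallow: /tmp/\n"

    func main() {
        robots, err := robotstxt.FromString(body)
        if err != nil {
            panic(err)
        }
        // FooBot matches its own, more specific group; only /tmp/ is off limits.
        fmt.Println(robots.FindGroup("FooBot").Test("/private/x")) // true
        fmt.Println(robots.FindGroup("FooBot").Test("/tmp/x"))     // false
        // Other agents fall back to the "*" group.
        fmt.Println(robots.FindGroup("BarBot").Test("/private/x")) // false
    }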

func (*RobotsData) TestAgent

func (r *RobotsData) TestAgent(path, agent string) bool
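
TestAgent reads as a convenience combining FindGroup and Group.Test; a minimal sketch:

    package main

    import (
        "fmt"

        "github.com/temoto/robotstxt"
    )

    func main() {
        robots, err := robotstxt.FromString("User-agent: *\nDisallow: /private/\n")
        if err != nil {
            panic(err)
        }
        // Presumably equivalent to robots.FindGroup("MyBot").Test("/private/x").
        fmt.Println(robots.TestAgent("/private/x", "MyBot")) // false
    }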

Directories

robots.txt-check

Details

Version: v1.1.2 (latest)

Platform: linux/amd64

Imports: 15 packages
