Job Opportunities in other groups

Web Wide Crawl Engineer - Collections Department

Location - Based in SF

The Internet Archive is seeking a Web Wide Crawl Engineer. Our crawl engineering team is responsible for capturing and managing the highest quality content from the web. An ideal candidate demonstrates independence and initiative, is a problem solver, works well autonomously, and is technologically savvy. Additionally, the ideal candidate is open to being trained on best practices and standards around large-scale web harvests.

You will work with the Web Collections Manager to design the strategy for, and implement, a Web Harvesting Program using open source tools and platforms. You will develop harvest techniques and tools that enable archival capture and re-rendering of rich media, streaming content and social media, as well as traditional web page content. You will analyze Web collections to ensure that the harvest is a representative sample and is complete and of high quality. You will create tools and services as needed to improve the crawl by analyzing, reporting and importing data, identifying program requirements, and defining technical, operational and data analysis requirements. You will lead efforts to define deployment architecture and workflows, develop tools for automated and human-directed analysis and reporting of crawl material, and monitor production systems using automated tools.

Your responsibilities include:

Running web harvests on specific topics, themes and/or domains using Heritrix, our open source Web crawler. You can find out more about the crawler at crawler.archive.org.

Troubleshooting and intervening during the crawl to ensure its timely and successful completion

Analysis and QA of collected content to ensure it is complete and of the highest quality

Development of tools for automated analysis and reporting of crawl material

Contribute to the development of the open source crawler and related access/analysis tools

Demonstrated experience delivering on commitments, deadlines and project timelines

Experience Needed:

Solid experience with Internet protocols (HTTP is a must); strong knowledge of HTML, JavaScript and Web technologies in general

Experience coding with Java

Experience with open source technology and/or Heritrix

Knowledge of basic Linux system administration

Basic knowledge of building and deploying web applications

Ability to work in, and enjoy, a loosely structured work environment

Flexibility and a sense of humor

Requirements: Bachelor's Degree in Computer Science or a related field, and five years of progressively responsible experience in software development.

Find out more about our organization and web archiving at www.archive.org, as well as our tools and services at http://wa.archive.org/

Education: BS/BA in Computer Science or Math, or equivalent work experience

We are an equal opportunity employer. Please send your resume and cover letter to jobs at archive dot org with the subject line "Crawl Engineer". The Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted. No phone calls please.

Open Library Engineer

Open Library is seeking an experienced Python developer to join our small team. We're working towards providing a page on the web for every book ever written, and we need your help. Open Library is open, editable and freely available. We want to enhance the way data moves in and out of Open Library by building features that make it simple for people both to contribute records to the library and to extract them. We want to connect our records to as many online resources as possible, so that Open Library becomes the locus for information about books online.

You will be responsible for core application development (Open Library runs on a system called Infogami) as well as development of new website features. You will review and enhance Open Library's current API offering, and look out onto the broader web to find and develop useful API integrations back into Open Library.

Must haves:

Software engineering experience, 3-5 years

Mad Python skillz

Applied use of PostgreSQL, Ubuntu/Linux and JavaScript/AJAX

Demonstrable working code online

Experience with triplestore database architecture and RDF/XML formats

Experience with open-source development projects and practice

Ability to work, and enjoyment of working, under your own supervision toward a shared outcome

Excellent communication skills, both written and verbal

Desirable:

Wikipedia hacks

Demonstrable, creative API integration projects, preferably mashups drawing on more than one system

A presence in the Python community

An interest in excellent user interface design

Experience working with Solr/Lucene

Experience with data processing (we have millions of records!)

Interest in data visualisation

Located in San Francisco

We're working towards big goals at Open Library, and at its parent organization, the non-profit Internet Archive. The online presence of books is a very interesting space at the moment, ripe for an innovative outlook and wide integration with all sorts of other systems. If you enjoy breaking new ground, iterative development and huge datasets, please let us know!

About the Internet Archive

The Internet Archive is a non-profit digital library committed to preserving the world's digital cultural artifacts. Used by over 6 million people, this resource is becoming part of how the Internet works. Our job is to put the best humanity has to offer within reach of students, educators and the general public. Find out more about our organization and web archive at www.archive.org.

The Internet Archive is an equal opportunity employer. We provide medical and dental benefits. Please send your resume and cover letter to jobs at archive dot org with the subject line "Open Library Engineer". The Internet Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted. No phone calls please.

Engineer Petabox Team

Do you want to work for an engineering-focused organization that's dedicated to providing universal access to all knowledge?

Does working on large-scale projects (petabytes of storage, massive bandwidth, millions of media items) excite you? Are you interested in working with smart people, solving interesting problems and benefiting humanity?

The Internet Archive (archive.org) is looking for an exceptional Engineer to support and extend the main archival system as well as associated projects such as nasaimages.org and openlibrary.org.

We are seeking a talented Engineer with the following qualifications and experience:

Solid Linux skills, ideally including responsibility for maintaining a high-traffic Linux-based Web server.

Able to write and analyze software; language agnostic.

Fluent in a modern programming language (C, C++, Java, Python, PHP, etc.).

Understands Unix automation at scale. Can build good automated systems.

Has database experience as a software engineer (MySQL or Postgres).

Prior Open Source software engineering experience.

Possesses some experience with large scale web server infrastructure.

Comfortable in multiple software environments.

Communicates clearly and codes collaboratively.

Has a B.S. in Math or Computer Science.

Candidates will be asked to take a short quiz to assess their qualifications.

The Internet Archive is an equal opportunity employer. We provide medical and dental benefits. Please send your resume and cover letter to jobs at archive dot org with the subject line "Engineer-Petabox Team". The Internet Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted. No phone calls please.

A/V Collections Engineer

The Internet Archive is the largest digital library in the world, containing several million media items. We are a non-profit dedicated to gathering and preserving cultural materials and making them available to well over 1 million users every day around the world.

"Collections" at the archive refers to audio, video and text collections. Some of these items are submitted by individual users, and others come from institutions or private collectors. You might be familiar with the feature films, live music recordings, or retro ephemeral films available on our site, among many other collections. We work with interesting content every day, and we get to spend our time helping humanity.

The Collections Engineer will assist staff and collection owners with pulling more items into our collections and will help users with access issues. S/he should be familiar with running small crawls to gather information, parsing RSS feeds, handling text encoding issues, and writing scripts that use web services like Amazon's S3.

2-5 years software engineering experience
Experience working with video, audio, text and image files
Communicate clearly and code collaboratively
Be willing to use the right language for the job (but Python and PHP are useful here)
Familiarity with metadata standards would be helpful

We are an equal opportunity employer. Please send your resume and cover letter to jobs at archive dot org with the subject line "A/V Collections Engineer". The Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted. No phone calls please.

Digital Archive Engineer

The Internet Archive is building a new, hosted Digital Archiving service to be launched in 2011 that will provide storage, maintenance and access for the digital collections of memory institutions. The Digital Archive Engineer will be a key player on a small team of people building this service from the ground up.

This new service will interact with several existing projects at the Internet Archive, requiring the ability to understand and integrate technologies written in several different programming languages. The ideal candidate will therefore be resourceful and flexible, and will enjoy working on a highly collaborative but loosely structured team.

You will help us define the parameters of the service, figure out how to ingest media files from partner institutions and other storage systems and build methods for patrons to access the materials. Your work will range from back end to middleware right on up to the user interface but will initially focus on design and development of the Digital Archive API.

Must Have:

A strong commitment to the goals and values of the Internet Archive's mission

Several years of recent, hands-on software development experience

Experience designing and implementing APIs for both public and internal use

Be language agnostic (use the right tool for the job)

Must be able to communicate clearly and effectively with coworkers and partners of all technical levels

Knowledge of basic Linux system administration

Strongly Desired:

Strong knowledge of web development technologies including HTML, CSS, etc.

Fluency in at least one scripting language like shell or Perl

Ability to work in--and enjoy--a loosely structured work environment

Experience with one or more common web development programming languages (PHP, Python, Ruby, etc.)

Very Helpful:

A strong appreciation for and at least basic understanding of UI and UX design

Experience with JavaScript, including but not limited to one or more common JavaScript frameworks such as jQuery or Sencha/ExtJS

Experience building user-facing media or information portals

Software development experience in Java (server and UI)

Knowledge of deployment of web applications and services (hands-on experience strongly preferred)

Experience working with libraries or other metadata-heavy institutions or processes

This position is based in San Francisco, CA. We cannot consider telecommuters at this time.

Applicants must be able to work in the United States. We are unable to sponsor work visas at this time.

We are an equal opportunity employer. Please send your resume and cover letter to vicky at archive dot org with the subject line "Digital Archive Engineer". The Archive thanks all applicants for their interest, but advises that only those selected for an interview will be contacted. No phone calls please.

Internet Archive is a 501(c)(3) non-profit that was founded to build an Internet library. Its purposes include offering permanent access for researchers, historians, scholars, people with disabilities, and the general public to historical collections that exist in digital format. Now the Internet Archive includes texts, audio, moving images, and software as well as archived web pages in our collections, and provides specialized services for adaptive reading and information access for the blind and other persons with disabilities. Internet Archive is the home of services such as OpenLibrary, the Wayback Machine and Archive-It along with several other open projects.

Volunteer Scanning Positions

The Internet Archive is offering really exciting opportunities at the scanning centers. We are looking for volunteers at the Indiana, Toronto, and Princeton scanning centers.

Help us digitize library books so they can go online and be seen by millions of people for years to come! We need your help! We are trying to get your public library books online and need some volunteers to help our regular non-profit staff. If you can give us some of your time, we can give you a chance to help bring digital knowledge to others both near and far! Come join us!

Position Summary: Internet Archive is a non-profit organization working with 80+ world-class universities and libraries to create the world's largest digital open-source library.

We are looking for people who are patient, conscientious and detail-oriented to work on this exciting project digitizing books. Basic knowledge of computers, digital files and digital cameras is helpful. Pleasant, low-stress work environment. A love of books is a plus.

We are seeking volunteers who can operate a Scribe scanning machine that takes digital photos of books from various collections and puts them online for universal access.

Gain experience in the following fields of endeavor: building an open source digital library; digital photography and digital scanning software; preservation, presentation and production of digital books; digitizing special collection books from different centuries; and understanding copyright and public domain materials.

Commitment: Assistance is needed Monday through Friday from 8:00am to 5:00pm. The position requires a minimum of 3 hours per shift, at least one day a week, for a minimum of one month. For those interested in bolstering their credentials, we offer a six-week internship of four hours, one day a week. Applicable fields of endeavor include digital photography, digital media, non-profit work, Library Science and Computer Science. We will train you on Scribe 2 software.

Interrelations:

The volunteer Scanner will interact with the Coordinator, the professional scanning staff and other volunteers.

Physical/Special Requirements:

Must have reasonable computer skills, e.g., can navigate the desktop and common computer programs.

Must be able to work independently.

Must be able to lift heavy books.

Must be able to sit for 1-2 hour periods and be comfortable with repetitive motion.

Must be able to gently handle special collection books.

Must have the desire to contribute to the world’s largest digital open source library.

THIS IS A NON-PAYING VOLUNTEER POSITION

If interested please send a resume and cover letter expressing your interest to the following:

For Indiana, jeffs - at - archive.org

For Princeton, stacy - at - archive.org

For Toronto, gabe - at - archive.org

The Internet Archive is a non-profit organization seeking to provide universal access to all knowledge. We are working with world-class universities and libraries to create the world's largest digital public library. The collections in our online digital library also include audio, video, web sites, and software.

Internet Archive works together with organizations like Creative Commons and EFF to preserve and expand the public domain, the open-source movement, and the commons in general. You will be a part of a multi-national effort that is presently in 5 countries and involves over 4,500 libraries and institutions.

Read more about the Internet Archive at http://www.archive.org/about/about.php and about Open Library at http://openlibrary.org/about